feat(stats): Backend charts & Migration: total_txns and total_addresses #1144
Conversation
Force-pushed from d9f20c6 to 5375d8e
Force-pushed from 656e22b to cd268c8
Force-pushed from 5375d8e to bdd688c
Force-pushed from f8a3b8b to d3060b3
@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough

The changes in this pull request primarily involve updates to configuration files and structural modifications across various modules related to transaction and address metrics. The test for indexing status has been modified to validate multiple counter IDs rather than checking for a specific counter.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
stats/stats/src/charts/counters/total_txns.rs (1)
77-78: Handle potential integer conversion errors explicitly

Using unwrap_or(0) when converting with u64::try_from(n) may silently suppress conversion errors. It's safer to handle a possible TryFromIntError to prevent unintended data loss.

Consider this modification to handle conversion errors:

- .map(|n| u64::try_from(n).unwrap_or(0))
+ .map(|n| u64::try_from(n).unwrap_or_else(|e| {
+     // Log the error and handle accordingly
+     log::error!("Integer conversion error: {}", e);
+     0
+ }))

stats/stats/src/tests/simple_test.rs (1)
337-370: Consider adding more specific assertions

While the implementation is correct, the assertion could be more specific than just checking for non-zero values.

Consider adding assertions for:
- Expected value ranges
- Specific error cases
- Edge cases with empty or minimal data

Example enhancement:

- assert_ne!("0", data.value);
+ let value: i64 = data.value.parse().unwrap();
+ assert!(value > 0, "Expected positive value");
+ assert!(value <= expected_max, "Value exceeds expected maximum");

stats/stats/src/charts/counters/total_addresses.rs (1)
66-82: Suggestion: Handle negative estimated table rows explicitly

In the estimate method of TotalAddressesEstimation, negative values for the estimated table rows are converted to 0 using unwrap_or(0). This could silently mask underlying issues if negative estimates occur. Consider handling negative values explicitly, perhaps by logging a warning or returning an error, to ensure any anomalies are not overlooked.

Apply this diff to handle negative values explicitly:

  .map_err(ChartError::BlockscoutDB)?
- .map(|n| u64::try_from(n).unwrap_or(0))
+ .and_then(|n| {
+     if n >= 0 {
+         u64::try_from(n).ok()
+     } else {
+         // Handle the negative value appropriately, e.g., log a warning
+         None
+     }
+ })
  .unwrap_or(0);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- stats/config/update_groups.json (1 hunks)
- stats/stats-server/src/runtime_setup.rs (1 hunks)
- stats/stats-server/tests/it/indexing_status.rs (2 hunks)
- stats/stats/src/charts/counters/total_addresses.rs (2 hunks)
- stats/stats/src/charts/counters/total_blocks.rs (2 hunks)
- stats/stats/src/charts/counters/total_txns.rs (2 hunks)
- stats/stats/src/charts/lines/new_txns.rs (1 hunks)
- stats/stats/src/tests/simple_test.rs (2 hunks)
- stats/stats/src/update_groups.rs (1 hunks)
🔇 Additional comments (14)
stats/stats/src/charts/counters/total_txns.rs (2)
27-35: Ensure consistent error handling for database queries

In the query_data method of TotalTxnsQueryBehaviour, database query errors are mapped to ChartError::BlockscoutDB. Verify that this error mapping provides sufficient context for debugging, and consider enriching the error information for better traceability.
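One way to enrich the context is to log the failing query's name before wrapping the error. A minimal sketch, assuming sea_orm's DbErr and the tracing crate, with a stand-in for the crate's ChartError (the real type lives in the stats crate and has more variants):

use sea_orm::DbErr;

// Stand-in for the stats crate's error type, for illustration only.
#[derive(Debug)]
enum ChartError {
    BlockscoutDB(DbErr),
}

// Logs the query context before wrapping the DB error, so a failure remains
// traceable after it is converted into the chart-level error.
fn map_db_err(query_name: &'static str) -> impl Fn(DbErr) -> ChartError {
    move |e| {
        tracing::error!(query = query_name, error = %e, "blockscout DB query failed");
        ChartError::BlockscoutDB(e)
    }
}

// A call site would then read roughly:
// let row = stmt.one(db).await.map_err(map_db_err("total_txns"))?;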
104-105: Verify updated expected value in test update_total_txns

The expected value in the test has changed from "0" to "48". Confirm that this reflects the actual data in the test environment and that the change is intentional.
stats/config/update_groups.json (2)
6-6: Review the increased frequency of total_addresses_group updates

The total_addresses_group schedule has changed to "0 0,30 * * * * *", meaning it runs every 30 minutes (at the start of the hour and at the 30-minute mark). Ensure that this increased frequency is necessary and won't impact system performance.
9-9: Confirm the new schedule for total_txns_group

A new schedule "0 5 */2 * * * *" is added for total_txns_group, running every two hours at the 5-minute mark. Verify that this timing aligns with the desired data freshness requirements.
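Both entries use 7-field cron expressions (second, minute, hour, day-of-month, month, day-of-week, year). A small offline sanity check, assuming the cron and chrono crates; the diff does not show which parser the stats service actually uses:

use std::str::FromStr;

use chrono::Utc;
use cron::Schedule;

fn main() {
    // Fields: sec min hour day-of-month month day-of-week year
    for expr in ["0 0,30 * * * * *", "0 5 */2 * * * *"] {
        let schedule = Schedule::from_str(expr).expect("invalid cron expression");
        println!("{expr} fires next at:");
        for fire_time in schedule.upcoming(Utc).take(3) {
            println!("  {fire_time}");
        }
    }
}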
stats/stats-server/tests/it/indexing_status.rs (1)
89-95: Update test to assert all expected counters are present

The test now collects counter IDs and asserts they include "totalAddresses", "totalBlocks", and "totalTxns". Ensure that these are the correct counters expected in this context and that no additional counters should be included.
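The membership check described above could take roughly the following shape; the helper and variable names are illustrative, not taken from the actual test:

// Hypothetical helper mirroring the described assertion; the real test's
// variable names and response types may differ.
fn assert_expected_counters(counter_ids: &[String]) {
    for expected in ["totalAddresses", "totalBlocks", "totalTxns"] {
        assert!(
            counter_ids.iter().any(|id| id == expected),
            "expected counter `{expected}` to be present, got: {counter_ids:?}"
        );
    }
}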
stats/stats/src/update_groups.rs (1)
26-26: Verify the addition of TotalTxns to singleton_groups!

Adding TotalTxns to singleton_groups! changes its update behavior to be independent. Confirm that this is intentional and that it doesn't adversely affect dependencies or related update sequences.
stats/stats/src/charts/lines/new_txns.rs (2)
38-40: LGTM! Well-documented changes.

The comments clearly explain the rationale for removing the timestamp filter:
- Enables use of the transactions_block_consensus_index for better performance
- Includes genesis transactions in the count for more accurate metrics
Line range hint 41-57: Verify index usage in both query paths.
The SQL queries look correct for both denormalized and non-denormalized paths. However, let's verify that the queries are using the intended indexes efficiently.
Also applies to: 58-74
✅ Verification successful
Both query paths are properly indexed
The queries in both denormalized and non-denormalized paths will use appropriate indexes:
- Denormalized path: the query will use transactions_block_consensus_index for the WHERE t.block_consensus = true condition and transactions_block_timestamp_index for the date-range filtering and grouping.
- Non-denormalized path: the query joins transactions and blocks efficiently through transactions_block_hash_index_index, which is a unique index on (block_hash, index).

The schema shows a well-designed indexing strategy that supports both query paths efficiently.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify index usage in both query paths
# Expected: EXPLAIN should show usage of transactions_block_consensus_index
# Test denormalized path
echo "Testing denormalized query path:"
echo "EXPLAIN ANALYZE
SELECT
date(t.block_timestamp) as date,
COUNT(*)::TEXT as value
FROM transactions t
WHERE
t.block_consensus = true
GROUP BY date;" | psql $DATABASE_URL
# Test non-denormalized path
echo "Testing non-denormalized query path:"
echo "EXPLAIN ANALYZE
SELECT
date(b.timestamp) as date,
COUNT(*)::TEXT as value
FROM transactions t
JOIN blocks b ON t.block_hash = b.hash
WHERE
b.consensus = true
GROUP BY date;" | psql $DATABASE_URL
Length of output: 786
Script:
#!/bin/bash
# Let's check for any index definitions in the schema files or migrations
fd -e sql | xargs rg -l "CREATE.*INDEX"
# Also check for any index-related configuration or documentation
rg -i "index" -g "!*.lock" -g "!*.svg" -g "!*.json" -A 2 -B 2
Length of output: 69781
Script:
#!/bin/bash
# Let's check for any index definitions in the migration files specifically for transactions table
fd -e sql | xargs rg -l "CREATE.*INDEX.*transactions"
# Also check for any index-related configuration in the schema files
fd -e sql | xargs rg "transactions.*index|index.*transactions" -A 2 -B 2
Length of output: 16224
stats/stats/src/charts/counters/total_blocks.rs (2)
106-108: LGTM! Clean import organization.
The imports are well-organized and properly scoped for the test module.
Also applies to: 113-114
221-221: LGTM! Good test refactoring.

The test has been simplified by using the new test_counter_fallback helper function, improving maintainability and reducing code duplication.
stats/stats/src/tests/simple_test.rs (2)
19-19: LGTM! Required imports added.
The new imports support the database operations needed in the test helper function.
334-336: LGTM! Clear documentation.
The function's purpose and prerequisites are well-documented.
stats/stats-server/src/runtime_setup.rs (1)
263-263: Verify configuration setup for TotalTxnsGroup
Ensure that the necessary configuration is properly set up for this new group.
Let's verify the configuration:
✅ Verification successful
Configuration is properly set up for TotalTxnsGroup
The necessary configuration is correctly set up:
- Schedule configuration in update_groups.json: "total_txns_group": "0 5 */2 * * * *"
- Charts configuration in charts.json includes the counter definition with title and description
- Group is properly integrated in the runtime setup alongside other metric groups
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify TotalTxnsGroup configuration exists in relevant files
# Check update_groups.json for the schedule configuration
rg -l "total_txns_group" --type json
# Check charts.json for chart configuration
rg -l "total_txns" --type json
# Check for any TODO or FIXME comments related to the implementation
rg -i "todo.*total.?txns|fixme.*total.?txns"
Length of output: 513
Script:
#!/bin/bash
# Let's check the actual configuration content for TotalTxnsGroup
# Check update_groups.json configuration
echo "=== Update Groups Configuration ==="
cat stats/config/update_groups.json | jq '.[] | select(.name | contains("total_txns"))'
# Check charts.json configuration
echo -e "\n=== Charts Configuration ==="
cat stats/config/charts.json | jq '.[] | select(.name | contains("total_txns"))'
# Check layout.json configuration
echo -e "\n=== Layout Configuration ==="
cat stats/config/layout.json | jq '.[] | select(.name | contains("total_txns"))'
# Check the implementation of TotalTxnsGroup
echo -e "\n=== TotalTxnsGroup Implementation ==="
rg -A 10 "struct TotalTxnsGroup"
Length of output: 880
Script:
#!/bin/bash
# Let's try a different approach to check the configurations
# Check raw content of update_groups.json
echo "=== Update Groups Configuration ==="
cat stats/config/update_groups.json
# Check raw content of charts.json
echo -e "\n=== Charts Configuration ==="
cat stats/config/charts.json
# Check TotalTxnsGroup implementation with more context
echo -e "\n=== TotalTxnsGroup Implementation ==="
rg -B 5 -A 15 "TotalTxnsGroup" --type rust
Length of output: 12213
stats/stats/src/charts/counters/total_addresses.rs (1)
64-88: Implementation of TotalAddressesEstimation looks good

The TotalAddressesEstimation struct and its estimate method are correctly implemented. The use of query_estimated_table_rows aligns with the intended functionality, and the overall integration with DirectPointLocalDbChartSourceWithEstimate is appropriate.
Force-pushed from d3060b3 to 864c804
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
stats/stats/src/charts/counters/total_txns.rs (1)
104-104: Consider making the test value configurable

The hardcoded expected value "48" in the test could make the test brittle. Consider making this value configurable or deriving it from the test data setup.

- simple_test_counter::<TotalTxns>("update_total_txns", "48", None).await;
+ let expected_count = std::env::var("TEST_TOTAL_TXNS_COUNT")
+     .unwrap_or_else(|_| "48".to_string());
+ simple_test_counter::<TotalTxns>("update_total_txns", &expected_count, None).await;

stats/stats/src/charts/counters/total_addresses.rs (1)
64-82: Add documentation for the estimation implementation

The implementation looks solid, but would benefit from documentation explaining:
- The estimation approach used
- Why the fallback to 0 is safe
- Expected accuracy of the estimation

Add documentation like this:

+/// Provides estimation capabilities for total addresses count.
+/// Uses table statistics for quick approximation when an exact count is not required.
 pub struct TotalAddressesEstimation;

+/// Implements estimation logic for total addresses.
+/// Falls back to 0 if estimation fails to ensure system stability.
 impl ValueEstimation for TotalAddressesEstimation {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- stats/config/update_groups.json (1 hunks)
- stats/stats-server/src/runtime_setup.rs (1 hunks)
- stats/stats-server/tests/it/indexing_status.rs (2 hunks)
- stats/stats/src/charts/counters/total_addresses.rs (2 hunks)
- stats/stats/src/charts/counters/total_blocks.rs (2 hunks)
- stats/stats/src/charts/counters/total_txns.rs (2 hunks)
- stats/stats/src/charts/lines/new_txns.rs (1 hunks)
- stats/stats/src/tests/simple_test.rs (2 hunks)
- stats/stats/src/update_groups.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- stats/stats-server/src/runtime_setup.rs
- stats/stats/src/update_groups.rs
- stats/stats/src/charts/lines/new_txns.rs
- stats/stats/src/tests/simple_test.rs
- stats/stats-server/tests/it/indexing_status.rs
- stats/config/update_groups.json
- stats/stats/src/charts/counters/total_blocks.rs
🔇 Additional comments (7)
stats/stats/src/charts/counters/total_txns.rs (3)
2-13: LGTM! Imports are well-organized and necessary.
The imports are logically grouped and all appear to be used within the implementation.
26-42: Align timespan usage between query and estimation methods.

The query_data method uses cx.time while the estimate method uses Utc::now(). For consistency and to prevent discrepancies, consider using the same time source in both methods.
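A minimal, self-contained sketch of the suggestion, using a hypothetical simplified UpdateContext (the real one in the stats crate carries more fields): capture one timestamp and use it in both paths.

use chrono::{DateTime, Utc};

// Hypothetical, simplified context; illustrative only.
struct UpdateContext {
    time: DateTime<Utc>,
}

// Both the exact query and the estimate stamp their result with the same
// instant, so the two code paths cannot disagree on when the value was taken.
fn point_timestamp(cx: &UpdateContext) -> DateTime<Utc> {
    cx.time
}

fn main() {
    let cx = UpdateContext { time: Utc::now() };
    assert_eq!(point_timestamp(&cx), point_timestamp(&cx));
}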
86-94: Verify the status of referenced issue #845.
The comment indicates that this implementation is temporary until #845 is resolved. Let's verify the status of this issue.
stats/stats/src/charts/counters/total_addresses.rs (4)
2-14: LGTM: Imports are well-organized and support the new estimation functionality
The new imports properly support the added estimation capabilities while maintaining good organization.
84-88: LGTM: Type definition properly integrates estimation capability
The updated type definition correctly combines remote data source, estimation, and properties while maintaining type safety.
107-107: Fix generic type in test function

The generic type parameter is incorrectly set to TotalTxns. It should be TotalAddresses to correctly test the fallback for total addresses.
Apply this diff to correct the generic type:
- test_counter_fallback::<TotalTxns>("total_addresses_fallback").await;
+ test_counter_fallback::<TotalAddresses>("total_addresses_fallback").await;
69-72: Verify database performance impact

The use of query_estimated_table_rows relies on table statistics. Let's verify the query execution plan to ensure it's using statistics efficiently.
✅ Verification successful
Database performance impact is minimal
The implementation of query_estimated_table_rows uses PostgreSQL's internal statistics tables to get a quick row-count estimate. The query:
- Uses the pg_class catalog table, which contains table statistics
- Calculates the estimate based on table pages and tuples per page
- Does not perform any table scans or heavy operations
- Has test coverage verifying the estimation accuracy
The implementation is safe for production use as it:
- Relies on statistics that PostgreSQL maintains automatically
- Uses a lightweight catalog query instead of COUNT(*)
- Has error handling for cases when statistics are not available
- Is validated through unit tests showing acceptable estimation ranges
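For reference, a pg_class-based estimate can be issued through sea_orm roughly as in the sketch below. This illustrates the general technique only and is not the crate's actual query_estimated_table_rows implementation:

use sea_orm::{ConnectionTrait, DatabaseBackend, DbConn, DbErr, Statement};

// Sketch of a statistics-based row estimate. Note: reltuples is maintained
// by ANALYZE/autovacuum and is -1 for never-analyzed tables, so callers
// should treat non-positive results as "no estimate available".
async fn estimated_table_rows(db: &DbConn, table: &str) -> Result<Option<i64>, DbErr> {
    let stmt = Statement::from_sql_and_values(
        DatabaseBackend::Postgres,
        "SELECT reltuples::bigint AS estimate FROM pg_class WHERE oid = $1::regclass",
        [table.into()],
    );
    let row = db.query_one(stmt).await?;
    row.map(|r| r.try_get::<i64>("", "estimate")).transpose()
}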
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the addresses table has up-to-date statistics
# and verify the query plan for row estimation
# Search for table statistics maintenance
rg -A 5 "ANALYZE.*addresses"
# Search for any custom statistics settings
rg -A 5 "ALTER TABLE.*addresses.*SET.*(STATISTICS|AUTOVACUUM)"
Length of output: 95
Script:
#!/bin/bash
# Let's check the implementation of query_estimated_table_rows function
# and any related database query execution code
# Search for the query_estimated_table_rows implementation
ast-grep --pattern 'fn query_estimated_table_rows($$$)'
# Search for any database configuration or statistics related code
rg -l "statistics|analyze|vacuum|autovacuum" --type rust
# Look for any SQL query execution related code
rg -l "execute|query_one|query_raw" --type rust
Length of output: 13208
Script:
#!/bin/bash
# Let's examine the implementation details of the estimation query
# Check the implementation in utils.rs which likely contains query_estimated_table_rows
cat stats/stats/src/utils.rs
# Check the database interaction code
cat stats/stats/src/charts/db_interaction/read.rs
Length of output: 53094
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
stats/stats/src/charts/counters/total_addresses.rs (2)
64-84: Add documentation for the estimation logic

The TotalAddressesEstimation implementation would benefit from documentation explaining:
- The estimation approach used
- Why estimated counts are acceptable vs. exact counts
- Performance implications

Add documentation like this:

+/// Provides estimated total address counts using table statistics.
+/// This approach is more performant than exact counting for large tables,
+/// as it uses PostgreSQL's internal statistics.
 pub struct TotalAddressesEstimation;
77-78: Consider logging overflow cases

The silent handling of conversion overflow with unwrap_or(0) could hide issues with very large address counts.

Consider adding logging:

- .map(|n| u64::try_from(n).unwrap_or(0))
- .unwrap_or(0);
+ .map(|n| match u64::try_from(n) {
+     Ok(val) => val,
+     Err(e) => {
+         tracing::warn!("Address count overflow: {}", e);
+         0
+     }
+ })
+ .unwrap_or_else(|| {
+     tracing::warn!("No estimate available, defaulting to 0");
+     0
+ });

stats/stats/src/charts/counters/total_txns.rs (3)
82-84: Enhance comment about timestamp usage

While the comment explains why now() is used instead of cx.time, it could be more detailed about potential implications.

Consider expanding the comment:

- // `now()` is more relevant when taken right before the query rather than
- // `cx.time` measured a bit earlier.
+ // Using `now()` instead of `cx.time` provides a more accurate timestamp
+ // for the estimation, as it's captured immediately before the query.
+ // This helps minimize any time drift between the timestamp and the
+ // actual data state, especially important for high-frequency updates.
100-107: Enhance TODO comment with more context

The TODO comment about issue #845 could provide more context about the implications and future plans.

Consider expanding the comment:

- // We will need it to update on not fully indexed data soon, therefore this counter is
- // separated from `NewTxns`.
- //
- // Separate query not reliant on previous computation helps this counter to work in such
- // environments.
- //
- // todo: make it dependant again if #845 is resolved
+ // This counter is currently separated from `NewTxns` to support updates on
+ // partially indexed data. The separation allows for independent querying
+ // without relying on previous computations, making it more resilient in
+ // environments with incomplete data.
+ //
+ // The separate implementation comes with a trade-off of potentially
+ // duplicated logic and maintenance overhead. Once issue #845 is resolved,
+ // we should consolidate this with `NewTxns` to improve code maintainability
+ // and reduce duplication.
+ //
+ // TODO(#845): Consolidate with `NewTxns` once indexing improvements are implemented
118-118: Avoid hardcoded test values

The test uses a hardcoded expected value of "48", which might make the test brittle (see the sketch after the list below).
Consider making the test more robust:
- Use a test database with known state
- Calculate the expected value dynamically based on test data
- Add comments explaining the expected value's origin
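For example, the expectation could be derived from the seeded fixtures. In this sketch, init_test_db and seed_mock_txns are hypothetical helpers standing in for the suite's actual setup utilities:

#[tokio::test]
async fn update_total_txns_derived() {
    // Hypothetical helpers: replace with the suite's real setup utilities.
    let db = init_test_db("update_total_txns_derived").await;
    let seeded_txns = seed_mock_txns(&db).await;
    // Derive the expected value from the seeded data instead of hardcoding "48".
    let expected = seeded_txns.len().to_string();
    simple_test_counter::<TotalTxns>("update_total_txns", &expected, None).await;
}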
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- stats/stats/src/charts/counters/total_addresses.rs (2 hunks)
- stats/stats/src/charts/counters/total_blocks.rs (3 hunks)
- stats/stats/src/charts/counters/total_txns.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- stats/stats/src/charts/counters/total_blocks.rs
🔇 Additional comments (5)
stats/stats/src/charts/counters/total_addresses.rs (4)
2-14: LGTM: Import changes are well-organized and necessary
The new imports support the estimation functionality and database interactions while following Rust's best practices.
86-90: LGTM: Type definition properly integrates estimation capability
The updated type definition correctly incorporates the estimation functionality while maintaining type safety.
103-107: LGTM: Test implementation correctly uses TotalAddresses type
The fallback test is properly implemented with the correct generic type parameter, addressing the issue raised in previous reviews.
71-74: Verify table name consistency
Let's ensure the table name used in the estimation query matches the entity definition.
✅ Verification successful
Let me verify the specific entity being used in the code by checking the imports and entity definition.
Let me check the specific file content to verify the imports.
Table name consistency verified
The table name is consistent across the codebase. The code correctly uses addresses::Entity.table_name(), where:
- The import use blockscout_db::entity::addresses; references the entity defined in libs/blockscout-db/entity/src/addresses.rs
- The entity definition in libs/blockscout-db/entity/src/addresses.rs correctly specifies #[sea_orm(table_name = "addresses")]
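This coupling can be exercised directly; a tiny sketch assuming the blockscout_db entity crate and sea_orm's EntityName trait in scope:

use blockscout_db::entity::addresses;
use sea_orm::EntityName;

fn main() {
    // Prints "addresses", the value of #[sea_orm(table_name = "addresses")],
    // which is exactly the name the estimation query receives.
    println!("{}", addresses::Entity.table_name());
}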
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistency of addresses table name across the codebase
# Check for table name references in entity definitions and migrations
rg -A 2 "Table::create\(\)" | grep -i "addresses"
rg "table_name.*addresses"
Length of output: 850
Script:
#!/bin/bash
# Check imports and entity usage in the file
rg -A 5 "use.*addresses::Entity" stats/stats/src/charts/counters/total_addresses.rs
# Verify the actual entity definition
ast-grep --pattern 'use $$$addresses::Entity$$$'
Length of output: 26545
Script:
#!/bin/bash
# Check the content of the file to verify imports and entity usage
cat stats/stats/src/charts/counters/total_addresses.rs
Length of output: 3218
stats/stats/src/charts/counters/total_txns.rs (1)
2-21: LGTM: Imports are well-organized and necessary.
The imports are properly structured and all are being utilized in the implementation.
#1080
Total transactions sources:
Total addresses:
Summary by CodeRabbit
New Features
- Rescheduled total_addresses_group to run at the start of the hour and at the 30-minute mark.
- Scheduled total_txns_group to execute every two hours at the 5-minute mark.
- Added TotalTxnsGroup to the available update groups.

Bug Fixes

Tests
- Added total_txns_fallback and modified existing tests for better validation of counter behaviors.

Chores
- Reorganized code related to TotalTxns to improve organization.