chore: optimize get last transactions action #1229
Conversation
Walkthrough
The migration rewrites the query to wrap each source table in a derived subquery that orders by `created_at DESC` and applies a per-table `LIMIT` before the results are combined.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Migration
    participant DB as Database
    rect rgb(232, 246, 255)
    Note over Client,DB: 1) Per-table derived queries (parallel logical steps)
    end
    Client->>DB: SELECT ... FROM (SELECT ... FROM streams ORDER BY created_at DESC LIMIT $limit_size) AS s
    Client->>DB: UNION ALL (SELECT ... FROM primitive_events ORDER BY created_at DESC LIMIT $limit_size) AS e
    Client->>DB: UNION ALL (other per-table limited subqueries...)
    rect rgb(255, 244, 230)
    Note over DB: 2) Combine UNION ALL results, apply ROW_NUMBER() OVER (ORDER BY created_at DESC)
    end
    DB->>DB: Rank rows, ORDER BY created_at DESC, apply outer LIMIT $limit_size
    DB-->>Client: Limited, ranked result set
```
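The per-table LIMIT pattern shown in the diagram can be sketched on a toy schema. This is a minimal illustration using Python's `sqlite3` with assumed table names (`streams`, `primitive_events`) and a single `created_at` column; the real migration runs against the indexer's database with more tables and columns:

```python
import sqlite3

# Hypothetical minimal schema; table and column names mirror the PR's
# migration but are simplified assumptions, not the real indexer schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE streams (created_at INTEGER);
    CREATE TABLE primitive_events (created_at INTEGER);
    INSERT INTO streams VALUES (10), (30), (50);
    INSERT INTO primitive_events VALUES (20), (40), (60);
""")

limit_size = 3
# Each derived subquery applies ORDER BY ... LIMIT before the UNION ALL,
# so every source table contributes at most limit_size rows; the outer
# ORDER BY / LIMIT then ranks the small combined set.
rows = conn.execute("""
    SELECT created_at FROM (
        SELECT created_at FROM streams
        ORDER BY created_at DESC LIMIT :n
    )
    UNION ALL
    SELECT created_at FROM (
        SELECT created_at FROM primitive_events
        ORDER BY created_at DESC LIMIT :n
    )
    ORDER BY created_at DESC
    LIMIT :n
""", {"n": limit_size}).fetchall()
print(rows)  # [(60,), (50,), (40,)] -- newest timestamps across both tables
```

The design point is that the expensive sort happens per table over indexed data, and the final sort only sees at most `tables * limit_size` rows.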
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ❌ 1 failed check (warning), ✅ 2 passed
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/migrations/010-get-latest-write-activity.sql (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: acceptance-test
🔇 Additional comments (1)
internal/migrations/010-get-latest-write-activity.sql (1)
20-72: Verify that result deduplication doesn't violate the API contract.

The per-table LIMIT optimization is sound and should improve query performance. However, the `PARTITION BY created_at` logic combined with `WHERE rn = 1` deduplicates events at identical timestamps, keeping only one method per timestamp (based on priority). This means the query may return fewer than `$limit_size` rows if multiple events share the same `created_at` value. For example, if all 6 events in the result set have the same timestamp, only 1 row would be returned instead of 6. Verify that this behavior aligns with the API contract and caller expectations.

To confirm the deduplication impact, run a query against production data to check:
- How frequently do events share identical `created_at` timestamps?
- What is the typical result count vs. `$limit_size` for representative data providers?
- Does the caller expect exactly `$limit_size` rows, or is the actual result count acceptable?
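The deduplication concern above is easy to reproduce on a toy table. A minimal sketch with Python's `sqlite3`, using an assumed `events` table and illustrative method names (the real schema differs):

```python
import sqlite3

# Sketch of the review's concern: ROW_NUMBER() partitioned by created_at
# plus WHERE rn = 1 can return far fewer than limit_size rows when many
# events share a timestamp. Schema and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (created_at INTEGER, method TEXT, priority INTEGER);
    -- six events, all at the same timestamp
    INSERT INTO events VALUES
        (100, 'deployStream', 1), (100, 'insertRecord', 2),
        (100, 'insertRecord', 2), (100, 'setMetadata', 3),
        (100, 'setMetadata', 3), (100, 'insertRecord', 2);
""")

rows = conn.execute("""
    SELECT created_at, method FROM (
        SELECT created_at, method,
               ROW_NUMBER() OVER (
                   PARTITION BY created_at ORDER BY priority ASC
               ) AS rn
        FROM events
    )
    WHERE rn = 1
    ORDER BY created_at DESC
    LIMIT 6
""").fetchall()
print(rows)  # [(100, 'deployStream')] -- 1 row survives out of 6
```

With a `LIMIT` of 6 and six events at one timestamp, only the lowest-priority method per timestamp remains, which is exactly the "fewer than `$limit_size` rows" scenario the review flags.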
Force-pushed from a6e1434 to a2f912c
Actionable comments posted: 0
♻️ Duplicate comments (1)
internal/migrations/010-get-latest-write-activity.sql (1)
47-54: [Duplicate] Apply missing index on taxonomies for performance.

A previous review flagged that the `taxonomies` table lacks an index on `created_at DESC`, which forces a full table scan for the `ORDER BY t.created_at DESC LIMIT $limit_size` operation in this subquery (lines 47-54).

Create the index in a new migration file (if not already addressed separately):

```sql
CREATE INDEX IF NOT EXISTS tax_created_at_idx ON taxonomies(created_at DESC);
```

Verify this index creation is included in the PR, either in this migration or a separate one. Also confirm that similar indexes exist for the other source tables (`streams`, `primitive_events`, `metadata`) to support their respective per-table LIMIT operations.
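The effect of such an index can be observed with `EXPLAIN QUERY PLAN`. The following is only a sketch of the principle on SQLite with an assumed one-column `taxonomies` table; the actual migration targets the project's database, whose planner output differs:

```python
import sqlite3

# Without the index, ORDER BY created_at DESC LIMIT forces a full table
# scan plus a sort step; with it, the planner walks the index directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomies (created_at INTEGER)")

query = "SELECT created_at FROM taxonomies ORDER BY created_at DESC LIMIT 5"

def plan(sql):
    # EXPLAIN QUERY PLAN returns one row per plan node; the human-readable
    # detail text is in the last column of each row.
    return " | ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)
conn.execute(
    "CREATE INDEX IF NOT EXISTS tax_created_at_idx "
    "ON taxonomies(created_at DESC)"
)
after = plan(query)

print(before)  # full table scan (no index available)
print(after)   # scan via tax_created_at_idx, no separate sort needed
```

The same reasoning applies to each per-table subquery: the `LIMIT` is only cheap when the `ORDER BY` column is index-backed.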
🧹 Nitpick comments (1)
internal/migrations/010-get-latest-write-activity.sql (1)
20-21: Improve clarity of comments to document the ROW_NUMBER filtering.

The added comments (lines 20-21) explain the per-table LIMIT optimization but do not document the `ROW_NUMBER() ... WHERE rn = 1` deduplication logic. Expand the comments to explain:
- Why DISTINCT is used on created_at
- Why ROW_NUMBER() partitions by created_at and filters to priority = 1
- What the impact is on the returned result set
Example:

```sql
-- Fetch up to limit_size DISTINCT blocks per table, ordered by created_at DESC.
-- For each unique created_at, keep only the method with the lowest priority (deployStream).
-- Then take the top limit_size results overall.
```
🔇 Additional comments (2)
internal/migrations/010-get-latest-write-activity.sql (2)
26-32: Verify that DISTINCT on created_at preserves expected result behavior.

Each per-table subquery uses `SELECT DISTINCT s.created_at` (and similarly for the other tables), which removes duplicate timestamps within each table before applying the LIMIT. This means that if multiple records exist at the same timestamp, only one row per timestamp is returned per table.

Clarify whether this is intentional or whether you meant to select all columns and apply DISTINCT differently. If the goal is to return the most recent activity across all records (not just distinct timestamps), consider whether `DISTINCT` is the right approach here.

Also applies to: 36-43, 47-54, 58-65
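A small sketch of the `DISTINCT` behavior, again with Python's `sqlite3` on an assumed toy `streams` table (illustrative values only):

```python
import sqlite3

# Duplicate timestamps collapse to one row each *before* the LIMIT is
# applied, so DISTINCT changes which rows the LIMIT sees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE streams (created_at INTEGER);
    INSERT INTO streams VALUES (5), (5), (5), (4), (3);
""")

with_distinct = conn.execute("""
    SELECT DISTINCT created_at FROM streams
    ORDER BY created_at DESC LIMIT 3
""").fetchall()
without_distinct = conn.execute("""
    SELECT created_at FROM streams
    ORDER BY created_at DESC LIMIT 3
""").fetchall()

print(with_distinct)     # [(5,), (4,), (3,)] -- one row per timestamp
print(without_distinct)  # [(5,), (5,), (5,)] -- raw rows, duplicates kept
```

Whether the timestamp-per-row or row-per-record behavior is correct depends on what the caller treats as "one transaction", which is exactly what the review asks the author to clarify.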
23-23: Significant behavioral change: ROW_NUMBER() filtering reduces the output set.

The `ROW_NUMBER() OVER (PARTITION BY created_at ORDER BY priority ASC)` followed by `WHERE rn = 1` filters results so that for each unique `created_at`, only the method with the lowest priority value (deployStream = 1) is retained. This means concurrent methods at the same timestamp are silently dropped.

Verify this filtering is intentional and document it clearly in comments, as it represents a material change in which transactions are returned. If all methods should be returned per timestamp, remove the `WHERE rn = 1` filter.

Also applies to: 68-68
resolves: https://github.com/trufnetwork/indexer/issues/21