
chore: optimize get last transactions action #1229

Merged
MicBun merged 1 commit into main from optimizeListTransaction
Oct 25, 2025
Conversation


@MicBun MicBun commented Oct 24, 2025

resolves: https://github.com/trufnetwork/indexer/issues/21

Summary by CodeRabbit

  • Chores
    • Improved database retrieval for latest write activity by limiting results per source and applying an overall result cap to reduce resource use.
    • Maintained existing ranking and filtering while ensuring returned rows are ordered by most recent activity for more consistent, performant queries.

@MicBun MicBun requested a review from outerlook October 24, 2025 17:04
@MicBun MicBun self-assigned this Oct 24, 2025

holdex bot commented Oct 24, 2025

Time Submission Status

Member     | Status       | Time | Action          | Last Update
MicBun     | ✅ Submitted | 4h   | Update time     | Oct 24, 2025, 6:47 PM
@outerlook | ❌ Missing   | -    | ⚠️ Submit time  | -


coderabbitai bot commented Oct 24, 2025

Walkthrough

The migration rewrites the query to wrap each source table in a derived subquery that orders by created_at DESC and applies an inner LIMIT $limit_size, then combines those limited sets with UNION ALL, ranks the results, and applies a final ORDER BY created_at DESC with an outer LIMIT $limit_size.

Changes

Cohort / File(s): Query Optimization — internal/migrations/010-get-latest-write-activity.sql
Summary: Each source SELECT is now a derived table: ORDER BY created_at DESC + inner LIMIT $limit_size, then UNION ALL of those limited subsets, followed by ROW_NUMBER ranking and a final ORDER BY created_at DESC with outer LIMIT $limit_size. Minor comments added.
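The rewrite described above can be sketched with a small runnable example. This is a simplified sketch, not the actual migration: it assumes two hypothetical source tables with only a created_at column, where the real file covers more tables and columns.

```python
import sqlite3

# Minimal sketch of the pattern: per-table ORDER BY + inner LIMIT,
# UNION ALL of the limited subsets, then an outer ORDER BY + LIMIT.
# Table names (streams, taxonomies) and schema are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE streams    (created_at INTEGER);
    CREATE TABLE taxonomies (created_at INTEGER);
    INSERT INTO streams    VALUES (1), (2), (3), (4), (5);
    INSERT INTO taxonomies VALUES (3), (6), (7);
""")

limit_size = 3
rows = conn.execute("""
    SELECT created_at FROM (
        -- per-table inner limit: each source contributes at most :n rows
        SELECT created_at FROM (
            SELECT created_at FROM streams
            ORDER BY created_at DESC LIMIT :n)
        UNION ALL
        SELECT created_at FROM (
            SELECT created_at FROM taxonomies
            ORDER BY created_at DESC LIMIT :n)
    )
    ORDER BY created_at DESC
    LIMIT :n            -- outer cap on the combined result
""", {"n": limit_size}).fetchall()
print([r[0] for r in rows])  # [7, 6, 5]
```

The inner limits bound how many rows each source table feeds into the union, so the final sort and cap operate on at most tables × limit_size rows instead of every row in every table.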

Sequence Diagram(s)

sequenceDiagram
  participant Client as Migration
  participant DB as Database
  rect #E8F6FF
    Note over Client,DB: 1) Per-table derived queries (parallel logical steps)
  end
  Client->>DB: SELECT ... FROM (SELECT ... FROM streams ORDER BY created_at DESC LIMIT $limit_size) AS s
  Client->>DB: UNION ALL (SELECT ... FROM primitive_events ORDER BY created_at DESC LIMIT $limit_size) AS e
  Client->>DB: UNION ALL (other per-table limited subqueries...)
  rect #FFF4E6
    Note over DB: 2) Combine UNION ALL results, apply ROW_NUMBER() OVER (ORDER BY created_at DESC)
  end
  DB->>DB: Rank rows, ORDER BY created_at DESC, apply outer LIMIT $limit_size
  DB-->>Client: Limited, ranked result set

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hop through ordered rows at night,
I snip each table to the recent sight,
I stitch them all and pick the best,
A bounded harvest, tidy and blessed. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name: Docstring Coverage — ⚠️ Warning
Explanation: Docstring coverage is 0.00%, below the required 80.00% threshold.
Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name: Description Check — ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)
Check name: Title Check — ✅ Passed. The title "chore: optimize get last transactions action" is clear, specific, and accurately reflects the changeset: performance optimizations (per-table limiting, ORDER BY refinements, an outer result cap) applied to the transaction-fetching query.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d3b21de and a6e1434.

📒 Files selected for processing (1)
  • internal/migrations/010-get-latest-write-activity.sql (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: acceptance-test
🔇 Additional comments (1)
internal/migrations/010-get-latest-write-activity.sql (1)

20-72: Verify that result deduplication doesn't violate API contract.

The per-table LIMIT optimization is sound and should improve query performance. However, the PARTITION BY created_at logic combined with WHERE rn = 1 deduplicates events at identical timestamps, keeping only one method per timestamp (based on priority). This means the query may return fewer than $limit_size rows if multiple events share the same created_at value.

For example, if all 6 events in the result set have the same timestamp, only 1 row would be returned instead of 6. Verify that this behavior aligns with the API contract and caller expectations.

To confirm the deduplication impact, run a query against production data to check:

  • How frequently do events share identical created_at timestamps?
  • What is the typical result count vs. $limit_size for representative data providers?

Is the caller expecting exactly $limit_size rows, or is a smaller result count acceptable?
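The shortfall the reviewer describes can be reproduced with a small runnable sketch. The schema and method names here are illustrative, not the actual migration's tables.

```python
import sqlite3

# Demonstrates the concern above: PARTITION BY created_at + WHERE rn = 1
# collapses all rows sharing a timestamp to one, so the query can return
# fewer than limit_size rows. Schema and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (created_at INTEGER, method TEXT, priority INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (100, "deployStream", 1),
    (100, "insertRecord", 2),
    (100, "setMetadata",  3),
    (200, "insertRecord", 2),
])
rows = conn.execute("""
    SELECT created_at, method FROM (
        SELECT created_at, method,
               ROW_NUMBER() OVER (
                   PARTITION BY created_at
                   ORDER BY priority ASC) AS rn
        FROM events)
    WHERE rn = 1
    ORDER BY created_at DESC
    LIMIT 6
""").fetchall()
# Four events, LIMIT 6, but only two rows survive deduplication:
print(rows)  # [(200, 'insertRecord'), (100, 'deployStream')]
```

Four events with a LIMIT of 6 yield only two rows, one per distinct timestamp, which is exactly the contract question raised above.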

@MicBun MicBun force-pushed the optimizeListTransaction branch from a6e1434 to a2f912c on October 24, 2025 at 17:54

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
internal/migrations/010-get-latest-write-activity.sql (1)

47-54: [Duplicate] Apply missing index on taxonomies for performance.

A previous review flagged that the taxonomies table lacks an index on created_at DESC, which will force a full table scan for the ORDER BY t.created_at DESC LIMIT $limit_size operation in this subquery (lines 47–54).

Create the index in a new migration file (if not already addressed separately):

```sql
CREATE INDEX IF NOT EXISTS tax_created_at_idx ON taxonomies(created_at DESC);
```

Verify this index creation is included in the PR, either in this migration or a separate one. Also confirm similar indexes exist for the other source tables (streams, primitive_events, metadata) to support their respective per-table LIMIT operations.
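A follow-up migration along the lines suggested above might look like the sketch below. The four table names are taken from this review's own list; whether they exist with this exact schema in the indexer is an assumption, and DESC-index syntax varies slightly by engine.

```python
import sqlite3

# Sketch of the suggested index migration: a created_at index on each
# source table so the per-table ORDER BY ... LIMIT can walk an index
# instead of sorting a full scan. Table names follow the review comment.
conn = sqlite3.connect(":memory:")
for table in ("streams", "primitive_events", "metadata", "taxonomies"):
    conn.execute(f"CREATE TABLE {table} (created_at INTEGER)")
    conn.execute(
        f"CREATE INDEX IF NOT EXISTS {table}_created_at_idx "
        f"ON {table}(created_at DESC)")

# Confirm the planner uses the index for the hot ORDER BY ... LIMIT path.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT created_at FROM taxonomies ORDER BY created_at DESC LIMIT 5"
).fetchall()
print(plan)
```

With the index in place the plan reports an index scan rather than a table scan followed by a sort, which is the whole point of the per-table LIMIT optimization.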

🧹 Nitpick comments (1)
internal/migrations/010-get-latest-write-activity.sql (1)

20-21: Improve clarity of comments to document the ROW_NUMBER filtering.

The added comments (lines 20–21) explain the per-table LIMIT optimization but do not document the ROW_NUMBER() ... WHERE rn = 1 deduplication logic. Expand the comments to explain:

  • Why DISTINCT is used on created_at
  • Why ROW_NUMBER() partitions by created_at and filters to priority = 1
  • What the impact is on the returned result set

Example:

```sql
-- Fetch up to limit_size DISTINCT blocks per table ordered by created_at DESC.
-- For each unique created_at, keep only the method with lowest priority (deployStream).
-- Then take the top limit_size results overall.
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6e1434 and a2f912c.

📒 Files selected for processing (1)
  • internal/migrations/010-get-latest-write-activity.sql (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: acceptance-test
🔇 Additional comments (2)
internal/migrations/010-get-latest-write-activity.sql (2)

26-32: Verify that DISTINCT on created_at preserves expected result behavior.

Each per-table subquery uses SELECT DISTINCT s.created_at (and similarly for other tables), which removes duplicate timestamps within each table before applying the LIMIT. This means if multiple records exist at the same timestamp, only one row per timestamp is returned per table.

Clarify whether this is intentional or if you meant to select all columns and apply DISTINCT differently. If the goal is to return the most recent activity across all records (not just distinct timestamps), consider whether DISTINCT is the right approach here.

Also applies to: 36-43, 47-54, 58-65


23-23: Significant behavioral change: ROW_NUMBER() filtering reduces output set.

The ROW_NUMBER() OVER (PARTITION BY created_at ORDER BY priority ASC) followed by WHERE rn = 1 filters results so that for each unique created_at, only the method with the lowest priority value (deployStream = 1) is retained. This means concurrent methods at the same timestamp are silently dropped.

Verify this filtering is intentional and document it clearly in comments, as it represents a material change in what transactions are returned. If all methods should be returned per timestamp, remove the WHERE rn = 1 filter.

Also applies to: 68-68
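The two behaviors discussed above can be contrasted directly: with WHERE rn = 1 only the lowest-priority method per timestamp survives, and without it all concurrent methods are returned. The schema is illustrative, not the migration's actual tables.

```python
import sqlite3

# Contrast: WHERE rn = 1 keeps one method per created_at (lowest
# priority wins); dropping the filter returns every method at that
# timestamp. Schema and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (created_at INTEGER, method TEXT, priority INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (100, "deployStream", 1),
    (100, "insertRecord", 2),
])
base = """
    SELECT method FROM (
        SELECT method,
               ROW_NUMBER() OVER (
                   PARTITION BY created_at ORDER BY priority ASC) AS rn
        FROM events)
"""
kept = conn.execute(base + " WHERE rn = 1").fetchall()
all_rows = conn.execute(base).fetchall()
print(kept)      # [('deployStream',)] -- insertRecord silently dropped
print(len(all_rows))  # 2 -- both methods survive without the filter
```

If both methods should be reported for a shared timestamp, removing the WHERE rn = 1 filter (as the comment suggests) restores them.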

@MicBun MicBun requested review from outerlook and removed request for outerlook October 24, 2025 18:04
@MicBun MicBun merged commit 748bb20 into main Oct 25, 2025
6 of 7 checks passed
@MicBun MicBun deleted the optimizeListTransaction branch October 25, 2025 16:43