fix: digest failing to prune pending days due to nonce errors #1188
Conversation
Resolves an issue where digest operations failed to complete, leaving 596K+ pending prune days unprocessed. The digest scheduler halted midway through processing due to transaction nonce collisions during retries.

**Problem:**
- Digest stopped processing on Sept 30, leaving days 20353-20361+ pending
- 596,375 days accumulated in the pending_prune_days table
- Root cause: broadcast timeout → retry reused the same nonce → "transaction already exists"
- Concurrent transactions from the same account caused nonce conflicts
- System required manual intervention to resume

**Solution:**
Implemented stateless retry logic that always refetches a fresh nonce from the database on each attempt. This automatically handles:
- Network timeouts
- Concurrent transaction activity
- Nonce gaps and database state changes

**Implementation:**
- Added `BroadcastAutoDigestWithArgsAndRetry()` with exponential backoff (5s → 60s max)
- Retry on error
- Fresh nonce query before each broadcast attempt
- Context-aware cancellation support
- Maximum 3 retries per digest run

**Testing:**
- Added 6 unit tests covering retry scenarios
- Verified fresh nonce refetch on each attempt
- Tests for timeout, cancellation, max retries, and transaction failures
- Added build tag `//go:build kwiltest` to integration test
- All 19 tests passing

**Files Changed:**
- `extensions/tn_digest/internal/engine_ops.go` - Core retry logic
- `extensions/tn_digest/scheduler/scheduler.go` - Scheduler integration
- `extensions/tn_digest/internal/engine_ops_test.go` - Comprehensive test coverage
- `extensions/tn_digest/engine_ops_integration_test.go` - Build tag fix

This ensures digest operations continue reliably even during network congestion or concurrent transaction activity, eliminating the nonce collision failure mode observed in production.

resolves: trufnetwork/truf-network#1241
**Walkthrough**
Adds a retry-capable broadcast path for auto_digest transactions that fetches a fresh nonce per attempt, integrates the retry call into the scheduler, and guards an integration test with a build tag.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
  autonumber
  participant S as Scheduler
  participant E as EngineOperations
  participant A as Accounts (nonce)
  participant B as Broadcaster (fn)
  participant N as Network
  S->>E: BroadcastAutoDigestWithArgsAndRetry(ctx, dsn, signer, broadcastFn, ...)
  loop retry attempts (<= max) and ctx not done
    E->>A: Request fresh nonce
    A-->>E: Nonce
    E->>E: Build & sign tx with nonce + args
    E->>B: call broadcastFn(ctx, tx, chainID)
    alt broadcast success (OK code)
      B-->>E: (hash, txResult, nil)
      E->>E: Validate code, parse digest from logs
      E-->>S: DigestTxResult (success)
    else broadcast failure / timeout / wrong code
      B-->>E: (hash?, txResult?, err)
      E-->>E: Log attempt, backoff, prepare next attempt
    end
    opt ctx cancelled
      E-->>S: return ctx error
    end
  end
  alt exceeded max retries
    E-->>S: return last error
  end
```

**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- extensions/tn_digest/engine_ops_integration_test.go (1 hunks)
- extensions/tn_digest/internal/engine_ops.go (2 hunks)
- extensions/tn_digest/internal/engine_ops_test.go (2 hunks)
- extensions/tn_digest/scheduler/scheduler.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
extensions/tn_digest/scheduler/scheduler.go (1)
extensions/tn_digest/scheduler/constants.go (3)
- DigestDeleteCap (7-7)
- DigestExpectedRecordsPerStream (8-8)
- DigestPreservePastDays (9-9)
extensions/tn_digest/internal/engine_ops_test.go (1)
extensions/tn_digest/internal/engine_ops.go (1)
- EngineOperations (25-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: acceptance-test
- GitHub Check: lint