fix(host-rpc): prevent block gap when buffer exhaustion resets backfill by rswanson · Pull Request #131 · init4tech/node-components

rswanson · 2026-04-03T19:18:36Z

Summary

Root cause: When walk_chain returned WalkResult::Exhausted, the notifier set backfill_from = finalized, which could be ahead of the last delivered block. This created a gap of undelivered blocks (e.g. a 14-block jump from last delivered host block 24800925 to finalized 24800939, leaving 24800926–24800938 undelivered), causing "parent ru block not present in DB" crashes during initial sync.
Primary fix (host-rpc/notifier.rs): Compute resume_from = min(chain_view.back + 1, finalized) before clearing the buffer, ensuring backfill restarts from where we left off rather than jumping ahead.
Defensive check (node/node.rs): Add gap detection in process_committed_chain — if the first block to process isn't contiguous with the last stored block, bail with a clear error message instead of the cryptic parent-not-found error.
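
The defensive check can be sketched roughly as follows. This is a minimal illustration, not the code in node/node.rs: the function name `ensure_contiguous`, the `NotificationGap` type, the error text, and the choice to accept any first block when nothing is stored yet are all assumptions made for the example.

```rust
use std::fmt;

/// Hypothetical error type for a non-contiguous notification.
#[derive(Debug, PartialEq, Eq)]
pub struct NotificationGap {
    pub last_stored: u64,
    pub first_incoming: u64,
}

impl fmt::Display for NotificationGap {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Illustrative wording only; the real log message is not shown in the PR.
        write!(
            f,
            "notification gap: last stored block {}, first incoming block {}",
            self.last_stored, self.first_incoming
        )
    }
}

/// Hypothetical stand-in for the check in `process_committed_chain`.
/// Accepts the incoming chain only if its first block directly follows
/// the last stored block.
pub fn ensure_contiguous(
    last_stored: Option<u64>,
    first_incoming: u64,
) -> Result<(), NotificationGap> {
    match last_stored {
        // Assumption: with an empty store, any starting block is accepted.
        None => Ok(()),
        // Contiguous: the incoming block is exactly last + 1.
        Some(last) if last.checked_add(1) == Some(first_incoming) => Ok(()),
        // Anything else (a gap or a backward move) is rejected.
        Some(last) => Err(NotificationGap {
            last_stored: last,
            first_incoming,
        }),
    }
}
```

With the numbers from this PR, a last stored block of 24800925 followed by a first incoming block of 24800939 would be rejected, while 24800926 would pass.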

Reproduction scenario

During initial sync of signet-sidecar, the backfill ceiling landed close to the current tip. The first incoming newHead was >64 blocks ahead of the chain_view's latest entry, exhausting the buffer immediately. The notifier then reset to finalized (24800939) while the last delivered block was 24800925, skipping blocks 24800926–24800938.
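
The arithmetic of the skipped range can be checked with a small sketch. The helper `skipped_blocks` is hypothetical and exists only to illustrate the numbers above; it assumes backfill delivers `backfill_from` itself and everything after it.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: which blocks are never delivered if the last
/// delivered block is `last_delivered` and backfill restarts at
/// `backfill_from` (inclusive)?
pub fn skipped_blocks(last_delivered: u64, backfill_from: u64) -> Option<RangeInclusive<u64>> {
    let next_expected = last_delivered + 1;
    if backfill_from > next_expected {
        // Everything from the next expected block up to, but not
        // including, the restart point is skipped.
        Some(next_expected..=backfill_from - 1)
    } else {
        // Restarting at or before the next expected block leaves no gap.
        None
    }
}
```

For the values in this scenario, `skipped_blocks(24800925, 24800939)` yields the range 24800926 through 24800938.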

Test plan

Deploy updated signet-sidecar to a fresh node and verify it syncs through the backfill→frontfill transition without crashing
Verify logs show resume_from in the exhaustion warning when buffer is exhausted
Verify that if a gap is somehow introduced, the node logs the new "notification gap" error instead of the opaque "parent ru block not present in DB"

🤖 Generated with Claude Code

When walk_chain exhausted the buffer, backfill_from was set to cached_finalized, which could be ahead of the last delivered block. This created a gap of undelivered blocks, causing "parent ru block not present in DB" crashes during initial sync. Now computes resume_from as min(chain_view.back + 1, finalized) to ensure continuity.

Also adds a defensive gap check in the node's process_committed_chain to bail with a clear error message if a notification gap is ever detected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

prestwich

Can this fix break our guarantee that notifications are always contiguous, by setting the next notification earlier than the previous one?

prestwich · 2026-04-03T20:08:45Z

[Claude Code]

This fix has a correctness issue: comparing against finalized (or chain_view.back()) is the wrong anchor for resuming after buffer exhaustion. Both are proxies for "what have we delivered downstream," but neither actually tracks that.

The bug

resume_from = (chain_view.back() + 1).min(finalized) goes backward whenever finalized lags behind the view tip, which is the normal case. If chain_view has blocks up to 100 and finalized is 95, we set backfill_from = 95 and re-emit blocks 95+ — violating the contiguity guarantee that notifications only move forward.
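
The backward move can be reproduced with a short sketch. The function names are hypothetical, and the example assumes, as the scenario above does, that every block up to the view tip has already been delivered downstream.

```rust
/// The resume point proposed in this PR, as quoted above:
/// `(chain_view.back() + 1).min(finalized)`.
pub fn proposed_resume_from(view_back: u64, finalized: u64) -> u64 {
    (view_back + 1).min(finalized)
}

/// Hypothetical check: resuming at `resume_from` re-emits blocks if it
/// is at or below the last block already delivered.
pub fn re_emits(resume_from: u64, last_delivered: u64) -> bool {
    resume_from <= last_delivered
}
```

With a view tip of 100 and finalized at 95, `proposed_resume_from(100, 95)` returns 95, and `re_emits(95, 100)` is true. In the PR's original scenario (view tip 24800925, finalized 24800939) the same formula returns 24800926, which is why the problem does not show up there.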

The original code (backfill_from = finalized) has the same fundamental problem. Finalized is an L1 consensus concept with no relationship to what the notifier has actually delivered. It can be behind, ahead, or sideways relative to our last emission.

The fix

The notifier needs a high-water mark: a field tracking the highest block number it has actually emitted in a HostNotification. On buffer exhaustion, resume from high_water_mark + 1, unconditionally. No comparison to finalized, no comparison to chain_view entries.

This gives us:

Forward-only progress — we never backtrack past what we've delivered
Contiguity preserved — next emission starts exactly where the last left off
Decoupled from L1 tags — finalized/safe are irrelevant to delivery bookkeeping

The chain_view still gets cleared (it's a hash-walk buffer, not a delivery record), and backfill picks up from the delivery watermark.
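
A minimal sketch of the high-water-mark idea, under stated assumptions: `DeliveryTracker`, its method names, and the `start` fallback used before anything has been emitted are all hypothetical, and this is not the implementation in #133.

```rust
/// Hypothetical delivery bookkeeping, kept separate from the hash-walk buffer.
#[derive(Debug, Default)]
pub struct DeliveryTracker {
    /// Highest block number emitted so far, if any.
    high_water_mark: Option<u64>,
}

impl DeliveryTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that a notification ending at `last_block` was emitted.
    /// The mark only ever moves forward.
    pub fn record_emitted(&mut self, last_block: u64) {
        let updated = self
            .high_water_mark
            .map_or(last_block, |hwm| hwm.max(last_block));
        self.high_water_mark = Some(updated);
    }

    /// Where backfill should restart after buffer exhaustion.
    /// Assumption: if nothing has been emitted yet, fall back to `start`.
    pub fn resume_from(&self, start: u64) -> u64 {
        self.high_water_mark.map_or(start, |hwm| hwm + 1)
    }
}
```

After `record_emitted(24800925)`, `resume_from` returns 24800926 regardless of where finalized sits or what the cleared chain_view last held.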

The guard added in node.rs is good as a defensive check, but it shouldn't be the mechanism that papers over a notifier that can emit non-contiguous or backward-moving notifications.

prestwich · 2026-04-04T20:37:15Z

superseded by #133

rswanson requested a review from a team as a code owner April 3, 2026 19:18

prestwich reviewed Apr 3, 2026

prestwich mentioned this pull request Apr 3, 2026

fix(host-rpc): track delivery high-water mark for backfill recovery #133

Open

4 tasks

prestwich closed this Apr 4, 2026

rswanson wants to merge 1 commit into main from fix/backfill-gap-on-buffer-exhaustion
