fix(host-rpc): prevent block gap when buffer exhaustion resets backfill#131
When walk_chain exhausted the buffer, backfill_from was set to cached_finalized, which could be ahead of the last delivered block. This created a gap of undelivered blocks, causing "parent ru block not present in DB" crashes during initial sync. Now computes resume_from as min(chain_view.back + 1, finalized) to ensure continuity.

Also adds a defensive gap check in the node's process_committed_chain to bail with a clear error message if a notification gap is ever detected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
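The resume computation described in this commit can be sketched as below. This is a minimal illustration, not the code from `host-rpc/notifier.rs`: the function name `resume_from`, its signature, and the use of plain `u64` block numbers are assumptions made for the example.

```rust
/// Hypothetical helper (not the actual notifier code): choose the block at
/// which backfill should resume after the buffer is exhausted.
///
/// `chain_view_back` is the number of the newest block held in the chain
/// view, and `finalized` is the cached finalized block number.
fn resume_from(chain_view_back: u64, finalized: u64) -> u64 {
    // Continue right after the newest block in the view, but never start
    // past the finalized block. saturating_add avoids overflow at u64::MAX.
    chain_view_back.saturating_add(1).min(finalized)
}
```

With the block numbers reported for this bug (`chain_view.back` at 24800925, `finalized` at 24800939), the sketch resumes at 24800926 instead of jumping to 24800939. It relies on `chain_view.back` being the last delivered block, which is an assumption here.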
prestwich
left a comment
can this fix break our guarantee that notifications are always contiguous by setting the next notification earlier than the previous?
[Claude Code] This fix has a correctness issue: comparing against

The bug
The original code (

The fix
The notifier needs a high-water mark: a field tracking the highest block number it has actually emitted in a
This gives us:
The
The guard added in
superseded by #133
Summary
- When `walk_chain` returned `WalkResult::Exhausted`, the notifier set `backfill_from = finalized`, which could be ahead of the last delivered block. This created a gap of undelivered blocks (e.g. 14 blocks between host 24800925 and finalized 24800939), causing `"parent ru block not present in DB"` crashes during initial sync.
- `host-rpc/notifier.rs`: Compute `resume_from = min(chain_view.back + 1, finalized)` before clearing the buffer, ensuring backfill restarts from where we left off rather than jumping ahead.
- `node/node.rs`: Add gap detection in `process_committed_chain`: if the first block to process isn't contiguous with the last stored block, bail with a clear error message instead of the cryptic parent-not-found error.

Reproduction scenario
During initial sync of `signet-sidecar`, the backfill ceiling landed close to the current tip. The first incoming `newHead` was >64 blocks ahead of the `chain_view`'s latest entry, exhausting the buffer immediately. The notifier then reset to `finalized` (24800939) while the last delivered block was 24800925, skipping blocks 24800926–24800938.

Test plan
- `resume_from` in the exhaustion warning when buffer is exhausted
- `"notification gap"` error instead of the opaque `"parent ru block not present in DB"`

🤖 Generated with Claude Code
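The defensive gap check described in the summary might look roughly like the sketch below. It is an illustration only, not the code in `node/node.rs`: the function name `check_contiguous`, its arguments, the `String` error type, and the exact wording of the message are all assumptions.

```rust
/// Hypothetical contiguity check (names and error text are illustrative).
///
/// `last_stored` is the number of the last block already in the DB and
/// `first_incoming` is the first block of the chain about to be processed.
fn check_contiguous(last_stored: u64, first_incoming: u64) -> Result<(), String> {
    // The only acceptable next block is the direct successor of the last
    // stored one. Anything else, ahead or behind, is reported as a gap.
    let expected = last_stored.saturating_add(1);
    if first_incoming != expected {
        return Err(format!(
            "notification gap: expected block {}, got {}",
            expected, first_incoming
        ));
    }
    Ok(())
}
```

Using the numbers from the reproduction scenario, `check_contiguous(24800925, 24800939)` returns an error that names both block numbers, while `check_contiguous(24800925, 24800926)` succeeds. Failing early this way surfaces the problem at the notification boundary rather than later as a missing parent lookup.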