Skip to content

rpc/jsonrpc, node/ethconfig: enable FcuBackgroundCommit by default#21293

Draft
yperbasis wants to merge 11 commits into
mainfrom
alex/fcu_bg_commit_35
Draft

rpc/jsonrpc, node/ethconfig: enable FcuBackgroundCommit by default#21293
yperbasis wants to merge 11 commits into
mainfrom
alex/fcu_bg_commit_35

Conversation

@yperbasis
Copy link
Copy Markdown
Member

@yperbasis yperbasis commented May 19, 2026

Summary

Enables FcuBackgroundCommit by default so the FCU response returns to the consensus client before the MDBX flush+commit lands. The commit runs in a background goroutine; the FCU semaphore still serializes successive FCUs, so FCU N+1 always reads FCU N's committed state.

Motivated by #21008 — local profiling on mainnet at tip vs release/3.4 showed p50 FCU latency regressed by ~+80 ms (mean ~+90 ms, p99 ~+160 ms) with wider burst tails. Background commit decouples the FCU response from the MDBX fsync so the CL gets ACK on the overlay-published head rather than the disk-flushed head.

The regression itself was caused by #19961 moving TxLookup / Senders / Finish into the synchronous FCU path via PipelineExecutor.RunLoop. Background commit masks this by hiding the fsync from the FCU response; under disk-busy bursts those stages still execute on the FCU path and e.semaphore will block the next FCU on the still-running prior bg goroutine. The root-cause fix (move those stages back to a background StageLoop equivalent to release/3.4's shape, per #21008 comment) is out of scope here.

Changes

Flag flip — node/ethconfig/config.go, cmd/utils/flags.go, docs. Flip FcuBackgroundCommit default to true. The config.go doc block covers the FCU semaphore sequencing, embedded vs. remote rpcdaemon semantics, and the known limitation below. Flag usage and the configuring-erigon docs carry a one-line caveat (remote rpcdaemon 'latest' lags ~50 ms, stays consistent).

Head-sensitive RPC reads — split by what the dependent reads can see:

  • Overlay-aware (head + dependent block-table reads both visible in the published SD's BlockOverlay): bor_api_impl.go (GetSnapshot, GetAuthor, GetSigners, getLatestBlockNum, GetSnapshotProposer, GetSnapshotProposerSequence), eth_block.go GetBlockTransactionCountByNumber, eth_system.go GasPrice/BaseFee/BlobBaseFee/MaxPriorityFeePerGas/FeeHistory (including the GasPriceOracleBackend tx, wrapped once at construction), eth_txs.go GetTransactionByHash pool fallback, graphql_api.go GetLatestBlockNumber, txpool_api.go Content/ContentFrom, debug_api.go SetHead, erigon_block.go GetBlockByTimestamp.

  • Plain tx (dependent reads use SD-temporal data — domain/history/inverted-index — which the block overlay does not expose): eth_call.go getProof/getWitness, eth_simulation.go SimulateV1, eth_receipts.go/erigon_receipts.go GetLogs, overlay_api.go getBeginEnd, trace_filtering.go Filter, debug_execution_witness.go buildExpectedPostState, parity_api.go ListStorageKeys (with a nil guard on ReadCurrentBlockNumber). For these, resolving latest through the overlay would land on N while the dependent reads can only reach N-1 — silently wrong, not just lagging. Block-tag resolution in these files passes nil filters to rpchelper.GetBlockNumber consistently so all bounds, guards, and scans agree on the committed view.

rpc/jsonrpc/eth_call.go Call — already used BlockOverlayTemporalTx from #20195. Now carries a comment documenting that block-table reads see overlay-N during the bg-commit window but state reads (GetLatest, GetAsOf, etc.) evaluate against N-1, because the SD's domain mem batch is not exposed by the overlay. The proper SD-aware temporal view follow-up is tracked in #21314.

Overlay safe-close invariant — db/kv/membatchwithdb/memory_mutation.go, db/state/execctx/domain_shared.go, rpc/rpchelper/filters.go. The pattern "RPC reader acquires an overlay view, then the FCU bg-commit goroutine closes the published SD while the reader is still iterating" is safe today only because the BlockOverlay is backed by NewMemoryBatch (pure-Go memStoreRollback/Close are no-ops on the in-memory data) and views hold their own caller-supplied backing tx. Both properties were load-bearing but undocumented.

  • MemoryMutation.newReadViewMut now panics if m.memTx is not *memStore. Catches the case where a future change swaps in NewMemoryBatchMDBX (which does invalidate cursors on Rollback) without adding refcount/drain logic.
  • MemoryMutation.Rollback, MemoryMutation.NewReadView, SharedDomains.Close, Filters.WithOverlay, and Filters.WithTemporalOverlay carry comments explaining the invariant and the consequence if it is relaxed.

Bg-commit ↔ post-FCU RW tx race fix — cmd/utils/app/import_cmd.go, execution/execmodule/execmoduletester/exec_module_tester.go. Both opened a fresh RW tx (to WriteHeadBlockHash or to read committed state) immediately after UpdateForkChoice returned. Under bg-commit that races the bg goroutine writing the header bytes — HeadBlockHash could land in MDBX before the header itself, crashing the next startup in BlockReader.CurrentBlock. Both now call ExecModule.WaitIdle to drain the FCU semaphore first. (The import_cmd race was caught by a real Hive failure during PR testing.)

Test unskip — execution/execmodule/exec_module_test.go. TestNotificationDispatchBackgroundCommit previously skipped with the message "FCU N+1 reads stale state from DB". The ExecModule semaphore actually serializes successive FCUs (the bg goroutine releases only after Flush+Commit), so the race described in the skip cannot happen. Unskipped; comment rewritten.

Safety

FCU sequencing. forkchoice.go:158 TryAcquire(1) on e.semaphore. In the bg path, shouldReleaseSema=false so the outer defer doesn't release; the bg goroutine has defer e.semaphore.Release(1) (line 658) which fires LIFO after runPostForkchoice returns (commit complete) and after PublishOverlay(nil). AssembleBlock and ValidateChain use the same semaphore, so engine_newPayload / engine_getPayload from the CL serialize behind a pending bg commit — they'll see ExecutionStatusBusy and retry, never a torn state.

Embedded rpcdaemon. Overlay-aware paths read FCU N's block-table writes (canonical hashes, headers, bodies, TDs, stage progress, TxNums, forkchoice markers — all written via the BlockOverlay at forkchoice.go:247 tx = currentContext.BlockOverlay()). Pre-commit: readers see N via overlay. Post-commit (after PublishOverlay(nil)): readers cascade to MDBX which now has N. No stale window.

Remote rpcdaemon. dispatchNotificationsFromOverlay sends the StateChangeBatch over gRPC pre-commit; kvcache.Coherent.OnNewBlock advances. The remote tx (gRPC remotedb) reads PlainStateVersion from committed MDBX, and Coherent.View(tx) keys by that — so during the bg-commit window remote reads resolve to the matching root for the PSV they observe. PlainStateVersion is incremented inside the commit batch (TemporalMemBatch.Flush at db/state/temporal_mem_batch.go:787), so PSV moves atomically with the rest of the state. Remote consumers see a consistent (lagging) state, never divergent.

Known limitation

eth_call(latest), eth_getBalance(latest), eth_getStorageAt(latest), eth_getCode(latest): during the bg-commit window the header resolves to N (via overlay tables) but state reads evaluate against N-1 because the SD's domain mem batch is not exposed by the block overlay. Bounded staleness (~50 ms), not corruption. The proper SD-aware temporal view that would close this gap is tracked in #21314.

Test plan

  • make lint clean (multiple passes, non-deterministic per CLAUDE.md)
  • make erigon integration builds
  • make test-short on rpc/jsonrpc/, execution/execmodule/, db/kv/membatchwithdb/, db/state/execctx/, rpc/rpchelper/
  • TestNotificationDispatchBackgroundCommit (unskipped) passes
  • Sanity-run on mainnet minimal node — confirm FCU latency drops back to release/3.4 levels
  • Compare RPC behaviour during bg-commit window: embedded paths return N's data immediately; remote rpcdaemon sees ~50 ms lag, never divergence; eth_call(latest) lag matches the documented limitation
  • CI: full test + race + hive + kurtosis

🤖 Generated with Claude Code

Routes head-sensitive RPC reads (ReadCurrentHeader, ReadHeadHeaderHash,
GetLatestBlockNumber) through the SharedDomains overlay via existing
Filters.WithOverlay/WithTemporalOverlay, so the FCU response can return
before the MDBX commit lands without RPC consumers seeing stale chain
heads. Coherent cache for remote rpcdaemon already receives pre-commit
StateChanges and serves the matching state-version root.

See #21008 for the motivating regression (~+80ms p50
FCU latency + multi-second burst tails on main vs release/3.4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enables FcuBackgroundCommit by default and updates JSON-RPC handlers to read head-sensitive data through the SharedDomains overlay so RPC responses can reflect the new FCU head before the MDBX commit/fsync lands.

Changes:

  • Flip ethconfig.Defaults.FcuBackgroundCommit to true and update the rationale comment to reflect current overlay + notification/coherent-cache wiring.
  • Wrap a set of head-sensitive rawdb.Read* / rpchelper.Get{Latest,Safe,Finalized}BlockNumber call sites with filters.WithOverlay / WithTemporalOverlay.
  • Minor call-site refactors (introducing overlayTx locals) to reuse overlay-backed transactions.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.

Show a summary per file
File Description
rpc/jsonrpc/txpool_api.go Read current header via overlay for txpool content endpoints.
rpc/jsonrpc/trace_filtering.go Use overlay-backed head hash/number when ToBlock is omitted.
rpc/jsonrpc/parity_api.go Read current block number via overlay for parity storage-key listing.
rpc/jsonrpc/overlay_api.go Use overlay-backed latest block number when capping ranges.
rpc/jsonrpc/graphql_api.go Resolve “latest” block number via overlay for GraphQL API.
rpc/jsonrpc/eth_txs.go Read current header via overlay for pending-tx responses.
rpc/jsonrpc/eth_system.go Use temporal overlay for gas oracle backend and overlay for head header reads (base fee/blob base fee).
rpc/jsonrpc/eth_simulation.go Compare requested block to overlay-backed latest head to avoid “future block” false positives.
rpc/jsonrpc/eth_receipts.go Use overlay-backed latest block number in max-uint32 capping path.
rpc/jsonrpc/eth_call.go Use overlay-backed latest block number when building proof/witness preconditions.
rpc/jsonrpc/eth_block.go Use overlay-backed latest block number for transaction-count-by-number guard.
rpc/jsonrpc/erigon_receipts.go Use overlay-backed latest block number for log range defaults.
rpc/jsonrpc/erigon_block.go Read current header via overlay for timestamp-based block lookup.
rpc/jsonrpc/debug_execution_witness.go Use overlay-backed latest block number when building expected post-state.
rpc/jsonrpc/debug_api.go Use overlay-backed latest block number for debug_setHead baseline.
rpc/jsonrpc/bor_api_impl.go Use overlay-backed latest header/number in several Bor RPCs.
node/ethconfig/config.go Default FcuBackgroundCommit to true and expand explanatory comment.
Comments suppressed due to low confidence (4)

rpc/jsonrpc/parity_api.go:80

  • bn is read through the overlay, but subsequent reads (_txNumReader.Min and tx.RangeAsOf) still use the original temporal tx. During a background-commit window this can leave parity_listStorageKeys querying state/txnums from the committed DB while targeting the overlay head number, producing stale or inconsistent results. Consider wrapping the temporal tx once (e.g., via filters.WithTemporalOverlay) and using that wrapped tx consistently for the latest-state reader, txnum lookup, and the RangeAsOf scan.
	bn := rawdb.ReadCurrentBlockNumber(api.filters.WithOverlay(tx))
	minTxNum, err := api._txNumReader.Min(ctx, tx, *bn)
	if err != nil {
		return nil, err
	}

rpc/jsonrpc/trace_filtering.go:351

  • toBlock is now derived from an overlay-backed head hash/number, but the rest of the method still passes the original dbtx into filterV3, which computes txnum bounds and scans indexes against the committed DB view. This means the requested range can reference an overlay head while the underlying trace scan is still anchored to the pre-commit state (and the block-range limit check uses the overlay height). Consider using a temporal overlay tx (filters.WithTemporalOverlay(dbtx)) and passing that through to filterV3 (and any txnum/index reads) so the head number and the scanned data come from the same view.
	if req.ToBlock == nil {
		overlayTx := api.filters.WithOverlay(dbtx)
		headNumber, err := api._blockReader.HeaderNumber(ctx, overlayTx, rawdb.ReadHeadHeaderHash(overlayTx))
		if err != nil {
			return err
		}
		toBlock = *headNumber

rpc/jsonrpc/bor_api_impl.go:111

  • latestBlockNum is resolved using an overlay-backed tx, but the subsequent HeaderByNumber call still uses the original tx. During background commit this can cause the latest header lookup to miss the overlay head (returning errUnknownBlock even though the overlay has the header). Consider reusing the same overlay-wrapped tx for the HeaderByNumber/HeaderByHash reads in the “latest” path.
	//nolint:nestif
	if blockNrOrHash == nil {
		latestBlockNum, err2 := rpchelper.GetLatestBlockNumber(api.filters.WithOverlay(tx))
		if err2 != nil {
			return accounts.NilAddress, err2
		}
		header, err = api._blockReader.HeaderByNumber(ctx, tx, latestBlockNum)
	} else {

rpc/jsonrpc/eth_block.go:383

  • This method now uses an overlay-backed tx to compute latestBlockNumber, but it still reads the block body/tx count through the original tx later in the function. In a background-commit window, blockNum can be the overlay head while _blockReader.Body on the committed tx returns nil, so the RPC may still return null for latest. Consider using the same overlay-wrapped view for the subsequent body read when serving head-sensitive queries.
	latestBlockNumber, err := rpchelper.GetLatestBlockNumber(api.filters.WithOverlay(tx))
	if err != nil {
		return nil, err
	}
	if blockNum > latestBlockNumber {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

yperbasis and others added 2 commits May 19, 2026 23:43
The skip dated to PR #20195 when bg-commit had a documented "FCU N+1
reads stale state from DB" race. The actual race is in the tester
helper, not the FCU: insertPoSBlocks waits on pre-commit state-change
events from the gRPC stream, then InsertChain opens an roTx and reads
ReadHeader/HeadBlockHash before the bg goroutine has committed.

Fix in InsertChain itself by calling ExecModule.WaitIdle (acquires
then releases the FCU semaphore — the bg goroutine releases it only
after commit). With foreground commit the semaphore is already free,
so WaitIdle is an instant no-op.

FCU sequencing was never the issue: forkchoice.go:158's TryAcquire
+ retryBusy-on-Busy serialises consecutive FCUs against the prior
bg goroutine. The test passes under -race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…BlockHash

When erigon runs in import mode under FcuBackgroundCommit=true, the
post-UpdateForkChoice write at the end of import_cmd.go::InsertChain
races the bg-commit goroutine: it can land WriteHeadBlockHash(lvh) in
MDBX before the bg goroutine flushes the overlay containing Headers /
BlockHash entries for the same block. Result on next startup:
ReadHeadBlockHash returns the hash, HeaderNumber(hash) returns nil,
BlockReader.CurrentBlock dereferences the nil and panics
(freezeblocks/block_reader.go:1409). This made hive ethereum/rpc-compat
fail at runner startup of the second erigon process.

The existing wait on the state-change stream only ensures the dispatcher
fired (events are dispatched pre-commit from the overlay), not that
MDBX has flushed. Add ExecutionModule().WaitIdle() before the Update —
WaitIdle acquires/releases the FCU semaphore, blocking until the bg
goroutine completes its flush+commit. No-op when bg-commit is off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis
Copy link
Copy Markdown
Member Author

Manually dispatched the three CI workflows that were skipped because the PR is in draft state:

Why I want them: this PR flips FcuBackgroundCommit to true and re-routes head-sensitive RPC reads through the SD overlay. The earlier auto hive run on commit d7bbe14 (run 26126332101) hit a real bug in the import path (nil-deref in BlockReader.CurrentBlock because the post-FCU WriteHeadBlockHash raced the bg-commit goroutine), fixed in commit 9e61f87. Want the dispatched run to confirm that fix, and to surface any other paths with the same shape across the broader hive + gloas surface.

yperbasis and others added 2 commits May 20, 2026 09:28
Copilot review caught a class of bugs in the FcuBackgroundCommit
migrations where the head lookup was overlay-wrapped but the dependent
read on the same tx was not. During the ~50ms bg-commit window the
head returns N+1 (from the SD overlay) while MDBX is still at N, so
the dependent read references a head the rest of its view can't see.

- bor_api_impl.go (GetAuthor): pass the same overlayTx to
  _blockReader.HeaderByNumber. Previously latestBlockNum was N+1 but
  HeaderByNumber read MDBX for N+1's canonical hash → nil → errUnknownBlock.

- eth_block.go (GetBlockTransactionCountByNumber): pass overlayTx to
  _blockReader.Body. Without this, blockNum can be N+1 while Body
  returns nil from committed MDBX, dropping the RPC to (nil, nil).

- parity_api.go (parity_listStorageKeys): revert to plain tx. There is
  no good overlay-aware version: the block overlay exposes table
  writes (TxNums included) but not the SD domain mem batch, so a
  RangeAsOf over kv.StorageDomain would still bypass the pending
  storage writes for an overlay-derived block number, yielding an
  inconsistent view.

- trace_filtering.go: revert the toBlock-via-overlay change. filterV3
  scans receipts/logs against dbtx; routing only the toBlock through
  the overlay leaves the upper bound past what the scan can see.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweep of the remaining migrated callsites for the same head-vs-data
inconsistency Copilot flagged in the first review pass: the head
lookup was routed through the SD overlay but the downstream reads
(log scans, witness/proof builds, setHead, blockWithSenders, binary
search) still operate on the plain committed tx, so during the
bg-commit window the function references a head its own follow-up
reads cannot see.

Reverted to plain tx (consistent committed view) in:

- eth_call.go (GetProof / CreateAccessList witness paths)
- overlay_api.go and erigon_receipts.go and eth_receipts.go
  (log range upper bounds — getLogsV3 scans against plain tx;
  eth_getLogs additionally guards on GetLatestExecutedBlockNumber(tx)
  which would otherwise fire "node is still syncing" for an
  overlay-derived upper bound)
- eth_simulation.go (guard followed by blockWithSenders on plain tx)
- debug_api.go (debug_setHead guard; SetHead acts on committed DB)
- debug_execution_witness.go (latestBlock gates a branch that reads
  txnums + seeks commitment on plain tx)
- erigon_block.go (GetBlockByTimestamp binary search uses
  HeaderByNumber on plain tx)

Kept overlay-aware where the function uses the result in-memory and
does no dependent DB read, or where the dependent read was extended
in the previous commit (bor_api_impl.go GetAuthor,
eth_block.go GetBlockTransactionCountByNumber): eth_system.go
BlockNumber/GasPrice/BaseFee/BlobBaseFee (in-memory) and the
GasPriceOracleBackend tx (wrapped once at construction so all
b.tx reads are overlay-aware), bor_api_impl.go header-only paths,
txpool_api.go Content/ContentFrom, eth_txs.go pool fallback,
graphql_api.go GetLatestBlockNumber, trace_filtering.go was already
reverted in the previous commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis
Copy link
Copy Markdown
Member Author

Re-dispatched the three skipped workflows against HEAD 79ce7d94ef (includes the import_cmd WaitIdle fix from 9e61f87, the bor/eth_block overlay-consistency extensions from d0a8e31, and the head-vs-data revert sweep from 79ce7d9):

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Comment thread rpc/jsonrpc/parity_api.go
Pre-existing latent panic: rawdb.ReadCurrentBlockNumber returns *uint64
which is nil when HeadHeaderHash isn't set (fresh DB / pre-head state),
and the *bn dereference on the next line would crash the RPC handler.
Surface a user-facing error instead. Flagged by Copilot review of the
nearby diff in 79ce7d9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis yperbasis marked this pull request as ready for review May 20, 2026 12:53
@yperbasis yperbasis requested a review from mh0lt as a code owner May 20, 2026 12:53
@yperbasis yperbasis requested a review from taratorio May 20, 2026 12:53
@yperbasis yperbasis added the RPC label May 20, 2026
@yperbasis yperbasis marked this pull request as draft May 20, 2026 13:22
yperbasis and others added 4 commits May 20, 2026 15:53
eth_getLogs, trace_filter.Filter, and overlay_api.getBeginEnd previously
resolved the implicit `latest` reference through nil filters (plain tx,
returns N-1 during the bg-commit window) but routed user-passed
ToBlock/FromBlock="latest" through api.filters (overlay-aware, returns
N). The resulting `end > latest` guard would false-positive
errBlockRangeIntoFuture for ~50ms after every FCU.

Pass nil filters consistently in the explicit-tag resolution so the
upper bound, the lower bound, and the underlying getLogsV3/filterV3 scan
all agree on the committed view.

Pre-existing in foreground-commit mode too, but only observable during
the brief commit window when the CL hadn't yet received the FCU
response; becomes user-facing with FcuBackgroundCommit=true defaulting
on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ByTimestamp and debug_setHead

Two callsites previously reverted to plain tx in 79ce7d9 only touch
block tables (no SD-temporal reads), so overlay-aware reads are fully
consistent with their dependent reads.

- erigon_block.GetBlockByTimestamp: wrap tx once at the top; route
  ReadCurrentHeader, the inner _blockReader.HeaderByNumber binary-search
  probe, and the three buildBlockResponse callsites through overlayTx.

- debug_setHead: read currentHead through the overlay so
  debug_setHead(N) during the bg-commit window doesn't false-positive
  "block N is in the future" for what is effectively a no-op rollback.

The SD-temporal reverts (eth_call/getProof, log scans, simulation,
commitment seek, parity_listStorageKeys) still need the SD-aware
temporal view tracked in #21314.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mmit semantics and known limitation

Document the trade-offs of routing head reads through the SD block
overlay during the bg-commit window:

- node/ethconfig/config.go: expand the FcuBackgroundCommit doc block to
  cover the FCU semaphore sequencing, embedded vs. remote rpcdaemon
  semantics, and the eth_call(latest) header-vs-state lag (~50ms). Link
  to #21314 for the proper SD-aware view follow-up.

- rpc/jsonrpc/eth_call.go: note that BlockOverlayTemporalTx wraps table
  reads but delegates temporal methods to roTx; link to #21314.

- rpc/jsonrpc/eth_simulation.go: comment was wrong — blockWithSenders
  auto-wraps. The real reason latest stays on plain tx is
  NewSharedDomains ties the simulator to the plain tx's domain state.

- rpc/jsonrpc/eth_system.go: explain why GasPriceOracleBackend.Fork
  opens a non-overlay-wrapped tx (downstream helpers re-wrap
  internally).

- execution/execmodule/exec_module_test.go: replace stale "subsequent
  blocks may fail validation" comment on TestNotificationDispatchBackgroundCommit
  — the semaphore serializes FCUs so N+1 always reads N's committed
  state.

- cmd/utils/flags.go + docs (configuring-erigon.mdx, llms-full.txt):
  update --fcu.background.commit usage and default; flag a concise
  rpcdaemon caveat in the description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ay safe-close invariant and assert memStore backing

RPC readers acquire views via Filters.WithOverlay / WithTemporalOverlay /
SharedDomains.BlockOverlayTemporalTx that share memTx with the published
SD's BlockOverlay. The FCU bg-commit goroutine closes that SD while
those readers are still iterating, and the only thing keeping the
pattern safe today is that the BlockOverlay is backed by
membatchwithdb.NewMemoryBatch — a pure-Go memStore whose
Rollback/Close are no-ops on the in-memory data (memory_store.go), and
that views hold their own caller-supplied backing tx. Both properties
were load-bearing but undocumented; flagged in the FcuBackgroundCommit=true
review as a latent risk.

Make the invariant explicit:

- MemoryMutation.Rollback (memory_mutation.go): comment explaining the
  no-op semantics for memStore-backed batches and the consequence for
  concurrent read views.

- MemoryMutation.NewReadView (memory_mutation.go): concurrency note
  pointing to newReadViewMut.

- MemoryMutation.newReadViewMut (memory_mutation.go): runtime panic if
  m.memTx is not *memStore. Catches the case where someone swaps
  NewMemoryBatch for NewMemoryBatchMDBX (which DOES invalidate cursors
  on Rollback) and forgets that the safe-concurrent-close property
  vanishes. The panic message points at the required follow-up
  (refcount/drain) for any MDBX-backed overlay with concurrent readers.

- SharedDomains.Close (domain_shared.go): comment walking through each
  of sd.mem / sd.blockOverlay / sd.sdCtx and why concurrent RPC views
  remain safe under each close path.

- Filters.WithOverlay / WithTemporalOverlay (filters.go): public-API
  documentation of the safe-concurrent-close property and the standard
  caller-tx-lifecycle constraint that still applies.

No behaviour change for the supported (memStore-backed) path; the panic
fires only if an unsupported backing store is wired in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis yperbasis marked this pull request as ready for review May 20, 2026 14:19
@yperbasis yperbasis requested a review from bloxster as a code owner May 20, 2026 14:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (4)

rpc/jsonrpc/bor_api_impl.go:115

  • When blockNrOrHash.Number() is a tag (e.g. "latest", "pending", "safe", "finalized"), converting it to uint64 yields a huge number and results in an incorrect lookup. This code path should resolve tags explicitly (e.g. via rpchelper.GetBlockNumber or by handling rpc.LatestBlockNumber etc.) before calling HeaderByNumber.
		header, err = api._blockReader.HeaderByNumber(ctx, overlayTx, latestBlockNum)
	} else {
		if blockNr, ok := blockNrOrHash.Number(); ok {
			header, err = api._blockReader.HeaderByNumber(ctx, tx, uint64(blockNr))
		} else {

rpc/jsonrpc/eth_receipts.go:162

  • Calling rpchelper.GetBlockNumber(..., nil) can panic if crit.ToBlock is the pending tag: _GetBlockNumber dereferences filters in the PendingBlockNumber case (filters.LastPendingBlock()). Consider explicitly rejecting pending here (committed-only scan), or ensure pending resolution is handled safely.
			if toBlock > 0 {
				end = uint64(toBlock)
			} else {
				blockNum := rpc.BlockNumber(toBlock)
				end, _, _, err = rpchelper.GetBlockNumber(ctx, rpc.BlockNumberOrHashWithNumber(blockNum), tx, api._blockReader, nil)

rpc/jsonrpc/overlay_api.go:624

  • rpchelper.GetBlockNumber(..., nil) can panic on the pending tag because _GetBlockNumber calls filters.LastPendingBlock() without a nil check. If crit.ToBlock can be pending, either validate/reject it up front or route pending resolution through a non-nil filters implementation that doesn't enable overlay-based latest/safe/finalized.
			toBlock := crit.ToBlock.Int64()
			if toBlock > 0 {
				end = uint64(toBlock)
			} else {
				blockNum := rpc.BlockNumber(toBlock)
				end, _, _, err = rpchelper.GetBlockNumber(ctx, rpc.BlockNumberOrHashWithNumber(blockNum), tx, api._blockReader, nil)
				if err != nil {

rpc/jsonrpc/trace_filtering.go:357

  • rpchelper.GetBlockNumber(..., nil) can panic if req.ToBlock is pending, because _GetBlockNumber dereferences filters in the PendingBlockNumber case (filters.LastPendingBlock()). Consider rejecting pending here or ensure pending resolution is handled safely without a nil filters pointer.
		}
		toBlock = *headNumber
	} else {
		toBlock, _, _, err = rpchelper.GetBlockNumber(ctx, *req.ToBlock, dbtx, api._blockReader, nil)
		if err != nil {

Comment thread rpc/jsonrpc/eth_receipts.go
Comment thread rpc/jsonrpc/overlay_api.go
Comment thread rpc/jsonrpc/trace_filtering.go
85d168b routed eth_getLogs / erigon_getLogs / overlay.* /
trace_filter explicit-tag resolution through rpchelper.GetBlockNumber
with nil filters, so the upper bound stays on the same committed view
that the underlying scan uses. But _GetBlockNumber's PendingBlockNumber
branch dereferenced filters.LastPendingBlock() without a nil check — a
latent crash that was unreachable while every caller passed non-nil
filters. The pre-existing nil-filters call at eth_receipts.go:127 used
the LatestExecutedBlockNumber tag and never hit this branch; my new
nil-filters calls forward user-supplied tags, so a user passing
FromBlock/ToBlock="pending" would now panic.

Fall back to plainStateBlockNumber when filters is nil — same as the
existing path when filters is set but holds no pending block — which
is the right semantics for the log-range scan callsites (they can only
see committed data anyway).

Caught by the Copilot reviewer on the latest PR push (29db691...
review of ab060ab).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis yperbasis requested a review from wmitsuda May 21, 2026 10:08
Copy link
Copy Markdown
Contributor

@mh0lt mh0lt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a PR which should land tomorrow which removes the additional semaphore we have arounf the commit in background mode which make the saync commit slower for fcu. When I did this I found a few rpc methods still where not reading the right overlay.

You may want to wait for my PR before merging this. I'll add a pr no once I raise it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants