
BAL-driven parallel commitment (PR #5 of the perf stack) #21416

Open
mh0lt wants to merge 4 commits into mh/perf-statecache-lru-pr from mh/perf-bal-compute-pr-v2


@mh0lt mh0lt commented May 26, 2026

Important

DRAFT — does NOT build on this base. The commits use the typed VersionedWrite[T] API (ValU256, ValU64, ValBytes fields), which lands as part of the versionedio refactor (PR #6 of the perf stack, currently unposted). Build error on the current base:

execution/stagedsync/calc_state.go:368: unknown field ValU256 in struct literal of type state.VersionedWrite
execution/stagedsync/calc_state.go:373: unknown field ValU64 in struct literal of type state.VersionedWrite
execution/stagedsync/calc_state.go:378: unknown field ValBytes in struct literal of type state.VersionedWrite

Pushed for visibility / review of the BAL-driven commitment design. CI will fail until the typed-VW prerequisite lands.

Merge order (when the dep lands): #21380, #21386, #21387 (already merged) → typed-VW PR → this.

Summary

Pipelines commitment computation alongside EVM execution by feeding the commitment calculator the BAL (block access list) at block-arrival, instead of waiting for execution to publish writes. Closes the structural piece of issue #19791 — "perf: pipeline commitment with execution in parallel path".

Today execution and commitment serialize: EVM finishes a block → publishes the writeset → commitment hashes it. With this PR, the calculator receives requests at block-arrival (BAL declares the post-block leaf set), runs in parallel with the EVM, and folds with the EVM's actual writes when both finish. Steady-state cost shifts from exec + trie to max(exec, trie).
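
The cost shift can be illustrated with a small model. This is a sketch under assumptions, not code from the PR: `blockCost`, `serialCost` and `overlappedCost` are hypothetical names, the time units are arbitrary, and the model treats each block independently (it ignores any cross-block dependency such as waiting for the previous block's committed state).

```go
package main

import "fmt"

// blockCost is a hypothetical per-block cost model in arbitrary time units.
type blockCost struct {
	exec int // time spent in EVM execution
	trie int // time spent computing the commitment (trie hashing)
}

// serialCost models the current behaviour: commitment starts only after
// execution has published the writeset, so the two costs add up.
func serialCost(blocks []blockCost) int {
	total := 0
	for _, b := range blocks {
		total += b.exec + b.trie
	}
	return total
}

// overlappedCost models the BAL-driven path: both start at block arrival,
// so a block is done when the slower of the two finishes.
func overlappedCost(blocks []blockCost) int {
	total := 0
	for _, b := range blocks {
		if b.exec > b.trie {
			total += b.exec
		} else {
			total += b.trie
		}
	}
	return total
}

func main() {
	blocks := []blockCost{{exec: 10, trie: 6}, {exec: 4, trie: 7}}
	fmt.Println("serial:", serialCost(blocks))         // serial: 27
	fmt.Println("overlapped:", overlappedCost(blocks)) // overlapped: 17
}
```

The gain per block is min(exec, trie), so the benefit is largest when the two costs are of similar size.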

Mechanism — 4 stages

  1. Stage 1 — calcState.LoadFromBAL (8ead839656): the parallel commitment calculator's calcState learns to seed itself from a BAL-declared leaf set, instead of only the exec-published writeset. Lays the foundation for early-start commitment.
  2. Stage 1.5 — engineapi tests under async commit (48eb2b0173): flips the engineapi test harness to run with FcuBackgroundCommit=true by default, so the existing assertoor suite exercises the parallel-commitment path in CI.
  3. Stage 2 — feed commitment calculator block requests (2a57d9cd66): wires the BAL-arrival hook to enqueue per-block commitment requests at the calculator. Calculator begins hashing in parallel with execution.
  4. Stages 3-5 — BAL-driven commitment fold + BAL_SHADOW_COMPUTE (08af2551f5): folds the parallel calculator's root with the EVM's actual writes when both finish; emits BAL_SHADOW_COMPUTE divergence metrics so the parallel path is observable against the serial reference until we trust it in steady state.

Files (7)

| File | What |
| --- | --- |
| execution/stagedsync/calc_state.go | LoadFromBAL + parallel calculator state |
| execution/stagedsync/committer.go | wire calculator into commitment stage |
| execution/stagedsync/exec3.go, exec3_parallel.go | block-arrival hook |
| execution/stagedsync/bal_load_test.go | LoadFromBAL unit tests |
| execution/engineapi/engineapitester/engine_api_tester.go | async-commit-default test wiring |
| common/dbg/experiments.go | BAL_SHADOW_COMPUTE env knob |

7 files, +221/-33.

Commits (4)

8ead839656 execution/stagedsync: calcState.LoadFromBAL — Stage 1 of parallel commitment
48eb2b0173 execution/engineapi: run engineapi tests under async commit by default
2a57d9cd66 execution/stagedsync: feed commitment calculator block requests (parallel commitment Stage 2)
08af2551f5 execution/stagedsync: BAL-driven commitment fold (parallel commitment Stages 3-5)

Gating + safety

  • Behind env knobs that are off by default in production. Per the Stages 3-5 commit message, BAL_DRIVEN_COMMITMENT selects BAL-driven mode; with it off, every block stays incremental and the serial reference still drives correctness. BAL_SHADOW_COMPUTE recomputes each BAL-driven block the incremental way and compares the two roots, so the parallel path is checked against the reference before we cut over.
  • Composes with #21293 (FcuBackgroundCommit default flip) — the engineapi test changes here make that flip easier to verify against the assertoor suite.

Out of scope

  • Default flip of BAL_SHADOW_COMPUTE — stays off in this PR. Cutover lands once shadow-compute observability shows zero divergence in steady state.
  • MergeDuringCompute — aggregator-merge yield-during-execution; separate direction, not in this PR.
  • Eth/71 BAL wire protocol — independent feature; can ship anywhere in the perf-stack path.

🤖 Generated with Claude Code

@mh0lt mh0lt force-pushed the mh/perf-bal-compute-pr-v2 branch from 08af255 to 50e1afd Compare May 26, 2026 09:28
Mark Holt and others added 4 commits May 26, 2026 09:29
…mitment

LoadFromBAL populates the commitment calculator's calcState from an
EIP-7928 Block Access List instead of the per-tx VersionedWrites
stream. The BAL declares the block's post-state up front, so the
calculator can build the trie without waiting for execution to stream
writes tx-by-tx — the prerequisite for running commitment fully
parallel to execution.

For each touched account it takes the block-end value per field — the
highest-tx-indexed change, via the generic finalChange helper — and
feeds the existing ApplyWrites, reusing the SELFDESTRUCT / Deleted /
EIP-161 routing rather than reimplementing it. Storage reads are
ignored (commitment only needs the changed set).
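
A highest-index-wins helper of the kind described above could look like the following. This is a minimal sketch, assuming a hypothetical `indexedChange` type; the real `finalChange` signature and the BAL entry types in the PR may differ.

```go
package main

import "fmt"

// indexedChange is a hypothetical stand-in for one BAL entry for a field:
// the value written and the index of the transaction that wrote it.
type indexedChange[T any] struct {
	TxIndex int
	Value   T
}

// finalChange returns the value written by the highest tx index, which is
// the block-end value for that field. ok is false when the slice is empty,
// meaning the field was not changed in the block.
func finalChange[T any](changes []indexedChange[T]) (value T, ok bool) {
	best := -1
	for i, c := range changes {
		if best < 0 || c.TxIndex > changes[best].TxIndex {
			best = i
		}
	}
	if best < 0 {
		return value, false
	}
	return changes[best].Value, true
}

func main() {
	// Balance written by txs 0, 3 and 1; entries need not be sorted.
	balance := []indexedChange[int]{
		{TxIndex: 0, Value: 100},
		{TxIndex: 3, Value: 250},
		{TxIndex: 1, Value: 175},
	}
	v, ok := finalChange(balance)
	fmt.Println(v, ok) // 250 true
}
```

Only the winning value per field would then be handed to the existing write-application path, which is what lets the BAL loader reuse that routing instead of duplicating it.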

Not yet modelled: the BAL carries no explicit SelfDestructPath or
incarnation field, so account-deletion and fresh-contract-incarnation
blocks diverge from the incremental path — tracked as a Stage-1
follow-up.

TestLoadFromBAL_MatchesApplyWrites is the differential proof: loading
calcState from a BAL produces byte-identical accumulated state to
feeding the equivalent multi-write VersionedWrites stream through
ApplyWrites. TestFinalChange covers the highest-index-wins helper.

This is Stage 1 of the parallel-commitment work; it adds no call site
yet — LoadFromBAL is unused until the calculator wiring lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Default the engineapi test harness ethConfig to FcuBackgroundCommit=true.
Async commit runs the post-FCU flush on a background goroutine, so a
subsequent newPayload may read the parent SD either pre- or post-flush —
the path functional tests must exercise. Sync commit makes every flush
deterministic per-FCU and masks flush-timing bugs. A test needing sync
commit can still opt out via EthConfigTweaker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…llel commitment Stage 2)

The commitment calculator inferred block boundaries from the txResult /
blockResult stream. Give it an explicit per-block heads-up: a blockRequest
carrying the block identity and the block's BAL, sent by the dispatch
layer on its own channel — separate from the result fan-out so a request
is never trapped behind a prior block's txResults.

The calculator's loop now multiplexes the result channel and the
blockRequests channel; handleBlockRequest records the per-block mode
(BAL-driven when the block has a BAL and BAL I/O is enabled, else
incremental) into a pending map, cleared on the matching blockResult.
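
The two-channel loop described above can be sketched as follows. The type and field names (`calculator`, `blockRequest`, `blockResult`, `pending`, `balIOEnabled`) are illustrative assumptions modelled on the commit message, not the PR's actual definitions.

```go
package main

import "fmt"

// mode is the per-block commitment mode recorded by the calculator.
type mode int

const (
	incremental mode = iota
	balDriven
)

// blockRequest is a hypothetical heads-up sent at block arrival.
type blockRequest struct {
	BlockNum uint64
	HasBAL   bool
}

// blockResult is a hypothetical end-of-block marker from the result stream.
type blockResult struct {
	BlockNum uint64
}

type calculator struct {
	balIOEnabled bool
	pending      map[uint64]mode
}

// handleBlockRequest records the mode for a block: BAL-driven only when the
// block carries a BAL and BAL I/O is enabled, otherwise incremental.
func (c *calculator) handleBlockRequest(r blockRequest) {
	m := incremental
	if r.HasBAL && c.balIOEnabled {
		m = balDriven
	}
	c.pending[r.BlockNum] = m
}

// run multiplexes both channels until both are closed. A closed channel is
// set to nil so its select case can never fire again.
func (c *calculator) run(requests <-chan blockRequest, results <-chan blockResult) {
	for requests != nil || results != nil {
		select {
		case r, ok := <-requests:
			if !ok {
				requests = nil
				continue
			}
			c.handleBlockRequest(r)
		case res, ok := <-results:
			if !ok {
				results = nil
				continue
			}
			delete(c.pending, res.BlockNum) // cleared on the matching blockResult
		}
	}
}

func main() {
	c := &calculator{balIOEnabled: true, pending: map[uint64]mode{}}
	requests, results := make(chan blockRequest), make(chan blockResult)
	done := make(chan struct{})
	go func() { c.run(requests, results); close(done) }()

	requests <- blockRequest{BlockNum: 1, HasBAL: true}
	requests <- blockRequest{BlockNum: 2, HasBAL: false}
	results <- blockResult{BlockNum: 2}
	close(requests)
	close(results)
	<-done
	fmt.Println(len(c.pending), c.pending[1] == balDriven)
}
```

Keeping requests on their own channel means a heads-up for block N is selectable even while earlier results are still queued, which is the property the commit message calls out.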

This stage is inert plumbing — the mode is recorded but not yet acted on;
compute behaviour is unchanged. Verified: engineapi reorg test shows an
identical pass rate with and without this change.

Also corrects the LoadFromBAL docstring: account deletion / incarnation
need not be modelled — BALs exist only for Amsterdam+ blocks and
post-EIP-6780 SELFDESTRUCT cannot delete a pre-existing account at block
scope, so this is not a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… Stages 3-5)

The commitment calculator can now fold a block from its BAL ahead of the
per-tx result stream, overlapping the block's execution — the parallel
commitment win (max(exec,trie) instead of exec+trie).

Stage 3 — BAL-driven fold. handleBlockRequest selects BAL-driven mode for
a block carrying a BAL (gated on BAL_DRIVEN_COMMITMENT). maybeFoldAhead
folds it once the fold gate is open: block N-1's committed state must be
in sd.mem, which blockResult(N-1) signals (the batch's first block has
its baseline from the prior cycle). foldBlockFromBAL loads a fresh
calcState from the BAL, computes the root via the shared computeRoot
path, and verifies it against the block header's stateRoot.

Stage 4 — calculator failure stops execution. fail() calls the executor's
CancelFunc so the exec loop's ctx.Done branches fire eagerly instead of
running ahead behind the 2048-deep result buffer; the error is also
published to the stage loop.

Stage 5 — incremental fallback + dual-compute shadow mode. With
BAL_DRIVEN_COMMITMENT off (the default), every block stays incremental
and the consensus path is byte-for-byte unchanged. BAL_SHADOW_COMPUTE
recomputes each BAL-driven block the incremental way at blockResult(N)
and asserts the two roots match before publishing — the consistency net;
divergence fails the block.
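
The shadow check amounts to a dual compute with an equality assertion before publishing. A minimal sketch, assuming a hypothetical `publishRoot` helper and a 32-byte `root` type; how the PR reads the knob and where the incremental recompute runs are not shown here.

```go
package main

import "fmt"

// root is a hypothetical 32-byte state root.
type root [32]byte

// publishRoot returns the BAL-driven root for publishing. With shadow mode
// on, it first recomputes the root the incremental way and fails the block
// if the two roots diverge.
func publishRoot(blockNum uint64, balRoot root, shadow bool, recomputeIncremental func() root) (root, error) {
	if !shadow {
		return balRoot, nil
	}
	inc := recomputeIncremental()
	if inc != balRoot {
		return root{}, fmt.Errorf("block %d: BAL-driven root %x != incremental root %x", blockNum, balRoot, inc)
	}
	return balRoot, nil
}

func main() {
	good, bad := root{1}, root{2}

	_, err := publishRoot(7, good, true, func() root { return good })
	fmt.Println(err == nil) // true: roots agree

	_, err = publishRoot(7, good, true, func() root { return bad })
	fmt.Println(err != nil) // true: divergence fails the block
}
```

Shadow mode pays for both computations on every BAL-driven block, so it is a validation aid rather than the steady-state configuration.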

blockRequest carries lastTxNum so the calculator can position asOfReader
and ComputeCommitment when folding ahead of blockResult(N).

BAL-driven mode is off by default and stays off until the shadow-compute
check has proven the BAL-driven root matches across a full validation
window. Verified: build + lint clean; make test-all green except the
pre-existing async-commit engineapi flake (tracked separately).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mh0lt mh0lt force-pushed the mh/perf-bal-compute-pr-v2 branch from 50e1afd to 71a46a4 Compare May 26, 2026 09:29
@mh0lt mh0lt marked this pull request as ready for review May 26, 2026 09:29