Skip to content

fix(client): persist reth blocks during ZFS snapshot prep so the head isn't lost#239

Merged
qu0b merged 1 commit into
masterfrom
qu0b/reth-persistence-threshold
May 22, 2026
Merged

fix(client): persist reth blocks during ZFS snapshot prep so the head isn't lost#239
qu0b merged 1 commit into
masterfrom
qu0b/reth-persistence-threshold

Conversation

@qu0b
Copy link
Copy Markdown
Member

@qu0b qu0b commented May 22, 2026

Problem

reth-bal-* stateful suites fail every test — reth boots fine but is stuck in permanent SYNCING. After each per-test ZFS rollback, reth comes up one block behind the chain head, so the test's first newPayload references a parent reth never has. With no peers to backfill it stays SYNCING, exhausts the 10 retries, and forkchoiceUpdatedV3 returns SYNCING instead of VALID → every test fails.

Only suites whose pre-run ends on a heavy block hit this. Here the fixture base 24407728 is the funding block (15k withdrawals → trie-heavy); light suites are unaffected.

Root cause (verified against reth bal-devnet-7 @ d9d028b + the run-9a2b9b5e prepare logs)

reth keeps recent canonical blocks in its in-memory engine tree and only flushes to MDBX once they are persistence_threshold (default 2) deep — should_persist() fires only when canonical_head − last_persisted > threshold (crates/engine/tree/src/tree/mod.rs:2066). Confirmed from the logs: pre-run reached canonical head 24407728 (0xfad0a7f2…) but the snapshot persisted only 24407727 (0x97f16c5f…).

Two compounding reasons the head never lands on disk:

  1. The 20s settle doesn't help. With threshold=2, should_persist never fires while the canonical-vs-persisted gap is ≤2, so reth has no reason to flush during the settle (a longer settle was already tried — no effect).
  2. The shutdown flush is cut off. The only flush attempt is at graceful shutdown (persist_until_completePersistTarget::Head), which recomputes the state root from the DB — heavy for a 15k-withdrawal block — and is bounded by reth's 5s graceful_shutdown_timeout (crates/cli/runner/src/lib.rs:197, not CLI-configurable). On slow ZFS (reth's own log warns about ZFS+MDBX) it doesn't finish; the observed stop took ~5.7s. Light suites flush in time → no gap.

Because reth is forced onto the container-recreate + ZFS strategy (RPC rollback disabled in dc2d25d, RPCRollbackSpec() returns nil), every test rolls back to that block-short snapshot.

Fix — scoped to snapshot creation only

Add a client.Spec.SnapshotPrepareArgs() hook (mirrors the existing RPCRollbackSpec()/DefaultConfigFiles() nil-returning pattern). reth returns --engine.persistence-threshold=0 (+ --engine.memory-block-buffer-target=0); all other clients return nil. In runContainerLifecycle, these args are applied only to the initial container that runs pre-run and is snapshotted, scoped to the container-recreate + zfs case. Per-test recreated containers clone params.ContainerSpec (base command), so they keep reth's default persistence.

Effect: reth persists each block as it becomes canonical during pre-run, so the heavy trie write lands inside the 20s settle window instead of the 5s shutdown box, and the snapshot captures the true head 24407728.

Why scoped, not global

An earlier version put the flag in reth's DefaultCommand (global). That also changes reth's persistence during the measured per-test runs — extra MDBX commits on ZFS that perturb the newPayload/forkchoiceUpdated timings the tool reports, and make reth non-default vs geth/besu. Per-test correctness doesn't need it (each test force-kills + rolls back, discarding persisted state), so the change belongs at snapshot creation only.

Verification

  • go build/go vet/gofmt clean on pkg/client and pkg/runner.
  • Root cause confirmed by block-hash tracing in the run-9a2b9b5e prepare logs (canonical 24407728 reached; only 24407727 snapshotted).
  • reth source semantics confirmed at d9d028b (the exact build in the logs).
  • --engine.persistence-threshold=0/--engine.memory-block-buffer-target=0 accepted by ethpandaops/reth:bal-devnet-7.

Deploy note

No manual snapshot surgery: the benchmarkoor-ready snapshot is rebuilt by the prepare phase each run, so the first run with the new binary produces a correct snapshot.

Follow-up (not in this PR)

reth's graceful-shutdown flush being capped at a 5s timeout it can't meet for heavy blocks on ZFS is arguably a reth-side robustness gap; worth a separate upstream note.

@qu0b-reviewer
Copy link
Copy Markdown

qu0b-reviewer Bot commented May 22, 2026

🤖 qu0b-reviewer

Summary

The PR adds a SnapshotPrepareArgs() method to the client Spec interface, implements it on all clients (returning nil except reth which returns persistence-disabling flags), and applies those args only to the initial (pre-run/snapshot) container when using ZFS + container-recreate rollback strategy. The intent is correct but params.ContainerSpec = containerSpec pre-mutates the stored spec, making the fix ineffective.

Issues

  • 🔴 blocker pkg/runner/lifecycle.go:698params.ContainerSpec = containerSpec is assigned before containerSpec may be reassigned to &prepareSpec. When SnapshotPrepareArgs() returns non-nil and conditions match, containerSpec is overwritten but params.ContainerSpec still holds the original spec without the persistence args. Per-test containers in strategy_container.go clone params.ContainerSpec, so they incorrectly inherit the persistence args that should only be on the initial container. Fix: move params.ContainerSpec = createSpec to after line 715 (after the createSpec is resolved), or set params.ContainerSpec to the appropriate variant inside the conditional.

  • 🟡 concern pkg/runner/lifecycle.go:711append(append([]string{}, cmd...), prepareArgs...) is functionally safe (fresh backing array on first append), but the nested append with a fresh slice is unusual and worth a brief comment explaining the safe copy, since a future refactor that inlines this could reintroduce the aliasing bug. Alternatively, use newCmd := make([]string, len(cmd)+len(prepareArgs)); copy(newCmd, cmd); copy(newCmd[len(cmd):], prepareArgs) for clarity.

Suggestions

  • pkg/client/reth.go:93 — The comment says "persist every block", which is technically the result of forcing zero thresholds. Consider citing the flags directly (e.g., "set persistence-threshold and memory-block-buffer-target to 0") so a reader doesn't need to cross-reference the flag names.
  • pkg/client/besu.go:106 / same pattern elsewhere — The nil stub methods across 7 clients all have identical comments ("X needs no snapshot-only args"). This is boilerplate that could be expressed as a single default on the interface or shared helper, though it's low priority.

Reviewed @ cf0eaee6
"git push --force is a personality test."

@qu0b qu0b force-pushed the qu0b/reth-persistence-threshold branch from 949b16e to 0b0ba62 Compare May 22, 2026 14:17
@qu0b qu0b changed the title fix(client): persist reth blocks immediately so ZFS snapshots capture true head fix(client): persist reth blocks during ZFS snapshot prep so the head isn't lost May 22, 2026
… isn't lost

reth-bal-* stateful suites whose pre-run ends on a heavy block (here the
trie-heavy funding block with 15k withdrawals) fail every test: after the
per-test ZFS rollback reth comes up one block behind the chain head, so each
test's newPayload references a missing parent and reth stays SYNCING forever
(retries exhaust; forkchoiceUpdated returns SYNCING, not VALID).

Root cause (verified against reth bal-devnet-7 @ d9d028b and the run logs):
reth keeps recent canonical blocks in its in-memory engine tree and only
flushes to MDBX once they are persistence_threshold (default 2) deep
(should_persist, tree/mod.rs). Pre-run reaches canonical head 24407728 but only
24407727 is persisted. The 20s settle does not help — with threshold 2,
should_persist never fires while the gap is <=2. The only flush attempt is at
graceful shutdown (persist_until_complete), which recomputes the state root and
is cut off by reth's 5s graceful-shutdown timeout — too short for the trie-heavy
block on slow ZFS (observed 5.7s stop). Light suites flush in time, which is why
only funding-block suites break.

Fix: add a Spec.SnapshotPrepareArgs() hook and apply reth's
--engine.persistence-threshold=0 (+ --engine.memory-block-buffer-target=0) ONLY
to the container that builds the ZFS ready snapshot, so reth persists each block
as it becomes canonical during pre-run (the heavy trie write lands inside the
20s settle, not the 5s shutdown box). Per-test measurement containers keep
reth's default persistence, so newPayload/FCU timings and cross-client
comparisons are unaffected. Other client specs return nil.

The ready snapshot is rebuilt by the prepare phase each run, so no manual
snapshot surgery is needed.
@qu0b qu0b force-pushed the qu0b/reth-persistence-threshold branch from 0b0ba62 to cf0eaee Compare May 22, 2026 14:18
@qu0b qu0b merged commit dc05722 into master May 22, 2026
6 checks passed
@qu0b qu0b deleted the qu0b/reth-persistence-threshold branch May 22, 2026 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant