fix(client): persist reth blocks during ZFS snapshot prep so the head isn't lost by qu0b · Pull Request #239 · ethpandaops/benchmarkoor

qu0b · 2026-05-22T13:42:01Z

Problem

reth-bal-* stateful suites fail every test — reth boots fine but is stuck in permanent SYNCING. After each per-test ZFS rollback, reth comes up one block behind the chain head, so the test's first newPayload references a parent reth never has. With no peers to backfill it stays SYNCING, exhausts the 10 retries, and forkchoiceUpdatedV3 returns SYNCING instead of VALID → every test fails.

Only suites whose pre-run ends on a heavy block hit this. Here the fixture base 24407728 is the funding block (15k withdrawals → trie-heavy); light suites are unaffected.

Root cause (verified against reth `bal-devnet-7` @ `d9d028b` + the run-`9a2b9b5e` prepare logs)

reth keeps recent canonical blocks in its in-memory engine tree and only flushes to MDBX once they are persistence_threshold (default 2) deep — should_persist() fires only when canonical_head − last_persisted > threshold (crates/engine/tree/src/tree/mod.rs:2066). Confirmed from the logs: pre-run reached canonical head 24407728 (0xfad0a7f2…) but the snapshot persisted only 24407727 (0x97f16c5f…).

Two compounding reasons the head never lands on disk:

The 20s settle doesn't help. With threshold=2, should_persist never fires while the canonical-vs-persisted gap is ≤2, so reth has no reason to flush during the settle (a longer settle was already tried — no effect).
The shutdown flush is cut off. The only flush attempt is at graceful shutdown (persist_until_complete → PersistTarget::Head), which recomputes the state root from the DB — heavy for a 15k-withdrawal block — and is bounded by reth's 5s graceful_shutdown_timeout (crates/cli/runner/src/lib.rs:197, not CLI-configurable). On slow ZFS (reth's own log warns about ZFS+MDBX) it doesn't finish; the observed stop took ~5.7s. Light suites flush in time → no gap.

Because reth is forced onto the container-recreate + ZFS strategy (RPC rollback disabled in dc2d25d, RPCRollbackSpec() returns nil), every test rolls back to that block-short snapshot.

Fix — scoped to snapshot creation only

Add a client.Spec.SnapshotPrepareArgs() hook (mirrors the existing RPCRollbackSpec()/DefaultConfigFiles() nil-returning pattern). reth returns --engine.persistence-threshold=0 (+ --engine.memory-block-buffer-target=0); all other clients return nil. In runContainerLifecycle, these args are applied only to the initial container that runs pre-run and is snapshotted, scoped to the container-recreate + zfs case. Per-test recreated containers clone params.ContainerSpec (base command), so they keep reth's default persistence.

Effect: reth persists each block as it becomes canonical during pre-run, so the heavy trie write lands inside the 20s settle window instead of the 5s shutdown box, and the snapshot captures the true head 24407728.

Why scoped, not global

An earlier version put the flag in reth's DefaultCommand (global). That also changes reth's persistence during the measured per-test runs — extra MDBX commits on ZFS that perturb the newPayload/forkchoiceUpdated timings the tool reports, and make reth non-default vs geth/besu. Per-test correctness doesn't need it (each test force-kills + rolls back, discarding persisted state), so the change belongs at snapshot creation only.

Verification

go build/go vet/gofmt clean on pkg/client and pkg/runner.
Root cause confirmed by block-hash tracing in the run-9a2b9b5e prepare logs (canonical 24407728 reached; only 24407727 snapshotted).
reth source semantics confirmed at d9d028b (the exact build in the logs).
--engine.persistence-threshold=0/--engine.memory-block-buffer-target=0 accepted by ethpandaops/reth:bal-devnet-7.

Deploy note

No manual snapshot surgery: the benchmarkoor-ready snapshot is rebuilt by the prepare phase each run, so the first run with the new binary produces a correct snapshot.

Follow-up (not in this PR)

reth's graceful-shutdown flush being capped at a 5s timeout it can't meet for heavy blocks on ZFS is arguably a reth-side robustness gap; worth a separate upstream note.

qu0b-reviewer · 2026-05-22T13:42:45Z

🤖 qu0b-reviewer

Summary

The PR adds a SnapshotPrepareArgs() method to the client Spec interface, implements it on all clients (returning nil except reth which returns persistence-disabling flags), and applies those args only to the initial (pre-run/snapshot) container when using ZFS + container-recreate rollback strategy. The intent is correct but params.ContainerSpec = containerSpec pre-mutates the stored spec, making the fix ineffective.

Issues

🔴 blocker pkg/runner/lifecycle.go:698 — params.ContainerSpec = containerSpec is assigned before containerSpec may be reassigned to &prepareSpec. When SnapshotPrepareArgs() returns non-nil and conditions match, containerSpec is overwritten but params.ContainerSpec still holds the original spec without the persistence args. Per-test containers in strategy_container.go clone params.ContainerSpec, so they incorrectly inherit the persistence args that should only be on the initial container. Fix: move params.ContainerSpec = createSpec to after line 715 (after the createSpec is resolved), or set params.ContainerSpec to the appropriate variant inside the conditional.
🟡 concern pkg/runner/lifecycle.go:711 — append(append([]string{}, cmd...), prepareArgs...) is functionally safe (fresh backing array on first append), but the nested append with a fresh slice is unusual and worth a brief comment explaining the safe copy, since a future refactor that inlines this could reintroduce the aliasing bug. Alternatively, use newCmd := make([]string, len(cmd)+len(prepareArgs)); copy(newCmd, cmd); copy(newCmd[len(cmd):], prepareArgs) for clarity.

Suggestions

pkg/client/reth.go:93 — The comment says "persist every block", which is technically the result of forcing zero thresholds. Consider citing the flags directly (e.g., "set persistence-threshold and memory-block-buffer-target to 0") so a reader doesn't need to cross-reference the flag names.
pkg/client/besu.go:106 / same pattern elsewhere — The nil stub methods across 7 clients all have identical comments ("X needs no snapshot-only args"). This is boilerplate that could be expressed as a single default on the interface or shared helper, though it's low priority.

_{Reviewed @ cf0eaee6}
_{"git push --force is a personality test."}

… isn't lost reth-bal-* stateful suites whose pre-run ends on a heavy block (here the trie-heavy funding block with 15k withdrawals) fail every test: after the per-test ZFS rollback reth comes up one block behind the chain head, so each test's newPayload references a missing parent and reth stays SYNCING forever (retries exhaust; forkchoiceUpdated returns SYNCING, not VALID). Root cause (verified against reth bal-devnet-7 @ d9d028b and the run logs): reth keeps recent canonical blocks in its in-memory engine tree and only flushes to MDBX once they are persistence_threshold (default 2) deep (should_persist, tree/mod.rs). Pre-run reaches canonical head 24407728 but only 24407727 is persisted. The 20s settle does not help — with threshold 2, should_persist never fires while the gap is <=2. The only flush attempt is at graceful shutdown (persist_until_complete), which recomputes the state root and is cut off by reth's 5s graceful-shutdown timeout — too short for the trie-heavy block on slow ZFS (observed 5.7s stop). Light suites flush in time, which is why only funding-block suites break. Fix: add a Spec.SnapshotPrepareArgs() hook and apply reth's --engine.persistence-threshold=0 (+ --engine.memory-block-buffer-target=0) ONLY to the container that builds the ZFS ready snapshot, so reth persists each block as it becomes canonical during pre-run (the heavy trie write lands inside the 20s settle, not the 5s shutdown box). Per-test measurement containers keep reth's default persistence, so newPayload/FCU timings and cross-client comparisons are unaffected. Other client specs return nil. The ready snapshot is rebuilt by the prepare phase each run, so no manual snapshot surgery is needed.

qu0b force-pushed the qu0b/reth-persistence-threshold branch from 949b16e to 0b0ba62 Compare May 22, 2026 14:17

qu0b changed the title ~~fix(client): persist reth blocks immediately so ZFS snapshots capture true head~~ fix(client): persist reth blocks during ZFS snapshot prep so the head isn't lost May 22, 2026

qu0b force-pushed the qu0b/reth-persistence-threshold branch from 0b0ba62 to cf0eaee Compare May 22, 2026 14:18

qu0b merged commit dc05722 into master May 22, 2026
6 checks passed

qu0b deleted the qu0b/reth-persistence-threshold branch May 22, 2026 14:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(client): persist reth blocks during ZFS snapshot prep so the head isn't lost#239

fix(client): persist reth blocks during ZFS snapshot prep so the head isn't lost#239
qu0b merged 1 commit into
masterfrom
qu0b/reth-persistence-threshold

qu0b commented May 22, 2026 •

edited

Loading

Uh oh!

qu0b-reviewer Bot commented May 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

qu0b commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root cause (verified against reth bal-devnet-7 @ d9d028b + the run-9a2b9b5e prepare logs)

Fix — scoped to snapshot creation only

Why scoped, not global

Verification

Deploy note

Follow-up (not in this PR)

Uh oh!

qu0b-reviewer Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 qu0b-reviewer

Summary

Issues

Suggestions

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

qu0b commented May 22, 2026 •

edited

Loading

Root cause (verified against reth `bal-devnet-7` @ `d9d028b` + the run-`9a2b9b5e` prepare logs)

qu0b-reviewer Bot commented May 22, 2026 •

edited

Loading