fix(client): persist reth blocks during ZFS snapshot prep so the head isn't lost #239
… isn't lost

reth-bal-* stateful suites whose pre-run ends on a heavy block (here the trie-heavy funding block with 15k withdrawals) fail every test: after the per-test ZFS rollback reth comes up one block behind the chain head, so each test's newPayload references a missing parent and reth stays SYNCING forever (retries exhaust; forkchoiceUpdated returns SYNCING, not VALID).

Root cause (verified against reth bal-devnet-7 @ d9d028b and the run logs): reth keeps recent canonical blocks in its in-memory engine tree and only flushes to MDBX once they are persistence_threshold (default 2) deep (should_persist, tree/mod.rs). Pre-run reaches canonical head 24407728 but only 24407727 is persisted. The 20s settle does not help: with threshold 2, should_persist never fires while the gap is <=2. The only flush attempt is at graceful shutdown (persist_until_complete), which recomputes the state root and is cut off by reth's 5s graceful-shutdown timeout, which is too short for the trie-heavy block on slow ZFS (observed 5.7s stop). Light suites flush in time, which is why only funding-block suites break.

Fix: add a Spec.SnapshotPrepareArgs() hook and apply reth's --engine.persistence-threshold=0 (+ --engine.memory-block-buffer-target=0) ONLY to the container that builds the ZFS ready snapshot, so reth persists each block as it becomes canonical during pre-run (the heavy trie write lands inside the 20s settle, not the 5s shutdown box). Per-test measurement containers keep reth's default persistence, so newPayload/FCU timings and cross-client comparisons are unaffected. Other client specs return nil. The ready snapshot is rebuilt by the prepare phase each run, so no manual snapshot surgery is needed.
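The threshold condition described in the root cause can be illustrated with a small Go model. This is only a sketch of the comparison as stated in this PR (flush when the gap between canonical head and last persisted block exceeds the threshold); the function name shouldPersist and its signature are hypothetical and do not reproduce reth's Rust implementation.

```go
package main

import "fmt"

// shouldPersist is a simplified, hypothetical model of the condition
// described in this PR: a flush is triggered only when the canonical head
// is more than `threshold` blocks ahead of the last persisted block.
// It is not reth's actual code.
func shouldPersist(canonicalHead, lastPersisted, threshold uint64) bool {
	if canonicalHead < lastPersisted {
		return false // guard against unsigned underflow
	}
	return canonicalHead-lastPersisted > threshold
}

func main() {
	// Block numbers taken from the prepare logs quoted in this PR.
	const head, persisted = 24407728, 24407727

	fmt.Println(shouldPersist(head, persisted, 2)) // default threshold: false
	fmt.Println(shouldPersist(head, persisted, 0)) // snapshot-prep threshold: true
}
```

With the default threshold the one-block gap never triggers a flush, which matches the observed snapshot being one block short; with a threshold of 0 any gap does.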
Problem
reth-bal-* stateful suites fail every test: reth boots fine but is stuck in permanent SYNCING. After each per-test ZFS rollback, reth comes up one block behind the chain head, so the test's first newPayload references a parent reth never has. With no peers to backfill, it stays SYNCING, exhausts the 10 retries, and forkchoiceUpdatedV3 returns SYNCING instead of VALID, so every test fails.

Only suites whose pre-run ends on a heavy block hit this. Here the fixture base 24407728 is the funding block (15k withdrawals → trie-heavy); light suites are unaffected.

Root cause (verified against reth bal-devnet-7 @ d9d028b + the run-9a2b9b5e prepare logs)

reth keeps recent canonical blocks in its in-memory engine tree and only flushes to MDBX once they are persistence_threshold (default 2) deep: should_persist() fires only when canonical_head − last_persisted > threshold (crates/engine/tree/src/tree/mod.rs:2066). Confirmed from the logs: pre-run reached canonical head 24407728 (0xfad0a7f2…) but the snapshot persisted only 24407727 (0x97f16c5f…).

Two compounding reasons the head never lands on disk:
1. The settle does not help: with threshold=2, should_persist never fires while the canonical-vs-persisted gap is ≤2, so reth has no reason to flush during the settle (a longer settle was already tried, with no effect).
2. The only flush attempt is at graceful shutdown (persist_until_complete → PersistTarget::Head), which recomputes the state root from the DB (heavy for a 15k-withdrawal block) and is bounded by reth's 5s graceful_shutdown_timeout (crates/cli/runner/src/lib.rs:197, not CLI-configurable). On slow ZFS (reth's own log warns about ZFS+MDBX) it doesn't finish; the observed stop took ~5.7s. Light suites flush in time, so no gap appears.

Because reth is forced onto the container-recreate + ZFS strategy (RPC rollback disabled in dc2d25d, RPCRollbackSpec() returns nil), every test rolls back to that block-short snapshot.

Fix: scoped to snapshot creation only
Add a client.Spec.SnapshotPrepareArgs() hook (mirrors the existing RPCRollbackSpec() / DefaultConfigFiles() nil-returning pattern). reth returns --engine.persistence-threshold=0 (+ --engine.memory-block-buffer-target=0); all other clients return nil. In runContainerLifecycle, these args are applied only to the initial container that runs pre-run and is snapshotted, scoped to the container-recreate + zfs case. Per-test recreated containers clone params.ContainerSpec (base command), so they keep reth's default persistence.

Effect: reth persists each block as it becomes canonical during pre-run, so the heavy trie write lands inside the 20s settle window instead of the 5s shutdown box, and the snapshot captures the true head 24407728.
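A rough sketch of how such a hook could be shaped, for readers who want the scoping made concrete. Only SnapshotPrepareArgs(), the two reth flags, and the container-recreate + zfs condition come from this PR; the trimmed Spec interface, the rethSpec and gethSpec types, the prepareCommand helper, its parameters, and the base command are hypothetical stand-ins rather than the actual pkg/client or pkg/runner code.

```go
package main

import "fmt"

// Spec is a hypothetical, trimmed-down stand-in for client.Spec.
type Spec interface {
	// SnapshotPrepareArgs returns extra CLI args for the container that
	// builds the ZFS ready snapshot, or nil when none are needed.
	SnapshotPrepareArgs() []string
}

type rethSpec struct{}

// reth persists every block immediately while the snapshot is being built.
func (rethSpec) SnapshotPrepareArgs() []string {
	return []string{
		"--engine.persistence-threshold=0",
		"--engine.memory-block-buffer-target=0",
	}
}

type gethSpec struct{}

// Other clients follow the nil-returning pattern.
func (gethSpec) SnapshotPrepareArgs() []string { return nil }

// prepareCommand (hypothetical helper) builds the command for the initial,
// snapshotted container. It copies base first, so per-test containers that
// clone the base command never see the extra args.
func prepareCommand(base []string, spec Spec, strategy, fs string) []string {
	cmd := append([]string(nil), base...)
	if strategy == "container-recreate" && fs == "zfs" {
		cmd = append(cmd, spec.SnapshotPrepareArgs()...)
	}
	return cmd
}

func main() {
	base := []string{"node"} // placeholder base command, not the real one

	fmt.Println(prepareCommand(base, rethSpec{}, "container-recreate", "zfs"))
	fmt.Println(prepareCommand(base, gethSpec{}, "container-recreate", "zfs"))
	fmt.Println(base) // unchanged: per-test containers keep default persistence
}
```

Copying the base slice before appending is the point of the sketch: the snapshot-prep flags stay local to the one container that is snapshotted.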
Why scoped, not global
An earlier version put the flag in reth's DefaultCommand (global). That also changes reth's persistence during the measured per-test runs: extra MDBX commits on ZFS perturb the newPayload / forkchoiceUpdated timings the tool reports and make reth non-default vs geth/besu. Per-test correctness doesn't need it (each test force-kills + rolls back, discarding persisted state), so the change belongs at snapshot creation only.

Verification
- go build / go vet / gofmt clean on pkg/client and pkg/runner.
- Root cause checked against the run-9a2b9b5e prepare logs (canonical 24407728 reached; only 24407727 snapshotted).
- Source checked against reth bal-devnet-7 @ d9d028b (the exact build in the logs).
- Both flags (--engine.persistence-threshold=0 / --engine.memory-block-buffer-target=0) accepted by ethpandaops/reth:bal-devnet-7.

Deploy note
No manual snapshot surgery: the benchmarkoor-ready snapshot is rebuilt by the prepare phase each run, so the first run with the new binary produces a correct snapshot.

Follow-up (not in this PR)
reth's graceful-shutdown flush being capped at a 5s timeout it can't meet for heavy blocks on ZFS is arguably a reth-side robustness gap; worth a separate upstream note.