test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining #23336
Merged: spalladino merged 1 commit on May 16, 2026.
spalladino approved these changes on May 16, 2026.
Why
PR #23253 was dequeued (4th attempt) when `merge-queue-heavy` caught an `e2e_amm.test.ts` setup tx being dropped by a pipelining-driven chain prune. CI log: `baec5a7453c20089`.

The wait-for-parent gate in `CheckpointProposalJob.waitForValidParentCheckpointOnL1` (`sequencer-client/src/sequencer/checkpoint_proposal_job.ts:398`) should have blocked the discard, but it didn't. A `TestDateProvider` time warp from `AnvilTestWatcher.syncDateProviderToL1IfBehind` landed between the two `epochCache` reads in `Sequencer.work` (`sequencer.ts:217-218`) and broke the pipelining invariant (`targetSlot = slot + offset`):

| Call | `nowSeconds` | `ts` (next L1 slot) | Result |
|---|---|---|---|
| `getEpochAndSlotInNextL1Slot` (`slot`) | 1778942079 | 1778942080 | slot 18 |
| `getTargetEpochAndSlotInNextL1Slot` (`targetSlot`) | 1778942080 | 1778942084 | slot 19, +offset=1 → `targetSlot` 20 |

Logged confirmation: gap = 2 instead of 1.
With `slotNow = 18`, the gate at `checkpoint_proposal_job.ts:402` waits on `waitForSyncedL2SlotNumber(slotNow)`. The archiver had already synced past slot 18, so the wait returns immediately, far too early to see parent checkpoint 18 (which lands four seconds later, at 14:34:36). The gate then sees `checkpointedNumber=17` and `parentCheckpointNumber=18`, declares the parent absent, and discards. Slot 20 expires uncheckpointed, the archiver prunes blocks 19 and 20, and the inflight setup tx anchored to block 19 dies with `Block header not found`.

Full timeline and log evidence: https://gist.github.com/AztecBot/4863d10084dd20587bffcc43fd61dfee
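A toy model of the prune shows why the anchor block matters. `ToyChain`, `validateAnchor`, and the block numbers (checkpointed tip 18, proposed tip 19) are hypothetical stand-ins assumed for illustration; this is not the archiver or PXE API.

```typescript
// Hypothetical toy chain; not the archiver or PXE API.
type Tip = "proposed" | "checkpointed";

class ToyChain {
  private headers = new Set<number>();

  constructor(private checkpointed: number, private proposed: number) {
    // Every block up to the proposed tip has a header, checkpointed or not.
    for (let b = 1; b <= proposed; b++) this.headers.add(b);
  }

  tip(kind: Tip): number {
    return kind === "proposed" ? this.proposed : this.checkpointed;
  }

  // A prune drops every block above the checkpointed tip, as happens when a
  // pipelined slot expires without its checkpoint landing on L1.
  prune(): void {
    for (let b = this.checkpointed + 1; b <= this.proposed; b++) this.headers.delete(b);
    this.proposed = this.checkpointed;
  }

  hasHeader(block: number): boolean {
    return this.headers.has(block);
  }
}

// A tx stays valid only while the header of its anchor block still exists.
function validateAnchor(chain: ToyChain, anchorBlock: number): string {
  return chain.hasHeader(anchorBlock) ? "ok" : "Block header not found";
}

const chain = new ToyChain(18, 19); // assumed: blocks <= 18 checkpointed, 19 only proposed
const anchoredToProposed = chain.tip("proposed"); // 19
const anchoredToCheckpointed = chain.tip("checkpointed"); // 18
chain.prune();
console.log(validateAnchor(chain, anchoredToProposed)); // prints: Block header not found
console.log(validateAnchor(chain, anchoredToCheckpointed)); // prints: ok
```

In this model, a tx anchored to the proposed tip fails after the prune, while one anchored to the checkpointed tip survives it.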
What
Scoped and test-only, per direction from Santiago. The previous approach (making `checkpointed` the global PXE default) is reverted; only `e2e_amm` is opted in.

The PXE option already exists (`yarn-project/pxe/src/config/index.ts`, added in `75df5b5d44`). Every other pipelining-aware test (`e2e_p2p/*`, `e2e_epochs/*`, `e2e_slashing/attested_invalid_proposal`) uses the same approach. It anchors inflight txs to the L1-confirmed tip, so prunes on the proposed tip can't invalidate them. `PIPELINING_SETUP_OPTS` is left untouched; the pipelining migration of `e2e_amm` in #23275 stays.

Recommended follow-up (separate PR)
The real bug is the race in `Sequencer.work`, which is worth fixing properly:

- Add `EpochCache.getCurrentAndTargetSlotInNextL1Slot()`, returning `{slot, targetSlot, epoch, targetEpoch, ts, nowSeconds}` from a single `dateProvider.nowInSeconds()` read, and replace the two-call site in `Sequencer.work`. The pipelining offset is a constant, so deriving `targetSlot = slot + offset` from the same snapshot is trivial.
- … `targetSlot - 1`.
- `waitForValidParentCheckpointOnL1` should key off the parent's expected build slot (`targetSlot - 1`) instead of `slotNow`, so the gate stays robust even if the invariant is broken upstream.

These aren't in this PR because they touch sequencer production code and deserve their own review; the test-side workaround unblocks the merge-train without changing the global PXE default.
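The single-snapshot read and the re-keyed gate from the follow-up list can be sketched together. Slot durations, the genesis timestamp, the `SlotSnapshot` shape, and the `parentBuildSlot` and `gateReady` helpers are all hypothetical; epochs are omitted, and `gateReady` only stands in for the wait inside `waitForValidParentCheckpointOnL1`. This illustrates the idea, not a proposed implementation.

```typescript
// Hypothetical sketch of the follow-up. Constants, names, and signatures are
// assumptions, not the real EpochCache or sequencer API; epochs are omitted.
const L1_SLOT_SECONDS = 4; // assumed
const L2_SLOT_SECONDS = 8; // assumed
const L2_GENESIS = 1778941932; // assumed
const PIPELINING_OFFSET = 1;

function nextL1SlotTs(nowSeconds: number): number {
  return (Math.floor(nowSeconds / L1_SLOT_SECONDS) + 1) * L1_SLOT_SECONDS;
}

function slotAt(ts: number): number {
  return Math.floor((ts - L2_GENESIS) / L2_SLOT_SECONDS);
}

interface SlotSnapshot {
  nowSeconds: number;
  ts: number;
  slot: number;
  targetSlot: number;
}

// Single clock read: every derived field comes from the same `nowSeconds`, so
// targetSlot - slot always equals PIPELINING_OFFSET, whatever a time warp does.
function getCurrentAndTargetSlotInNextL1Slot(nowInSeconds: () => number): SlotSnapshot {
  const nowSeconds = nowInSeconds();
  const ts = nextL1SlotTs(nowSeconds);
  const slot = slotAt(ts);
  return { nowSeconds, ts, slot, targetSlot: slot + PIPELINING_OFFSET };
}

// The slot in which the parent checkpoint is expected to be built.
function parentBuildSlot(snapshot: SlotSnapshot): number {
  return snapshot.targetSlot - 1;
}

// Looking for the parent is only meaningful once the archiver has synced its build slot.
function gateReady(archiverSyncedSlot: number, snapshot: SlotSnapshot): boolean {
  return archiverSyncedSlot >= parentBuildSlot(snapshot);
}

let reads = 0;
const snap = getCurrentAndTargetSlotInNextL1Slot(() => {
  reads++;
  return 1778942079;
});
console.log(snap.slot, snap.targetSlot, reads); // 18 19 1

// A snapshot with the broken invariant from the incident (slot 18, targetSlot 20).
const broken: SlotSnapshot = { nowSeconds: 1778942079, ts: 1778942080, slot: 18, targetSlot: 20 };
console.log(gateReady(18, broken), gateReady(19, broken)); // false true
```

Keyed on `slotNow`, the gate in the incident could proceed as soon as the archiver had synced slot 18; keyed on `targetSlot - 1`, the same broken snapshot keeps it waiting until slot 19 is synced, which is the defence-in-depth the third item asks for.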
Test plan
The failure requires `merge-queue-heavy`'s 10-grind L1 contention to surface reliably (a single dev box can't reproduce it). The change is a single-argument addition and trivial on the TypeScript side.

Analysis: https://gist.github.com/AztecBot/4863d10084dd20587bffcc43fd61dfee
ClaudeBox log: https://claudebox.work/s/166e664eab264b04?run=3