test(e2e): batch 3 of pipelining e2e test migration#23328

Open
spalladino wants to merge 1 commit into merge-train/spartan from palla/pipelining-batch-3

@spalladino spalladino commented May 16, 2026

Motivation

Continues the pipelining e2e migration started in #23275. Several tests previously skipped or held back on suspected pipelining-code bugs are now ready to re-enable after the relevant fixes landed on merge-train/spartan. This batch also sharpens the remaining TODO(kill-non-pipelined) markers against the source bugs that aren't yet fixed on this branch.

Approach

For each skipped/held-back test, work out whether it can pass purely with test-side adjustments under the current source. Test motivation was preserved throughout — assertions over-pinned to specific slot numbers or committee identity were rewritten to assert the actual invariant. Where a test still fails because of a source-level bug that has not landed on this branch, the opt-in was held back and the TODO sharpened with the concrete symptom and code pointer.

Changes

Now passing under pipelining

  • e2e_block_building "clears up all nullifiers if tx processing fails": un-skipped. Switched Promise.race → Promise.any (now that the world-state fork-close fix has landed, the failed tx is correctly dropped from the pool, and race was propagating the rejection of the dropped tx before the surviving tx mined). Replaced getBlockData('latest') (a pipelining anti-pattern, since empty pipelined checkpoints can interleave) with the receipt's blockNumber.
  • e2e_block_building > reorgs > detects an upcoming reorg: un-skipped. Added explicit cheatCodes.rollup.markAsProven() to drive the proven tip forward (the AnvilTestWatcher auto-prove loop is dormant under interval mining — unrelated to pipelining, just a footgun this test exposes), bounded the open-ended while+sleep into a retryUntil, and set minTxsPerBlock: 1 to keep the sequential-tx block-number assertions tight under pipelining's empty-checkpoint cadence.
  • e2e_multi_validator/e2e_multi_validator_node test 2 "should attest ONLY with the correct validator keys": un-skipped. Rewrote the over-pinned expect.arrayContaining(validators[0..2]) assertion. The original assumption was deterministic committee identity, but the committee is RNG-sampled over the active validator set, and lagInEpochsForValidatorSet=2 means initiated-withdraw validators (3, 4) often still appear in the committee at attestation time. New assertion preserves the real motivation — "validators who initiated withdraw don't attest" — by checking that no signer is in the withdrawn set. Also bumped jest.setTimeout(15 * 60 * 1000) so waitForProven has wall-clock budget under pipelining's 12s slot cadence.
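
The Promise.race → Promise.any switch in the first item can be illustrated with a minimal sketch. The helpers `minesAfter` and `droppedAfter` are hypothetical stand-ins for waiting on a tx and are not part of the Aztec test utilities; the sketch also assumes a runtime and `lib` setting that provide ES2021 `Promise.any`.

```typescript
// Hypothetical stand-in: a tx that is mined after `ms` milliseconds.
const minesAfter = (ms: number, hash: string): Promise<string> =>
  new Promise(resolve => setTimeout(() => resolve(hash), ms));

// Hypothetical stand-in: a tx that is dropped from the pool after `ms` milliseconds.
const droppedAfter = (ms: number, hash: string): Promise<string> =>
  new Promise((_, reject) => setTimeout(() => reject(new Error(`${hash} dropped`)), ms));

// Promise.race settles with whichever promise settles first, including a rejection,
// so the early drop of tx-a surfaces before tx-b is mined.
async function firstSettled(): Promise<string> {
  return Promise.race([droppedAfter(10, 'tx-a'), minesAfter(50, 'tx-b')]);
}

// Promise.any skips rejections and resolves with the first fulfilled promise,
// so the surviving tx-b is returned.
async function firstMined(): Promise<string> {
  return Promise.any([droppedAfter(10, 'tx-a'), minesAfter(50, 'tx-b')]);
}
```

Promise.any only rejects (with an AggregateError) when every input promise rejects, which matches a test that expects exactly one of the txs to survive.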

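The rewritten multi-validator assertion ("no signer is in the withdrawn set") amounts to a set-disjointness check. The following is a sketch under assumptions: `signersInWithdrawnSet` is a hypothetical helper, the addresses are illustrative placeholders, and the actual test may extract signers differently.

```typescript
// Hypothetical helper: returns the signers that belong to the withdrawn set.
// Addresses are compared case-insensitively.
function signersInWithdrawnSet(signers: string[], withdrawn: string[]): string[] {
  const withdrawnSet = new Set(withdrawn.map(a => a.toLowerCase()));
  return signers.filter(s => withdrawnSet.has(s.toLowerCase()));
}

// Illustrative placeholder addresses; 0xD4 and 0xE5 stand for the validators
// that initiated a withdraw.
const withdrawn = ['0xD4', '0xE5'];

// Signers observed on an attestation. Order and exact committee membership are
// deliberately not asserted, since the committee is sampled.
const signers = ['0xb2', '0xa1', '0xc3'];

const offenders = signersInWithdrawnSet(signers, withdrawn);
// In a Jest test this would read: expect(offenders).toEqual([]);
```

Asserting on the offenders list rather than on committee identity keeps the test independent of which validators happen to be sampled.
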
Assertion rewrite only, pipelining opt-in held back on a separate source bug

  • composed/ha/e2e_ha_full "should coordinate governance voting across HA nodes": replaced the strict l1VoteCount === uniqueSlots.size invariant (broken by design — HA signing intentionally suppresses duplicate duty signatures across nodes, and under pipelining a vote signed in build slot N mines in target slot N+1, so the equality never holds) with an outcome assertion: poll until signalCount >= VALIDATOR_COUNT for our payload, then assert payloadWithMostSignals matches, plus unconditional duty checks (no (slot, validator) double-signs, every duty SIGNED). Pipelining opt-in held back behind a sharpened TODO because the pipelined HA + governance path hits a canProposeAtTime / InvalidProposer cascade that exhausts the publisher — the fix (which threads lastArchiveRoot into the canProposeAt simulation plan and overrides the pending-tip slot number so canPruneAtTime can't bypass the pending override) is on a separate sequencer-side branch and has not yet been forward-ported to merge-train/spartan.
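
The unconditional duty checks described above (no (slot, validator) double-signs, every duty SIGNED) could be expressed roughly as follows. The `DutyRecord` shape and every status value other than SIGNED are assumptions made for illustration; the real HA signing records may look different.

```typescript
// Hypothetical duty record shape; only the SIGNED status is taken from the PR text.
type DutyStatus = 'SIGNING' | 'SIGNED' | 'FAILED';

interface DutyRecord {
  slot: number;
  validator: string;
  status: DutyStatus;
}

// Returns the "slot:validator" keys that appear more than once (double-signs).
function findDoubleSigns(duties: DutyRecord[]): string[] {
  const counts = new Map<string, number>();
  for (const duty of duties) {
    const key = `${duty.slot}:${duty.validator.toLowerCase()}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([key]) => key);
}

// Returns the duties that did not reach the SIGNED status.
function findUnsigned(duties: DutyRecord[]): DutyRecord[] {
  return duties.filter(duty => duty.status !== 'SIGNED');
}
```

Both helpers return the offending entries rather than a boolean, so a failing assertion shows which slot and validator broke the invariant.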

Still blocked on source bugs — TODOs sharpened, no opt-in

  • e2e_blacklist_token_contract/* (7 suites: burn, minting, shielding, transfer_private, transfer_public, unshielding, access_control): the huge-warp problem itself is now solvable under pipelining (working recipe: call cheatCodes.rollup.markAsProven() before the warp so L1's canPruneAtTime doesn't wipe to checkpoint 0; use the L1-only cheatCodes.eth.warp({ resetBlockInterval: true }) rather than warpL2TimeAtLeastTo; retry aztecNode.mineBlock up to 3 times to absorb the one pre-warp in-flight publish failure). However, every suite hits a separate blocker: under pipelining with inboxLag=2, the first simulate() after the warp queries getL1ToL2Messages(proposedCheckpoint+1) and throws L1ToL2MessagesNotReadyError: inbox tree in progress is N, messages not yet sealed. This is the same simulator + inboxLag mismatch in AztecNodeService.simulatePublicCalls (see aztec-node/src/server.ts + archiver/.../message_store.ts) that's blocking the simulator-heavy tests being handled separately. Sharpened TODO on all 7 files pointing to the recipe (preserved in working notes for when the simulator bug is fixed).
  • e2e_publisher_funding_multi: tried opting into pipelining. First funding round fires correctly (Funded 2 publishers logged at ~T+120s after both publishers' balances are forced below threshold). Second round (waiting for organic depletion to fall below threshold) never triggers: publisher:manager is silent from T+120s through to teardown at T+360s, even though balances objectively dropped below the threshold. Two independent agents reproduced this. Reverted to non-pipelined with a sharpened TODO. Source-level investigation is needed in PublisherManager's RunningPromise cycle or in the L1 balance read path (publisher_manager.ts / l1_tx_utils.ts), which is out of scope for a tests-only PR.
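
The "retry aztecNode.mineBlock up to 3 times" step of the blacklist-token recipe above is a bounded retry. A minimal sketch, assuming a generic hypothetical helper `withRetries` and a hypothetical `makeFlakyMine` stand-in for the mine call (neither is taken from the Aztec codebase):

```typescript
// Hypothetical helper: runs `action` up to `maxAttempts` times, returning the
// first success and rethrowing the last error if every attempt fails.
async function withRetries<T>(action: () => Promise<T>, maxAttempts: number): Promise<T> {
  if (maxAttempts < 1) {
    throw new RangeError('maxAttempts must be at least 1');
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Hypothetical stand-in for a mine call whose first `failures` attempts fail,
// mimicking a single in-flight publish failure. Resolves with the attempt number.
function makeFlakyMine(failures: number): () => Promise<number> {
  let calls = 0;
  return async () => {
    calls++;
    if (calls <= failures) {
      throw new Error(`publish failed on attempt ${calls}`);
    }
    return calls;
  };
}
```

In the recipe this would wrap the mine call with a limit of 3, for example `await withRetries(() => aztecNode.mineBlock(), 3)`, assuming `mineBlock` returns a promise. Keeping the limit small means a persistent failure still surfaces quickly instead of being masked.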

Out-of-scope source bugs surfaced

  1. Sequencer / canProposeAt simulation under pipelined parent invalidation — blocks e2e_contract_updates private-ctor, composed/web3signer/e2e_multi_validator_node_key_store, and the e2e_ha_full pipelining opt-in above. Fix exists on a sibling branch but is not on merge-train/spartan.
  2. AztecNodeService.simulatePublicCalls queries L1→L2 messages from a checkpoint that hasn't been sealed yet — blocks the 7 blacklist suites in this PR plus the simulator-heavy tests being handled separately.
  3. Publisher funder second-round silence — needs investigation of the RunningPromise cycle in PublisherManager.

Enables proposer pipelining on additional e2e tests where source-side B-bug fixes (B1, B5, B6) are now landed on merge-train/spartan, and sharpens TODOs for tests still blocked on the remaining source bugs (B2, B7, publisher-funder).
spalladino added the ci-no-fail-fast label (Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) on May 16, 2026
@AztecBot

Flakey Tests

🤖 says: This CI run detected 1 test that failed, but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/b2f37d4ad57fb5e6): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_high_tps_block_building.test.ts (294s) (code: 0) group:e2e-p2p-epoch-flakes
