test: mark fee_settings.test.ts teardown segfault as flake #23378

Draft
AztecBot wants to merge 1 commit into merge-train/spartan from claudebox/fee-settings-segfault-flake

Why

PR #23344 (merge-train/spartan) was dequeued from the merge queue at 2026-05-18 17:08:03Z after CI run 26047392849 failed on ci/x8-full (1 of 10 merge-queue-heavy grinds). The PR branch CI on the same head (8caa1d336a) passed cleanly.

The only failure was src/e2e_fees/fee_settings.test.ts exiting with code 139 (SIGSEGV) at 355s. Log: http://ci.aztec-labs.com/14142e6c59162a95
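For readers unfamiliar with the convention: a shell reports a process killed by signal N as exit status 128 + N, and SIGSEGV is signal 11 on Linux, hence 139. A minimal sketch of that decoding follows; the `describeExitCode` helper and the signal table are illustrative only and are not part of ci3.

```typescript
// Signal numbers for a few common signals on Linux; deliberately not exhaustive.
const LINUX_SIGNALS: Record<number, string> = {
  6: "SIGABRT",
  9: "SIGKILL",
  11: "SIGSEGV",
  15: "SIGTERM",
};

// Hypothetical helper: shells report "killed by signal N" as exit status 128 + N.
function describeExitCode(code: number): string {
  if (code === 0) return "success";
  const signal = code > 128 ? LINUX_SIGNALS[code - 128] : undefined;
  return signal !== undefined ? `killed by ${signal}` : `exited with status ${code}`;
}

console.log(describeExitCode(139)); // 139 - 128 = 11
```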

What's happening

The segfault occurs in afterAll teardown, not in the test body. From the stack:

  1. Sequencer.stop awaits the in-flight checkpoint L1 submission, which is interrupted (Transaction sending is interrupted, a clean abort signal from "fix: interrupt prover jobs in stop", #23358).
  2. Node fully shuts down: sequencer, slashing, p2p, world-state, archiver.
  3. Only then does prover-node start stopping. Its still-running EpochProvingJob calls into the native world-state DB, which was already shut down in step 2 (log lines: Create fork at 2 / Insert 0 L1 to L2 messages in fork).
  4. Native code logs GET_TREE_INFO failed: Fork not found and segfaults the Jest process.

"fix: interrupt prover jobs in stop" (#23358), already on the train, interrupts prover jobs at stop but doesn't fully serialise prover-node shutdown against in-flight native fork operations, so the segfault intermittently wins the race. This is the same class of teardown-time native crash that the existing e2e_fees/gas_estimation.test.ts flake entry covers, though that one surfaces differently (timeout: sending signal TERM to command 'bash').

Fix

Mark fee_settings.test.ts as a flake only when the error matches the segfault signature (Segmentation fault.*core dumped|code: 139). Real test-body assertion failures still fail CI. Assigned to *alex (PR author, owns the related gas_estimation flake entry).
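To illustrate why the error signature only catches the crash and not ordinary failures, here is a rough sketch of the matching logic. The error regex is the one quoted above; the path pattern, the `isTeardownSegfaultFlake` helper, and the sample log strings are assumptions for illustration, and the real matching is done by ci3, whose implementation may differ.

```typescript
// Error signature quoted in this PR description.
const SEGFAULT_SIGNATURE = /Segmentation fault.*core dumped|code: 139/;

// Assumed path pattern for the flake entry; the actual `regex` value may differ.
const TEST_PATH = /src\/e2e_fees\/fee_settings\.test\.ts/;

// Hypothetical helper: a failure counts as a flake only if BOTH the test path
// and the error output match.
function isTeardownSegfaultFlake(testCmd: string, output: string): boolean {
  return TEST_PATH.test(testCmd) && SEGFAULT_SIGNATURE.test(output);
}

const cmd = "src/e2e_fees/fee_settings.test.ts";
console.log(isTeardownSegfaultFlake(cmd, "Segmentation fault (core dumped)"));
console.log(isTeardownSegfaultFlake(cmd, "expect(received).toBe(expected)"));
```

Note that `.` does not match newlines, so the first alternative only fires when both phrases appear on the same line of output.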

A proper fix is to either (a) cancel and await in-flight epoch-proving jobs before world-state synchronizer stops, or (b) make the native world-state DB return JS errors for Fork not found on a stopped store rather than segfaulting. Out of scope here — left as a follow-up for the prover-node / world-state team.
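As a rough sketch of what options (a) and (b) could look like, the toy model below cancels and awaits the proving job before stopping world-state, and has the store throw a JS error when used after stop. `FakeWorldState`, `FakeProvingJob`, and `shutdownInOrder` are hypothetical stand-ins, not the real Aztec classes or their APIs.

```typescript
// Hypothetical stand-in for the native world-state DB.
class FakeWorldState {
  private open = true;
  readonly log: string[] = [];

  // Option (b): throw a JS error on a stopped store instead of crashing.
  createFork(blockNumber: number): void {
    if (!this.open) throw new Error("Fork not found: store is stopped");
    this.log.push(`fork@${blockNumber}`);
  }

  async stop(): Promise<void> {
    this.open = false;
    this.log.push("world-state stopped");
  }
}

// Hypothetical stand-in for an epoch-proving job that forks once per block.
class FakeProvingJob {
  private cancelled = false;
  private done: Promise<void> = Promise.resolve();
  private readonly worldState: FakeWorldState;

  constructor(worldState: FakeWorldState) {
    this.worldState = worldState;
  }

  start(): void {
    this.done = this.run();
  }

  private async run(): Promise<void> {
    for (let block = 1; block <= 3; block++) {
      if (this.cancelled) return; // check cancellation before touching the store
      this.worldState.createFork(block);
      await Promise.resolve(); // yield, standing in for real async work
    }
  }

  // Option (a): cancel, then wait until the job has actually finished.
  async stop(): Promise<void> {
    this.cancelled = true;
    await this.done;
  }
}

// The job is fully drained before the store it depends on goes away.
async function shutdownInOrder(job: FakeProvingJob, ws: FakeWorldState): Promise<string[]> {
  await job.stop();
  await ws.stop();
  return ws.log;
}
```

The point of the ordering is that no fork call can be issued after the store stops, because the job's promise has settled by then.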

Full analysis: https://gist.github.com/AztecBot/704d54fc69850b1b9ceb1aeaeae64667

Note on local CI

./bootstrap.sh ci not run locally — the change is metadata only (.test_patterns.yml is consumed by ci3/filter_test_cmds and ci3/get_test_entry, no compiled artifact depends on it). YAML validated with yaml.safe_load; both regex and error_regex matched against the actual failure string.

ClaudeBox log: https://claudebox.work/s/16f3aaf1a7b118c7?run=1

AztecBot added the ci-draft (Run CI on draft PRs) and claudebox (Owned by claudebox; it can push to this PR) labels on May 18, 2026.