test: avoid deadlock in cancellation_empty_by by nomeata · Pull Request #13707 · leanprover/lean4

nomeata · 2026-05-11T14:29:00Z

This PR further attempts to avoid deadlocks in the cancellation_empty_by test.

Drop the `dbg_trace "sync received"` marker in `t1`'s body (and the two corresponding lines from `.out.expected`). In the first elaboration, `tracerSuggestion ready` and `sync received` are causally ordered: `tracerSuggestion` must resolve the sync promise before t1's `wait_for_sync` can return. But in the second elaboration (after the runner's insert), the sync promise is already resolved, so t1's `wait_for_sync` returns immediately -- and t1's body (running on its async theorem-body task) races against the empty-by snapshot task's tracerSuggestion (running on its own snapshot task). Both write to stderr with no causal dependency, so the order of "tracerSuggestion ready" vs. "sync received" from the second pass is not deterministic under CPU load. Stress test (6 parallel instances on a 96-core box, sustained CPU saturation) reproduced this ordering swap at ~3% rate on the prior version; passes 300/300 with the dropped marker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Move `dbg_trace "tracerSuggestion ready"` inside the `if let some prom` block so it only fires on the first invocation. On re-elaboration `mkTestTask` returns `none` (the label is already registered) and there is no point printing the "ready" marker again -- only the synchronization side effect (`resolveSyncPromise`, idempotent) is relevant. This eliminates the racing pair from the second elaboration that the previous "harden against scheduling races" commit worked around by deleting `dbg_trace "sync received"`. With the gate in place, `sync received` is now stable to restore as a marker that `t1`'s body actually ran. Also add an `else` branch on the cancel-token check that prints a diagnostic message: if the test is ever wired up so that `tracerSuggestion` sees no cancel token, the promise would be silently dropped and `wait_for_test_task` would report "task dropped without resolution" with no breadcrumb to the actual cause. Note: the second elaboration's `sync received` now competes with the `cancelTokenSet` print from the first elaboration's `onSet` callback (fired during `cancelRec`). Their order in `.out.expected` reflects the empirical ordering caused by snapshot-task stderr capture: the snapshot task buffers `tracerSuggestion`'s `dbg_trace` via `withIsolatedStreams` and only flushes after the snapshot completes, while the `onSet` callback runs outside that capture and reaches stderr immediately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI has reported `Timeout 1500s` failures for this test with no captured stderr output at all (not even the `dbg_trace` markers). The failure has not been reproduced locally under heavy contention (LEAN_NUM_THREADS=1, CPU-pinned, 8-way parallel, hundreds of runs), so the root cause is still unknown. Two changes to help triage future failures: 1. Add `source_init "$1"` to `tests/server_interactive/run_test.sh` so individual tests can carry an `.init.sh` (matches the convention in `tests/elab/run_test.sh` and friends). This is infrastructure used by the second change. 2. Add `cancellation_empty_by.lean.init.sh` setting `LEAN_NUM_THREADS=2`. Per the runtime in `src/runtime/object.cpp:wait_for`, `IO.wait` will spawn additional workers when called from a pool thread, so thread starvation isn't strictly possible -- but lower thread count shrinks the surface for unobserved scheduling weirdness. If CI starts passing, the cause is in the parallel scheduling. If CI keeps timing out, the cause is elsewhere. 3. Add two `#eval (IO.eprintln ...)` checkpoint markers in the test file: `test: imports done` after the imports, and `test: empty-by command returned` after the empty-by example. These are synchronous command-level prints, independent of the snapshot task and t1's async body, so they tell us "did the file worker process this command at all?" even if the dynamic parts hang. Document the observed-but-non-causal output ordering in the test docstring (`cancelTokenSet` appears before `tracerSuggestion ready` in the captured log; likely an interaction between buffered stderr writes from the snapshot worker thread and direct writes from the thread calling `cancelTk.set`). The order is stable across 300+ local runs and we encode it as-is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…on_empty_by" This reverts commit 77c5333.

mathlib-lean-pr-testing · 2026-05-11T15:54:03Z

Mathlib CI status (docs):

❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 48ad8401cd5624c944890893a38057ae4920e8ef --onto a71f158f7bd96ff9ea846f7ff4cc658de3c8b0f9. You can force Mathlib CI using the force-mathlib-ci label. (2026-05-11 15:54:02)
❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase f52a18a9472f40a864a78d3c8f8cf202751f576b --onto 2229b077d6ebd2ff42d665fe1c32501df8915dff. You can force Mathlib CI using the force-mathlib-ci label. (2026-05-11 20:34:12)

leanprover-bot · 2026-05-11T15:54:05Z

Reference manual CI status:

❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 48ad8401cd5624c944890893a38057ae4920e8ef --onto a71f158f7bd96ff9ea846f7ff4cc658de3c8b0f9. You can force reference manual CI using the force-manual-ci label. (2026-05-11 15:54:04)
❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase f52a18a9472f40a864a78d3c8f8cf202751f576b --onto a71f158f7bd96ff9ea846f7ff4cc658de3c8b0f9. You can force reference manual CI using the force-manual-ci label. (2026-05-11 20:34:14)

This PR adds an option `tactic.try.onlyUserSuggestions` (default `false`). When set, `try?` skips all its built-in suggestion branches (`simple`, `simp`, `grind`, `simp_all`, `fun_induction`/`induction`, and `exact?`) and only runs the tactics produced by user-registered `@[try_suggestion]` generators. The motivation is testing: the cancellation/empty-by snapshot tests only want to exercise the `try?` machinery on a specific tracer suggestion, and don't want to pay the library-search cost. With the option set, the test path is dramatically faster (`3.83s → 1.50s` locally) and uses much less memory, making it practical to stress. The user-tactic dispatch in the new path uses `first | ...`, matching the convention for trying tactics in order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This PR makes `Lean.Lsp.Ipc.readMessage` throw when the watchdog reports its file worker has crashed (`$/lean/fileProgress` with `fatalError` kind), so that test-runner read loops with no pending request -- in particular `waitForMessage` -- don't sit forever waiting for a message a dead worker will never produce. `collectDiagnostics` and `waitForILeans` already exit on crashes because the watchdog also sends a `responseError` to their pending `waitForDiagnostics` / `waitForILeans` requests. `waitForMessage`, which doesn't issue a request, was silently discarding the `fatalError` notification and blocking until the test framework's outer timeout killed the whole process. With this change, a forced-crash test (`IO.Process.forceExit 9` from inside an `elab` command, then a `--^ waitFor: "..."` directive) returns in ~1 second with `uncaught exception: Lean file worker reported a fatal error (likely crashed)` instead of timing out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nostic) Adds a debug-only env-driven guard in `task_manager::wait_for`: if `LEAN_DEBUG_MAX_WORKERS=N` is set and `m_std_workers.size()` exceeds `N`, the process writes a one-line diagnostic to fd 2 (pid, pool counts) and exits via `_exit(123)`. This was useful for tracking down stress-test memory blowups in the cancellation_empty_by test: the lean task pool ratchets under wait pressure (`m_max_std_workers++` on each in-pool `wait_for`, with `spawn_worker()` if no idle worker is available), each new worker holds its own mimalloc arena, and process RSS climbs accordingly -- which `-M` then trips. The guard makes the symptom reproducible at a specific threshold. Uses raw `write(2, ...)` and `_exit` rather than `std::cerr` / `std::abort` / `print_backtrace`, because the check is inside `m_mutex`; the usual print/abort paths can deadlock or hang flushing buffered IO while other task workers are blocked on the same mutex. Not wired up by default and a no-op without the env var; intentionally not a long-term feature. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds diagnostic anchors to make it possible to tell, from a failed `.out.produced`, which stage of the test the file worker reached: - `#eval` of `IO.eprintln "test: imports done"` after imports (sync, fires once, deterministically first). - `#eval` markers before and after the empty-`by` example. - `dbg_trace` at the entry of `tracerSuggestion` and right before its return. - `dbg_trace` at the entry of `t1`'s tactic body. Also flips `set_option tactic.try.onlyUserSuggestions true` so the snapshot task's `try?` skips its library-search branches (paired with the corresponding feature commit). Without it the test path spends most of its time in `exact?`/`grind`/`simp_all`, none of which the test exercises. The expected output is updated to match the new traces. The order between async-task and command-level prints is empirical; this is a test of the snapshot/cancellation plumbing, not of trace ordering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This PR makes `Lean.Lsp.Ipc.readMessage` throw when the watchdog reports its file worker has crashed (`$/lean/fileProgress` with `fatalError` kind), so that test-runner read loops with no pending request -- in particular `waitForMessage` -- don't sit forever waiting for a message a dead worker will never produce. `collectDiagnostics` and `waitForILeans` already exit on crashes because the watchdog also sends a `responseError` to their pending `waitForDiagnostics` / `waitForILeans` requests. `waitForMessage`, which doesn't issue a request, was silently discarding the `fatalError` notification and blocking until the test framework's outer timeout killed the whole process. With this change, a forced-crash test (`IO.Process.forceExit 9` from inside an `elab` command, then a `--^ waitFor: "..."` directive) returns in ~1 second with `uncaught exception: Lean file worker reported a fatal error (likely crashed)` instead of timing out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This PR follows up on the previous IPC fatalError-abort fix to address a false-positive trigger and adds a regression test. The previous version checked `$/lean/fileProgress` with `fatalError` kind inside `Ipc.readMessage`, which fires for *every* IPC consumer. That breaks tests like `importCompletion.lean`, which intentionally elaborates a file with an incomplete `import` line; the worker stays alive but reports `fileProgress fatalError` because header processing failed, and the test then uses the still-alive worker to obtain completion items. The blanket abort prevented the completions from ever being read. Move the check into `waitForMessage` itself, where it actually matters: that loop has no pending request, so the watchdog's `responseError` (which `collectDiagnostics` / `waitForILeans` already handle) doesn't help, and without this check it would block until the outer test timeout. All other IPC paths are unaffected. Also wire up `TEST_EXIT` in `tests/server_interactive/run_test.sh` so individual tests can declare expected exit codes via `<file>.init.sh`, matching the convention in `tests/compile/`. The new `worker_crash` test exercises the abort path: `IO.Process.forceExit` from inside an `elab` command, a `waitFor` directive on the line after, `TEST_EXIT=1`, and an `.out.expected` matching the `uncaught exception: waitForMessage: ...` output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This PR moves `waitForMessage` out of `Lean.Lsp.Ipc` and into `Lean.Server.Test.Runner`, its only caller. The function exists to implement the `waitFor` test directive and is not a general-purpose IPC primitive (it discards all received messages other than the one it's waiting for, which is an actively unsafe pattern outside of a test driver). Per Sebastian's review of the prior commits, keeping it inside `Ipc` was the wrong layer. No behavior change: the fatalError-abort check added in the previous commit moves along with the function. The new home is right next to `processWaitFor`, which is the sole caller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…achim/empty-by-test2

This PR restores the `cancellation_empty_by.lean` and `cancellation_par.lean` server-interactive tests that were deleted in #13711 ("delete flaky tests for now"). With #13710 in place, a hung worker now surfaces as a prompt `waitForMessage` abort rather than a 1500s CTest timeout, so the tests are tractable to keep enabled. Trim diagnostic tracing in `cancellation_empty_by.lean` to the minimum that's actually causally ordered against the test's synchronisation points: * `test: imports done` -- synchronous `#eval` at the top of the file, fires once before any async task is spawned. * `tracerSuggestion ready` -- gated to the first `tracerSuggestion` invocation via `mkTestTask`, fires exactly once. * `cancelTokenSet` -- inside the `cancelTk.onSet` callback, fires exactly once when `cancelRec` reaches the snapshot. * `sync received` (x2) -- in `t1`'s body after `wait_for_sync` returns, once per elaboration. Removed traces that were either every-invocation (and therefore raced non-deterministically with the snapshot task's other output) or that fired at command-elab boundaries where their position relative to async stderr buffers was unstable: `tracerSuggestion: entered`, `tracerSuggestion: returning candidate`, `t1: body entered`, `test: before/after empty-by example`, `test: file end`. Also drop the `LEAN_DEBUG_MAX_WORKERS` runtime hack added during investigation: it served its purpose (confirming the wait-pressure-driven worker pool ratchet in `task_manager::wait_for`) but is not appropriate as a long-lived debug knob in the runtime. Stress: 24/24 parallel runs pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nomeata and others added 3 commits May 11, 2026 13:34

nomeata added the merge-ci Enable merge queue CI checks for PR. In particular, produce artifacts for all major platforms. label May 11, 2026

Revert "test: add CI tracing and low-parallelism check for cancellati…

3583a16

…on_empty_by" This reverts commit 77c5333.

github-actions Bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label May 11, 2026

nomeata and others added 11 commits May 11, 2026 19:13

Merge branch 'joachim/ipc-fatal-error-abort' into joachim/empty-by-test2

d09970a

Merge branch 'joachim/ipc-fatal-error-abort' into joachim/empty-by-test2

2dcb669

Merge branch 'nightly' of https://github.com/leanprover/lean4 into jo…

7cd9ea2

…achim/empty-by-test2

nomeata added release-ci Enable all CI checks for a PR, like is done for releases and removed merge-ci Enable merge queue CI checks for PR. In particular, produce artifacts for all major platforms. labels May 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: avoid deadlock in cancellation_empty_by#13707

test: avoid deadlock in cancellation_empty_by#13707
nomeata wants to merge 15 commits into
masterfrom
joachim/empty-by-test2

nomeata commented May 11, 2026

Uh oh!

mathlib-lean-pr-testing Bot commented May 11, 2026 •

edited

Loading

Uh oh!

leanprover-bot commented May 11, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nomeata commented May 11, 2026

Uh oh!

mathlib-lean-pr-testing Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leanprover-bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mathlib-lean-pr-testing Bot commented May 11, 2026 •

edited

Loading

leanprover-bot commented May 11, 2026 •

edited

Loading