test: avoid deadlock in cancellation_empty_by#13707
Draft
nomeata wants to merge 15 commits into
Draft
Conversation
Drop the `dbg_trace "sync received"` marker in `t1`'s body (and the two corresponding lines from `.out.expected`). In the first elaboration, `tracerSuggestion ready` and `sync received` are causally ordered: `tracerSuggestion` must resolve the sync promise before t1's `wait_for_sync` can return. But in the second elaboration (after the runner's insert), the sync promise is already resolved, so t1's `wait_for_sync` returns immediately -- and t1's body (running on its async theorem-body task) races against the empty-by snapshot task's tracerSuggestion (running on its own snapshot task). Both write to stderr with no causal dependency, so the order of "tracerSuggestion ready" vs. "sync received" from the second pass is not deterministic under CPU load. Stress test (6 parallel instances on a 96-core box, sustained CPU saturation) reproduced this ordering swap at ~3% rate on the prior version; passes 300/300 with the dropped marker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move `dbg_trace "tracerSuggestion ready"` inside the `if let some prom` block so it only fires on the first invocation. On re-elaboration `mkTestTask` returns `none` (the label is already registered) and there is no point printing the "ready" marker again -- only the synchronization side effect (`resolveSyncPromise`, idempotent) is relevant. This eliminates the racing pair from the second elaboration that the previous "harden against scheduling races" commit worked around by deleting `dbg_trace "sync received"`. With the gate in place, `sync received` is now stable to restore as a marker that `t1`'s body actually ran. Also add an `else` branch on the cancel-token check that prints a diagnostic message: if the test is ever wired up so that `tracerSuggestion` sees no cancel token, the promise would be silently dropped and `wait_for_test_task` would report "task dropped without resolution" with no breadcrumb to the actual cause. Note: the second elaboration's `sync received` now competes with the `cancelTokenSet` print from the first elaboration's `onSet` callback (fired during `cancelRec`). Their order in `.out.expected` reflects the empirical ordering caused by snapshot-task stderr capture: the snapshot task buffers `tracerSuggestion`'s `dbg_trace` via `withIsolatedStreams` and only flushes after the snapshot completes, while the `onSet` callback runs outside that capture and reaches stderr immediately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI has reported `Timeout 1500s` failures for this test with no captured stderr output at all (not even the `dbg_trace` markers). The failure has not been reproduced locally under heavy contention (LEAN_NUM_THREADS=1, CPU-pinned, 8-way parallel, hundreds of runs), so the root cause is still unknown. Two changes to help triage future failures: 1. Add `source_init "$1"` to `tests/server_interactive/run_test.sh` so individual tests can carry an `.init.sh` (matches the convention in `tests/elab/run_test.sh` and friends). This is infrastructure used by the second change. 2. Add `cancellation_empty_by.lean.init.sh` setting `LEAN_NUM_THREADS=2`. Per the runtime in `src/runtime/object.cpp:wait_for`, `IO.wait` will spawn additional workers when called from a pool thread, so thread starvation isn't strictly possible -- but lower thread count shrinks the surface for unobserved scheduling weirdness. If CI starts passing, the cause is in the parallel scheduling. If CI keeps timing out, the cause is elsewhere. 3. Add two `#eval (IO.eprintln ...)` checkpoint markers in the test file: `test: imports done` after the imports, and `test: empty-by command returned` after the empty-by example. These are synchronous command-level prints, independent of the snapshot task and t1's async body, so they tell us "did the file worker process this command at all?" even if the dynamic parts hang. Document the observed-but-non-causal output ordering in the test docstring (`cancelTokenSet` appears before `tracerSuggestion ready` in the captured log; likely an interaction between buffered stderr writes from the snapshot worker thread and direct writes from the thread calling `cancelTk.set`). The order is stable across 300+ local runs and we encode it as-is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on_empty_by" This reverts commit 77c5333.
|
Mathlib CI status (docs):
|
Collaborator
|
Reference manual CI status:
|
This PR adds an option `tactic.try.onlyUserSuggestions` (default `false`). When set, `try?` skips all its built-in suggestion branches (`simple`, `simp`, `grind`, `simp_all`, `fun_induction`/`induction`, and `exact?`) and only runs the tactics produced by user-registered `@[try_suggestion]` generators. The motivation is testing: the cancellation/empty-by snapshot tests only want to exercise the `try?` machinery on a specific tracer suggestion, and don't want to pay the library-search cost. With the option set, the test path is dramatically faster (`3.83s → 1.50s` locally) and uses much less memory, making it practical to stress. The user-tactic dispatch in the new path uses `first | ...`, matching the convention for trying tactics in order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR makes `Lean.Lsp.Ipc.readMessage` throw when the watchdog reports its file worker has crashed (`$/lean/fileProgress` with `fatalError` kind), so that test-runner read loops with no pending request -- in particular `waitForMessage` -- don't sit forever waiting for a message a dead worker will never produce. `collectDiagnostics` and `waitForILeans` already exit on crashes because the watchdog also sends a `responseError` to their pending `waitForDiagnostics` / `waitForILeans` requests. `waitForMessage`, which doesn't issue a request, was silently discarding the `fatalError` notification and blocking until the test framework's outer timeout killed the whole process. With this change, a forced-crash test (`IO.Process.forceExit 9` from inside an `elab` command, then a `--^ waitFor: "..."` directive) returns in ~1 second with `uncaught exception: Lean file worker reported a fatal error (likely crashed)` instead of timing out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nostic) Adds a debug-only env-driven guard in `task_manager::wait_for`: if `LEAN_DEBUG_MAX_WORKERS=N` is set and `m_std_workers.size()` exceeds `N`, the process writes a one-line diagnostic to fd 2 (pid, pool counts) and exits via `_exit(123)`. This was useful for tracking down stress-test memory blowups in the cancellation_empty_by test: the lean task pool ratchets under wait pressure (`m_max_std_workers++` on each in-pool `wait_for`, with `spawn_worker()` if no idle worker is available), each new worker holds its own mimalloc arena, and process RSS climbs accordingly -- which `-M` then trips. The guard makes the symptom reproducible at a specific threshold. Uses raw `write(2, ...)` and `_exit` rather than `std::cerr` / `std::abort` / `print_backtrace`, because the check is inside `m_mutex`; the usual print/abort paths can deadlock or hang flushing buffered IO while other task workers are blocked on the same mutex. Not wired up by default and a no-op without the env var; intentionally not a long-term feature. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds diagnostic anchors to make it possible to tell, from a failed `.out.produced`, which stage of the test the file worker reached: - `#eval` of `IO.eprintln "test: imports done"` after imports (sync, fires once, deterministically first). - `#eval` markers before and after the empty-`by` example. - `dbg_trace` at the entry of `tracerSuggestion` and right before its return. - `dbg_trace` at the entry of `t1`'s tactic body. Also flips `set_option tactic.try.onlyUserSuggestions true` so the snapshot task's `try?` skips its library-search branches (paired with the corresponding feature commit). Without it the test path spends most of its time in `exact?`/`grind`/`simp_all`, none of which the test exercises. The expected output is updated to match the new traces. The order between async-task and command-level prints is empirical; this is a test of the snapshot/cancellation plumbing, not of trace ordering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR makes `Lean.Lsp.Ipc.readMessage` throw when the watchdog reports its file worker has crashed (`$/lean/fileProgress` with `fatalError` kind), so that test-runner read loops with no pending request -- in particular `waitForMessage` -- don't sit forever waiting for a message a dead worker will never produce. `collectDiagnostics` and `waitForILeans` already exit on crashes because the watchdog also sends a `responseError` to their pending `waitForDiagnostics` / `waitForILeans` requests. `waitForMessage`, which doesn't issue a request, was silently discarding the `fatalError` notification and blocking until the test framework's outer timeout killed the whole process. With this change, a forced-crash test (`IO.Process.forceExit 9` from inside an `elab` command, then a `--^ waitFor: "..."` directive) returns in ~1 second with `uncaught exception: Lean file worker reported a fatal error (likely crashed)` instead of timing out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR follows up on the previous IPC fatalError-abort fix to address a false-positive trigger and adds a regression test. The previous version checked `$/lean/fileProgress` with `fatalError` kind inside `Ipc.readMessage`, which fires for *every* IPC consumer. That breaks tests like `importCompletion.lean`, which intentionally elaborates a file with an incomplete `import` line; the worker stays alive but reports `fileProgress fatalError` because header processing failed, and the test then uses the still-alive worker to obtain completion items. The blanket abort prevented the completions from ever being read. Move the check into `waitForMessage` itself, where it actually matters: that loop has no pending request, so the watchdog's `responseError` (which `collectDiagnostics` / `waitForILeans` already handle) doesn't help, and without this check it would block until the outer test timeout. All other IPC paths are unaffected. Also wire up `TEST_EXIT` in `tests/server_interactive/run_test.sh` so individual tests can declare expected exit codes via `<file>.init.sh`, matching the convention in `tests/compile/`. The new `worker_crash` test exercises the abort path: `IO.Process.forceExit` from inside an `elab` command, a `waitFor` directive on the line after, `TEST_EXIT=1`, and an `.out.expected` matching the `uncaught exception: waitForMessage: ...` output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR moves `waitForMessage` out of `Lean.Lsp.Ipc` and into `Lean.Server.Test.Runner`, its only caller. The function exists to implement the `waitFor` test directive and is not a general-purpose IPC primitive (it discards all received messages other than the one it's waiting for, which is an actively unsafe pattern outside of a test driver). Per Sebastian's review of the prior commits, keeping it inside `Ipc` was the wrong layer. No behavior change: the fatalError-abort check added in the previous commit moves along with the function. The new home is right next to `processWaitFor`, which is the sole caller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…achim/empty-by-test2
This PR restores the `cancellation_empty_by.lean` and `cancellation_par.lean` server-interactive tests that were deleted in #13711 ("delete flaky tests for now"). With #13710 in place, a hung worker now surfaces as a prompt `waitForMessage` abort rather than a 1500s CTest timeout, so the tests are tractable to keep enabled. Trim diagnostic tracing in `cancellation_empty_by.lean` to the minimum that's actually causally ordered against the test's synchronisation points: * `test: imports done` -- synchronous `#eval` at the top of the file, fires once before any async task is spawned. * `tracerSuggestion ready` -- gated to the first `tracerSuggestion` invocation via `mkTestTask`, fires exactly once. * `cancelTokenSet` -- inside the `cancelTk.onSet` callback, fires exactly once when `cancelRec` reaches the snapshot. * `sync received` (x2) -- in `t1`'s body after `wait_for_sync` returns, once per elaboration. Removed traces that were either every-invocation (and therefore raced non-deterministically with the snapshot task's other output) or that fired at command-elab boundaries where their position relative to async stderr buffers was unstable: `tracerSuggestion: entered`, `tracerSuggestion: returning candidate`, `t1: body entered`, `test: before/after empty-by example`, `test: file end`. Also drop the `LEAN_DEBUG_MAX_WORKERS` runtime hack added during investigation: it served its purpose (confirming the wait-pressure-driven worker pool ratchet in `task_manager::wait_for`) but is not appropriate as a long-lived debug knob in the runtime. Stress: 24/24 parallel runs pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR further attempts to avoid deadlocks in the
cancellation_empty_bytest.