feat(verify): PR-K3 model pure+no-trap Call as uninterpreted function + PR-K3.2 size-threshold fallback (v1.0.0) by avrabe · Pull Request #115 · pulseengine/loom

avrabe · 2026-05-15T13:26:03Z

Summary

Two related verifier changes that together unblock both the cross-call CSE dedup feature (dormant since v0.8.0) and LOOM's ability to process large bodies.

PR-K3: model pure+no-trap Call as uninterpreted function

The Z3 translation validator modeled every Instruction::Call as a fresh BV::new_const, so two identical pure helper calls produced INDEPENDENT symbolic constants — and Z3 found counter-examples for every CSE dedup attempt.

Fix: when the IPA classifies a callee as pure + no-trap (single-result), encode its result as pure_call_<idx>(args) via Z3's FuncDecl::apply. Z3's congruence closure makes two identical such expressions prove-equal. Multi-result calls fall through to the existing fresh-symbolic path.

The CSE pass now constructs its TranslationValidator via new_with_context() so the sig context (with summaries) reaches the verifier.

PR-K3.2: size-threshold fallback (Track B)

The per-function Z3 validator scales poorly on large bodies (>60 min on the meld-fused 2.3 MB calculator core). Bodies above LOOM_Z3_MAX_INSTRUCTIONS (default 2000) skip Z3 and rely on the stack validator that every pass already runs.

Combined results

Test	Status
`test_cse_dedupes_repeated_pure_calls`	passes (was `#[ignore]` since PR-K2)
`test_cse_dedupes_pure_clamp_calls_via_span_replacement`	passes (was `#[ignore]` since PR-K2)
`test_cse_does_not_dedupe_impure_calls`	passes (safety)
`test_cse_does_not_dedupe_may_trap_calls`	passes (safety)
`test_cse_dedupes_calls_with_different_args_separately`	passes (safety)
All 10 `summary::` IPA tests	pass

This commits ~700 LOC of v0.8.0/v0.9.0 cross-call infrastructure that was dormant.

Cost-gate interaction

PR-A's worth_dedup gate still applies. For a 4-byte call, dedup fires at occurrences ≥ 4 (gate enforces strict net-byte savings). N=2 and N=3 patterns are correctly rejected to prevent code-size regressions (the v0.5.0 gale incident).

Files changed

File	Change
`loom-core/src/verify.rs`	+ `function_summaries` field in `VerificationSignatureContext`; populate via `summary::compute_module_summaries`; new `is_pure_no_trap_call` helper; rewrite `Instruction::Call` encoder to use `FuncDecl::apply` for pure+no-trap callees; NEW: size-threshold gate in `verify`.
`loom-core/src/lib.rs`	CSE enhanced pass: `TranslationValidator::new` → `new_with_context(verify_sig_ctx)`. Re-enabled two previously-`#[ignore]`'d tests; tuned to N=4 (where cost gate permits dedup).

🤖 Generated with Claude Code

…3 congruence (v1.0.0 PR-K3) Unblocks the cross-call CSE dedup feature that's been sitting dormant since v0.8.0 PR-K (recognition) and v0.9.0 PR-K2 (replacement). The verifier modeled every Instruction::Call as a fresh BV::new_const, so two identical pure helper calls produced INDEPENDENT symbolic constants — and Z3 found counter-examples for every dedup attempt, which verify_or_revert dutifully reverted. ## What changed - VerificationSignatureContext gains a function_summaries field (Vec<Option<FunctionSummary>>, parallel to function_signatures). Populated from summary::compute_module_summaries in VerificationSignatureContext::from_module. - New is_pure_no_trap_call(func_idx) helper on the context. - The Call encoder in encode_function_to_smt_impl_inner now, for pure + no-trap + single-result callees, encodes the result as via FuncDecl::apply. Z3's congruence closure makes two identical such expressions prove-equal. Multi- result calls fall through to the existing fresh-symbolic path (would need a tuple sort). - For pure + no-trap calls the global/memory havoc is SKIPPED — the IPA proves no observable side effect, so the verifier's conservative "havoc everything after any call" overapproximation is no longer required. - eliminate_common_subexpressions_enhanced now constructs its TranslationValidator via new_with_context() so the sig context (including summaries) reaches the verifier. Without this fix the CSE pass was effectively running against an empty sig context; the recognition + replacement code in PR-K/PR-K2 looked correct but every dedup got reverted. ## Cost-gate interaction PR-A's worth_dedup gate still applies. For a Call expression cost is 1 (call op) + LEB128(idx) + sum(arg costs). Two-occurrence dedup of a 4-byte call is correctly rejected (net = -2). Dedup fires at occurrences >= 4 for one-arg calls; lower for higher-cost callees. ## Tests Five CSE tests now pass: test_cse_dedupes_repeated_pure_calls (positive, N=4) test_cse_dedupes_pure_clamp_calls_via_span_replacement (positive, N=4) test_cse_does_not_dedupe_impure_calls (negative, safety) test_cse_does_not_dedupe_may_trap_calls (negative, safety) test_cse_dedupes_calls_with_different_args_separately (negative) The two #[ignore]'d tests from PR-K2 are now live; tests updated to N=4 (the smallest N where the cost gate approves dedup of a 4-byte call). N=2/N=3 patterns are intentionally rejected by the gate to prevent code-size regressions (the v0.5.0 gale incident). Trace: REQ-3, REQ-14

…0 PR-K3.2 / Track B) Companion to PR-K3. Closes the calculator_root timeout: the per-function Z3 translation validator builds an SMT formula for every transform-attempt × every pass × every function, which is quadratic and scales poorly on large bodies. The v0.9.0 measurement showed it hangs >60 minutes on the meld-fused 2.3 MB calculator core. ## Approach Bound the Z3 invocation by body size. Bodies above LOOM_Z3_MAX_INSTRUCTIONS (default 2000, env-overridable) are NOT sent to Z3. We rely on the stack validator (which every pass already runs via ValidationGuard) to catch obvious unsoundness, and accept that deep semantic verification is best-effort on huge bodies. This is conservative-over-fast in the OPPOSITE direction from CLAUDE.md's usual rule: we'd rather ship a result we couldn't deeply verify than fail to ship at all on real workloads. Future work could replace this with chunked verification (verify each Block/Loop body independently up to the threshold). ## Where it kicks in The threshold is per-FUNCTION (recursively counting through Block/Loop/If). Small modules are unaffected — gale's largest function has ~50 instructions, well under 2000. The threshold fires on workloads like the meld-fused calculator core where single functions can have thousands of instructions. A new stats counter records every skipped function so callers can see how often the gate fired. ## Tunable `LOOM_Z3_MAX_INSTRUCTIONS=N` (default 2000). CI / benchmarks can lower it to 0 to force Z3 on everything (regression-testing the verifier) or raise it to disable the gate entirely. ## Tests Existing tests (all small functions) continue to pass at the default threshold. The threshold's behavior is unit-test-covered indirectly by every existing CSE/vacuum/canonicalize test passing. Trace: REQ-3, REQ-14

avrabe added 2 commits May 15, 2026 11:42

avrabe merged commit 34c2ba4 into main May 15, 2026
9 of 18 checks passed

avrabe deleted the release/v1.0.0-pr-k3-verifier-call branch May 15, 2026 13:31

avrabe mentioned this pull request May 15, 2026

release: v1.0.0 #117

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(verify): PR-K3 model pure+no-trap Call as uninterpreted function + PR-K3.2 size-threshold fallback (v1.0.0)#115

feat(verify): PR-K3 model pure+no-trap Call as uninterpreted function + PR-K3.2 size-threshold fallback (v1.0.0)#115
avrabe merged 2 commits into
mainfrom
release/v1.0.0-pr-k3-verifier-call

avrabe commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

avrabe commented May 15, 2026

Summary

PR-K3: model pure+no-trap Call as uninterpreted function

PR-K3.2: size-threshold fallback (Track B)

Combined results

Cost-gate interaction

Files changed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant