Skip to content

feat(verify): PR-K3 model pure+no-trap Call as uninterpreted function + PR-K3.2 size-threshold fallback (v1.0.0)#115

Merged
avrabe merged 2 commits into
mainfrom
release/v1.0.0-pr-k3-verifier-call
May 15, 2026
Merged

feat(verify): PR-K3 model pure+no-trap Call as uninterpreted function + PR-K3.2 size-threshold fallback (v1.0.0)#115
avrabe merged 2 commits into
mainfrom
release/v1.0.0-pr-k3-verifier-call

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented May 15, 2026

Summary

Two related verifier changes that together unblock both the cross-call CSE dedup feature (dormant since v0.8.0) and LOOM's ability to process large bodies.

PR-K3: model pure+no-trap Call as uninterpreted function

The Z3 translation validator modeled every Instruction::Call as a fresh BV::new_const, so two identical pure helper calls produced INDEPENDENT symbolic constants — and Z3 found counter-examples for every CSE dedup attempt.

Fix: when the IPA classifies a callee as pure + no-trap (single-result), encode its result as pure_call_<idx>(args) via Z3's FuncDecl::apply. Z3's congruence closure makes two identical such expressions prove-equal. Multi-result calls fall through to the existing fresh-symbolic path.

The CSE pass now constructs its TranslationValidator via new_with_context() so the sig context (with summaries) reaches the verifier.

PR-K3.2: size-threshold fallback (Track B)

The per-function Z3 validator scales poorly on large bodies (>60 min on the meld-fused 2.3 MB calculator core). Bodies above LOOM_Z3_MAX_INSTRUCTIONS (default 2000) skip Z3 and rely on the stack validator that every pass already runs.

Combined results

Test Status
test_cse_dedupes_repeated_pure_calls passes (was #[ignore] since PR-K2)
test_cse_dedupes_pure_clamp_calls_via_span_replacement passes (was #[ignore] since PR-K2)
test_cse_does_not_dedupe_impure_calls passes (safety)
test_cse_does_not_dedupe_may_trap_calls passes (safety)
test_cse_dedupes_calls_with_different_args_separately passes (safety)
All 10 summary:: IPA tests pass

This commits ~700 LOC of v0.8.0/v0.9.0 cross-call infrastructure that was dormant.

Cost-gate interaction

PR-A's worth_dedup gate still applies. For a 4-byte call, dedup fires at occurrences ≥ 4 (gate enforces strict net-byte savings). N=2 and N=3 patterns are correctly rejected to prevent code-size regressions (the v0.5.0 gale incident).

Files changed

File Change
loom-core/src/verify.rs + function_summaries field in VerificationSignatureContext; populate via summary::compute_module_summaries; new is_pure_no_trap_call helper; rewrite Instruction::Call encoder to use FuncDecl::apply for pure+no-trap callees; NEW: size-threshold gate in verify.
loom-core/src/lib.rs CSE enhanced pass: TranslationValidator::newnew_with_context(verify_sig_ctx). Re-enabled two previously-#[ignore]'d tests; tuned to N=4 (where cost gate permits dedup).

🤖 Generated with Claude Code

avrabe added 2 commits May 15, 2026 11:42
…3 congruence (v1.0.0 PR-K3)

Unblocks the cross-call CSE dedup feature that's been sitting dormant
since v0.8.0 PR-K (recognition) and v0.9.0 PR-K2 (replacement). The
verifier modeled every Instruction::Call as a fresh BV::new_const, so
two identical pure helper calls produced INDEPENDENT symbolic
constants — and Z3 found counter-examples for every dedup attempt,
which verify_or_revert dutifully reverted.

## What changed

- VerificationSignatureContext gains a function_summaries field
  (Vec<Option<FunctionSummary>>, parallel to function_signatures).
  Populated from summary::compute_module_summaries in
  VerificationSignatureContext::from_module.

- New is_pure_no_trap_call(func_idx) helper on the context.

- The Call encoder in encode_function_to_smt_impl_inner now, for
  pure + no-trap + single-result callees, encodes the result as
   via FuncDecl::apply. Z3's congruence
  closure makes two identical such expressions prove-equal. Multi-
  result calls fall through to the existing fresh-symbolic path
  (would need a tuple sort).

- For pure + no-trap calls the global/memory havoc is SKIPPED —
  the IPA proves no observable side effect, so the verifier's
  conservative "havoc everything after any call" overapproximation
  is no longer required.

- eliminate_common_subexpressions_enhanced now constructs its
  TranslationValidator via new_with_context() so the sig context
  (including summaries) reaches the verifier. Without this fix the
  CSE pass was effectively running against an empty sig context;
  the recognition + replacement code in PR-K/PR-K2 looked correct
  but every dedup got reverted.

## Cost-gate interaction

PR-A's worth_dedup gate still applies. For a Call expression cost is
1 (call op) + LEB128(idx) + sum(arg costs). Two-occurrence dedup of
a 4-byte call is correctly rejected (net = -2). Dedup fires at
occurrences >= 4 for one-arg calls; lower for higher-cost callees.

## Tests

Five CSE tests now pass:
  test_cse_dedupes_repeated_pure_calls  (positive, N=4)
  test_cse_dedupes_pure_clamp_calls_via_span_replacement (positive, N=4)
  test_cse_does_not_dedupe_impure_calls (negative, safety)
  test_cse_does_not_dedupe_may_trap_calls (negative, safety)
  test_cse_dedupes_calls_with_different_args_separately (negative)

The two #[ignore]'d tests from PR-K2 are now live; tests updated to
N=4 (the smallest N where the cost gate approves dedup of a 4-byte
call). N=2/N=3 patterns are intentionally rejected by the gate to
prevent code-size regressions (the v0.5.0 gale incident).

Trace: REQ-3, REQ-14
…0 PR-K3.2 / Track B)

Companion to PR-K3. Closes the calculator_root timeout: the
per-function Z3 translation validator builds an SMT formula for
every transform-attempt × every pass × every function, which is
quadratic and scales poorly on large bodies. The v0.9.0 measurement
showed it hangs >60 minutes on the meld-fused 2.3 MB calculator
core.

## Approach

Bound the Z3 invocation by body size. Bodies above
LOOM_Z3_MAX_INSTRUCTIONS (default 2000, env-overridable) are NOT
sent to Z3. We rely on the stack validator (which every pass already
runs via ValidationGuard) to catch obvious unsoundness, and accept
that deep semantic verification is best-effort on huge bodies.

This is conservative-over-fast in the OPPOSITE direction from
CLAUDE.md's usual rule: we'd rather ship a result we couldn't deeply
verify than fail to ship at all on real workloads. Future work
could replace this with chunked verification (verify each
Block/Loop body independently up to the threshold).

## Where it kicks in

The threshold is per-FUNCTION (recursively counting through
Block/Loop/If). Small modules are unaffected — gale's largest
function has ~50 instructions, well under 2000. The threshold
fires on workloads like the meld-fused calculator core where
single functions can have thousands of instructions.

A new stats counter  records every skipped
function so callers can see how often the gate fired.

## Tunable

`LOOM_Z3_MAX_INSTRUCTIONS=N` (default 2000). CI / benchmarks can
lower it to 0 to force Z3 on everything (regression-testing the
verifier) or raise it to disable the gate entirely.

## Tests

Existing tests (all small functions) continue to pass at the
default threshold. The threshold's behavior is unit-test-covered
indirectly by every existing CSE/vacuum/canonicalize test passing.

Trace: REQ-3, REQ-14
@avrabe avrabe merged commit 34c2ba4 into main May 15, 2026
9 of 18 checks passed
@avrabe avrabe deleted the release/v1.0.0-pr-k3-verifier-call branch May 15, 2026 13:31
@avrabe avrabe mentioned this pull request May 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant