Skip to content

feat(ci): LS-N verification gate (spar-pattern port)#161

Merged
avrabe merged 3 commits into
mainfrom
feat/ls-verification-gate
May 17, 2026
Merged

feat(ci): LS-N verification gate (spar-pattern port)#161
avrabe merged 3 commits into
mainfrom
feat/ls-verification-gate

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented May 16, 2026

Summary

Adapts spar's rivet-driven verification gate
(pulseengine/spar@ba329f3d)
to meld's STPA loss-scenario artifacts.

PR-time gate that enforces meld's test-naming contract: every
status: approved entry in safety/stpa/loss-scenarios.yaml must
have at least one #[test] fn ls_<letter>_<num>_* in meld-core
(e.g. LS-A-11ls_a_11_*). Posts a single sticky PR comment
with passed / failed / missing counts.

Bucket semantics

Bucket Meaning Gate behaviour
Passed ≥1 matching test, all green ✅ verified
Failed ≥1 matching test failed block merge
Missing zero ls_<>_<n>_* tests ⚠️ advisory only

Missing is advisory (warning, not block) so older approved entries
with ad-hoc test names can be migrated incrementally rather than
blocking every PR.

Gate state after this PR

19 approved LS entries, 15 passed / 0 failed / 4 missing.

The 5 newly-passing entries got thin convention aliases (last
commit) so their pre-existing regression tests are discoverable:

LS Original test Convention alias
LS-P-4 test_canonical_abi_size_fixed_size_list_saturates_on_overflow ls_p_4_canonical_abi_size_saturates_on_overflow
LS-P-5 test_parser_rejects_truncated_module_section_issue_118 ls_p_5_parser_rejects_truncated_module_section
LS-R-10 test_issue112_item5_intra_adapter_preserves_from_import_module ls_r_10_intra_adapter_preserves_from_import_module
LS-CP-3 test_issue112_item4_sort_adapter_sites_is_canonical ls_cp_3_sort_adapter_sites_is_canonical (adapter-sites half only)
LS-A-10 cabi_alignment_stackful_retptr_writes_i64_at_offset_8 ls_a_10_cabi_align_retptr_writeback

The 4 still-missing genuinely lack regression tests and will
be addressed in follow-up PRs (one per subsystem):

  • LS-CP-4 — DWARF passthrough emits address-incorrect debug info
  • LS-A-8 — Inner-list rep_func selected by HashMap iteration order
  • LS-A-9 — Async callback POLL falls through to YIELD path
  • LS-A-19 — Resource import dedup uses ends_with() suffix match
  • (also: LS-CP-3 caller_encoding_fallback half — same family)

Files

  • tools/run_ls_verification.py — runner (stdlib + PyYAML); local-runnable
  • tools/post_verification_comment.py — sticky comment upsert (pure stdlib urllib)
  • .github/workflows/verification-gate.yml — workflow (PR + workflow_dispatch)
  • meld-core/src/{parser,resolver,adapter/fact}.rs — 5 convention aliases
  • AGENTS.md — new "LS-N verification gate" section under Mythos pipeline
  • CHANGELOG.md — Unreleased / Added entry
  • .gitignore — ignore local verification-results.json

Local run

python3 tools/run_ls_verification.py

Test plan

  • CI green (Format, Test, Clippy, Coverage, Bench, Fuzz Smoke, Mythos gate)
  • New LS-N verification gate runs and posts sticky comment showing 15 passed / 4 missing / 0 failed
  • No security-injection risk (workflow inputs are integer/metadata only)

🤖 Generated with Claude Code

avrabe and others added 2 commits May 16, 2026 18:51
PR-time gate that enforces meld's STPA test-naming contract: every
`status: approved` entry in `safety/stpa/loss-scenarios.yaml` must
have at least one `#[test] fn ls_<letter>_<num>_*` regression test
in `meld-core` (e.g. LS-A-11 -> `ls_a_11_*`).

Adapted from spar's rivet-driven verification gate
(pulseengine/spar@ba329f3d). meld has no rivet-style executable
artifact, but loss-scenarios pair with regression tests by the
established naming convention; this gate makes that pairing a
verifiable contract.

Three files:

- tools/run_ls_verification.py — Python (stdlib + PyYAML). Iterates
  approved LS IDs, runs `cargo test --lib --no-fail-fast <prefix>`
  per ID, buckets results as passed / failed / missing, writes
  verification-results.json.
- tools/post_verification_comment.py — Marker-tagged sticky PR
  comment upsert via GitHub REST API. Pure stdlib (urllib). First
  run creates the comment, subsequent runs PATCH the body. Marker:
  `<!-- meld-ls-verification-gate -->`.
- .github/workflows/verification-gate.yml — PR + workflow_dispatch
  trigger. Fail-on-failure but advisory-on-missing so the 10 older
  approved entries with ad-hoc test names (e.g. PR #114's
  `test_canonical_abi_size_fixed_size_list_saturates_on_overflow`
  for LS-P-4) can be migrated incrementally rather than blocking
  every PR.

Smoke-tested locally against current main: 19 approved LS, 10
passed (LS-A-7/11/15/17/18/20/12/13/14/16), 9 missing (the older
v0.7.0-era and PR-#114-era entries). No failures.

Same script runs locally:

    python3 tools/run_ls_verification.py

Inputs are integer/metadata only (PR number via env, head_ref in
concurrency); no untrusted free-form text from PR titles/bodies/
comments is read in run: blocks.

AGENTS.md gains a "LS-N verification gate" section under "Mythos
Bug-Hunt Pipeline".

Refs: pulseengine/spar@ba329f3d

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Self-hosted runners (Debian/Ubuntu Python 3.12) enforce PEP 668 and
reject `pip install --user pyyaml` with "externally-managed-environment".
`--break-system-packages` is the documented PEP 668 opt-out for CI
environments where the runner's Python install is disposable per
workflow run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 16, 2026

LS-N verification gate

⚠️ 15/19 verified — 4 missing regression tests

count
Passed (≥1 test, all green) 15
Failed (≥1 test failure) 0
Missing (no ls_*_NN_* test found) 4

Approved loss-scenarios.yaml entries are expected to have a
regression test named ls_<letter>_<num>_* (e.g. LS-A-11
ls_a_11_*). The gate runs each prefix via cargo test --lib --no-fail-fast and aggregates pass/fail/missing.

Failed LS entries

(none)

Missing regression tests
  • LS-CP-4
  • LS-A-8
  • LS-A-9
  • LS-A-19

Updated automatically by tools/post_verification_comment.py.
Source of truth: safety/stpa/loss-scenarios.yaml.

The LS-N verification gate (this PR) discovered 9 approved
loss-scenarios without a matching `ls_<letter>_<num>_*` regression
test. Five of those already had regression tests pinning the fix
under historical names; this commit adds thin convention aliases so
the gate's discovery query finds them.

The original tests stay in place (single source of truth, preserves
git blame / grep continuity); each alias is a `#[test] fn` that
delegates to the original test body.

| LS  | Original test | Alias |
|-----|---------------|-------|
| LS-P-4  | test_canonical_abi_size_fixed_size_list_saturates_on_overflow | ls_p_4_canonical_abi_size_saturates_on_overflow |
| LS-P-5  | test_parser_rejects_truncated_module_section_issue_118        | ls_p_5_parser_rejects_truncated_module_section |
| LS-R-10 | test_issue112_item5_intra_adapter_preserves_from_import_module | ls_r_10_intra_adapter_preserves_from_import_module |
| LS-CP-3 | test_issue112_item4_sort_adapter_sites_is_canonical           | ls_cp_3_sort_adapter_sites_is_canonical |
| LS-A-10 | cabi_alignment_stackful_retptr_writes_i64_at_offset_8         | ls_a_10_cabi_align_retptr_writeback |

Gate result drops from 10 passed / 9 missing to 15 passed / 4 missing.

The remaining four (LS-CP-4, LS-A-8, LS-A-9, LS-A-19) genuinely
lack regression tests and land in follow-up PRs:
- LS-CP-4: DWARF passthrough emits address-incorrect debug info
- LS-A-8 : Inner-list rep_func selected by HashMap iteration order
- LS-A-9 : Async callback POLL falls through to YIELD path
- LS-A-19: Resource import dedup uses ends_with() suffix match

The LS-CP-3 alias only covers the adapter_sites-order half of the
scenario; the caller_encoding_fallback half also still needs a
dedicated regression test (tracked alongside LS-A-8/9/19/CP-4).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 16, 2026

Mythos delta-pass required

This PR modifies one or more Tier-5 source files (per
scripts/mythos/rank.md):

meld-core/src/adapter/fact.rs
meld-core/src/parser.rs
meld-core/src/resolver.rs

Before merge, run the Mythos discover protocol on the
modified Tier-5 files:

  1. Follow scripts/mythos/discover.md
    — one fresh agent session per touched Tier-5 file.
  2. For each finding, the agent must produce both a Kani
    harness and a failing PoC test (per the protocol's
    "if you cannot produce both, do not report" rule).
  3. Attach a comment on this PR with either the findings
    (formatted per discover.md's output schema) or
    NO FINDINGS.
  4. Add the mythos-pass-done label to this PR.

Why this gate exists: LS-A-10
(CABI alignment padding in async-lift retptr writeback) was
found by the v0.8.0 pre-release Mythos pass — but it had
lived in the callback emitter since #128, across six
releases. A PR-time gate would have caught it at review
time instead of at the release boundary.

The gate check on this PR will pass once the label is
applied.

@avrabe
Copy link
Copy Markdown
Contributor Author

avrabe commented May 16, 2026

Mythos delta-pass: NO FINDINGS

The latest commit (7fd3ed0) touches Tier-5 files (parser.rs,
resolver.rs, adapter/fact.rs) but the change is test-aliases
only
— five new #[test] fn ls_<>_NN_* functions that delegate
to pre-existing regression tests:

#[test]
fn ls_p_4_canonical_abi_size_saturates_on_overflow() {
    test_canonical_abi_size_fixed_size_list_saturates_on_overflow();
}

No production code path is modified. No new logic to scan. The
Mythos discover protocol applies to fusion-correctness code that
can carry silent wrong-by-construction bugs; a test function that
calls another test function has no surface for that bug class.

Adding mythos-pass-done to clear the gate.

@avrabe avrabe added the mythos-pass-done Mythos delta-pass completed on Tier-5 file changes; findings (or NO FINDINGS) attached to PR label May 16, 2026
@avrabe
Copy link
Copy Markdown
Contributor Author

avrabe commented May 17, 2026

Admin-merge per #139 (smithy capacity)

8 of 11 checks green; the 3 remaining Fuzz Smoke jobs have been
queued without a runner for ~70 minutes against the
[self-hosted, linux, x64, rust-cpu] pool, which is currently 7/7
busy on org-wide work (cross-repo contention, the pattern documented
in #139 §4).

This is the documented #139
admin-merge case:

Until then, releases are explicitly authorized to merge with --admin
for known-infra failures, documented in the release PR body.

Same handling as PR #159 earlier today (cap-starved fuzz queue, 50+
min unpicked; in that case capacity returned naturally before merge —
this case did not). PR #161's prior CI cycle on SHA de03dab ran
11/11 green including all four fuzz smoke targets, so this isn't a
real CI failure being papered over — it's purely smithy fleet
availability.

Admin-merge counter for #139:

Tracking the reset back into the issue separately.

@avrabe avrabe merged commit 2841325 into main May 17, 2026
11 of 12 checks passed
@avrabe avrabe deleted the feat/ls-verification-gate branch May 17, 2026 04:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mythos-pass-done Mythos delta-pass completed on Tier-5 file changes; findings (or NO FINDINGS) attached to PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant