feat(ci): Mythos delta-pass auto-runner (single-actor, OAuth-token)#162

Merged: avrabe merged 1 commit into main from feat/mythos-auto-gate on May 17, 2026


@avrabe (Contributor) commented May 17, 2026

Summary

Automates the Mythos discover protocol that mythos-gate.yml
currently enforces by label only. On every PR that touches a Tier-5
file, anthropics/claude-code-action (SHA-pinned) runs against each
touched file with scripts/mythos/discover.md as the prompt and emits
a structured JSON verdict (NO_FINDINGS or FINDING). The aggregate job
then posts a sticky <!-- mythos-auto-gate --> PR comment and applies
mythos-pass-done on all-pass.
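
A sticky comment is commonly implemented by searching the PR's existing comments for a hidden marker and updating that comment instead of posting a new one. The following is a minimal Python sketch of that decision, not the workflow's actual code. It assumes comments are available as dictionaries with `id` and `body` keys; `plan_sticky_comment` and the returned action tuples are hypothetical names used only for illustration.

```python
MARKER = "<!-- mythos-auto-gate -->"  # hidden marker named in the PR summary


def plan_sticky_comment(existing_comments, new_body):
    """Decide whether to update an existing marked comment or create a new one.

    existing_comments: list of dicts with "id" and "body" keys (assumed shape).
    Returns ("update", comment_id, body) or ("create", None, body).
    """
    body = f"{MARKER}\n{new_body}"
    for comment in existing_comments:
        # The first comment carrying the marker is treated as the sticky one.
        if MARKER in comment.get("body", ""):
            return ("update", comment["id"], body)
    return ("create", None, body)
```

Because the marker is an HTML comment, it stays invisible in the rendered PR while still letting later runs find and overwrite the previous result.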

📝 Opened as draft. Workflow needs the
CLAUDE_CODE_OAUTH_TOKEN repo secret before its first run will
succeed. See "Phase A" below.

Authorization stack — "only avrabe can trigger this"

| Layer | What it does | What it blocks |
|---|---|---|
| 1 | if: github.actor == 'avrabe' && github.actor_id == '10056645' | All other actors; immune to username-reassignment because actor_id is permanent |
| 2 | Trigger = pull_request (not pull_request_target) | Fork PRs don't get secrets per GitHub default policy |
| 3 | claude-code-action pinned by commit SHA 51ea8ea7... | Tag-hijack of v1 doesn't change what we run |
| 4 | Explicit minimal permissions: (PR write, contents read) | Token-scope minimization |
| 5 | concurrency: cancel-in-progress per PR head | Rapid push cycles don't burn budget |
| 6 | Detect job path-shape-validates Tier-5 files | ${{ matrix.file }} interpolation injection blocked even if a hostile filename slips through |
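
To illustrate why layer 1 checks both values, here is a small Python sketch that mirrors the condition. It is an illustration of the logic only; the real gate is a GitHub Actions `if:` expression, and `actor_is_authorized` is a hypothetical helper name.

```python
ALLOWED_LOGIN = "avrabe"
ALLOWED_ID = "10056645"  # compared as a string, as in the workflow expression


def actor_is_authorized(actor, actor_id):
    """Mirror of layer 1: both the login and the numeric ID must match."""
    return actor == ALLOWED_LOGIN and str(actor_id) == ALLOWED_ID
```

A different account that later acquired the same login would still carry a different numeric ID, so the second comparison fails and the job is skipped.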

Phase A — your one-time setup

# On your machine:
claude update            # ensure v1.0.44+
claude setup-token       # prints CLAUDE_CODE_OAUTH_TOKEN

Then, in the browser: Repo Settings → Secrets and variables → Actions → New repository secret

  • Name: CLAUDE_CODE_OAUTH_TOKEN
  • Value: token from above

Once added, mark this PR ready for review and the workflow will fire on the next push.

Files

  • .github/workflows/mythos-auto.yml — workflow (detect → scan matrix → aggregate)
  • AGENTS.md — new "Auto-runner" subsection under Mythos pipeline
  • CHANGELOG.md [Unreleased] / Added entry

How this fits with mythos-gate.yml

mythos-gate.yml (label-only check) stays the source of truth.
The auto-runner is one way the mythos-pass-done label gets
applied, not the only way. Contributors without OAuth access (or
non-avrabe actors) continue to use the documented honor-system flow:
run discover.md in a fresh Claude Code session, post a findings or NO
FINDINGS comment, and apply the label manually.

Test plan

  • Workflow YAML validates (actionlint if available)
  • On a Tier-5-touching PR by avrabe: workflow runs, posts comment, applies/withholds label per verdict
  • On a PR by anyone else: workflow's first job is skipped (job-level if: fails); no token leaked, no comment posted
  • On a PR with no Tier-5 changes: detect job sets any=false, downstream jobs skip cleanly
  • Hostile filename test: a path like meld-core/src/parser.rs;evil doesn't pass the path-shape filter and is logged as a warning

Cost / quota note

Token usage draws from the Max-plan subscription quota, shared with
interactive Claude Code use. A burst of Tier-5 PRs could starve
interactive sessions during the same window. Refresh-token gap
tracked at anthropics/claude-code-action#727.

🤖 Generated with Claude Code

Automates the human-driven discover protocol that mythos-gate.yml
currently enforces by label. On every PR that touches a Tier-5
file, runs anthropics/claude-code-action (SHA-pinned) per touched
file with scripts/mythos/discover.md as the prompt and captures a
structured `{verdict: NO_FINDINGS | FINDING}` JSON via the action's
--json-schema input. Posts a sticky <!-- mythos-auto-gate --> PR
comment with per-file results; applies mythos-pass-done on all-pass,
fails the job (without the label) on any FINDING.
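
The aggregation step described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the workflow's implementation: it assumes each scan yields a JSON string with a `verdict` key, and it treats unparseable output and an empty result set as failures (a fail-closed choice made for this example). `aggregate` is a hypothetical name.

```python
import json


def aggregate(results):
    """Combine per-file verdicts into a label decision and comment lines.

    results: mapping of file path -> JSON string such as
             '{"verdict": "NO_FINDINGS"}' (assumed shape).
    Returns (apply_label, lines).
    """
    lines = []
    all_pass = True
    for path in sorted(results):
        try:
            verdict = json.loads(results[path]).get("verdict")
        except (json.JSONDecodeError, AttributeError, TypeError):
            verdict = None  # malformed output counts as a failure here
        if verdict != "NO_FINDINGS":
            all_pass = False
        lines.append(f"- {path}: {verdict or 'INVALID_OUTPUT'}")
    # Never apply the label when nothing was scanned.
    return (all_pass and bool(results), lines)
```

With this shape, the label is applied only when every scanned file reports NO_FINDINGS, which matches the "all-pass" wording above.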

Authorization stack (defense-in-depth, "only avrabe can trigger"):

1. Job-level if: requires both `github.actor == 'avrabe'` AND the
   immutable `github.actor_id == '10056645'`. Usernames can be
   reassigned after account deletion; numeric IDs cannot.
2. Trigger is pull_request (not pull_request_target). GitHub's
   default policy keeps secrets away from fork-repo PRs.
3. claude-code-action pinned by full commit SHA, not the floating
   v1 tag. Hijacking the tag does not change what we run.
4. Explicit minimal permissions: pull-requests write (sticky comment
   + label), contents read.
5. concurrency: cancel-in-progress per PR head — no budget burn on
   rapid push cycles.
6. Detect job path-shape-validates every Tier-5 file
   (^[a-zA-Z0-9/_.-]+$) before piping into the matrix so a hostile
   filename cannot inject through ${{ matrix.file }} downstream;
   matrix.file is read via env: in run blocks, not direct
   interpolation.
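
The path-shape check in item 6 can be sketched in Python using the character class quoted above. This is a minimal illustration, assuming the detect job receives a list of changed paths; `split_by_path_shape` is a hypothetical helper, and the real job may be written differently (for example in shell).

```python
import re

# Character class taken from the commit message: ^[a-zA-Z0-9/_.-]+$
PATH_SHAPE = re.compile(r"[a-zA-Z0-9/_.-]+")


def split_by_path_shape(paths):
    """Partition paths into (accepted, rejected) by the path-shape pattern."""
    accepted, rejected = [], []
    for path in paths:
        # fullmatch requires the whole string to match, including any
        # trailing newline, which a bare `$` in Python would tolerate.
        if PATH_SHAPE.fullmatch(path):
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected
```

A name such as `meld-core/src/parser.rs;evil` is rejected because `;` is outside the class. Note that the pattern constrains characters only, so it does not by itself rule out components like `..`.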

Auth flow uses CLAUDE_CODE_OAUTH_TOKEN from avrabe's Max plan; no
separate API billing. Token usage draws from the subscription rate
limit shared with interactive Claude Code use.

Label-only mythos-gate.yml remains source-of-truth — the auto-runner
is one way the label gets applied, not the only way. Contributors
without OAuth access continue using the honor-system flow per
AGENTS.md.

Setup (one-time, on maintainer machine):
  claude update           # ensure v1.0.44+
  claude setup-token      # prints CLAUDE_CODE_OAUTH_TOKEN
Then add the token as repo secret CLAUDE_CODE_OAUTH_TOKEN.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@avrabe avrabe marked this pull request as ready for review May 17, 2026 05:16
@github-actions

LS-N verification gate

⚠️ 15/19 verified — 4 missing regression tests

| Result | Count |
|---|---|
| Passed (≥1 test, all green) | 15 |
| Failed (≥1 test failure) | 0 |
| Missing (no ls_*_NN_* test found) | 4 |

Approved loss-scenarios.yaml entries are expected to have a
regression test named ls_<letter>_<num>_* (e.g. LS-A-11 →
ls_a_11_*). The gate runs each prefix via cargo test --lib --no-fail-fast and aggregates the results as pass/fail/missing.
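
The naming convention and the pass/fail/missing aggregation can be sketched in Python as below. This is an illustration only and is not taken from tools/post_verification_comment.py; it assumes test results are available as a mapping from bare test name to a pass flag, and `ls_test_prefix` and `classify` are hypothetical names.

```python
def ls_test_prefix(ls_id):
    """Convert an LS identifier such as "LS-A-11" into the prefix "ls_a_11_"."""
    return ls_id.lower().replace("-", "_") + "_"


def classify(ls_ids, test_results):
    """Classify each LS entry as "passed", "failed", or "missing".

    test_results: mapping of test name -> bool, True meaning the test passed
                  (assumed shape; real cargo output would need parsing first).
    """
    status = {}
    for ls_id in ls_ids:
        prefix = ls_test_prefix(ls_id)
        outcomes = [ok for name, ok in test_results.items() if name.startswith(prefix)]
        if not outcomes:
            status[ls_id] = "missing"
        elif all(outcomes):
            status[ls_id] = "passed"
        else:
            status[ls_id] = "failed"
    return status
```

The trailing underscore in the prefix matters: without it, LS-A-1 would also match tests written for LS-A-11.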

Failed LS entries

(none)

Missing regression tests
  • LS-CP-4
  • LS-A-8
  • LS-A-9
  • LS-A-19

Updated automatically by tools/post_verification_comment.py.
Source of truth: safety/stpa/loss-scenarios.yaml.

@avrabe (Contributor, Author) commented May 17, 2026

Admin-merge per #139 (smithy capacity)

9 checks green + 2 expected skips (Mythos pass, Aggregate findings + label: correctly skipped because this PR touches no Tier-5 source). The remaining 3 (Clippy, Coverage, fuzz_resolver_terminates) have been queued for ~2h40m against the rust-cpu pool, which is 7/7 busy on org-wide work (the documented #139 §4 cross-org contention pattern).

This is the same admin-merge case as PR #161 yesterday. The workflow added here is single-actor-scoped (only avrabe can trigger it), and the new Detect Tier-5 changes job ran green, which shows the actor gate is wired correctly.

Admin-merge counter for #139 since last reset:

Will track the reset back into #139 after merge.

@avrabe avrabe merged commit aaeb90c into main May 17, 2026
14 checks passed
@avrabe avrabe deleted the feat/mythos-auto-gate branch May 17, 2026 06:03