Skip to content

docs(gates): pivot Phase 3 from YAML emitter to structural validator#142

Merged
githubrobbi merged 1 commit intomainfrom
docs/gates-manifest-plan-3b-pivot
May 6, 2026
Merged

docs(gates): pivot Phase 3 from YAML emitter to structural validator#142
githubrobbi merged 1 commit intomainfrom
docs/gates-manifest-plan-3b-pivot

Conversation

@githubrobbi
Copy link
Copy Markdown
Collaborator

Documentation-only PR. Updates docs/architecture/gates-manifest-plan.md to reflect a design-time decision made during Phase 3 prep (after PR #141 landed Phase 2). The implementation PR follows once this lands.

Why pivot

The original §4.2 specified that gen-workflow would own the per-gate job blocks in .github/workflows/pr-fast.yml between # >>> generated:gates-manifest <<< markers, with the hand-written classify / required / notify-failure jobs staying outside.

Investigation during Phase 3 prep showed every per-gate job in pr-fast.yml is bespoke:

Job Distinguishing characteristics
gates-drift, file-size bash only, no rust-cache, ~5 lines
hooks-drift cargo run, cache shared with sanity
fmt tiny, no cache, if: rust
sanity free-disk + cache + fetch + check + conditional cargo-vet install AND run
clippy, docs shared-cache from sanity, no free-disk
test-build distinct cache key, free-disk
tests shared-cache from test-build, no free-disk
security distinct cache key, deny + vet
windows-lint windows-latest runner

Eleven distinct shapes for ~thirteen pr-fast-tier gates. Encoding all of that in TOML so the generator can emit it back is a YAML-in-TOML translation problem with no real upside, AND it stakes branch protection on a hand-rolled YAML emitter.

Revised design — structural validator

gen-workflow becomes a read-only validator (--check only, no --write). Five structural properties enforced:

  1. Job presence — every manifest gate with tier="pr-fast" has a job whose key matches the gate id.
  2. if: predicate alignmentgate_when=rust_changedneeds.classify.outputs.rust == 'true', etc.
  3. Aggregator coverage — every gate id is in required.needs:, the declare -A R=(...) aggregator, AND notify-failure.needs:. This is the exact rename-bookkeeping failure mode that motivated the whole plan.
  4. Branch-protection guardrequired.name: is exactly PR Fast CI / required (the literal string in the repo's branch-protection ruleset).
  5. Naming convention — every per-gate job's name: matches the manifest's label.

Per-step contents (run commands, runner selection, cache strategy) are explicitly NOT validated — they're per-job craft. Phase 1's gates-drift covers gate-set mismatch; structural validator covers rename-bookkeeping; per-step correctness stays in code review's hands.

The structural-validator design retains every drift-protection guarantee the original codegen design promised, at the same risk profile as Phase 1's gates-drift — the tool only reads files; it cannot break the workflow.

What this PR contains

  • §4.2 rewritten — pivot rationale + 5 enforced properties + explicit list of what's NOT validated.
  • §4.3 (gen-docs) marked deferred — every existing gate-matrix table in the repo is prose-laden (cells like ✅ if xwin (advisory; W5.6 upgraded from check to clippy)); encoding the prose in TOML is strictly worse than leaving it in markdown.
  • §Phase 3 migration section rewritten — narrowed scope to just the workflow-drift validator, explicitly carved out the deferred sub-phases (3a _lint_fast.sh codegen and 3c gen-docs).
  • §Status table updated — Phase 2 ✅ landed, Phase 3 🟡 plan revision in flight, 3a/3c deferred.
  • §9 action log appended with the pivot entry and the rationale.

What this PR does NOT contain

No code changes. The actual gen-workflow Rust crate, the new workflow-drift manifest gate, the pr-fast.yml::workflow-drift job, and the just workflow-drift recipe all land in the follow-up implementation PR (Phase 3 proper).

Diff size

docs/architecture/gates-manifest-plan.md | 285 ++++++++++++++++++++-----------
1 file changed, 184 insertions(+), 101 deletions(-)

Sequencing

Branched off main post-#141. Implementation PR branches off this once it lands.

Updates `docs/architecture/gates-manifest-plan.md` to reflect a
design-time decision made during Phase 3 prep.  No code changes;
the implementation PR follows once this lands.

## Pivot rationale

The original §4.2 specified that `gen-workflow` would own the
per-gate job blocks in `.github/workflows/pr-fast.yml` between
`# >>> generated:gates-manifest <<<` markers, with the
hand-written `classify` / `required` / `notify-failure` jobs
staying outside.

Investigation during Phase 3 prep (after Phase 2 landed in PR #141)
showed every per-gate job in `pr-fast.yml` is bespoke:

  * gates-drift, file-size:    bash only, no rust-cache, ~5 lines
  * hooks-drift:               cargo run, cache shared with sanity
  * fmt:                       tiny, no cache, `if: rust`
  * sanity:                    free-disk + cache + fetch + check +
                               conditional cargo-vet install AND run
  * clippy, docs:              shared-cache from sanity, no free-disk
  * test-build:                distinct cache key, free-disk
  * tests:                     shared-cache from test-build
  * security:                  distinct cache key, deny + vet
  * windows-lint:              windows-latest runner

That's eleven distinct shapes for ~thirteen pr-fast-tier gates.
Encoding all of that in TOML so the generator can emit it back is
a YAML-in-TOML translation problem with no real upside, AND it
stakes branch protection on a hand-rolled YAML emitter.

## Revised design — structural validator (`--check` only)

`gen-workflow` becomes a read-only validator that enforces five
structural properties:

1. Job presence — every manifest gate with `tier="pr-fast"` has a
   job whose key matches the gate id (or per-tier `consumer_names`
   override).
2. `if:` predicate alignment — `gate_when=rust_changed` →
   `needs.classify.outputs.rust == 'true'`, etc.
3. Aggregator coverage — every gate id is in `required.needs:`,
   the `declare -A R=(...)` aggregator, AND `notify-failure.needs:`.
   This is the exact rename-bookkeeping failure mode that motivated
   the whole plan.
4. Branch-protection guard — `required.name:` is exactly
   `PR Fast CI / required` (the literal string in the repo's
   branch-protection ruleset).
5. Naming convention — every per-gate job's `name:` matches the
   manifest's `label`.

Per-step contents (run commands, runner selection, cache strategy)
are explicitly NOT validated — they're per-job craft.  Phase 1's
`gates-drift` covers gate-set mismatch; structural validator
covers rename-bookkeeping; per-step correctness stays in code
review's hands.

The structural-validator design retains every drift-protection
guarantee the original codegen design promised, while reducing the
blast radius to "same as Phase 1's gates-drift" — the tool only
reads files; it cannot break the workflow.

## What this PR contains

  * §4.2 rewritten — pivot rationale + 5 enforced properties +
    explicit list of what's NOT validated.
  * §4.3 (gen-docs) marked deferred — investigation showed every
    existing gate-matrix table in the repo is prose-laden (cells
    like `✅ if xwin (advisory; W5.6 upgraded from check to
    clippy)`); encoding the prose in TOML is strictly worse than
    leaving it in markdown.
  * §Phase 3 migration section rewritten — narrowed scope to just
    the workflow-drift validator, explicitly carved out the
    deferred sub-phases (3a `_lint_fast.sh` codegen and 3c
    `gen-docs`).
  * §Status table updated — Phase 2 ✅ landed, Phase 3 🟡 plan
    revision in flight, 3a/3c deferred.
  * §9 action log appended with the pivot entry and the
    rationale.

## What this PR does NOT contain

No code changes.  The actual `gen-workflow` Rust crate, the new
`workflow-drift` manifest gate, the `pr-fast.yml::workflow-drift`
job, and the `just workflow-drift` recipe all land in the
follow-up implementation PR (Phase 3 proper).

## Verification

  * `reuse lint`             — exit 0
  * `taplo fmt --check`      — n/a (markdown-only PR)
  * `actionlint`             — n/a (markdown-only PR)
  * `bash -n` syntax checks  — n/a (markdown-only PR)
@githubrobbi githubrobbi enabled auto-merge (squash) May 6, 2026 23:27
@githubrobbi githubrobbi merged commit 69f506d into main May 6, 2026
16 checks passed
@githubrobbi githubrobbi deleted the docs/gates-manifest-plan-3b-pivot branch May 6, 2026 23:29
githubrobbi added a commit that referenced this pull request May 7, 2026
…+ workflow-drift gate (#143)

Phase 3 of `docs/architecture/gates-manifest-plan.md`.  Pivoted from
the originally-planned YAML emitter design (PR #142 landed the plan
revision) to a `--check`-only structural validator that catches
every drift class the codegen design promised at the same risk
profile as Phase 1's `gates-drift` — the tool only reads files; it
cannot break the workflow.

## What lands

NEW workspace member `scripts/ci/gen-workflow/`:

  * `Cargo.toml` — workspace member, no `[lints] workspace = true`
                   (matches the operational-tool convention from
                   `scripts/ci-pipeline` + `scripts/ci/gen-hooks`)
  * `src/main.rs` — CLI (`--check`, `--verbose`, hidden
                    `--manifest` / `--workflow` escape hatches)
  * `src/manifest.rs` — minimal subset of the gate-manifest schema
                        (only the fields gen-workflow needs:
                        `id`, `label`, `tiers`, `gate_when` →
                        `when` via serde rename, `consumer_names`)
  * `src/workflow.rs` — hand-rolled minimal YAML extractor (~120
                        lines) covering the four fields the
                        validator inspects: `jobs:` keys, per-job
                        `name:`, `if:`, `needs:` (all three YAML
                        shapes — single string, flow-style list,
                        block-style list)
  * `src/validate.rs` — four structural property checks per plan
                        §4.2 with detailed drift reports

## Why hand-roll YAML parsing

The pre-existing crate options all carry material costs:

  * `serde_yaml`         — archived 2024, unmaintained
  * `serde_yml`          — archived 2024, active RustSec advisory
                           in `Serializer.emitter`
  * `serde_yaml_ng`      — active fork; depends on unmaintained
                           `unsafe-libyaml` (C)
  * `serde_norway`       — active fork; depends on
                           `unsafe-libyaml-norway` (C)

The validator only needs to extract a handful of specific string
fields from a 600-line tame YAML file (no anchors, no complex flow
style, no embedded JSON).  Pulling in a heavyweight C-backed
dependency, paying the cargo-vet exemption + cargo-deny advisory
tax, and shipping the libyaml binary blob to every CI run is a
superficial workaround for a problem that's solvable with ~120
lines of focused string-matching Rust.

The hand-rolled extractor is fully fail-closed: malformed YAML, a
missing `jobs:` key, a flow list without a closing bracket, or any
construct outside the documented grammar yields a `Result::Err`
with helpful context, and the validator surfaces that as drift
rather than silently mis-classifying.

## Properties enforced (per plan §4.2)

  1. Job presence — every manifest gate with `tier="pr-fast"` has
     a corresponding job in `pr-fast.yml` (resolved via
     `consumer_names["pr-fast"]` if present, else gate id).
     Multiple gates may fold into one job (e.g. `rustdoc` +
     `doc-tests` → `docs`); the validator handles many-to-one.
  2. `if:` predicate alignment — for each pr-fast job, the `if:`
     predicate must accept every change class the folded gates'
     `gate_when` values require.  Implemented as a `PermissiveSet`
     u8 bitset lattice (rust / dep / infra / always) with `union`
     and `contains` operations.  Wider predicates pass (over-runs
     are fine); narrower ones fail (drift would block a gate from
     running on its trigger).
  3. Aggregator coverage — every gate's resolved job-id must
     appear in `required.needs:`, the bash `declare -A R=(...)`
     aggregator inside the `required` job, AND
     `notify-failure.needs:`.  This is the exact rename-
     bookkeeping failure mode (PR #138's windows-check →
     windows-lint rename touching 6 files) that motivated the
     whole gate-manifest plan.
  4. Branch-protection guard — the `required` job's `name:` field
     is exactly `PR Fast CI / required` — the literal string in
     the repo's branch-protection rule.  A future refactor that
     renamed it would silently break merge for every PR; the
     validator now hard-fails before that lands.

Property 5 ("naming convention") from the plan is intentionally
deferred — see plan §4.2 for the rationale (multiple gates fold
into one job, so display names like `Clippy` are not 1:1 derivable
from manifest fields without a schema extension).

## What is explicitly NOT validated

  * Job step contents — the actual `run:` commands, runner
    selection, cache key strategy, conditional steps.  These are
    per-job craft; pinning them would freeze legitimate evolution
    (e.g. adding a `df -h /` debug step during a disk-pressure
    investigation).  Phase 1's `gates-drift` covers gate-set
    mismatch; the structural validator covers rename-bookkeeping;
    per-step correctness stays in code review's hands.
  * YAML formatting — indentation, blank-line placement, comment
    positioning.
  * Job ordering — the file lists jobs in roughly cost order; the
    validator doesn't enforce any ordering invariant.

## Wire-up

NEW manifest gate `workflow-drift`:

  * `tiers = ["pre-push", "pr-fast"]`, `bucket = "bg"`,
    `gate_when = "always"`, `hard = true`, order 27 (next to
    `hooks-drift`'s 26 in Bucket 1).
  * Self-referential — runs `gen-workflow --check` to verify the
    workflow against the manifest including the workflow-drift
    job itself.  Pairs with Phase 1's `gates-drift` (gate-set
    drift) and Phase 2's `hooks-drift` (hook-content drift) to
    cover three orthogonal drift axes.

NEW `pr-fast.yml::workflow-drift` job:

  * Same shape as `hooks-drift` (cache shared with `sanity` so
    the gen-workflow binary build piggybacks on the existing
    rust-cache).
  * Always-on (no classify gating).
  * Added to `required.needs:`, the bash R=() aggregator, AND
    `notify-failure.needs:` (otherwise it would itself fail
    Property 3 on first run — a satisfying recursive consistency
    check).

NEW `just` recipes:

  * `just gen-workflow`     — manual run of the validator
  * `just workflow-drift`   — alias of `gen-workflow --check`,
                              named for parity with `gates-drift`
                              and `hooks-drift`

MODIFIED `scripts/hooks/_lint_pre_push.sh`:

  * Regenerated by `gen-hooks` to pick up the new `workflow-drift`
    gate (Bucket 1, `cargo run -q --release -p uffs-gen-workflow
    -- --check`).

MODIFIED `Cargo.toml`:

  * `scripts/ci/gen-workflow` added as a workspace member.

MODIFIED `docs/architecture/gates-manifest-plan.md`:

  * Status table updated (Phase 3 ✅ landed).
  * §9 action log entry appended.

## Lint policy (audited per the no-suppression-hacks rule)

  * Zero `#[allow(...)]` directives in non-test code.
  * `expect()` / `unwrap()` only in `#[cfg(test)]` blocks (allowed
    by `clippy.toml`'s `allow-*-in-tests = true`).
  * `PermissiveSet` is a `u8` bitset — chosen at design time over
    a `struct { rust: bool, dep: bool, infra: bool, always: bool }`
    shape that would have triggered `clippy::struct_excessive_bools`.
    Root-cause refactor, not per-item suppression.
  * Regex compilation in `aggregator_table_regex` returns
    `Result<Regex>` and propagates via `?` rather than `unwrap`
    or `expect`.
  * `Gate::gate_when` → `Gate::when` rename eliminates
    `clippy::struct_field_names` at the root cause.

## Tests (33 / 33 pass)

  * Manifest module (7) — minimal-subset parsing, `pr_fast_gates`
    filtering, `consumer_names` override resolution, `gate_when`
    ⇄ `when` rename round-trip, unknown-fields-ignored design
    choice, missing-required-field-fails-noisily.
  * Workflow extractor (12) — all three `needs:` shapes,
    quoted/unquoted field values, nested `with:`/`env:`/`run: |`
    blocks ignored cleanly, missing `jobs:` key fails with
    context, empty `jobs:` block fails, malformed flow list fails
    with helpful message, block-list with blank-line terminator,
    `job_key_at` rejects inline values + accepts trailing
    comments, `unquote` handles single/double/bare/whitespace/
    comment cases.
  * Validator (14) — `PermissiveSet` lattice unit tests
    (gate_when→set, if_expr→set, mixed-class union semantics),
    happy-path consistent-fixture assertion, plus seven mutation
    tests (one per property × failure mode):
      - P1 missing job
      - P2 too narrow predicate
      - P2 wider predicates pass (regression-guard)
      - P3 missing-from-required-needs
      - P3 missing-from-aggregator
      - P3 missing-from-notify-needs
      - P4 renamed required job
      - P4 missing required job
    Aggregator-extraction helper tested on a realistic-shape
    table and confirmed to ignore unrelated brackets
    (`${{ matrix.os }}` substitutions, mid-line `[other-bracket]=`
    lines outside the table).

Every fixture is a literal string; no randomness, no time, no
filesystem dependency in any test.

## Verification

  * `cargo test -p uffs-gen-workflow` — 33 / 33 pass
  * `cargo clippy -p uffs-gen-workflow -- -D pedantic -D nursery
    -D cargo -W unwrap_used -W expect_used -W
    missing_docs_in_private_items -D warnings` — exit 0, zero
    per-item suppressions in non-test code
  * `cargo run -q --release -p uffs-gen-workflow -- --check` —
    exit 0
  * `bash scripts/ci/check_gates_drift.sh` — 22 gates matched
  * `cargo run -q --release -p uffs-gen-hooks -- --check` —
    exit 0 (hook regenerated in lockstep)
  * `just lint-pre-push` — full 22-gate sweep green in 53 s warm
  * `actionlint .github/workflows/pr-fast.yml` — exit 0
  * `cargo deny check` — exit 0 (no advisories)

## What this PR does NOT do

  * No mutation of `pr-fast.yml` job step contents — only the
    additive `workflow-drift` job (5 lines of YAML matching the
    `hooks-drift` pattern) plus three additive list entries
    (`required.needs:`, R=() aggregator, `notify-failure.needs:`).
  * No new dependencies — the hand-rolled YAML extractor avoids
    the `serde_yml` advisory + the `unsafe-libyaml` C transitive.
  * No supply-chain churn — zero new cargo-vet exemptions added.

## Sequencing

Branched off `main` post-#142.  No dependency on any other
in-flight PR.  Closes Phase 3 of the gates manifest plan; Phases
3a (`_lint_fast.sh` codegen) and 3c (gen-docs) remain deferred
per plan §Phase 3 scope notes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant