diff --git a/docs/designs/PLAN_ROLLOUT.md b/docs/designs/PLAN_ROLLOUT.md new file mode 100644 index 0000000000..12593a97b6 --- /dev/null +++ b/docs/designs/PLAN_ROLLOUT.md @@ -0,0 +1,162 @@ +# Design: `/plan-rollout` + `/spill-check` — PR-stack-aware planning + +**Status:** PROPOSAL — seeking directional feedback from maintainer before implementation. +**Author:** @mastermanas805 +**Tracking PR:** [#1192](https://github.com/garrytan/gstack/pull/1192) +**Supporting materials:** [`docs/designs/plan-rollout/`](./plan-rollout/) + +--- + +## The problem + +LLM-assisted coding compresses implementation by 10-100x. It does not compress +**review**. A reviewer still reads code at human speed. This asymmetry shows up +as a specific, common failure mode: + +- AI produces a 2,000-line diff that touches 15 files across 4 components +- The reviewer can't meaningfully hold the change in their head +- Scope creep ("spills" — unrelated changes sneaking in) compounds the load +- LGTM happens under cognitive pressure; real bugs ship +- Rollback is improvised when something breaks in production + +None of gstack's existing skills address this. `/plan-eng-review` and +`/plan-ceo-review` scope the plan. `/ship` creates one PR. `/review` reviews +one diff. Nothing asks **"is this one PR or three?"** or **"in what order +should these units ship to production?"** + +## The proposal + +Two new skills plus a declarative schema: + +### `SYSTEM.md` — the semantic contract graph + +A repo-root declarative file that declares each component's role, what it +owns, and the role-level contracts between components. It is the **human** +half of architectural truth — things only a human knows. + +The **package/import graph** is the machine half — discovered by the LLM at +runtime via AST, grep, and package manifests. Never declared, never cached. + +The two graphs are reconciled jointly: declared contracts give the *why*, +discovered imports give the *what*. 
Disagreements (import without contract, +contract without imports, rollout-order inversion) surface for human +resolution. + +Schema: see [`plan-rollout/system-md.schema.md`](./plan-rollout/system-md.schema.md). + +### `/plan-rollout` — decomposition + rollout planning + +Runs after plan approval (`/plan-eng-review` or equivalent). Produces two +artifacts: + +- `decomposition.md` — the PR stack: units with declared files, dependencies, + reviewer reading-order, time-budget estimates, ASCII stack-map +- `rollout.md` — rollout strategy (flag / canary / migration-first / big-bang), + step sequence with inverse rollback auto-generated, verify metrics per step + +Reads SYSTEM.md + the discovered package graph + the plan. Applies +decomposition heuristics (component boundary, interface-first, migration-first, +reviewer-budget cap, etc.). Ends with a confirmed decomposition written to +`~/.gstack/projects/$SLUG/`. + +Draft: see [`plan-rollout/plan-rollout.skill-draft.md`](./plan-rollout/plan-rollout.skill-draft.md). + +### `/spill-check` — mid-implementation scope enforcement + +Compares the current diff against the declared PR unit in decomposition.md. +Flags undeclared files as spills. Adaptive: strict for code, soft for +infra/meta files (CLAUDE.md, package.json, bun.lock, CI config). Can carve +spills into a separate branch on demand. + +Draft: see [`plan-rollout/spill-check.skill-draft.md`](./plan-rollout/spill-check.skill-draft.md). 
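
The adaptive strictness described for `/spill-check` can be sketched as a small classifier. This is only an illustration under stated assumptions: the function names, the `SpillKind` type, basename matching, and the `.github/workflows/` prefix standing in for "CI config" are all hypothetical and are not taken from the skill draft.

```typescript
// Hypothetical sketch; names and the allowlist shape are assumptions, not gstack's API.
type SpillKind = "declared" | "soft" | "hard";

// Infra/meta files named in the design text.
const SOFT_FILES = new Set(["CLAUDE.md", "package.json", "bun.lock"]);
// Assumed example of a CI config location; a real allowlist may differ.
const SOFT_PREFIXES = [".github/workflows/"];

// Classify one changed file against the files declared for the current PR unit.
function classifyFile(file: string, declared: Set<string>): SpillKind {
  if (declared.has(file)) return "declared";
  // Match infra files by basename, so a nested package.json is also soft (an assumption).
  const base = file.split("/").pop() ?? file;
  if (SOFT_FILES.has(base)) return "soft";
  if (SOFT_PREFIXES.some((prefix) => file.startsWith(prefix))) return "soft";
  // Any other undeclared file is treated strictly.
  return "hard";
}

// Group every changed file in the diff by its classification.
function classifyDiff(changed: string[], declaredFiles: string[]): Record<SpillKind, string[]> {
  const declared = new Set(declaredFiles);
  const report: Record<SpillKind, string[]> = { declared: [], soft: [], hard: [] };
  for (const file of changed) {
    report[classifyFile(file, declared)].push(file);
  }
  return report;
}
```

In this sketch, files in the `hard` bucket would block or be carved into a separate branch, while `soft` entries would only be reported.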
+ +## Integration with existing gstack skills + +Zero regression gated on `decomposition.md` existence — every modification +below is a no-op when no decomposition artifact is present: + +- `/ship` gains **stack mode**: reads decomposition.md, runs /spill-check as + pre-gate, auto-titles the PR, auto-generates the PR body with reader-guide + block + reviewer time budget + dependency narration +- `/review` gains **scope verification**: flags diff files that aren't in the + declared PR unit +- `/plan-ceo-review` and `/plan-eng-review` gain `/plan-rollout` in their + Next Steps review chain + +Details: see [`plan-rollout/integration-notes.md`](./plan-rollout/integration-notes.md). + +## Dogfood evidence + +The design was stress-tested end-to-end by simulating the workflow against +[honojs/hono issue #4633](https://github.com/honojs/hono/issues/4633) (405 +Method Not Allowed). Results: + +- SYSTEM.md authored for Hono's 8 real components + 12 role-level contracts +- `/plan-rollout` decomposed the issue into a 3-PR stack with graceful + dependency relaxation (PR-3 can merge without PR-2 via feature detection) +- PR-1 implemented: 171 LOC, 3 files, 86/86 tests passing, zero regressions + across the 4 router implementations not modified by PR-1 + +The dogfood surfaced 8 concrete design gaps, all folded into the v1 scope. +Highlights: + +- **`kind: component | leaf-util | types-only`** field needed — shared utility + dirs don't fit the schema cleanly +- **`package-type: library | service | cli`** field needed — rollout.md + template is service-shaped and doesn't fit library changes +- **Reviewer-time formula** needs recalibration; ship v1 with conservative + defaults and log predicted-vs-actual from day one +- **Shared-test-fixture heuristic** missing from the decomposition step — PR + units extending shared interfaces need explicit fixture ownership + +Full findings in the [CEO plan](./plan-rollout/ceo-plan.md) under "Dogfood +findings". 
+ +## Proposed contribution path + +The contribution is a **4-PR stack** — deliberately chosen so the skill +decomposes its own shipping: + +| # | Title | Reviewer est. | Scope | +|---|-------|---------------|-------| +| 1 | Foundation: SYSTEM.md parser + schema docs | ~15 min | `lib/plan-rollout/system-map-*.ts`, `test/plan-rollout/`, `docs/SYSTEM-MD.md` — standalone, no skills modified | +| 2 | `/plan-rollout` skill | ~25 min | `plan-rollout/SKILL.md` + remaining lib helpers (decomposition writer, rollout writer, reviewer-time estimator, inverse-rollback generator) | +| 3 | `/spill-check` skill | ~15 min | `spill-check/SKILL.md` + spill classifier. Independent of PR #2 | +| 4 | Integration | ~20 min | Modify `ship/SKILL.md`, `review/SKILL.md`, `plan-ceo-review/SKILL.md`, `plan-eng-review/SKILL.md`. Hot-path risk; covered by existing golden-fixture tests | + +Total: ~75 min of cumulative review time. PR #1 is low-risk standalone and +should be landed first to establish the schema before the rest. + +## What I'd like from you + +1. **Directional signal.** Is this the right shape of skill for gstack? Should + it land in-tree, or is this better as a separate plugin? +2. **Scope pushback.** Which of the 8 accepted expansions (see ceo-plan.md) + should move to v2? Which v2 candidates should be v1? +3. **Naming.** `/plan-rollout` pairs nicely with the `/plan-*` series. `/spill-check` + is more utilitarian. Open to alternatives. +4. **Convention checks.** Especially the location of artifacts + (`~/.gstack/projects/$SLUG/` vs `.gstack/` in repo) and the SYSTEM.md + schema format. + +If directionally approved, I'll open the 4 PRs in the order above. If +rejected or redirected, that saves us both implementation time. + +## Why file an issue instead of opening the full PR stack? 
+ +Two reasons: (1) 278 open PRs upstream suggests the review queue is deep; a +design-first check prevents sinking 4 PRs into that queue that might not fit; +(2) the gstack CONTRIBUTING guide's contributor workflow explicitly +recommends "fix gstack while doing your real work" — this is a larger +in-flight design that benefits from maintainer signal before the code lands. + +## References + +- [CEO plan (full spec)](./plan-rollout/ceo-plan.md) +- [SYSTEM.md schema](./plan-rollout/system-md.schema.md) +- [Usage documentation](./plan-rollout/usage.md) +- [/plan-rollout SKILL.md draft](./plan-rollout/plan-rollout.skill-draft.md) +- [/spill-check SKILL.md draft](./plan-rollout/spill-check.skill-draft.md) +- [Integration notes](./plan-rollout/integration-notes.md) +- [system-map-parser.ts (foundation code)](./plan-rollout/system-map-parser.ts) +- [Dogfood run: honojs/hono #4633](https://github.com/honojs/hono/issues/4633) diff --git a/docs/designs/plan-rollout/ceo-plan.md b/docs/designs/plan-rollout/ceo-plan.md new file mode 100644 index 0000000000..43517d028b --- /dev/null +++ b/docs/designs/plan-rollout/ceo-plan.md @@ -0,0 +1,424 @@ +--- +status: ACTIVE +skill: /plan-rollout + /spill-check +contribution-target: https://github.com/garrytan/gstack +fork: git@github.com:mastermanas805/gstack.git +generated-by: /plan-ceo-review +generated-on: 2026-04-24 +mode: SCOPE_EXPANSION +--- + +# CEO Plan: /plan-rollout + /spill-check (gstack contribution) + +## The Problem + +LLM coding tools produce humongous diffs that PR reviewers cannot meaningfully +ingest. "Spills" (unrelated changes sneaking in) compound the problem. Reviewers +approve under cognitive load, real bugs ship, rollback is improvised. + +The gstack landscape currently has skills for planning (`/plan-ceo-review`, +`/plan-eng-review`, `/plan-design-review`, `/plan-devex-review`), shipping one PR +(`/ship`), reviewing one diff (`/review`), and post-deploy monitoring (`/canary`, +`/land-and-deploy`). 
There is **no skill** that addresses PR decomposition, spill +detection, or rollout sequencing as first-class deliverables. This is the gap. + +## The User's User + +The PR reviewer at 4pm on a Wednesday. They open a gstack-produced PR and feel +relief, not dread. The body reads: *"180 lines. Read `auth.ts:47` first, then +`middleware.ts`. Depends on PR #412 (merged). Next: PR #414 will add the UI +surface. Rollback: `git revert` + disable `flag.new_auth`."* They review it in +4 minutes, catch a real bug, ship with confidence. + +## Vision + +### 10x Version +An end-to-end reviewer-ergonomics protocol where decomposition declared at +plan-time flows through every downstream skill: `/ship` auto-creates the +stacked PRs with reviewer guides, `/review` verifies each PR stays in its +declared lane, `/land-and-deploy` sequences the rollout, `/canary` watches for +regressions in the declared scope. This skill is the missing spine of gstack's +plan-to-prod pipeline. + +### Platonic Ideal +A lightweight protocol, not a heavy tool. Skill produces two declarative +artifacts (`decomposition.md`, `rollout.md`) informed by a declared architectural +truth (`SYSTEM.md`). Every other gstack skill reads them and gets smarter. + +### Layer-3 Eureka +Conventional stacking tools (Graphite, git-spr, Aviator) assume a human +developer makes decomposition decisions while coding and clean up the mess +after. The AI-coding inversion: **decomposition is declared at plan-time by a +reviewer-aware planning pass, then enforced during code generation.** gstack +already owns plan-time better than anyone else. This skill is the natural next +link. No other ecosystem does it from this direction. 
+ +## Scope Decisions (all accepted under SCOPE EXPANSION) + +| # | Proposal | Effort (h / CC+g) | Decision | Rationale | +|---|----------|--------------------|----------|-----------| +| 1 | Reviewer reading-order guide in PR body | 2h / 10min | ACCEPTED | Massive reviewer-UX win, near-free | +| 2 | ASCII stack map (PR-dep Gantt) in decomposition.md | 3h / 15min | ACCEPTED | Fits gstack diagram-mandatory ethos | +| 3 | Inverse rollback auto-generation in rollout.md | 4h / 20min | ACCEPTED | Production-safety critical, boils the lake | +| 4 | Reviewer time-budget estimate per PR | 1d / 30min | ACCEPTED | Novel; user accepted despite deferral suggestion | +| 5 | `/ship` integration — stack-aware auto-PR creation | 2d / 1h | ACCEPTED | The integration spine; mitigate via opt-in gate | +| 6 | `/review` integration — verify diff stays in declared scope | 1d / 30min | ACCEPTED | Low-risk, closes the loop at review time | +| 7 | Cross-skill discoverability hooks | 1h / 10min | ACCEPTED | Near-free; skill only matters if users find it | +| 8 | SYSTEM.md declarative system map | 2d / 1.5h | ACCEPTED | User-introduced; unlocks graph-aware decomposition | + +### Temporal Decisions (locked) + +| # | Decision | Chosen | +|---|----------|--------| +| 1 | Artifact format | YAML frontmatter + markdown body (matches SKILL.md convention) | +| 2 | `/ship` integration | Extend `/ship` with opt-in stack mode gated on `decomposition.md` existence | +| 3 | `/spill-check` strictness | Adaptive: strict for code, soft for infra/meta files | +| 4 | SYSTEM.md scope | Intra-repo for v1; schema reserves `repo:` and cross-repo fields for v2 | +| 5 | SYSTEM.md scaffolder | Generates `SYSTEM.md.draft`; user reviews + renames before first use | + +## Artifact Specifications + +### `SYSTEM.md` (repo root) — the semantic contract graph + +**Critical principle:** SYSTEM.md declares **role/contract dependencies only** — +the semantic relationships between components that only a human knows. 
Package +dependencies, import graphs, symbol references, and other mechanical coupling +are **discovered by the LLM at runtime** (AST, grep, manifests). They do NOT +belong in SYSTEM.md because they go stale within a week. + +| Kind | Example | Lives where | +|------|---------|-------------| +| Role/contract dep | "auth mints session tokens that middleware enforces; format change without middleware redeploy breaks sessions" | SYSTEM.md (declared) | +| Package/import dep | "`auth.ts` imports `crypto-utils`; `middleware.ts` calls `auth.verify()`" | Discovered by LLM | + +The skill reasons over BOTH graphs jointly: declared contracts give the *why*, +discovered imports give the *what*. When they disagree (e.g., middleware.ts +imports from auth.ts but no contract declared), the skill flags it for the user +to resolve — that's the signal that either a contract is missing from SYSTEM.md +or the import is a layering violation. + +```yaml +--- +components: + - name: + path: + repo: # reserved for v2 multi-repo; optional + role: + owns: [] + contracts: + - with: + nature: + breaks-if: + rollout-edge: # hard = must deploy together; soft = can lag + rollout-order: +--- + +# System Map + + +``` + +**Example:** + +```yaml +components: + - name: auth + path: src/auth + role: authentication + session lifecycle + owns: [user table, session table, JWT minting] + contracts: + - with: middleware + nature: middleware enforces session tokens auth mints + breaks-if: session payload schema changes without middleware redeploy + rollout-edge: hard + - with: api-gateway + nature: gateway expects `req.user` context set by middleware downstream of auth + breaks-if: auth stops populating tenant claims + rollout-edge: soft + rollout-order: 1 + - name: middleware + path: src/middleware + role: request routing + auth enforcement + owns: [request context shape] + contracts: + - with: api-gateway + nature: gateway consumes req.user set by middleware + breaks-if: req.user shape changes + 
rollout-edge: hard + rollout-order: 2 +``` + +### Package/import dependency discovery (LLM responsibility at runtime) + +Everything mechanical is **discovered, not declared**: +- Import graph via AST (`ts-morph`, `tree-sitter`, or existing parsers) +- Package dependencies from `package.json`, `Cargo.toml`, `go.mod`, etc. +- Symbol-level call graph via `grep` + `ripgrep` +- File-touch correlation from recent git history (`git log --name-only`) + +This runs per-invocation. Never cached into SYSTEM.md. Stale package deps in a +declared artifact cause more harm than good. + +### Reconciliation rules (joint reasoning over both graphs) + +When the discovered package graph and the declared contract graph disagree, the +skill surfaces a flag for user resolution. It does not silently pick a side. + +| Discovered | Declared | Signal | +|------------|----------|--------| +| `X` imports from `Y` | No contract between their components | "Layering violation or missing contract — add to SYSTEM.md or refactor." | +| Contract declared | No imports/calls found | "Contract may be stale, or coupling is runtime-only (DB reads, message bus, HTTP). Add a note." | +| Rollout-order says X→Y | X depends on Y at import level | "Order inverted vs. imports. Usually wrong; may be legitimate for types-only imports." | + +### `~/.gstack/projects/$SLUG/decomposition.md` + +```yaml +--- +status: ACTIVE +plan-ref: +generated-on: +total-prs: N +reviewer-time-budget-total-min: +pr-units: + - id: 1 + title: + component: + files: [] + depends-on: [] + reviewer-time-budget-min: + reading-order: [] + rationale: + - id: 2 + title: ... + depends-on: [1] + ... 
+--- + +# Decomposition: + +## Stack Map (ASCII Gantt) + +PR-1 [auth] ████████ +PR-2 [middleware] ────────████████ +PR-3 [gateway] ────────────────████ + +## Per-PR Detail + +``` + +### `~/.gstack/projects/$SLUG/rollout.md` + +```yaml +--- +status: ACTIVE +strategy: +rollout-steps: + - step: 1 + action: + component: + rollback: + verify: + wait: + - step: 2 + ... +flags: + - name: + provider: + default: off + enable-runbook: + kill-switch: +--- + +# Rollout Plan + + +## Rollback Playbook + +``` + +## Skill Files to Create / Modify in Fork + +### New files in `mastermanas805/gstack`: + +``` +plan-rollout/ +└── SKILL.md # main skill file + +spill-check/ +└── SKILL.md # enforcement skill + +lib/plan-rollout/ +├── system-map-parser.ts # parse SYSTEM.md YAML +├── system-map-scaffolder.ts # generate SYSTEM.md.draft +├── decomposition-parser.ts +├── rollout-parser.ts +└── reviewer-time-estimator.ts # LOC + files + complexity → minutes + +test/plan-rollout/ +├── system-map-parser.test.ts +├── scaffolder.test.ts +├── decomposition-roundtrip.test.ts +└── reviewer-time-estimator.test.ts +``` + +### Existing files to modify: + +``` +ship/SKILL.md # add opt-in stack mode gated on decomposition.md +review/SKILL.md # add scope-verification step when decomposition.md exists +plan-ceo-review/SKILL.md # add /plan-rollout to Next Steps — Review Chaining +plan-eng-review/SKILL.md # add /plan-rollout to Next Steps — Review Chaining +docs/skills.md # register new skills +CHANGELOG.md # add entry +README.md # add /plan-rollout + /spill-check to feature list +``` + +## Contribution PR Stack (meta: this skill decomposes its own contribution) + +**The ultimate demo.** Land this skill as the PR stack it is designed to produce: + +- **PR #1 — Foundation.** Adds `SYSTEM.md` schema + parser + scaffolder + tests. + Touches `lib/plan-rollout/system-map-*.ts`, `test/plan-rollout/system-map-*.test.ts`. + Ships standalone; no other skills modified. Reviewer time: ~15 min. 
+ +- **PR #2 — /plan-rollout skill (depends on #1).** Adds the skill, decomposition + and rollout artifact writers, ASCII stack-map renderer, reviewer-time estimator, + inverse rollback generator. Touches `plan-rollout/SKILL.md`, `lib/plan-rollout/*`. + Reviewer time: ~25 min. + +- **PR #3 — /spill-check skill (depends on #1).** Adaptive enforcement. Touches + `spill-check/SKILL.md`, uses the SYSTEM.md parser from PR #1. Reviewer time: ~15 min. + +- **PR #4 — Integration (depends on #2 and #3).** Modifies `/ship`, `/review`, + `/plan-ceo-review`, `/plan-eng-review` to consume the artifacts and surface the + skill in Next Steps. Highest review risk — touches hot paths. Covered by the + existing golden-fixture tests. Reviewer time: ~20 min. + +**Rollout**: v1 ships behind no flag (pure addition; absent `decomposition.md` +means every existing skill behaves identically). `/ship` opt-in mode activates +only when artifact exists — zero regression surface for existing users. + +## Dogfood findings (from 2026-04-25 simulation against honojs/hono issue #4633) + +Simulated the full `/plan-rollout` + `/spill-check` workflow end-to-end on a +real open-source issue. Produced SYSTEM.md, decomposition.md, rollout.md; +implemented PR-1 (171 LOC, 3 files, 86/86 tests passing, zero regressions +across 4 other router implementations). Findings that change v1 scope: + +### Must-fix before v1 ships (add to scope) + +1. **SYSTEM.md `kind` field.** Add `kind: component | leaf-util | types-only`. + Shared utility dirs (e.g., `src/utils/`) don't fit the component schema and + force awkward workarounds. Reconciler ignores leaf-util edges. + +2. **`package-type` field for rollout templating.** `library | service | cli`. + Rollout for an npm library ("publish patch revert") differs materially from + rollout for a service ("coordinated deploy + state restore"). Current + rollout.md template is service-shaped and doesn't fit libraries. 
Add field; + rollout.md generator picks template accordingly. + +3. **Heuristic #8 — shared test fixtures.** Current 7 heuristics don't cover: + "PR unit extends a shared interface; which shared fixtures need updating + and which PR unit owns them?" I caught this mid-implementation; the skill + should prompt proactively during the decomposition ceremony. + +### Should-fix in v1 (quality polish) + +4. **Reviewer-time formula recalibration.** Replace fixed `test_bonus: 5min` + with `test_loc / 30`. Add `change_kind: additive | refactor | mutation` + multiplier (additive reviews faster). **Ship v1 with conservative defaults + and log predicted-vs-actual to analytics from day one** — data compounds. + +5. **Scaffolder effort level.** Current draft is ~10% accurate (every role is + "TODO — inferred from directory: X"). Should aim for 60%: parse top-level + README, `index.ts` exports, `@module` jsdoc, CODEOWNERS. User edits, not + writes from blank. + +### v2 candidates (defer but log) + +6. **`/plan-rollout --trim` mode.** After implementation, offer to drop + declared-but-untouched files. Decomposition tends to over-specify; trim + discipline keeps the artifact honest. + +7. **Dual-location artifacts.** User-dir (`~/.gstack/`) default plus optional + `--also-project-root` flag to write/symlink to repo root for team + visibility and PR reviewers who don't run gstack. + +8. **Rollout pattern library.** The "hard-edge → soft via optionality + + feature detection" move is reusable. Add a pattern library to the rollout + ceremony: optionality cutover, dual-write/dual-read, flag-gated cutover, + shadow-traffic canary. + +### What the dogfood did NOT stress-test + +Hono is a library with additive API change — no migrations, no flags, no +canary. Before shipping v1, run a second dogfood on a **service-shaped +change** (DB migration + feature flag + canary sequence). Good candidates: +open issues in `drizzle-orm`, `prisma`, or any real service repo. 
This will +stress the rollout side of the skill, which the Hono test under-exercised. + +--- + +## Open Questions (for Section 6 / Section 9 of the deep review) + +1. Reviewer-time-budget formula: what LOC-to-minutes coefficient? Needs calibration + data. Proposal: ship with a conservative default (1 min per 20 LOC + 3 min per + file + 5 min if tests present), log predicted-vs-actual to analytics, calibrate + in v2. This IS reasonable premature optimization — the initial coefficients + don't need to be perfect, they need to be directionally useful. + +2. Feature flag detection: string-match on common flag libraries (LaunchDarkly, + Unleash, GrowthBook, env-var patterns) or user declares in rollout.md? Proposal: + both. Auto-detect and prompt user to confirm, defer to user declaration if + conflict. + +3. `/spill-check` infra-file allowlist: what's on it? Proposal: `CLAUDE.md`, + `.gitignore`, `package.json`, `bun.lock`, `yarn.lock`, `package-lock.json`, + `.env.example`, `*.md` docs, CI config. Anything in this list can be touched + without declaration; everything else is strict. + +4. Integration with existing `~/.gstack/projects/$SLUG/` artifacts — does + decomposition.md supersede or compose with CEO plan? Proposal: compose. + decomposition.md references the CEO plan via `plan-ref:` frontmatter. + +5. Review log integration — does `/plan-rollout` add an entry to + `~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl`? Proposal: yes, to surface in + the Review Readiness Dashboard as a non-required entry. 
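
The conservative default proposed in Open Question 1 (1 min per 20 LOC + 3 min per file + 5 min if tests are present) could be sketched as follows. The interface, function name, and the choice to round up are assumptions made for illustration; the plan does not specify rounding or an input shape.

```typescript
// Hypothetical sketch of the proposed default formula; names are illustrative only.
interface PrUnitStats {
  loc: number; // changed lines of code in the PR unit
  files: number; // number of files touched
  hasTests: boolean; // whether the unit includes test changes
}

// Coefficients stated in the proposal.
const LOC_PER_MINUTE = 20;
const MINUTES_PER_FILE = 3;
const TEST_BONUS_MINUTES = 5;

function estimateReviewMinutes(stats: PrUnitStats): number {
  const raw =
    stats.loc / LOC_PER_MINUTE +
    stats.files * MINUTES_PER_FILE +
    (stats.hasTests ? TEST_BONUS_MINUTES : 0);
  // Rounding up is an assumption; the proposal does not say how to round.
  return Math.ceil(raw);
}

// Illustrative input: 171 LOC across 3 files with tests gives 8.55 + 9 + 5 = 22.55.
console.log(estimateReviewMinutes({ loc: 171, files: 3, hasTests: true })); // prints 23
```

Keeping the coefficients as named constants would make the v2 calibration from predicted-vs-actual data a matter of changing three numbers.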
+ +## NOT in Scope (v1) + +- Multi-repo workspaces (SYSTEM.md schema reserved, unused) +- Automatic code-generation constrained to declared scope (this is the natural v2 + — today's approach is post-hoc enforcement via `/spill-check`) +- Integration with Graphite / git-spr / Aviator — gstack produces the + decomposition artifact; external stacking tools can read it +- Reviewer-time-budget calibration based on historical data (needs usage to + collect; v2) +- Cross-model outside-voice on decomposition decisions (Codex consult could + validate "is this a sensible decomposition?" — v2) +- UI for visualizing the stack (ASCII diagram is v1; future could render via + `/design-html`) + +## Next Steps + +1. Complete the 11-section engineering deep review (architecture, error maps, + security, tests, observability, deployment) — run it next, or run + `/plan-eng-review` on this CEO plan. +2. Clone the fork locally: `git clone git@github.com:mastermanas805/gstack.git` +3. Run `bin/dev-setup` to activate dev mode. +4. Implement PR #1 (SYSTEM.md foundation) first; land it; demonstrate value. +5. Stack PR #2, #3, #4 using whatever interim decomposition approach (manual, or + once PR #2 lands, `/plan-rollout` itself). + +## Contribution Strategy Note + +This is a substantial first contribution — 4 PRs, new skills, cross-skill edits. +Mitigations: +- File an issue in `garrytan/gstack` first describing the proposal + linking this + CEO plan. Get directional buy-in before sinking implementation time. +- Offer to start with just PR #1 (SYSTEM.md) as proof-of-concept; a minimal, + standalone, high-value artifact that establishes the schema. If merged, the + rest becomes lower-risk. +- The meta-demo (this skill decomposing its own contribution) is a strong + rhetorical asset — lead with it in the issue/PR description. 
diff --git a/docs/designs/plan-rollout/integration-notes.md b/docs/designs/plan-rollout/integration-notes.md new file mode 100644 index 0000000000..02a56574bf --- /dev/null +++ b/docs/designs/plan-rollout/integration-notes.md @@ -0,0 +1,181 @@ +# Integration notes — changes needed to existing gstack skills + +This document describes modifications to existing gstack skills to consume the +`decomposition.md` and `rollout.md` artifacts produced by `/plan-rollout`. Land +these as PR #4 in the contribution stack (after the two new skills and the +parser have merged). + +## Design principle: zero regression + +Every existing skill must behave **identically** when no `decomposition.md` +exists. Stack-mode behavior is gated on the artifact's presence. Users who +never run `/plan-rollout` see no change. + +--- + +## `/ship` — stack mode (Expansion #5) + +**Gate:** `[ -f "$_ARTIFACTS_DIR/decomposition.md" ]` after Step 1 (state discovery). + +**New step: Step 11.5 — Stack-mode PR creation** (inserted before existing +Step 12 — VERSION bump). + +```bash +if [ -f "$_ARTIFACTS_DIR/decomposition.md" ]; then + # Parse decomposition, determine which PR unit this push represents. + # Heuristic: git log of this branch since its parent stack-unit's merge base. + _UNIT_ID=$(~/.claude/skills/gstack/lib/plan-rollout/infer-current-unit \ + --decomposition "$_ARTIFACTS_DIR/decomposition.md" \ + --branch "$_BRANCH" --diff-base "$(determine-parent-base)") + + # Enforce: if /spill-check would fail, halt with clear explanation. 
+ ~/.claude/skills/gstack/bin/spill-check-gate --unit "$_UNIT_ID" || exit 1 + + # Generate the PR body from the decomposition entry for this unit: + # - Title (conventional-commits from decomposition) + # - Reading-order block + # - Reviewer time-budget estimate + # - Dependency note ("Depends on PR #N (merged/open)") + # - Next-in-stack note ("Followed by PR #M (title)") + # - Rollout strategy summary (links to rollout.md) + _PR_BODY=$(~/.claude/skills/gstack/lib/plan-rollout/pr-body-for-unit \ + --decomposition "$_ARTIFACTS_DIR/decomposition.md" \ + --rollout "$_ARTIFACTS_DIR/rollout.md" \ + --unit "$_UNIT_ID") +fi +``` + +The existing `gh pr create` (Step 15-ish) uses `$_PR_BODY` when set, falling +through to the current default behavior when unset. + +### Stacking mechanic + +v1 does NOT implement native stacked-PR mechanics (rewriting base-branch fields +for each PR as the parent merges). Instead: + +- First PR in the stack: base = `main` / default branch (standard) +- Subsequent PRs: base = the previous unit's branch, auto-filled +- User is responsible for using Graphite / git-spr / manual rebasing for the + stack mechanics + +This keeps `/ship` from becoming a stacking tool. The reviewer-ergonomics win +(reading order, time budget, dependency narration) is orthogonal to the +stack-rebase mechanics and more valuable. + +v2 could add native stacking; scope for a future PR. + +--- + +## `/review` — scope verification (Expansion #6) + +**Gate:** same — `decomposition.md` exists. + +**New step: before the existing review runs**, verify scope: + +```bash +if [ -f "$_ARTIFACTS_DIR/decomposition.md" ]; then + _UNIT_ID=$(~/.claude/skills/gstack/lib/plan-rollout/infer-current-unit ...) + + # Run spill classifier in gate mode (non-interactive). 
+ _SCOPE_REPORT=$(~/.claude/skills/gstack/lib/plan-rollout/spill-classifier \ + --decomposition "$_ARTIFACTS_DIR/decomposition.md" \ + --current-unit "$_UNIT_ID" \ + --files "$(git diff --name-only BASE HEAD)" \ + --format human) + + # If hard spills present, surface them as Section 0 of the review output. + echo "## Scope Verification" + echo "$_SCOPE_REPORT" +fi +``` + +Hard spills become P1 findings in the existing `/review` output. Soft spills +become informational notes. This closes the loop: `/spill-check` catches +spills during implementation; `/review` catches any that made it to PR. + +--- + +## `/plan-ceo-review` and `/plan-eng-review` — Next Steps hooks (Expansion #7) + +In the existing "Next Steps — Review Chaining" section, add a line: + +```markdown +**Recommend /plan-rollout if the accepted plan represents more than one logical +unit of work.** Signs: plan touches >5 files, spans multiple SYSTEM.md +components, includes both migrations and feature code, or the CEO review +accepted scope expansions. /plan-rollout decomposes the plan into a reviewable +PR stack and produces the rollout artifact, saving you from either a +humongous single PR or improvised stacking. +``` + +Add `/plan-rollout` to the AskUserQuestion options in that section: + +``` +- A) Run /plan-eng-review next (required gate) +- B) Run /plan-rollout next (decompose into PR stack + rollout plan) +- C) Run /plan-design-review next (only if UI scope detected) +- D) Skip — I'll handle reviews manually +``` + +--- + +## `/land-and-deploy` — rollout-aware deployment (stretch) + +Not in v1 scope, but the hook point is obvious: `/land-and-deploy` reads +`rollout.md` for the step sequence, executes each step with its verify block, +and uses the auto-generated rollback lines if a step fails. Defer to v2 to +keep the initial PR stack tractable. + +--- + +## `/canary` — scope-aware regression watching (stretch) + +Also v2. 
`/canary` could read the `verify:` metrics from `rollout.md` and +watch them specifically post-deploy, alerting on regression in the declared +scope while ignoring unrelated noise. + +--- + +## Test plan (lives in `test/plan-rollout/`) + +``` +test/plan-rollout/ +├── system-map-parser.test.ts # YAML parsing, validation, component lookup +├── reconcile.test.ts # the 3 reconciliation categories on fixtures +├── scaffolder.test.ts # draft generation from a test repo fixture +├── decomposition-roundtrip.test.ts # write → read → write produces identical output +├── reviewer-time-estimator.test.ts # formula produces expected ranges +└── spill-classifier.test.ts # hard/soft classification + allowlist logic +``` + +Fixtures in `test/fixtures/plan-rollout/`: +- `system-map-minimal.yaml` — single component +- `system-map-three-layers.yaml` — auth + middleware + gateway example +- `system-map-invalid-*.yaml` — each validation error path +- `import-graph-sample.json` — synthetic edges for reconcile tests +- `decomposition-three-prs.md` — golden output + +Test approach matches existing gstack skills (see `test/fixtures/golden/` in +the repo for the pattern). + +--- + +## Rollout of the contribution itself + +As described in CEO-PLAN.md, this lands as a 4-PR stack (the skill +decomposing its own contribution): + +1. **PR #1 — Foundation.** `lib/plan-rollout/system-map-*.ts` + tests + + `docs/SYSTEM-MD.md`. Standalone; no skills modified. ~15 min review. +2. **PR #2 — /plan-rollout skill.** `plan-rollout/SKILL.md` + remaining + `lib/plan-rollout/*.ts` (decomposition writer, rollout writer, reviewer-time + estimator, inverse-rollback generator). ~25 min review. +3. **PR #3 — /spill-check skill.** `spill-check/SKILL.md` + spill classifier. + Independent of PR #2. ~15 min review. +4. **PR #4 — Integration.** Modify `ship/SKILL.md`, `review/SKILL.md`, + `plan-ceo-review/SKILL.md`, `plan-eng-review/SKILL.md`. Hot-path risk; + covered by golden-fixture tests. ~20 min review. 
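
The merge order of this stack follows from the dependency edges stated in the CEO plan (#2 and #3 depend on #1; #4 depends on #2 and #3). A minimal sketch of deriving that order from `depends-on` edges is shown here; the `PrUnit` shape and `mergeOrder` function are hypothetical and only mirror the `id` and `depends-on` fields of `decomposition.md`.

```typescript
// Hypothetical sketch; PrUnit mirrors decomposition.md fields but is not the real parser output.
interface PrUnit {
  id: number;
  dependsOn: number[];
}

// Kahn's algorithm: returns a valid merge order, or throws on a cycle or unknown dependency.
function mergeOrder(units: PrUnit[]): number[] {
  const unmet = new Map<number, number>(); // unit id -> count of unmerged dependencies
  const dependents = new Map<number, number[]>(); // unit id -> units waiting on it
  for (const unit of units) {
    unmet.set(unit.id, unit.dependsOn.length);
    dependents.set(unit.id, []);
  }
  for (const unit of units) {
    for (const dep of unit.dependsOn) {
      const waiting = dependents.get(dep);
      if (waiting === undefined) {
        throw new Error(`unit ${unit.id} depends on unknown unit ${dep}`);
      }
      waiting.push(unit.id);
    }
  }
  // Units with no dependencies can merge first.
  const ready = units.filter((unit) => unit.dependsOn.length === 0).map((unit) => unit.id);
  const order: number[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as number;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const left = (unmet.get(next) ?? 0) - 1;
      unmet.set(next, left);
      if (left === 0) ready.push(next);
    }
  }
  if (order.length !== units.length) {
    throw new Error("dependency cycle in PR stack");
  }
  return order;
}

// The contribution stack as described in the CEO plan.
const contributionStack: PrUnit[] = [
  { id: 1, dependsOn: [] },
  { id: 2, dependsOn: [1] },
  { id: 3, dependsOn: [1] },
  { id: 4, dependsOn: [2, 3] },
];
console.log(mergeOrder(contributionStack)); // prints [ 1, 2, 3, 4 ]
```

Because PR #2 and PR #3 share only PR #1 as a dependency, either could merge before the other; the sketch simply keeps input order among ready units.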
+ +File the issue first: +https://github.com/garrytan/gstack/issues/new — link CEO-PLAN.md, propose the +stack, ask for directional buy-in before sinking implementation time. diff --git a/docs/designs/plan-rollout/plan-rollout.skill-draft.md b/docs/designs/plan-rollout/plan-rollout.skill-draft.md new file mode 100644 index 0000000000..fdcf58862b --- /dev/null +++ b/docs/designs/plan-rollout/plan-rollout.skill-draft.md @@ -0,0 +1,383 @@ +--- +name: plan-rollout +preamble-tier: 4 +version: 0.1.0 +description: | + Decompose a large change into a reviewable PR stack with a rollout plan. + Reads SYSTEM.md (semantic contract graph) + the discovered package graph, + produces decomposition.md and rollout.md — consumed downstream by /ship, + /review, /spill-check, /land-and-deploy. Use when you have an approved plan + (from /plan-eng-review or otherwise) and you're about to implement. Triggers: + "plan the rollout", "decompose this", "break into PRs", "stack the PRs", + "plan the shipping order". (gstack) +allowed-tools: + - Bash + - Read + - Write + - Edit + - Grep + - Glob + - Agent + - AskUserQuestion + - WebSearch +triggers: + - plan the rollout + - decompose this + - break into prs + - stack the prs + - plan the shipping order + - rollout plan +--- + + + +## Step 0: Detect platform and base branch + +Same as /ship and /plan-eng-review. Use the shared detection block. + +## Overview + +This skill sits between *plan approved* and *code written*. It answers: "how do I +decompose this work into a reviewable PR stack, and in what order should each unit +ship to production?" + +The output is two artifacts (YAML frontmatter + markdown body): + +- `~/.gstack/projects/$SLUG/decomposition.md` — the PR stack, one unit per PR, + with files declared, reviewer reading-order, dependency edges, reviewer + time-budget estimate, and an ASCII Gantt-style stack map. 
+- `~/.gstack/projects/$SLUG/rollout.md` — rollout strategy (flag / canary / + migration-first / big-bang), step-by-step sequence with inverse rollback + lines auto-generated, and the kill-switch runbook if flags are involved. + +Downstream: +- `/spill-check` reads the decomposition to flag scope creep during + implementation. +- `/ship` enters **stack mode** when `decomposition.md` exists, auto-creating + the PR stack with reviewer guides. +- `/review` verifies the PR diff stays within its declared PR unit. +- `/land-and-deploy` sequences the rollout steps. + +## Prerequisites + +Before running, confirm: + +1. A plan file exists in the conversation or on disk (from `/plan-eng-review`, + `/plan-ceo-review`, or a user-authored plan). +2. The repo has `SYSTEM.md` at the root. If not, offer to scaffold (Step 2). +3. No active `decomposition.md` exists (if yes, prompt to revise or start fresh). + +## Step 1: Discover inputs + +Collect what you need to decompose: + +```bash +# Current state +_BRANCH=$(git branch --show-current) +eval "$(~/.claude/skills/gstack/bin/gstack-slug)" +_ARTIFACTS_DIR="${GSTACK_HOME:-$HOME/.gstack}/projects/$SLUG" + +# Source plan (from /plan-eng-review or /plan-ceo-review output) +_CEO_PLAN=$(ls -t "$_ARTIFACTS_DIR/ceo-plans"/*.md 2>/dev/null | head -1) +_ENG_PLAN=$(ls -t "$_ARTIFACTS_DIR/eng-plans"/*.md 2>/dev/null | head -1) + +# SYSTEM.md presence +_SYSTEM_MD="$(git rev-parse --show-toplevel)/SYSTEM.md" +[ -f "$_SYSTEM_MD" ] && echo "SYSTEM_MD: present" || echo "SYSTEM_MD: missing" + +# Existing decomposition (revision vs fresh) +_EXISTING_DECOMP="$_ARTIFACTS_DIR/decomposition.md" +[ -f "$_EXISTING_DECOMP" ] && echo "DECOMP: exists" || echo "DECOMP: fresh" + +# Discovered change surface — either pending diff or plan-declared files +_DIFF_FILES=$(git diff --name-only "$(git merge-base HEAD origin/main 2>/dev/null || echo HEAD)" 2>/dev/null || true) +``` + +Read the plan file(s) to extract: +- Feature summary / user-facing outcome +- Declared 
scope (files, components, user flows) +- Accepted expansions from CEO review (if any) +- Deferred / out-of-scope items + +## Step 2: SYSTEM.md scaffolder (only if missing) + +If `SYSTEM.md` is missing, offer to scaffold. Never write it authoritatively +without user review. + +Use AskUserQuestion: + +> "No SYSTEM.md found at the repo root. /plan-rollout needs it to reason about +> role-level contracts between components. I can scan your repo and generate a +> draft based on top-level directories, package.json workspaces, CODEOWNERS +> (if present), and import-graph clustering. You review, edit, rename from +> `.draft` to `SYSTEM.md`, commit. Takes ~5 min of your time." +> +> RECOMMENDATION: Choose A — SYSTEM.md is the semantic spine this skill reasons +> over. Without it, decomposition falls back to file-level heuristics (worse). +> Completeness: A=9/10, B=5/10, C=3/10. + +Options: +- A) Scaffold SYSTEM.md.draft now (recommended) +- B) Let me hand-write SYSTEM.md, then re-run /plan-rollout +- C) Run /plan-rollout without SYSTEM.md (degraded mode — file-level only) + +If A: run the scaffolder: + +```bash +~/.claude/skills/gstack/lib/plan-rollout/system-map-scaffolder \ + --repo "$(git rev-parse --show-toplevel)" \ + --output "$(git rev-parse --show-toplevel)/SYSTEM.md.draft" +``` + +The scaffolder writes components with: +- `name`, `path`, `role` (inferred from directory name + README excerpt, TODO marker if unclear) +- `owns`: empty with TODO marker +- `contracts`: empty with TODO marker +- `rollout-order`: empty with TODO marker + +Tell the user: *"Draft written to SYSTEM.md.draft. Review the TODO markers, +fill in role + contracts + rollout-order, rename to SYSTEM.md, commit, then +re-run /plan-rollout."* + +**STOP.** Wait for user to complete the edit-and-rename cycle. Do not proceed +to Step 3 without a present-and-valid SYSTEM.md (unless user chose C). + +If B: tell the user where the schema doc lives +(`~/.claude/skills/gstack/docs/SYSTEM-MD.md`) and stop.
+ +If C: degraded mode — proceed to Step 3 with no component graph. `/spill-check` +can still run on file-level declarations. Note degradation in the output. + +## Step 3: Reconcile declared contracts with discovered imports + +Before decomposing, run reconciliation between SYSTEM.md (declared contracts) +and the package/import graph (discovered at runtime). + +```bash +~/.claude/skills/gstack/lib/plan-rollout/system-map-reconcile \ + --system-md "$_SYSTEM_MD" \ + --repo "$(git rev-parse --show-toplevel)" \ + --format json > /tmp/reconcile-$$.json +``` + +The reconcile tool surfaces three categories of flag: + +1. **Import without declared contract.** File X imports from file Y, but their + components have no contract in SYSTEM.md. +2. **Contract without supporting imports.** Components X and Y have a contract + declared but no code-level coupling was found (may be runtime-only: DB reads, + message bus, HTTP, filesystem). +3. **Rollout-order inversion.** Declared rollout order contradicts the import + direction. + +For each flag, use AskUserQuestion: + +> "Reconciliation flag: [category]. [Concrete example with file paths and +> component names]. Is this a declared contract gap, a layering violation, or +> noise?" +> +> RECOMMENDATION: [choose based on category heuristic] + +Options: +- A) Add missing contract to SYSTEM.md now +- B) This is a layering violation — add to TODOS.md and proceed +- C) Noise / runtime-only coupling — suppress this flag and proceed +- D) Runtime-only coupling — add a contract with `note: runtime-only` + +Batch up to 4 flags per question call for efficiency. If there are >12 flags, +present the top 8 sorted by severity (hard-edge contract gaps first). + +**STOP** after the reconciliation pass completes. Wait for all user resolutions. + +## Step 4: Decomposition ceremony + +Now the core work. Given the plan + SYSTEM.md + discovered file set, propose a +PR decomposition. + +### 4a. 
Propose units + +For each proposed PR unit, output: + +``` +PR-UNIT #N: + title: + component: + files: + depends-on: + rationale: + reading-order: + reviewer-mins: +``` + +### 4b. Unit-splitting heuristics (apply in priority order) + +1. **Component boundary.** Files in different SYSTEM.md components go in + different PR units unless a single indivisible user-facing outcome requires + them together. +2. **Migration-first.** DB migrations always ship as PR #1 (or earliest), + separate from code that reads the new schema. +3. **Interface-first.** Types, interfaces, and schemas go in an early PR; their + implementers come after. +4. **Pure additions first, mutations later.** New code before edits to existing + code when possible. +5. **Tests travel with their code.** Never a tests-only PR unless refactoring + test infrastructure. Reviewers evaluate code + test jointly. +6. **Flag-gate before flag-flip.** Introduce a feature flag (off) as one PR; + enable / roll out as a separate operational step, not a PR. +7. **Reviewer-budget cap.** No single PR unit exceeds 30 minutes of estimated + review time. If it does, split further. + +### 4c. Reviewer time-budget estimator + +For each PR unit: + +``` +reviewer_mins = base + (loc / 20) + (files * 3) + test_bonus + complexity_bonus + +base = 2 minutes +loc / 20 = 1 minute per 20 lines changed +files * 3 = 3 minutes per file touched (context-switching cost) +test_bonus = 5 minutes if PR contains tests (good — reviewers read them) +complexity_bonus = cyclomatic complexity delta × 2, capped at 10 +``` + +v1 coefficients are a conservative default. Skill logs +`predicted_vs_actual_reviewer_time` to analytics so we can calibrate in v2 +against real data. + +### 4d. Present decomposition to user + +AskUserQuestion (one question): + +> "Proposed PR stack: [ASCII Gantt-style map]. Total: N PRs, estimated total +> reviewer time: M minutes. Does this decomposition make sense?" +> +> RECOMMENDATION: Confirm the decomposition. 
If any unit feels wrong, pick B +> and we'll iterate. + +Options: +- A) Confirm — write decomposition.md and continue to rollout planning +- B) Revise — tell me what to split, merge, reorder, or re-scope +- C) Split further — every unit over 15 minutes becomes two +- D) Abort — this plan isn't ready for decomposition yet + +If B: collect the user's guidance and re-propose. Loop until A or D. + +## Step 5: Rollout ceremony + +For each PR unit (or the stack as a whole), plan the rollout. + +### 5a. Strategy selection + +AskUserQuestion (one question per non-trivial PR unit, or one for the stack if +strategy is uniform): + +> "Rollout strategy for [PR unit / whole stack]?" +> +> RECOMMENDATION: Depends on change surface. For code behind existing tested +> paths: big-bang. For user-visible features: flag. For data model changes: +> migration-first. + +Options: +- A) Feature flag — ship behind a flag, enable in rollout step +- B) Canary — deploy to N% of traffic, watch, ramp +- C) Migration-first — DB migration as step 1, code after +- D) Big-bang — merge, deploy, done (only for low-risk changes) + +### 5b. Inverse rollback auto-generation + +For each rollout step, auto-generate its inverse: + +| Forward action | Auto-generated rollback | +|----------------|--------------------------| +| Deploy binary vN | Re-deploy binary v(N-1) | +| Run migration M-up | Run migration M-down | +| Enable flag F | Disable flag F + clear cache | +| Ramp canary to 50% | Ramp canary to 0% | +| Update config key K to value V | Update config key K to previous value | + +If a step's rollback is non-trivial (e.g., migration is non-reversible), flag +it loudly in the rollout.md as `rollback: MANUAL — see runbook ` and +require the user to specify the manual procedure. + +### 5c. Verify step + +For each rollout step, ask: + +> "What metric / dashboard tells you this step succeeded?" + +Store as `verify:` in the step. `/canary` will consume this post-deploy. 
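The forward/rollback table in 5b could be implemented roughly as follows. This is a minimal sketch, not the planned `inverse-rollback generator`: the `ForwardStep` shapes, `inverseOf`, and `rollbackPlan` are hypothetical names, and running rollbacks in reverse order of the forward steps is an assumption this draft does not state.

```typescript
// Hypothetical step shapes: the input format of rollout-writer is not
// specified in this draft, so every name below is illustrative.
type ForwardStep =
  | { kind: "deploy"; version: number }
  | { kind: "migrate"; migration: string; reversible: boolean }
  | { kind: "enable-flag"; flag: string }
  | { kind: "ramp-canary"; percent: number }
  | { kind: "set-config"; key: string; value: string; previous: string };

interface RollbackLine {
  rollback: string;
  manual: boolean; // true means the user must supply the procedure
}

// Mirrors the forward/rollback table row by row.
function inverseOf(step: ForwardStep): RollbackLine {
  switch (step.kind) {
    case "deploy":
      return { rollback: `Re-deploy binary v${step.version - 1}`, manual: false };
    case "migrate":
      // Non-reversible migrations are never auto-inverted.
      return step.reversible
        ? { rollback: `Run migration ${step.migration}-down`, manual: false }
        : { rollback: "MANUAL", manual: true };
    case "enable-flag":
      return { rollback: `Disable flag ${step.flag} + clear cache`, manual: false };
    case "ramp-canary":
      return { rollback: "Ramp canary to 0%", manual: false };
    case "set-config":
      return { rollback: `Update config key ${step.key} to ${step.previous}`, manual: false };
  }
}

// Assumption: rollbacks are applied in reverse order of the forward steps.
function rollbackPlan(steps: ForwardStep[]): RollbackLine[] {
  return steps.map(inverseOf).reverse();
}
```

Modelling the steps as a discriminated union means the compiler rejects a new step kind until someone decides what its inverse is, which fits the "flag it loudly" rule for non-trivial rollbacks.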
+ +## Step 6: Write artifacts + +```bash +~/.claude/skills/gstack/lib/plan-rollout/decomposition-writer \ + --output "$_ARTIFACTS_DIR/decomposition.md" \ + --units-json /tmp/units-$$.json + +~/.claude/skills/gstack/lib/plan-rollout/rollout-writer \ + --output "$_ARTIFACTS_DIR/rollout.md" \ + --steps-json /tmp/rollout-$$.json +``` + +Both writers produce YAML frontmatter + human-readable markdown with the ASCII +Gantt diagram inline. + +## Step 7: Review log + next-steps recommendation + +Log to the review log for the Review Readiness Dashboard: + +```bash +~/.claude/skills/gstack/bin/gstack-review-log "$(jq -n \ + --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \ + --arg commit "$(git rev-parse --short HEAD)" \ + --argjson units "$N_UNITS" \ + --argjson total_mins "$TOTAL_MINS" \ + '{skill:"plan-rollout", timestamp:$ts, status:"clean", + pr_units:$units, total_reviewer_mins:$total_mins, commit:$commit}')" +``` + +Recommend next steps via AskUserQuestion: + +Options: +- A) Start implementing PR #1 now — I'll also enable /spill-check monitoring +- B) Review the decomposition.md and rollout.md first, then start +- C) Run /plan-design-review on the stack (if any unit has UI scope) + +## Completion Summary + +``` ++================================================================+ +| /plan-rollout — COMPLETION SUMMARY | ++================================================================+ +| SYSTEM.md | present / scaffolded / skipped (degraded)| +| Reconciliation | N flags, N resolved, N suppressed | +| PR units | N | +| Total reviewer mins | M | +| Rollout strategy | flag / canary / migration-first / big-bang| +| Rollout steps | N with auto-rollback, K manual | +| Artifacts | decomposition.md, rollout.md | +| Next | start PR #1 / review artifacts / design | ++================================================================+ +``` + +## Plan File Review Report + +Same pattern as /plan-ceo-review: update `## GSTACK REVIEW REPORT` in the plan +file if present. 
+ +## Capture Learnings + +Log non-obvious decomposition patterns observed during the session, especially +when user guidance overrode a default heuristic. These compound over sessions. + +```bash +~/.claude/skills/gstack/bin/gstack-learnings-log '{"skill":"plan-rollout",...}' +``` + + diff --git a/docs/designs/plan-rollout/spill-check.skill-draft.md b/docs/designs/plan-rollout/spill-check.skill-draft.md new file mode 100644 index 0000000000..f516a3b006 --- /dev/null +++ b/docs/designs/plan-rollout/spill-check.skill-draft.md @@ -0,0 +1,209 @@ +--- +name: spill-check +preamble-tier: 3 +version: 0.1.0 +description: | + Detect scope creep mid-implementation. Compares the current diff against + the declared PR unit in decomposition.md (from /plan-rollout). Flags + undeclared files as spills. Adaptive: strict for code, soft for infra/meta + files (CLAUDE.md, package.json, bun.lock, CI configs). Triggers: "check for + spills", "am I in scope", "verify my diff", "spill check". Can also run as + a pre-ship gate; /ship calls this automatically when a decomposition.md + exists. (gstack) +allowed-tools: + - Bash + - Read + - Edit + - Grep + - Glob + - AskUserQuestion +triggers: + - check for spills + - spill check + - am i in scope + - verify my diff + - scope check +--- + + + +## Step 0: Detect platform and base branch + +Same as /ship. + +## Overview + +`/spill-check` answers: "am I still in scope for the PR unit I'm currently +implementing?" It reads the active `decomposition.md`, infers which PR unit the +user is working on, compares files touched vs declared, and reports spills. 
+ +`/spill-check` reports three adaptive categories (the third is informational, not a spill): + +| Category | Example | Default action | +|----------|---------|----------------| +| Hard spill (code) | `src/billing/stripe.ts` touched, not declared | **Block** — prompt user to resolve | +| Soft spill (infra/meta) | `CLAUDE.md`, `package.json`, `bun.lock` touched | **Warn** — allow, note in output | +| Declared but untouched | Expected file not modified yet | **Info** — maybe unfinished; not a spill | + +## Step 1: Discover state + +```bash +_BRANCH=$(git branch --show-current) +eval "$(~/.claude/skills/gstack/bin/gstack-slug)" +_ARTIFACTS_DIR="${GSTACK_HOME:-$HOME/.gstack}/projects/$SLUG" +_DECOMP="$_ARTIFACTS_DIR/decomposition.md" + +if [ ! -f "$_DECOMP" ]; then + echo "No decomposition.md found. Run /plan-rollout first or invoke without scope enforcement." + exit 0 +fi + +# Which PR unit is the user working on? +# Heuristic 1: branch name contains a unit ID (e.g., feat/auth-pr1, feat/pr2-middleware) +# Heuristic 2: match diff against declared files for each unit, pick best +# Fallback: ask the user explicitly (see the question below) +_CURRENT_UNIT_ID=$(~/.claude/skills/gstack/lib/plan-rollout/infer-current-unit \ + --decomposition "$_DECOMP" --branch "$_BRANCH" --diff-base "origin/main") +``` + +If `_CURRENT_UNIT_ID` is ambiguous, use AskUserQuestion: + +> "Which PR unit are you currently working on? /spill-check needs to know to +> compare your diff against the right declared scope." + +Options are listed from the decomposition.md (one per unit, labeled +`[unit-id] title`).
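The branch-name and diff-overlap heuristics in Step 1 might look something like the sketch below. The `PrUnit` shape and `inferCurrentUnit` are hypothetical; the real `infer-current-unit` helper is not specified in this draft. Returning `null` stands for "ambiguous", which is the case where the skill asks the user.

```typescript
// Hypothetical view of one unit from decomposition.md.
interface PrUnit {
  id: string;
  title: string;
  files: string[];
}

// Returns the inferred unit id, or null when the answer is ambiguous
// and the caller should fall back to asking the user.
function inferCurrentUnit(
  units: PrUnit[],
  branch: string,
  diffFiles: string[],
): string | null {
  // Branch-name heuristic: accept only if exactly one unit id appears.
  // Substring matching means "pr1" also matches "pr10"; in that case
  // two units match and we fall through rather than guess.
  const lowered = branch.toLowerCase();
  const byBranch = units.filter((u) => lowered.includes(u.id.toLowerCase()));
  if (byBranch.length === 1) return byBranch[0].id;

  // Diff-overlap heuristic: count touched files each unit declares.
  const touched = new Set(diffFiles);
  const scored = units
    .map((u) => ({ id: u.id, hits: u.files.filter((f) => touched.has(f)).length }))
    .sort((a, b) => b.hits - a.hits);

  if (scored.length === 0 || scored[0].hits === 0) return null; // no evidence
  if (scored.length > 1 && scored[1].hits === scored[0].hits) return null; // tie
  return scored[0].id;
}
```

Treating ties and zero-overlap diffs as ambiguous keeps the helper conservative: a wrong guess here would compare the diff against the wrong declared scope and produce misleading spill reports.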
+ +## Step 2: Compute the diff and classify + +```bash +_DIFF_FILES=$(git diff --name-only \ + "$(git merge-base HEAD origin/main)" HEAD --diff-filter=ACMR) + +~/.claude/skills/gstack/lib/plan-rollout/spill-classifier \ + --decomposition "$_DECOMP" \ + --current-unit "$_CURRENT_UNIT_ID" \ + --files "$_DIFF_FILES" \ + --format json > /tmp/spill-$$.json +``` + +Classifier output (JSON): + +```json +{ + "in-scope": ["src/auth/session.ts", "src/auth/jwt.ts"], + "hard-spills": ["src/billing/stripe.ts"], + "soft-spills": ["CLAUDE.md", "package.json"], + "declared-untouched": ["src/auth/tests/session.test.ts"] +} +``` + +## Step 3: The infra-file allowlist + +Soft-spill allowlist (touchable without declaration): + +- Root: `CLAUDE.md`, `.gitignore`, `.editorconfig`, `.prettierrc*`, `.eslintrc*`, + `README.md`, `CHANGELOG.md`, `LICENSE`, `.env.example`, `VERSION` +- Package: `package.json`, `bun.lock`, `yarn.lock`, `package-lock.json`, + `Cargo.toml`, `Cargo.lock`, `go.mod`, `go.sum`, `requirements.txt`, + `poetry.lock`, `Gemfile`, `Gemfile.lock` +- CI: `.github/**`, `.gitlab-ci.yml`, `.circleci/**`, `azure-pipelines.yml` +- Docs: `docs/**/*.md` +- gstack artifacts: `.gstack/**`, `.claude/skills/**` (if vendored) + +Anything not matching the allowlist is a hard spill. + +Users can extend the allowlist per-project in +`.gstack/spill-check.yml`: + +```yaml +soft-spill-allowlist: + - "scripts/generate-*.sh" + - "migrations/timestamps-only/*.txt" +``` + +## Step 4: Report and resolve + +If no spills: print success, exit 0. + +If soft spills only: print a one-line note per file, exit 0 (advisory). + +If hard spills present: use AskUserQuestion per spill (batch up to 4): + +> "Hard spill detected: `` was modified but is not declared for PR unit +> `` (). This is out-of-scope code that will confuse the +> reviewer and may belong in a different PR unit." +> +> RECOMMENDATION: Choose A (carve) if the change is unrelated to the current +> unit's purpose. 
Choose B (extend) if it's actually part of this unit and +> decomposition.md needs updating. Choose C (revert) if the change isn't +> needed at all. + +Options: +- A) Carve this file to a separate branch (I'll stash the diff and leave a + TODO) — recommended when the change is genuinely unrelated +- B) Extend decomposition.md to add this file to the current PR unit — only if + the file genuinely belongs to this unit +- C) Revert the change to this file — if it shouldn't be in any PR +- D) Add a project-level soft-spill rule for this path (never flag again) + +For A (carve): + +```bash +# Stash just this file's diff into a named branch +~/.claude/skills/gstack/lib/plan-rollout/carve-spill \ + --file "" \ + --source-branch "$_BRANCH" \ + --target-branch "spill/-$(date +%s)" +``` + +The `carve-spill` helper: +1. Stashes the file's uncommitted changes +2. Resets the file to its base-branch state in the current branch +3. Creates a new branch from base, applies the stash, commits +4. Prints the new branch name for the user to ship later + +For B (extend): + +```bash +~/.claude/skills/gstack/lib/plan-rollout/decomposition-extend \ + --decomposition "$_DECOMP" \ + --unit "$_CURRENT_UNIT_ID" \ + --add-file "" \ + --reason "" +``` + +Rewrites decomposition.md adding the file to the declared unit with a note +in the `extended-on:` field for audit. + +## Step 5: Optional — run as a pre-ship gate + +When called by `/ship` in stack mode (automatic, not user-invoked): + +- Exit 0: no hard spills, allow /ship to proceed +- Exit 1: hard spills unresolved — /ship halts, user must re-run /spill-check + interactively + +In gate mode, no AskUserQuestion; just report and exit with the right code. 
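The classification from Steps 2 and 3 and the gate-mode exit codes above can be sketched together. This is an illustration under assumptions, not the planned `spill-classifier`: `globToRegExp`, `classify`, and `gateExitCode` are hypothetical names, the report uses camelCase keys instead of the kebab-case JSON shown earlier, and the glob support covers only `*` and `**` matched against the full repo-relative path.

```typescript
interface SpillReport {
  inScope: string[];
  hardSpills: string[];
  softSpills: string[];
  declaredUntouched: string[];
}

// Minimal glob support: "*" stays within one path segment, "**/" spans
// zero or more directories, a trailing "**" matches the rest of the path.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      if (glob[i + 2] === "/") {
        re += "(?:.*/)?";
        i += 2; // consume the second "*" and the "/"
      } else {
        re += ".*";
        i += 1; // consume the second "*"
      }
    } else if (ch === "*") {
      re += "[^/]*";
    } else {
      re += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

function classify(
  declared: string[],
  touched: string[],
  softAllowlist: string[],
): SpillReport {
  const declaredSet = new Set(declared);
  const touchedSet = new Set(touched);
  const soft = softAllowlist.map(globToRegExp);
  const report: SpillReport = {
    inScope: [],
    hardSpills: [],
    softSpills: [],
    declaredUntouched: declared.filter((f) => !touchedSet.has(f)),
  };
  for (const file of touched) {
    if (declaredSet.has(file)) report.inScope.push(file);
    else if (soft.some((re) => re.test(file))) report.softSpills.push(file);
    else report.hardSpills.push(file); // undeclared and not allowlisted
  }
  return report;
}

// Gate mode: only unresolved hard spills halt /ship.
function gateExitCode(report: SpillReport): 0 | 1 {
  return report.hardSpills.length > 0 ? 1 : 0;
}
```

Declared files are checked before the allowlist, so a file such as `package.json` that a unit explicitly declares counts as in scope rather than as a soft spill.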
+ +## Completion Summary + +``` ++=============================================================+ +| /spill-check — COMPLETION SUMMARY | ++=============================================================+ +| PR unit inferred | () | +| Files in scope | N | +| Hard spills | N (N carved, N extended, N reverted) | +| Soft spills (warned) | N | +| Declared untouched | N (may be incomplete work) | +| Status | CLEAN / RESOLVED / UNRESOLVED | ++=============================================================+ +``` + +## Learnings + +Log any soft-spill allowlist additions the user accepts — these are +project-specific knowledge that future sessions benefit from knowing. + +<!-- TELEMETRY FOOTER: auto-generated --> diff --git a/docs/designs/plan-rollout/system-map-parser.ts b/docs/designs/plan-rollout/system-map-parser.ts new file mode 100644 index 0000000000..f1f3986f89 --- /dev/null +++ b/docs/designs/plan-rollout/system-map-parser.ts @@ -0,0 +1,251 @@ +// SYSTEM.md parser + reconciler. +// +// Parses the YAML frontmatter block from SYSTEM.md, validates schema, builds +// the contract graph. The reconcile() function is the other half: given a +// parsed SystemMap and a discovered import graph, produces a list of flags +// for the skill to surface to the user. +// +// This is the v1 stub. Public API is stable; internals will grow. 
+ +import { readFileSync } from "node:fs"; +import YAML from "yaml"; // gstack already uses yaml in other places + +// ---------- Types ---------- + +export type RolloutEdge = "hard" | "soft"; + +export interface Contract { + with: string; + nature: string; + breaksIf: string; + rolloutEdge: RolloutEdge; + note?: string; +} + +export interface Component { + name: string; + path: string; + repo?: string; + role: string; + owns: string[]; + contracts: Contract[]; + rolloutOrder: number; +} + +export interface SystemMap { + version: number; + components: Component[]; + narrative: string; +} + +// Discovered edges — produced elsewhere (AST walker, grep pipeline, etc.) +export interface ImportEdge { + from: string; // file path + to: string; // file path + kind: "import" | "call" | "reexport"; +} + +// ---------- Parser ---------- + +export function parseSystemMap(filepath: string): SystemMap { + const raw = readFileSync(filepath, "utf8"); + const match = raw.match(/^---\n([\s\S]+?)\n---\n([\s\S]*)$/); + if (!match) { + throw new Error( + `${filepath}: missing YAML frontmatter (expected ---...--- block)`, + ); + } + const [, frontmatter, narrative] = match; + const parsed = YAML.parse(frontmatter); + validateSystemMap(parsed, filepath); + return { + version: parsed.version, + components: parsed.components.map(normalizeComponent), + narrative: narrative.trim(), + }; +} + +function normalizeComponent(raw: any): Component { + return { + name: raw.name, + path: raw.path, + repo: raw.repo, + role: raw.role, + owns: raw.owns ?? [], + contracts: (raw.contracts ?? 
[]).map((c: any) => ({ + with: c.with, + nature: c.nature, + breaksIf: c["breaks-if"], + rolloutEdge: c["rollout-edge"], + note: c.note, + })), + rolloutOrder: raw["rollout-order"], + }; +} + +function validateSystemMap(parsed: any, filepath: string): void { + if (parsed.version !== 1) { + throw new Error(`${filepath}: unsupported version ${parsed.version} (expected 1)`); + } + if (!Array.isArray(parsed.components) || parsed.components.length === 0) { + throw new Error(`${filepath}: components array is missing or empty`); + } + const names = new Set<string>(); + for (const c of parsed.components) { + for (const field of ["name", "path", "role"]) { + if (typeof c[field] !== "string" || !c[field]) { + throw new Error(`${filepath}: component is missing required field '${field}'`); + } + } + if (names.has(c.name)) { + throw new Error(`${filepath}: duplicate component name '${c.name}'`); + } + names.add(c.name); + for (const contract of c.contracts ?? []) { + if (!["hard", "soft"].includes(contract["rollout-edge"])) { + throw new Error( + `${filepath}: contract ${c.name} -> ${contract.with}: rollout-edge must be 'hard' or 'soft'`, + ); + } + } + if (typeof c["rollout-order"] !== "number") { + throw new Error(`${filepath}: component '${c.name}' missing numeric rollout-order`); + } + } + // Every contract's `with` must reference a known component. + for (const c of parsed.components) { + for (const contract of c.contracts ?? []) { + if (!names.has(contract.with)) { + throw new Error( + `${filepath}: component '${c.name}' has contract with unknown component '${contract.with}'`, + ); + } + } + } +} + +// ---------- Component membership ---------- + +// Given a file path, return the component it belongs to (or null for +// "not in any declared component" — e.g., root-level infra files). +export function componentForFile( + map: SystemMap, + filepath: string, +): Component | null { + // Longest-prefix match — more specific component wins. 
+ const matches = map.components + .filter((c) => filepath === c.path || filepath.startsWith(c.path + "/")) + .sort((a, b) => b.path.length - a.path.length); + return matches[0] ?? null; +} + +// ---------- Reconciliation ---------- + +export type ReconcileFlagCategory = + | "import-without-contract" + | "contract-without-imports" + | "rollout-order-inversion"; + +export interface ReconcileFlag { + category: ReconcileFlagCategory; + fromComponent?: string; + toComponent?: string; + evidence: string; // human-readable, included in the AskUserQuestion prompt + suggestedFix: string; +} + +export function reconcile( + map: SystemMap, + edges: ImportEdge[], +): ReconcileFlag[] { + const flags: ReconcileFlag[] = []; + + // Bucket edges by component pair + const byPair = new Map<string, ImportEdge[]>(); + for (const edge of edges) { + const fromComp = componentForFile(map, edge.from); + const toComp = componentForFile(map, edge.to); + if (!fromComp || !toComp || fromComp.name === toComp.name) continue; + const key = `${fromComp.name}|${toComp.name}`; + if (!byPair.has(key)) byPair.set(key, []); + byPair.get(key)!.push(edge); + } + + // Category 1: import-without-contract + // If we see imports from component A to component B, A should have a contract + // with B (or vice versa — direction of contract isn't required to match import + // direction, since contracts describe role relationships not data flow). + for (const [key, pairEdges] of byPair.entries()) { + const [fromName, toName] = key.split("|"); + const hasContract = map.components.some( + (c) => + (c.name === fromName && c.contracts.some((x) => x.with === toName)) || + (c.name === toName && c.contracts.some((x) => x.with === fromName)), + ); + if (!hasContract) { + flags.push({ + category: "import-without-contract", + fromComponent: fromName, + toComponent: toName, + evidence: `${pairEdges.length} import edge(s) between '${fromName}' and '${toName}' but no contract declared. 
Example: ${pairEdges[0].from} -> ${pairEdges[0].to}`, + suggestedFix: `Add a contract to SYSTEM.md between '${fromName}' and '${toName}', OR refactor to remove the cross-component import if it's a layering violation.`, + }); + } + } + + // Category 2: contract-without-imports + // A contract exists but no supporting import edges. May be runtime-only (OK if + // the note starts with `runtime-only`, e.g. "runtime-only (message bus)") or stale. + for (const c of map.components) { + for (const contract of c.contracts) { + const keyA = `${c.name}|${contract.with}`; + const keyB = `${contract.with}|${c.name}`; + const hasSupport = byPair.has(keyA) || byPair.has(keyB); + if (!hasSupport && !contract.note?.startsWith("runtime-only")) { + flags.push({ + category: "contract-without-imports", + fromComponent: c.name, + toComponent: contract.with, + evidence: `Contract '${c.name} -> ${contract.with}' declared but no import/call edges found in the codebase.`, + suggestedFix: `Either the contract is stale (remove it), or the coupling is runtime-only (DB, message bus, HTTP, filesystem) — add 'note: runtime-only' to suppress this flag.`, + }); + } + } + } + + // Category 3: rollout-order-inversion + // If A imports from B but A.rollout-order < B.rollout-order, B ships after A + // but A depends on B at compile time. Usually wrong; types-only imports can + // be legitimate exceptions. + for (const [key, pairEdges] of byPair.entries()) { + const [fromName, toName] = key.split("|"); + const fromComp = map.components.find((c) => c.name === fromName)!; + const toComp = map.components.find((c) => c.name === toName)!; + if (fromComp.rolloutOrder < toComp.rolloutOrder) { + flags.push({ + category: "rollout-order-inversion", + fromComponent: fromName, + toComponent: toName, + evidence: `'${fromName}' (rollout-order ${fromComp.rolloutOrder}) imports from '${toName}' (rollout-order ${toComp.rolloutOrder}).
'${toName}' ships after '${fromName}' but '${fromName}' depends on it at build time.`, + suggestedFix: `Swap rollout-order values, OR if the import is types-only, add 'note: types-only' to the contract.`, + }); + } + } + + return flags; +} + +// ---------- Utility: stable component ordering for rollout ---------- + +export function rolloutOrder(map: SystemMap): Component[][] { + // Group components by rollout-order integer, return in ascending order. + // Each inner array contains components that can ship in parallel (same order). + const buckets = new Map<number, Component[]>(); + for (const c of map.components) { + if (!buckets.has(c.rolloutOrder)) buckets.set(c.rolloutOrder, []); + buckets.get(c.rolloutOrder)!.push(c); + } + return [...buckets.entries()] + .sort(([a], [b]) => a - b) + .map(([, comps]) => comps.sort((a, b) => a.name.localeCompare(b.name))); +} diff --git a/docs/designs/plan-rollout/system-md.schema.md b/docs/designs/plan-rollout/system-md.schema.md new file mode 100644 index 0000000000..8340cf6d62 --- /dev/null +++ b/docs/designs/plan-rollout/system-md.schema.md @@ -0,0 +1,230 @@ +# SYSTEM.md — the semantic contract graph + +`SYSTEM.md` is a declarative file at the root of a repository that describes +what each component *is*, what it *owns*, and the role-level contracts it has +with other components. It is the input to `/plan-rollout`, consumed by +`/spill-check`, `/ship` (stack mode), and `/review` (scope verification). + +## What SYSTEM.md is NOT + +It is not a package manifest. It does not list: + +- Import graphs or symbol-level callers +- NPM / Cargo / Gem / Go module versions +- Build dependencies or linker flags +- Test-framework wiring + +**Everything mechanical is discovered by the LLM at runtime** (AST, grep, +package manifests, git history). Declaring it here would go stale within a +week and cause more harm than good. 
+ +## What SYSTEM.md IS + +It is the **semantic contract graph**: the relationships between components +that only a human knows. + +| Kind | Example | Where it belongs | +|------|---------|------------------| +| Role/contract dependency | "auth mints session tokens that middleware enforces; format change without middleware redeploy breaks sessions" | SYSTEM.md | +| Package/import dependency | "`auth.ts` imports `crypto-utils`; `middleware.ts` calls `auth.verify()`" | Discovered (NOT here) | + +The payoff: `/plan-rollout` reasons over the declared graph (semantic) jointly +with the discovered graph (mechanical). When they disagree, it surfaces the +disagreement for human resolution — either a contract is missing, a layering +violation exists, or the coupling is runtime-only and should be noted. + +## Schema (v1 — intra-repo) + +```yaml +--- +version: 1 +components: + - name: <string, unique within repo> + path: <string, repo-relative path to the component root> + repo: <string, optional; reserved for v2 multi-repo> + role: <string, one-line description of the component's job> + owns: + - <string, a data surface, table, API, or feature this component is source-of-truth for> + contracts: + - with: <string, name of another component> + nature: <string, what the relationship is in plain English> + breaks-if: <string, what human action causes the contract to break> + rollout-edge: <hard | soft> + note: <string, optional; e.g., "runtime-only coupling via message bus"> + rollout-order: <integer, lower = ship first; components with the same number can ship in parallel> +--- + +# System Map + +<Free-form markdown narrative. Document anti-patterns, incidents that shaped +current structure, deploy-edge semantics the team has learned the hard way. +This section is for humans, not parsers.> +``` + +### Field reference + +**`name`** — unique identifier used by other gstack artifacts +(`decomposition.md`, `rollout.md`) to reference the component. Keep short +and stable. 
+ +**`path`** — where the component lives in the repo. Can be a file or a +directory. Used by `/spill-check` to classify which component a touched file +belongs to. + +**`role`** — one sentence describing what the component is FOR. Not what it +contains, not how it's built. What it does in the system. + +**`owns`** — data surfaces, tables, APIs, or features this component is the +single source of truth for. Two components claiming ownership of the same +surface is a design smell; the skill will flag it. + +**`contracts`** — the heart of SYSTEM.md. Each contract declares a role-level +relationship with another component. + +- **`with`**: the other component's name. +- **`nature`**: plain-English description of the relationship. +- **`breaks-if`**: the specific human action that violates the contract. This + is the field rollout planning reads — "session payload schema changes + without middleware redeploy" tells `/plan-rollout` these two PRs must ship + as a coordinated stage. +- **`rollout-edge`**: + - `hard` = must deploy together (e.g., a session-format change); `/plan-rollout` + will enforce same-step deploy, or block with explanation. + - `soft` = can lag (e.g., a logging metric addition); `/plan-rollout` will + note but not enforce simultaneity. +- **`note`** (optional): free-form annotation. Common values: + - `runtime-only` — coupling happens via DB, message bus, HTTP, or filesystem; + no code-level import exists. Prevents reconciliation from flagging it as + "contract without supporting imports." + - `legacy` — contract exists but is being phased out; useful context for + human reviewers. + +**`rollout-order`** — integer. Components with lower numbers ship first. +Components with the same number can ship in parallel (no inter-dependency in +this direction). Used as the default ordering for `/plan-rollout`, which the +user can override per-decomposition. 
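
The field rules above lend themselves to a few mechanical checks. The sketch below is illustrative only: the `Contract` and `Component` shapes and the `validateComponents` helper are assumptions made for this example, not the types shipped with the skill. It checks three things the field reference states: component names are unique, each contract's `with` names a declared component, and no surface appears in two `owns` lists.

```typescript
// Hypothetical shapes mirroring the v1 schema fields (names are assumptions).
type RolloutEdge = "hard" | "soft";

interface Contract {
  with: string;        // name of the other component
  nature: string;      // plain-English relationship
  breaksIf: string;    // human action that violates the contract
  rolloutEdge: RolloutEdge;
  note?: string;
}

interface Component {
  name: string;
  path: string;
  role: string;
  owns: string[];
  contracts: Contract[];
  rolloutOrder: number;
}

// Returns human-readable problems; an empty array means all checks passed.
function validateComponents(components: Component[]): string[] {
  const problems: string[] = [];

  // Check 1: names must be unique within the repo.
  const names = new Set<string>();
  for (const c of components) {
    if (names.has(c.name)) problems.push(`duplicate component name '${c.name}'`);
    names.add(c.name);
  }

  const ownerOf = new Map<string, string>();
  for (const c of components) {
    // Check 2: a surface should have exactly one source of truth.
    for (const surface of c.owns) {
      const prior = ownerOf.get(surface);
      if (prior !== undefined && prior !== c.name) {
        problems.push(`'${surface}' is owned by both '${prior}' and '${c.name}'`);
      } else {
        ownerOf.set(surface, c.name);
      }
    }
    // Check 3: every contract must point at a declared component.
    for (const k of c.contracts) {
      if (!names.has(k.with)) {
        problems.push(`'${c.name}' declares a contract with unknown component '${k.with}'`);
      }
    }
  }
  return problems;
}
```

A check like this only covers what is mechanically verifiable; whether a `breaks-if` statement is accurate remains a human judgment.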
+ +## Example + +```yaml +--- +version: 1 +components: + - name: auth + path: src/auth + role: authentication + session lifecycle + owns: + - user table + - session table + - JWT minting + contracts: + - with: middleware + nature: middleware enforces session tokens that auth mints + breaks-if: session payload schema changes without middleware redeploy + rollout-edge: hard + - with: api-gateway + nature: gateway consumes tenant claims auth populates in user context + breaks-if: auth stops populating tenant claims + rollout-edge: soft + rollout-order: 1 + + - name: middleware + path: src/middleware + role: request routing + auth enforcement + owns: + - request context shape + - rate-limit tables + contracts: + - with: api-gateway + nature: gateway consumes req.user context middleware sets + breaks-if: req.user shape changes without gateway redeploy + rollout-edge: hard + rollout-order: 2 + + - name: api-gateway + path: src/gateway + role: external HTTP surface — only component exposed to the internet + owns: + - public API schema + - CORS policy + contracts: [] + rollout-order: 3 + + - name: metrics-pipeline + path: src/metrics + role: emit product analytics to the warehouse + owns: + - event schema registry + contracts: + - with: auth + nature: auth emits login/logout events consumed by pipeline + breaks-if: event schema version changes without consumer update + rollout-edge: soft + note: runtime-only (message bus) + rollout-order: 2 +--- + +# System Map + +auth and middleware are the security boundary. Any change that touches +session format or the user-context shape is a coordinated deploy — +rollout-edge:hard. We learned this the hard way after the Feb 2025 incident +where a session serializer change shipped 40 minutes ahead of middleware +and logged every user out. + +metrics-pipeline is runtime-only coupled to auth via the event bus. No import +edge exists. 
Reconciliation tools will flag the contract as "no supporting +imports" — that's expected; the `note: runtime-only` field suppresses the +flag after first acknowledgment. + +api-gateway is the one component we can deploy independently most of the +time. It has no declared contracts OUT, only contracts IN. +``` + +## Scaffolding a new SYSTEM.md + +If your repo has no SYSTEM.md, `/plan-rollout` offers to scaffold one. The +scaffolder (`lib/plan-rollout/system-map-scaffolder.ts`): + +1. Lists top-level directories that contain source files +2. Reads each directory's README / package.json / Cargo.toml / go.mod for a + description — drafts the `role:` field if one is found +3. Leaves `owns:`, `contracts:`, and `rollout-order:` empty with TODO markers +4. Reads CODEOWNERS (if present) — adds owner teams as comments for reference +5. Writes to `SYSTEM.md.draft`, never directly to `SYSTEM.md` + +The user is expected to: + +1. Review the draft +2. Fill in the TODO markers (role refinement, owns, contracts, rollout-order) +3. Rename `SYSTEM.md.draft` → `SYSTEM.md` +4. Commit + +**Why the draft-rename dance:** prevents an LLM-hallucinated SYSTEM.md from +becoming load-bearing without human review. The whole point of SYSTEM.md is +that it encodes knowledge the LLM does not have. + +## Keeping SYSTEM.md fresh + +SYSTEM.md drifts when components are renamed, split, or merged. `/plan-rollout` +treats drift as a reconciliation flag: + +- Component `path` no longer exists → block and prompt user to update +- No imports from a component that claims `contracts: [...]` → flag for review +- New top-level directory with source files → suggest adding as component + +A future `/system-map-audit` skill (v2) could run periodically to detect and +surface drift. Not in v1 scope. + +## Relationship to other declarative files + +| File | Purpose | Who writes it | +|------|---------|---------------| +| `CLAUDE.md` | Project-specific instructions for Claude (routing rules, test commands, etc.) 
| Human | +| `CODEOWNERS` | Who reviews changes to which paths | Human | +| `SYSTEM.md` | Semantic contract graph | Human (scaffolded, then edited) | +| `decomposition.md` | Per-change PR stack | `/plan-rollout` (ephemeral per feature) | +| `rollout.md` | Per-change rollout plan | `/plan-rollout` (ephemeral per feature) | + +SYSTEM.md is the long-lived, repo-wide truth. `decomposition.md` and +`rollout.md` are per-change artifacts that reference it. diff --git a/docs/designs/plan-rollout/usage.md b/docs/designs/plan-rollout/usage.md new file mode 100644 index 0000000000..1ecadecc15 --- /dev/null +++ b/docs/designs/plan-rollout/usage.md @@ -0,0 +1,324 @@ +# Using `/plan-rollout` and `/spill-check` + +These two skills work together to solve a specific class of pain: **an +LLM-generated change set that is too big for a reviewer to meaningfully +ingest, with scope creep ("spills") hiding inside it.** If your change is one +file and one afternoon, you do not need these skills. If it's a feature that +naturally splits into a 3-PR stack, these skills make the stack obvious and +keep you in scope while you implement. + +--- + +## The three artifacts + +### `SYSTEM.md` (repo root, committed) + +The semantic contract graph for your repo. Declares each component's role, +what it owns, and the role-level contracts it has with other components. +Long-lived. Authored once per repo, edited as components are added or renamed. + +See [`system-md.schema.md`](./system-md.schema.md) for the full schema. + +### `decomposition.md` (per change, generated) + +The PR stack for a specific piece of work. Written by `/plan-rollout`. +Contains: PR units with files, dependencies, reading-order for reviewers, +time-budget estimates, and an ASCII stack map. Consumed by `/spill-check`, +`/ship`, and `/review`. + +### `rollout.md` (per change, generated) + +The rollout plan for a specific piece of work. Written by `/plan-rollout`. 
+Contains: strategy (flag / canary / migration-first / big-bang), step +sequence with inverse rollback auto-generated, verify metrics per step. +Consumed by `/land-and-deploy`. + +--- + +## The core flow + +``` + 1. /plan-eng-review (or /plan-ceo-review — or your own plan) + │ + ▼ + 2. /plan-rollout ──▶ decomposition.md + rollout.md + │ (and SYSTEM.md if not present) + ▼ + 3. Implement PR-1 ────┐ + │ │ + ▼ │ /spill-check runs on demand + commit │ or as /ship pre-gate + ▼ │ + 4. /ship (stack mode) ──┘ ──▶ PR-1 opened with reader guide, + time budget, dep narration + ▼ + 5. Review + merge PR-1 + ▼ + 6. Repeat 3-5 for PR-2, PR-3, ... + ▼ + 7. /land-and-deploy ──▶ reads rollout.md, sequences deploy steps +``` + +--- + +## Step-by-step: your first run + +### 0. Prerequisite — commit a plan + +Have a plan on disk. The plan can come from `/plan-eng-review`, `/plan-ceo-review`, +a design doc you wrote yourself, or just a well-scoped GitHub issue you've +captured locally. `/plan-rollout` will not guess at scope — it starts from +your plan. + +### 1. First-time SYSTEM.md setup + +If your repo has no `SYSTEM.md` at the root, `/plan-rollout` offers to +scaffold one on first run: + +``` +> /plan-rollout + +No SYSTEM.md found at the repo root. /plan-rollout needs it to reason about +role-level contracts between components. + +A) Scaffold SYSTEM.md.draft now (recommended) +B) Let me hand-write SYSTEM.md, then re-run /plan-rollout +C) Run /plan-rollout without SYSTEM.md (degraded mode — flag-level only) +``` + +Pick **A**. The scaffolder writes `SYSTEM.md.draft` with every top-level +directory as a component, role inferred from README/module docs, and TODO +markers for the fields only you can fill in (`owns`, `contracts`, +`rollout-order`). 
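
As a rough illustration of what such a draft might contain, the sketch below renders component stubs with TODO markers. The `DirInfo` shape, the `renderDraft` helper, and the exact TODO wording are assumptions made for this example; the real scaffolder's output format may differ.

```typescript
// Hypothetical input: one entry per top-level source directory.
interface DirInfo {
  dir: string;           // repo-relative directory, e.g. "src/auth"
  inferredRole?: string; // description found in a README or manifest, if any
}

// Renders the YAML front matter of a SYSTEM.md.draft (illustrative format only).
function renderDraft(dirs: DirInfo[]): string {
  const lines: string[] = ["---", "version: 1", "components:"];
  for (const d of dirs) {
    // Use the last non-empty path segment as the component name.
    const name = d.dir.split("/").filter(Boolean).pop() ?? d.dir;
    lines.push(`  - name: ${name}`);
    lines.push(`    path: ${d.dir}`);
    // Quote inferred roles via JSON so colons in the text stay valid YAML.
    lines.push(
      d.inferredRole !== undefined
        ? `    role: ${JSON.stringify(d.inferredRole)}`
        : "    role: # TODO: one sentence on what this component is for"
    );
    // Fields only a human can fill in are left empty with TODO markers.
    lines.push("    owns: # TODO");
    lines.push("    contracts: # TODO");
    lines.push("    rollout-order: # TODO: integer, lower ships first");
  }
  lines.push("---");
  return lines.join("\n");
}
```

Leaving the human-only fields empty rather than guessing keeps the draft from looking more complete than it is.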
+ +Now edit `SYSTEM.md.draft`: + +- Refine each `role` line to one sentence describing what the component is FOR +- Fill `owns` with data surfaces, tables, or APIs the component is source-of-truth for +- Fill `contracts` with role-level dependencies on other components, including + `breaks-if:` (the concrete human action that violates the contract) and + `rollout-edge: hard | soft` +- Assign `rollout-order` integers (lower = ships first) + +Rename `SYSTEM.md.draft` → `SYSTEM.md`. Commit. + +You only do this once per repo. Rerun `/plan-rollout`. + +### 2. Running /plan-rollout + +``` +> /plan-rollout +``` + +The skill does: + +1. **Reads your plan** (from `/plan-eng-review` output, your plan file, or the current conversation) +2. **Parses SYSTEM.md** and builds the contract graph +3. **Discovers the import graph** via AST walk across your source tree +4. **Reconciles both graphs** — flags "import without declared contract", + "declared contract with no supporting imports", "rollout-order inversion" + cases for you to resolve (each via AskUserQuestion) +5. **Proposes a PR decomposition** applying these heuristics: + - Component boundary — different SYSTEM.md components → different PR units + - Migration-first — DB migrations ship in PR #1 + - Interface-first — types and schemas before implementations + - Pure additions first, mutations later + - Tests travel with their code + - Flag-gate before flag-flip + - Reviewer-budget cap (no PR unit > 30 min review time) +6. **Shows the stack as an ASCII Gantt** with reviewer time totals — + asks you to confirm, revise, or split further +7. **Proposes a rollout strategy** (big-bang / flag / canary / migration-first) +8. **Auto-generates inverse rollback lines** for each rollout step +9. **Writes the two artifacts** to `~/.gstack/projects/$SLUG/` +10. 
**Logs the review** for the Review Readiness Dashboard + +Output when done: + +``` ++================================================================+ +| /plan-rollout — COMPLETION SUMMARY | ++================================================================+ +| SYSTEM.md | present | +| Reconciliation | 3 flags, 3 resolved, 0 suppressed | +| PR units | 3 | +| Total reviewer mins | 55 | +| Rollout strategy | big-bang | +| Rollout steps | 3 with auto-rollback, 0 manual | +| Artifacts | decomposition.md, rollout.md | +| Next | start PR #1 | ++================================================================+ +``` + +### 3. Implementing with /spill-check discipline + +Start PR #1. As you code, `/spill-check` tells you whether the diff you've +built is in scope for the PR unit you declared: + +``` +> /spill-check + +Current PR unit: [1] "feat(router): add optional findAllowedMethods..." + +In scope (declared, touched): + ✓ src/router.ts + ✓ src/router/trie-router/router.ts + ✓ src/router/trie-router/router.test.ts + +Declared but untouched (maybe you're not done): + - src/router/trie-router/node.ts + +Soft spills (warned, allowed — infra/meta files): + - CHANGELOG.md + - bun.lock + +Hard spills (out of scope for this PR unit): + ✗ src/hono-base.ts — intended for PR unit [3] + +1 hard spill. Resolve before shipping. + +A) Carve src/hono-base.ts to a separate branch +B) Extend decomposition.md to add this file to current unit +C) Revert the change +D) Add soft-spill rule for this path +``` + +Pick the right resolution. `/spill-check` stages the change for you. + +Run `/spill-check` as often as you like. It's also the automatic pre-ship gate +when you invoke `/ship` with a decomposition.md present. + +### 4. 
Shipping PR-1 with stack-aware /ship + +``` +> /ship +``` + +When `decomposition.md` exists, `/ship` enters **stack mode**: + +- Detects which PR unit the branch represents +- Runs `/spill-check` as a gate (halts if hard spills present) +- Titles the PR from `decomposition.md` (conventional commits format) +- Generates the PR body with: + - The declared rationale for this unit + - Reader guide block: "Read `<file>` first, then `<file>`..." + - Dependency note: "Depends on PR #412 (merged)" / "Followed by PR #414" + - Reviewer time budget: "Est. 18 min" + - Rollout link: "Part of issue #4633; see rollout.md" +- Creates the PR + +Reviewers open a PR that is **immediately legible**: they know what to read +first, how long it should take, and what comes next. That's the whole point. + +### 5. Repeat for each PR unit + +After PR-1 merges, create PR-2's branch from PR-1's branch (standard stacking +mechanic). `/plan-rollout`'s decomposition.md says which files belong to PR-2. +Implement → /spill-check → /ship. Same flow. + +When the whole stack is merged, `/land-and-deploy` reads `rollout.md` and +sequences the rollout steps with their verify blocks. + +--- + +## When NOT to use these skills + +- **One-file, one-afternoon changes.** `/plan-rollout`'s ceremony costs 5-10 + minutes. Not worth it for a 50-line bug fix. +- **Research/spike work.** If you're exploring, not shipping, the + decomposition step is premature. +- **Hotfixes.** A production incident demands speed; use `/ship` directly and + do the post-mortem decomposition later if you want. +- **Docs-only PRs.** Trivial scope; the overhead doesn't pay back. + +The skills pay back most on changes that span 3+ files across 2+ +components with meaningful rollout risk. + +--- + +## Common issues and how to resolve them + +### "The scaffolder inferred nonsense roles for my components" + +Expected — the scaffolder is a starting point, not a finished product. 
It +aims for ~60% accuracy so you can edit rather than write from scratch. If a +role field is wrong, overwrite it. If a whole component is wrong (e.g., the +scaffolder treated your `scripts/` dir as a component), delete the block. +The scaffolder only runs on the first pass; you own `SYSTEM.md` after that. + +### "Reconciliation flagged an 'import without contract' but it's legitimate" + +There are three possible cases: + +1. **It's actually a layering violation.** Fix the code. +2. **It's a transitive import through a leaf utility** (e.g., `src/utils/`). + Declare the utility dir as `kind: leaf-util` so the reconciler ignores + edges through it. +3. **The contract is runtime-only** (coupling via DB, message bus, or + filesystem; no code-level import). Add the contract to SYSTEM.md with + `note: runtime-only`. + +### "The reviewer time-budget estimate feels off" + +It's uncalibrated in v1. Treat the numbers as directional (PR-1 is smaller +than PR-2), not absolute. Predicted-vs-actual data is logged to analytics so +the coefficients can improve in v2. + +### "/spill-check is flagging a legit touch on a file I need to change" + +You have three paths: + +1. **Extend the decomposition** — the file genuinely belongs to the current + unit; you missed it in planning. Pick option B in the spill prompt. +2. **Carve the change** — the file is needed but unrelated to this unit's + purpose. Pick option A; the change goes to a separate branch. +3. **Project-level allowlist** — the file is infrastructure/meta and should + never be considered a spill in this repo (e.g., a custom codegen config). + Pick option D, which adds the file to `.gstack/spill-check.yml`. + +### "I want the decomposition.md visible in the PR description" + +The artifact lives in `~/.gstack/projects/$SLUG/` by default. Run +`/plan-rollout --also-project-root` (listed as v2 in the command reference) or +symlink manually to mirror it into your repo. Reviewers who don't use gstack +can then read it directly. 
+ +### "I'm shipping a library, but the rollout.md template talks about services" + +Known gap in v1. Library rollouts look different from service rollouts +(publish-and-revert vs. coordinated deploy). A `package-type:` field in +SYSTEM.md is planned to address this in v2; for now, edit the generated +rollout.md by hand to fit your context. + +--- + +## Relationship to other gstack skills + +| Skill | Produces | Consumes | +|-------|----------|----------| +| `/plan-eng-review` | eng plan | — | +| `/plan-ceo-review` | CEO plan | — | +| **`/plan-rollout`** | **`decomposition.md`, `rollout.md`** | **eng/CEO plan, `SYSTEM.md`** | +| **`/spill-check`** | **spill report, carve-branch if needed** | **`decomposition.md`** | +| `/ship` (stack mode) | stacked PRs with reader guides | `decomposition.md`, `rollout.md` | +| `/review` (scope-aware) | scope-verified review | `decomposition.md` | +| `/land-and-deploy` | deployed rollout | `rollout.md` | +| `/canary` | post-deploy regression watch | `rollout.md` verify blocks | + +Every skill downstream of `/plan-rollout` reads its artifacts. That is why it +is designed as the integration spine of gstack's plan-to-prod pipeline rather +than as a standalone tool. + +--- + +## Command reference + +``` +/plan-rollout # standard run +/plan-rollout --trim # remove declared-but-untouched files (v2) +/plan-rollout --also-project-root # mirror artifacts into repo root (v2) +/plan-rollout --dry-run # preview decomposition without writing (v2) + +/spill-check # interactive — asks what to do with spills +/spill-check --gate # non-interactive; exits non-zero if hard spills +/spill-check --unit <id> # override auto-inferred PR unit +```
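
The `--gate` behavior in the command reference can be pictured as a pure classification step followed by an exit-code decision. The following is a minimal sketch under stated assumptions: `classifyDiff`, `gateExitCode`, the `SpillReport` shape, and the exact-match soft list are hypothetical names chosen for illustration. The real skill may match soft files by pattern and resolve which other PR unit a hard spill belongs to.

```typescript
// Hypothetical report shape, modeled on the four groups in the sample
// /spill-check output shown earlier in this guide.
interface SpillReport {
  inScope: string[];    // declared for this unit and touched
  untouched: string[];  // declared for this unit but not touched yet
  softSpills: string[]; // undeclared infra/meta files: warned, allowed
  hardSpills: string[]; // undeclared code files: out of scope
}

function classifyDiff(declared: string[], touched: string[], softList: string[]): SpillReport {
  const declaredSet = new Set(declared);
  const touchedSet = new Set(touched);
  const softSet = new Set(softList);
  const report: SpillReport = { inScope: [], untouched: [], softSpills: [], hardSpills: [] };

  for (const file of touched) {
    if (declaredSet.has(file)) report.inScope.push(file);
    else if (softSet.has(file)) report.softSpills.push(file);
    else report.hardSpills.push(file);
  }
  for (const file of declared) {
    if (!touchedSet.has(file)) report.untouched.push(file);
  }
  return report;
}

// Mirrors the documented gate rule: non-zero only when hard spills exist.
function gateExitCode(report: SpillReport): number {
  return report.hardSpills.length > 0 ? 1 : 0;
}
```

Keeping classification separate from the exit-code decision means the interactive and gate modes could share the same report and differ only in how they act on it.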