From ccc214481a97a726cc6952c40e8dd2c2b8250d11 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 5 May 2026 11:06:55 +0000 Subject: [PATCH] docs: document dispatch-input hardening and pr_number lifecycle MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - catalog/agent-team/README.md: add `pr_number` lifecycle section to the handoff model (blank on first impl → creates PR; reviewer passes it back on kickback → implementer pushes to existing branch). Expand "Input propagation" gotcha with the exact error message, unresolved-literal detection, and the optional-by-design note for pr_number. - CONTRIBUTING.md: replace "no automated test harness" with the actual tiered test strategy (tier-2 invariants, tier-1 skill tests, tier-3 E2E) and clear guidance on when to run each tier. Follows PR #69 (fix agent-team dispatch-input hardening). Co-Authored-By: Claude Sonnet 4.6 --- CONTRIBUTING.md | 21 +++++++++++++++++---- catalog/agent-team/README.md | 6 +++++- 2 files changed, 22 insertions(+), 5 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 1c2721e..84092c7 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -74,12 +74,25 @@ Edit the relevant `SKILL.md` or data file. Test by running the skill locally wit ## Testing -There is no automated test harness for skills — they are instruction sets interpreted by Claude Code, not code with unit tests. The validation steps are: +### Automated tests (CI) + +The repository uses a tiered test strategy. All tests live in `tests/`. CI runs tier-2 invariants and tier-1 skill tests on every PR. + +| Tier | How to run | Speed | +|---|---|---| +| 2 — invariants | `bash tests/test-invariants.sh` | <1s, no Claude needed | +| 1 — skill tests | `bash tests/run-tests.sh` | 4–5 min, uses Claude tokens | +| 3 — E2E (manual) | `bash tests/test-e2e.sh` | 20–35 min, see `tests/README.md` | + +Run tier-2 locally before every commit and tier-1 before opening a PR. Tier-3 runs are manual, reserved for releases and significant catalog changes. + +### Manual validation steps + +For changes not covered by automated tests (OAuth tweak shape, `.lock.yml` validity): 1. **Load the plugin**: `claude --plugin-dir .` — confirm no startup errors. -2. **Run the skill manually**: invoke `/discover-workflows` or `/install-workflow` and walk through the flow. -3. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile. -4. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape). +2. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile. +3. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape). Never test by committing untested changes to `main`. The installed workflows run on push to `main`, so a broken install skill or a bad `.lock.yml` will trigger a live workflow run. diff --git a/catalog/agent-team/README.md b/catalog/agent-team/README.md index 5490837..43f6828 100644 --- a/catalog/agent-team/README.md +++ b/catalog/agent-team/README.md @@ -48,6 +48,10 @@ Each agent finishes its work by **emitting a `dispatch-workflow` safe-output** n `state:*` labels (`plan-needed`, `impl-needed`, `review-needed`, `done`, `blocked`) are **cosmetic breadcrumbs for humans** — they let the GitHub UI show pipeline progress at a glance. They do **not** drive control flow; the `dispatch-workflow` safe-outputs do. +### `pr_number` lifecycle + +`pr_number` is optional on the implementer. On the first implementation attempt, it is blank — the implementer creates a new draft PR and captures the resulting PR number. On reviewer kickback, the reviewer passes that same `pr_number` back to the implementer (along with a bumped `iteration`), so the implementer pushes fixes to the existing PR branch instead of opening a second one. The issue always closes via a single PR regardless of how many kickback cycles occur. + ## The comment contract Agents communicate their work product via fenced HTML-comment blocks, which downstream agents grep out of the issue body + comments. Never rely on prose ordering. @@ -111,7 +115,7 @@ Then apply the OAuth token tweak to each `.lock.yml` per [`skills/install-workfl - **Concurrency**: each workflow uses `concurrency: group: agent-team-issue-${issue_number}` so only one role runs at a time per issue. - **Max iterations**: default 3 (reviewer kickback → implementer). The counter lives on the `iteration` input passed through the dispatch chain, bumped exclusively by the reviewer on kickback. -- **Input propagation**: planner / implementer / reviewer must fail loudly if required `workflow_dispatch` inputs are missing. Do not rely on label search or recent-activity inference as a fallback. +- **Input propagation**: planner, implementer, and reviewer each validate all required `workflow_dispatch` inputs before doing any work. If a required input is empty, whitespace-only, or still appears as an unresolved template literal (e.g. `${{ github.event.inputs.issue_number }}`), the workflow posts `🛑 agent-team: workflow_dispatch inputs were not propagated. Re-dispatch with valid inputs.` on the issue and stops. Do not rely on label search or recent-activity inference as a fallback — that approach hides dispatch bugs and silently corrupts pipeline state. The `pr_number` input on the implementer is optional by design and is treated as "not set" when blank or when it matches the unresolved literal pattern. - **Non-UI only**: no screenshot capture. Reviewer validates via tests/CI status + reading the diff. - **Cost**: a single task can easily spend 4× the tokens of a monolithic workflow. Set `timeout-minutes` conservatively and monitor the first few runs. - **No auto-merge**: the reviewer approves but never merges. Humans merge.