From ccc214481a97a726cc6952c40e8dd2c2b8250d11 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
 <41898282+github-actions[bot]@users.noreply.github.com>
Date: Tue, 5 May 2026 11:06:55 +0000
Subject: [PATCH] docs: document dispatch-input hardening and pr_number
 lifecycle
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- catalog/agent-team/README.md: add `pr_number` lifecycle section to the
  handoff model (blank on first impl → creates PR; reviewer passes it back
  on kickback → implementer pushes to existing branch). Expand "Input
  propagation" gotcha with the exact error message, unresolved-literal
  detection, and the optional-by-design note for pr_number.
- CONTRIBUTING.md: replace "no automated test harness" with the actual
  tiered test strategy (tier-2 invariants, tier-1 skill tests, tier-3 E2E)
  and clear guidance on when to run each tier.

Follows PR #69 (fix agent-team dispatch-input hardening).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 CONTRIBUTING.md              | 21 +++++++++++++++++----
 catalog/agent-team/README.md |  6 +++++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 1c2721e..84092c7 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -74,12 +74,25 @@ Edit the relevant `SKILL.md` or data file. Test by running the skill locally wit
 
 ## Testing
 
-There is no automated test harness for skills — they are instruction sets interpreted by Claude Code, not code with unit tests. The validation steps are:
+### Automated tests (CI)
+
+The repository uses a tiered test strategy. All tests live in `tests/`. CI runs tier-2 invariants and tier-1 skill tests on every PR.
+
+| Tier | How to run | Speed |
+|---|---|---|
+| 2 — invariants | `bash tests/test-invariants.sh` | <1s, no Claude needed |
+| 1 — skill tests | `bash tests/run-tests.sh` | 4–5 min, uses Claude tokens |
+| 3 — E2E (manual) | `bash tests/test-e2e.sh` | 20–35 min, see `tests/README.md` |
+
+Run tier-2 locally before every commit and tier-1 before opening a PR. Tier-3 runs are manual, reserved for releases and significant catalog changes.
+
+### Manual validation steps
+
+For changes not covered by automated tests (OAuth tweak shape, `.lock.yml` validity):
 
 1. **Load the plugin**: `claude --plugin-dir .` — confirm no startup errors.
-2. **Run the skill manually**: invoke `/discover-workflows` or `/install-workflow` and walk through the flow.
-3. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile.
-4. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape).
+2. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile.
+3. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape).
 
 Never test by committing untested changes to `main`. The installed workflows run on push to `main`, so a broken install skill or a bad `.lock.yml` will trigger a live workflow run.
 
diff --git a/catalog/agent-team/README.md b/catalog/agent-team/README.md
index 5490837..43f6828 100644
--- a/catalog/agent-team/README.md
+++ b/catalog/agent-team/README.md
@@ -48,6 +48,10 @@ Each agent finishes its work by **emitting a `dispatch-workflow` safe-output** n
 
 `state:*` labels (`plan-needed`, `impl-needed`, `review-needed`, `done`, `blocked`) are **cosmetic breadcrumbs for humans** — they let the GitHub UI show pipeline progress at a glance. They do **not** drive control flow; the `dispatch-workflow` safe-outputs do.
 
+### `pr_number` lifecycle
+
+`pr_number` is optional on the implementer. On the first implementation attempt, it is blank — the implementer creates a new draft PR and captures the resulting PR number. On reviewer kickback, the reviewer passes that same `pr_number` back to the implementer (along with a bumped `iteration`), so the implementer pushes fixes to the existing PR branch instead of opening a second one. The issue always closes via a single PR regardless of how many kickback cycles occur.
+
 ## The comment contract
 
 Agents communicate their work product via fenced HTML-comment blocks, which downstream agents grep out of the issue body + comments. Never rely on prose ordering.
@@ -111,7 +115,7 @@ Then apply the OAuth token tweak to each `.lock.yml` per [`skills/install-workfl
 
 - **Concurrency**: each workflow uses `concurrency: group: agent-team-issue-${issue_number}` so only one role runs at a time per issue.
 - **Max iterations**: default 3 (reviewer kickback → implementer). The counter lives on the `iteration` input passed through the dispatch chain, bumped exclusively by the reviewer on kickback.
-- **Input propagation**: planner / implementer / reviewer must fail loudly if required `workflow_dispatch` inputs are missing. Do not rely on label search or recent-activity inference as a fallback.
+- **Input propagation**: planner, implementer, and reviewer each validate all required `workflow_dispatch` inputs before doing any work. If a required input is empty, whitespace-only, or still appears as an unresolved template literal (e.g. `${{ github.event.inputs.issue_number }}`), the workflow posts `🛑 agent-team: workflow_dispatch inputs were not propagated. Re-dispatch with valid inputs.` on the issue and stops. Do not rely on label search or recent-activity inference as a fallback — that approach hides dispatch bugs and silently corrupts pipeline state. The `pr_number` input on the implementer is optional by design and is treated as "not set" when blank or when it matches the unresolved literal pattern.
 - **Non-UI only**: no screenshot capture. Reviewer validates via tests/CI status + reading the diff.
 - **Cost**: a single task can easily spend 4× the tokens of a monolithic workflow. Set `timeout-minutes` conservatively and monitor the first few runs.
 - **No auto-merge**: the reviewer approves but never merges. Humans merge.