verkyyi · github-actions · May 5, 2026
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -74,12 +74,25 @@ Edit the relevant `SKILL.md` or data file. Test by running the skill locally wit
 
 ## Testing
 
-There is no automated test harness for skills — they are instruction sets interpreted by Claude Code, not code with unit tests. The validation steps are:
+### Automated tests (CI)
+
+The repository uses a tiered test strategy. All tests live in `tests/`. CI runs tier-2 invariants and tier-1 skill tests on every PR.
+
+| Tier | How to run | Speed |
+|---|---|---|
+| 2 — invariants | `bash tests/test-invariants.sh` | <1s, no Claude needed |
+| 1 — skill tests | `bash tests/run-tests.sh` | 4–5 min, uses Claude tokens |
+| 3 — E2E (manual) | `bash tests/test-e2e.sh` | 20–35 min, see `tests/README.md` |
+
+Run tier-2 locally before every commit and tier-1 before opening a PR. Tier-3 runs are manual, reserved for releases and significant catalog changes.
+
+### Manual validation steps
+
+For changes not covered by automated tests (OAuth tweak shape, `.lock.yml` validity):
 
 1. **Load the plugin**: `claude --plugin-dir .` — confirm no startup errors.
-2. **Run the skill manually**: invoke `/discover-workflows` or `/install-workflow` and walk through the flow.
-3. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile.
-4. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape).
+2. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile.
+3. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape).
 
 Never test by committing untested changes to `main`. The installed workflows run on push to `main`, so a broken install skill or a bad `.lock.yml` will trigger a live workflow run.
 

diff --git a/catalog/agent-team/README.md b/catalog/agent-team/README.md
@@ -48,6 +48,10 @@ Each agent finishes its work by **emitting a `dispatch-workflow` safe-output** n
 
 `state:*` labels (`plan-needed`, `impl-needed`, `review-needed`, `done`, `blocked`) are **cosmetic breadcrumbs for humans** — they let the GitHub UI show pipeline progress at a glance. They do **not** drive control flow; the `dispatch-workflow` safe-outputs do.
 
+### `pr_number` lifecycle
+
+`pr_number` is optional on the implementer. On the first implementation attempt, it is blank — the implementer creates a new draft PR and captures the resulting PR number. On reviewer kickback, the reviewer passes that same `pr_number` back to the implementer (along with a bumped `iteration`), so the implementer pushes fixes to the existing PR branch instead of opening a second one. The issue always closes via a single PR regardless of how many kickback cycles occur.
+
 ## The comment contract
 
 Agents communicate their work product via fenced HTML-comment blocks, which downstream agents grep out of the issue body + comments. Never rely on prose ordering.
@@ -111,7 +115,7 @@ Then apply the OAuth token tweak to each `.lock.yml` per [`skills/install-workfl
 
 - **Concurrency**: each workflow uses `concurrency: group: agent-team-issue-${issue_number}` so only one role runs at a time per issue.
 - **Max iterations**: default 3 (reviewer kickback → implementer). The counter lives on the `iteration` input passed through the dispatch chain, bumped exclusively by the reviewer on kickback.
-- **Input propagation**: planner / implementer / reviewer must fail loudly if required `workflow_dispatch` inputs are missing. Do not rely on label search or recent-activity inference as a fallback.
+- **Input propagation**: planner, implementer, and reviewer each validate all required `workflow_dispatch` inputs before doing any work. If a required input is empty, whitespace-only, or still appears as an unresolved template literal (e.g. `${{ github.event.inputs.issue_number }}`), the workflow posts `🛑 agent-team: workflow_dispatch inputs were not propagated. Re-dispatch with valid inputs.` on the issue and stops. Do not rely on label search or recent-activity inference as a fallback — that approach hides dispatch bugs and silently corrupts pipeline state. The `pr_number` input on the implementer is optional by design and is treated as "not set" when blank or when it matches the unresolved literal pattern.
 - **Non-UI only**: no screenshot capture. Reviewer validates via tests/CI status + reading the diff.
 - **Cost**: a single task can easily spend 4× the tokens of a monolithic workflow. Set `timeout-minutes` conservatively and monitor the first few runs.
 - **No auto-merge**: the reviewer approves but never merges. Humans merge.