21 changes: 17 additions & 4 deletions CONTRIBUTING.md
@@ -74,12 +74,25 @@ Edit the relevant `SKILL.md` or data file. Test by running the skill locally wit

## Testing

- There is no automated test harness for skills — they are instruction sets interpreted by Claude Code, not code with unit tests. The validation steps are:
+ ### Automated tests (CI)
+
+ The repository uses a tiered test strategy. All tests live in `tests/`. CI runs tier-2 invariants and tier-1 skill tests on every PR.
+
+ | Tier | How to run | Speed |
+ |---|---|---|
+ | 2 — invariants | `bash tests/test-invariants.sh` | <1s, no Claude needed |
+ | 1 — skill tests | `bash tests/run-tests.sh` | 4–5 min, uses Claude tokens |
+ | 3 — E2E (manual) | `bash tests/test-e2e.sh` | 20–35 min, see `tests/README.md` |
+
+ Run tier-2 locally before every commit and tier-1 before opening a PR. Tier-3 runs are manual, reserved for releases and significant catalog changes.
+
+ ### Manual validation steps
+
+ For changes not covered by automated tests (OAuth tweak shape, `.lock.yml` validity):

1. **Load the plugin**: `claude --plugin-dir .` — confirm no startup errors.
- 2. **Run the skill manually**: invoke `/discover-workflows` or `/install-workflow` and walk through the flow.
- 3. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile.
- 4. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape).
+ 2. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile.
+ 3. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape).

Never test by committing untested changes to `main`. The installed workflows run on push to `main`, so a broken install skill or a bad `.lock.yml` will trigger a live workflow run.

6 changes: 5 additions & 1 deletion catalog/agent-team/README.md
@@ -48,6 +48,10 @@ Each agent finishes its work by **emitting a `dispatch-workflow` safe-output** n

`state:*` labels (`plan-needed`, `impl-needed`, `review-needed`, `done`, `blocked`) are **cosmetic breadcrumbs for humans** — they let the GitHub UI show pipeline progress at a glance. They do **not** drive control flow; the `dispatch-workflow` safe-outputs do.

+ ### `pr_number` lifecycle
+
+ `pr_number` is optional on the implementer. On the first implementation attempt, it is blank — the implementer creates a new draft PR and captures the resulting PR number. On reviewer kickback, the reviewer passes that same `pr_number` back to the implementer (along with a bumped `iteration`), so the implementer pushes fixes to the existing PR branch instead of opening a second one. The issue always closes via a single PR regardless of how many kickback cycles occur.
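
The lifecycle described above can be sketched as a small state model. This is an illustration only, not the workflows' actual implementation: the `ImplementerInputs` class, the `implementer_action` and `reviewer_kickback` helpers, the action strings, the starting `iteration` of 1, and the sample issue and PR numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImplementerInputs:
    """Hypothetical model of the implementer's dispatch inputs."""
    issue_number: int
    iteration: int = 1                # assumed starting value
    pr_number: Optional[int] = None   # blank on the first attempt


def implementer_action(inputs: ImplementerInputs) -> str:
    """Open a new draft PR on the first attempt, otherwise reuse the PR."""
    if inputs.pr_number is None:
        return "create-draft-pr"
    return f"push-to-pr-{inputs.pr_number}"


def reviewer_kickback(inputs: ImplementerInputs, pr_number: int) -> ImplementerInputs:
    """Build the next implementer dispatch: same PR, iteration bumped by one."""
    return ImplementerInputs(
        issue_number=inputs.issue_number,
        iteration=inputs.iteration + 1,
        pr_number=pr_number,
    )


first = ImplementerInputs(issue_number=42)
print(implementer_action(first))

# The reviewer hands back the PR number the implementer captured earlier.
second = reviewer_kickback(first, pr_number=57)
print(implementer_action(second), second.iteration)
```

Because the kickback always carries the original `pr_number` forward, every later iteration resolves to the same PR, which is what keeps the issue tied to a single PR.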

## The comment contract

Agents communicate their work product via fenced HTML-comment blocks, which downstream agents grep out of the issue body + comments. Never rely on prose ordering.
@@ -111,7 +115,7 @@ Then apply the OAuth token tweak to each `.lock.yml` per [`skills/install-workfl

- **Concurrency**: each workflow uses `concurrency: group: agent-team-issue-${issue_number}` so only one role runs at a time per issue.
- **Max iterations**: default 3 (reviewer kickback → implementer). The counter lives on the `iteration` input passed through the dispatch chain, bumped exclusively by the reviewer on kickback.
- - **Input propagation**: planner / implementer / reviewer must fail loudly if required `workflow_dispatch` inputs are missing. Do not rely on label search or recent-activity inference as a fallback.
+ - **Input propagation**: planner, implementer, and reviewer each validate all required `workflow_dispatch` inputs before doing any work. If a required input is empty, whitespace-only, or still appears as an unresolved template literal (e.g. `${{ github.event.inputs.issue_number }}`), the workflow posts `🛑 agent-team: workflow_dispatch inputs were not propagated. Re-dispatch with valid inputs.` on the issue and stops. Do not rely on label search or recent-activity inference as a fallback — that approach hides dispatch bugs and silently corrupts pipeline state. The `pr_number` input on the implementer is optional by design and is treated as "not set" when blank or when it matches the unresolved literal pattern.
- **Non-UI only**: no screenshot capture. Reviewer validates via tests/CI status + reading the diff.
- **Cost**: a single task can easily spend 4× the tokens of a monolithic workflow. Set `timeout-minutes` conservatively and monitor the first few runs.
- **No auto-merge**: the reviewer approves but never merges. Humans merge.
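
The input-propagation check in the list above might look roughly like the following sketch. It is not taken from the workflows themselves: the `is_unset` and `missing_required` helpers, the regular expression, and the choice of `issue_number` and `iteration` as the required inputs are assumptions made for illustration. Only the stop message is copied from the text.

```python
import re
from typing import Iterable, List, Mapping, Optional

# Assumption: "unresolved template literal" means the whole value is still
# an unexpanded ${{ ... }} expression.
UNRESOLVED = re.compile(r"^\$\{\{.*\}\}$", re.DOTALL)

STOP_MESSAGE = (
    "🛑 agent-team: workflow_dispatch inputs were not propagated. "
    "Re-dispatch with valid inputs."
)


def is_unset(value: Optional[str]) -> bool:
    """True for missing, empty, whitespace-only, or unresolved values."""
    if value is None:
        return True
    stripped = value.strip()
    return stripped == "" or UNRESOLVED.match(stripped) is not None


def missing_required(inputs: Mapping[str, Optional[str]],
                     required: Iterable[str]) -> List[str]:
    """Return the required input names that were not propagated."""
    return [name for name in required if is_unset(inputs.get(name))]


inputs = {
    "issue_number": "${{ github.event.inputs.issue_number }}",
    "iteration": "   ",
    "pr_number": "",  # optional on the implementer, so not in `required`
}
missing = missing_required(inputs, ["issue_number", "iteration"])
if missing:
    print(STOP_MESSAGE)
```

The same `is_unset` test applies to the optional `pr_number`: a blank or unresolved value there means "not set" and does not stop the workflow.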