From df64d340c3c0adf744d72b481599b4ecd78bedc6 Mon Sep 17 00:00:00 2001
From: Sven
Date: Wed, 25 Mar 2026 13:43:45 +0100
Subject: [PATCH] docs(agents): add Anti-Rationalization tables and Tier 2 Agent Teams docs

Add Anti-Rationalization Table sections to executor.md and planner.md
agent definitions to enforce evidence-based claims. Add Tier 2 Agent
Teams documentation to AGENTS.md covering activation, communication,
and hooks for multi-agent orchestration.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 templates/agents/AGENTS.md   | 26 ++++++++++++++++++++++++++
 templates/agents/executor.md | 17 +++++++++++++++++
 templates/agents/planner.md  | 17 +++++++++++++++++
 3 files changed, 60 insertions(+)

diff --git a/templates/agents/AGENTS.md b/templates/agents/AGENTS.md
index bd685797..82f1b981 100644
--- a/templates/agents/AGENTS.md
+++ b/templates/agents/AGENTS.md
@@ -92,3 +92,29 @@ All skills use `user-invocable: false` -- agents auto-invoke them based on descr
 ## Planner Read-Only Enforcement
 
 The `planner` agent runs with `permissionMode: plan`. This enforces read-only access to the filesystem -- the planner can analyze the codebase and return plan content, but cannot execute commands that modify source files or run builds. This prevents the planner from accidentally beginning execution during the planning phase.
+
+## Tier 2 — Agent Teams
+
+When `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` is set and Claude Code supports it, MaxsimCLI can use multi-agent orchestration via Agent Teams.
+
+### Activation
+
+Tier 2 activates when:
+1. Environment variable `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` is set to `1`
+2. A `TeamCreate` probe succeeds (feature is available in the runtime)
+
+If either condition fails, all workflows gracefully degrade to Tier 1 (subagents via the `Agent` tool).
+
+### Communication
+
+Teams coordinate exclusively through:
+- **Task lists** — `.claude/tasks/{team-name}/` for pending work
+- **GitHub Issues** — Phase tracking, task sub-issues, plan comments
+- **Handoff contracts** — Structured output posted as GitHub Issue comments
+- **SendMessage** — Direct inter-agent messages within the same team
+
+### Hooks
+
+Two hooks support Tier 2 operations:
+- `maxsim-teammate-idle` (TeammateIdle) — Checks for pending tasks and assigns idle teammates
+- `maxsim-task-completed` (TaskCompleted) — Runs verification gates (test, build, lint) before allowing task completion
diff --git a/templates/agents/executor.md b/templates/agents/executor.md
index f2da10ec..59f93e6e 100644
--- a/templates/agents/executor.md
+++ b/templates/agents/executor.md
@@ -85,6 +85,23 @@ When the plan frontmatter includes a `requirements` field, populate the `## Requ
 
 Every requirement ID from the plan MUST have an entry.
 
+## Anti-Rationalization Table
+
+These phrases are NEVER acceptable as evidence. If you catch yourself using them, STOP and provide actual tool output instead.
+
+| Forbidden Phrase | Why It Fails |
+|---|---|
+| "should work" | Describes expectation, not observed outcome |
+| "I already checked" | Not verifiable in this session |
+| "tests were passing before" | Stale evidence; fresh run required |
+| "this is obviously correct" | Correctness is measured, not assessed by inspection |
+| "I think it's fine" | No tool output, no claim |
+| "the logic is sound" | Logic can be sound and still produce wrong output |
+| "nothing changed in that area" | Changes in dependencies, configs, and imports are invisible to this claim |
+| "it worked in my local run" | Local run is not this session's evidence unless tool output is shown |
+| "we can verify later" | Verification deferred is verification skipped |
+| "this is low risk" | Risk level does not substitute for evidence |
+
 ## Completion Gate
 
 Before returning results:
diff --git a/templates/agents/planner.md b/templates/agents/planner.md
index ddba8b6e..9b7ff19f 100644
--- a/templates/agents/planner.md
+++ b/templates/agents/planner.md
@@ -88,6 +88,23 @@ After writing the plan, verify backward from the phase goal:
 
 If gaps exist, add tasks to close them before finalizing.
 
+## Anti-Rationalization Table
+
+These phrases are NEVER acceptable as evidence. If you catch yourself using them, STOP and provide actual tool output instead.
+
+| Forbidden Phrase | Why It Fails |
+|---|---|
+| "should work" | Describes expectation, not observed outcome |
+| "I already checked" | Not verifiable in this session |
+| "tests were passing before" | Stale evidence; fresh run required |
+| "this is obviously correct" | Correctness is measured, not assessed by inspection |
+| "I think it's fine" | No tool output, no claim |
+| "the logic is sound" | Logic can be sound and still produce wrong output |
+| "nothing changed in that area" | Changes in dependencies, configs, and imports are invisible to this claim |
+| "it worked in my local run" | Local run is not this session's evidence unless tool output is shown |
+| "we can verify later" | Verification deferred is verification skipped |
+| "this is low risk" | Risk level does not substitute for evidence |
+
 ## Completion Gate
 
 Before returning, verify the plan:
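
The Anti-Rationalization Table added to `executor.md` and `planner.md` is an instruction to the agents, and the patch does not define any automated check for it. As a rough illustration only, the sketch below shows how the ten listed phrases could be scanned for in an agent's report. The helper name `find_forbidden_phrases`, the case-insensitive substring matching, and the apostrophe normalization are all assumptions made for this example, not part of MaxsimCLI.

```python
# Phrases copied verbatim from the Anti-Rationalization Table in the patch.
FORBIDDEN_PHRASES = (
    "should work",
    "I already checked",
    "tests were passing before",
    "this is obviously correct",
    "I think it's fine",
    "the logic is sound",
    "nothing changed in that area",
    "it worked in my local run",
    "we can verify later",
    "this is low risk",
)


def find_forbidden_phrases(report: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) pairs for each forbidden phrase found.

    Hypothetical helper: matching is a case-insensitive substring test,
    and curly apostrophes are treated like straight ones.
    """
    hits = []
    for line_number, line in enumerate(report.splitlines(), start=1):
        # Normalize typographic apostrophes and case before comparing.
        normalized = line.replace("\u2019", "'").lower()
        for phrase in FORBIDDEN_PHRASES:
            if phrase.lower() in normalized:
                hits.append((line_number, phrase))
    return hits


if __name__ == "__main__":
    sample = "Build passed (see output).\nThe retry path should work."
    for line_number, phrase in find_forbidden_phrases(sample):
        print(f"line {line_number}: forbidden phrase {phrase!r}")
```

A plain substring match also flags a phrase that is only being quoted or negated, so a check like this would fit better as a warning that prompts for tool output than as a hard gate.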