From 8460d8a6d4e95522b6e90a9e10cda6a34ae14320 Mon Sep 17 00:00:00 2001 From: Sven Date: Wed, 25 Mar 2026 22:59:36 +0100 Subject: [PATCH] feat: add executable Tier 2 Agent Teams patterns to execute.md and maxsim-batch SKILL.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace prose descriptions of Tier 2 competitive implementation with concrete TeamCreate/SendMessage call syntax in execute.md section 6.3. Add three complete Tier 2 workflow patterns to maxsim-batch SKILL.md: competitive implementation (debate), multi-reviewer code review (cross-checking), and collaborative debugging (adversarial hypothesis testing). Each pattern includes TeamCreate, teammate spawn, SendMessage exchange, verifier resolution, and Tier 1 graceful degradation fallback. Remove the "planned but not yet implemented" disclaimer. Address the PROJECT.md §7.2 audit gap (Parallelism PARTIAL 1-3). Co-Authored-By: Claude Opus 4.6 (1M context) --- templates/skills/maxsim-batch/SKILL.md | 327 ++++++++++++++++++++++++- templates/workflows/execute.md | 95 ++++++- 2 files changed, 409 insertions(+), 13 deletions(-) diff --git a/templates/skills/maxsim-batch/SKILL.md b/templates/skills/maxsim-batch/SKILL.md index 75802d3b..86f48afd 100644 --- a/templates/skills/maxsim-batch/SKILL.md +++ b/templates/skills/maxsim-batch/SKILL.md @@ -87,7 +87,7 @@ When all agents complete: Agent Teams (available since Claude Code v2.1.32, Feb 2026) enable inter-agent communication for workflows that require debate, cross-checking, or collaborative problem-solving. MaxsimCLI sets `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` during install and registers `TeammateIdle` and `TaskCompleted` quality-gate hooks. -**Current status:** Infrastructure is in place (env var, hooks). Workflow templates that invoke `TeamCreate`/`SendMessage` for Tier 2 patterns (competitive implementation, multi-reviewer code review, collaborative debugging) are planned but not yet implemented.
All workflows currently use Tier 1 subagents. See PROJECT.md §7.2 for the full specification. +**Current status:** Infrastructure is in place (env var, hooks). Tier 2 workflow patterns (competitive implementation, multi-reviewer code review, collaborative debugging) are defined below with executable `TeamCreate`/`SendMessage` call syntax. All workflows gracefully degrade to Tier 1 subagents when Agent Teams are unavailable. See PROJECT.md §7.2 for the authoritative specification. ### Tier Selection Logic @@ -103,11 +103,324 @@ MaxsimCLI chooses the tier automatically based on the workflow: | Collaborative debugging | Tier 2 (Agent Teams) | Hypotheses need adversarial testing | | Architecture exploration | Tier 2 (Agent Teams) | Requires discussion | -**When Tier 2 is ready, it will be used for:** -- Competitive implementation with adversarial debate -- Multi-dimensional code review (security + performance + test coverage) -- Collaborative debugging with competing hypotheses -- Cross-layer feature work (frontend + backend + tests) +### Tier 2 Activation Check + +Before using any Tier 2 pattern, verify availability: + +```bash +# 1. Check env var +[ "$CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS" = "1" ] || { echo "Tier 2 unavailable: env var not set"; TIER=1; } +``` + +``` +# 2. Probe TeamCreate (lightweight -- create and immediately clean up) +TeamCreate(team_name: "probe-{timestamp}", description: "availability check") +# If probe fails, set TIER=1 and log reason +``` + +If either check fails, skip to the Graceful Degradation section below. Do not attempt Tier 2 patterns. + +--- + +### Pattern 1 -- Competitive Implementation (Debate) + +**Use when:** A task is marked `critical` and `config.execution.competitive_enabled` is `true`. Multiple agents implement the same task independently, then adversarially critique each other's work before a neutral verifier selects the winner. 
+ +**Flow:** `TeamCreate` --> spawn 2-3 competitors --> each implements independently --> `SendMessage` critiques --> fresh verifier judges --> winner selected. + +**Step 1 -- Create the competition team:** +``` +TeamCreate( + team_name: "competition-phase-{N}-task-{id}", + description: "Competitive implementation: {task_description}. Each teammate implements independently, then reviews the others adversarially." +) +``` + +**Step 2 -- Spawn competing teammates:** +Spawn 2 teammates minimum, 3 for tasks labeled `critical`. Each gets a distinct approach directive and the full task context. + +``` +// Teammate A -- conservative approach +Spawn teammate "competitor-a" with prompt: + "Implement {task_description} using approach: CONSERVATIVE. + Prefer existing patterns, minimal new abstractions, conventional solutions. + Work in isolation until the review phase. + Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}. + Success criteria: {criteria from plan}. + When done, commit your work and report RESULT: PASS or RESULT: FAIL." +Model: {executor_model} + +// Teammate B -- innovative approach +Spawn teammate "competitor-b" with prompt: + "Implement {task_description} using approach: INNOVATIVE. + Optimize for performance and elegance, explore novel patterns where justified. + Work in isolation until the review phase. + Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}. + Success criteria: {criteria from plan}. + When done, commit your work and report RESULT: PASS or RESULT: FAIL." +Model: {executor_model} + +// (Optional -- critical tasks only) Teammate C -- defensive approach +Spawn teammate "competitor-c" with prompt: + "Implement {task_description} using approach: DEFENSIVE. + Maximize error handling, edge case coverage, and robustness. + Work in isolation until the review phase. + Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}. + Success criteria: {criteria from plan}. + When done, commit your work and report RESULT: PASS or RESULT: FAIL." 
+Model: {executor_model} +``` + +**Step 3 -- Adversarial critique via SendMessage:** +After all teammates complete, each reviews the others' implementations: + +``` +SendMessage({ + type: "message", + recipient: "competitor-b", + content: "Review competitor-a's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.", + summary: "Requesting adversarial review of competitor-a" +}) + +SendMessage({ + type: "message", + recipient: "competitor-a", + content: "Review competitor-b's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.", + summary: "Requesting adversarial review of competitor-b" +}) +``` + +**Step 4 -- Fresh verifier selects winner:** +Spawn a verifier agent (NOT a team member) to evaluate both implementations and both critiques: + +``` +Agent( + subagent_type: "verifier", + model: "{verifier_model}", + prompt: " + Judge a competitive implementation. Agents implemented the same task independently, then critiqued each other. + + Implementations: competitor-a (CONSERVATIVE), competitor-b (INNOVATIVE) + Critiques: {critique summaries} + + Selection criteria (priority order): + 1. Correctness -- satisfies all success criteria + 2. Test coverage -- edge cases tested + 3. Code quality -- readability, codebase consistency + 4. Simplicity -- fewer abstractions when correctness is equal + + Output: WINNER: competitor-{a|b|c} + Followed by justification. + " +) +``` + +Discard the losing worktree branch. Merge the winner via the standard flow. + +**Tier 1 fallback:** Spawn 2 independent executor subagents via `Agent(isolation: "worktree", run_in_background: true)` with different approach prompts. After both complete, spawn a verifier to compare. No inter-agent messaging -- the verifier reads both outputs directly. 
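The activation gate that decides between this Tier 2 pattern and its Tier 1 fallback can be sketched as a small shell helper. This is an illustrative sketch only: `check_tier2` and `probe_team_create` are hypothetical names, and the real `TeamCreate` probe is an agent tool call, not a shell command, so it is represented here by a stub.

```shell
# Illustrative sketch of the Tier 2 activation gate described earlier.
# check_tier2 and probe_team_create are hypothetical names; the real
# TeamCreate probe is an agent tool call, stubbed out here.
check_tier2() {
  # Check 1: the experimental env var must be set to "1".
  if [ "$CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS" != "1" ]; then
    echo "tier=1 reason=env-var-not-set"
    return 1
  fi
  # Check 2: the lightweight TeamCreate probe must succeed.
  if ! probe_team_create 2>/dev/null; then
    echo "tier=1 reason=probe-failed"
    return 1
  fi
  echo "tier=2"
}
```

A workflow would run this gate once before Step 1 and branch to the Tier 1 fallback (and log the reason) whenever the output starts with `tier=1`.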
+ +--- + +### Pattern 2 -- Multi-Reviewer Code Review (Cross-Checking) + +**Use when:** A PR or implementation requires review from multiple specialist perspectives that must challenge each other's findings. + +**Flow:** `TeamCreate` --> spawn 3 specialist reviewers --> each reviews independently --> `SendMessage` to share findings --> each reviewer challenges other reviewers' findings --> coordinator synthesizes unified report. + +**Step 1 -- Create the review team:** +``` +TeamCreate( + team_name: "review-phase-{N}-task-{id}", + description: "Multi-dimensional code review: {description}. Reviewers share and cross-check findings." +) +``` + +**Step 2 -- Spawn specialist reviewers:** +``` +// Security reviewer +Spawn teammate "reviewer-security" with prompt: + "Review the implementation for security concerns: authentication, authorization, input validation, injection risks, token handling, data exposure. + Files to review: {file list or PR reference}. + Report findings as: CRITICAL / WARNING / INFO with evidence (file path + line number). + When done, send your findings to reviewer-performance and reviewer-tests." +Model: {executor_model} + +// Performance reviewer +Spawn teammate "reviewer-performance" with prompt: + "Review the implementation for performance concerns: N+1 queries, missing indexes, unnecessary allocations, caching opportunities, algorithmic complexity. + Files to review: {file list or PR reference}. + Report findings as: CRITICAL / WARNING / INFO with evidence (file path + line number). + When done, send your findings to reviewer-security and reviewer-tests." +Model: {executor_model} + +// Test coverage reviewer +Spawn teammate "reviewer-tests" with prompt: + "Review the implementation for test coverage: missing edge cases, untested error paths, assertion quality, flaky test patterns, coverage gaps. + Files to review: {file list or PR reference}. + Report findings as: CRITICAL / WARNING / INFO with evidence (file path + line number). 
+ When done, send your findings to reviewer-security and reviewer-performance." +Model: {executor_model} +``` + +**Step 3 -- Share and cross-check findings:** +After each reviewer completes their initial review, they share findings with the others via `SendMessage`: + +``` +// Each reviewer sends findings to the other two +SendMessage({ + type: "message", + recipient: "reviewer-performance", + content: "My security findings: {findings list}. Do any of these conflict with your performance findings? Are there performance optimizations that would introduce security risks?", + summary: "Security reviewer sharing findings for cross-check" +}) +``` + +Each reviewer then challenges the others' findings: +``` +SendMessage({ + type: "message", + recipient: "reviewer-security", + content: "Reviewing your security findings: Finding #2 (SQL injection risk in query builder) -- I confirmed this also causes a performance issue due to string concatenation in a hot path. Finding #4 (token expiry) -- this is a false positive; the token refresh middleware handles this case. Evidence: {file}:{line}.", + summary: "Performance reviewer challenging security findings" +}) +``` + +**Step 4 -- Coordinator synthesizes report:** +The team lead (or a fresh agent) collects all findings and cross-check results, then produces a unified review: + +``` +Agent( + subagent_type: "verifier", + model: "{verifier_model}", + prompt: " + Synthesize a unified code review from three specialist reviewers. 
+ + Security findings: {security reviewer's final findings} + Performance findings: {performance reviewer's final findings} + Test coverage findings: {test reviewer's final findings} + Cross-check disputes: {list of challenged findings and resolutions} + + Produce a single review report: + - CRITICAL items (must fix before merge) + - WARNING items (should fix, not blocking) + - INFO items (suggestions) + - Disputed findings and resolution + " +) +``` + +Post the unified report as a GitHub comment on the relevant issue. + +**Tier 1 fallback:** Spawn 3 independent reviewer subagents via `Agent(run_in_background: true)`. Each produces its own report. The orchestrator merges reports manually -- no cross-checking between reviewers. Less thorough but fully functional. + +--- + +### Pattern 3 -- Collaborative Debugging (Adversarial Hypothesis Testing) + +**Use when:** A bug's root cause is unclear and multiple hypotheses need to be tested simultaneously. Each investigator pursues a different theory and actively tries to disprove the others. + +**Flow:** `TeamCreate` --> spawn 2-3 investigators --> each pursues a different hypothesis --> `SendMessage` to share evidence and challenge other hypotheses --> hypothesis that survives adversarial testing wins --> fix implemented by the confirmed investigator. + +**Step 1 -- Create the investigation team:** +``` +TeamCreate( + team_name: "debug-phase-{N}-task-{id}", + description: "Adversarial debugging: {bug description}. Investigators pursue competing hypotheses and challenge each other's evidence." +) +``` + +**Step 2 -- Spawn investigators with distinct hypotheses:** +Derive hypotheses from the bug symptoms, error logs, and codebase analysis. + +``` +// Investigator A -- hypothesis: race condition +Spawn teammate "investigator-a" with prompt: + "Bug: {bug description with symptoms and error output}. + Your hypothesis: RACE CONDITION in {suspected component}. + Investigate this hypothesis: + 1. 
Find evidence supporting or refuting it + 2. Write a reproducer test if possible + 3. If confirmed, draft a fix + 4. Share evidence with other investigators via SendMessage + 5. Actively challenge other investigators' hypotheses with counter-evidence" +Model: {executor_model} + +// Investigator B -- hypothesis: configuration error +Spawn teammate "investigator-b" with prompt: + "Bug: {bug description with symptoms and error output}. + Your hypothesis: CONFIGURATION ERROR in {suspected component}. + Investigate this hypothesis: + 1. Find evidence supporting or refuting it + 2. Write a reproducer test if possible + 3. If confirmed, draft a fix + 4. Share evidence with other investigators via SendMessage + 5. Actively challenge other investigators' hypotheses with counter-evidence" +Model: {executor_model} + +// Investigator C -- hypothesis: data corruption +Spawn teammate "investigator-c" with prompt: + "Bug: {bug description with symptoms and error output}. + Your hypothesis: DATA CORRUPTION in {suspected component}. + Investigate this hypothesis: + 1. Find evidence supporting or refuting it + 2. Write a reproducer test if possible + 3. If confirmed, draft a fix + 4. Share evidence with other investigators via SendMessage + 5. Actively challenge other investigators' hypotheses with counter-evidence" +Model: {executor_model} +``` + +**Step 3 -- Evidence sharing and adversarial challenges:** +Investigators share findings and challenge each other via `SendMessage`: + +``` +// Investigator A shares evidence +SendMessage({ + type: "message", + recipient: "investigator-b", + content: "Evidence for race condition hypothesis: I found unsynchronized access to {resource} at {file}:{line}. The timing window is ~50ms under load. This contradicts your configuration hypothesis because the config values are correct -- the issue only manifests under concurrent access. 
Can you disprove this?", + summary: "Investigator-a sharing race condition evidence, challenging config hypothesis" +}) + +// Investigator B responds with counter-evidence +SendMessage({ + type: "message", + recipient: "investigator-a", + content: "Your race condition evidence is plausible but I found that the same symptom occurs on single-threaded test runs. See: {test output}. This suggests the root cause is upstream of the concurrent access point. My config hypothesis: the timeout value at {file}:{line} defaults to 0 when the env var is missing.", + summary: "Investigator-b providing counter-evidence to race condition hypothesis" +}) +``` + +**Step 4 -- Resolution:** +The team lead evaluates which hypothesis survived adversarial testing: + +``` +Agent( + subagent_type: "verifier", + model: "{verifier_model}", + prompt: " + Evaluate competing debugging hypotheses. + + Hypothesis A (race condition): {evidence summary, challenges received, responses} + Hypothesis B (configuration): {evidence summary, challenges received, responses} + Hypothesis C (data corruption): {evidence summary, challenges received, responses} + + Determine: + 1. Which hypothesis best explains ALL symptoms? + 2. Which hypothesis survived adversarial challenge? + 3. Is the proposed fix correct and complete? + + Output: CONFIRMED: investigator-{a|b|c} -- {hypothesis name} + Followed by: evidence that confirms, evidence that was disproven, recommended fix. + " +) +``` + +The confirmed investigator's fix is merged. Other worktree branches are discarded. + +**Tier 1 fallback:** Spawn 2-3 independent debugging subagents via `Agent(isolation: "worktree", run_in_background: true)`. Each investigates a different hypothesis and reports findings. The orchestrator compares reports without inter-agent debate. Less adversarial but still tests multiple hypotheses in parallel. 
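Across all three patterns, the tier decision combines the env-var check with the configured `competition_strategy` (read from `.claude/maxsim/config.json` in execute.md). A minimal sketch, assuming the caller passes the strategy value in (`select_tier` is a hypothetical helper name, not part of MaxsimCLI):

```shell
# Minimal sketch of the tier decision shared by the patterns above.
# select_tier is a hypothetical helper; $1 carries the value of
# config.execution.parallelism.competition_strategy.
select_tier() {
  strategy="${1:-none}"
  # Tier 2 requires both the Agent Teams env var and the "deep" strategy.
  if [ "$CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS" = "1" ] && [ "$strategy" = "deep" ]; then
    echo "tier=2"
  else
    echo "tier=1"
  fi
}
```

`select_tier deep` prints `tier=2` only when the env var is set; every other strategy value (`none`, `quick`, `standard`) stays on Tier 1 regardless of the env var.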
+ +--- ### Graceful Degradation If Agent Teams are unavailable (env var `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` not set, or the `TeamCreate` probe fails), workflows fall back to Tier 1 and inform the user: > "Competitive mode: using Tier 1 subagents (Agent Teams not available or not required for this strategy). Each executor works independently; verifier selects the best result." -The user is informed but not blocked. All workflows remain fully functional via Tier 1. +The user is informed but not blocked. All workflows remain fully functional via Tier 1. Each pattern above includes a specific Tier 1 fallback that preserves the core workflow (parallel execution + verifier selection) without inter-agent messaging. ## Limits diff --git a/templates/workflows/execute.md b/templates/workflows/execute.md index 8184392d..d48eb64e 100644 --- a/templates/workflows/execute.md +++ b/templates/workflows/execute.md @@ -235,12 +235,95 @@ Before spawning competitive agents, evaluate the execution tier: - Read `config.execution.parallelism.competition_strategy` from `.claude/maxsim/config.json` 2. **If Tier 2 is available AND `competition_strategy` is `deep`:** - - Use Agent Teams debate pattern: - - `TeamCreate` to create a competition team - - Spawn 2-3 teammates, each solving the same task independently - - Teammates use `SendMessage` to actively challenge each other's approaches - - The theory/implementation that survives adversarial cross-examination wins - - This fights LLM anchoring bias (first plausible answer wins) + + Use the Agent Teams debate pattern. Create a competition team and spawn teammates who implement independently, then actively challenge each other. + + **Step 2a -- Create the competition team:** + ``` + TeamCreate( + team_name: "competition-phase-{N}-task-{id}", + description: "Competitive implementation: {task_description}" + ) + ``` + + **Step 2b -- Spawn 2-3 competing teammates:** + For each competitor (2 minimum, 3 for critical tasks), spawn a teammate with a distinct approach variation.
Each teammate works in its own worktree. + ``` + // Teammate A -- conservative approach + Spawn teammate "competitor-a" with prompt: + "Implement {task_description} using approach: CONSERVATIVE. + Prefer existing patterns, minimal new abstractions, conventional solutions. + Work in isolation. Do not coordinate with other teammates until review phase. + [full task context: phase issue #{phase_issue_number}, plan content, success criteria]" + Model: {executor_model} + + // Teammate B -- innovative approach + Spawn teammate "competitor-b" with prompt: + "Implement {task_description} using approach: INNOVATIVE. + Optimize for performance and elegance, explore novel patterns where justified. + Work in isolation. Do not coordinate with other teammates until review phase. + [full task context: phase issue #{phase_issue_number}, plan content, success criteria]" + Model: {executor_model} + + // Teammate C -- (only for critical tasks) defensive approach + Spawn teammate "competitor-c" with prompt: + "Implement {task_description} using approach: DEFENSIVE. + Maximize error handling, edge case coverage, and robustness over brevity. + Work in isolation. Do not coordinate with other teammates until review phase. + [full task context: phase issue #{phase_issue_number}, plan content, success criteria]" + Model: {executor_model} + ``` + + **Step 2c -- Debate phase (teammates challenge each other):** + After all teammates complete their implementations, each reviews the others' work via `SendMessage`: + ``` + SendMessage({ + type: "message", + recipient: "competitor-b", + content: "Review competitor-a's implementation. Identify weaknesses, edge cases missed, and potential issues. Be adversarial -- find real problems, not style preferences. 
Report: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns.", + summary: "Requesting adversarial review of competitor-a's work" + }) + + SendMessage({ + type: "message", + recipient: "competitor-a", + content: "Review competitor-b's implementation. Identify weaknesses, edge cases missed, and potential issues. Be adversarial -- find real problems, not style preferences. Report: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns.", + summary: "Requesting adversarial review of competitor-b's work" + }) + ``` + Each teammate responds with a structured critique. This fights LLM anchoring bias -- the first plausible answer does not automatically win. + + **Step 2d -- Verifier selects winner:** + Spawn a fresh verifier agent (NOT a team member) to evaluate both implementations and both critiques: + ``` + Agent( + subagent_type: "verifier", + model: "{verifier_model}", + prompt: " + You are judging a competitive implementation. Two (or three) agents each implemented the same task independently, then reviewed each other's work adversarially. + + ## Implementations + - competitor-a (CONSERVATIVE): {summary or path to worktree-a} + - competitor-b (INNOVATIVE): {summary or path to worktree-b} + + ## Critiques + - competitor-b's critique of competitor-a: {critique-b-of-a} + - competitor-a's critique of competitor-b: {critique-a-of-b} + + ## Selection Criteria (in priority order) + 1. Correctness -- does it satisfy all success criteria? + 2. Test coverage -- are edge cases tested? + 3. Code quality -- readability, maintainability, consistency with codebase + 4. Simplicity -- prefer fewer abstractions when correctness is equal + + Output exactly: WINNER: competitor-{a|b|c} + Followed by a justification paragraph. + " + ) + ``` + Discard the losing implementation's worktree branch. Merge the winner using the standard branch merge flow (step 6.8). 
+ + **Fallback:** If any step in 2a-2d fails (TeamCreate probe fails, teammate spawn errors, SendMessage timeout), immediately fall back to Tier 1 (step 3 below). Log the failure reason for diagnostics. 3. **If Tier 2 is NOT available (env var unset, feature not yet stable, or `competition_strategy` is `none`/`quick`/`standard`):** - **Graceful degradation to Tier 1** — inform the user: