From 8460d8a6d4e95522b6e90a9e10cda6a34ae14320 Mon Sep 17 00:00:00 2001 From: Sven Date: Wed, 25 Mar 2026 22:59:36 +0100 Subject: [PATCH] feat: add executable Tier 2 Agent Teams patterns to execute.md and maxsim-batch SKILL.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace prose descriptions of Tier 2 competitive implementation with concrete TeamCreate/SendMessage call syntax in execute.md section 6.3. Add three complete Tier 2 workflow patterns to maxsim-batch SKILL.md: competitive implementation (debate), multi-reviewer code review (cross-checking), and collaborative debugging (adversarial hypothesis testing). Each pattern includes TeamCreate, teammate spawn, SendMessage exchange, verifier resolution, and Tier 1 graceful degradation fallback. Remove the "planned but not yet implemented" disclaimer. Address the PROJECT.md §7.2 audit gap (Parallelism PARTIAL 1-3). Co-Authored-By: Claude Opus 4.6 (1M context) --- templates/skills/maxsim-batch/SKILL.md | 327 ++++++++++++++++++++++++- templates/workflows/execute.md | 95 ++++++- 2 files changed, 409 insertions(+), 13 deletions(-) diff --git a/templates/skills/maxsim-batch/SKILL.md b/templates/skills/maxsim-batch/SKILL.md index 75802d3b..86f48afd 100644 --- a/templates/skills/maxsim-batch/SKILL.md +++ b/templates/skills/maxsim-batch/SKILL.md @@ -87,7 +87,7 @@ When all agents complete: Agent Teams (available since Claude Code v2.1.32, Feb 2026) enable inter-agent communication for workflows that require debate, cross-checking, or collaborative problem-solving. MaxsimCLI sets `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` during install and registers `TeammateIdle` and `TaskCompleted` quality-gate hooks. -**Current status:** Infrastructure is in place (env var, hooks). Workflow templates that invoke `TeamCreate`/`SendMessage` for Tier 2 patterns (competitive implementation, multi-reviewer code review, collaborative debugging) are planned but not yet implemented.
All workflows currently use Tier 1 subagents. See PROJECT.md §7.2 for the full specification. +**Current status:** Infrastructure is in place (env var, hooks). Tier 2 workflow patterns (competitive implementation, multi-reviewer code review, collaborative debugging) are defined below with executable `TeamCreate`/`SendMessage` call syntax. All workflows gracefully degrade to Tier 1 subagents when Agent Teams are unavailable. See PROJECT.md §7.2 for the authoritative specification. ### Tier Selection Logic @@ -103,11 +103,324 @@ MaxsimCLI chooses the tier automatically based on the workflow: | Collaborative debugging | Tier 2 (Agent Teams) | Hypotheses need adversarial testing | | Architecture exploration | Tier 2 (Agent Teams) | Requires discussion | -**When Tier 2 is ready, it will be used for:** -- Competitive implementation with adversarial debate -- Multi-dimensional code review (security + performance + test coverage) -- Collaborative debugging with competing hypotheses -- Cross-layer feature work (frontend + backend + tests) +### Tier 2 Activation Check + +Before using any Tier 2 pattern, verify availability: + +```bash +# 1. Check env var +[ "$CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS" = "1" ] || { echo "Tier 2 unavailable: env var not set"; TIER=1; } +``` + +``` +# 2. Probe TeamCreate (lightweight -- create and immediately clean up) +TeamCreate(team_name: "probe-{timestamp}", description: "availability check") +# If probe fails, set TIER=1 and log reason +``` + +If either check fails, skip to the Graceful Degradation section below. Do not attempt Tier 2 patterns. + +--- + +### Pattern 1 -- Competitive Implementation (Debate) + +**Use when:** A task is marked `critical` and `config.execution.competitive_enabled` is `true`. Multiple agents implement the same task independently, then adversarially critique each other's work before a neutral verifier selects the winner. 
+ +**Flow:** `TeamCreate` --> spawn 2-3 competitors --> each implements independently --> `SendMessage` critiques --> fresh verifier judges --> winner selected. + +**Step 1 -- Create the competition team:** +``` +TeamCreate( + team_name: "competition-phase-{N}-task-{id}", + description: "Competitive implementation: {task_description}. Each teammate implements independently, then reviews the others adversarially." +) +``` + +**Step 2 -- Spawn competing teammates:** +Spawn 2 teammates minimum, 3 for tasks labeled `critical`. Each gets a distinct approach directive and the full task context. + +``` +// Teammate A -- conservative approach +Spawn teammate "competitor-a" with prompt: + "Implement {task_description} using approach: CONSERVATIVE. + Prefer existing patterns, minimal new abstractions, conventional solutions. + Work in isolation until the review phase. + Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}. + Success criteria: {criteria from plan}. + When done, commit your work and report RESULT: PASS or RESULT: FAIL." +Model: {executor_model} + +// Teammate B -- innovative approach +Spawn teammate "competitor-b" with prompt: + "Implement {task_description} using approach: INNOVATIVE. + Optimize for performance and elegance, explore novel patterns where justified. + Work in isolation until the review phase. + Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}. + Success criteria: {criteria from plan}. + When done, commit your work and report RESULT: PASS or RESULT: FAIL." +Model: {executor_model} + +// (Optional -- critical tasks only) Teammate C -- defensive approach +Spawn teammate "competitor-c" with prompt: + "Implement {task_description} using approach: DEFENSIVE. + Maximize error handling, edge case coverage, and robustness. + Work in isolation until the review phase. + Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}. + Success criteria: {criteria from plan}. + When done, commit your work and report RESULT: PASS or RESULT: FAIL." 
+Model: {executor_model} +``` + +**Step 3 -- Adversarial critique via SendMessage:** +After all teammates complete, each reviews the others' implementations: + +``` +SendMessage({ + type: "message", + recipient: "competitor-b", + content: "Review competitor-a's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.", + summary: "Requesting adversarial review of competitor-a" +}) + +SendMessage({ + type: "message", + recipient: "competitor-a", + content: "Review competitor-b's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.", + summary: "Requesting adversarial review of competitor-b" +}) +``` + +**Step 4 -- Fresh verifier selects winner:** +Spawn a verifier agent (NOT a team member) to evaluate both implementations and both critiques: + +``` +Agent( + subagent_type: "verifier", + model: "{verifier_model}", + prompt: " + Judge a competitive implementation. Agents implemented the same task independently, then critiqued each other. + + Implementations: competitor-a (CONSERVATIVE), competitor-b (INNOVATIVE) + Critiques: {critique summaries} + + Selection criteria (priority order): + 1. Correctness -- satisfies all success criteria + 2. Test coverage -- edge cases tested + 3. Code quality -- readability, codebase consistency + 4. Simplicity -- fewer abstractions when correctness is equal + + Output: WINNER: competitor-{a|b|c} + Followed by justification. + " +) +``` + +Discard the losing worktree branch. Merge the winner via the standard flow. + +**Tier 1 fallback:** Spawn 2 independent executor subagents via `Agent(isolation: "worktree", run_in_background: true)` with different approach prompts. After both complete, spawn a verifier to compare. No inter-agent messaging -- the verifier reads both outputs directly. 
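The activation gate that decides between this Tier 2 pattern and its Tier 1 fallback can be sketched as a small shell helper. This is an illustrative sketch only: `check_tier2` and `probe_team_create` are hypothetical names, and the real `TeamCreate` probe is an agent tool call, not a shell command, so it is represented here by a stub.

```shell
# Illustrative sketch of the Tier 2 activation gate described earlier.
# check_tier2 and probe_team_create are hypothetical names; the real
# TeamCreate probe is an agent tool call, stubbed out here.
check_tier2() {
  # Check 1: the experimental env var must be set to "1".
  if [ "$CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS" != "1" ]; then
    echo "tier=1 reason=env-var-not-set"
    return 1
  fi
  # Check 2: the lightweight TeamCreate probe must succeed.
  if ! probe_team_create 2>/dev/null; then
    echo "tier=1 reason=probe-failed"
    return 1
  fi
  echo "tier=2"
}
```

A workflow would run this gate once before Step 1 and branch to the Tier 1 fallback (and log the reason) whenever the output starts with `tier=1`.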
+ +--- + +### Pattern 2 -- Multi-Reviewer Code Review (Cross-Checking) + +**Use when:** A PR or implementation requires review from multiple specialist perspectives that must challenge each other's findings. + +**Flow:** `TeamCreate` --> spawn 3 specialist reviewers --> each reviews independently --> `SendMessage` to share findings --> each reviewer challenges other reviewers' findings --> coordinator synthesizes unified report. + +**Step 1 -- Create the review team:** +``` +TeamCreate( + team_name: "review-phase-{N}-task-{id}", + description: "Multi-dimensional code review: {description}. Reviewers share and cross-check findings." +) +``` + +**Step 2 -- Spawn specialist reviewers:** +``` +// Security reviewer +Spawn teammate "reviewer-security" with prompt: + "Review the implementation for security concerns: authentication, authorization, input validation, injection risks, token handling, data exposure. + Files to review: {file list or PR reference}. + Report findings as: CRITICAL / WARNING / INFO with evidence (file path + line number). + When done, send your findings to reviewer-performance and reviewer-tests." +Model: {executor_model} + +// Performance reviewer +Spawn teammate "reviewer-performance" with prompt: + "Review the implementation for performance concerns: N+1 queries, missing indexes, unnecessary allocations, caching opportunities, algorithmic complexity. + Files to review: {file list or PR reference}. + Report findings as: CRITICAL / WARNING / INFO with evidence (file path + line number). + When done, send your findings to reviewer-security and reviewer-tests." +Model: {executor_model} + +// Test coverage reviewer +Spawn teammate "reviewer-tests" with prompt: + "Review the implementation for test coverage: missing edge cases, untested error paths, assertion quality, flaky test patterns, coverage gaps. + Files to review: {file list or PR reference}. + Report findings as: CRITICAL / WARNING / INFO with evidence (file path + line number). 
+ When done, send your findings to reviewer-security and reviewer-performance." +Model: {executor_model} +``` + +**Step 3 -- Share and cross-check findings:** +After each reviewer completes their initial review, they share findings with the others via `SendMessage`: + +``` +// Each reviewer sends findings to the other two +SendMessage({ + type: "message", + recipient: "reviewer-performance", + content: "My security findings: {findings list}. Do any of these conflict with your performance findings? Are there performance optimizations that would introduce security risks?", + summary: "Security reviewer sharing findings for cross-check" +}) +``` + +Each reviewer then challenges the others' findings: +``` +SendMessage({ + type: "message", + recipient: "reviewer-security", + content: "Reviewing your security findings: Finding #2 (SQL injection risk in query builder) -- I confirmed this also causes a performance issue due to string concatenation in a hot path. Finding #4 (token expiry) -- this is a false positive; the token refresh middleware handles this case. Evidence: {file}:{line}.", + summary: "Performance reviewer challenging security findings" +}) +``` + +**Step 4 -- Coordinator synthesizes report:** +The team lead (or a fresh agent) collects all findings and cross-check results, then produces a unified review: + +``` +Agent( + subagent_type: "verifier", + model: "{verifier_model}", + prompt: " + Synthesize a unified code review from three specialist reviewers. 
+ + Security findings: {security reviewer's final findings} + Performance findings: {performance reviewer's final findings} + Test coverage findings: {test reviewer's final findings} + Cross-check disputes: {list of challenged findings and resolutions} + + Produce a single review report: + - CRITICAL items (must fix before merge) + - WARNING items (should fix, not blocking) + - INFO items (suggestions) + - Disputed findings and resolution + " +) +``` + +Post the unified report as a GitHub comment on the relevant issue. + +**Tier 1 fallback:** Spawn 3 independent reviewer subagents via `Agent(run_in_background: true)`. Each produces its own report. The orchestrator merges reports manually -- no cross-checking between reviewers. Less thorough but fully functional. + +--- + +### Pattern 3 -- Collaborative Debugging (Adversarial Hypothesis Testing) + +**Use when:** A bug's root cause is unclear and multiple hypotheses need to be tested simultaneously. Each investigator pursues a different theory and actively tries to disprove the others. + +**Flow:** `TeamCreate` --> spawn 2-3 investigators --> each pursues a different hypothesis --> `SendMessage` to share evidence and challenge other hypotheses --> hypothesis that survives adversarial testing wins --> fix implemented by the confirmed investigator. + +**Step 1 -- Create the investigation team:** +``` +TeamCreate( + team_name: "debug-phase-{N}-task-{id}", + description: "Adversarial debugging: {bug description}. Investigators pursue competing hypotheses and challenge each other's evidence." +) +``` + +**Step 2 -- Spawn investigators with distinct hypotheses:** +Derive hypotheses from the bug symptoms, error logs, and codebase analysis. + +``` +// Investigator A -- hypothesis: race condition +Spawn teammate "investigator-a" with prompt: + "Bug: {bug description with symptoms and error output}. + Your hypothesis: RACE CONDITION in {suspected component}. + Investigate this hypothesis: + 1. 
Find evidence supporting or refuting it + 2. Write a reproducer test if possible + 3. If confirmed, draft a fix + 4. Share evidence with other investigators via SendMessage + 5. Actively challenge other investigators' hypotheses with counter-evidence" +Model: {executor_model} + +// Investigator B -- hypothesis: configuration error +Spawn teammate "investigator-b" with prompt: + "Bug: {bug description with symptoms and error output}. + Your hypothesis: CONFIGURATION ERROR in {suspected component}. + Investigate this hypothesis: + 1. Find evidence supporting or refuting it + 2. Write a reproducer test if possible + 3. If confirmed, draft a fix + 4. Share evidence with other investigators via SendMessage + 5. Actively challenge other investigators' hypotheses with counter-evidence" +Model: {executor_model} + +// Investigator C -- hypothesis: data corruption +Spawn teammate "investigator-c" with prompt: + "Bug: {bug description with symptoms and error output}. + Your hypothesis: DATA CORRUPTION in {suspected component}. + Investigate this hypothesis: + 1. Find evidence supporting or refuting it + 2. Write a reproducer test if possible + 3. If confirmed, draft a fix + 4. Share evidence with other investigators via SendMessage + 5. Actively challenge other investigators' hypotheses with counter-evidence" +Model: {executor_model} +``` + +**Step 3 -- Evidence sharing and adversarial challenges:** +Investigators share findings and challenge each other via `SendMessage`: + +``` +// Investigator A shares evidence +SendMessage({ + type: "message", + recipient: "investigator-b", + content: "Evidence for race condition hypothesis: I found unsynchronized access to {resource} at {file}:{line}. The timing window is ~50ms under load. This contradicts your configuration hypothesis because the config values are correct -- the issue only manifests under concurrent access. 
Can you disprove this?", + summary: "Investigator-a sharing race condition evidence, challenging config hypothesis" +}) + +// Investigator B responds with counter-evidence +SendMessage({ + type: "message", + recipient: "investigator-a", + content: "Your race condition evidence is plausible but I found that the same symptom occurs on single-threaded test runs. See: {test output}. This suggests the root cause is upstream of the concurrent access point. My config hypothesis: the timeout value at {file}:{line} defaults to 0 when the env var is missing.", + summary: "Investigator-b providing counter-evidence to race condition hypothesis" +}) +``` + +**Step 4 -- Resolution:** +The team lead evaluates which hypothesis survived adversarial testing: + +``` +Agent( + subagent_type: "verifier", + model: "{verifier_model}", + prompt: " + Evaluate competing debugging hypotheses. + + Hypothesis A (race condition): {evidence summary, challenges received, responses} + Hypothesis B (configuration): {evidence summary, challenges received, responses} + Hypothesis C (data corruption): {evidence summary, challenges received, responses} + + Determine: + 1. Which hypothesis best explains ALL symptoms? + 2. Which hypothesis survived adversarial challenge? + 3. Is the proposed fix correct and complete? + + Output: CONFIRMED: investigator-{a|b|c} -- {hypothesis name} + Followed by: evidence that confirms, evidence that was disproven, recommended fix. + " +) +``` + +The confirmed investigator's fix is merged. Other worktree branches are discarded. + +**Tier 1 fallback:** Spawn 2-3 independent debugging subagents via `Agent(isolation: "worktree", run_in_background: true)`. Each investigates a different hypothesis and reports findings. The orchestrator compares reports without inter-agent debate. Less adversarial but still tests multiple hypotheses in parallel. 
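Across all three patterns, the tier decision combines the env-var check with the configured `competition_strategy` (read from `.claude/maxsim/config.json` in execute.md). A minimal sketch, assuming the caller passes the strategy value in (`select_tier` is a hypothetical helper name, not part of MaxsimCLI):

```shell
# Minimal sketch of the tier decision shared by the patterns above.
# select_tier is a hypothetical helper; $1 carries the value of
# config.execution.parallelism.competition_strategy.
select_tier() {
  strategy="${1:-none}"
  # Tier 2 requires both the Agent Teams env var and the "deep" strategy.
  if [ "$CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS" = "1" ] && [ "$strategy" = "deep" ]; then
    echo "tier=2"
  else
    echo "tier=1"
  fi
}
```

`select_tier deep` prints `tier=2` only when the env var is set; every other strategy value (`none`, `quick`, `standard`) stays on Tier 1 regardless of the env var.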
+ +--- ### Graceful Degradation If Agent Teams are unavailable (env var `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` not set, or the `TeamCreate` probe fails), workflows fall back to Tier 1 and inform the user: > "Competitive mode: using Tier 1 subagents (Agent Teams not available or not required for this strategy). Each executor works independently; verifier selects the best result." -The user is informed but not blocked. All workflows remain fully functional via Tier 1. +The user is informed but not blocked. All workflows remain fully functional via Tier 1. Each pattern above includes a specific Tier 1 fallback that preserves the core workflow (parallel execution + verifier selection) without inter-agent messaging. ## Limits diff --git a/templates/workflows/execute.md b/templates/workflows/execute.md index 8184392d..d48eb64e 100644 --- a/templates/workflows/execute.md +++ b/templates/workflows/execute.md @@ -235,12 +235,95 @@ Before spawning competitive agents, evaluate the execution tier: - Read `config.execution.parallelism.competition_strategy` from `.claude/maxsim/config.json` 2. **If Tier 2 is available AND `competition_strategy` is `deep`:** - - Use Agent Teams debate pattern: - - `TeamCreate` to create a competition team - - Spawn 2-3 teammates, each solving the same task independently - - Teammates use `SendMessage` to actively challenge each other's approaches - - The theory/implementation that survives adversarial cross-examination wins - - This fights LLM anchoring bias (first plausible answer wins) + + Use the Agent Teams debate pattern. Create a competition team and spawn teammates who implement independently, then actively challenge each other. + + **Step 2a -- Create the competition team:** + ``` + TeamCreate( + team_name: "competition-phase-{N}-task-{id}", + description: "Competitive implementation: {task_description}" + ) + ``` + + **Step 2b -- Spawn 2-3 competing teammates:** + For each competitor (2 minimum, 3 for critical tasks), spawn a teammate with a distinct approach variation.
Each teammate works in its own worktree. + ``` + // Teammate A -- conservative approach + Spawn teammate "competitor-a" with prompt: + "Implement {task_description} using approach: CONSERVATIVE. + Prefer existing patterns, minimal new abstractions, conventional solutions. + Work in isolation. Do not coordinate with other teammates until review phase. + [full task context: phase issue #{phase_issue_number}, plan content, success criteria]" + Model: {executor_model} + + // Teammate B -- innovative approach + Spawn teammate "competitor-b" with prompt: + "Implement {task_description} using approach: INNOVATIVE. + Optimize for performance and elegance, explore novel patterns where justified. + Work in isolation. Do not coordinate with other teammates until review phase. + [full task context: phase issue #{phase_issue_number}, plan content, success criteria]" + Model: {executor_model} + + // Teammate C -- (only for critical tasks) defensive approach + Spawn teammate "competitor-c" with prompt: + "Implement {task_description} using approach: DEFENSIVE. + Maximize error handling, edge case coverage, and robustness over brevity. + Work in isolation. Do not coordinate with other teammates until review phase. + [full task context: phase issue #{phase_issue_number}, plan content, success criteria]" + Model: {executor_model} + ``` + + **Step 2c -- Debate phase (teammates challenge each other):** + After all teammates complete their implementations, each reviews the others' work via `SendMessage`: + ``` + SendMessage({ + type: "message", + recipient: "competitor-b", + content: "Review competitor-a's implementation. Identify weaknesses, edge cases missed, and potential issues. Be adversarial -- find real problems, not style preferences. 
Report: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns.", + summary: "Requesting adversarial review of competitor-a's work" + }) + + SendMessage({ + type: "message", + recipient: "competitor-a", + content: "Review competitor-b's implementation. Identify weaknesses, edge cases missed, and potential issues. Be adversarial -- find real problems, not style preferences. Report: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns.", + summary: "Requesting adversarial review of competitor-b's work" + }) + ``` + Each teammate responds with a structured critique. This fights LLM anchoring bias -- the first plausible answer does not automatically win. + + **Step 2d -- Verifier selects winner:** + Spawn a fresh verifier agent (NOT a team member) to evaluate both implementations and both critiques: + ``` + Agent( + subagent_type: "verifier", + model: "{verifier_model}", + prompt: " + You are judging a competitive implementation. Two (or three) agents each implemented the same task independently, then reviewed each other's work adversarially. + + ## Implementations + - competitor-a (CONSERVATIVE): {summary or path to worktree-a} + - competitor-b (INNOVATIVE): {summary or path to worktree-b} + + ## Critiques + - competitor-b's critique of competitor-a: {critique-b-of-a} + - competitor-a's critique of competitor-b: {critique-a-of-b} + + ## Selection Criteria (in priority order) + 1. Correctness -- does it satisfy all success criteria? + 2. Test coverage -- are edge cases tested? + 3. Code quality -- readability, maintainability, consistency with codebase + 4. Simplicity -- prefer fewer abstractions when correctness is equal + + Output exactly: WINNER: competitor-{a|b|c} + Followed by a justification paragraph. + " + ) + ``` + Discard the losing implementation's worktree branch. Merge the winner using the standard branch merge flow (step 6.8). 
+ + **Fallback:** If any step in 2a-2d fails (TeamCreate probe fails, teammate spawn errors, SendMessage timeout), immediately fall back to Tier 1 (step 3 below). Log the failure reason for diagnostics. 3. **If Tier 2 is NOT available (env var unset, feature not yet stable, or `competition_strategy` is `none`/`quick`/`standard`):** - **Graceful degradation to Tier 1** — inform the user: