From eab545a6d222970c2554e09fcb86fc47786d9a1d Mon Sep 17 00:00:00 2001
From: Sven
Date: Wed, 25 Mar 2026 18:37:40 +0100
Subject: [PATCH] fix(self-improvement): fix TSV columns, add two-batch wizard, add two-stage review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- improve.md: Replace 8-column TSV with spec §11.4 canonical 7-column format (iteration, commit, metric, delta, guard, status, description)
- improve.md: Restructure parameter collection into two-batch AskUserQuestion pattern with dry-run baseline step
- verify-phase.md: Add Step 4b two-stage sequential review (Spec Compliance then Code Quality) gated on strict_mode config
- verify-phase.md: Update status determination and comment template with spec_compliance_review and code_quality_review fields

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 templates/commands/maxsim/improve.md | 24 ++++++----
 templates/workflows/verify-phase.md  | 68 ++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/templates/commands/maxsim/improve.md b/templates/commands/maxsim/improve.md
index 81c5e098..97bf018a 100644
--- a/templates/commands/maxsim/improve.md
+++ b/templates/commands/maxsim/improve.md
@@ -25,14 +25,20 @@ Invoke the `autoresearch` skill to drive the optimization loop. Invoke the `veri
 **Phase 1 — Setup (Plan Mode)**
 
 1. Enter Plan Mode via EnterPlanMode
-2. Gather loop parameters via AskUserQuestion:
-   - **Metric command** — the command whose output is the optimization target (from $ARGUMENTS or ask)
-   - **Guard command** — regression check that must always pass (e.g., `npm test`)
-   - **Direction** — minimize or maximize the metric
-   - **Iteration budget** — max iterations before stopping (default: 20)
-   - **Scope** — which files/directories are in-scope for modification
-3. Show the proposed loop configuration and confirm with user
-4. Exit Plan Mode via ExitPlanMode
+2. Gather loop parameters via two AskUserQuestion calls:
+   **Batch 1** (required — 4 questions):
+   - Metric command (the command to run and extract a number from)
+   - Guard command (regression check, e.g., `npm test`)
+   - Metric direction (`lower_is_better` or `higher_is_better`)
+   - Iteration budget (default: 20)
+
+   **Batch 2** (scope and constraints — 3 questions):
+   - Scope (files/directories to modify)
+   - Files to NEVER modify (test files, guard files, config)
+   - Starting approach (optional — first idea to try)
+3. Dry-run: Execute the metric command once to establish baseline. Execute the guard command to confirm it passes. If either fails, ask the user to fix before proceeding.
+4. Show the proposed loop configuration and confirm with user
+5. Exit Plan Mode via ExitPlanMode
 
 **Phase 2 — Optimization Loop**
 
@@ -46,7 +52,7 @@ Run the 8-phase autoresearch loop, one iteration at a time:
 6. **Guard** — run the guard command to check for regressions
    - Guard failure + verify pass → rework (max 2 attempts), then discard
 7. **Decide** — metric improved AND guard passed → keep; otherwise → `git revert HEAD --no-edit`
-8. **Log** — append iteration result to the TSV file (date, iteration, approach, metric-value, outcome, commit-hash, notes)
+8. **Log** — append iteration result to the TSV file (iteration, commit, metric, delta, guard, status, description)
 
 **Stuck Detection:**
 After 5 consecutive discards or crashes:
diff --git a/templates/workflows/verify-phase.md b/templates/workflows/verify-phase.md
index ea950d6b..6a7a4a31 100644
--- a/templates/workflows/verify-phase.md
+++ b/templates/workflows/verify-phase.md
@@ -226,6 +226,68 @@ Agent(
 
 Wait for all three review agents to complete before proceeding.
 
+### Step 4b — Two-Stage Sequential Review (Optional)
+
+When `verification.strict_mode` is enabled in the project config, run an additional two-stage sequential review after the parallel agents complete. Each stage uses a fresh verifier subagent to prevent anchoring bias.
+
+**Stage 1 — Spec Compliance:**
+
+Spawn a fresh verifier agent:
+```
+Agent(
+  subagent_type="Explore",
+  model="{verifier_model}",
+  prompt="
+    You are performing a spec compliance review for phase {phase_number}: {phase_name}.
+
+    Read the phase requirements from GitHub Issue #{phase_issue_number}.
+    Read all files modified in this phase.
+
+    For EACH requirement listed in the issue, verify it is implemented with evidence:
+
+    CLAIM: Requirement [ID] — [description]
+    EVIDENCE: [file:line or command]
+    OUTPUT: [actual result observed]
+    VERDICT: PASS | FAIL — [reason]
+
+    End with: SPEC COMPLIANCE: PASS or SPEC COMPLIANCE: FAIL — [list of unmet requirements]
+  "
+)
+```
+
+Wait for Stage 1 to complete. If it fails, include the failures in the final report.
+
+**Stage 2 — Code Quality (fresh subagent):**
+
+Spawn a NEW verifier agent (do NOT reuse the Stage 1 agent):
+```
+Agent(
+  subagent_type="Explore",
+  model="{verifier_model}",
+  prompt="
+    You are performing a code quality deep review for phase {phase_number}: {phase_name}.
+
+    Context: Spec compliance review has already been completed.
+    Read all files modified in this phase.
+
+    Focus on implementation quality beyond spec compliance:
+    - Architecture and design pattern adherence
+    - Error handling completeness
+    - Edge case coverage
+    - Code maintainability and clarity
+    - No dead code, no unnecessary complexity
+
+    For each finding:
+    CLAIM: [what was checked]
+    EVIDENCE: [file:line]
+    OUTPUT: [observed behavior or code pattern]
+    VERDICT: PASS | FAIL — [reason]
+
+    End with: CODE QUALITY: PASS or CODE QUALITY: FAIL — [issues found]
+  "
+)
+```
+
 ## Step 5 — Identify Human Verification Items
 
 Some checks cannot be automated. Flag these for human review:
@@ -258,6 +320,7 @@ Why manual: {why automated checks cannot cover this}
 - Security review: PASS
 - Quality review: PASS (no blockers)
 - Efficiency review: PASS (no blockers)
+- If strict_mode was on: Spec compliance review PASS and Code quality review PASS
 
 **FAIL** — Any of:
 - Any must-have truth: FAILED
@@ -267,6 +330,7 @@ Why manual: {why automated checks cannot cover this}
 - Build: FAIL
 - Any Blocker anti-pattern
 - Security or Quality review: FAIL with blockers
+- If strict_mode was on: Spec compliance review FAIL or Code quality review FAIL
 
 **HUMAN_NEEDED** — All automated checks PASS but human verification items remain unreviewed.
 
@@ -292,6 +356,8 @@ checks:
   security_review: pass | fail
   quality_review: pass | fail
   efficiency_review: pass | fail
+  spec_compliance_review: pass | fail | skipped
+  code_quality_review: pass | fail | skipped
 ---
 
 ## Verification: Phase {phase_number} — {phase_name}
@@ -328,6 +394,8 @@ checks:
 | Security | {PASS/FAIL} | {issues if fail} |
 | Quality | {PASS/FAIL} | {blockers if fail} |
 | Efficiency | {PASS/FAIL} | {blockers if fail} |
+| Spec Compliance | {PASS/FAIL/SKIPPED} | strict_mode only; {unmet requirements if fail} |
+| Code Quality (deep) | {PASS/FAIL/SKIPPED} | strict_mode only; {issues if fail} |
 
 ## Anti-Patterns Found