feat: adversarial spec review loop + skill chaining (v0.9.1.0) #249

Merged
garrytan merged 6 commits into main from garrytan/brainstorm-skill
Mar 20, 2026

Conversation

@garrytan
Owner

Summary

  • Adversarial spec review loop: /office-hours and /plan-ceo-review now dispatch an independent reviewer subagent that checks documents on 5 dimensions (completeness, consistency, clarity, scope, feasibility) with a convergence guard, quality score (1-10), and JSONL metrics. Implemented as a {{SPEC_REVIEW_LOOP}} resolver in gen-skill-docs.ts — any skill can use it.
  • Visual sketch phase: For UI ideas, /office-hours generates a rough HTML wireframe using design principles from {{DESIGN_METHODOLOGY}} and DESIGN.md, renders via $B, and presents a screenshot for iteration before writing the design doc.
  • Skill chaining: /plan-ceo-review and /plan-eng-review now detect when no design doc exists and offer to run /office-hours first. Implemented via benefits-from: frontmatter + {{BENEFITS_FROM}} resolver. One-hop-max, never blocks, max one offer per session.
  • Spec review metrics: Every review logs iterations, issues found/fixed, remaining, and quality score to ~/.gstack/analytics/spec-review.jsonl.
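
For reference, one line of the spec review metrics file could look roughly like the sketch below. The PR only states which quantities are logged, so the field names, the `skill` and `timestamp` fields, and the helpers `toJsonlLine` and `parseJsonl` are all assumptions made for illustration, not the shipped schema.

```typescript
// Hypothetical shape of one spec-review metrics record.
// Field names are assumptions; the PR only lists what is logged.
interface SpecReviewMetrics {
  skill: string; // assumed: which skill ran the review, e.g. "office-hours"
  iterations: number;
  issuesFound: number;
  issuesFixed: number;
  issuesRemaining: number;
  qualityScore: number; // 1-10 per the PR description
  timestamp: string; // assumed: ISO 8601
}

// Serialize one record as a single JSONL line (one JSON object plus newline).
function toJsonlLine(m: SpecReviewMetrics): string {
  if (m.qualityScore < 1 || m.qualityScore > 10) {
    throw new RangeError(`qualityScore must be in 1..10, got ${m.qualityScore}`);
  }
  return JSON.stringify(m) + "\n";
}

// Parse JSONL text back into records, skipping blank lines.
function parseJsonl(text: string): SpecReviewMetrics[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SpecReviewMetrics);
}
```

A caller could append the returned line to `~/.gstack/analytics/spec-review.jsonl` with Node's `fs.appendFileSync`. Keeping serialization separate from the write makes the format easy to unit test.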

Test Coverage

Tests: 370 → 439 (+69 new)

CODE PATH COVERAGE
===========================
[+] scripts/gen-skill-docs.ts
    ├── generateSpecReviewLoop()  [★★★ TESTED] 7 assertions — dimensions, Agent, iterations, quality, metrics, convergence, graceful failure
    ├── generateDesignSketch()    [★★★ TESTED] 6 assertions — DESIGN.md, wireframe, $B goto, screenshot, rough aesthetic, skip conditions
    ├── generateBenefitsFrom()    [★★★ TESTED] 4 assertions — CEO offer, eng offer, graceful decline, negative (qa has no offer)
    └── benefitsFrom parsing      [★★★ TESTED] via generated output validation

[+] office-hours/SKILL.md (generated)
    ├── spec review section      [★★★ TESTED] structure + content validation
    └── visual sketch section    [★★★ TESTED] structure + content validation

[+] E2E tests
    ├── office-hours-spec-review [★★★ TESTED] Agent tool called, spec review mentioned
    └── plan-ceo-review-benefits [★★★ TESTED] skill chaining offer understood

Pre-Landing Review

No issues found — template-only changes with no attack surface expansion.

Eval Results

  • Routing E2E: 11/11 pass ($1.65)
  • E2E: 36/37 pass — 1 pre-existing failure in /design-consultation preview (unrelated)
  • Our 2 new E2E tests: both pass

TODOS

No TODO items completed in this PR.

Test plan

  • All skill validation tests pass (439 tests, 0 failures)
  • gen-skill-docs freshness check passes (--dry-run)
  • E2E routing tests pass (11/11)
  • New E2E tests pass (spec-review + benefits-from)

🤖 Generated with Claude Code

garrytan and others added 6 commits March 19, 2026 21:02
…lvers

Three new resolvers in gen-skill-docs.ts:

- {{SPEC_REVIEW_LOOP}}: adversarial subagent reviews documents on 5
  dimensions (completeness, consistency, clarity, scope, feasibility)
  with convergence guard, quality score, and JSONL metrics
- {{DESIGN_SKETCH}}: generates rough HTML wireframes for UI ideas using
  DESIGN.md constraints and design principles, renders via $B
- {{BENEFITS_FROM}}: parses benefits-from frontmatter and generates
  skill chaining offer prose (one-hop-max, never blocks)

Also extends TemplateContext with benefitsFrom field and adds inline
YAML frontmatter parsing for the new field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
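
A minimal sketch of how a `{{NAME}}` resolver and the inline `benefits-from:` parse might fit together. Only `TemplateContext`, its `benefitsFrom` field, and the placeholder names come from this PR. The `skillName` field, `parseBenefitsFrom`, `render`, the generated wording, and the choice to throw on unknown placeholders are assumptions, not the actual gen-skill-docs.ts code.

```typescript
// Illustrative only; the real TemplateContext has other fields too.
interface TemplateContext {
  skillName: string; // assumed field
  benefitsFrom: string[]; // skills listed under `benefits-from:` in frontmatter
}

type Resolver = (ctx: TemplateContext) => string;

// Minimal inline parse of `benefits-from:` from YAML frontmatter.
// Handles only `benefits-from: a, b` and `benefits-from: [a, b]`.
function parseBenefitsFrom(source: string): string[] {
  const fm = source.match(/^---\n([\s\S]*?)\n---/);
  if (!fm) return [];
  const line = fm[1].match(/^benefits-from:[ \t]*(.*)$/m);
  if (!line) return [];
  return line[1]
    .trim()
    .replace(/^\[|\]$/g, "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Replace every {{PLACEHOLDER}} with the output of its resolver.
function render(
  template: string,
  resolvers: Record<string, Resolver>,
  ctx: TemplateContext,
): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (whole, name: string) => {
    const resolver = resolvers[name];
    if (!resolver) throw new Error(`Unknown placeholder ${whole}`);
    return resolver(ctx);
  });
}

// Example resolver: skills without benefits-from get no offer text.
const resolvers: Record<string, Resolver> = {
  BENEFITS_FROM: (ctx) =>
    ctx.benefitsFrom.length === 0
      ? ""
      : `If no design doc exists, offer to run /${ctx.benefitsFrom[0]} first.`,
};
```

Because the resolver returns an empty string when `benefitsFrom` is empty, a skill such as qa would get no offer prose, matching the negative test case listed under Test Coverage.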
- Phase 4.5 ({{DESIGN_SKETCH}}): for UI ideas, generates rough HTML
  wireframe using design principles from {{DESIGN_METHODOLOGY}} and
  DESIGN.md, renders via $B, presents screenshot for iteration
- Phase 5.5 ({{SPEC_REVIEW_LOOP}}): adversarial subagent reviews the
  design doc before user sees it — catches gaps in completeness,
  consistency, clarity, scope, and feasibility
- Adds {{BROWSE_SETUP}} for $B availability in sketch phase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
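
The control flow of the review loop can be sketched as below. In the skill itself the loop is prose instructions carried out by an agent, so this only models the logic. The five dimensions and the 3-iteration cap come from this PR. The `Reviewer` and `Fixer` callbacks, `runSpecReview`, and the reading of "convergence guard" as "stop when the reviewer returns the same issue set twice in a row" are assumptions.

```typescript
type Dimension = "completeness" | "consistency" | "clarity" | "scope" | "feasibility";

interface Issue { dimension: Dimension; description: string }
interface ReviewResult { issues: Issue[]; qualityScore: number }

// Hypothetical stand-ins for the reviewer subagent and the fix step.
type Reviewer = (doc: string) => ReviewResult;
type Fixer = (doc: string, issues: Issue[]) => string;

interface LoopOutcome {
  doc: string;
  iterations: number;
  issuesFound: number;
  issuesFixed: number;
  issuesRemaining: number;
  qualityScore: number | null;
  stoppedBecause: "clean" | "converged" | "max-iterations" | "reviewer-failed";
}

// Graceful failure: a throwing reviewer yields null instead of aborting.
function tryReview(review: Reviewer, doc: string): ReviewResult | null {
  try {
    return review(doc);
  } catch {
    return null;
  }
}

function runSpecReview(doc: string, review: Reviewer, fix: Fixer, maxIterations = 3): LoopOutcome {
  let current = doc;
  let previousKey: string | null = null;
  const seen = new Set<string>(); // every distinct issue reported so far
  let remaining = 0;
  let score: number | null = null;
  let iterations = 0;
  let stoppedBecause: LoopOutcome["stoppedBecause"] = "max-iterations";

  while (iterations < maxIterations) {
    const result = tryReview(review, current);
    if (result === null) { stoppedBecause = "reviewer-failed"; break; }
    iterations++;
    score = result.qualityScore;
    const keys = Array.from(
      new Set(result.issues.map((i) => `${i.dimension}:${i.description}`)),
    ).sort();
    keys.forEach((k) => seen.add(k));
    remaining = keys.length;
    if (remaining === 0) { stoppedBecause = "clean"; break; }
    // Assumed convergence guard: identical issue set twice in a row means no progress.
    const key = keys.join("|");
    if (key === previousKey) { stoppedBecause = "converged"; break; }
    previousKey = key;
    // No fix after the final review, so `remaining` reflects a verified state.
    if (iterations < maxIterations) current = fix(current, result.issues);
  }

  return {
    doc: current,
    iterations,
    issuesFound: seen.size,
    issuesFixed: seen.size - remaining,
    issuesRemaining: remaining,
    qualityScore: score,
    stoppedBecause,
  };
}
```

The returned counts line up with the quantities the PR says are logged (iterations, issues found/fixed, remaining, quality score).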
- plan-ceo-review: benefits-from office-hours, offers /office-hours when
  no design doc found, mid-session detection when user seems lost,
  spec review loop on CEO plan documents
- plan-eng-review: benefits-from office-hours, offers /office-hours when
  no design doc found
- One-hop-max chaining: never blocks, max one offer per session

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
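
The chaining rules (one hop max, never blocks, max one offer per session) amount to a small decision function. The skills implement this as generated prose, not code, so the sketch below only illustrates the rules. `SessionState`, `SkillMeta`, `prerequisiteOffer`, the `invokedViaChain` flag, and the offer wording are hypothetical.

```typescript
// Hypothetical per-session state: tracks whether an offer was already made.
interface SessionState { offeredPrerequisite: boolean }

interface SkillMeta { name: string; benefitsFrom: string[] }

// Returns offer text, or null when no offer should be made.
// It never throws and never halts the caller, so the skill proceeds either way.
function prerequisiteOffer(
  skill: SkillMeta,
  designDocExists: boolean,
  session: SessionState,
  invokedViaChain: boolean, // assumed flag: this skill was itself started from an offer
): string | null {
  if (skill.benefitsFrom.length === 0) return null; // e.g. qa has no offer
  if (designDocExists) return null; // prerequisite already satisfied
  if (invokedViaChain) return null; // one hop max: a chained skill does not chain again
  if (session.offeredPrerequisite) return null; // max one offer per session
  session.offeredPrerequisite = true;
  return `No design doc found. Run /${skill.benefitsFrom[0]} first, or continue without it?`;
}
```

Returning `null` instead of raising keeps the "never blocks" property explicit: a declined or suppressed offer is just the absence of a prompt.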
Unit tests (32 new assertions):
- SPEC_REVIEW_LOOP: 5 dimensions, Agent dispatch, 3 iterations, quality
  score, metrics path, convergence guard, graceful failure
- DESIGN_SKETCH: DESIGN.md awareness, wireframe, $B goto/screenshot,
  rough aesthetic, skip conditions
- BENEFITS_FROM: prerequisite offer in CEO + eng review, graceful
  decline, skills without benefits-from don't get offer
- office-hours structure: spec review loop, adversarial dimensions,
  visual sketch section

E2E tests (2 new):
- office-hours-spec-review: verifies agent understands the spec review
  loop from SKILL.md
- plan-ceo-review-benefits: verifies agent understands the skill
  chaining offer

Touchfiles updated for diff-based test selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
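
For context on the touchfiles note, diff-based test selection can be sketched as a map from test name to the paths whose changes should trigger that test. The map contents, the simplified pattern syntax (exact path or a trailing `/**`), and `selectTests` are illustrative assumptions, not the repository's actual touchfile entries or matcher.

```typescript
// Hypothetical touchfile map: test name -> path patterns that should trigger it.
const TOUCHFILES: Record<string, string[]> = {
  "office-hours-spec-review": ["office-hours/**", "scripts/gen-skill-docs.ts"],
  "plan-ceo-review-benefits": ["plan-ceo-review/**", "scripts/gen-skill-docs.ts"],
};

// Simplified matcher: `dir/**` matches anything under dir/, otherwise exact match.
function matches(pattern: string, file: string): boolean {
  return pattern.endsWith("/**")
    ? file.startsWith(pattern.slice(0, -2))
    : file === pattern;
}

// Select the tests whose touchfiles intersect the changed files of a diff.
function selectTests(changedFiles: string[], touchfiles: Record<string, string[]>): string[] {
  return Object.keys(touchfiles)
    .filter((test) =>
      touchfiles[test].some((p) => changedFiles.some((f) => matches(p, f))),
    )
    .sort();
}
```

With this map, a change to a shared generator script selects both E2E tests, while a change confined to one skill directory selects only that skill's test.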
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit ae2d841 into main Mar 20, 2026