feat: adversarial spec review loop + skill chaining (v0.9.1.0) by garrytan · Pull Request #249 · garrytan/gstack

garrytan · 2026-03-20T13:19:41Z

Summary

Adversarial spec review loop: /office-hours and /plan-ceo-review now dispatch an independent reviewer subagent that checks documents on 5 dimensions (completeness, consistency, clarity, scope, feasibility) with a convergence guard, quality score (1-10), and JSONL metrics. Implemented as a {{SPEC_REVIEW_LOOP}} resolver in gen-skill-docs.ts — any skill can use it.
Visual sketch phase: For UI ideas, /office-hours generates a rough HTML wireframe using design principles from {{DESIGN_METHODOLOGY}} and DESIGN.md, renders via $B, and presents a screenshot for iteration before writing the design doc.
Skill chaining: /plan-ceo-review and /plan-eng-review now detect when no design doc exists and offer to run /office-hours first. Implemented via benefits-from: frontmatter + {{BENEFITS_FROM}} resolver. One-hop-max, never blocks, max one offer per session.
Spec review metrics: Every review logs iterations, issues found/fixed, remaining, and quality score to ~/.gstack/analytics/spec-review.jsonl.
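
The review loop and its metrics record can be pictured roughly as follows. This is a minimal sketch, not the code in gen-skill-docs.ts (which generates skill prose rather than running a loop itself). All type and function names here (`runSpecReview`, `Reviewer`, `Fixer`, `toJsonlLine`) and the metrics field names are hypothetical. The cap of 3 iterations follows the unit test description in this PR, and the convergence guard is assumed to mean "stop when the reviewer reports the same issues twice in a row".

```typescript
// Hypothetical names throughout; only the five dimensions, the 1-10 score,
// and the logged quantities come from the PR description.
type Dimension = "completeness" | "consistency" | "clarity" | "scope" | "feasibility";

interface Issue { dimension: Dimension; note: string; }
interface ReviewResult { issues: Issue[]; qualityScore: number; } // 1-10 scale

interface ReviewMetrics {
  iterations: number;
  issuesFound: number;
  issuesFixed: number;
  issuesRemaining: number;
  qualityScore: number;
}

type Reviewer = (doc: string) => ReviewResult;
type Fixer = (doc: string, issues: Issue[]) => string;

function runSpecReview(doc: string, review: Reviewer, fix: Fixer, maxIterations = 3): ReviewMetrics {
  const seen = new Set<string>(); // distinct issues across all iterations
  let current = doc;
  let iterations = 0;
  let previousKey = "";
  let result: ReviewResult = { issues: [], qualityScore: 0 };

  while (iterations < maxIterations) {
    result = review(current);
    iterations++;
    for (const issue of result.issues) seen.add(`${issue.dimension}:${issue.note}`);
    if (result.issues.length === 0) break; // clean review, done

    // Convergence guard (assumed semantics): identical issues twice in a row
    // means the fixes are not landing, so stop instead of looping.
    const key = result.issues.map((i) => `${i.dimension}:${i.note}`).sort().join("|");
    if (key === previousKey) break;
    previousKey = key;

    if (iterations === maxIterations) break; // do not fix without a follow-up review
    current = fix(current, result.issues);
  }

  const issuesRemaining = result.issues.length;
  return {
    iterations,
    issuesFound: seen.size,
    issuesFixed: seen.size - issuesRemaining,
    issuesRemaining,
    qualityScore: result.qualityScore,
  };
}

// One JSON object per line, suitable for appending to a .jsonl file.
function toJsonlLine(metrics: ReviewMetrics): string {
  return JSON.stringify(metrics) + "\n";
}
```

A caller would append the result of `toJsonlLine` to the metrics file; the sketch leaves file I/O out so it stays independent of any particular runtime.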

Test Coverage

Tests: 370 → 439 (+69 new)

CODE PATH COVERAGE
===========================
[+] scripts/gen-skill-docs.ts
    ├── generateSpecReviewLoop()  [★★★ TESTED] 7 assertions — dimensions, Agent, iterations, quality, metrics, convergence, graceful failure
    ├── generateDesignSketch()    [★★★ TESTED] 6 assertions — DESIGN.md, wireframe, $B goto, screenshot, rough aesthetic, skip conditions
    ├── generateBenefitsFrom()    [★★★ TESTED] 4 assertions — CEO offer, eng offer, graceful decline, negative (qa has no offer)
    └── benefitsFrom parsing     [★★★ TESTED] via generated output validation

[+] office-hours/SKILL.md (generated)
    ├── spec review section      [★★★ TESTED] structure + content validation
    └── visual sketch section    [★★★ TESTED] structure + content validation

[+] E2E tests
    ├── office-hours-spec-review [★★★ TESTED] Agent tool called, spec review mentioned
    └── plan-ceo-review-benefits [★★★ TESTED] skill chaining offer understood

Pre-Landing Review

No issues found — template-only changes with no attack surface expansion.

Eval Results

Routing E2E: 11/11 pass ($1.65)
E2E: 36/37 pass — 1 pre-existing failure in /design-consultation preview (unrelated)
Our 2 new E2E tests: both pass

TODOS

No TODO items completed in this PR.

Test plan

All skill validation tests pass (439 tests, 0 failures)
gen-skill-docs freshness check passes (--dry-run)
E2E routing tests pass (11/11)
New E2E tests pass (spec-review + benefits-from)

🤖 Generated with Claude Code

…lvers

Three new resolvers in gen-skill-docs.ts:
- {{SPEC_REVIEW_LOOP}}: adversarial subagent reviews documents on 5 dimensions (completeness, consistency, clarity, scope, feasibility) with convergence guard, quality score, and JSONL metrics
- {{DESIGN_SKETCH}}: generates rough HTML wireframes for UI ideas using DESIGN.md constraints and design principles, renders via $B
- {{BENEFITS_FROM}}: parses benefits-from frontmatter and generates skill chaining offer prose (one-hop-max, never blocks)

Also extends TemplateContext with benefitsFrom field and adds inline YAML frontmatter parsing for the new field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
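
To make the resolver idea concrete, here is a rough sketch of how a `{{NAME}}` placeholder and a `benefits-from:` frontmatter field might be handled. It is an illustration under assumptions, not the code in gen-skill-docs.ts: the function names (`parseBenefitsFrom`, `renderTemplate`), the shape of `TemplateContext` beyond its `benefitsFrom` field, the accepted frontmatter syntax, and the offer wording are all hypothetical.

```typescript
// Hypothetical shape; the PR only states that TemplateContext gains a benefitsFrom field.
interface TemplateContext {
  skillName: string;
  benefitsFrom: string[];
}

type Resolver = (ctx: TemplateContext) => string;

// Inline frontmatter parsing for a single field. Accepts either
// "benefits-from: office-hours" or "benefits-from: [a, b]" (assumed syntax).
function parseBenefitsFrom(source: string): string[] {
  const frontmatter = source.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!frontmatter) return [];
  const body = frontmatter[1] ?? "";
  const line = body.split(/\r?\n/).find((l) => l.startsWith("benefits-from:"));
  if (!line) return [];
  return line
    .slice("benefits-from:".length)
    .trim()
    .replace(/^\[|\]$/g, "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Replace every {{NAME}} that has a registered resolver; leave unknown ones as-is.
function renderTemplate(template: string, resolvers: Map<string, Resolver>, ctx: TemplateContext): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (whole: string, name: string) => {
    const resolver = resolvers.get(name);
    return resolver ? resolver(ctx) : whole;
  });
}

// Example resolver: emits offer prose only for skills that declare a prerequisite.
const resolvers = new Map<string, Resolver>([
  [
    "BENEFITS_FROM",
    (ctx) =>
      ctx.benefitsFrom.length === 0
        ? ""
        : `If no design doc exists, offer to run /${ctx.benefitsFrom[0]} first.`,
  ],
]);
```

Keeping resolvers in a map means a new placeholder only needs a new entry, which matches the "any skill can use it" intent described in the summary.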

- Phase 4.5 ({{DESIGN_SKETCH}}): for UI ideas, generates rough HTML wireframe using design principles from {{DESIGN_METHODOLOGY}} and DESIGN.md, renders via $B, presents screenshot for iteration
- Phase 5.5 ({{SPEC_REVIEW_LOOP}}): adversarial subagent reviews the design doc before user sees it; catches gaps in completeness, consistency, clarity, scope, and feasibility
- Adds {{BROWSE_SETUP}} for $B availability in sketch phase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- plan-ceo-review: benefits-from office-hours, offers /office-hours when no design doc found, mid-session detection when user seems lost, spec review loop on CEO plan documents
- plan-eng-review: benefits-from office-hours, offers /office-hours when no design doc found
- One-hop-max chaining: never blocks, max one offer per session

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
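
The chaining rules above (one hop at most, never blocking, at most one offer per session) can be expressed as a small decision function. In gstack these rules live in generated skill prose that the agent follows, so the sketch below is only an illustration of the intended behavior; `SessionState`, `nextPrerequisiteOffer`, and their fields are hypothetical names.

```typescript
// Hypothetical session bookkeeping; not part of gstack.
interface SessionState {
  offersMade: number;         // how many prerequisite offers were shown this session
  startedByChaining: boolean; // true if this skill was itself launched from an offer
}

// Returns the prerequisite skill to offer, or null when no offer should be made.
// The caller continues with the current skill either way, so an offer never blocks.
function nextPrerequisiteOffer(
  benefitsFrom: string[],
  designDocExists: boolean,
  session: SessionState,
): string | null {
  if (designDocExists) return null;           // prerequisite output already present
  if (benefitsFrom.length === 0) return null; // skill declares no benefits-from
  if (session.startedByChaining) return null; // one-hop-max: no offer from a chained skill
  if (session.offersMade >= 1) return null;   // max one offer per session
  session.offersMade += 1;
  return benefitsFrom[0] ?? null;
}
```

For example, a first call with `["office-hours"]` and no design doc returns `"office-hours"`, and any later call in the same session returns `null`, so a declined offer is not repeated.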

Unit tests (32 new assertions):
- SPEC_REVIEW_LOOP: 5 dimensions, Agent dispatch, 3 iterations, quality score, metrics path, convergence guard, graceful failure
- DESIGN_SKETCH: DESIGN.md awareness, wireframe, $B goto/screenshot, rough aesthetic, skip conditions
- BENEFITS_FROM: prerequisite offer in CEO + eng review, graceful decline, skills without benefits-from don't get offer
- office-hours structure: spec review loop, adversarial dimensions, visual sketch section

E2E tests (2 new):
- office-hours-spec-review: verifies agent understands the spec review loop from SKILL.md
- plan-ceo-review-benefits: verifies agent understands the skill chaining offer

Touchfiles updated for diff-based test selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan and others added 6 commits March 19, 2026 21:02

merge: resolve conflicts with origin/main (telemetry + codex host)

bc272a6

chore: bump version and changelog (v0.9.1.0)

c099c8c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan merged commit ae2d841 into main Mar 20, 2026

garrytan merged 6 commits into main from garrytan/brainstorm-skill
