
Add directive-driven improvement and prompt surface optimization #46

Closed
Born14 wants to merge 2 commits into main from claude/evaluate-self-improvement-loop-3jCG4

Born14 (owner) commented Apr 5, 2026

Summary

Introduces three major enhancements to the autonomous improvement engine:

  1. Directive-Driven Improvement — Operators can now guide the improvement loop through an improve-directive.md file instead of modifying TypeScript. This follows AutoAgent's "program the meta-agent" pattern.

  2. Prompt Surface Optimization — Extends the improvement loop to recognize and optimize LLM prompts and tunable thresholds within gate files (e.g., vision.ts, triangulation.ts), so the LLM can prefer prompt edits over logic changes when a failure is prompt-related.

  3. Continuous Mode — Implements hill-climbing iteration support, allowing the improvement engine to re-baseline and iterate after each accepted improvement.

Key Changes

  • improve-directive.ts (new)

    • loadDirective() — Loads and parses improve-directive.md with structured fields (priority gates, focus mode, edit style) and custom instructions
    • formatDirectiveForPrompt() — Injects directive context into LLM prompts
    • applyDirectiveToBundles() — Prioritizes evidence bundles based on directive's priority gates
  • improve-prompt-surface.ts (new)

    • Defines known prompt regions in gate files (vision.ts, triangulation.ts, hallucination.ts)
    • extractPromptRegion() — Extracts actual prompt text from source files
    • formatPromptSurfaceContext() — Provides LLM with prompt region metadata and tuning advice
    • isPromptRegion() — Checks if a file/function is a tunable prompt surface
  • improve.ts (modified)

    • Extracted runSingleIteration() from runImproveLoop() so continuous mode can run it repeatedly
    • Loads and applies directive at start of each iteration
    • Injects directive and prompt surface context into bundle processing
    • Tracks cumulative LLM usage and accepted improvements across iterations
    • Stops early when an iteration produces no accepted improvements
  • self-test.ts (modified)

    • Added CLI flags: --continuous, --max-iterations=N, --directive=PATH, --prompt-surface
    • Updated help text with examples for all new modes
  • types.ts (modified)

    • Extended ImproveConfig with maxIterations, directivePath, promptSurface fields
  • improve-directive.md (new)

    • Template file with commented examples showing how to configure improvement priorities
  • improve-directive.test.ts (new)

    • Unit tests for directive parsing, prompt formatting, and bundle prioritization
  • .gitignore (modified)

    • Added .verify/ directory (created by test runs)
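The bundle prioritization performed by applyDirectiveToBundles() can be sketched as a stable reorder. The EvidenceBundle and Directive shapes below are assumptions for illustration, since the PR does not show the real types; only the gate names come from this PR.

```typescript
// Assumed shapes: the PR does not show the real EvidenceBundle or Directive types.
interface EvidenceBundle {
  gate: string;          // gate whose failures this bundle groups, e.g. "vision"
  scenarioIds: string[]; // failing scenarios collected into the bundle
}

interface Directive {
  priorityGates: string[]; // gates listed as priorities in improve-directive.md
}

// Bundles for priority gates move to the front, in the order the directive
// lists them; every other bundle keeps its original relative order.
function applyDirectiveToBundles(
  bundles: EvidenceBundle[],
  directive: Directive | null,
): EvidenceBundle[] {
  if (!directive || directive.priorityGates.length === 0) return bundles;

  const rank = new Map<string, number>();
  directive.priorityGates.forEach((gate, i) => rank.set(gate.toLowerCase(), i));
  const unranked = directive.priorityGates.length;

  return bundles
    .map((bundle, index) => ({ bundle, index }))
    .sort((a, b) => {
      const ra = rank.get(a.bundle.gate.toLowerCase()) ?? unranked;
      const rb = rank.get(b.bundle.gate.toLowerCase()) ?? unranked;
      return ra !== rb ? ra - rb : a.index - b.index; // index tie-break keeps the order stable
    })
    .map(({ bundle }) => bundle);
}

const ordered = applyDirectiveToBundles(
  [
    { gate: "hallucination", scenarioIds: ["h1"] },
    { gate: "vision", scenarioIds: ["v1"] },
    { gate: "triangulation", scenarioIds: ["t1"] },
  ],
  { priorityGates: ["triangulation", "vision"] },
);
console.log(ordered.map((b) => b.gate).join(", ")); // triangulation, vision, hallucination
```

Returning the input unchanged when no directive (or no priority list) is present keeps a run without improve-directive.md identical to the existing behavior.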

Notable Implementation Details

  • Directive parsing is lenient (case-insensitive, flexible delimiters) to reduce friction
  • Prompt regions are delimited by start/end markers so extraction can survive changes to the surrounding code
  • Directive context is injected into both diagnosis and fix generation prompts
  • Continuous mode re-baselines after each accepted improvement, enabling iterative refinement
  • Cumulative LLM usage is tracked and reported at the end of continuous runs
  • Early termination prevents wasted iterations when the improvement frontier is reached
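Lenient directive parsing could look like the sketch below. The field names (`Priority gates`, `Focus`, `Edit style`), the focus values and the `:`/`=` delimiters are assumptions, because the PR does not show the actual grammar of improve-directive.md. Lines that are not recognized fields pass through as custom instructions.

```typescript
// Assumed field names and values; the real improve-directive.md grammar is not shown.
type FocusMode = "false_positives" | "false_negatives" | "balanced";

interface ParsedDirective {
  priorityGates: string[];
  focus: FocusMode;
  editStyle: string | null;
  customInstructions: string;
}

// "Key: value" or "key = value"; keys may contain spaces, hyphens or underscores.
const FIELD_LINE = /^\s*([A-Za-z][A-Za-z _-]*?)\s*[:=]\s*(.+?)\s*$/;

// Case-insensitive keys: "Edit-Style", "edit_style" and "EDIT STYLE" all become "edit style".
function normalizeKey(raw: string): string {
  return raw.toLowerCase().replace(/[\s_-]+/g, " ").trim();
}

function parseDirective(markdown: string): ParsedDirective {
  const result: ParsedDirective = {
    priorityGates: [],
    focus: "balanced",
    editStyle: null,
    customInstructions: "",
  };
  const extra: string[] = [];

  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and Markdown headings

    const m = FIELD_LINE.exec(line);
    const key = m ? normalizeKey(m[1]) : "";

    if (m && key === "priority gates") {
      result.priorityGates = m[2].split(/[,\s]+/).filter(Boolean).map((g) => g.toLowerCase());
    } else if (m && key === "focus") {
      const value = normalizeKey(m[2]);
      if (value.startsWith("false positive")) result.focus = "false_positives";
      else if (value.startsWith("false negative")) result.focus = "false_negatives";
    } else if (m && key === "edit style") {
      result.editStyle = m[2];
    } else {
      extra.push(line); // anything else is kept as a custom instruction
    }
  }

  result.customInstructions = extra.join("\n");
  return result;
}

const directive = parseDirective(
  "# Improve directive\nPRIORITY GATES = Vision, triangulation\nfocus: False-Negatives\nPrefer prompt edits over logic changes.",
);
console.log(directive.priorityGates.join(","), directive.focus); // vision,triangulation false_negatives
```

Unknown `Key: value` lines are deliberately not rejected; treating them as free-text instructions is one way to keep the parser forgiving.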

https://claude.ai/code/session_01SJkfKmU2V83UrCvgyH2JAD

claude added 2 commits April 4, 2026 22:50
… surface optimization

Three AutoAgent-inspired concepts integrated into the evidence-centric improve loop:

1. Continuous mode (--continuous / --max-iterations=N): Re-baselines after each
   accepted improvement and iterates, compounding small wins. Stops when an
   iteration produces no accepted candidates.

2. Directive-driven improvement (improve-directive.md): Externalizes improvement
   strategy into a human-editable Markdown file. Operators can specify priority
   gates, focus mode (false positives vs negatives), edit style preferences, and
   custom instructions — all injected into LLM diagnosis/fix prompts.

3. Prompt surface optimization (--prompt-surface): Extends the bounded surface
   to include LLM prompts within gates (vision.ts prompt, triangulation weights).
   The fix generator gets context about which regions are prompts vs logic,
   preferring prompt edits for prompt-related failures.
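Marker-based extraction of a prompt region can be sketched as follows. The PromptRegion shape and the `// PROMPT-START` / `// PROMPT-END` markers are hypothetical; the PR's real marker strings for vision.ts, triangulation.ts and hallucination.ts are not shown.

```typescript
// Hypothetical descriptor; the PR's real marker strings are not shown.
interface PromptRegion {
  file: string;        // gate file, e.g. "vision.ts"
  name: string;        // label shown to the fix generator
  startMarker: string; // text that opens the region
  endMarker: string;   // text that closes the region
}

// Returns the text between the first startMarker and the next endMarker after it,
// or null if either marker is absent (for example, after the gate was refactored).
function extractPromptRegion(source: string, region: PromptRegion): string | null {
  const start = source.indexOf(region.startMarker);
  if (start === -1) return null;
  const bodyStart = start + region.startMarker.length;
  const end = source.indexOf(region.endMarker, bodyStart);
  return end === -1 ? null : source.slice(bodyStart, end);
}

// Made-up gate source for illustration only.
const visionSource = [
  "export async function visionGate() {",
  "  // PROMPT-START",
  "  const prompt = `Describe every visible element.`;",
  "  // PROMPT-END",
  "}",
].join("\n");

const region: PromptRegion = {
  file: "vision.ts",
  name: "vision prompt",
  startMarker: "// PROMPT-START\n",
  endMarker: "  // PROMPT-END",
};
console.log(extractPromptRegion(visionSource, region));
```

Returning null instead of throwing lets the caller fall back to treating the file as ordinary logic when a region can no longer be located.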

https://claude.ai/code/session_01SJkfKmU2V83UrCvgyH2JAD
Born14 (owner, author) commented Apr 5, 2026

Deferring until dirty count reaches 0 and the basic improve loop is stable.

What's merge-ready:

  • --continuous (hill-climbing iteration) — will cherry-pick when dirty = 0

What needs more work:

  • Directive file — not needed until discovery mode is active and we need to steer priorities
  • Prompt surface — marker strings are unvalidated against current gate code, needs design pass
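One way to address the unvalidated-marker concern is a pre-flight check that reports every region whose markers cannot be found in the current gate code. This is a sketch under assumptions: the PromptRegion shape, the problem kinds and the readSource callback are hypothetical, not code from this PR.

```typescript
// Hypothetical region descriptor: one start/end marker pair per gate file.
interface PromptRegion {
  file: string;
  startMarker: string;
  endMarker: string;
}

type MarkerProblem =
  | { file: string; kind: "file-missing" }
  | { file: string; kind: "start-missing" | "start-not-unique" | "end-missing"; marker: string };

// Non-overlapping occurrences of needle in haystack; an empty needle counts as absent.
function countOccurrences(haystack: string, needle: string): number {
  if (needle === "") return 0;
  let count = 0;
  for (let i = haystack.indexOf(needle); i !== -1; i = haystack.indexOf(needle, i + needle.length)) {
    count++;
  }
  return count;
}

// readSource returns the file's text, or null if the file does not exist.
function validatePromptRegions(
  regions: PromptRegion[],
  readSource: (file: string) => string | null,
): MarkerProblem[] {
  const problems: MarkerProblem[] = [];
  for (const r of regions) {
    const src = readSource(r.file);
    if (src === null) {
      problems.push({ file: r.file, kind: "file-missing" });
      continue;
    }
    const starts = countOccurrences(src, r.startMarker);
    if (starts === 0) {
      problems.push({ file: r.file, kind: "start-missing", marker: r.startMarker });
      continue;
    }
    if (starts > 1) {
      problems.push({ file: r.file, kind: "start-not-unique", marker: r.startMarker });
    }
    const bodyStart = src.indexOf(r.startMarker) + r.startMarker.length;
    if (r.endMarker === "" || src.indexOf(r.endMarker, bodyStart) === -1) {
      problems.push({ file: r.file, kind: "end-missing", marker: r.endMarker });
    }
  }
  return problems;
}

// Made-up file contents for illustration only.
const sources: Record<string, string> = {
  "vision.ts": "// PROMPT-START\nDescribe the page.\n// PROMPT-END\n",
  "triangulation.ts": "// no markers in this file\n",
};
const problems = validatePromptRegions(
  [
    { file: "vision.ts", startMarker: "// PROMPT-START", endMarker: "// PROMPT-END" },
    { file: "triangulation.ts", startMarker: "// WEIGHTS-START", endMarker: "// WEIGHTS-END" },
    { file: "hallucination.ts", startMarker: "// PROMPT-START", endMarker: "// PROMPT-END" },
  ],
  (file) => sources[file] ?? null,
);
console.log(problems.map((p) => `${p.file}: ${p.kind}`).join("\n"));
```

With the made-up sources above this reports triangulation.ts (start marker missing) and hallucination.ts (file missing), while vision.ts passes.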

Good research. Just not the priority right now. The priority is clearing the last 7 dirty scenarios and publishing v0.8.0 with clean sensors.

Born14 added a commit that referenced this pull request Apr 7, 2026
Extracts runSingleIteration() from runImproveLoop(). When maxIterations > 1,
the loop re-baselines after each accepted fix and runs again. Stops early
if an iteration produces no improvements.
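The control flow described in this commit amounts to a small hill-climbing driver. In the sketch below, runIteration stands in for runSingleIteration(); its signature and the IterationResult fields are assumptions, since the real ones are not shown here.

```typescript
// Assumed result shape; the real runSingleIteration() signature is not shown.
interface IterationResult {
  accepted: number; // candidates accepted (and applied) in this pass
  llmUsage: number; // LLM usage for the pass, e.g. tokens (unit assumed)
}

interface ContinuousSummary {
  iterations: number;
  totalAccepted: number;
  totalLlmUsage: number;
}

// Each call to runIteration re-baselines against the current code and tries to
// improve it. maxIterations = 1 reproduces the default single pass.
async function runContinuous(
  runIteration: (iteration: number) => Promise<IterationResult>,
  maxIterations: number,
): Promise<ContinuousSummary> {
  const summary: ContinuousSummary = { iterations: 0, totalAccepted: 0, totalLlmUsage: 0 };
  const limit = Math.max(1, maxIterations);
  for (let i = 1; i <= limit; i++) {
    const result = await runIteration(i);
    summary.iterations = i;
    summary.totalAccepted += result.accepted;
    summary.totalLlmUsage += result.llmUsage;
    if (result.accepted === 0) break; // nothing accepted: frontier reached, stop early
  }
  return summary;
}

// Simulated passes accepting 2, 1, then 0 candidates: the loop stops after pass 3.
const acceptedPerPass = [2, 1, 0, 4];
runContinuous(async (i) => ({ accepted: acceptedPerPass[i - 1] ?? 0, llmUsage: 100 }), 10).then(
  (s) => console.log(s),
);
```

Accumulating usage in the driver rather than inside each pass keeps runIteration unaware of whether it is running once or in a loop.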

Usage (disabled by default — single pass):
  bun run src/cli.ts improve --llm=gemini

Enable with --continuous:
  bun run src/cli.ts improve --llm=gemini --continuous
  bun run src/cli.ts improve --llm=gemini --continuous --max-iterations=10

nightly.sh does NOT use --continuous (single pass per run, as before).
Enable when ready by adding --continuous to nightly.sh improve command.

Directive system and prompt surface optimization from PR #46 intentionally
NOT cherry-picked — deferred until needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Born14 (owner, author) commented Apr 7, 2026

Continuous mode cherry-picked to main (997babb). Directive system and prompt surface optimization deferred — not needed. Closing.

Born14 closed this Apr 7, 2026
Born14 deleted the claude/evaluate-self-improvement-loop-3jCG4 branch April 7, 2026 13:24