AutoPrompter is an autonomous system designed to iteratively improve LLM prompts through a closed-loop optimization process. It merges the validation and metrics capabilities of tools like promptfoo with the iterative improvement logic of autoresearch.
The system operates in a continuous loop where an Optimizer LLM refines prompts for a Target LLM based on empirical performance data.
- Dataset Generation: The Optimizer LLM (Gemini 3.1 Flash by default; customizable via `config.yaml`) generates a synthetic dataset of input/output pairs based on the task description.
- Iterative Improvement (see the sketch after this list):
  - The Target LLM (Qwen 3.5 9b) is tested against the current prompt using the generated dataset.
  - Performance is measured using a defined metric (Accuracy, F1, Semantic Similarity, etc.).
  - The Optimizer LLM analyzes failures and successes to generate a refined prompt.
- Experiment Ledger: Every iteration is recorded in a persistent ledger to prevent duplicate experiments and track progress.
- Context Management: The system manages the history of experiments to provide the Optimizer LLM with relevant context without exceeding window limits.
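Concretely, one pass through this loop can be pictured as a minimal sketch; the helper signatures below are illustrative assumptions, not AutoPrompter's actual internals:

```python
from typing import Callable

# Minimal sketch of the closed loop. `refine` and `evaluate` are injected
# stand-ins for the Optimizer and Target LLM calls; names are illustrative.
def optimize(
    initial_prompt: str,
    dataset: list[tuple[str, str]],
    refine: Callable[[str, list[dict]], str],                 # Optimizer LLM: propose a new prompt
    evaluate: Callable[[str, list[tuple[str, str]]], float],  # score the Target LLM on the dataset
    max_iterations: int = 20,
    target_score: float = 0.95,
) -> tuple[str, float]:
    best_prompt = initial_prompt
    best_score = evaluate(best_prompt, dataset)   # baseline evaluation, not 0.0
    ledger: list[dict] = []
    for i in range(max_iterations):
        candidate = refine(best_prompt, ledger)   # refinement sees experiment history
        score = evaluate(candidate, dataset)
        ledger.append({"iteration": i, "prompt": candidate, "score": score})
        if score > best_score:
            best_prompt, best_score = candidate, score
        if best_score >= target_score:            # convergence threshold reached
            break
    return best_prompt, best_score
```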
- Optimizer LLM: `google/gemini-3.1-flash-lite-preview` (via OpenRouter)
- Target LLM: `qwen/qwen3.5-9b` (via OpenRouter)
- Metrics: Automated evaluation against expected outputs.
- Ledger: JSON-based tracking of all experiments.
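What a ledger record holds is easy to picture; the sketch below is a hypothetical example (field names and the JSON Lines layout are assumptions, not the project's documented schema):

```python
import json

# Hypothetical ledger record; field names are assumptions.
record = {
    "iteration": 7,
    "prompt": "Answer step by step, then state the final answer on its own line.",
    "score": 0.82,
    "metric": "accuracy",
}

# Appending one JSON object per line keeps the full experiment history durable
# and easy to scan for duplicate experiments.
with open("ledger.json", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```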
- Clone the repository:

  ```bash
  git clone https://github.com/gauravvij/AutoPrompter.git
  cd AutoPrompter
  ```

- Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure API key (for the OpenRouter backend): set the `OPENROUTER_API_KEY` environment variable or add it to `/root/.config/openrouter/config`.

  Note: An API key is only required when using the OpenRouter backend. Local backends (Ollama, llama.cpp) do not require API keys.
The system is configured via YAML files. Key fields include:
- `optimizer_llm`: Model ID and parameters for the optimizer.
- `target_llm`: Model ID and parameters for the target.
- `experiment`: `max_iterations`, `batch_size`, and convergence thresholds.
- `task`: `name`, `description`, and `initial_prompt`.
- `metric`: `type` (e.g., `accuracy`, `semantic_similarity`) and `target_score`.
- `storage`: Paths for the ledger, dataset, and results.
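As a sketch of how these fields map onto code, assuming the config is plain YAML readable with PyYAML (the key names follow the list above; the loading code itself is illustrative):

```python
import yaml  # PyYAML: pip install pyyaml

with open("config.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# Key names mirror the fields described above.
optimizer_model = cfg["optimizer_llm"]["model"]
max_iterations = cfg["experiment"]["max_iterations"]
target_score = cfg["metric"]["target_score"]
print(f"Optimizing with {optimizer_model} for up to {max_iterations} "
      f"iterations (target score {target_score})")
```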
AutoPrompter supports multiple LLM backends:
- OpenRouter (default): Cloud-based LLM access via OpenRouter API
- Ollama: Local LLM inference using Ollama server
- llama.cpp: Local LLM inference using llama.cpp server
OpenRouter:

```yaml
optimizer_llm:
  backend: "openrouter"
  model: "google/gemini-3.1-flash-lite-preview"
  api_base: "https://openrouter.ai/api/v1"
  temperature: 0.7
  max_tokens: 4096
```

Ollama:

```yaml
optimizer_llm:
  backend: "ollama"
  model: "llama3.2"
  host: "http://localhost"
  port: 11434
  temperature: 0.7
  max_tokens: 4096
```

llama.cpp:

```yaml
optimizer_llm:
  backend: "llama_cpp"
  model: "llama-3.2-3b"
  host: "http://localhost"
  port: 8080
  temperature: 0.7
  max_tokens: 4096
```

Ollama setup:

- Install Ollama: https://ollama.ai
- Pull a model:

  ```bash
  ollama pull llama3.2
  ```

- Start the server:

  ```bash
  ollama serve
  ```

- Use `config_ollama.yaml` or configure your own with `backend: "ollama"`
llama.cpp setup:

- Build llama.cpp from source: https://github.com/ggerganov/llama.cpp
- Download a GGUF model (e.g., from https://huggingface.co/TheBloke)
- Start the server:

  ```bash
  ./llama-server -m llama-3.2-3b.Q4_K_M.gguf -c 4096 --host 0.0.0.0 --port 8080
  ```

- Use `config_llama_cpp.yaml` or configure your own with `backend: "llama_cpp"`
You can also use `backend: "auto"` to automatically detect the available backend:

```yaml
optimizer_llm:
  backend: "auto"
  model: "llama3.2"
  host: "http://localhost"
  port: 11434
```

The system will probe both Ollama and llama.cpp endpoints to determine which is available.
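A minimal sketch of what that probing can look like, assuming the default ports above and each server's standard status endpoints (Ollama's `/api/tags`, llama.cpp's `/health`); this is an illustration, not the project's exact detection logic:

```python
import requests

def detect_backend(host: str = "http://localhost") -> str | None:
    """Probe local servers and return the first backend that responds."""
    probes = {
        "ollama": f"{host}:11434/api/tags",   # Ollama model-list endpoint
        "llama_cpp": f"{host}:8080/health",   # llama.cpp server health endpoint
    }
    for backend, url in probes.items():
        try:
            if requests.get(url, timeout=2).ok:
                return backend
        except requests.RequestException:
            continue  # server not running on this port; try the next
    return None
```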
Run the optimization process using the main entry point:
```bash
python main.py --config config.yaml
```

AutoPrompter now includes a web-based dashboard for real-time monitoring and control:

```bash
python web_ui.py
```

Then open http://localhost:5000 in your browser.
Web UI Features:
- Real-time Dashboard: Live updates of optimization progress with score history charts
- Configuration Builder: Interactive form to create and edit optimization configs
- Checkpoint Management: Save, load, and manage optimization checkpoints
- Prompt Diff Visualization: Compare prompt iterations side-by-side with highlighted changes
- Export/Import: Save complete optimization runs to JSON and reload them later
- Log Streaming: Real-time log output via Server-Sent Events
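The diff view can be pictured with Python's standard `difflib`, which is one way to produce the side-by-side comparison described above (illustrative, not necessarily what the project uses):

```python
import difflib

old_prompt = ["Answer the question."]
new_prompt = ["Answer the question step by step,", "then state the final answer."]

# HtmlDiff renders a side-by-side table with changed spans highlighted.
html = difflib.HtmlDiff(wrapcolumn=60).make_file(
    old_prompt, new_prompt, fromdesc="iteration 3", todesc="iteration 4"
)
with open("diff.html", "w", encoding="utf-8") as f:
    f.write(html)
```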
The repository includes pre-configured tasks:
- Blogging: Optimize prompts for high-quality blog post generation.

  ```bash
  python main.py --config config_blogging.yaml
  ```

- Math: Optimize prompts for solving complex mathematical problems.

  ```bash
  python main.py --config config_math.yaml
  ```

- Reasoning: Optimize prompts for logical reasoning and chain-of-thought tasks.

  ```bash
  python main.py --config config_reasoning.yaml
  ```

- Ollama: Use the Ollama backend for local inference.

  ```bash
  python main.py --config config_ollama.yaml
  ```

- llama.cpp: Use the llama.cpp backend for local inference.

  ```bash
  python main.py --config config_llama_cpp.yaml
  ```
Note: Make sure your local backend server is running before starting the optimization process.
You can override configuration values directly from the CLI:
```bash
python main.py --config config.yaml --max-iterations 50 --override experiment.batch_size=10
```

Recent improvements to the optimization engine:

- Baseline Evaluation Fix: `best_score` now initializes from actual baseline prompt evaluation instead of 0.0
- Statistical Significance Testing: Welch's t-test with bootstrap fallback prevents noise-driven prompt switches (p < 0.05 threshold; see the sketch after this list)
- Stagnation Detection: Increased threshold from 2 to 5 iterations before triggering diversification
- Semantic Duplicate Detection: Embedding-based similarity check (SentenceTransformer, 0.95 threshold) prevents re-evaluating near-identical prompts
- Prompt Complexity Tracking: Monitors length, word count, and instruction count to detect "longer but not better" patterns
- Parallel Experiment Executor: ThreadPoolExecutor for simultaneous candidate evaluation (max_workers=3)
- Robustness Testing Framework: Adversarial input generation (typos, ambiguity, format variations) with consistency scoring
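As an illustration of the two statistical safeguards above (a sketch under assumed data shapes, not the project's exact code), Welch's t-test can gate prompt switches on per-example scores, and embedding similarity can filter near-duplicate candidates:

```python
from scipy import stats
from sentence_transformers import SentenceTransformer, util

_embedder = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption

def significantly_better(new_scores: list[float], old_scores: list[float],
                         alpha: float = 0.05) -> bool:
    """Welch's t-test (one-sided, unequal variances): accept the new prompt
    only if its per-example scores are higher with p < alpha."""
    _, p = stats.ttest_ind(new_scores, old_scores, equal_var=False,
                           alternative="greater")
    return p < alpha

def is_near_duplicate(prompt: str, seen_prompts: list[str],
                      threshold: float = 0.95) -> bool:
    """Cosine similarity of embeddings against previously evaluated prompts."""
    if not seen_prompts:
        return False
    emb = _embedder.encode([prompt] + seen_prompts, convert_to_tensor=True)
    sims = util.cos_sim(emb[0], emb[1:])
    return bool(sims.max() >= threshold)
```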
Web UI additions:

- Flask-based web interface with REST API and Server-Sent Events
- Real-time dashboard with Chart.js visualization
- Checkpoint management and configuration builder
- Prompt diff visualization with syntax highlighting
- Export/import functionality for saving optimization runs
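For readers unfamiliar with Server-Sent Events, a minimal Flask sketch of the pattern (the route and payload here are stand-ins, not AutoPrompter's actual API):

```python
import time
from flask import Flask, Response

app = Flask(__name__)

@app.route("/events")
def events():
    def stream():
        # Each SSE message is "data: <payload>\n\n"; a counter stands in
        # for real log lines here.
        for i in range(10):
            yield f"data: log line {i}\n\n"
            time.sleep(1)
    return Response(stream(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(port=5000)
```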
See CHANGELOG.md for detailed version history.
License: MIT
NEO - A fully autonomous AI Engineer