AutoPrompter is an autonomous system designed to iteratively improve LLM prompts through a closed-loop optimization process. It merges the validation and metrics capabilities of tools like promptfoo with the iterative improvement logic of autoresearch.
The system operates in a continuous loop where an Optimizer LLM refines prompts for a Target LLM based on empirical performance data.
- Dataset Generation: The Optimizer LLM (Gemini 3.1 Flash by default; customizable via `config.yaml`) generates a synthetic dataset of input/output pairs based on the task description.
- Iterative Improvement (see the sketch after this list):
  - The Target LLM (Qwen 3.5 9b) is tested against the current prompt using the generated dataset.
  - Performance is measured using a defined metric (Accuracy, F1, Semantic Similarity, etc.).
  - The Optimizer LLM analyzes failures and successes to generate a refined prompt.
- Experiment Ledger: Every iteration is recorded in a persistent ledger to prevent duplicate experiments and track progress.
- Context Management: The system manages the history of experiments to provide the Optimizer LLM with relevant context without exceeding window limits.
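Concretely, one pass through this loop can be pictured as a minimal sketch; the helper signatures below are illustrative assumptions, not AutoPrompter's actual internals:

```python
from typing import Callable

# Minimal sketch of the closed loop. `refine` and `evaluate` are injected
# stand-ins for the Optimizer and Target LLM calls; names are illustrative.
def optimize(
    initial_prompt: str,
    dataset: list[tuple[str, str]],
    refine: Callable[[str, list[dict]], str],                 # Optimizer LLM: propose a new prompt
    evaluate: Callable[[str, list[tuple[str, str]]], float],  # score the Target LLM on the dataset
    max_iterations: int = 20,
    target_score: float = 0.95,
) -> tuple[str, float]:
    best_prompt = initial_prompt
    best_score = evaluate(best_prompt, dataset)   # baseline evaluation, not 0.0
    ledger: list[dict] = []
    for i in range(max_iterations):
        candidate = refine(best_prompt, ledger)   # refinement sees experiment history
        score = evaluate(candidate, dataset)
        ledger.append({"iteration": i, "prompt": candidate, "score": score})
        if score > best_score:
            best_prompt, best_score = candidate, score
        if best_score >= target_score:            # convergence threshold reached
            break
    return best_prompt, best_score
```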
- Optimizer LLM: `google/gemini-3.1-flash-lite-preview` (via OpenRouter)
- Target LLM: `qwen/qwen3.5-9b` (via OpenRouter)
- Metrics: Automated evaluation against expected outputs.
- Ledger: JSON-based tracking of all experiments.
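What a ledger record holds is easy to picture; the sketch below is a hypothetical example (field names and the JSON Lines layout are assumptions, not the project's documented schema):

```python
import json

# Hypothetical ledger record; field names are assumptions.
record = {
    "iteration": 7,
    "prompt": "Answer step by step, then state the final answer on its own line.",
    "score": 0.82,
    "metric": "accuracy",
}

# Appending one JSON object per line keeps the full experiment history durable
# and easy to scan for duplicate experiments.
with open("ledger.json", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```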
- Clone the repository:

  ```bash
  git clone https://github.com/gauravvij/AutoPrompter.git
  cd AutoPrompter
  ```

- Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure API key (for the OpenRouter backend): set the `OPENROUTER_API_KEY` environment variable or add it to `/root/.config/openrouter/config`.

  Note: An API key is only required when using the OpenRouter backend. Local backends (Ollama, llama.cpp) do not require API keys.
The system is configured via YAML files. Key fields include:
- `optimizer_llm`: Model ID and parameters for the optimizer.
- `target_llm`: Model ID and parameters for the target.
- `experiment`: `max_iterations`, `batch_size`, and convergence thresholds.
- `task`: `name`, `description`, and `initial_prompt`.
- `metric`: `type` (e.g., `accuracy`, `semantic_similarity`) and `target_score`.
- `storage`: Paths for the ledger, dataset, and results.
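As a sketch of how these fields map onto code, assuming the config is plain YAML readable with PyYAML (the key names follow the list above; the loading code itself is illustrative):

```python
import yaml  # PyYAML: pip install pyyaml

with open("config.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# Key names mirror the fields described above.
optimizer_model = cfg["optimizer_llm"]["model"]
max_iterations = cfg["experiment"]["max_iterations"]
target_score = cfg["metric"]["target_score"]
print(f"Optimizing with {optimizer_model} for up to {max_iterations} "
      f"iterations (target score {target_score})")
```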
AutoPrompter supports multiple LLM backends:
- OpenRouter (default): Cloud-based LLM access via OpenRouter API
- Ollama: Local LLM inference using Ollama server
- llama.cpp: Local LLM inference using llama.cpp server
OpenRouter:

```yaml
optimizer_llm:
  backend: "openrouter"
  model: "google/gemini-3.1-flash-lite-preview"
  api_base: "https://openrouter.ai/api/v1"
  temperature: 0.7
  max_tokens: 4096
```

Ollama:

```yaml
optimizer_llm:
  backend: "ollama"
  model: "llama3.2"
  host: "http://localhost"
  port: 11434
  temperature: 0.7
  max_tokens: 4096
```

llama.cpp:

```yaml
optimizer_llm:
  backend: "llama_cpp"
  model: "llama-3.2-3b"
  host: "http://localhost"
  port: 8080
  temperature: 0.7
  max_tokens: 4096
```

Ollama setup:

- Install Ollama: https://ollama.ai
- Pull a model:

  ```bash
  ollama pull llama3.2
  ```

- Start the server:

  ```bash
  ollama serve
  ```

- Use `config_ollama.yaml` or configure your own with `backend: "ollama"`
llama.cpp setup:

- Build llama.cpp from source: https://github.com/ggerganov/llama.cpp
- Download a GGUF model (e.g., from https://huggingface.co/TheBloke)
- Start the server:

  ```bash
  ./llama-server -m llama-3.2-3b.Q4_K_M.gguf -c 4096 --host 0.0.0.0 --port 8080
  ```

- Use `config_llama_cpp.yaml` or configure your own with `backend: "llama_cpp"`
You can also use `backend: "auto"` to automatically detect the available backend:

```yaml
optimizer_llm:
  backend: "auto"
  model: "llama3.2"
  host: "http://localhost"
  port: 11434
```

The system will probe both Ollama and llama.cpp endpoints to determine which is available.
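A minimal sketch of what that probing can look like, assuming the default ports above and each server's standard status endpoints (Ollama's `/api/tags`, llama.cpp's `/health`); this is an illustration, not the project's exact detection logic:

```python
import requests

def detect_backend(host: str = "http://localhost") -> str | None:
    """Probe local servers and return the first backend that responds."""
    probes = {
        "ollama": f"{host}:11434/api/tags",   # Ollama model-list endpoint
        "llama_cpp": f"{host}:8080/health",   # llama.cpp server health endpoint
    }
    for backend, url in probes.items():
        try:
            if requests.get(url, timeout=2).ok:
                return backend
        except requests.RequestException:
            continue  # server not running on this port; try the next
    return None
```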
Run the optimization process using the main entry point:
```bash
python main.py --config config.yaml
```

AutoPrompter now includes a web-based dashboard for real-time monitoring and control:

```bash
python web_ui.py
```

Then open http://localhost:5000 in your browser.
Web UI Features:
- Real-time Dashboard: Live updates of optimization progress with score history charts
- Configuration Builder: Interactive form to create and edit optimization configs
- Checkpoint Management: Save, load, and manage optimization checkpoints
- Prompt Diff Visualization: Compare prompt iterations side-by-side with highlighted changes
- Export/Import: Save complete optimization runs to JSON and reload them later
- Log Streaming: Real-time log output via Server-Sent Events
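The diff view can be pictured with Python's standard `difflib`, which is one way to produce the side-by-side comparison described above (illustrative, not necessarily what the project uses):

```python
import difflib

old_prompt = ["Answer the question."]
new_prompt = ["Answer the question step by step,", "then state the final answer."]

# HtmlDiff renders a side-by-side table with changed spans highlighted.
html = difflib.HtmlDiff(wrapcolumn=60).make_file(
    old_prompt, new_prompt, fromdesc="iteration 3", todesc="iteration 4"
)
with open("diff.html", "w", encoding="utf-8") as f:
    f.write(html)
```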
The repository includes pre-configured tasks:
- Blogging: Optimize prompts for high-quality blog post generation.

  ```bash
  python main.py --config config_blogging.yaml
  ```

- Math: Optimize prompts for solving complex mathematical problems.

  ```bash
  python main.py --config config_math.yaml
  ```

- Reasoning: Optimize prompts for logical reasoning and chain-of-thought tasks.

  ```bash
  python main.py --config config_reasoning.yaml
  ```

- Ollama: Use the Ollama backend for local inference.

  ```bash
  python main.py --config config_ollama.yaml
  ```

- llama.cpp: Use the llama.cpp backend for local inference.

  ```bash
  python main.py --config config_llama_cpp.yaml
  ```
Note: Make sure your local backend server is running before starting the optimization process.
You can override configuration values directly from the CLI:
```bash
python main.py --config config.yaml --max-iterations 50 --override experiment.batch_size=10
```

Recent improvements to the optimization engine:

- Baseline Evaluation Fix: `best_score` now initializes from actual baseline prompt evaluation instead of 0.0
- Statistical Significance Testing: Welch's t-test with bootstrap fallback prevents noise-driven prompt switches (p < 0.05 threshold; see the sketch after this list)
- Stagnation Detection: Increased threshold from 2 to 5 iterations before triggering diversification
- Semantic Duplicate Detection: Embedding-based similarity check (SentenceTransformer, 0.95 threshold) prevents re-evaluating near-identical prompts
- Prompt Complexity Tracking: Monitors length, word count, and instruction count to detect "longer but not better" patterns
- Parallel Experiment Executor: ThreadPoolExecutor for simultaneous candidate evaluation (max_workers=3)
- Robustness Testing Framework: Adversarial input generation (typos, ambiguity, format variations) with consistency scoring
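As an illustration of the two statistical safeguards above (a sketch under assumed data shapes, not the project's exact code), Welch's t-test can gate prompt switches on per-example scores, and embedding similarity can filter near-duplicate candidates:

```python
from scipy import stats
from sentence_transformers import SentenceTransformer, util

_embedder = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption

def significantly_better(new_scores: list[float], old_scores: list[float],
                         alpha: float = 0.05) -> bool:
    """Welch's t-test (one-sided, unequal variances): accept the new prompt
    only if its per-example scores are higher with p < alpha."""
    _, p = stats.ttest_ind(new_scores, old_scores, equal_var=False,
                           alternative="greater")
    return p < alpha

def is_near_duplicate(prompt: str, seen_prompts: list[str],
                      threshold: float = 0.95) -> bool:
    """Cosine similarity of embeddings against previously evaluated prompts."""
    if not seen_prompts:
        return False
    emb = _embedder.encode([prompt] + seen_prompts, convert_to_tensor=True)
    sims = util.cos_sim(emb[0], emb[1:])
    return bool(sims.max() >= threshold)
```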
Web UI additions:

- Flask-based web interface with REST API and Server-Sent Events
- Real-time dashboard with Chart.js visualization
- Checkpoint management and configuration builder
- Prompt diff visualization with syntax highlighting
- Export/import functionality for saving optimization runs
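For readers unfamiliar with Server-Sent Events, a minimal Flask sketch of the pattern (the route and payload here are stand-ins, not AutoPrompter's actual API):

```python
import time
from flask import Flask, Response

app = Flask(__name__)

@app.route("/events")
def events():
    def stream():
        # Each SSE message is "data: <payload>\n\n"; a counter stands in
        # for real log lines here.
        for i in range(10):
            yield f"data: log line {i}\n\n"
            time.sleep(1)
    return Response(stream(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(port=5000)
```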
See CHANGELOG.md for detailed version history.
License: MIT
NEO - A fully autonomous AI Engineer