helix

Inspired by karpathy/autoresearch, helix generalizes the idea of autonomous AI research loops beyond LLM training. Give an agent a codebase, a metric, and a fixed time budget. It experiments overnight. You wake up to results.

The git history is the research trail. experiments.tsv is the proof. Anyone can clone a helix, run it on their hardware, and independently verify every result.

Concepts

| Term | Meaning |
| --- | --- |
| helix | A git repo containing `helix.yaml` + `program.md` + a codebase the agent can modify |
| `helix.yaml` | Machine-readable spec: what to optimize, how to measure it, which files are editable |
| `program.md` | Human-written instructions for the agent: domain knowledge, constraints, techniques to try |
| `experiments.tsv` | Append-only ledger of every experiment: commit, metric, status, description |
| `helix run` | CLI command that launches an autonomous session on your hardware |
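As an illustration of how the append-only ledger supports verification, a few lines of Python can recover the best result from `experiments.tsv`. The column layout follows the description above (commit, metric, status, description); the sample rows here are hypothetical:

```python
import csv
import io

# Hypothetical experiments.tsv content; columns follow the ledger
# description above: commit, metric, status, description.
SAMPLE = (
    "commit\tmetric\tstatus\tdescription\n"
    "a1b2c3d\t0.81\tsuccess\tbaseline\n"
    "d4e5f6a\t0.79\tsuccess\ttried smaller batch\n"
    "b7c8d9e\t0.86\tsuccess\tenabled torch.compile\n"
)

def best_experiment(tsv_text: str) -> dict:
    """Return the row with the highest metric (maximize)."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    return max(rows, key=lambda r: float(r["metric"]))

print(best_experiment(SAMPLE)["commit"])  # commit of the best-scoring run
```

Because each row records the commit that produced it, the best entry points directly at a checkout anyone can re-run.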

Quick start

helix is agent-agnostic. Pick a backend or bring your own.

| Backend | Install | Requires |
| --- | --- | --- |
| `ClaudeBackend` (default) | `pip install 'helices[claude]'` | Claude Code CLI |
| `GeminiBackend` | `pip install helices` | Gemini CLI |
| Custom | `pip install helices` | Implement the `AgentBackend` protocol |
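A custom backend only needs to satisfy the `AgentBackend` protocol. The method name and signature below are assumptions for illustration; the real protocol is defined by the `helices` package:

```python
from typing import Protocol

class AgentBackend(Protocol):
    """Hypothetical shape of the backend protocol; the actual method
    names and signatures come from the helices package."""

    def run_session(self, prompt: str) -> str:
        """Execute one agent session and return its transcript."""
        ...

class EchoBackend:
    """Toy backend that structurally satisfies the protocol above."""

    def run_session(self, prompt: str) -> str:
        return f"echo: {prompt}"

# Protocols are checked structurally, so no inheritance is required.
backend: AgentBackend = EchoBackend()
print(backend.run_session("optimize train.py"))
```

Since `Protocol` uses structural typing, any class with a matching method satisfies the interface without subclassing anything from `helices`.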

Run an existing helix

```shell
# from within a helix directory (one that has helix.yaml)
helix run              # start a session tagged with today's date
helix run --tag exp1   # custom tag
helix status           # show current best and recent experiments
```

Examples

helix-examples is a curated gallery of standalone helices, each in its own repo and included as a git submodule.

```shell
git clone --recurse-submodules git@github.com:VectorInstitute/helix-examples.git
cd helix-examples/inference-opt
uv run prepare.py   # one-time: download model + dataset
helix run
```

The first example, helix-inference-opt, optimizes inference throughput for a causal language model on WikiText-2. The agent modifies infer.py (batching, quantization, torch.compile, etc.) and automatically merges improvements back to main.

Writing your own helix

The typical starting point is an existing research codebase. helix init drops the helix layer on top without touching your code.

```shell
cd my-research-project        # your existing git repo
pip install 'helices[claude]'
helix init . --domain "AI/ML" --description "Optimize X for task Y."
```

helix init is non-destructive: it skips any file that already exists, so running it against a repo with an existing pyproject.toml or uv.lock is safe.
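The skip-if-exists behaviour can be sketched in a few lines; this is an illustrative sketch of the idea, not the actual `helix init` implementation:

```python
import tempfile
from pathlib import Path

def write_if_missing(path: Path, content: str) -> bool:
    """Create the file only if it does not already exist.
    Returns True when written, False when skipped."""
    if path.exists():
        return False
    path.write_text(content)
    return True

# Demonstrate: a second call with different content is a no-op.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "helix.yaml"
    first = write_if_missing(cfg, "name: my-helix\n")
    second = write_if_missing(cfg, "name: overwritten\n")
    kept = cfg.read_text()

print(first, second, kept.strip())
```

The existing file wins every time, which is why re-running `helix init` against a populated repo cannot clobber your `pyproject.toml` or `uv.lock`.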

Then:

  1. Edit helix.yaml: set scope.editable to the files the agent may modify, and set evaluate.command to your evaluation script.
  2. Edit program.md: describe your codebase, goal, constraints, and techniques to try.
  3. Run helix run.
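The evaluation contract implied by step 1 is simple: `evaluate.command` is any program that prints the metric in a form the configured pattern can match. A minimal `evaluate.py` sketch, with placeholder scoring logic:

```python
# evaluate.py -- minimal evaluation script sketch.
# helix runs this command and scans stdout for the metric pattern,
# e.g. '^accuracy:\s+([\d.]+)'. The scoring below is a placeholder.

def score() -> float:
    # Replace with your real evaluation (load model, run test set, ...).
    return 0.84

if __name__ == "__main__":
    print(f"accuracy: {score():.4f}")
```

Anything else the script prints is ignored; only lines matching the pattern feed the ledger.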

If you are starting from scratch:

```shell
helix init my-project --domain "AI/ML" --description "Optimize X for task Y."
cd my-project && git init
# add your codebase, fill in helix.yaml and program.md, then:
helix run
```

Minimal helix.yaml

```yaml
name: my-helix
domain: AI/ML
description: Optimize X for task Y.

scope:
  editable: [train.py]
  readonly: [evaluate.py, program.md, helix.yaml]

metrics:
  primary:
    name: accuracy
    optimize: maximize
  evaluate:
    command: python evaluate.py
    timeout_seconds: 120
    output_format: pattern
    patterns:
      primary: '^accuracy:\s+([\d.]+)'
```
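With `output_format: pattern`, the `primary` regex is applied to the evaluator's output to extract the metric. The Python below applies the same pattern from the config to some sample output (the output text itself is hypothetical):

```python
import re

# The pattern from the helix.yaml above. MULTILINE makes ^ anchor to
# the start of each line, not just the start of the whole output.
PATTERN = re.compile(r"^accuracy:\s+([\d.]+)", re.MULTILINE)

output = """loading checkpoint...
accuracy: 0.9321
done
"""

match = PATTERN.search(output)
assert match is not None
print(float(match.group(1)))  # the extracted primary metric
```

Note the single quotes around the pattern in the YAML: they keep `\s` from being interpreted as a YAML escape, so the regex reaches the parser intact.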
