UPwith-me/Augur-Runtime-Debugging-Agent

Augur

Runtime-aware debugging agents for live execution, structured trajectories, and benchmarked repair loops.

Augur Banner

License: Non-Commercial TypeScript React Vite

English | 中文


English

Overview

Augur is a runtime-aware debugging agent platform. It treats debugging as a stateful control problem over a real or simulated execution trace, not as a single prompt over source code.

The core loop is simple:

  1. Observe the current runtime state.
  2. Ask the agent to choose the next debugger action or repair proposal.
  3. Execute the action in the environment.
  4. Run verification when a candidate fix appears.
  5. Record the full trajectory for replay, inspection, and evaluation.
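
The loop above can be sketched in TypeScript. Everything here is illustrative — the types, the `Environment` interface, and the function names are assumptions for exposition, not Augur's actual API:

```typescript
// Hypothetical sketch of the observe -> decide -> act -> verify -> record loop.
// None of these names mirror Augur's real implementation.

type Observation = { line: number; paused: boolean; locals: Record<string, unknown> };
type Decision =
  | { kind: "step" }
  | { kind: "proposeFix"; patch: string };

interface Environment {
  observe(): Observation;                 // 1. current runtime state
  apply(d: Decision): void;               // 3. execute the chosen action
  verify(patch: string): boolean;         // 4. run verification on a candidate fix
}

type StepRecord = { observation: Observation; decision: Decision; verified?: boolean };

function runLoop(
  env: Environment,
  decide: (o: Observation) => Decision,   // 2. the agent's policy
  maxSteps = 10,
): StepRecord[] {
  const trajectory: StepRecord[] = [];    // 5. full trajectory for replay
  for (let i = 0; i < maxSteps; i++) {
    const observation = env.observe();
    const decision = decide(observation);
    env.apply(decision);
    const record: StepRecord = { observation, decision };
    if (decision.kind === "proposeFix") {
      record.verified = env.verify(decision.patch);
      trajectory.push(record);
      break;                              // stop once a candidate fix is verified
    }
    trajectory.push(record);
  }
  return trajectory;
}
```

The point of the shape is that verification happens inside the loop, and the trajectory is the primary artifact rather than a side effect.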

That gives Augur a useful shape for both systems research and practical demos: live debugger integration, structured runtime traces, explicit verifier feedback, and a batch runner for repeatable experiments.

What Augur does

  • Runs a multi-step debugging loop over structured runtime observations.
  • Supports both simulated traces and live debugger sessions.
  • Normalizes decisions, tool results, verifier outcomes, and reflection notes into one trajectory schema.
  • Switches between providers through a unified config layer instead of hardcoded model branches.
  • Exposes evaluation-friendly APIs for creating, stepping, replaying, and inspecting runs.
  • Includes a web inspector, a VS Code bridge, and a CLI benchmark runner.

Why runtime-aware debugging

Source-only prompting is often enough to suggest a patch. It is much worse at answering questions like:

  • What line is actually failing right now?
  • Which values made the program drift into a bad state?
  • Should the next move be stepping, inspecting, setting a breakpoint, or proposing a fix?
  • Did the patch actually repair the behavior, or did it only sound plausible?

Augur is built around those questions. Each run captures execution context, not just code context, and verification sits inside the loop rather than at the end of a chat.

System architecture

augur-debugger-service

The backend runtime. It resolves provider config, owns the agent loop, manages structured runs, talks to live debug sessions, and exposes the main HTTP API.

augur-web-visualizer

The browser-based inspector. It shows trajectory steps, run summaries, observations, decisions, verifier results, and candidate fixes. It can inspect both research runs and attached live sessions.

augur-vscode-extension

The IDE entry point. It starts or attaches debugger sessions, collects runtime context from VS Code, and asks the backend for decisions using the same provider/config vocabulary as the rest of the system.

augur-ci-agent

The CLI runner. It supports single-run debugging, batch evaluation, and benchmark-style execution over DebugBench- and DSDBench-shaped datasets.

Agent loop

Each trajectory step follows the same structure:

  1. observation Source location, pause reason, call stack, locals, and raw debugger metadata.
  2. decision The model's chosen action, such as stepping, continuing, or proposing a fix.
  3. toolResult The normalized effect of that action inside the runtime.
  4. verifier A pass, fail, or continue signal, with command and output summaries when verification runs.
  5. reflection A compact note that carries verifier failures or next-step guidance into the following decision.
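
The five-part step could be typed roughly as follows. The field names inside each part are inferred from the prose above and may differ from Augur's real trajectory schema:

```typescript
// Hypothetical TypeScript shape for one trajectory step; only the five
// top-level keys come from the documented structure, inner fields are guesses.

interface TrajectoryStep {
  observation: {
    file: string;
    line: number;
    pauseReason: string;
    callStack: string[];
    locals: Record<string, unknown>;
  };
  decision: {
    action: "step" | "continue" | "inspect" | "setBreakpoint" | "proposeFix";
    detail?: string;
  };
  toolResult: { ok: boolean; summary: string };
  verifier?: { status: "pass" | "fail" | "continue"; command?: string; output?: string };
  reflection?: string;
}

// A filled-in example step for a run that ends in a verified fix.
const sampleStep: TrajectoryStep = {
  observation: {
    file: "main.py",
    line: 14,
    pauseReason: "breakpoint",
    callStack: ["main"],
    locals: { x: 0 },
  },
  decision: { action: "proposeFix", detail: "guard against x == 0" },
  toolResult: { ok: true, summary: "patch applied in temp workspace" },
  verifier: { status: "pass", command: "python main.py", output: "ok" },
  reflection: "verifier passed; stop",
};
```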

This gives Augur a clean replay format for case studies, demos, and evaluation.

Unified provider configuration

Augur uses one config model across the backend, web UI, VS Code extension, and CLI runner.

Supported providers:

  • openai_compatible
  • gemini
  • anthropic
  • mock

Config precedence:

CLI flags > request config > env vars > benchmark defaults > built-in defaults
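
The precedence chain amounts to a layered merge where higher-priority layers overwrite lower ones. A minimal sketch (the layer names and `ProviderConfig` fields are illustrative):

```typescript
// Sketch of the documented precedence:
// CLI flags > request config > env vars > benchmark defaults > built-in defaults.

type ProviderConfig = Partial<{ provider: string; model: string; baseUrl: string }>;

const builtinDefaults: ProviderConfig = { provider: "mock", model: "mock" };

function resolveConfig(layers: {
  cli?: ProviderConfig;
  request?: ProviderConfig;
  env?: ProviderConfig;
  benchmark?: ProviderConfig;
}): ProviderConfig {
  // Later spreads win, so the lowest-priority layer goes first
  // and CLI flags go last.
  return {
    ...builtinDefaults,
    ...(layers.benchmark ?? {}),
    ...(layers.env ?? {}),
    ...(layers.request ?? {}),
    ...(layers.cli ?? {}),
  };
}
```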

Common environment variables:

AUGUR_PROVIDER=openai_compatible
AUGUR_MODEL=qwen2.5-coder-32b-instruct
AUGUR_BASE_URL=http://localhost:8000/v1
AUGUR_API_KEY=
AUGUR_API_KEY_ENV=OPENAI_API_KEY
AUGUR_MAX_STEPS=24
AUGUR_TIMEOUT_MS=10000
AUGUR_MAX_REPAIR_ATTEMPTS=2
AUGUR_VERIFIER_COMMAND=
AUGUR_VERIFIER_EXPECTED_OUTPUT=
AUGUR_ENTRY_FILE_NAME=main.py

Provider-specific fallback variables are also supported:

  • OPENAI_API_KEY, OPENAI_BASE_URL
  • GEMINI_API_KEY, GEMINI_API_BASE_URL
  • CLAUDE_API_KEY

This makes it easy to switch between hosted APIs, OpenAI-compatible local servers, and deterministic mock runs without changing the surrounding workflow.

Runtime API

Main endpoints:

  • GET /api/v1/providers Returns supported providers, required fields, legacy env fallbacks, and example flag shapes.
  • POST /api/v1/generateSimulation Builds a simulated execution trace from source code and a provider config.
  • POST /api/v1/getDebugAction Requests a debugging decision from the selected provider.
  • POST /api/v1/runs Creates a structured runtime run.
  • POST /api/v1/runs/:id/step Advances a run by one or more agent steps.
  • GET /api/v1/runs Lists recent runs.
  • GET /api/v1/runs/:id Returns a run with its full trajectory and summary.
  • GET /api/v1/session/:id/runtime Normalizes a paused live session into a structured observation.
  • POST /api/v1/session/:id/runtime/decision Generates a trajectory-shaped decision for a live debugger stop.
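
A client driving the runs API would issue requests like the following. The method/path pairs come from the list above; the request-body fields are assumptions, since the README does not document payload schemas:

```typescript
// Hedged sketch of request construction for the runs endpoints.
// Only the paths and methods are documented; body shapes are hypothetical.

type ApiRequest = { method: "GET" | "POST"; path: string; body?: unknown };

function createRun(source: string, config: { provider: string; model: string }): ApiRequest {
  return { method: "POST", path: "/api/v1/runs", body: { source, config } };
}

function stepRun(runId: string, steps = 1): ApiRequest {
  return { method: "POST", path: `/api/v1/runs/${runId}/step`, body: { steps } };
}

function getRun(runId: string): ApiRequest {
  return { method: "GET", path: `/api/v1/runs/${runId}` };
}
```

The typical evaluation flow would be: `createRun`, then `stepRun` until the run terminates, then `getRun` to fetch the trajectory and summary.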

Verification model

When the agent proposes a fix, Augur can verify it in a temporary workspace. The verifier writes a temp copy of the target file, runs the configured command, and records whether the repair passed.

That behavior is useful for evaluation because it keeps verification explicit and reproducible without mutating the checked-out workspace by default.
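
The described flow — write a temp copy, run the configured command, record the outcome — can be sketched like this. The function signature and pass criteria are assumptions; only the temp-copy-then-run behavior comes from the text:

```typescript
// Minimal sketch of verification in a temporary workspace, assuming a pass
// means exit code 0 plus an optional expected-output match.

import { spawnSync } from "node:child_process";
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function verifyFix(
  fileName: string,
  patchedSource: string,
  command: string,
  args: string[],
  expectedOutput?: string,
): { passed: boolean; output: string } {
  // Temp copy: the checked-out workspace is never mutated.
  const workDir = mkdtempSync(join(tmpdir(), "augur-verify-"));
  writeFileSync(join(workDir, fileName), patchedSource);

  const result = spawnSync(command, args, { cwd: workDir, encoding: "utf8", timeout: 10_000 });
  const output = (result.stdout ?? "").trim();
  const passed =
    result.status === 0 &&
    (expectedOutput === undefined || output === expectedOutput.trim());
  return { passed, output };
}
```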

Benchmark runner

The CLI runner supports two modes:

Single run

cd augur-ci-agent
npm install
npm start -- --provider mock --model mock path/to/script.py

Batch evaluation

npm start -- --benchmark debugbench --dataset ./samples/debugbench-sample.json --output-dir ./outputs/debugbench --provider mock --model mock

Supported benchmark shape:

  • Python-executable subsets of DebugBench-style datasets
  • Python-executable subsets of DSDBench-style datasets

Unsupported cases are marked explicitly instead of being skipped silently.

Artifacts written to --output-dir:

  • trajectories.jsonl
  • results.json
  • summary.json
  • summary.csv

Summary metrics include:

  • success rate
  • verifier pass rate
  • average steps
  • average repair attempts
  • unsupported rate
  • per-provider breakdown
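
The listed metrics could be derived from per-case results roughly as follows. The `CaseResult` fields are assumptions about what each record carries, not Augur's actual results schema:

```typescript
// Sketch of computing the summary metrics from per-case benchmark results.
// Field names are hypothetical; averages are taken over supported cases only.

type CaseResult = {
  provider: string;
  success: boolean;
  verifierPassed: boolean;
  steps: number;
  repairAttempts: number;
  unsupported: boolean;
};

function summarize(results: CaseResult[]) {
  const n = results.length || 1;
  const supported = results.filter((r) => !r.unsupported);
  const avg = (xs: number[]) =>
    xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0;
  return {
    successRate: supported.filter((r) => r.success).length / n,
    verifierPassRate: supported.filter((r) => r.verifierPassed).length / n,
    avgSteps: avg(supported.map((r) => r.steps)),
    avgRepairAttempts: avg(supported.map((r) => r.repairAttempts)),
    unsupportedRate: results.filter((r) => r.unsupported).length / n,
  };
}
```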

Quick start

1. Start the backend

cd augur-debugger-service
npm install
npm test
npm start

2. Start the web inspector

cd ../augur-web-visualizer
npm install
npm run build
npm run dev

Open http://localhost:5173.

3. Run the CLI runner

cd ../augur-ci-agent
npm install
npm test
npm start

4. Use the VS Code extension

  • Open the augur-vscode-extension folder in VS Code.
  • Run the extension host.
  • Use Augur: Debug in Live Panel from the file or folder context menu.
  • The extension will start the backend services, create a live session, and open the web inspector.

Example provider setups

Gemini

$env:AUGUR_PROVIDER="gemini"
$env:AUGUR_MODEL="gemini-2.5-pro"
$env:GEMINI_API_KEY="your-key"

Anthropic

$env:AUGUR_PROVIDER="anthropic"
$env:AUGUR_MODEL="claude-sonnet-4-5-20250929"
$env:CLAUDE_API_KEY="your-key"

OpenAI-compatible local server

npm start -- --provider openai_compatible --model qwen2.5-coder-32b-instruct --base-url http://localhost:8000/v1

Mock provider

npm start -- --provider mock --model mock

Repository layout

augur-debugger-service/   backend runtime, providers, trajectories, verification, live session APIs
augur-web-visualizer/     React-based trajectory inspector and runtime UI
augur-vscode-extension/   VS Code integration and live-session launcher
augur-ci-agent/           CLI runner and benchmark harness
docs/plans/               design notes and implementation records

Use cases

  • Research demos for agent systems and software engineering agents
  • Interactive runtime debugging experiments
  • Verifier-guided repair loops over small Python programs
  • Case studies that need replayable trajectories instead of ad hoc logs
  • Provider comparisons under one shared runtime scaffold

License

This project is released under the Non-Commercial License. See LICENSE for details.


中文

Overview

Augur is a runtime-aware agent platform for software debugging. Rather than treating debugging as a single model query over source code, it organizes debugging as a multi-step closed loop around real runtime state:

  1. Observe the current runtime state
  2. Let the agent choose the next debugging action or repair suggestion
  3. Execute the action in the environment
  4. Run the verifier when a candidate fix appears
  5. Record the full trajectory for replay, analysis, and evaluation

This makes Augur a debugging agent system rather than a one-shot question-and-answer AI debugger.

What it does

  • Runs a multi-step debugging loop over structured runtime observations
  • Supports both simulated traces and live debugger sessions
  • Records decisions, tool results, verifier outcomes, and reflections in one trajectory schema
  • Switches between models and endpoints through a unified configuration layer
  • Provides evaluation-oriented HTTP APIs, a web inspector, VS Code integration, and a CLI runner

Why runtime-aware

Given only source code, a model can easily produce an answer that merely looks like a fix, while struggling with the questions that matter more:

  • Where exactly is execution paused right now?
  • Which variable values drove the program into a bad state?
  • Should the next move be step, inspect, continue, or propose a fix?
  • Did the patch actually repair the behavior, or does it only sound plausible?

This is the core of Augur's design: runtime information and the verifier live inside the agent loop, rather than being left in debugger logs or after-the-fact tests.

System architecture

augur-debugger-service

The backend runtime core. Responsible for provider config resolution, the agent loop, trajectory management, live session integration, and the main API.

augur-web-visualizer

The in-browser trajectory inspector. Shows run summaries, trajectory steps, runtime observations, verifier results, and candidate fixes.

augur-vscode-extension

The VS Code entry point. Starts or attaches live debug sessions, collects runtime context from the editor, and requests decisions from the backend through the unified configuration protocol.

augur-ci-agent

The command-line runner. Supports single runs, batch evaluation, and benchmark execution over DebugBench / DSDBench style datasets.

Agent loop

Each trajectory step in Augur has five parts:

  1. observation Source location, pause reason, call stack, locals, and debugger metadata
  2. decision The model's chosen next action
  3. toolResult The normalized result of executing that action in the environment
  4. verifier The verification outcome when a candidate fix appears
  5. reflection Condensed feedback on verifier failures or the next direction

A run is therefore not just "did the fix land" but a complete debugging trajectory that can be replayed, analyzed, and compared.

Unified configuration layer

Augur's backend, web UI, VS Code extension, and CLI share one configuration model.

Supported providers:

  • openai_compatible
  • gemini
  • anthropic
  • mock

Config precedence:

CLI flags > request-body config > environment variables > benchmark defaults > built-in defaults

Common environment variables:

AUGUR_PROVIDER=openai_compatible
AUGUR_MODEL=qwen2.5-coder-32b-instruct
AUGUR_BASE_URL=http://localhost:8000/v1
AUGUR_API_KEY=
AUGUR_API_KEY_ENV=OPENAI_API_KEY
AUGUR_MAX_STEPS=24
AUGUR_TIMEOUT_MS=10000
AUGUR_MAX_REPAIR_ATTEMPTS=2
AUGUR_VERIFIER_COMMAND=
AUGUR_VERIFIER_EXPECTED_OUTPUT=
AUGUR_ENTRY_FILE_NAME=main.py

The backend also accepts each provider's legacy environment variables:

  • OPENAI_API_KEY, OPENAI_BASE_URL
  • GEMINI_API_KEY, GEMINI_API_BASE_URL
  • CLAUDE_API_KEY

This lets you switch smoothly between hosted APIs, OpenAI-compatible local servers, and mock runs without changing the surrounding workflow.

Core API

  • GET /api/v1/providers Returns supported providers, required fields, legacy env fallbacks, and example flags
  • POST /api/v1/generateSimulation Generates a simulated trace from source code and a provider config
  • POST /api/v1/getDebugAction Requests a single debugging decision
  • POST /api/v1/runs Creates a structured runtime run
  • POST /api/v1/runs/:id/step Advances a run's agent loop
  • GET /api/v1/runs Lists recent runs
  • GET /api/v1/runs/:id Returns the full trajectory and summary
  • GET /api/v1/session/:id/runtime Normalizes a live debugger stop into a structured observation
  • POST /api/v1/session/:id/runtime/decision Generates a trajectory-shaped decision for a live stop

Verifier mechanism

When the agent proposes a fix, Augur can run the verifier in a temporary working directory. It writes a temp file, runs the configured command, and records whether the repair passed.

The benefit of this design: verification is explicit and reproducible, and by default it does not modify the current workspace.

Benchmark runner

The CLI runner supports two modes.

Single run

cd augur-ci-agent
npm install
npm start -- --provider mock --model mock path/to/script.py

Batch evaluation

npm start -- --benchmark debugbench --dataset ./samples/debugbench-sample.json --output-dir ./outputs/debugbench --provider mock --model mock

Supported data shapes:

  • Python-executable subsets of DebugBench-style datasets
  • Python-executable subsets of DSDBench-style datasets

Unsupported cases are explicitly marked as unsupported instead of being silently skipped.

Output files:

  • trajectories.jsonl
  • results.json
  • summary.json
  • summary.csv

Summary metrics include:

  • success rate
  • verifier pass rate
  • average steps
  • average repair attempts
  • unsupported rate
  • per-provider breakdown

Quick start

1. Start the backend

cd augur-debugger-service
npm install
npm test
npm start

2. Start the web inspector

cd ../augur-web-visualizer
npm install
npm run build
npm run dev

Open http://localhost:5173 in a browser.

3. Run the CLI runner

cd ../augur-ci-agent
npm install
npm test
npm start

4. Use the VS Code extension

  • Open augur-vscode-extension in VS Code
  • Run the extension host
  • Use Augur: Debug in Live Panel from the file or folder context menu
  • The extension will start the backend services, create a live session, and open the web inspector

Example provider setups

Gemini

$env:AUGUR_PROVIDER="gemini"
$env:AUGUR_MODEL="gemini-2.5-pro"
$env:GEMINI_API_KEY="your-key"

Anthropic

$env:AUGUR_PROVIDER="anthropic"
$env:AUGUR_MODEL="claude-sonnet-4-5-20250929"
$env:CLAUDE_API_KEY="your-key"

OpenAI-compatible local server

npm start -- --provider openai_compatible --model qwen2.5-coder-32b-instruct --base-url http://localhost:8000/v1

Mock provider

npm start -- --provider mock --model mock

Repository layout

augur-debugger-service/   backend runtime, providers, trajectories, verification, live session APIs
augur-web-visualizer/     React-based trajectory inspector and runtime UI
augur-vscode-extension/   VS Code integration and live-session launcher
augur-ci-agent/           CLI runner and benchmark harness
docs/plans/               design notes and implementation records

Use cases

  • Research demos for agent / LLM systems
  • Interactive experiments with software debugging agents
  • Verifier-guided repair loops over small programs
  • Case studies that need replayable trajectories
  • Comparing provider behavior under one shared runtime scaffold

License

This project is released under the Non-Commercial License. See LICENSE for details.

About

An autonomous AI debugging agent powered by the Debug Adapter Protocol (DAP). Augur allows LLMs to interact with a program's live runtime state to diagnose issues. This repository contains both the VS Code plugin (executor) and a web-based simulator (visualizer).
