UPwith-me/Augur-Runtime-Debugging-Agent

Augur

Runtime-aware debugging agents for live execution, structured trajectories, and benchmarked repair loops.

Augur Banner

License: Non-Commercial TypeScript React Vite

English | 中文


English

Overview

Augur is a runtime-aware debugging agent platform. It treats debugging as a stateful control problem over a real or simulated execution trace, not as a single prompt over source code.

The core loop is simple:

  1. Observe the current runtime state.
  2. Ask the agent to choose the next debugger action or repair proposal.
  3. Execute the action in the environment.
  4. Run verification when a candidate fix appears.
  5. Record the full trajectory for replay, inspection, and evaluation.
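
The loop above can be sketched in TypeScript. Everything here is illustrative — the types, the `Environment` interface, and the function names are assumptions for exposition, not Augur's actual API:

```typescript
// Hypothetical sketch of the observe -> decide -> act -> verify -> record loop.
// None of these names mirror Augur's real implementation.

type Observation = { line: number; paused: boolean; locals: Record<string, unknown> };
type Decision =
  | { kind: "step" }
  | { kind: "proposeFix"; patch: string };

interface Environment {
  observe(): Observation;                 // 1. current runtime state
  apply(d: Decision): void;               // 3. execute the chosen action
  verify(patch: string): boolean;         // 4. run verification on a candidate fix
}

type StepRecord = { observation: Observation; decision: Decision; verified?: boolean };

function runLoop(
  env: Environment,
  decide: (o: Observation) => Decision,   // 2. the agent's policy
  maxSteps = 10,
): StepRecord[] {
  const trajectory: StepRecord[] = [];    // 5. full trajectory for replay
  for (let i = 0; i < maxSteps; i++) {
    const observation = env.observe();
    const decision = decide(observation);
    env.apply(decision);
    const record: StepRecord = { observation, decision };
    if (decision.kind === "proposeFix") {
      record.verified = env.verify(decision.patch);
      trajectory.push(record);
      break;                              // stop once a candidate fix is verified
    }
    trajectory.push(record);
  }
  return trajectory;
}
```

The point of the shape is that verification happens inside the loop, and the trajectory is the primary artifact rather than a side effect.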

That gives Augur a useful shape for both systems research and practical demos: live debugger integration, structured runtime traces, explicit verifier feedback, and a batch runner for repeatable experiments.

What Augur does

  • Runs a multi-step debugging loop over structured runtime observations.
  • Supports both simulated traces and live debugger sessions.
  • Normalizes decisions, tool results, verifier outcomes, and reflection notes into one trajectory schema.
  • Switches between providers through a unified config layer instead of hardcoded model branches.
  • Exposes evaluation-friendly APIs for creating, stepping, replaying, and inspecting runs.
  • Includes a web inspector, a VS Code bridge, and a CLI benchmark runner.

Why runtime-aware debugging

Source-only prompting is often enough to suggest a patch. It is much worse at answering questions like:

  • What line is actually failing right now?
  • Which values made the program drift into a bad state?
  • Should the next move be stepping, inspecting, setting a breakpoint, or proposing a fix?
  • Did the patch actually repair the behavior, or did it only sound plausible?

Augur is built around those questions. Each run captures execution context, not just code context, and verification sits inside the loop rather than at the end of a chat.

System architecture

augur-debugger-service

The backend runtime. It resolves provider config, owns the agent loop, manages structured runs, talks to live debug sessions, and exposes the main HTTP API.

augur-web-visualizer

The browser-based inspector. It shows trajectory steps, run summaries, observations, decisions, verifier results, and candidate fixes. It can inspect both research runs and attached live sessions.

augur-vscode-extension

The IDE entry point. It starts or attaches debugger sessions, collects runtime context from VS Code, and asks the backend for decisions using the same provider/config vocabulary as the rest of the system.

augur-ci-agent

The CLI runner. It supports single-run debugging, batch evaluation, and benchmark-style execution over DebugBench- and DSDBench-shaped datasets.

Agent loop

Each trajectory step follows the same structure:

  1. observation Source location, pause reason, call stack, locals, and raw debugger metadata.
  2. decision The model's chosen action, such as stepping, continuing, or proposing a fix.
  3. toolResult The normalized effect of that action inside the runtime.
  4. verifier A pass, fail, or continue signal, with command and output summaries when verification runs.
  5. reflection A compact note that carries verifier failures or next-step guidance into the following decision.
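
The five-part step could be typed roughly as follows. The field names inside each part are inferred from the prose above and may differ from Augur's real trajectory schema:

```typescript
// Hypothetical TypeScript shape for one trajectory step; only the five
// top-level keys come from the documented structure, inner fields are guesses.

interface TrajectoryStep {
  observation: {
    file: string;
    line: number;
    pauseReason: string;
    callStack: string[];
    locals: Record<string, unknown>;
  };
  decision: {
    action: "step" | "continue" | "inspect" | "setBreakpoint" | "proposeFix";
    detail?: string;
  };
  toolResult: { ok: boolean; summary: string };
  verifier?: { status: "pass" | "fail" | "continue"; command?: string; output?: string };
  reflection?: string;
}

// A filled-in example step for a run that ends in a verified fix.
const sampleStep: TrajectoryStep = {
  observation: {
    file: "main.py",
    line: 14,
    pauseReason: "breakpoint",
    callStack: ["main"],
    locals: { x: 0 },
  },
  decision: { action: "proposeFix", detail: "guard against x == 0" },
  toolResult: { ok: true, summary: "patch applied in temp workspace" },
  verifier: { status: "pass", command: "python main.py", output: "ok" },
  reflection: "verifier passed; stop",
};
```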

This gives Augur a clean replay format for case studies, demos, and evaluation.

Unified provider configuration

Augur uses one config model across the backend, web UI, VS Code extension, and CLI runner.

Supported providers:

  • openai_compatible
  • gemini
  • anthropic
  • mock

Config precedence:

CLI flags > request config > env vars > benchmark defaults > built-in defaults
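
The precedence chain amounts to a layered merge where higher-priority layers overwrite lower ones. A minimal sketch (the layer names and `ProviderConfig` fields are illustrative):

```typescript
// Sketch of the documented precedence:
// CLI flags > request config > env vars > benchmark defaults > built-in defaults.

type ProviderConfig = Partial<{ provider: string; model: string; baseUrl: string }>;

const builtinDefaults: ProviderConfig = { provider: "mock", model: "mock" };

function resolveConfig(layers: {
  cli?: ProviderConfig;
  request?: ProviderConfig;
  env?: ProviderConfig;
  benchmark?: ProviderConfig;
}): ProviderConfig {
  // Later spreads win, so the lowest-priority layer goes first
  // and CLI flags go last.
  return {
    ...builtinDefaults,
    ...(layers.benchmark ?? {}),
    ...(layers.env ?? {}),
    ...(layers.request ?? {}),
    ...(layers.cli ?? {}),
  };
}
```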

Common environment variables:

AUGUR_PROVIDER=openai_compatible
AUGUR_MODEL=qwen2.5-coder-32b-instruct
AUGUR_BASE_URL=http://localhost:8000/v1
AUGUR_API_KEY=
AUGUR_API_KEY_ENV=OPENAI_API_KEY
AUGUR_MAX_STEPS=24
AUGUR_TIMEOUT_MS=10000
AUGUR_MAX_REPAIR_ATTEMPTS=2
AUGUR_VERIFIER_COMMAND=
AUGUR_VERIFIER_EXPECTED_OUTPUT=
AUGUR_ENTRY_FILE_NAME=main.py

Provider-specific fallback variables are also supported:

  • OPENAI_API_KEY, OPENAI_BASE_URL
  • GEMINI_API_KEY, GEMINI_API_BASE_URL
  • CLAUDE_API_KEY

This makes it easy to switch between hosted APIs, OpenAI-compatible local servers, and deterministic mock runs without changing the surrounding workflow.

Runtime API

Main endpoints:

  • GET /api/v1/providers Returns supported providers, required fields, legacy env fallbacks, and example flag shapes.
  • POST /api/v1/generateSimulation Builds a simulated execution trace from source code and a provider config.
  • POST /api/v1/getDebugAction Requests a debugging decision from the selected provider.
  • POST /api/v1/runs Creates a structured runtime run.
  • POST /api/v1/runs/:id/step Advances a run by one or more agent steps.
  • GET /api/v1/runs Lists recent runs.
  • GET /api/v1/runs/:id Returns a run with its full trajectory and summary.
  • GET /api/v1/session/:id/runtime Normalizes a paused live session into a structured observation.
  • POST /api/v1/session/:id/runtime/decision Generates a trajectory-shaped decision for a live debugger stop.
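
A client driving the runs API would issue requests like the following. The method/path pairs come from the list above; the request-body fields are assumptions, since the README does not document payload schemas:

```typescript
// Hedged sketch of request construction for the runs endpoints.
// Only the paths and methods are documented; body shapes are hypothetical.

type ApiRequest = { method: "GET" | "POST"; path: string; body?: unknown };

function createRun(source: string, config: { provider: string; model: string }): ApiRequest {
  return { method: "POST", path: "/api/v1/runs", body: { source, config } };
}

function stepRun(runId: string, steps = 1): ApiRequest {
  return { method: "POST", path: `/api/v1/runs/${runId}/step`, body: { steps } };
}

function getRun(runId: string): ApiRequest {
  return { method: "GET", path: `/api/v1/runs/${runId}` };
}
```

The typical evaluation flow would be: `createRun`, then `stepRun` until the run terminates, then `getRun` to fetch the trajectory and summary.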

Verification model

When the agent proposes a fix, Augur can verify it in a temporary workspace. The verifier writes a temp copy of the target file, runs the configured command, and records whether the repair passed.

That behavior is useful for evaluation because it keeps verification explicit and reproducible without mutating the checked-out workspace by default.
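
The described flow — write a temp copy, run the configured command, record the outcome — can be sketched like this. The function signature and pass criteria are assumptions; only the temp-copy-then-run behavior comes from the text:

```typescript
// Minimal sketch of verification in a temporary workspace, assuming a pass
// means exit code 0 plus an optional expected-output match.

import { spawnSync } from "node:child_process";
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function verifyFix(
  fileName: string,
  patchedSource: string,
  command: string,
  args: string[],
  expectedOutput?: string,
): { passed: boolean; output: string } {
  // Temp copy: the checked-out workspace is never mutated.
  const workDir = mkdtempSync(join(tmpdir(), "augur-verify-"));
  writeFileSync(join(workDir, fileName), patchedSource);

  const result = spawnSync(command, args, { cwd: workDir, encoding: "utf8", timeout: 10_000 });
  const output = (result.stdout ?? "").trim();
  const passed =
    result.status === 0 &&
    (expectedOutput === undefined || output === expectedOutput.trim());
  return { passed, output };
}
```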

Benchmark runner

The CLI runner supports two modes:

Single run

cd augur-ci-agent
npm install
npm start -- --provider mock --model mock path/to/script.py

Batch evaluation

npm start -- --benchmark debugbench --dataset ./samples/debugbench-sample.json --output-dir ./outputs/debugbench --provider mock --model mock

Supported benchmark shape:

  • Python-executable subsets of DebugBench-style datasets
  • Python-executable subsets of DSDBench-style datasets

Unsupported cases are marked explicitly instead of being skipped silently.

Artifacts written to --output-dir:

  • trajectories.jsonl
  • results.json
  • summary.json
  • summary.csv

Summary metrics include:

  • success rate
  • verifier pass rate
  • average steps
  • average repair attempts
  • unsupported rate
  • per-provider breakdown
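
The listed metrics could be derived from per-case results roughly as follows. The `CaseResult` fields are assumptions about what each record carries, not Augur's actual results schema:

```typescript
// Sketch of computing the summary metrics from per-case benchmark results.
// Field names are hypothetical; averages are taken over supported cases only.

type CaseResult = {
  provider: string;
  success: boolean;
  verifierPassed: boolean;
  steps: number;
  repairAttempts: number;
  unsupported: boolean;
};

function summarize(results: CaseResult[]) {
  const n = results.length || 1;
  const supported = results.filter((r) => !r.unsupported);
  const avg = (xs: number[]) =>
    xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0;
  return {
    successRate: supported.filter((r) => r.success).length / n,
    verifierPassRate: supported.filter((r) => r.verifierPassed).length / n,
    avgSteps: avg(supported.map((r) => r.steps)),
    avgRepairAttempts: avg(supported.map((r) => r.repairAttempts)),
    unsupportedRate: results.filter((r) => r.unsupported).length / n,
  };
}
```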

Quick start

1. Start the backend

cd augur-debugger-service
npm install
npm test
npm start

2. Start the web inspector

cd ../augur-web-visualizer
npm install
npm run build
npm run dev

Open http://localhost:5173.

3. Run the CLI runner

cd ../augur-ci-agent
npm install
npm test
npm start

4. Use the VS Code extension

  • Open the augur-vscode-extension folder in VS Code.
  • Run the extension host.
  • Use Augur: Debug in Live Panel from the file or folder context menu.
  • The extension will start the backend services, create a live session, and open the web inspector.

Example provider setups

Gemini

$env:AUGUR_PROVIDER="gemini"
$env:AUGUR_MODEL="gemini-2.5-pro"
$env:GEMINI_API_KEY="your-key"

Anthropic

$env:AUGUR_PROVIDER="anthropic"
$env:AUGUR_MODEL="claude-sonnet-4-5-20250929"
$env:CLAUDE_API_KEY="your-key"

OpenAI-compatible local server

npm start -- --provider openai_compatible --model qwen2.5-coder-32b-instruct --base-url http://localhost:8000/v1

Mock provider

npm start -- --provider mock --model mock

Repository layout

augur-debugger-service/   backend runtime, providers, trajectories, verification, live session APIs
augur-web-visualizer/     React-based trajectory inspector and runtime UI
augur-vscode-extension/   VS Code integration and live-session launcher
augur-ci-agent/           CLI runner and benchmark harness
docs/plans/               design notes and implementation records

Use cases

  • Research demos for agent systems and software engineering agents
  • Interactive runtime debugging experiments
  • Verifier-guided repair loops over small Python programs
  • Case studies that need replayable trajectories instead of ad hoc logs
  • Provider comparisons under one shared runtime scaffold

License

This project is released under the Non-Commercial License. See LICENSE for details.


中文

Overview

Augur is a runtime-aware agent platform for software debugging. Rather than treating debugging as a single model query over source code, it organizes debugging as a multi-step closed loop around real runtime state:

  1. Observe the current runtime state
  2. Let the agent choose the next debugging action or repair suggestion
  3. Execute the action in the environment
  4. Run the verifier when a candidate fix appears
  5. Record the full trajectory for replay, analysis, and evaluation

This makes Augur a debugging agent system rather than a one-shot question-and-answer AI debugger.

What it does

  • Runs a multi-step debugging loop over structured runtime observations
  • Supports both simulated traces and live debugger sessions
  • Records decisions, tool results, verifier outcomes, and reflections in one trajectory schema
  • Switches between models and endpoints through a unified configuration layer
  • Provides evaluation-oriented HTTP APIs, a web inspector, VS Code integration, and a CLI runner

Why runtime-aware

Given only source code, a model can easily produce an answer that merely looks like a fix, while struggling with the questions that matter more:

  • Where exactly is execution paused right now?
  • Which variable values drove the program into a bad state?
  • Should the next move be step, inspect, continue, or propose a fix?
  • Did the patch actually repair the behavior, or does it only sound plausible?

This is the core of Augur's design: runtime information and the verifier live inside the agent loop, rather than being left in debugger logs or after-the-fact tests.

System architecture

augur-debugger-service

The backend runtime core. Responsible for provider config resolution, the agent loop, trajectory management, live session integration, and the main API.

augur-web-visualizer

The in-browser trajectory inspector. Shows run summaries, trajectory steps, runtime observations, verifier results, and candidate fixes.

augur-vscode-extension

The VS Code entry point. Starts or attaches live debug sessions, collects runtime context from the editor, and requests decisions from the backend through the unified configuration protocol.

augur-ci-agent

The command-line runner. Supports single runs, batch evaluation, and benchmark execution over DebugBench / DSDBench style datasets.

Agent loop

Each trajectory step in Augur has five parts:

  1. observation Source location, pause reason, call stack, locals, and debugger metadata
  2. decision The model's chosen next action
  3. toolResult The normalized result of executing that action in the environment
  4. verifier The verification outcome when a candidate fix appears
  5. reflection Condensed feedback on verifier failures or the next direction

A run is therefore not just "did the fix land" but a complete debugging trajectory that can be replayed, analyzed, and compared.

Unified configuration layer

Augur's backend, web UI, VS Code extension, and CLI share one configuration model.

Supported providers:

  • openai_compatible
  • gemini
  • anthropic
  • mock

Config precedence:

CLI flags > request-body config > environment variables > benchmark defaults > built-in defaults

Common environment variables:

AUGUR_PROVIDER=openai_compatible
AUGUR_MODEL=qwen2.5-coder-32b-instruct
AUGUR_BASE_URL=http://localhost:8000/v1
AUGUR_API_KEY=
AUGUR_API_KEY_ENV=OPENAI_API_KEY
AUGUR_MAX_STEPS=24
AUGUR_TIMEOUT_MS=10000
AUGUR_MAX_REPAIR_ATTEMPTS=2
AUGUR_VERIFIER_COMMAND=
AUGUR_VERIFIER_EXPECTED_OUTPUT=
AUGUR_ENTRY_FILE_NAME=main.py

The backend also accepts each provider's legacy environment variables:

  • OPENAI_API_KEY, OPENAI_BASE_URL
  • GEMINI_API_KEY, GEMINI_API_BASE_URL
  • CLAUDE_API_KEY

This lets you switch smoothly between hosted APIs, OpenAI-compatible local servers, and mock runs without changing the surrounding workflow.

Core API

  • GET /api/v1/providers Returns supported providers, required fields, legacy env fallbacks, and example flags
  • POST /api/v1/generateSimulation Generates a simulated trace from source code and a provider config
  • POST /api/v1/getDebugAction Requests a single debugging decision
  • POST /api/v1/runs Creates a structured runtime run
  • POST /api/v1/runs/:id/step Advances a run's agent loop
  • GET /api/v1/runs Lists recent runs
  • GET /api/v1/runs/:id Returns the full trajectory and summary
  • GET /api/v1/session/:id/runtime Normalizes a live debugger stop into a structured observation
  • POST /api/v1/session/:id/runtime/decision Generates a trajectory-shaped decision for a live stop

Verifier mechanism

When the agent proposes a fix, Augur can run the verifier in a temporary working directory. It writes a temp file, runs the configured command, and records whether the repair passed.

The benefit of this design: verification is explicit and reproducible, and by default it does not modify the current workspace.

Benchmark runner

The CLI runner supports two modes.

Single run

cd augur-ci-agent
npm install
npm start -- --provider mock --model mock path/to/script.py

Batch evaluation

npm start -- --benchmark debugbench --dataset ./samples/debugbench-sample.json --output-dir ./outputs/debugbench --provider mock --model mock

Supported data shapes:

  • Python-executable subsets of DebugBench-style datasets
  • Python-executable subsets of DSDBench-style datasets

Unsupported cases are explicitly marked as unsupported instead of being silently skipped.

Output files:

  • trajectories.jsonl
  • results.json
  • summary.json
  • summary.csv

Summary metrics include:

  • success rate
  • verifier pass rate
  • average steps
  • average repair attempts
  • unsupported rate
  • per-provider breakdown

Quick start

1. Start the backend

cd augur-debugger-service
npm install
npm test
npm start

2. Start the web inspector

cd ../augur-web-visualizer
npm install
npm run build
npm run dev

Open http://localhost:5173 in a browser.

3. Run the CLI runner

cd ../augur-ci-agent
npm install
npm test
npm start

4. Use the VS Code extension

  • Open augur-vscode-extension in VS Code
  • Run the extension host
  • Use Augur: Debug in Live Panel from the file or folder context menu
  • The extension will start the backend services, create a live session, and open the web inspector

Example provider setups

Gemini

$env:AUGUR_PROVIDER="gemini"
$env:AUGUR_MODEL="gemini-2.5-pro"
$env:GEMINI_API_KEY="your-key"

Anthropic

$env:AUGUR_PROVIDER="anthropic"
$env:AUGUR_MODEL="claude-sonnet-4-5-20250929"
$env:CLAUDE_API_KEY="your-key"

OpenAI-compatible local server

npm start -- --provider openai_compatible --model qwen2.5-coder-32b-instruct --base-url http://localhost:8000/v1

Mock provider

npm start -- --provider mock --model mock

Repository layout

augur-debugger-service/   backend runtime, providers, trajectories, verification, live session APIs
augur-web-visualizer/     React-based trajectory inspector and runtime UI
augur-vscode-extension/   VS Code integration and live-session launcher
augur-ci-agent/           CLI runner and benchmark harness
docs/plans/               design notes and implementation records

Use cases

  • Research demos for agent / LLM systems
  • Interactive experiments with software debugging agents
  • Verifier-guided repair loops over small programs
  • Case studies that need replayable trajectories
  • Comparing provider behavior under one shared runtime scaffold

License

This project is released under the Non-Commercial License. See LICENSE for details.

About

An autonomous AI debugging agent powered by the Debug Adapter Protocol (DAP). Augur allows LLMs to interact with a program's live runtime state to diagnose issues. This repository contains both the VS Code plugin (executor) and a web-based simulator (visualizer).
