Harness_Engineering_Regression_Copilot, abbreviated as HERC, is a repo-local AI regression testing CLI. It turns production failures, bad conversations, broken traces, historical complaints, and support tickets into maintainable regression cases, baseline responses, and report files.
The project fits teams that want to reliably turn “this broke before” into “this will be blocked next time.” It stays failure-first, local-first, and deterministic-first by default, with an emphasis on short, stable, reproducible engineering loops.
Across 920 protected historical instructions, shipped correctness rises from 92.8% without HERC to 100% with HERC, a net lift of +7.2 percentage points.
In a 5000-case suite, --changed reduces execution from 5000 cases to 3; across repeated reruns, the executed-case reduction stays at 99.9% and the median total-time reduction is 27.2%.
After 1 warm-up plus 5 measured rounds, herc report --compare-previous still delivers a stable 60.8% median time reduction while compressing the flow to 1 command.
The package is 55.4 KB packed and 255.8 KB unpacked, ships with 2 runtime dependencies, and already runs in CI on macOS, Linux, and Windows with Node 18 and 20.
--changed, exit codes, and markdown/json reports plug naturally into GitHub PRs and CI.
Cross-functional
Support, PM, QA, AI engineers, and platform teams can work from the same failure-derived assets.
Snapshot

| Item | Value | Why it matters |
| --- | --- | --- |
| CLI commands | 12 | Covers bootstrap, import, distill, review, execution, and reporting |
| Run profiles | quick, standard, deep | Switch between local debugging, standard regression, and stricter gates |
| Report formats | summary, markdown, json | Works for both human review and machine integration |
| Runtime dependencies | 2 | Small dependency surface keeps installs and maintenance lightweight |
| Package footprint | 55.4 KB packed, 255.8 KB unpacked | Well-suited to local-first and repo-embedded workflows |
| CI matrix | 3 OS x 2 Node versions | Current CI covers macOS, Linux, Windows, and Node 18/20 |

Install
The recommended path today is to install from source and expose the herc command with npm link. The default deterministic path is meant to run locally first, without requiring complex external infrastructure.
If you want an AI on a fresh machine to clone the repo and immediately reproduce the public benchmarks, JSON result files, and whitepaper, there is now a single cross-platform command: npm run benchmark:reproduce.
git clone https://github.com/Horace-Maxwell/Harness_Engineering_Regression_Copilot.git
cd Harness_Engineering_Regression_Copilot
npm ci
npm run build
npm link
herc version
If you do not want a global link, you can run commands directly with node dist/cli.js <command>. The repository also includes cross-platform installers: scripts/install-herc.sh and scripts/install-herc.ps1.
git clone https://github.com/Horace-Maxwell/Harness_Engineering_Regression_Copilot.git
cd Harness_Engineering_Regression_Copilot
npm ci
npm run benchmark:reproduce
This command builds the CLI, reruns the four public benchmark suites, refreshes benchmarks/results/*.json, and regenerates EVALUATION_WHITEPAPER.md. It does not require manual shell redirection or memorizing multiple script commands.
This path shows the shortest practical loop: initialize the workspace, import a real failure, distill it into a case, run the gate, and read the report.
Shell / Bash
herc init
printf 'The answer approved a refund after 45 days.\n' | herc import --paste --append-tag refunds,policy
herc distill
herc run --profile standard
herc report --format summary
PowerShell
herc init
"The answer approved a refund after 45 days." | herc import --paste --append-tag 'refunds,policy'
herc distill
herc run --profile standard
herc report --format summary
After initialization, you can also use herc doctor to check workspace health and herc inspect <caseId> to view the structured details of a single case.
When a workflow metric has a CV above 10%, public summaries label it as directional uplift rather than a hard absolute performance promise.
Benchmark Method

| Topic | Description |
| --- | --- |
| Baseline performance | Measures raw-failure-to-first-fail, manual-case-to-first-pass, 100 deterministic cases, and the current package footprint. |
| Workflow impact | Measures support batch import, AI engineer complaint-to-gate, QA accept flow, CI changed-only execution, and deep-profile review enforcement. |
| Adoption impact | Compares a control and treatment on the same release candidate: the control ships directly, while the treatment runs HERC, fixes failures, and then ships. |
| Workflow upgrade impact | Isolates whether recent product improvements reduce commands, ad hoc logic, and wasted runs. |

The 95% confidence intervals use bootstrap resampling at the protected-instruction level; before/after uplift uses paired resampling on the same instruction positions.
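The paired resampling described above can be sketched as a percentile bootstrap over instruction positions. This is a minimal illustration and not the repository's benchmark code: the function names, the seeded generator, the number of resamples, and the toy pass/fail flags are all assumptions made for the example.

```typescript
// Small seeded PRNG (mulberry32) so the sketch gives repeatable results.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface UpliftInterval {
  lower: number; // 2.5th percentile of resampled uplift, in percentage points
  upper: number; // 97.5th percentile of resampled uplift, in percentage points
}

// Arithmetic mean of a numeric array (here: the share of passing instructions).
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Percentile bootstrap for the paired uplift mean(after) - mean(before).
// Each resample draws instruction positions with replacement and reads BOTH
// arms at the same positions, which keeps the before/after pairing intact.
function pairedBootstrapUplift(
  beforeFlags: number[],
  afterFlags: number[],
  resamples: number,
  seed: number,
): UpliftInterval {
  if (beforeFlags.length !== afterFlags.length || beforeFlags.length === 0) {
    throw new RangeError("both arms must be non-empty and of equal length");
  }
  const random = mulberry32(seed);
  const n = beforeFlags.length;
  const uplifts: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let sumBefore = 0;
    let sumAfter = 0;
    for (let i = 0; i < n; i++) {
      const index = Math.floor(random() * n);
      sumBefore += beforeFlags[index];
      sumAfter += afterFlags[index];
    }
    uplifts.push(((sumAfter - sumBefore) / n) * 100);
  }
  uplifts.sort((a, b) => a - b);
  const lowerIndex = Math.floor(0.025 * (resamples - 1));
  const upperIndex = Math.ceil(0.975 * (resamples - 1));
  return { lower: uplifts[lowerIndex], upper: uplifts[upperIndex] };
}

// Hypothetical toy data: 20 instructions, 1 = correct, 0 = incorrect.
const toyBefore = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1];
const toyAfter = toyBefore.map(() => 1);
const pointEstimate = (mean(toyAfter) - mean(toyBefore)) * 100;
const interval = pairedBootstrapUplift(toyBefore, toyAfter, 2000, 42);
console.log(pointEstimate.toFixed(1), interval);
```

Resampling positions rather than each arm independently means an instruction that fails before and passes after always contributes to the uplift as a pair.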
Reduction metrics: time and execution reductions are both computed as (before - after) / before * 100.
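As a quick check of the reduction formula, the sketch below applies it to the --changed figures quoted earlier in this README (5000 cases reduced to 3). The function name is illustrative and is not part of the HERC codebase.

```typescript
// Percentage reduction as defined in the text: (before - after) / before * 100.
function reductionPercent(before: number, after: number): number {
  if (!(before > 0)) {
    throw new RangeError("before must be a positive number");
  }
  return ((before - after) / before) * 100;
}

// The --changed example: 5000 cases reduced to 3 executed cases.
const executedCaseReduction = reductionPercent(5000, 3);
console.log(executedCaseReduction.toFixed(1) + "%"); // 99.9%
```

The same function applies to timings: pass the before and after durations in the same unit.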
Benchmark Scorecard

| Dimension | Result |
| --- | --- |
| Baseline runner performance | In the stability study, HERC reaches the first failing gate from one raw failure in 464.6 ms median, and executes 100 deterministic cases in 144.7 ms median. |
| Real workflow throughput | A support batch import of 50 incidents to draft cases finishes in 542.2 ms median, and an AI engineer can turn one complaint into a failing gate in 461.8 ms median. |

After repeated reruns, report comparison time stays 60.8% lower, non-Git preflight time stays 66.3% lower, and 1000 wasted case executions are still avoided.
In 1 warm-up plus 5 measured rerun rounds, the core latency metrics stay mostly below 1% coefficient of variation; changed-only total time reduction lands at 11.7% CV, so the README frames it as directional uplift.
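The stability summary above (drop the warm-up, take the median, compute the coefficient of variation, and label anything above 10% CV as directional) could look roughly like this. It is a sketch under assumptions: the sample standard deviation (n - 1 denominator), the label names, and the example timings are invented for illustration and are not taken from the benchmark scripts.

```typescript
// Median of a non-empty numeric array.
function median(values: number[]): number {
  if (values.length === 0) throw new RangeError("values must not be empty");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Coefficient of variation in percent: sample standard deviation / mean * 100.
// Assumption: n - 1 denominator; the README does not state which variant is used.
function coefficientOfVariation(values: number[]): number {
  if (values.length < 2) throw new RangeError("need at least two values");
  const avg = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((acc, v) => acc + (v - avg) ** 2, 0) / (values.length - 1);
  return (Math.sqrt(variance) / avg) * 100;
}

// Hypothetical label names; the rule itself (CV above 10% is directional) is from the text.
type MetricLabel = "stable" | "directional";

function labelMetric(cvPercent: number, thresholdPercent = 10): MetricLabel {
  return cvPercent > thresholdPercent ? "directional" : "stable";
}

// Discards the leading warm-up rounds and summarises the measured rounds.
function summarizeRounds(rounds: number[], warmUps = 1) {
  const measured = rounds.slice(warmUps);
  const cvPercent = coefficientOfVariation(measured);
  return { median: median(measured), cvPercent, label: labelMetric(cvPercent) };
}

// Invented timings in ms: 1 warm-up round followed by 5 measured rounds.
console.log(summarizeRounds([30, 20, 21, 19, 20, 20]));
// The 11.7% CV quoted for changed-only total time falls above the threshold.
console.log(labelMetric(11.7)); // directional
```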
Baseline Performance

| Scenario | Result | Why it matters |
| --- | --- | --- |
| Raw failure to first failing gate | 464.6 ms median | Fast enough for local repro and fix loops |
| Manual case to first passing gate | 460.7 ms median | Keeps review and baseline loops short |
| 100 deterministic cases | 144.7 ms median | Light enough for frequent local runs |

Across 5 candidate release scenarios and 920 protected historical instructions, weighted correctness is 92.8% without HERC and reaches 100% after using HERC and fixing the failed cases.
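One plausible reading of "weighted correctness" is that each scenario is weighted by its number of protected instructions, so the overall figure equals total correct instructions divided by total instructions. The sketch below works under that assumption; the scenario names and counts are made up for illustration and are not the benchmark data.

```typescript
interface ScenarioResult {
  scenario: string;
  instructions: number; // protected instructions in this scenario
  correct: number; // instructions that shipped correctly
}

// Instruction-weighted correctness: total correct / total instructions * 100.
function weightedCorrectness(scenarios: ScenarioResult[]): number {
  const total = scenarios.reduce((sum, s) => sum + s.instructions, 0);
  if (total === 0) throw new RangeError("no instructions to score");
  const correct = scenarios.reduce((sum, s) => sum + s.correct, 0);
  return (correct / total) * 100;
}

// Hypothetical control arm: ship directly without running the gate.
const controlArm: ScenarioResult[] = [
  { scenario: "scenario-a", instructions: 100, correct: 90 },
  { scenario: "scenario-b", instructions: 300, correct: 294 },
];
// Hypothetical treatment arm: every failed case is fixed before shipping.
const treatmentArm = controlArm.map((s) => ({ ...s, correct: s.instructions }));

const lift = weightedCorrectness(treatmentArm) - weightedCorrectness(controlArm);
console.log(`lift: ${lift.toFixed(1)} percentage points`); // lift: 4.0 percentage points
```

The lift is reported in percentage points (a difference of two percentages), which matches how the +7.2 figure is expressed.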
herc report --compare-previous compresses a comparison workflow that previously needed 3 commands and 9 lines of ad hoc logic into 1 command; across repeated reruns, the median time reduction is 60.8% with only 0.5% CV.
These results were measured on a local darwin arm64 / Node v25.6.1 environment. They are intended to show workflow shape and relative execution cost, not a universal performance guarantee for every machine.
CLI Surface

| Command | Description |
| --- | --- |
| init | Initialize the repo-local workspace |
| doctor | Validate workspace, schema, and environment health, with --quick for fast preflight checks |
| import | Import failures from files, stdin, and structured formats |
| distill | Generate draft cases from incidents |
| create-case | Create a case manually without import first |
| accept | Mark a case as reviewed and optionally write a baseline |
| set-status | Change a case to active, draft, muted, or archived |
| run | Run gates with options such as --changed and --fail-on |
| report | Read the latest result and optionally compare it with the previous run |

Every command supports --json, which makes shell scripts, CI, and GitHub Actions straightforward. run also supports options such as --changed, --strict-skips, --stdin-response, --response-file, and --fail-on, which helps it fit existing engineering workflows.
npm ci
npm run build
node dist/cli.js run --profile standard --fail-on failed,invalid,skipped
node dist/cli.js report --compare-previous
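For teams that drive the gate from a Node script rather than raw shell, the command line above can be assembled programmatically. The helper below is a hypothetical sketch: it only uses flags named in this README (--profile, --changed, --fail-on, --json), and it assumes, without confirmation from the source, that these flags can be combined freely.

```typescript
type Profile = "quick" | "standard" | "deep";

interface GateOptions {
  profile: Profile;
  changedOnly?: boolean; // adds --changed
  failOn?: string[]; // joined with commas for --fail-on
  json?: boolean; // adds --json
}

// Builds the argument list for `node dist/cli.js run ...`.
function buildRunArgs(options: GateOptions): string[] {
  const args = ["dist/cli.js", "run", "--profile", options.profile];
  if (options.changedOnly) args.push("--changed");
  if (options.failOn && options.failOn.length > 0) {
    args.push("--fail-on", options.failOn.join(","));
  }
  if (options.json) args.push("--json");
  return args;
}

// Reproduces the CI command shown above.
const ciArgs = buildRunArgs({
  profile: "standard",
  failOn: ["failed", "invalid", "skipped"],
});
console.log(["node", ...ciArgs].join(" "));
// node dist/cli.js run --profile standard --fail-on failed,invalid,skipped
```

The resulting array can be handed to a process launcher such as Node's child_process.spawnSync, and the child's exit status can then decide whether the CI step passes.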
The repository ships with a built-in GitHub Repo Launch Kit. It can generate high-quality README files, community files, CI, docs skeletons, and repository structure starters, and it can also generate community health files for an organization-level .github repository.
npm run repo-kit:scaffold -- --meta github-repo-launch-kit/project.meta.example.json --target /tmp/my-repo --dry-run
Contributions are welcome across bug fixes, stronger tests, CLI ergonomics, benchmark improvements, and documentation. Before opening a PR, run the check below at minimum.