feat(audit): add ado-aw audit <build-id-or-url> command #691
jamesadevine wants to merge 2 commits
Single-run audit: download a build's artifacts, run every analyzer
(firewall, MCP gateway, OTel, safe outputs, detection verdict, build
timeline, missing tools/data/noops), and emit a Markdown or JSON
report. ADO-side counterpart to `gh aw audit`.
New module tree under `src/audit/`:
- `model.rs` — `AuditData` (drift-compatible with gh-aw's top-level
contract; adds ADO-specific `detection_analysis`,
`safe_output_execution`, `rejected_safe_outputs` sections).
- `url.rs` — parses bare IDs, dev.azure.com URLs, legacy
visualstudio.com URLs, and on-prem Azure DevOps Server URLs (with
optional `&j=`/`&t=`/`&s=` job/step anchors).
- `cache.rs` — CLI-version-keyed `run-summary.json` with atomic writes.
- `analyzers/{firewall,policy,mcp,otel,safe_outputs,detection,missing,jobs}.rs`
— eight defensive NDJSON/REST analyzers.
- `findings.rs` — eight heuristic rules emitting severity-rated
findings + recommendations.
- `render/{console,json}.rs` — two renderers; JSON shape is the
public contract.
- `cli.rs` — orchestration: URL parse → auth → metadata fetch →
artifact download → analyzers → findings → cache → render.
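
As a rough illustration of the input handling that `url.rs` is described as doing, the sketch below accepts either a bare build ID or a URL that carries a `buildId=` query parameter. The function name `parse_build_id` and its return shape are hypothetical, not taken from this PR, and the sketch ignores the job/step anchors and any host validation the real module may perform.

```rust
/// Hypothetical helper (not the PR's `url.rs`): extract a numeric build ID
/// from a bare ID or from a URL with a `buildId=` query parameter.
fn parse_build_id(input: &str) -> Option<u64> {
    let input = input.trim();
    // Bare numeric ID, e.g. "12345".
    if let Ok(id) = input.parse::<u64>() {
        return Some(id);
    }
    // Otherwise inspect the query string after the first '?'.
    let (_, query) = input.split_once('?')?;
    // Drop any '#fragment' before splitting into key=value pairs.
    let query = query.split('#').next().unwrap_or(query);
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| key.eq_ignore_ascii_case("buildId"))
        .and_then(|(_, value)| value.parse::<u64>().ok())
}

fn main() {
    // Sample inputs are made up for illustration.
    let samples = [
        "12345",
        "https://dev.azure.com/org/project/_build/results?buildId=12345&view=results",
        "https://org.visualstudio.com/project/_build/results?buildId=678&j=abc&t=def",
        "https://dev.azure.com/org/project/_build",
    ];
    for s in samples {
        println!("{s} -> {:?}", parse_build_id(s));
    }
}
```

Returning `Option` keeps the sketch small; a real parser would more likely return a typed error naming which form failed to parse.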
Unified rejection trace: when the aggregate `THREAT_DETECTION_RESULT`
has any threat flag set, every proposal lands in
`not_processed_due_to_aggregate_gate` carrying the aggregate
`reasons[]`, exactly one severity-`high` `KeyFinding` is emitted, and a
`rejected_safe_outputs` rollup appears at the top level.
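
The aggregate gate described above can be sketched as follows. All type names, field names, and the flag names in `main` are illustrative assumptions rather than the PR's actual `AuditData` model. The sketch only shows the shape of the rule: any set flag marks every proposal as not processed, copies the aggregate reasons onto each one, and produces a single high-severity finding.

```rust
/// Hypothetical stand-ins for the PR's types; names and fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Eligible,
    NotProcessedDueToAggregateGate,
}

#[derive(Debug)]
struct Verdict {
    threat_flags: Vec<(String, bool)>, // (flag name, is set)
    reasons: Vec<String>,
}

#[derive(Debug)]
struct Traced {
    proposal: String,
    status: Status,
    reasons: Vec<String>,
}

#[derive(Debug)]
struct KeyFinding {
    severity: &'static str,
    message: String,
}

/// If any threat flag is set, gate the whole batch and emit one finding.
fn apply_aggregate_gate(proposals: &[&str], verdict: &Verdict) -> (Vec<Traced>, Option<KeyFinding>) {
    let fired: Vec<&str> = verdict
        .threat_flags
        .iter()
        .filter(|(_, set)| *set)
        .map(|(name, _)| name.as_str())
        .collect();
    let gated = !fired.is_empty();
    let traced = proposals
        .iter()
        .map(|p| Traced {
            proposal: p.to_string(),
            status: if gated { Status::NotProcessedDueToAggregateGate } else { Status::Eligible },
            // Every gated proposal carries the same aggregate reasons.
            reasons: if gated { verdict.reasons.clone() } else { Vec::new() },
        })
        .collect();
    // At most one finding per run, regardless of how many proposals were dropped.
    let finding = gated.then(|| KeyFinding {
        severity: "high",
        message: format!("{} proposal(s) dropped; flags: {}", proposals.len(), fired.join(", ")),
    });
    (traced, finding)
}

fn main() {
    let verdict = Verdict {
        threat_flags: vec![("prompt_injection".into(), true), ("secret_leak".into(), false)],
        reasons: vec!["suspicious instruction in proposal body".into()],
    };
    let (traced, finding) = apply_aggregate_gate(&["create-pr", "add-comment"], &verdict);
    println!("{traced:#?}\n{finding:#?}");
}
```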
Pipeline-side runtime additions (so an `ado-aw audit` of an existing
build has the data it needs):
- `src/data/*-base.yml` (via `AdoAwMarkerExtension`): emits
`staging/aw_info.json` at runtime with engine, model, agent name,
source path, target, compiler version, and ADO build context.
- `src/execute.rs`: writes a per-item `safe-outputs-executed.ndjson`
in `<output-dir>` so the audit can show the proposed → detection →
executed trace.
CLI surface:
    ado-aw audit <build-id-or-url>
      -o, --output <dir>      # default ./logs
          --json
          --org / --project / --pat
          --artifacts <agent,detection,safe-outputs>
          --no-cache
New dependencies: `zip` (artifact unpack), `wiremock` (dev only —
integration test mock server).
Tests: 80 new audit unit tests + 3 integration tests against a fake
ADO REST server (happy path, permission-denied, cache hit) using a
thin `ADO_AW_TEST_ORG_URL` test seam. 1740 total tests pass.
Docs: new `docs/audit.md`; updates to `docs/cli.md`, `README.md`,
`AGENTS.md` index, and `prompts/debug-ado-agentic-workflow.md` (Step 1
first-move + new Step 2a-prime + `AuditData` reference + jq-diff
fallback).
Out of scope (explicit follow-ups): diff mode, cross-run trends,
`--parse` log.md/firewall.md, job/step-anchored audit, MCP-exposed
audit, per-item detection verdict (upstream coordination with gh-aw),
partial-approval gating, AWF policy-manifest plumbing, AWF
token-usage.jsonl, `audit-manifest.json` build inventory.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Rust PR Review
Summary: Well-structured feature addition with good error handling overall, but three actionable issues worth fixing before merge.
Findings
🐛 Bugs / Logic Issues
🔒 Security Concerns
Three issues raised by the Rust PR Reviewer on #691:
1. **Lexicographic sort wrong for multi-digit run IDs.** Previously
   `find_artifact_dir` / `find_verdict_path` / `top_level_dirs_with_prefix`
   picked the "lexicographically last" `<prefix>_<id>` directory, which
   sorts `_9` after `_10` (because `'9' > '1'`). On a build retry that
   produced both `analyzed_outputs_9` and `analyzed_outputs_10`, the
   older verdict would be read and the run could be mis-classified as
   safe. New `crate::audit::cmp_numeric_suffix` extracts the trailing
   token after the final `_`, parses it as `u64`, and compares
   numerically with a lexicographic tie-breaker for non-numeric
   suffixes. All three call sites now use it. Regression tests added in
   mod.rs, detection.rs, and cli.rs.
2. **Security: `ADO_AW_TEST_ORG_URL` was always active in production.**
   The override was `#[doc(hidden)]` but not gated by build mode, so a
   stray env var (debugging leftover, hostile CI environment) could
   silently redirect ADO REST calls to an attacker-controlled URL in a
   release binary. Gated on `cfg(debug_assertions)`: debug builds
   (`cargo test`, `cargo run`) keep the override AND emit a loud `warn!`
   on every invocation; release builds (all published artifacts via
   `cargo build --release`) replace the body with a no-op so a stray env
   var has no effect. The integration test in `tests/audit_it.rs`
   continues to work because `cargo test` builds in debug mode.
3. **Blocking `std::fs::read_dir` in async context.** `safe_outputs.rs`
   had two helpers (`top_level_dirs_with_prefix`, `collect_named_files`)
   using sync I/O from inside `async fn analyze_safe_outputs`. On a
   Tokio multi-thread runtime this blocks an executor thread for the
   duration of the directory walk. Both helpers converted to `async fn`
   using `tokio::fs::read_dir`. The recursive `collect_named_files` uses
   `Box::pin` to satisfy the async-recursion shape (consistent with the
   existing pattern in `crate::detect::scan_directory`).
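
To make the first fix concrete, here is a minimal sketch of a numeric-suffix comparator. The real signature of `cmp_numeric_suffix` is not shown in this PR text, so the `&str` arguments, the `latest` helper, and the choice to order numeric suffixes after non-numeric ones (which keeps the ordering total) are all assumptions. It also assumes callers have already filtered names to a common prefix.

```rust
use std::cmp::Ordering;

/// Sketch only: compare two `<prefix>_<id>` names by their trailing token.
fn cmp_numeric_suffix(a: &str, b: &str) -> Ordering {
    // Trailing token after the final '_' (the whole name if there is none).
    fn suffix(s: &str) -> &str {
        s.rsplit('_').next().unwrap_or(s)
    }
    match (suffix(a).parse::<u64>(), suffix(b).parse::<u64>()) {
        // Both numeric: compare as numbers, then by name to break ties.
        (Ok(x), Ok(y)) => x.cmp(&y).then_with(|| a.cmp(b)),
        // Assumption: numeric suffixes sort after non-numeric ones.
        (Ok(_), Err(_)) => Ordering::Greater,
        (Err(_), Ok(_)) => Ordering::Less,
        // Neither numeric: plain lexicographic order.
        (Err(_), Err(_)) => a.cmp(b),
    }
}

/// Hypothetical call-site helper: pick the newest directory name.
fn latest<'a>(names: &[&'a str]) -> Option<&'a str> {
    names.iter().copied().max_by(|a, b| cmp_numeric_suffix(a, b))
}

fn main() {
    let dirs = ["analyzed_outputs_9", "analyzed_outputs_10"];
    // Plain string comparison ranks `_9` above `_10`, which is the reported bug.
    println!("lexicographic max: {:?}", dirs.iter().max());
    println!("numeric-suffix max: {:?}", latest(&dirs));
}
```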
Tests: 1745 unit tests + 3 integration tests pass (up from 1740, with 5 new regression tests for the numeric-suffix bug).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Rust PR Review
Summary: Looks good overall: well-structured module, solid test coverage, good error handling patterns throughout. A few specific concerns worth addressing before merge.
Findings
🐛 Bugs / Logic Issues
feat(audit): add `ado-aw audit <build-id-or-url>` command
ADO-side counterpart to `gh aw audit`. Single-run audit only in this MVP: download a build's artifacts, run every analyzer (firewall, MCP gateway, OTel, safe outputs, detection verdict, build timeline, missing tools/data/noops), and emit a Markdown or JSON report.
What ships
New `src/audit/` module tree:
- `model.rs`: `AuditData`, the top-level JSON contract. Drift-compatible with gh-aw's shape; adds ADO-specific `detection_analysis`, `safe_output_execution`, `rejected_safe_outputs` sections.
- `url.rs`: `dev.azure.com` URLs, legacy `*.visualstudio.com` URLs, on-prem Azure DevOps Server URLs (with optional `&j=`/`&t=`/`&s=` job/step anchors).
- `cache.rs`: `<output>/build-<id>/run-summary.json` with atomic temp-file + rename writes.
- `analyzers/firewall.rs`
- `analyzers/policy.rs`: `policy-manifest.json` + `audit.jsonl` → rule hit counts.
- `analyzers/mcp.rs`: … `unreliable` flagging), failures.
- `analyzers/otel.rs`: `aw_info.json` → metrics + engine config.
- `analyzers/safe_outputs.rs`: … `context`.
- `analyzers/detection.rs`: `threat-analysis.json` → `DetectionAnalysis`.
- `analyzers/missing.rs`
- `analyzers/jobs.rs`: `/timeline` REST → `JobData[]`.
- `findings.rs`
- `render/console.rs`
- `render/json.rs`
- `cli.rs`
Pipeline-side runtime additions (so `ado-aw audit` of an existing build has the inputs it needs):
- `src/data/*-base.yml` templates emit `staging/aw_info.json` at runtime (engine, model, agent name, source, target, version, build context). Generated by an extension to `AdoAwMarkerExtension`.
- `src/execute.rs` writes per-item `safe-outputs-executed.ndjson` in `<output-dir>` so the audit can trace `proposed → detection → executed` per safe output.
CLI surface
Unified rejection trace
When the aggregate `THREAT_DETECTION_RESULT` has any threat flag set, every proposed safe output lands in `safe_output_execution[*].status = not_processed_due_to_aggregate_gate`, carries the aggregate `reasons[]` (annotated `applies_to_whole_batch: true`), and exactly one severity-`high` `KeyFinding` is emitted summarizing which threat flags fired and how many proposals were dropped. A top-level `rejected_safe_outputs` rollup mirrors the same info for `--json` consumers.
The threat-analysis prompt itself is unchanged: it's identical to gh-aw's today, and per-item verdicts will be coordinated upstream rather than forked.
Dependencies
- `zip`: unpack downloaded ADO PipelineArtifacts.
- `wiremock` (dev only): fake ADO REST server for the integration tests.
Tests
- … `tests/audit_it.rs`) against a fake REST server: happy path, permission-denied, cache hit.
Docs
- `docs/audit.md`: accepted URL formats, flag table, output layout, `AuditData` shape, cache behavior, permission-failure UX, out-of-scope follow-ups.
- `docs/cli.md`: new `audit` subcommand block.
- `README.md`: one-line CLI entry.
- `AGENTS.md` index: pointer to `docs/audit.md` under "Compiler internals & operations".
- `prompts/debug-ado-agentic-workflow.md`: Step 1 first-move callout, new Step 2a-prime (run `ado-aw audit --json` before raw MCP timeline/log calls), `AuditData` top-level-key reference table, jq-diff fallback note.
- `create-`/`update-` prompts intentionally untouched (post-run inspection is debug-flavored).
Validation
- `cargo build` ✓
- `cargo test` ✓ (1740 passed, 0 failed)
- `cargo clippy --all-targets --all-features` ✓ (warnings only, all non-blocking style nits; no new errors)
Explicitly out of scope (recorded as follow-ups)
- … `ado-aw audit <a> <b>`)
- … `ado-aw audit --last N`)
- `--parse` log.md / firewall.md renderers (Rust-native, no JS bundle)
- … `agentic-pipelines` MCP tool for in-pipeline self-audit)
- `audit-manifest.json` build inventory
Each is recorded in the session plan under "Out-of-scope follow-ups" so they're not lost.