feat(audit): add `ado-aw audit <build-id-or-url>` command by jamesadevine · Pull Request #691 · githubnext/ado-aw

jamesadevine · 2026-05-21T15:01:47Z

`feat(audit)`: add `ado-aw audit <build-id-or-url>` command

ADO-side counterpart to gh aw audit. Single-run audit only in this MVP: download a build's artifacts, run every analyzer (firewall, MCP gateway, OTel, safe outputs, detection verdict, build timeline, missing tools/data/noops), and emit a Markdown or JSON report.

What ships

New src/audit/ module tree:

| File | Role |
| --- | --- |
| `model.rs` | `AuditData`: top-level JSON contract. Drift-compatible with gh-aw's shape; adds ADO-specific `detection_analysis`, `safe_output_execution`, `rejected_safe_outputs` sections. |
| `url.rs` | Parses bare build IDs, `dev.azure.com` URLs, legacy `*.visualstudio.com` URLs, on-prem Azure DevOps Server URLs (with optional `&j=`/`&t=`/`&s=` job/step anchors). |
| `cache.rs` | CLI-version-keyed `<output>/build-<id>/run-summary.json` with atomic temp-file + rename writes. |
| `analyzers/firewall.rs` | AWF Squid proxy logs → per-domain stats (allowed/denied/mixed). |
| `analyzers/policy.rs` | AWF `policy-manifest.json` + `audit.jsonl` → rule hit counts. |
| `analyzers/mcp.rs` | MCPG NDJSON → tool usage, server health (`unreliable` flagging), failures. |
| `analyzers/otel.rs` | Copilot OTel + `aw_info.json` → metrics + engine config. |
| `analyzers/safe_outputs.rs` | Joins proposals + detection verdict + execution log keyed on `context`. |
| `analyzers/detection.rs` | `threat-analysis.json` → `DetectionAnalysis`. |
| `analyzers/missing.rs` | Missing-tool / missing-data / noop NDJSON entries. |
| `analyzers/jobs.rs` | ADO `/timeline` REST → `JobData[]`. |
| `findings.rs` | 8 heuristic rules emitting severity-rated findings + recommendations. |
| `render/console.rs` | Markdown-style terminal renderer (section ordering mirrors gh-aw). |
| `render/json.rs` | Stable JSON contract for tooling. |
| `cli.rs` | Orchestration: URL parse → auth resolve → metadata fetch → artifact download → analyzers → findings → cache → render. |
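As a rough illustration of the first orchestration step (URL parse), the sketch below accepts either a bare build ID or a URL that carries a `buildId` query parameter. The function name `parse_build_id` and the example URL shape are assumptions made for this sketch; the actual `url.rs` handles more formats (legacy and on-prem hosts, job/step anchors) and is not reproduced here.

```rust
/// Hypothetical helper (not the PR's code): pull a numeric build ID out of
/// a bare ID or out of a URL with a `buildId=<n>` query parameter.
fn parse_build_id(input: &str) -> Option<u64> {
    let trimmed = input.trim();

    // Case 1: a bare numeric ID such as "12345".
    if let Ok(id) = trimmed.parse::<u64>() {
        return Some(id);
    }

    // Case 2: a URL. Everything after the first '?' is the query string;
    // a trailing '#fragment' is dropped before splitting into pairs.
    let (_, query) = trimmed.split_once('?')?;
    let query = query.split('#').next().unwrap_or(query);

    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| key.eq_ignore_ascii_case("buildId"))
        .and_then(|(_, value)| value.parse().ok())
}
```

Extra query parameters such as `&j=` or `&t=` are simply ignored here, which loosely matches the MVP behavior of normalising anchors to the parent build.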

Pipeline-side runtime additions (so ado-aw audit of an existing build has the inputs it needs):

All four src/data/*-base.yml templates emit staging/aw_info.json at runtime (engine, model, agent name, source, target, version, build context). Generated by an extension to AdoAwMarkerExtension.
src/execute.rs writes per-item safe-outputs-executed.ndjson in <output-dir> so the audit can trace proposed → detection → executed per safe output.
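The proposed → executed part of that trace amounts to a join keyed on `context`. A minimal sketch follows, assuming simplified record types; `Proposal`, `ExecutionRecord`, `TraceStatus`, and `join_trace` are hypothetical names for illustration, and the detection verdict is left out for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical status for one proposed safe output.
#[derive(Debug, PartialEq)]
enum TraceStatus {
    Executed,
    Failed,
    NotExecuted,
}

/// Hypothetical, simplified shapes of a proposal and an execution-log entry.
struct Proposal {
    context: String,
}

struct ExecutionRecord {
    context: String,
    success: bool,
}

/// Join proposals against the execution log, keyed on `context`.
fn join_trace(
    proposals: &[Proposal],
    executed: &[ExecutionRecord],
) -> Vec<(String, TraceStatus)> {
    // Index the execution log once so each proposal is a single lookup.
    let by_context: HashMap<&str, &ExecutionRecord> = executed
        .iter()
        .map(|record| (record.context.as_str(), record))
        .collect();

    proposals
        .iter()
        .map(|proposal| {
            let status = match by_context.get(proposal.context.as_str()) {
                Some(record) if record.success => TraceStatus::Executed,
                Some(_) => TraceStatus::Failed,
                None => TraceStatus::NotExecuted,
            };
            (proposal.context.clone(), status)
        })
        .collect()
}
```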

CLI surface

ado-aw audit <build-id-or-url>
  -o, --output <dir>         # default ./logs (matches gh-aw operator muscle memory)
  --json                     # emit AuditData as JSON to stdout
  --org <url>                # ADO context overrides; auto-detected from git remote
  --project <name>
  --pat <token>              # also reads AZURE_DEVOPS_EXT_PAT
  --artifacts <agent,detection,safe-outputs>
  --no-cache

Unified rejection trace

When the aggregate THREAT_DETECTION_RESULT has any threat flag set, every proposed safe output lands in safe_output_execution[*].status = not_processed_due_to_aggregate_gate, carries the aggregate reasons[] (annotated applies_to_whole_batch: true), and exactly one severity-high KeyFinding is emitted summarizing which threat flags fired and how many proposals were dropped. A top-level rejected_safe_outputs rollup mirrors the same info for --json consumers.
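The gate described above can be sketched roughly as follows. All type names, field names, and the example flag names are hypothetical and chosen only to mirror the prose; this is not the PR's implementation.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Pending,
    NotProcessedDueToAggregateGate,
}

/// Hypothetical aggregate verdict: named threat flags plus free-text reasons.
struct AggregateVerdict {
    flags: Vec<(&'static str, bool)>,
    reasons: Vec<String>,
}

struct Outcome {
    context: String,
    status: Status,
    reasons: Vec<String>,
    applies_to_whole_batch: bool,
}

struct Finding {
    severity: &'static str,
    message: String,
}

/// If any flag is set, reject every proposal and emit exactly one finding.
fn apply_gate(contexts: &[&str], verdict: &AggregateVerdict) -> (Vec<Outcome>, Option<Finding>) {
    let fired: Vec<&str> = verdict
        .flags
        .iter()
        .filter(|(_, set)| *set)
        .map(|(name, _)| *name)
        .collect();
    let gate_fired = !fired.is_empty();

    let outcomes = contexts
        .iter()
        .map(|context| Outcome {
            context: context.to_string(),
            status: if gate_fired {
                Status::NotProcessedDueToAggregateGate
            } else {
                Status::Pending
            },
            // The aggregate reasons are copied onto every item in the batch.
            reasons: if gate_fired { verdict.reasons.clone() } else { Vec::new() },
            applies_to_whole_batch: gate_fired,
        })
        .collect();

    let finding = gate_fired.then(|| Finding {
        severity: "high",
        message: format!(
            "{} proposal(s) dropped; threat flags: {}",
            contexts.len(),
            fired.join(", ")
        ),
    });
    (outcomes, finding)
}
```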

The threat-analysis prompt itself is unchanged — it's identical to gh-aw's today, and per-item verdicts will be coordinated upstream rather than forked.

Dependencies

zip — unpack downloaded ADO PipelineArtifacts.
wiremock (dev only) — fake ADO REST server for the integration tests.

Tests

80 new audit unit tests across model, url, cache, analyzers, findings, renderers.
3 new integration tests (tests/audit_it.rs) against a fake REST server: happy path, permission-denied, cache hit.
Existing test suite untouched. 1740 tests pass total.

Docs

New docs/audit.md — accepted URL formats, flag table, output layout, AuditData shape, cache behavior, permission-failure UX, out-of-scope follow-ups.
docs/cli.md — new audit subcommand block.
README.md — one-line CLI entry.
AGENTS.md index — pointer to docs/audit.md under "Compiler internals & operations".
prompts/debug-ado-agentic-workflow.md — Step 1 first-move callout, new Step 2a-prime (run ado-aw audit --json before raw MCP timeline/log calls), AuditData top-level-key reference table, jq-diff fallback note. create-/update- prompts intentionally untouched (post-run inspection is debug-flavored).

Validation

cargo build ✓
cargo test ✓ (1740 passed, 0 failed)
cargo clippy --all-targets --all-features ✓ (warnings only, all non-blocking style nits; no new errors)

Explicitly out of scope (recorded as follow-ups)

Diff mode (ado-aw audit <a> <b>)
Cross-run trends (ado-aw audit --last N)
--parse log.md / firewall.md renderers (Rust-native, no JS bundle)
Job/step-anchored audit (anchors are parsed but normalised to the parent build in this MVP)
MCP-exposed audit (agentic-pipelines MCP tool for in-pipeline self-audit)
Per-item detection verdict NDJSON (coordinated upstream with gh-aw)
Partial-approval gating that consumes it
AWF policy-manifest plumbing
AWF firewall token-usage.jsonl opt-in
audit-manifest.json build inventory

Each is recorded in the session plan under "Out-of-scope follow-ups" so they're not lost.

Single-run audit: download a build's artifacts, run every analyzer (firewall, MCP gateway, OTel, safe outputs, detection verdict, build timeline, missing tools/data/noops), and emit a Markdown or JSON report. ADO-side counterpart to `gh aw audit`.

New module tree under `src/audit/`:

- `model.rs`: `AuditData` (drift-compatible with gh-aw's top-level contract; adds ADO-specific `detection_analysis`, `safe_output_execution`, `rejected_safe_outputs` sections).
- `url.rs`: parses bare IDs, dev.azure.com URLs, legacy visualstudio.com URLs, and on-prem Azure DevOps Server URLs (with optional `&j=`/`&t=`/`&s=` job/step anchors).
- `cache.rs`: CLI-version-keyed `run-summary.json` with atomic writes.
- `analyzers/{firewall,policy,mcp,otel,safe_outputs,detection,missing,jobs}.rs`: eight defensive NDJSON/REST analyzers.
- `findings.rs`: eight heuristic rules emitting severity-rated findings + recommendations.
- `render/{console,json}.rs`: two renderers; JSON shape is the public contract.
- `cli.rs`: orchestration: URL parse → auth → metadata fetch → artifact download → analyzers → findings → cache → render.

Unified rejection trace: when the aggregate `THREAT_DETECTION_RESULT` has any threat flag set, every proposal lands in `not_processed_due_to_aggregate_gate` carrying the aggregate `reasons[]`, exactly one severity-`high` `KeyFinding` is emitted, and a `rejected_safe_outputs` rollup appears at the top level.

Pipeline-side runtime additions (so an `ado-aw audit` of an existing build has the data it needs):

- `src/data/*-base.yml` (via `AdoAwMarkerExtension`): emits `staging/aw_info.json` at runtime with engine, model, agent name, source path, target, compiler version, and ADO build context.
- `src/execute.rs`: writes a per-item `safe-outputs-executed.ndjson` in `<output-dir>` so the audit can show the proposed → detection → executed trace.

CLI surface:

    ado-aw audit <build-id-or-url>
      -o, --output <dir>   # default ./logs
      --json
      --org / --project / --pat
      --artifacts <agent,detection,safe-outputs>
      --no-cache

New dependencies: `zip` (artifact unpack), `wiremock` (dev only; integration test mock server).

Tests: 80 new audit unit tests + 3 integration tests against a fake ADO REST server (happy path, permission-denied, cache hit) using a thin `ADO_AW_TEST_ORG_URL` test seam. 1740 total tests pass.

Docs: new `docs/audit.md`; updates to `docs/cli.md`, `README.md`, `AGENTS.md` index, and `prompts/debug-ado-agentic-workflow.md` (Step 1 first-move + new Step 2a-prime + `AuditData` reference + jq-diff fallback).

Out of scope (explicit follow-ups): diff mode, cross-run trends, `--parse` log.md/firewall.md, job/step-anchored audit, MCP-exposed audit, per-item detection verdict (upstream coordination with gh-aw), partial-approval gating, AWF policy-manifest plumbing, AWF token-usage.jsonl, `audit-manifest.json` build inventory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-05-21T15:07:25Z

🔍 Rust PR Review

Summary: Well-structured feature addition with good error handling overall, but three actionable issues worth fixing before merge.

Findings

🐛 Bugs / Logic Issues

src/audit/analyzers/detection.rs – find_verdict_path lexicographic sort is wrong for multi-digit run IDs

The function picks the "lexicographically last" analyzed_outputs_* directory using plain string comparison:
```
match &latest_dir {
    Some((current_name, _)) if name <= *current_name => {}
    _ => latest_dir = Some((name, path)),
}
```
"analyzed_outputs_9" sorts after "analyzed_outputs_10" because '9' > '1'. In any build where both exist (e.g. a retry where ADO incremented the numeric suffix), the wrong — older — verdict would be read and the run could be mis-classified as safe.

Fix: parse the trailing integer and compare numerically, or zero-pad during comparison:
```
fn suffix_number(name: &str) -> u64 {
    name.rsplit('_').next().and_then(|s| s.parse().ok()).unwrap_or(0)
}
// then compare suffix_number(&name) vs suffix_number(current_name)
```
The same pattern is used in safe_outputs.rs and jobs.rs (all rely on matches.sort() / lexicographic-last on directory names). All three need the same fix.
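To make the shared fix concrete, one way to pick the latest `<prefix>_<id>` directory is sketched below. The names `compare_by_suffix` and `latest_dir` are hypothetical; the choice to sort non-numeric suffixes before numeric ones is an assumption of this sketch, made so that the ordering stays total.

```rust
use std::cmp::Ordering;

/// Parse the token after the final '_' as a number, if it is one.
fn numeric_suffix(name: &str) -> Option<u64> {
    name.rsplit('_').next()?.parse().ok()
}

/// Hypothetical comparator: numeric suffixes compare as numbers,
/// non-numeric suffixes sort first and fall back to string order.
fn compare_by_suffix(a: &str, b: &str) -> Ordering {
    match (numeric_suffix(a), numeric_suffix(b)) {
        (Some(x), Some(y)) => x.cmp(&y).then_with(|| a.cmp(b)),
        (Some(_), None) => Ordering::Greater,
        (None, Some(_)) => Ordering::Less,
        (None, None) => a.cmp(b),
    }
}

/// Return the directory name with the highest suffix, if any.
fn latest_dir<'a>(names: &[&'a str]) -> Option<&'a str> {
    names.iter().copied().max_by(|a, b| compare_by_suffix(a, b))
}
```

With this comparator, `analyzed_outputs_10` wins over `analyzed_outputs_9`, which is the case the plain string comparison got wrong.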

🔒 Security Concerns

src/ado/mod.rs – ADO_AW_TEST_ORG_URL override is always active in production

apply_test_org_url_override is a #[doc(hidden)] function, but it is called unconditionally from resolve_ado_context, with no #[cfg(test)] gate. Any user or CI environment that happens to have ADO_AW_TEST_ORG_URL set (e.g. a leftover env var from a previous debugging session) will silently redirect all ADO API calls (build metadata fetches, artifact lists, pipeline updates) to whatever URL the variable holds, which in a hostile environment is attacker-controlled.

Either gate it:
```
#[cfg(test)]
fn apply_test_org_url_override(ctx: &mut AdoContext) { ... }
#[cfg(not(test))]
fn apply_test_org_url_override(_: &mut AdoContext) {}
```
Or document the env var in docs/cli.md as a supported override so operators can reason about it. The current state (always active, hidden from docs) is the worst of both worlds.

⚠️ Suggestions

src/audit/analyzers/safe_outputs.rs – synchronous std::fs::read_dir called from async context

top_level_dirs_with_prefix and collect_named_files use std::fs::read_dir (blocking), but they are called from async fn analyze_safe_outputs. On a Tokio multi-thread runtime this blocks an executor thread for the duration of the directory walk. The rest of the codebase (and the other analyzers in this PR) consistently use tokio::fs. These two helpers should be converted to async fn with tokio::fs::read_dir, or wrapped with tokio::task::spawn_blocking.

✅ What Looks Good

Zip extraction in ado/mod.rs correctly uses entry.enclosed_name() to reject path-traversal entries — this is the right approach and is well-tested.
append_execution_record swallows its own errors (logged at warn!) so audit-log failures never abort Stage 3 execution — correct design for a diagnostic side-channel.
cache.rs atomic temp-file + rename pattern is correct; the version-keyed cache invalidation is clean.
Error messages on 401/403 include the exact PAT scopes needed and a manual az pipelines fallback command — great UX.
80 new unit tests + 3 integration tests against a fake REST server is solid coverage for a feature this size.
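For readers unfamiliar with version-keyed invalidation, the idea praised in the `cache.rs` item above can be sketched as below. This assumes the cached summary records the CLI version that produced it; the struct, its fields, and `usable_cache` are hypothetical and may not match how `cache.rs` actually keys its entries.

```rust
/// Hypothetical cached summary: the CLI version that wrote it plus the payload.
struct CachedSummary {
    cli_version: String,
    body: String,
}

/// Reuse a cached summary only when caching is enabled and the version matches.
fn usable_cache<'a>(
    cached: Option<&'a CachedSummary>,
    current_version: &str,
    no_cache: bool,
) -> Option<&'a CachedSummary> {
    if no_cache {
        // Mirrors the `--no-cache` flag: always recompute.
        return None;
    }
    // A summary written by a different CLI version is treated as stale.
    cached.filter(|summary| summary.cli_version == current_version)
}
```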

Generated by Rust PR Reviewer for issue #691

Three issues raised by the Rust PR Reviewer on #691:

1. **Lexicographic sort wrong for multi-digit run IDs.** Previously `find_artifact_dir` / `find_verdict_path` / `top_level_dirs_with_prefix` picked the "lexicographically last" `<prefix>_<id>` directory, which sorts `_9` after `_10` (because `'9' > '1'`). On a build retry that produced both `analyzed_outputs_9` and `analyzed_outputs_10`, the older verdict would be read and the run could be mis-classified as safe. New `crate::audit::cmp_numeric_suffix` extracts the trailing token after the final `_`, parses it as `u64`, and compares numerically with a lexicographic tie-breaker for non-numeric suffixes. All three call sites now use it. Regression tests added in mod.rs, detection.rs, and cli.rs.
2. **Security: `ADO_AW_TEST_ORG_URL` was always active in production.** The override was `#[doc(hidden)]` but not gated by build mode, so a stray env var (debugging leftover, hostile CI environment) could silently redirect ADO REST calls to an attacker-controlled URL in a release binary. Gated on `cfg(debug_assertions)`: debug builds (`cargo test`, `cargo run`) keep the override AND emit a loud `warn!` on every invocation; release builds (all published artifacts via `cargo build --release`) replace the body with a no-op so a stray env var has no effect. The integration test in `tests/audit_it.rs` continues to work because `cargo test` builds in debug mode.
3. **Blocking `std::fs::read_dir` in async context.** `safe_outputs.rs` had two helpers (`top_level_dirs_with_prefix`, `collect_named_files`) using sync I/O from inside `async fn analyze_safe_outputs`. On a Tokio multi-thread runtime this blocks an executor thread for the duration of the directory walk. Both helpers converted to `async fn` using `tokio::fs::read_dir`. The recursive `collect_named_files` uses `Box::pin` to satisfy the async-recursion shape (consistent with the existing pattern in `crate::detect::scan_directory`).

Tests: 1745 unit tests + 3 integration tests pass (up from 1740; 5 new regression tests for the numeric-suffix bug).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
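The build-mode gate from item 2 could look roughly like the sketch below. The signature is an assumption made so the example stays self-contained: the environment value is passed in as a parameter and the warning goes to stderr, whereas the real function presumably reads `ADO_AW_TEST_ORG_URL` itself, operates on an `AdoContext`, and logs via `warn!`.

```rust
/// Debug builds: honor the override, but say so loudly.
#[cfg(debug_assertions)]
fn apply_test_org_url_override(org_url: &mut String, env_value: Option<&str>) {
    if let Some(url) = env_value {
        eprintln!("warning: ADO_AW_TEST_ORG_URL is overriding the organization URL with {url}");
        *org_url = url.to_string();
    }
}

/// Release builds: the same signature compiles to a no-op,
/// so a stray environment variable has no effect.
#[cfg(not(debug_assertions))]
fn apply_test_org_url_override(_org_url: &mut String, _env_value: Option<&str>) {}
```

Because `cargo test` compiles integration tests against a debug build of the crate by default, a gate on `debug_assertions` keeps a test seam like this usable, whereas a `#[cfg(test)]` gate would not be visible to tests under `tests/`.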

github-actions · 2026-05-21T16:55:46Z

🔍 Rust PR Review

Summary: Looks good overall — well-structured module, solid test coverage, good error handling patterns throughout. A few specific concerns worth addressing before merge.

Findings

🐛 Bugs / Logic Issues

src/execute.rs:312-316 — is_budget_exhausted relies on brittle string matching
```
fn is_budget_exhausted(result: &ExecutionResult) -> bool {
    !result.success
        && result.message.starts_with("Skipped")
        && result.message.contains("maximum ")
        && result.message.contains("already reached")
}
```
This function drives whether the audit NDJSON gets status: "budget_exhausted" vs status: "failed". It cross-checks the human-readable message generated in enforce_budget (line 616: "Skipped{}: maximum {} count ({}) already reached..."). If that message is ever rephrased — e.g. changing "maximum" to "max", adding a prefix — budget-exhausted entries silently become "failed" in every audit log with no compile-time warning.

The right fix is to add a structural flag to ExecutionResult (e.g. budget_exhausted: bool), set it in enforce_budget, and key off that flag in execution_record_status. This makes the intent explicit and refactor-safe.

src/execute.rs:320 — execution_record_status maps is_warning() → "skipped"
```
} else if result.is_warning() {
    "skipped"
}
```
is_warning() is true for tools like noop and missing-tool that succeed with a warning (no ADO credentials). These show up as status: "skipped" in the audit log, which is semantically misleading — they ran successfully, they just couldn't persist a result. "warning" or "no_op" would be more accurate and avoids confusion with the actual budget-skip case.
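Both findings point toward deriving the audit status from structured fields rather than from message text. A minimal sketch under that assumption is shown below; the `ExecutionResult` here is a simplified stand-in, the `warning` field replaces the real `is_warning()` method, and the `"succeeded"` status string is hypothetical since the source does not name it.

```rust
/// Simplified stand-in for the real ExecutionResult (fields are assumptions).
struct ExecutionResult {
    success: bool,
    warning: bool,
    /// Set where the budget is enforced, instead of being inferred from `message`.
    budget_exhausted: bool,
    message: String,
}

/// Map a result to its audit-log status without inspecting the message.
fn execution_record_status(result: &ExecutionResult) -> &'static str {
    if result.budget_exhausted {
        "budget_exhausted"
    } else if result.success && result.warning {
        // Ran, but could not persist a result; distinct from a budget skip.
        "warning"
    } else if result.success {
        "succeeded" // hypothetical status name
    } else {
        "failed"
    }
}
```

With a flag like this, rewording the human-readable message can no longer change how an entry is classified.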

⚠️ Suggestions

src/audit/url.rs:1 — #![allow(dead_code)] on the whole file

The file-level attribute silences all unused-item warnings. If this was added to suppress specific MVP-incomplete items (e.g. job_id/step_id on ParsedBuildRef, which are parsed but not yet consumed), prefer targeted #[allow(dead_code)] on just those fields so future dead code in the file doesn't go undetected.

src/audit/cache.rs — fs::rename atomicity on Windows

The atomic temp-write + rename pattern is correct on Linux. Worth a comment noting that the atomicity guarantee is assumed for Linux only: on Windows, fs::rename over an existing destination file is not guaranteed to be atomic. Not a bug for ADO hosted agents, just a future-proofing note.

src/compile/extensions/ado_aw_marker.rs — unwrap() in marker_json / aw_info_json
```
fn aw_info_json(&self) -> String {
    serde_json::to_string(&serde_json::json!({...})).unwrap()
}
```
These are provably safe (serializing a json!({}) literal never fails), but per the project style they deserve a // infallible: serializing a static json! literal never fails comment to make that explicit.

✅ What Looks Good

Atomic cache writes (temp → rename → cleanup on failure) correctly implemented in cache.rs.
The heredoc in ado_aw_marker.rs uses a single-quoted delimiter (<<'AW_INFO_EOF') preventing ADO variable expansion in the JSON content. The 4-space YAML indentation strips cleanly to leave the terminator at column 0 after YAML processing — correct.
append_execution_record correctly swallows I/O errors with warn! rather than propagating — Stage 3 audit logging must never abort the actual execution.
Detection-gate rejection path in analyzers/safe_outputs.rs uses expect() only behind the gate_fired guard, making the invariant clear.
wiremock-based integration tests in tests/audit_it.rs are the right approach — no real network calls in CI.
EXECUTED_NDJSON_FILENAME shared via ndjson:: module rather than duplicated as string literals across execute.rs and the audit analyzers.
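For reference, the temp → rename → cleanup-on-failure pattern noted for `cache.rs` in the first item of this list can be sketched with the standard library alone. The function name `write_atomically` and the `.tmp` sibling path are assumptions of this sketch, not the PR's code.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to a sibling temp file, then rename it over `dest`.
/// Readers see either the old file or the new one, never a partial write
/// (on platforms where rename is atomic).
fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Same directory as `dest`, so the rename stays on one filesystem.
    let tmp = dest.with_extension("tmp");

    let result = (|| -> io::Result<()> {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // flush to disk before the rename makes it visible
        fs::rename(&tmp, dest)
    })();

    if result.is_err() {
        // Best-effort cleanup; the original error is what gets returned.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```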

Generated by Rust PR Reviewer for issue #691

feat(audit): add `ado-aw audit <build-id-or-url>` command #691
jamesadevine wants to merge 2 commits into main from feat/audit-command
