docs: add integration test coverage guide with gap analysis#1036
Comprehensive analysis of all integration tests, CI workflows, and smoke tests to improve visibility into what's covered and what's missing.

Key findings:
- Most integration tests (20+ files) don't run in CI
- --block-domains and --env-all are completely untested
- DNS restriction enforcement is unverified
- Package manager tests only query registries, never install

Includes 6 detailed analysis documents covering domain/network, chroot, protocol/security, container/ops, CI/smoke, and test infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build Test: Bun ✅
Overall: PASS 🎉 Bun version:

🟢 Node.js Build Test Results
Overall: PASS

Smoke Test Results: PASS

Rust Build Test Results
Overall: PASS ✅

Smoke Test Results: Copilot Engine ✅
GitHub MCP: Last 2 merged PRs: #1033
Overall: PASS | PR author:

Deno Build Test Results
Overall: ✅ PASS Deno version: 2.7.1

Go Build Test Results ✅
Overall: PASS

C++ Build Test Results
Overall: PASS

.NET Build Test Results
Overall: ✅ PASS
Run output: hello-world: json-parse:

chore: remove smoke-gemini workflow | fix: always set NO_PROXY to bypass Squid for localhost
Pull request overview
Adds a new documentation set describing the repository’s integration-test/CI coverage, including a heat map and detailed per-area gap analysis, and links it from CLAUDE.md so contributors can find it.
Changes:
- Add `docs/INTEGRATION-TESTS.md` as a central integration test coverage guide with prioritized gaps/recommendations.
- Add six deep-dive "test-analysis" documents covering major integration-test areas and CI/workflows.
- Link the new guide from `CLAUDE.md`.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| docs/INTEGRATION-TESTS.md | New top-level integration test coverage guide (heat map + key gaps). |
| docs/test-analysis/domain-network.md | Analysis of domain/network integration tests and coverage gaps. |
| docs/test-analysis/chroot.md | Analysis of chroot integration tests and coverage gaps. |
| docs/test-analysis/protocol-security.md | Analysis of protocol/security integration tests and gaps. |
| docs/test-analysis/container-ops.md | Analysis of container/ops integration tests and gaps. |
| docs/test-analysis/ci-smoke.md | Inventory/analysis of CI + smoke/build-test workflows and gaps. |
| docs/test-analysis/test-infra.md | Analysis of test fixtures/runner/cleanup and workflow postprocessing. |
| CLAUDE.md | Adds a link to the new integration test coverage guide. |
Comments suppressed due to low confidence (1)
docs/test-analysis/ci-smoke.md:237
- This section references a `smoke-gemini.lock.yml` workflow and `smoke-gemini.md` source, but neither file exists under `.github/workflows/` in the repo. The doc should either remove Gemini from the smoke-test list or add the missing workflow/source files so the documentation reflects reality.
### 11. `smoke-gemini.lock.yml` — Smoke Gemini
**Source**: `smoke-gemini.md`
| Attribute | Value |
|-----------|-------|
| **What it tests** | Gemini engine with same extended tool suite as Codex smoke test |
| **Engine** | `gemini` |
| **Triggers** | Every 12h, PR, manual dispatch |
| **Timeout** | 15 minutes |
| **Real-world mapping** | Validates Gemini (Google) engine works through AWF — important for multi-engine support |
| **Gaps** | Same as Codex. Identical test requirements — could share test definition via imports. |
| **Integration test relationship** | Same as Codex — tests a different engine path through the same infrastructure |
docs/test-analysis/ci-smoke.md
Outdated
| **Smoke/Build-Test** | gh-aw compiled workflows (.lock.yml) | 13 workflows | Real AI agent execution inside AWF sandbox |
| **CI** | Hand-written GitHub Actions (.yml) | 12 workflows | Build, lint, type-check, security, coverage |
The workflow counts in the tier table don't match the repository: there are 28 .lock.yml workflows in .github/workflows/ (not 13) and 15 hand-written .yml workflows (not 12). Consider either updating the counts or clarifying that this row is only smoke+build-test workflows rather than all .lock.yml workflows.
This issue also appears on line 225 of the same file.
Suggested change:
- | **Smoke/Build-Test** | gh-aw compiled workflows (.lock.yml) | 13 workflows | Real AI agent execution inside AWF sandbox |
- | **CI** | Hand-written GitHub Actions (.yml) | 12 workflows | Build, lint, type-check, security, coverage |
+ | **Smoke/Build-Test** | gh-aw compiled workflows (.lock.yml) | 28 workflows | Real AI agent execution inside AWF sandbox |
+ | **CI** | Hand-written GitHub Actions (.yml) | 15 workflows | Build, lint, type-check, security, coverage |
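One way to keep these counts from drifting is to derive them from the directory listing instead of maintaining them by hand. The following is a minimal sketch, not code from the repository: `countWorkflows` is a hypothetical helper, and the sample file names other than `smoke-claude.lock.yml` are made up for illustration.

```typescript
// Hypothetical helper (not part of the repository): tally workflow files by kind.
type WorkflowCounts = { compiled: number; handWritten: number };

function countWorkflows(fileNames: string[]): WorkflowCounts {
  const counts: WorkflowCounts = { compiled: 0, handWritten: 0 };
  for (const fileName of fileNames) {
    if (/\.lock\.yml$/.test(fileName)) {
      counts.compiled += 1; // gh-aw compiled workflow
    } else if (/\.yml$/.test(fileName)) {
      counts.handWritten += 1; // hand-written GitHub Actions workflow
    }
    // Anything else (for example the .md workflow sources) is ignored.
  }
  return counts;
}

// Illustrative input only; a real check would list .github/workflows/ instead.
const sampleFiles = ["smoke-claude.lock.yml", "smoke-claude.md", "build.yml", "lint.yml"];
const sampleCounts = countWorkflows(sampleFiles);
console.log(sampleCounts);
```

The `.lock.yml` test runs first because every compiled workflow also ends in `.yml`; reversing the order would count compiled workflows as hand-written.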
docs/test-analysis/test-infra.md
Outdated
| Remove `depth: 1` shallow clone | Full checkout needed |
| Replace `--image-tag X --skip-pull` with `--build-local` | Use locally-built container images |

Processes 30+ workflow files across smoke tests, build tests, and agentic workflows. Ensures CI tests use the current source code rather than stale published images.
postprocess-smoke-workflows.ts currently enumerates 29 workflow paths (5 smoke + 8 build-test + 13 agentic + 3 secret-digger), so "Processes 30+" is inaccurate. Also, the script expects .github/workflows/smoke-gemini.lock.yml, which is missing—worth calling out here or updating the text once the missing workflow is added.
Suggested change:
- Processes 30+ workflow files across smoke tests, build tests, and agentic workflows. Ensures CI tests use the current source code rather than stale published images.
+ Processes 29 workflow files (5 smoke, 8 build-test, 13 agentic, 3 secret-digger) across the suite. Note: the script currently expects `.github/workflows/smoke-gemini.lock.yml`; if that workflow does not exist, either add it or update the script to keep the expected workflow list in sync. Ensures CI tests use the current source code rather than stale published images.
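The mismatch described here (a script expecting a workflow that is not on disk) could be caught by a small consistency check. This is a sketch under assumptions: `findMissingWorkflows` and `totalExpected` are hypothetical names, the grouping shape is invented, and it does not reflect the actual contents of `postprocess-smoke-workflows.ts`. The `.lock.yml` paths for the four existing smoke workflows are assumed from the workflow names mentioned in this review.

```typescript
// Hypothetical consistency check; names and shapes are assumptions for illustration.
type WorkflowGroups = { [group: string]: string[] };

// Return every expected workflow path that is absent from the on-disk listing.
function findMissingWorkflows(groups: WorkflowGroups, onDisk: string[]): string[] {
  const missing: string[] = [];
  for (const group of Object.keys(groups)) {
    for (const path of groups[group]) {
      if (onDisk.indexOf(path) === -1) missing.push(path);
    }
  }
  return missing;
}

// Total number of expected workflow paths across all groups.
function totalExpected(groups: WorkflowGroups): number {
  let total = 0;
  for (const group of Object.keys(groups)) total += groups[group].length;
  return total;
}

// Only the smoke group is shown; paths are assumed, not copied from the script.
const expectedGroups: WorkflowGroups = {
  smoke: [
    ".github/workflows/smoke-claude.lock.yml",
    ".github/workflows/smoke-copilot.lock.yml",
    ".github/workflows/smoke-codex.lock.yml",
    ".github/workflows/smoke-chroot.lock.yml",
    ".github/workflows/smoke-gemini.lock.yml",
  ],
};
const presentOnDisk = expectedGroups.smoke.slice(0, 4); // Gemini workflow absent
const missingWorkflows = findMissingWorkflows(expectedGroups, presentOnDisk);
console.log(totalExpected(expectedGroups), missingWorkflows);
```

Deriving the documented count from the same list the script iterates over would also keep a figure like "29 workflow files" accurate as workflows are added or removed.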
3. **Real-world attack simulation**: The credential hiding tests simulate actual exfiltration attacks (base64, xxd, grep patterns).
4. **Custom matchers**: The `toSucceed()`, `toFail()`, `toExitWithCode()` matchers provide clear, readable assertions.
5. **Bypass prevention**: Tests specifically cover the chroot bypass vulnerability (Test 8) that was previously discovered and fixed.
6. **Comprehensive API proxy coverage**: Tests cover all three API providers (OpenAI, Anthropic, Copilot) and verify credential isolation.
The API proxy tests do cover all three providers for healthchecks/env wiring, but end-to-end request routing is only tested for Anthropic (there’s no equivalent routing test for OpenAI or Copilot). This bullet reads stronger than the actual coverage; suggest rewording to avoid implying full routing/isolation verification for all three providers.
Suggested change:
- 6. **Comprehensive API proxy coverage**: Tests cover all three API providers (OpenAI, Anthropic, Copilot) and verify credential isolation.
+ 6. **Broad API proxy coverage**: Tests cover all three API providers (OpenAI, Anthropic, Copilot) for healthchecks and env wiring; end-to-end request routing and credential isolation are currently verified in depth only for Anthropic.
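For readers unfamiliar with the custom matchers named in item 4 of the quoted list, the sketch below shows how matchers of that kind are typically shaped. It is an illustration only: the `CommandResult` fields and the matcher bodies are assumptions, and the repository's real `toSucceed()`, `toFail()`, and `toExitWithCode()` may differ.

```typescript
// Hypothetical result shape and matcher bodies; the real implementations may differ.
interface CommandResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Same shape Jest expects from a custom matcher: a pass flag plus a lazy message.
interface MatcherOutcome {
  pass: boolean;
  message: () => string;
}

function toExitWithCode(received: CommandResult, expected: number): MatcherOutcome {
  const pass = received.exitCode === expected;
  return {
    pass,
    message: () =>
      pass
        ? `expected command not to exit with code ${expected}`
        : `expected exit code ${expected}, received ${received.exitCode}`,
  };
}

function toSucceed(received: CommandResult): MatcherOutcome {
  return toExitWithCode(received, 0);
}

function toFail(received: CommandResult): MatcherOutcome {
  const pass = received.exitCode !== 0;
  return {
    pass,
    message: () =>
      pass
        ? `expected command to succeed, but it exited with code ${received.exitCode}`
        : "expected command to fail, but it exited with code 0",
  };
}

const passingRun: CommandResult = { exitCode: 0, stdout: "ok", stderr: "" };
console.log(toSucceed(passingRun).pass, toFail(passingRun).pass);
```

In a Jest setup, functions with this return shape can be registered through `expect.extend({ toSucceed, toFail, toExitWithCode })`, which is what makes assertions such as `expect(result).toSucceed()` possible.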
docs/INTEGRATION-TESTS.md
Outdated
│ Smoke Tests (5 workflows) │
│ Real AI agents (Claude, Copilot, Codex, Gemini) │
This overview box and the later "Smoke Tests" count assume 5 smoke workflows including Gemini, but the repo currently only has 4 smoke workflows (smoke-claude, smoke-copilot, smoke-codex, smoke-chroot). Either adjust the counts/engine list here or add the missing Gemini workflow so the guide stays accurate.
Suggested change:
- │ Smoke Tests (5 workflows) │
- │ Real AI agents (Claude, Copilot, Codex, Gemini) │
+ │ Smoke Tests (4 workflows) │
+ │ Smoke workflows (claude, copilot, codex, chroot) │
docs/INTEGRATION-TESTS.md
Outdated
Dependency audit ❌ ❌ ✅ ❌ ❌

* ⚠️ = Tests exist but have significant gaps (see detailed docs)
** = Tests exist but are skip'd
Minor wording: "skip'd" is nonstandard in documentation; "skipped" would be clearer and more professional.
Suggested change:
- ** = Tests exist but are skip'd
+ ** = Tests exist but are skipped
- Fix workflow counts (28 lock.yml, 15 hand-written) in ci-smoke.md
- Fix postprocess script count (29 files) and note missing smoke-gemini
- Clarify API proxy coverage (routing tested only for Anthropic)
- Fix smoke test count from 5 to 4 (Gemini workflow was removed)
- Fix "skip'd" to "skipped"
- Remove gaps/action items from INTEGRATION-TESTS.md (moved to #1039)
- Keep INTEGRATION-TESTS.md as pure coverage reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build Test: Bun Results
Overall: ✅ PASS

🤖 Smoke test results for
Overall: PASS

Go Build Test Results ✅
Overall: PASS

.NET Build Test Results
Overall: PASS
Run output: hello-world:

C++ Build Test Results
Overall: PASS

🦀 Rust Build Test Results
Overall: ✅ PASS

Build Test: Node.js Results
Overall: PASS ✅

🦕 Deno Build Test Results
Overall: ✅ PASS

Smoke test results
Overall: PASS

Test Results:

Java Build Test Results
Overall: PASS ✅ All Maven projects compiled and tests passed successfully via Squid proxy.
Summary
- Coverage guide (`docs/INTEGRATION-TESTS.md`) with a heat map of what's tested vs. not, critical gaps identified, and prioritized recommendations
- Analysis documents in `docs/test-analysis/` covering domain/network, chroot, protocol/security, container/ops, CI/smoke workflows, and test infrastructure
- Link to the guide from the `CLAUDE.md` documentation section

Key Findings
Critical gaps discovered:
- `test-integration.yml` is actually just a type-check; 20+ integration test files have no CI pipeline
- `--block-domains` flag is completely untested (file is a misnomer)
- `--env-all` (the primary production mode) has zero tests
- `git push` with authentication is untested
- `describe.skip`'d

Files
- `docs/INTEGRATION-TESTS.md`
- `docs/test-analysis/domain-network.md`
- `docs/test-analysis/chroot.md`
- `docs/test-analysis/protocol-security.md`
- `docs/test-analysis/container-ops.md`
- `docs/test-analysis/ci-smoke.md`
- `docs/test-analysis/test-infra.md`
- `CLAUDE.md`

Test plan
🤖 Generated with Claude Code