docs: add integration test coverage guide with gap analysis by Mossaka · Pull Request #1036 · github/gh-aw-firewall

Mossaka · 2026-02-25T18:54:41Z

Summary

Adds a comprehensive integration test coverage guide (docs/INTEGRATION-TESTS.md) with a heat map of what's tested vs. not, critical gaps identified, and prioritized recommendations
Includes 6 detailed per-area analysis documents under docs/test-analysis/ covering domain/network, chroot, protocol/security, container/ops, CI/smoke workflows, and test infrastructure
Links the new guide from CLAUDE.md documentation section

Key Findings

Critical gaps discovered:

test-integration.yml is actually just a type-check — 20+ integration test files have no CI pipeline
--block-domains flag is completely untested (file is a misnomer)
--env-all (the primary production mode) has zero tests
DNS restriction enforcement is unverified
Package manager tests only query registries, never install packages
git push with authentication is untested
Docker warning tests are entirely describe.skip'd
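
The last finding, suites disabled with `describe.skip`, is the kind of gap a small scan can surface. The following is a minimal sketch, assuming Jest-style test sources; the function name `findSkippedSuites` and the sample suite titles are hypothetical and do not come from the repository.

```typescript
// Hypothetical helper: list the titles of suites disabled with describe.skip.
// Assumes the title is a plain string literal passed as the first argument.
function findSkippedSuites(source: string): string[] {
  const pattern = /describe\.skip\(\s*(['"`])(.*?)\1/g;
  const titles: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    titles.push(match[2]);
  }
  return titles;
}

// Illustrative input only; these titles are made up for the example.
const sampleTestSource = [
  "describe.skip('docker warning', () => {});",
  "describe('domain filtering', () => {});",
].join("\n");

console.log(findSkippedSuites(sampleTestSource));
```

A scan like this only reports that a suite is skipped; it says nothing about why, so each hit still needs a manual look.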

Files

File	Description
`docs/INTEGRATION-TESTS.md`	Main guide with overview, heat map, gaps, priorities
`docs/test-analysis/domain-network.md`	Domain filtering, DNS, network security (6 test files)
`docs/test-analysis/chroot.md`	Chroot sandbox, languages, package managers (5 test files)
`docs/test-analysis/protocol-security.md`	Protocol support, credentials, tokens (8 test files)
`docs/test-analysis/container-ops.md`	Containers, volumes, git, env vars (7 test files)
`docs/test-analysis/ci-smoke.md`	All 27 CI/smoke/build-test workflows
`docs/test-analysis/test-infra.md`	Test runner, batch pattern, cleanup strategy
`CLAUDE.md`	Added link to new docs

Test plan

Verify all markdown links resolve correctly
Review gap analysis against current test files for accuracy
Confirm heat map reflects actual CI workflow configurations

🤖 Generated with Claude Code
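
The first test-plan item, verifying that markdown links resolve, could be partly automated. The sketch below only extracts relative link targets from markdown text; checking each target against the file tree is left out. The function name `extractRelativeLinks` and the sample input are assumptions made for illustration.

```typescript
// Hypothetical helper: collect relative link targets from markdown text.
// Inline links look like [label](target). External URLs and in-page anchors
// are skipped because they cannot be checked against the file tree.
function extractRelativeLinks(markdown: string): string[] {
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(markdown)) !== null) {
    const target = match[1];
    if (/^(https?:|mailto:|#)/.test(target)) continue;
    // Drop any "#section" suffix so only the file path remains.
    targets.push(target.split("#")[0]);
  }
  return targets;
}

// Illustrative input; the "#gaps" anchor and the external URL are made up.
const sampleMarkdown = [
  "See the [chroot analysis](docs/test-analysis/chroot.md#gaps)",
  "and an [external page](https://example.com).",
].join("\n");

console.log(extractRelativeLinks(sampleMarkdown));
```

Each returned path would then be resolved relative to the file that contains the link and checked for existence.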

Comprehensive analysis of all integration tests, CI workflows, and smoke tests to improve visibility into what's covered and what's missing.

Key findings:
- Most integration tests (20+ files) don't run in CI
- --block-domains and --env-all are completely untested
- DNS restriction enforcement is unverified
- Package manager tests only query registries, never install

Includes 6 detailed analysis documents covering domain/network, chroot, protocol/security, container/ops, CI/smoke, and test infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-02-25T18:58:31Z

Build Test: Bun ✅

Project	Install	Tests	Status
elysia	✅	1/1	PASS
hono	✅	1/1	PASS

Overall: PASS 🎉

Bun version: 1.3.9

Generated by Build Test Bun for issue #1036

github-actions · 2026-02-25T18:58:33Z

🟢 Node.js Build Test Results

Project	Install	Tests	Status
clsx	✅	All passed	PASS
execa	✅	All passed	PASS
p-limit	✅	All passed	PASS

Overall: PASS

Generated by Build Test Node.js for issue #1036

github-actions · 2026-02-25T18:58:39Z

Smoke Test Results — PASS

Test	Result
GitHub MCP (last 2 merged PRs: #1033 "chore: remove smoke-gemini workflow", #1032 "fix: always set NO_PROXY to bypass Squid for localhost")	✅
Playwright (github.com title contains "GitHub")	✅
File write (`smoke-test-claude-22411251082.txt`)	✅
Bash verify (cat file)	✅

💥 [THE END] — Illustrated by Smoke Claude for issue #1036

github-actions · 2026-02-25T18:58:43Z

Rust Build Test Results

Project	Build	Tests	Status
fd	✅	1/1	PASS
zoxide	✅	1/1	PASS

Overall: PASS ✅

Generated by Build Test Rust for issue #1036

github-actions · 2026-02-25T18:58:49Z

Smoke Test Results — Copilot Engine

✅ GitHub MCP: Last 2 merged PRs: #1033 chore: remove smoke-gemini workflow, #1032 fix: always set NO_PROXY to bypass Squid for localhost (both by @Mossaka)
✅ Playwright: https://github.com title contains "GitHub" ✓
✅ File Write: /tmp/gh-aw/agent/smoke-test-copilot-22411251036.txt created and verified
✅ Bash: cat confirmed file contents

Overall: PASS | PR author: @Mossaka | No assignees

📰 BREAKING: Report filed by Smoke Copilot for issue #1036

github-actions · 2026-02-25T18:58:59Z

Deno Build Test Results

Project	Tests	Status
oak	1/1	✅ PASS
std	1/1	✅ PASS

Overall: ✅ PASS

Deno version: 2.7.1

Generated by Build Test Deno for issue #1036

github-actions · 2026-02-25T18:59:01Z

Go Build Test Results ✅

Project	Download	Tests	Status
color	✅	PASS	PASS
env	✅	PASS	PASS
uuid	✅	PASS	PASS

Overall: PASS

Generated by Build Test Go for issue #1036

github-actions · 2026-02-25T18:59:01Z

C++ Build Test Results

Project	CMake	Build	Status
fmt	✅	✅	PASS
json	✅	✅	PASS

Overall: PASS

Generated by Build Test C++ for issue #1036

github-actions · 2026-02-25T18:59:14Z

.NET Build Test Results

Project	Restore	Build	Run	Status
hello-world	✅	✅	✅	PASS
json-parse	✅	✅	✅	PASS

Overall: ✅ PASS

Run output

hello-world: Hello, World!

json-parse:

{
  "Name": "AWF Test",
  "Version": 1,
  "Success": true
}
Name: AWF Test, Success: True

Generated by Build Test .NET for issue #1036

github-actions · 2026-02-25T19:00:20Z

chore: remove smoke-gemini workflow | fix: always set NO_PROXY to bypass Squid for localhost
Test 1 ✅
Test 2 ✅
Test 3 ✅
Test 4 ❌
Test 5 ✅
Test 6 ✅
Test 7 ✅
Test 8 ✅
Overall: FAIL

🔮 The oracle has spoken through Smoke Codex for issue #1036

Copilot

Pull request overview

Adds a new documentation set describing the repository’s integration-test/CI coverage, including a heat map and detailed per-area gap analysis, and links it from CLAUDE.md so contributors can find it.

Changes:

Add docs/INTEGRATION-TESTS.md as a central integration test coverage guide with prioritized gaps/recommendations.
Add six deep-dive “test-analysis” documents covering major integration-test areas and CI/workflows.
Link the new guide from CLAUDE.md.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

File	Description
docs/INTEGRATION-TESTS.md	New top-level integration test coverage guide (heat map + key gaps).
docs/test-analysis/domain-network.md	Analysis of domain/network integration tests and coverage gaps.
docs/test-analysis/chroot.md	Analysis of chroot integration tests and coverage gaps.
docs/test-analysis/protocol-security.md	Analysis of protocol/security integration tests and gaps.
docs/test-analysis/container-ops.md	Analysis of container/ops integration tests and gaps.
docs/test-analysis/ci-smoke.md	Inventory/analysis of CI + smoke/build-test workflows and gaps.
docs/test-analysis/test-infra.md	Analysis of test fixtures/runner/cleanup and workflow postprocessing.
CLAUDE.md	Adds a link to the new integration test coverage guide.

Comments suppressed due to low confidence (1)

docs/test-analysis/ci-smoke.md:237

This section references a smoke-gemini.lock.yml workflow and smoke-gemini.md source, but neither file exists under .github/workflows/ in the repo. The doc should either remove Gemini from the smoke-test list or add the missing workflow/source files so the documentation reflects reality.

### 11. `smoke-gemini.lock.yml` — Smoke Gemini

**Source**: `smoke-gemini.md`

| Attribute | Value |
|-----------|-------|
| **What it tests** | Gemini engine with same extended tool suite as Codex smoke test |
| **Engine** | `gemini` |
| **Triggers** | Every 12h, PR, manual dispatch |
| **Timeout** | 15 minutes |
| **Real-world mapping** | Validates Gemini (Google) engine works through AWF — important for multi-engine support |
| **Gaps** | Same as Codex. Identical test requirements — could share test definition via imports. |
| **Integration test relationship** | Same as Codex — tests a different engine path through the same infrastructure |

Copilot · 2026-02-25T19:00:46Z

docs/test-analysis/ci-smoke.md

+| **Smoke/Build-Test** | gh-aw compiled workflows (.lock.yml) | 13 workflows | Real AI agent execution inside AWF sandbox |
+| **CI** | Hand-written GitHub Actions (.yml) | 12 workflows | Build, lint, type-check, security, coverage |


The workflow counts in the tier table don't match the repository: there are 28 .lock.yml workflows in .github/workflows/ (not 13) and 15 hand-written .yml workflows (not 12). Consider either updating the counts or clarifying that this row is only smoke+build-test workflows rather than all .lock.yml workflows.

This issue also appears on line 225 of the same file.

Suggested change

-| **Smoke/Build-Test** | gh-aw compiled workflows (.lock.yml) | 13 workflows | Real AI agent execution inside AWF sandbox |
-| **CI** | Hand-written GitHub Actions (.yml) | 12 workflows | Build, lint, type-check, security, coverage |
+| **Smoke/Build-Test** | gh-aw compiled workflows (.lock.yml) | 28 workflows | Real AI agent execution inside AWF sandbox |
+| **CI** | Hand-written GitHub Actions (.yml) | 15 workflows | Build, lint, type-check, security, coverage |
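
Counts like these drift easily, so they are worth deriving from the directory listing rather than maintaining by hand. A minimal sketch, assuming that every file ending in `.lock.yml` is a compiled workflow and every other `.yml` file is hand-written; the function name `countWorkflows` and the sample file names are illustrative only.

```typescript
interface WorkflowCounts {
  compiled: number;    // gh-aw compiled workflows (*.lock.yml)
  handWritten: number; // every other *.yml file
}

// Hypothetical helper: tally workflow files by naming convention.
function countWorkflows(fileNames: string[]): WorkflowCounts {
  const counts: WorkflowCounts = { compiled: 0, handWritten: 0 };
  for (const fileName of fileNames) {
    if (/\.lock\.yml$/.test(fileName)) {
      counts.compiled += 1;
    } else if (/\.yml$/.test(fileName)) {
      counts.handWritten += 1;
    }
    // Anything else (for example .md workflow sources) is ignored.
  }
  return counts;
}

// Illustrative listing, not the actual contents of .github/workflows/.
const sampleListing = [
  "smoke-claude.lock.yml",
  "smoke-claude.md",
  "test-integration.yml",
];

console.log(countWorkflows(sampleListing));
```

In practice the listing would come from reading `.github/workflows/`, and the resulting numbers could be compared against the figures quoted in the docs.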

Copilot · 2026-02-25T19:00:47Z

docs/test-analysis/test-infra.md

+| Remove `depth: 1` shallow clone | Full checkout needed |
+| Replace `--image-tag X --skip-pull` with `--build-local` | Use locally-built container images |
+
+Processes 30+ workflow files across smoke tests, build tests, and agentic workflows. Ensures CI tests use the current source code rather than stale published images.


postprocess-smoke-workflows.ts currently enumerates 29 workflow paths (5 smoke + 8 build-test + 13 agentic + 3 secret-digger), so "Processes 30+" is inaccurate. Also, the script expects .github/workflows/smoke-gemini.lock.yml, which is missing—worth calling out here or updating the text once the missing workflow is added.

Suggested change

-Processes 30+ workflow files across smoke tests, build tests, and agentic workflows. Ensures CI tests use the current source code rather than stale published images.
+Processes 29 workflow files (5 smoke, 8 build-test, 13 agentic, 3 secret-digger) across the suite. Note: the script currently expects `.github/workflows/smoke-gemini.lock.yml`; if that workflow does not exist, either add it or update the script to keep the expected workflow list in sync. Ensures CI tests use the current source code rather than stale published images.
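
The two checks in this comment, the total of 29 and the missing Gemini workflow, can be expressed as a small consistency check. This is a sketch under stated assumptions: the helper names and the sample path lists are hypothetical, and it does not reproduce how `postprocess-smoke-workflows.ts` actually stores its paths.

```typescript
// Group sizes quoted in the review comment above.
const workflowGroups: { [group: string]: number } = {
  smoke: 5,
  "build-test": 8,
  agentic: 13,
  "secret-digger": 3,
};

// Sum the per-group counts.
function totalWorkflows(groups: { [group: string]: number }): number {
  let total = 0;
  for (const key of Object.keys(groups)) {
    total += groups[key];
  }
  return total;
}

// Hypothetical helper: return expected paths that are not present on disk.
function findMissingWorkflows(expected: string[], present: string[]): string[] {
  return expected.filter((path) => present.indexOf(path) === -1);
}

// Illustrative lists; only the smoke-gemini path is taken from the comment.
const expectedPaths = [
  ".github/workflows/smoke-claude.lock.yml",
  ".github/workflows/smoke-gemini.lock.yml",
];
const presentPaths = [".github/workflows/smoke-claude.lock.yml"];

console.log(totalWorkflows(workflowGroups)); // 29
console.log(findMissingWorkflows(expectedPaths, presentPaths));
```

Running a check like this in CI would flag an expected-but-missing workflow as soon as the file is removed, instead of leaving the script and the docs out of sync.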

Copilot · 2026-02-25T19:00:48Z

docs/test-analysis/protocol-security.md

+3. **Real-world attack simulation**: The credential hiding tests simulate actual exfiltration attacks (base64, xxd, grep patterns).
+4. **Custom matchers**: The `toSucceed()`, `toFail()`, `toExitWithCode()` matchers provide clear, readable assertions.
+5. **Bypass prevention**: Tests specifically cover the chroot bypass vulnerability (Test 8) that was previously discovered and fixed.
+6. **Comprehensive API proxy coverage**: Tests cover all three API providers (OpenAI, Anthropic, Copilot) and verify credential isolation.


The API proxy tests do cover all three providers for healthchecks/env wiring, but end-to-end request routing is only tested for Anthropic (there’s no equivalent routing test for OpenAI or Copilot). This bullet reads stronger than the actual coverage; suggest rewording to avoid implying full routing/isolation verification for all three providers.

Suggested change

-6. **Comprehensive API proxy coverage**: Tests cover all three API providers (OpenAI, Anthropic, Copilot) and verify credential isolation.
+6. **Broad API proxy coverage**: Tests cover all three API providers (OpenAI, Anthropic, Copilot) for healthchecks and env wiring; end-to-end request routing and credential isolation are currently verified in depth only for Anthropic.

Copilot · 2026-02-25T19:00:48Z

docs/INTEGRATION-TESTS.md

+│  Smoke Tests (5 workflows)                          │
+│  Real AI agents (Claude, Copilot, Codex, Gemini)    │


This overview box and the later "Smoke Tests" count assume 5 smoke workflows including Gemini, but the repo currently only has 4 smoke workflows (smoke-claude, smoke-copilot, smoke-codex, smoke-chroot). Either adjust the counts/engine list here or add the missing Gemini workflow so the guide stays accurate.

Suggested change

-│ Smoke Tests (5 workflows) │
-│ Real AI agents (Claude, Copilot, Codex, Gemini) │
+│ Smoke Tests (4 workflows) │
+│ Smoke workflows (claude, copilot, codex, chroot) │

Copilot · 2026-02-25T19:00:48Z

docs/INTEGRATION-TESTS.md

+Dependency audit                  ❌      ❌         ✅    ❌      ❌
+
+* ⚠️ = Tests exist but have significant gaps (see detailed docs)
+** = Tests exist but are skip'd


Minor wording: "skip'd" is nonstandard in documentation; "skipped" would be clearer and more professional.

Suggested change

-** = Tests exist but are skip'd
+** = Tests exist but are skipped

- Fix workflow counts (28 lock.yml, 15 hand-written) in ci-smoke.md
- Fix postprocess script count (29 files) and note missing smoke-gemini
- Clarify API proxy coverage (routing tested only for Anthropic)
- Fix smoke test count from 5 to 4 (Gemini workflow was removed)
- Fix "skip'd" to "skipped"
- Remove gaps/action items from INTEGRATION-TESTS.md (moved to #1039)
- Keep INTEGRATION-TESTS.md as pure coverage reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-02-25T19:55:41Z

Build Test: Bun Results

Project	Install	Tests	Status
elysia	✅	1/1	PASS
hono	✅	1/1	PASS

Overall: ✅ PASS

Bun v1.3.9 — all tests passed across both projects.

Generated by Build Test Bun for issue #1036

github-actions · 2026-02-25T19:55:51Z

🤖 Smoke test results for @Mossaka's PR:

✅ GitHub MCP: Last 2 merged PRs: #1035 "feat: group --help flags by category, hide dev-only options", #1033 "chore: remove smoke-gemini workflow"
✅ Playwright: github.com title contains "GitHub"
✅ File write: /tmp/gh-aw/agent/smoke-test-copilot-22413323701.txt created
✅ Bash: File verified via cat

Overall: PASS

📰 BREAKING: Report filed by Smoke Copilot for issue #1036

github-actions · 2026-02-25T19:55:52Z

Go Build Test Results ✅

Project	Download	Tests	Status
color	✅	PASS	PASS
env	✅	PASS	PASS
uuid	✅	PASS	PASS

Overall: PASS

Generated by Build Test Go for issue #1036

github-actions · 2026-02-25T19:55:55Z

.NET Build Test Results

Project	Restore	Build	Run	Status
hello-world	✅	✅	✅	PASS
json-parse	✅	✅	✅	PASS

Overall: PASS

Run output

hello-world: Hello, World!

json-parse:

{
  "Name": "AWF Test",
  "Version": 1,
  "Success": true
}
Name: AWF Test, Success: True

Generated by Build Test .NET for issue #1036

github-actions · 2026-02-25T19:55:55Z

C++ Build Test Results

Project	CMake	Build	Status
fmt	✅	✅	PASS
json	✅	✅	PASS

Overall: PASS

Generated by Build Test C++ for issue #1036

github-actions · 2026-02-25T19:56:17Z

🦀 Rust Build Test Results

Project	Build	Tests	Status
fd	✅	1/1	PASS
zoxide	✅	1/1	PASS

Overall: ✅ PASS

Generated by Build Test Rust for issue #1036

github-actions · 2026-02-25T19:56:17Z

Build Test: Node.js Results

Project	Install	Tests	Status
clsx	✅	PASS	PASS
execa	✅	PASS	PASS
p-limit	✅	PASS	PASS

Overall: PASS ✅

Generated by Build Test Node.js for issue #1036

github-actions · 2026-02-25T19:56:20Z

🦕 Deno Build Test Results

Project	Tests	Status
oak	1/1	✅ PASS
std	1/1	✅ PASS

Overall: ✅ PASS

Generated by Build Test Deno for issue #1036

github-actions · 2026-02-25T19:56:23Z

Smoke test results

✅ GitHub MCP: #1035 feat: group --help flags by category, #1017 chore(deps): bump all-github-actions
✅ Playwright: github.com title contains "GitHub"
✅ File write: smoke-test-claude-22413323698.txt created
✅ Bash verify: file contents confirmed

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude for issue #1036

github-actions · 2026-02-25T19:56:45Z

Test Results:
GitHub MCP merged PRs: ✅ feat: group --help flags by category, hide dev-only options; chore: remove smoke-gemini workflow
safeinputs-gh pr list: ✅
Playwright title check: ✅
Tavily web search: ❌ (Tavily MCP not available)
File write: ✅
Bash cat: ✅
Discussion query + comment: ✅
Build npm ci && npm run build: ✅
Overall status: FAIL

🔮 The oracle has spoken through Smoke Codex for issue #1036

github-actions · 2026-02-25T19:57:19Z

Java Build Test Results

Project	Compile	Tests	Status
gson	✅	1/1	PASS
caffeine	✅	1/1	PASS

Overall: PASS ✅

All Maven projects compiled and tests passed successfully via Squid proxy.

Generated by Build Test Java for issue #1036

Copilot AI review requested due to automatic review settings February 25, 2026 18:54

Copilot started reviewing on behalf of Mossaka February 25, 2026 18:55

github-actions bot added the build-test-bun label Feb 25, 2026

github-actions bot added the build-test-node label Feb 25, 2026

github-actions bot added the smoke-claude label Feb 25, 2026

github-actions bot added the build-test-rust label Feb 25, 2026

github-actions bot added the smoke-copilot label Feb 25, 2026

github-actions bot added the build-test-deno label Feb 25, 2026

github-actions bot added build-test-go build-test-cpp labels Feb 25, 2026

github-actions bot mentioned this pull request Feb 25, 2026

[agentics] No-Op Runs #769

Open

github-actions bot added the build-test-dotnet label Feb 25, 2026

Copilot AI reviewed Feb 25, 2026

Mossaka mentioned this pull request Feb 25, 2026

Integration test coverage gaps and recommended actions #1039

Open

Mossaka merged commit c2ebc6d into main Feb 25, 2026
71 checks passed

Mossaka deleted the docs/integration-test-coverage-guide branch February 25, 2026 19:56

github-actions bot added the build-test-java label Feb 25, 2026
