docs: add integration test coverage guide with gap analysis #1036

Merged
Mossaka merged 2 commits into main from docs/integration-test-coverage-guide
Feb 25, 2026

@Mossaka
Collaborator

@Mossaka Mossaka commented Feb 25, 2026

Summary

  • Adds a comprehensive integration test coverage guide (docs/INTEGRATION-TESTS.md) with a heat map of what's tested vs. not, critical gaps identified, and prioritized recommendations
  • Includes 6 detailed per-area analysis documents under docs/test-analysis/ covering domain/network, chroot, protocol/security, container/ops, CI/smoke workflows, and test infrastructure
  • Links the new guide from CLAUDE.md documentation section

Key Findings

Critical gaps discovered:

  1. test-integration.yml is actually just a type-check — 20+ integration test files have no CI pipeline
  2. --block-domains flag is completely untested (file is a misnomer)
  3. --env-all (the primary production mode) has zero tests
  4. DNS restriction enforcement is unverified
  5. Package manager tests only query registries, never install packages
  6. git push with authentication is untested
  7. Docker warning tests are entirely skipped (wrapped in describe.skip)

Files

| File | Description |
|------|-------------|
| `docs/INTEGRATION-TESTS.md` | Main guide with overview, heat map, gaps, priorities |
| `docs/test-analysis/domain-network.md` | Domain filtering, DNS, network security (6 test files) |
| `docs/test-analysis/chroot.md` | Chroot sandbox, languages, package managers (5 test files) |
| `docs/test-analysis/protocol-security.md` | Protocol support, credentials, tokens (8 test files) |
| `docs/test-analysis/container-ops.md` | Containers, volumes, git, env vars (7 test files) |
| `docs/test-analysis/ci-smoke.md` | All 27 CI/smoke/build-test workflows |
| `docs/test-analysis/test-infra.md` | Test runner, batch pattern, cleanup strategy |
| `CLAUDE.md` | Added link to new docs |

Test plan

  • Verify all markdown links resolve correctly
  • Review gap analysis against current test files for accuracy
  • Confirm heat map reflects actual CI workflow configurations
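
The first test-plan item (checking that markdown links resolve) could be automated with something like the sketch below. It is only an illustration: the function names, the sample text, and the `nope.md` path are hypothetical, and it assumes link targets are written relative to the repository root, which may not match how the real docs link to each other.

```typescript
/** Extract link targets of the form [text](target) from a markdown string. */
function extractLinkTargets(markdown: string): string[] {
  const pattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    targets.push(match[1]);
  }
  return targets;
}

/** Return relative link targets that are absent from the list of known paths. */
function findBrokenLinks(markdown: string, knownPaths: string[]): string[] {
  return extractLinkTargets(markdown)
    .filter((target) => !/^(https?:|mailto:|#)/.test(target)) // skip external and in-page links
    .map((target) => target.split("#")[0])                    // drop any #anchor suffix
    .filter((target) => knownPaths.indexOf(target) === -1);
}

// Hypothetical sample input; the real guide's contents are not shown in this PR page.
const sampleGuide = [
  "See [domain analysis](docs/test-analysis/domain-network.md#gaps)",
  "and [chroot](docs/test-analysis/chroot.md), plus [missing](docs/test-analysis/nope.md).",
  "External: [GitHub](https://github.com).",
].join("\n");

const repoPaths = [
  "docs/test-analysis/domain-network.md",
  "docs/test-analysis/chroot.md",
];

console.log(findBrokenLinks(sampleGuide, repoPaths));
```

In a real check, `knownPaths` would presumably come from listing the repository's files, and each target would be resolved relative to the document that contains it.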

🤖 Generated with Claude Code

Comprehensive analysis of all integration tests, CI workflows, and smoke
tests to improve visibility into what's covered and what's missing.

Key findings:
- Most integration tests (20+ files) don't run in CI
- --block-domains and --env-all are completely untested
- DNS restriction enforcement is unverified
- Package manager tests only query registries, never install

Includes 6 detailed analysis documents covering domain/network, chroot,
protocol/security, container/ops, CI/smoke, and test infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 25, 2026 18:54
@github-actions
Contributor

Build Test: Bun ✅

Project Install Tests Status
elysia 1/1 PASS
hono 1/1 PASS

Overall: PASS 🎉

Bun version: 1.3.9

Generated by Build Test Bun for issue #1036

@github-actions
Contributor

🟢 Node.js Build Test Results

Project Install Tests Status
clsx All passed PASS
execa All passed PASS
p-limit All passed PASS

Overall: PASS

Generated by Build Test Node.js for issue #1036

@github-actions
Contributor

Smoke Test Results — PASS

Test Result
GitHub MCP (last 2 merged PRs: #1033 "chore: remove smoke-gemini workflow", #1032 "fix: always set NO_PROXY to bypass Squid for localhost")
Playwright (github.com title contains "GitHub")
File write (smoke-test-claude-22411251082.txt)
Bash verify (cat file)

💥 [THE END] — Illustrated by Smoke Claude for issue #1036

@github-actions
Contributor

Rust Build Test Results

Project Build Tests Status
fd 1/1 PASS
zoxide 1/1 PASS

Overall: PASS

Generated by Build Test Rust for issue #1036

@github-actions
Contributor

Smoke Test Results — Copilot Engine

GitHub MCP: Last 2 merged PRs: #1033 chore: remove smoke-gemini workflow, #1032 fix: always set NO_PROXY to bypass Squid for localhost (both by @Mossaka)
Playwright: https://github.com title contains "GitHub" ✓
File Write: /tmp/gh-aw/agent/smoke-test-copilot-22411251036.txt created and verified
Bash: cat confirmed file contents

Overall: PASS | PR author: @Mossaka | No assignees

📰 BREAKING: Report filed by Smoke Copilot for issue #1036

@github-actions
Contributor

Deno Build Test Results

Project Tests Status
oak 1/1 ✅ PASS
std 1/1 ✅ PASS

Overall: ✅ PASS

Deno version: 2.7.1

Generated by Build Test Deno for issue #1036

@github-actions
Contributor

Go Build Test Results ✅

Project Download Tests Status
color PASS PASS
env PASS PASS
uuid PASS PASS

Overall: PASS

Generated by Build Test Go for issue #1036

@github-actions
Contributor

C++ Build Test Results

Project CMake Build Status
fmt PASS
json PASS

Overall: PASS

Generated by Build Test C++ for issue #1036

@github-actions
Contributor

.NET Build Test Results

Project Restore Build Run Status
hello-world PASS
json-parse PASS

Overall: ✅ PASS

Run output

hello-world: Hello, World!

json-parse:

{
  "Name": "AWF Test",
  "Version": 1,
  "Success": true
}
Name: AWF Test, Success: True

Generated by Build Test .NET for issue #1036

@github-actions
Contributor

chore: remove smoke-gemini workflow | fix: always set NO_PROXY to bypass Squid for localhost
Test 1 ✅
Test 2 ✅
Test 3 ✅
Test 4 ❌
Test 5 ✅
Test 6 ✅
Test 7 ✅
Test 8 ✅
Overall: FAIL

🔮 The oracle has spoken through Smoke Codex for issue #1036

Contributor

Copilot AI left a comment

Pull request overview

Adds a new documentation set describing the repository’s integration-test/CI coverage, including a heat map and detailed per-area gap analysis, and links it from CLAUDE.md so contributors can find it.

Changes:

  • Add docs/INTEGRATION-TESTS.md as a central integration test coverage guide with prioritized gaps/recommendations.
  • Add six deep-dive “test-analysis” documents covering major integration-test areas and CI/workflows.
  • Link the new guide from CLAUDE.md.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
|------|-------------|
| `docs/INTEGRATION-TESTS.md` | New top-level integration test coverage guide (heat map + key gaps). |
| `docs/test-analysis/domain-network.md` | Analysis of domain/network integration tests and coverage gaps. |
| `docs/test-analysis/chroot.md` | Analysis of chroot integration tests and coverage gaps. |
| `docs/test-analysis/protocol-security.md` | Analysis of protocol/security integration tests and gaps. |
| `docs/test-analysis/container-ops.md` | Analysis of container/ops integration tests and gaps. |
| `docs/test-analysis/ci-smoke.md` | Inventory/analysis of CI + smoke/build-test workflows and gaps. |
| `docs/test-analysis/test-infra.md` | Analysis of test fixtures/runner/cleanup and workflow postprocessing. |
| `CLAUDE.md` | Adds a link to the new integration test coverage guide. |

Comments suppressed due to low confidence (1)

docs/test-analysis/ci-smoke.md:237

  • This section references a smoke-gemini.lock.yml workflow and smoke-gemini.md source, but neither file exists under .github/workflows/ in the repo. The doc should either remove Gemini from the smoke-test list or add the missing workflow/source files so the documentation reflects reality.
### 11. `smoke-gemini.lock.yml` — Smoke Gemini

**Source**: `smoke-gemini.md`

| Attribute | Value |
|-----------|-------|
| **What it tests** | Gemini engine with same extended tool suite as Codex smoke test |
| **Engine** | `gemini` |
| **Triggers** | Every 12h, PR, manual dispatch |
| **Timeout** | 15 minutes |
| **Real-world mapping** | Validates Gemini (Google) engine works through AWF — important for multi-engine support |
| **Gaps** | Same as Codex. Identical test requirements — could share test definition via imports. |
| **Integration test relationship** | Same as Codex — tests a different engine path through the same infrastructure |

Comment on lines 28 to 29
| **Smoke/Build-Test** | gh-aw compiled workflows (.lock.yml) | 13 workflows | Real AI agent execution inside AWF sandbox |
| **CI** | Hand-written GitHub Actions (.yml) | 12 workflows | Build, lint, type-check, security, coverage |

Copilot AI Feb 25, 2026

The workflow counts in the tier table don't match the repository: there are 28 .lock.yml workflows in .github/workflows/ (not 13) and 15 hand-written .yml workflows (not 12). Consider either updating the counts or clarifying that this row is only smoke+build-test workflows rather than all .lock.yml workflows.

This issue also appears on line 225 of the same file.

Suggested change
- | **Smoke/Build-Test** | gh-aw compiled workflows (.lock.yml) | 13 workflows | Real AI agent execution inside AWF sandbox |
- | **CI** | Hand-written GitHub Actions (.yml) | 12 workflows | Build, lint, type-check, security, coverage |
+ | **Smoke/Build-Test** | gh-aw compiled workflows (.lock.yml) | 28 workflows | Real AI agent execution inside AWF sandbox |
+ | **CI** | Hand-written GitHub Actions (.yml) | 15 workflows | Build, lint, type-check, security, coverage |

| Remove `depth: 1` shallow clone | Full checkout needed |
| Replace `--image-tag X --skip-pull` with `--build-local` | Use locally-built container images |

Processes 30+ workflow files across smoke tests, build tests, and agentic workflows. Ensures CI tests use the current source code rather than stale published images.

Copilot AI Feb 25, 2026

postprocess-smoke-workflows.ts currently enumerates 29 workflow paths (5 smoke + 8 build-test + 13 agentic + 3 secret-digger), so "Processes 30+" is inaccurate. Also, the script expects .github/workflows/smoke-gemini.lock.yml, which is missing—worth calling out here or updating the text once the missing workflow is added.

Suggested change
- Processes 30+ workflow files across smoke tests, build tests, and agentic workflows. Ensures CI tests use the current source code rather than stale published images.
+ Processes 29 workflow files (5 smoke, 8 build-test, 13 agentic, 3 secret-digger) across the suite. Note: the script currently expects `.github/workflows/smoke-gemini.lock.yml`; if that workflow does not exist, either add it or update the script to keep the expected workflow list in sync. Ensures CI tests use the current source code rather than stale published images.
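
Keeping the expected workflow list in sync with the files that actually exist, as this comment suggests, could be checked with a small helper like the one below. This is a sketch under assumptions: `findMissingWorkflows` and the variable names are hypothetical and are not taken from `postprocess-smoke-workflows.ts`, and only the `smoke-gemini.lock.yml` path appears verbatim in the comment (the other file names are inferred from the workflow names mentioned in this review).

```typescript
// Category sizes as stated in the review comment.
const categoryCounts = { smoke: 5, buildTest: 8, agentic: 13, secretDigger: 3 };
const totalWorkflows =
  categoryCounts.smoke +
  categoryCounts.buildTest +
  categoryCounts.agentic +
  categoryCounts.secretDigger;

/** Return every expected path that is not among the files actually present. */
function findMissingWorkflows(expected: string[], present: string[]): string[] {
  return expected.filter((path) => present.indexOf(path) === -1);
}

// Illustrative subset: what the script is said to expect for smoke tests.
const expectedSmokePaths = [
  ".github/workflows/smoke-claude.lock.yml",
  ".github/workflows/smoke-copilot.lock.yml",
  ".github/workflows/smoke-codex.lock.yml",
  ".github/workflows/smoke-chroot.lock.yml",
  ".github/workflows/smoke-gemini.lock.yml",
];

// Assumed directory contents after the Gemini workflow was removed.
const presentSmokePaths = expectedSmokePaths.filter(
  (path) => path.indexOf("smoke-gemini") === -1
);

console.log(totalWorkflows); // 29
console.log(findMissingWorkflows(expectedSmokePaths, presentSmokePaths));
```

Running a check like this in CI would surface a stale entry such as the Gemini workflow before the documentation and the script drift apart.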

3. **Real-world attack simulation**: The credential hiding tests simulate actual exfiltration attacks (base64, xxd, grep patterns).
4. **Custom matchers**: The `toSucceed()`, `toFail()`, `toExitWithCode()` matchers provide clear, readable assertions.
5. **Bypass prevention**: Tests specifically cover the chroot bypass vulnerability (Test 8) that was previously discovered and fixed.
6. **Comprehensive API proxy coverage**: Tests cover all three API providers (OpenAI, Anthropic, Copilot) and verify credential isolation.

Copilot AI Feb 25, 2026

The API proxy tests do cover all three providers for healthchecks/env wiring, but end-to-end request routing is only tested for Anthropic (there’s no equivalent routing test for OpenAI or Copilot). This bullet reads stronger than the actual coverage; suggest rewording to avoid implying full routing/isolation verification for all three providers.

Suggested change
- 6. **Comprehensive API proxy coverage**: Tests cover all three API providers (OpenAI, Anthropic, Copilot) and verify credential isolation.
+ 6. **Broad API proxy coverage**: Tests cover all three API providers (OpenAI, Anthropic, Copilot) for healthchecks and env wiring; end-to-end request routing and credential isolation are currently verified in depth only for Anthropic.
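
For readers unfamiliar with the custom matchers named in the quoted passage (`toSucceed()`, `toFail()`, `toExitWithCode()`), the sketch below shows roughly how matchers of that kind can be written in the shape Jest expects (a function returning `pass` and `message`). The `CommandResult` interface and the matcher bodies are assumptions for illustration; the repository's actual implementations are not shown in this PR and may differ.

```typescript
// Hypothetical result shape; the real test helpers may expose different fields.
interface CommandResult {
  exitCode: number;
  stderr: string;
}

// Shape of the object a Jest custom matcher returns.
interface MatcherResult {
  pass: boolean;
  message: () => string;
}

function toExitWithCode(received: CommandResult, expectedCode: number): MatcherResult {
  const pass = received.exitCode === expectedCode;
  return {
    pass,
    message: () =>
      pass
        ? `expected command not to exit with code ${expectedCode}`
        : `expected exit code ${expectedCode}, got ${received.exitCode}: ${received.stderr}`,
  };
}

function toSucceed(received: CommandResult): MatcherResult {
  return toExitWithCode(received, 0);
}

function toFail(received: CommandResult): MatcherResult {
  const pass = received.exitCode !== 0;
  return {
    pass,
    message: () =>
      pass
        ? `expected command to succeed, got exit code ${received.exitCode}`
        : "expected command to fail, but it exited with code 0",
  };
}

// With Jest available, these would be registered once in a setup file:
//   expect.extend({ toSucceed, toFail, toExitWithCode });

console.log(toSucceed({ exitCode: 0, stderr: "" }).pass);
console.log(toFail({ exitCode: 1, stderr: "blocked" }).pass);
```

Writing the matchers as plain functions keeps them testable without a running Jest instance; registration through `expect.extend` is the only Jest-specific step.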

Comment on lines 28 to 29
│ Smoke Tests (5 workflows) │
│ Real AI agents (Claude, Copilot, Codex, Gemini) │

Copilot AI Feb 25, 2026

This overview box and the later "Smoke Tests" count assume 5 smoke workflows including Gemini, but the repo currently only has 4 smoke workflows (smoke-claude, smoke-copilot, smoke-codex, smoke-chroot). Either adjust the counts/engine list here or add the missing Gemini workflow so the guide stays accurate.

Suggested change
- │ Smoke Tests (5 workflows) │
- Real AI agents (Claude, Copilot, Codex, Gemini)
+ │ Smoke Tests (4 workflows) │
+ Smoke workflows (claude, copilot, codex, chroot)

Dependency audit ❌ ❌ ✅ ❌ ❌

* ⚠️ = Tests exist but have significant gaps (see detailed docs)
** = Tests exist but are skip'd

Copilot AI Feb 25, 2026

Minor wording: "skip'd" is nonstandard in documentation; "skipped" would be clearer and more professional.

Suggested change
- ** = Tests exist but are skip'd
+ ** = Tests exist but are skipped

- Fix workflow counts (28 lock.yml, 15 hand-written) in ci-smoke.md
- Fix postprocess script count (29 files) and note missing smoke-gemini
- Clarify API proxy coverage (routing tested only for Anthropic)
- Fix smoke test count from 5 to 4 (Gemini workflow was removed)
- Fix "skip'd" to "skipped"
- Remove gaps/action items from INTEGRATION-TESTS.md (moved to #1039)
- Keep INTEGRATION-TESTS.md as pure coverage reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Build Test: Bun Results

Project Install Tests Status
elysia 1/1 PASS
hono 1/1 PASS

Overall: ✅ PASS

Bun v1.3.9 — all tests passed across both projects.

Generated by Build Test Bun for issue #1036

@github-actions
Contributor

🤖 Smoke test results for @Mossaka's PR:

Overall: PASS

📰 BREAKING: Report filed by Smoke Copilot for issue #1036

@github-actions
Contributor

Go Build Test Results ✅

Project Download Tests Status
color PASS PASS
env PASS PASS
uuid PASS PASS

Overall: PASS

Generated by Build Test Go for issue #1036

@github-actions
Contributor

.NET Build Test Results

Project Restore Build Run Status
hello-world PASS
json-parse PASS

Overall: PASS

Run output

**hello-world:**
```
Hello, World!
```

**json-parse:**
```
{
  "Name": "AWF Test",
  "Version": 1,
  "Success": true
}
Name: AWF Test, Success: True
```

Generated by Build Test .NET for issue #1036

@github-actions
Contributor

C++ Build Test Results

Project CMake Build Status
fmt PASS
json PASS

Overall: PASS

Generated by Build Test C++ for issue #1036

@github-actions
Contributor

🦀 Rust Build Test Results

Project Build Tests Status
fd 1/1 PASS
zoxide 1/1 PASS

Overall: ✅ PASS

Generated by Build Test Rust for issue #1036

@github-actions
Contributor

Build Test: Node.js Results

Project Install Tests Status
clsx PASS PASS
execa PASS PASS
p-limit PASS PASS

Overall: PASS

Generated by Build Test Node.js for issue #1036

@github-actions
Contributor

🦕 Deno Build Test Results

Project Tests Status
oak 1/1 ✅ PASS
std 1/1 ✅ PASS

Overall: ✅ PASS

Generated by Build Test Deno for issue #1036

@github-actions
Contributor

Smoke test results

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude for issue #1036

@github-actions
Contributor

Test Results:
GitHub MCP merged PRs: ✅ feat: group --help flags by category, hide dev-only options; chore: remove smoke-gemini workflow
safeinputs-gh pr list: ✅
Playwright title check: ✅
Tavily web search: ❌ (Tavily MCP not available)
File write: ✅
Bash cat: ✅
Discussion query + comment: ✅
Build npm ci && npm run build: ✅
Overall status: FAIL

🔮 The oracle has spoken through Smoke Codex for issue #1036

@Mossaka Mossaka merged commit c2ebc6d into main Feb 25, 2026
71 checks passed
@Mossaka Mossaka deleted the docs/integration-test-coverage-guide branch February 25, 2026 19:56
@github-actions
Contributor

Java Build Test Results

Project Compile Tests Status
gson 1/1 PASS
caffeine 1/1 PASS

Overall: PASS

All Maven projects compiled and tests passed successfully via Squid proxy.

Generated by Build Test Java for issue #1036
