[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment

## 📊 Current CI/CD Pipeline Status

The repository has a **well-structured, multi-layered CI/CD pipeline** with 44 workflow files (both traditional YAML and agentic Markdown workflows). The pipeline covers build verification, linting, type checking, unit/integration testing, security scanning, and agentic smoke tests. However, several recent runs are failing and key coverage gaps exist.

**Recent run health (2026-02-24):**
| Workflow | Status |
|---|---|
| PR Title Check | ✅ Success |
| Dependency Vulnerability Audit | ✅ Success |
| Secret Digger (Copilot) | ✅ Success |
| Issue Monster | ✅ Success |
| Smoke Chroot | ❌ Failure |
| Examples Test | ❌ Failure |
| Build Test .NET | ❌ Failure |
| Build Test Rust | ❌ Failure |
| Build Test Deno | ❌ Failure |
| Chroot Integration Tests | ❌ Failure |

Six failures on a single day, concentrated in integration-level workflows, suggest that flakiness or environment drift may be affecting reliability.

---

## ✅ Existing Quality Gates

### Static Analysis & Build
- **ESLint** (`lint.yml`) — TypeScript source linting, runs on all PRs
- **TypeScript type check** (`test-integration.yml`) — `tsc --noEmit`, runs on all PRs
- **Build Verification** (`build.yml`) — Lint + build across Node 20 & 22, verifies `dist/cli.js` exists
- **PR Title Check** (`pr-title.yml`) — Semantic commit format, allowed scopes enforced

### Testing
- **Unit test coverage** (`test-coverage.yml`) — Runs Jest with coverage, posts diff-comparison comment on PRs, fails on regression
- **Examples Test** (`test-examples.yml`) — Runs all `examples/*.sh` scripts end-to-end
- **Test Setup Action** (`test-action.yml`) — Tests `action.yml` install across latest/specific/invalid versions
- **Chroot Integration Tests** (`test-chroot.yml`) — Multi-job: languages, package managers, procfs, edge cases
- **Agentic Build Tests** (`build-test-*.md`) — Live ecosystem tests (Node, Go, Rust, Java, .NET, Deno, Bun, C++) via GitHub Copilot agent

### Security
- **CodeQL** (`codeql.yml`) — `javascript-typescript` + `actions` analysis, security-extended queries, runs on PRs and weekly
- **Container Security Scan** (`container-scan.yml`) — Trivy scanning agent + squid images for CRITICAL/HIGH CVEs
- **Dependency Vulnerability Audit** (`dependency-audit.yml`) — `npm audit --audit-level=high` for main + docs-site packages
- **Security Guard** (`security-guard.md`) — Claude-based agentic review of PR security impact

### Operational
- **CI Doctor** (`ci-doctor.md`) — Post-run analysis of failed workflow runs
- **Dependency Security Monitor** (`dependency-security-monitor.md`) — Daily automated monitoring
- **Secret Digger** (3 engines) — Hourly agentic scans for leaked secrets

---

## 🔍 Identified Gaps

### 🔴 High Priority

#### 1. Critically Low Unit Test Coverage with Low Thresholds
**File-by-file breakdown reveals alarming gaps:**
| File | Statement Coverage | Notes |
|---|---|---|
| `cli.ts` | **0%** | Entry point, signal handlers, error paths |
| `docker-manager.ts` | **18%** | Most complex file, 250 statements |
| `host-iptables.ts` | 83% | Good but some branches uncovered |

The global thresholds in `jest.config.js` are low, and the statement and line thresholds (38%) sit right at the current overall coverage (~38%), so they only guard against backsliding:
```js
coverageThreshold: { global: { branches: 30, functions: 35, lines: 38, statements: 38 } }
```
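Because these thresholds are global, well-covered files such as `host-iptables.ts` offset the uncovered ones, and the gate passes even though the primary entry point (0%) and the core Docker orchestration layer (18%) are barely exercised. PRs can therefore introduce regressions in these paths undetected.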
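A per-file report makes the masking effect visible. The sketch below assumes Jest is configured with the `json-summary` coverage reporter (for example `coverageReporters: ["json-summary", "lcov"]`), which writes `coverage/coverage-summary.json` keyed by file path plus a `total` entry. The function name and the floor value are hypothetical, not taken from the repository.

```js
// Hypothetical helper: list files whose statement coverage falls below a floor.
// `summary` is the parsed JSON written by Istanbul's "json-summary" reporter.
function filesBelowFloor(summary, floorPct) {
  return Object.entries(summary)
    .filter(([file]) => file !== "total") // "total" holds the global aggregate, not a file
    .map(([file, metrics]) => ({ file, pct: metrics.statements.pct }))
    .filter(({ pct }) => typeof pct === "number" && pct < floorPct) // skip non-numeric pct values
    .sort((a, b) => a.pct - b.pct); // worst offenders first
}
```

In CI this could run after `npx jest --coverage`, e.g. `filesBelowFloor(JSON.parse(fs.readFileSync("coverage/coverage-summary.json", "utf8")), 20)`, failing the step when the result is non-empty. Jest's own path-specific `coverageThreshold` keys enforce the same idea natively; a report like this mainly helps choose which files need them.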

#### 2. Container Scan Doesn't Run on All PRs
`container-scan.yml` only triggers on changes to `containers/**` or the workflow file itself. Images can change without a matching file change, for example when a floating base tag such as `ubuntu/squid:latest` resolves to a new upstream image, and security-sensitive container configuration that lives outside `containers/**` is not covered by the filter either.
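To make the gap concrete, the sketch below approximates how a `paths:` filter decides whether a PR triggers the workflow: it fires only if at least one changed file matches a pattern. It is a simplified matcher that handles only `*` and `**` (GitHub's filter syntax also has `?`, `+`, `!` negation, and more), and the pattern list is an assumption reconstructed from the description above.

```js
// Simplified glob matcher: "**" matches across directories, "*" within one path segment.
// Not a full implementation of GitHub's filter pattern syntax.
function globToRegExp(glob) {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        source += ".*";
        i++; // consume the second "*"
      } else {
        source += "[^/]*";
      }
    } else {
      source += ch.replace(/[\\^$.|?+()[\]{}]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${source}$`);
}

// A pull_request trigger with `paths:` runs only if some changed file matches some pattern.
function wouldTrigger(changedFiles, pathPatterns) {
  const regexes = pathPatterns.map(globToRegExp);
  return changedFiles.some((file) => regexes.some((re) => re.test(file)));
}

// Assumed filter, based on the description of container-scan.yml above.
const CONTAINER_SCAN_PATHS = ["containers/**", ".github/workflows/container-scan.yml"];
```

For example, `wouldTrigger(["src/docker-manager.ts"], CONTAINER_SCAN_PATHS)` returns `false`, so a PR touching only that file gets no container scan.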

#### 3. No Shell Script Linting (ShellCheck)
The repository contains several security-critical shell scripts in `containers/agent/` (`setup-iptables.sh`, `entrypoint.sh`) and `scripts/ci/`; they implement iptables rules and cleanup logic. No ShellCheck step runs in CI to catch issues such as unquoted variables, command-injection risks, or portability problems.

#### 4. Integration Test Reliability — Multiple Active Failures
Six integration-level workflows are currently failing. Without reliable green integration tests, PRs cannot confidently use these as quality gates. The failing workflows represent the most critical end-to-end verification of the firewall's core functionality.

#### 5. No Required Status Checks Documented/Enforced
There is no documented set of **required** passing checks that block PR merge. The PR title check runs, but branch protection rules are not visible from the repository contents, so it is unclear which of the many workflows are enforced as merge blockers.

---

### 🟡 Medium Priority

#### 6. No Coverage Trend Tracking (Codecov/Coveralls)
`test-coverage.yml` generates LCOV reports and uploads them as artifacts, but does not integrate with a coverage trend service (Codecov, Coveralls, etc.). This means:
- No coverage badge in README
- No historical trend visibility
- No dedicated PR status check for per-file coverage diffs (only the bot comment)

`COVERAGE_SUMMARY.md` in the repo is a static snapshot, not a living metric.

#### 7. Smoke Tests Are Reaction-Gated, Not Auto-Run on All PRs
The agentic smoke tests (`smoke-claude.md`, `smoke-copilot.md`, `smoke-codex.md`, `smoke-gemini.md`) require emoji reactions to trigger on PRs (e.g., 👍, ❤️, 🎉). While they do run on a 12h schedule, changes in a PR aren't automatically smoke-tested against all engines unless a maintainer adds the reaction. This creates a window where breaking changes can merge without live firewall validation.

The exception is `smoke-chroot.md`, which has path filters on `src/**` and `containers/**`, but it is currently failing.

#### 8. No ARM64 Runner Tests Despite ARM64 Documentation
`docs/compatibility.md` references ARM64 support, but no workflow runs tests on `ubuntu-24.04-arm` or an equivalent ARM64 runner. All CI runs use `ubuntu-latest` (x86-64). Binary builds and container tests should also be validated on ARM64.

#### 9. No Artifact Size Monitoring
There's no check on the size of build artifacts (`dist/`, binary files via `pkg`). Over time, dependency additions can silently inflate the CLI binary size. A size threshold check would catch this early.
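A size budget check is small enough to live in a plain Node script. The sketch below is hypothetical: the budget map, its paths, and the 512 KiB figure are placeholders (the actual `pkg` binary output paths are not stated here), and only `dist/cli.js` comes from the build workflow description above.

```js
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical budgets in bytes, keyed by path relative to the repository root.
// Set each one to the current size plus some headroom.
const DEFAULT_BUDGETS = { "dist/cli.js": 512 * 1024 };

// Returns one entry per artifact that is missing or larger than its budget.
function checkSizeBudgets(rootDir, budgets = DEFAULT_BUDGETS) {
  const violations = [];
  for (const [relPath, maxBytes] of Object.entries(budgets)) {
    const fullPath = path.join(rootDir, relPath);
    if (!fs.existsSync(fullPath)) {
      violations.push({ file: relPath, reason: "missing" });
      continue;
    }
    const size = fs.statSync(fullPath).size;
    if (size > maxBytes) {
      violations.push({ file: relPath, reason: `${size} bytes exceeds budget of ${maxBytes} bytes` });
    }
  }
  return violations;
}
```

A CI step could call `checkSizeBudgets(process.cwd())` after the build (e.g. `npm run build`, script name assumed), print a GitHub Actions `::error file=<path>::<message>` annotation for each violation, and exit non-zero when the list is non-empty.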

#### 10. Missing Mutation Testing
Coverage shows that the unit tests execute code paths, not that they verify the behavior of every branch. Mutation testing (e.g., Stryker.js) would reveal whether the tests actually detect injected bugs or merely produce coverage numbers.
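The difference between coverage and mutation testing fits in a few lines. The example below is hypothetical (the function is not from the repository): a mutant flips `>=` to `>`, a weak test executes every line yet cannot tell the two versions apart, and a test that probes the boundary "kills" the mutant.

```js
// Hypothetical function under test.
function isAllowedPort(port) {
  return port >= 1 && port <= 65535;
}

// A mutant of the kind a mutation tool generates: `>=` replaced by `>`.
function isAllowedPortMutant(port) {
  return port > 1 && port <= 65535;
}

// Weak test: runs every line of the function but never probes the lower boundary.
const weakTest = (impl) => impl(443) === true && impl(70000) === false;

// Strong test: also checks the boundary values, so the mutant fails it.
const strongTest = (impl) => weakTest(impl) && impl(1) === true && impl(0) === false;

// A mutant "survives" when the test passes against both the original and the mutant.
function mutantSurvives(test) {
  return test(isAllowedPort) && test(isAllowedPortMutant);
}
```

Here `mutantSurvives(weakTest)` is `true` and `mutantSurvives(strongTest)` is `false`. Stryker automates this loop across many generated mutants and reports a mutation score, the share of mutants the suite kills.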

---

### 🟢 Low Priority

#### 11. No Broken Link Checking for Documentation
`docs-site/` is deployed to GitHub Pages but there's no link validation in CI. Broken internal/external links in documentation degrade developer experience silently.
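The core of a link check is extracting link targets and verifying that each one resolves. The sketch below is a hypothetical minimal version for relative links in Markdown sources only; documentation sites often use route-style links such as `/guides/foo/`, which depend on the site's routing and are better checked by a tool like `lychee` run against the built output.

```js
const fs = require("node:fs");
const path = require("node:path");

// Inline Markdown links: [text](target) or [text](target "title"). Images (![alt](src)) match too.
const LINK_PATTERN = /\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)/g;

// Returns link targets in `markdown` that should resolve to a file under `baseDir` but do not.
function findBrokenRelativeLinks(markdown, baseDir) {
  const broken = [];
  for (const match of markdown.matchAll(LINK_PATTERN)) {
    const target = match[1];
    const isExternal = /^[a-z][a-z0-9+.-]*:/i.test(target); // https:, mailto:, ...
    const isAnchorOnly = target.startsWith("#");
    const isSiteRoot = target.startsWith("/"); // depends on site routing; not checked here
    if (isExternal || isAnchorOnly || isSiteRoot) continue;
    const filePart = target.split("#")[0]; // drop any "#fragment"
    if (!fs.existsSync(path.resolve(baseDir, filePart))) broken.push(target);
  }
  return broken;
}
```

Called once per Markdown file with that file's directory as `baseDir`, a non-empty result would fail the check.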

#### 12. No Changelog/Commit Validation Beyond Title
While PR titles are validated for Conventional Commits format, there's no validation that `CHANGELOG` or release notes are updated for feature PRs, and no conventional-changelog generation in the release workflow.
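A changelog gate can build on the Conventional Commits title that is already validated. The sketch below encodes one possible policy, an assumption rather than anything the repository defines: `feat` PRs and PRs marked breaking with `!` must touch a changelog file, whose name (`CHANGELOG.md`) is also assumed.

```js
// Conventional Commits title: type, optional (scope), optional "!" for breaking, then ": ".
const TITLE_PATTERN = /^(?<type>[a-z]+)(?:\((?<scope>[^)]+)\))?(?<breaking>!)?: \S/;

// Hypothetical policy: features and breaking changes need a changelog entry.
function changelogRequired(prTitle) {
  const match = TITLE_PATTERN.exec(prTitle);
  if (!match) return false; // malformed titles are already rejected by the PR title check
  return match.groups.type === "feat" || match.groups.breaking === "!";
}

function checkChangelog(prTitle, changedFiles, changelogPath = "CHANGELOG.md") {
  if (!changelogRequired(prTitle)) return { ok: true };
  return changedFiles.includes(changelogPath)
    ? { ok: true }
    : { ok: false, message: `"${prTitle}" needs an entry in ${changelogPath}` };
}
```

In a workflow, `changedFiles` would come from the PR's file list; a `fix:` PR passes without a changelog change, while a `feat:` PR does not.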

#### 13. No Code Complexity Enforcement
No cyclomatic complexity or cognitive complexity limits are enforced. `docker-manager.ts` (250 statements, 81 branches) is already a complexity hotspot with 18% coverage.

#### 14. `pelis-agent-factory-advisor` Workflow Not Compiled
The `agenticworkflows-status` output shows `pelis-agent-factory-advisor` has `compiled: No`. This workflow will not execute as intended since the lock file is missing or stale.
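A cheap CI guard could catch missing lock files before they reach `main`. The sketch below assumes gh-aw's convention of compiling `name.md` to `name.lock.yml` in the same directory; it is hypothetical, would flag any non-workflow Markdown placed in that directory, and only detects missing (not stale) lock files. Recompiling in CI and failing on a resulting `git diff` could cover staleness as well.

```js
const fs = require("node:fs");

// Returns agentic workflow sources (*.md) that have no compiled *.lock.yml next to them.
function findUncompiledWorkflows(workflowDir = ".github/workflows") {
  const entries = new Set(fs.readdirSync(workflowDir));
  return [...entries]
    .filter((name) => name.endsWith(".md"))
    .filter((name) => !entries.has(name.replace(/\.md$/, ".lock.yml")))
    .sort();
}
```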

---

## 📋 Actionable Recommendations

### 1. Raise Coverage Thresholds Incrementally
**Issue:** Thresholds are low (30-38%), and because they are global, CI accepts near-zero coverage for critical files.  
**Solution:** Raise the global thresholds by about 5 percentage points per quarter toward a target such as the one below, and add per-file thresholds for `cli.ts` and `docker-manager.ts`:
```js
coverageThreshold: {
  global: { branches: 45, functions: 50, lines: 50, statements: 50 },
  './src/cli.ts': { lines: 20 },
  './src/docker-manager.ts': { lines: 30 }
}
```
**Complexity:** Low | **Impact:** High
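The incremental schedule is simple arithmetic. The hypothetical helper below turns a current threshold and a target into quarter-by-quarter values, assuming "5%" means 5 percentage points and capping the last step at the target.

```js
// Plan quarterly threshold increases of `stepPoints` percentage points, capped at `target`.
function planThresholdRamp(current, target, stepPoints = 5) {
  if (stepPoints <= 0) throw new RangeError("stepPoints must be positive");
  const steps = [];
  let value = current;
  while (value < target) {
    value = Math.min(value + stepPoints, target);
    steps.push(value);
  }
  return steps;
}
```

For statements, `planThresholdRamp(38, 50)` returns `[43, 48, 50]`, i.e. three quarters to reach the example target; for branches, `planThresholdRamp(30, 45)` returns `[35, 40, 45]`. Each step only holds if enough new tests land in that quarter.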

### 2. Add ShellCheck to CI
**Issue:** No linting for security-critical shell scripts.  
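**Solution:** Add a job to `lint.yml`:
```yaml
# Add under the existing `jobs:` key.
shellcheck:
  name: ShellCheck
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Run ShellCheck
      # Consider pinning to a release tag or commit SHA instead of @master.
      # With no `scandir`, the action searches from '.', covering containers/ and scripts/ci/.
      uses: ludeeus/action-shellcheck@master
```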
**Complexity:** Low | **Impact:** High

### 3. Run Container Scan on All PRs (Remove Path Filter)
**Issue:** Security scan only runs when `containers/**` changes.  
**Solution:** Remove the `paths:` filter from `container-scan.yml` PR trigger, or add a separate lightweight scan job that runs on every PR.  
**Complexity:** Low | **Impact:** High

### 4. Compile the `pelis-agent-factory-advisor` Workflow
**Issue:** Workflow shows `compiled: No` and won't execute.  
**Solution:** Run `gh aw compile .github/workflows/pelis-agent-factory-advisor.md` and commit the generated lock file.  
**Complexity:** Low | **Impact:** Medium

### 5. Integrate Coverage with Codecov
**Issue:** No trend tracking or PR coverage diff as a dedicated check.  
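**Solution:** Add to `test-coverage.yml`:
```yaml
- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v5
  with:
    files: ./coverage/lcov.info
    fail_ci_if_error: false
    # Codecov generally needs an upload token unless tokenless uploads are enabled for the repository.
    token: ${{ secrets.CODECOV_TOKEN }}
```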
**Complexity:** Low | **Impact:** Medium

### 6. Document and Enforce Required Status Checks
**Issue:** No clear documentation of which checks must pass before merge.  
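**Solution:** Update branch protection rules for `main` to require: `Build Verification`, `ESLint`, `TypeScript Type Check`, `Test Coverage Report`, `PR Title Check`, and `CodeQL`. Branch protection matches check names (for GitHub Actions, the job names shown on a PR), so use the exact names from a PR's checks list. Document these in `CONTRIBUTING.md`.  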
**Complexity:** Low | **Impact:** High
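A small report can confirm that the chosen names actually appear on PRs and pass. The sketch below is hypothetical; it assumes check runs shaped like GitHub's check-run objects (`name`, `status`, `conclusion`), for example from `GET /repos/{owner}/{repo}/commits/{ref}/check-runs`, ordered oldest to newest. It treats `success`, `skipped`, and `neutral` as passing, which matches how GitHub evaluates required checks.

```js
const PASSING = new Set(["success", "skipped", "neutral"]);

// Hypothetical pre-merge report: required checks that are missing, pending, or not passing.
function requiredCheckGaps(requiredNames, checkRuns) {
  const latestByName = new Map();
  for (const run of checkRuns) latestByName.set(run.name, run); // later entries win
  const gaps = [];
  for (const name of requiredNames) {
    const run = latestByName.get(name);
    if (!run) {
      gaps.push({ name, problem: "never reported (name mismatch or workflow not triggered)" });
    } else if (run.status !== "completed") {
      gaps.push({ name, problem: `still ${run.status}` });
    } else if (!PASSING.has(run.conclusion)) {
      gaps.push({ name, problem: `concluded ${run.conclusion}` });
    }
  }
  return gaps;
}
```

A "never reported" entry on a PR that should have run the check usually points to a name mismatch between branch protection and the job name.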

### 7. Add ARM64 Runner Job
**Issue:** No ARM64 CI validation despite documented support.  
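**Solution:** Extend the `build.yml` matrix with an ARM64 runner and point `runs-on` at it:
```yaml
runs-on: ${{ matrix.os }}
strategy:
  matrix:
    # ubuntu-24.04-arm is a GitHub-hosted arm64 runner (availability depends on repo visibility/plan).
    os: [ubuntu-latest, ubuntu-24.04-arm]
    # Assumed key name; reuse build.yml's existing Node 20/22 matrix entry.
    node-version: [20, 22]
```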
**Complexity:** Medium | **Impact:** Medium

### 8. Auto-trigger Smoke Tests on `src/**` Changes
**Issue:** Smoke tests require manual reactions for PR-level validation.  
**Solution:** Add path filters to smoke workflow triggers so that PRs touching `src/**` or `containers/**` automatically run at least one smoke test without requiring a reaction. Keep reactions as an additional trigger.  
**Complexity:** Low | **Impact:** Medium

### 9. Fix Active Integration Test Failures
**Issue:** 6 workflows currently failing, undermining test reliability.  
**Solution:** Investigate and fix the failing `Smoke Chroot`, `Examples Test`, `Chroot Integration Tests`, and `Build Test` (Rust, Deno, .NET) failures before adding new quality gates.  
**Complexity:** Medium | **Impact:** High (prerequisite for reliable gates)
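When triaging the failures, it helps to separate flaky workflows from consistently broken ones. The heuristic below is a hypothetical sketch, not a definitive diagnosis; it assumes run records shaped like the objects returned by the GitHub Actions "list workflow runs" REST endpoint (`name`, `head_sha`, `conclusion`). Mixed results on the same commit (for example, repeated scheduled runs on an unchanged `main`) point to flakiness, while a workflow that only ever fails needs a real fix for a code change or environment drift.

```js
// Classify each workflow as "flaky", "failing", "passing", or "mixed".
function classifyWorkflowHealth(runs) {
  const byWorkflow = new Map(); // workflow name -> (head_sha -> Set of conclusions)
  for (const { name, head_sha: sha, conclusion } of runs) {
    if (conclusion !== "success" && conclusion !== "failure") continue; // ignore cancelled, skipped, pending
    if (!byWorkflow.has(name)) byWorkflow.set(name, new Map());
    const bySha = byWorkflow.get(name);
    if (!bySha.has(sha)) bySha.set(sha, new Set());
    bySha.get(sha).add(conclusion);
  }
  const result = {};
  for (const [name, bySha] of byWorkflow) {
    const perCommit = [...bySha.values()];
    const all = perCommit.flatMap((set) => [...set]);
    if (perCommit.some((set) => set.has("success") && set.has("failure"))) result[name] = "flaky";
    else if (all.every((c) => c === "failure")) result[name] = "failing";
    else if (!all.includes("failure")) result[name] = "passing";
    else result[name] = "mixed"; // passed on some commits, failed on others
  }
  return result;
}
```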

### 10. Add Documentation Link Checker
**Issue:** No broken link validation for `docs-site/`.  
**Solution:** Add a workflow step using `lychee` or `markdown-link-check` to validate links in docs as part of the `deploy-docs.yml` build step.  
**Complexity:** Low | **Impact:** Low

---

## 📈 Metrics Summary

| Metric | Value |
|---|---|
| Total workflow files | 44 (29 agentic .md + 15 YAML) |
| PR-triggered workflows | ~15 |
| Scheduled workflows | ~12 |
| Current overall statement coverage | ~38% |
| Coverage threshold (statements) | 38% |
| `cli.ts` coverage | **0%** |
| `docker-manager.ts` coverage | **18%** |
| Unit tests passing | 135/135 |
| Recent integration test failures | **6** |
| Security workflows active | 6 (CodeQL, Trivy, npm audit, security-guard, secret-diggers, dependency-monitor) |

> *Assessment generated on 2026-02-24. Coverage data from `COVERAGE_SUMMARY.md`. Workflow run data from GitHub Actions API.*

---

> **Note:** This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
>
> **Tip:** Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.




> Generated by [CI/CD Pipelines and Integration Tests Gap Assessment](https://github.com/github/gh-aw-firewall/actions/runs/22372465873)
> - [x] expires  on Mar 3, 2026, 10:22 PM UTC
