Gating criteria for releasing the dual-mode Agent Mail (MCP server + CLI).
Primary Beads: br-3vwi.12.1, br-3vwi.12.2

Track: br-3vwi.12 (Rollout governance, release gates, feedback loop)

Last Updated: 2026-02-13
- Phase 0 packet complete: CI + local validation evidence attached
- Phase 1 packet complete: 24-48h canary metrics + incident log review attached
- Phase 2 packet complete: 25%/50%/100% ring promotions each signed separately
- Phase 3 packet complete: GA sign-off + ongoing monitoring owners recorded
- Kill-switch owner for each V2 surface is named and on-call reachable
- Rollback communication template reviewed and ready for use

| Gate family | Hard threshold (promotion blocker) | Machine-check source | Artifact evidence |
|---|---|---|---|
| Unit + integration correctness | Pass rate = 100% (fail=0) | `cargo test --workspace` and CI gate entry `Unit + integration tests=status:pass` | CI logs + gate report JSON |
| Dual-mode E2E correctness | Pass rate = 100% (fail=0) for E2E dual-mode and E2E mode matrix | `am e2e run --project . dual_mode`, `am e2e run --project . mode_matrix`, and CI gate report | `tests/artifacts/dual_mode/*/run_summary.json` |
| Security/privacy | Pass rate = 100% (fail=0) for E2E security/privacy | `am e2e run --project . security_privacy` and CI gate report | `tests/artifacts/security_privacy/*/*` |
| Accessibility | Pass rate = 100% (fail=0) for E2E TUI accessibility | `am e2e run --project . tui_a11y` and CI gate report | `tests/artifacts/tui_a11y/*/*` |
| Cross-platform native command portability | Pass rate = 100% (fail=0) for native command matrix on Linux/macOS/Windows | CI job `native-command-matrix` in `.github/workflows/ci.yml` | `tests/artifacts/cli/native_command_matrix/<os>/summary.json` |
| Performance budgets | `perf_security_regressions=status:pass` + `perf_guardrails=status:pass` with no budget/delta violations | `cargo test -p mcp-agent-mail-cli --test perf_security_regressions -- --nocapture`, `cargo test -p mcp-agent-mail-cli --test perf_guardrails -- --nocapture`, and CI gate report | `tests/artifacts/cli/perf_security/*`, `tests/artifacts/cli/perf_guardrails/*`, benchmark artifacts |
| Determinism | Golden/export checks report zero mismatches | `am golden verify` and static export tests | `benches/golden/checksums.sha256`, `tests/artifacts/share/*/*` |
| Automation/governance | CI report has `decision="go"`, `release_eligible=true`, and sign-off row completed | `am ci --report tests/artifacts/ci/gate_report.json` | `tests/artifacts/ci/gate_report.json`, sign-off ledger row |

| Gate family | Required bead outputs | Evidence path |
|---|---|---|
| Unit/integration + harness coverage | br-3vwi.10 track outputs (`mode_matrix_harness`, `semantic_conformance`) | CI logs + `tests/artifacts/dual_mode/*` |
| Security/privacy | br-3vwi.10.14 security/privacy E2E suite | `tests/artifacts/security_privacy/*` |
| Accessibility | br-3vwi.10.13 keyboard/focus/contrast suite | `tests/artifacts/tui_a11y/*` |
| Cross-platform portability | br-3lc7f native command matrix evidence | `tests/artifacts/cli/native_command_matrix/<os>/summary.json` |
| Performance | br-3vwi.10.11 perf regression script pack | perf regression logs and trend artifacts |
| Deterministic replay/export | br-3vwi.10.19 + br-3vwi.10.22 | replay artifacts + share/export artifacts |
| Rollout governance + operator readiness | br-3vwi.11.1 + br-3vwi.12.1 + br-3vwi.12.2 | `docs/ROLLOUT_PLAYBOOK.md`, `docs/RELEASE_CHECKLIST.md`, CI gate report |

- `am serve-http` starts server + TUI with one command
- `mcp-agent-mail serve --no-tui` runs headless server
- `am serve-http --path api` / `am serve-http --path mcp` switches transport modes
- `am serve-http --no-auth` disables authentication for local dev
- Auth token auto-discovered from `~/.mcp_agent_mail/.env`
- All 34 MCP tools respond correctly
- All 20+ MCP resources return correct data
- Startup probes catch and report common failures (port, storage, DB)
- Graceful shutdown flushes commit queue
- Native deploy verification path available: `am share deploy verify-live <url> --bundle <bundle_dir>`
- MCP binary (`mcp-agent-mail`) denies CLI-only commands with exit 2
- CLI binary (`am`) accepts all 22+ command families
- Denial message includes command name, allowed commands, and remediation hint
- No env variable (`INTERFACE_MODE`, etc.) can bypass the denial gate
- Case variants of allowed commands are denied (e.g., `Serve`, `CONFIG`)
- `mcp-agent-mail serve --help` exits 0
- `mcp-agent-mail config` exits 0
- All CLI parity commands implemented (messaging, contacts, reservations, agents, tooling)
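
The exit-code expectations in this list (exit 2 for denied CLI-only commands on the MCP binary, exit 0 for `serve --help` and `config`) can be checked from a script. The following is a minimal sketch, assuming both binaries are on `PATH`; `expect_exit` is a hypothetical helper name and is not part of the project.

```bash
# expect_exit is a hypothetical helper: run a command quietly and compare
# its exit code with the expected one.
# Usage: expect_exit <expected_code> <command> [args...]
expect_exit() {
  local expected="$1"
  shift
  local actual=0
  # Capture the exit code without aborting under `set -e`.
  "$@" >/dev/null 2>&1 || actual=$?
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: '$*' exited $actual"
  else
    echo "FAIL: '$*' exited $actual, expected $expected"
    return 1
  fi
}
```

For example, `expect_exit 2 mcp-agent-mail share` and `expect_exit 0 mcp-agent-mail serve --help` would mirror two of the items above. The helper checks only the exit code, so the content of the denial message still needs a separate look.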
- Dashboard: event stream, sparkline, counters
- Messages: browse, search, filter
- Threads: correlation, drill-down
- Agents: roster with recency indicators
- Reservations: TTL countdowns, status
- Tool Metrics: per-tool latency, call counts
- System Health: connection probes, disk/memory
- Command palette (Ctrl+P) with all actions
- Help overlay (?) with screen-specific keybindings
- Theme cycling (Shift+T) across 5 themes
- MCP/API mode toggle (m)
- Workspace tests pass (`cargo test`, 1000+ tests)
- Conformance tests pass (`cargo test -p mcp-agent-mail-conformance`)
- Clippy clean: `cargo clippy --workspace -- -D warnings`
- Format clean: `cargo fmt --all -- --check`
- No keybinding conflicts (automated test)
- E2E: `am` starts and reaches ready state
- E2E: TUI interaction flows (search, timeline, palette)
- E2E: MCP/API mode switching
- E2E: stdio transport
- E2E: CLI commands
- E2E: verify-live failure matrix + compatibility-wrapper delegation checks
- Stress tests pass (concurrent agents, pool exhaustion)
- Mode matrix harness: 22 CLI-allow + 16 MCP-deny + 2 MCP-allow (`cargo test -p mcp-agent-mail-cli --test mode_matrix_harness`)
- Semantic conformance: 10 SC tests (DB parity, validation, drift report) (`cargo test -p mcp-agent-mail-cli --test semantic_conformance`)
- Perf/security regressions: 13 tests (latency budgets, bypass attempts) (`cargo test -p mcp-agent-mail-cli --test perf_security_regressions`)
- Perf migration guardrails: native-vs-legacy budgets + unavailable rationale capture (`cargo test -p mcp-agent-mail-cli --test perf_guardrails`)
- Help snapshots match golden fixtures (`cargo test -p mcp-agent-mail-cli --test help_snapshots`)
- E2E dual-mode: 84+ assertions (7 sections) (`am e2e run --project . dual_mode`)
- E2E mode matrix: 42+ assertions (`am e2e run --project . mode_matrix`)
- Startup probes complete in <2 seconds
- Event ring buffer bounded (no memory leak under load)
- Commit coalescer batches effectively under load
- DB pool sized appropriately (25 + 75 overflow)
- No sustained lock contention at steady state
- README includes dual-mode interface section
- Operator runbook: startup, controls, troubleshooting, diagnostics
- Developer guide: adding screens, actions, keybindings, tests
- Recovery runbook: SQLite corruption, archive rebuild
- ADR-001: dual-mode invariants documented
- Migration guide: before/after command mappings
- Legacy script shim deprecation/rollback policy documented (`docs/SPEC-script-migration-matrix.md`, T10.5)
- Rollout playbook: phased plan + kill-switch procedure
- AGENTS.md: dual-mode reminder for agents
- Verify-live contract + compatibility strategy documented (`docs/SPEC-verify-live-contract.md`)

Before release, verify test artifacts are consistent and complete:

```bash
# 1. Dual-mode E2E artifacts exist and show 0 failures
ls tests/artifacts/dual_mode/*/run_summary.json
cat tests/artifacts/dual_mode/*/run_summary.json
# e2e_fail must be 0, e2e_pass >= 84

# 2. No failure bundles generated
ls tests/artifacts/dual_mode/*/failures/
# Should be empty (no fail_*.json files)

# 3. Per-step structured logs exist
ls tests/artifacts/dual_mode/*/steps/step_*.json | wc -l
# Should be >= 42 (one per test step)

# 4. Golden snapshot checksums are current
am golden verify
# All checksums must match

# 4b. Native check-inbox command is available
am check-inbox --help
# Compatibility shim fallback (only for native regressions):
# PATH="/data/tmp/cargo-target/release:$PATH" legacy/hooks/check_inbox.sh --help

# 5. Verify-live E2E artifacts exist (native path authoritative)
am e2e run --project . share_verify_live
ls tests/artifacts/share_verify_live/*/case_*/command.txt
ls tests/artifacts/share_verify_live/*/case_*/check_trace.jsonl
# If command unavailable in current binary, suite emits deterministic SKIP with reason

# 6. Golden denial fixtures exist
ls tests/fixtures/golden_snapshots/mcp_deny_*.txt
# At least 5 files (share, guard, doctor, archive, migrate)

# 7. Machine-readable gate report exists and is release-eligible
am ci --report tests/artifacts/ci/gate_report.json
jq '.decision, .release_eligible, .thresholds, (.gates | length)' tests/artifacts/ci/gate_report.json
# decision must be "go", release_eligible must be true, thresholds must have zero failed_gates
```

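
Checks 2 and 3 above rely on reading `ls` output by eye. As a rough sketch of how they could be turned into a pass/fail check, the function below counts failure bundles and per-step logs with `find`. `check_dual_mode_artifacts` is a hypothetical name, and the directory layout is assumed to match the paths used in the commands above.

```bash
# check_dual_mode_artifacts is a hypothetical helper. Assumed layout:
#   <root>/<run>/failures/fail_*.json   (must not exist)
#   <root>/<run>/steps/step_*.json      (at least min_steps files)
# Usage: check_dual_mode_artifacts [root] [min_steps]
check_dual_mode_artifacts() {
  local root="${1:-tests/artifacts/dual_mode}"
  local min_steps="${2:-42}"
  local failures steps

  # Count failure bundles; any fail_*.json blocks the release.
  failures="$(find "$root" -path '*/failures/fail_*.json' -type f 2>/dev/null | wc -l)"
  # Count per-step structured logs across all runs under root.
  steps="$(find "$root" -path '*/steps/step_*.json' -type f 2>/dev/null | wc -l)"

  # wc may pad its output with spaces on some platforms; normalise to integers.
  failures=$((failures))
  steps=$((steps))

  echo "failures=$failures steps=$steps"
  [ "$failures" -eq 0 ] && [ "$steps" -ge "$min_steps" ]
}
```

The function returns non-zero when either condition fails, so it can gate a release script directly. Like the `ls ... | wc -l` check it replaces, it counts steps across every run directory rather than per run.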
- Run the full CI suite (includes all dual-mode gates):

  ```bash
  am ci --report tests/artifacts/ci/gate_report.json
  ```

  Confirm gate report decision:

  ```bash
  jq '.decision' tests/artifacts/ci/gate_report.json  # "go" required for promotion
  ```

- Or run individual gates:

  ```bash
  cargo test --workspace
  cargo test -p mcp-agent-mail-conformance
  cargo test -p mcp-agent-mail-cli --test mode_matrix_harness
  cargo test -p mcp-agent-mail-cli --test semantic_conformance
  cargo test -p mcp-agent-mail-cli --test perf_security_regressions
  cargo test -p mcp-agent-mail-cli --test perf_guardrails
  cargo test -p mcp-agent-mail-cli --test help_snapshots
  am e2e run --project . dual_mode
  am e2e run --project . mode_matrix
  am e2e run --project . security_privacy
  am e2e run --project . tui_a11y
  ```

- Manual smoke test:

  ```bash
  # MCP denial gate works:
  mcp-agent-mail share 2>&1   # Should exit 2 with denial message

  # CLI accepts all commands:
  am share --help             # Should exit 0
  am doctor check --json      # Should exit 0 with JSON output

  # Native deployment validation path:
  am share deploy verify-live https://example.github.io/agent-mail --bundle /tmp/agent-mail-bundle --json > /tmp/verify-live.json
  jq '.verdict, .summary' /tmp/verify-live.json

  # Cloudflare Pages tooling + validation path:
  am share deploy tooling /tmp/agent-mail-bundle
  test -f /tmp/agent-mail-bundle/.github/workflows/deploy-cf-pages.yml
  test -f /tmp/agent-mail-bundle/wrangler.toml.template
  am share deploy verify-live https://example.pages.dev --bundle /tmp/agent-mail-bundle --json > /tmp/verify-live-cf.json
  jq '.verdict, .summary' /tmp/verify-live-cf.json

  # MCP server starts:
  mcp-agent-mail serve --help # Should exit 0
  ```

- Start `am` and verify TUI:
  - TUI renders correctly (all 11 screens load)
  - Keybindings respond (Tab, 1-8, ?, q)
  - Command palette opens (Ctrl+P)
  - System Health shows green status

- Test headless mode:

  ```bash
  mcp-agent-mail serve --no-tui &
  curl -s http://127.0.0.1:8765/mcp/ \
    -H "Authorization: Bearer $HTTP_BEARER_TOKEN" \
    -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
  # Should return 34 tools
  ```

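
The "should return 34 tools" expectation in the headless check can be asserted rather than counted by hand. This sketch assumes the endpoint answers with a plain JSON-RPC body whose tool list sits at `.result.tools` (the usual shape of an MCP `tools/list` result) and that `jq` is installed; `count_tools` and `assert_tool_count` are hypothetical helper names.

```bash
# count_tools is a hypothetical helper: read a JSON-RPC tools/list response
# on stdin and print the number of tools it lists (requires jq).
# Assumption: the tool array lives at .result.tools.
count_tools() {
  jq '.result.tools | length'
}

# assert_tool_count: succeed only if the response on stdin lists exactly
# the expected number of tools.
# Usage: <response> | assert_tool_count <expected>
assert_tool_count() {
  local expected="$1"
  local actual
  actual="$(count_tools)" || return 1
  [ "$actual" -eq "$expected" ]
}
```

Piping the `curl` call from the headless check into `assert_tool_count 34` would then fail the step on any other count. If the server frames its reply differently (for example as an event stream), the body would need unwrapping before it reaches `jq`.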

| Signal | How to check | Expected range | Action if abnormal |
|---|---|---|---|
| MCP denial rate | `grep "not an MCP server command" <logs>` | 0 from agents | Investigate agent config |
| CLI exit codes | Operator workflow logs | Exit 0 for all commands | Check migration guide |
| DB lock contention | `resource://tooling/locks` | No increase from baseline | Check pool sizing |
| Tool latency p95 | `resource://tooling/metrics` | Within baseline SLOs | Profile hot path |
| Disk usage growth | `du -sh ~/.mcp_agent_mail/` | Stable growth rate | Check archive retention |
| Error rate | Application logs | No new error classes | Triage by error type |

| Error class | Severity | Likely cause | Runbook action |
|---|---|---|---|
| Exit code 2 from agent sessions | High | Agent invoking CLI command on MCP binary | Fix agent config |
| "not an MCP server command" in MCP logs | Medium | Misrouted command | Check binary path in config |
| "database is locked" spike | Medium | Pool exhaustion under new load | Increase pool size |
| Panic/backtrace in denial stderr | Critical | Bug in denial gate | Activate kill-switch |
| CLI command succeeds on MCP binary | Critical | Denial gate bypass | Activate kill-switch immediately |
- Non-critical: File a bead, fix in next release
- Medium: Fix within 24 hours, deploy hotfix
- High/Critical: Execute kill-switch procedure
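
A first-pass triage of a single log file against the error-class table could look like the sketch below. `triage_log` is a hypothetical helper, and it simplifies in two ways that should be kept in mind: it treats any panic or backtrace line as critical, and it flags the mere presence of "database is locked" rather than a spike, since judging a spike needs a baseline.

```bash
# triage_log is a hypothetical helper: scan one log file for the log
# signatures named in the error-class table and print the highest
# severity found ("critical", "medium", or "none").
# Usage: triage_log <log_file>
triage_log() {
  local log="$1"
  if grep -qiE 'panic|backtrace' "$log"; then
    # Possible bug in the denial gate: kill-switch territory.
    echo "critical"
  elif grep -qF 'not an MCP server command' "$log" \
    || grep -qF 'database is locked' "$log"; then
    # Misrouted command or lock contention: fix within 24 hours.
    echo "medium"
  else
    echo "none"
  fi
}
```

The two remaining rows of the table (exit code 2 from agent sessions, and a CLI command succeeding on the MCP binary) are not visible as log strings in this document, so they are left out of the sketch.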

```bash
# Count denial gate hits per log file (point the glob at the last hour's logs)
grep -c "not an MCP server command" /var/log/mcp-agent-mail/*.log

# Check for any panics
grep -i "panic\|backtrace" /var/log/mcp-agent-mail/*.log

# Tool latency percentiles (via MCP resource)
curl -s http://127.0.0.1:8765/mcp/ \
  -d '{"jsonrpc":"2.0","id":1,"method":"resources/read","params":{"uri":"resource://tooling/metrics"}}' \
  | jq '.result.contents[0].text | fromjson | .tools[] | {name, call_count, p95_ms}'

# Active locks
curl -s http://127.0.0.1:8765/mcp/ \
  -d '{"jsonrpc":"2.0","id":1,"method":"resources/read","params":{"uri":"resource://tooling/locks"}}' \
  | jq '.result.contents[0].text | fromjson | .summary'
```

| Surface | Activation boundary | Kill-switch action | Primary owner | Secondary owner |
|---|---|---|---|---|
| MCP interface mode | `AM_INTERFACE_MODE` policy + binary separation | Clear CLI mode env and redeploy MCP binary | Runtime owner | Release owner |
| CLI workflows (`am`) | CLI binary rollout ring | Roll back `am` to last-known-good release | CLI owner | Runtime owner |
| TUI console | `TUI_ENABLED=true` and launch profile | Restart with `--no-tui` | TUI owner | Runtime owner |
| Static export pipeline | publish workflow gates | Disable publish jobs and hold exports | Docs/release owner | CLI owner |
| Build slots/worktrees | `WORKTREES_ENABLED=true` only after canary | Set `WORKTREES_ENABLED=false` and restart | Runtime owner | Storage owner |
| Local auth posture | bearer/JWT policy | Re-enable strict auth (`HTTP_ALLOW_LOCALHOST_UNAUTHENTICATED=0`) | Security owner | Runtime owner |

| Milestone | Target timeline | Channel | Required payload |
|---|---|---|---|
| Incident acknowledged | <= 5 minutes | on-call chat | incident ID, suspected surface, owner |
| Kill-switch decision | <= 10 minutes | incident bridge | go/no-go with rationale |
| Rollback executed | <= 15 minutes | deployment channel | command/runbook step + env scope |
| Operator notice | <= 20 minutes | operator channel + thread | user impact + workaround |
| Evidence bundle posted | <= 60 minutes | bead thread + incident doc | logs, artifacts, reproduction |
Fill one row per phase promotion decision.
- Run full gates and emit report: `am ci --report tests/artifacts/ci/gate_report.json`.
- Confirm report fields: `decision == "go"` and `release_eligible == true`.
- Attach at least one artifact link per gate family (correctness, security/privacy, accessibility, performance, determinism).
- Record owner, UTC timestamp, and rationale in the ledger row for the phase transition.
- If any threshold fails, mark decision `no-go`, document blocker bead IDs, and do not promote.
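
The field check in this process can be scripted so that a promotion step fails closed. The sketch below assumes `jq` is available and that the report exposes top-level `decision` and `release_eligible` fields as described above; `promotion_decision` is a hypothetical helper name.

```bash
# promotion_decision is a hypothetical helper (requires jq). It prints "go"
# only when the report has decision == "go" AND release_eligible == true.
# Anything else, including a missing or unparseable report, yields "no-go"
# and a non-zero exit status.
# Usage: promotion_decision <gate_report.json>
promotion_decision() {
  local report="$1"
  # jq -e exits non-zero when the expression evaluates to false or null.
  if jq -e '.decision == "go" and .release_eligible == true' "$report" >/dev/null 2>&1; then
    echo "go"
  else
    echo "no-go"
    return 1
  fi
}
```

Treating a missing or malformed report as `no-go` is a deliberate choice in this sketch: a gate that cannot be read should not be allowed to pass.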

Run a non-quick gate report at least once every 24 hours during active rollout and within 12 hours before any promotion decision.

```bash
run_ts="$(date -u +%Y%m%d_%H%M%S)"
am ci --report "tests/artifacts/ci/${run_ts}/case_02_report.json"
jq '.decision, .release_eligible, .summary' "tests/artifacts/ci/${run_ts}/case_02_report.json"
```

Latest non-quick artifact snapshot:

- `tests/artifacts/ci/20260213_031050/case_02_report.json`
  - `decision="no-go"`, `release_eligible=false`, `summary={total:13, pass:4, fail:9, skip:0}`

Owner rotation (weekly, Monday 00:00 UTC handoff):
| Primary owner | Backup owner | Responsibility |
|---|---|---|
| Release owner | CI maintainer | Run non-quick report, update docs with latest artifact path + verdict, record ledger evidence links |
| CI maintainer | Agent integration lead | Verify report schema/completeness and flag stale artifact age (>24h) |
| Agent integration lead | On-call operator | Confirm rollout thread + bead updates reference latest artifact before promotion |
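
The stale-artifact check (>24h) assigned in the rotation table can be approximated from the report file's modification time. This is a sketch under the assumption that the file's mtime reflects when the report was generated; `artifact_is_stale` is a hypothetical helper name.

```bash
# artifact_is_stale is a hypothetical helper: succeeds (exit 0) when the
# report file is missing or was last modified more than max_age_min minutes
# ago. The default of 1440 minutes matches the 24h threshold above.
# Usage: artifact_is_stale <report_path> [max_age_min]
artifact_is_stale() {
  local report="$1"
  local max_age_min="${2:-1440}"
  # A report that does not exist is treated as stale.
  [ -f "$report" ] || return 0
  # find prints the path only if it was modified more than max_age_min ago.
  [ -n "$(find "$report" -mmin "+$max_age_min" 2>/dev/null)" ]
}
```

Passing `720` as the second argument would cover the 12-hour window required before a promotion decision. Copying or re-checking-out the file resets its mtime, so the timestamp embedded in the artifact path remains the more reliable record.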

| Phase | Decision (go/no-go) | Owner | UTC timestamp | Rationale | Evidence links |
|---|---|---|---|---|---|
| Phase 0 -> Phase 1 | | | | | |
| Phase 1 -> Phase 2 | | | | | |
| Phase 2 (25% -> 50%) | | | | | |
| Phase 2 (50% -> 100%) | | | | | |
| Phase 3 (GA confirmation) | | | | | |

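
Ledger rows can be generated rather than typed, which keeps the column order and timestamp format consistent. The sketch below prints one Markdown row in the ledger's column order; `ledger_row` is a hypothetical helper, and the default timestamp format (UTC, minute precision) is an assumption modelled on the `2026-02-13T03:17Z` style used elsewhere in this checklist.

```bash
# ledger_row is a hypothetical helper: print one Markdown row for the
# sign-off ledger (Phase, Decision, Owner, UTC timestamp, Rationale,
# Evidence links). The timestamp defaults to the current UTC time.
# Usage: ledger_row <phase> <decision> <owner> <rationale> <evidence> [timestamp]
ledger_row() {
  local phase="$1" decision="$2" owner="$3" rationale="$4" evidence="$5"
  local ts="${6:-$(date -u +%Y-%m-%dT%H:%MZ)}"
  # Only the two decisions the ledger allows are accepted.
  case "$decision" in
    go|no-go) ;;
    *) echo "decision must be go or no-go" >&2; return 2 ;;
  esac
  printf '| %s | %s | %s | %s | %s | %s |\n' \
    "$phase" "$decision" "$owner" "$ts" "$rationale" "$evidence"
}
```

For example, `ledger_row 'Phase 0 -> Phase 1' no-go 'Release owner' '9 of 13 gates failing' tests/artifacts/ci/20260213_031050/case_02_report.json` would produce a row matching the latest snapshot recorded above.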
- Latest release-candidate gate artifact is non-quick and stored at `tests/artifacts/ci/20260213_031050/case_02_report.json`
- Current reference non-quick artifact reviewed: `tests/artifacts/ci/20260213_031050/case_02_report.json` (`mode=full`, reviewed 2026-02-13T03:17Z)
- Gate decision is `go` and `release_eligible` is `true`
- Projected-vs-observed summary is updated in `docs/ROLLOUT_PLAYBOOK.md` Section 9
- Follow-up bead set is reviewed and triaged:
  - `br-3vwi.12.3.1` (SearchScope compile blockers)
  - `br-3vwi.12.3.2` (clippy rate-limiter lint blockers)
  - `br-3vwi.12.3.3` (non-quick gate cadence + artifact publication)
- Owner and timestamp recorded in the sign-off ledger row for this phase