Executive Summary
gh-aw-firewall has reached an impressive Level 4/5 agentic workflow maturity with ~27 active workflows spanning security, testing, documentation, CI/CD, and code quality automation. The most impactful near-term improvements are: fixing 3 uncompiled workflows that are currently broken, adding a workflow health meta-agent (Audit Workflows) to observe the growing agent ecosystem, and adding an Issue Triage agent to handle the evident backlog of unlabeled issues. Given the repository's security-critical domain, there is also a strong case for a Breaking Change Checker to protect CLI users.
Patterns Learned from Pelis Agent Factory
Key Patterns from the Documentation Site
The Pelis Agent Factory documents several workflow archetypes directly applicable here:
- Continuous Quality: Agents like Code Simplifier and Duplicate Code Detector run daily against recent commits, proposing PRs that preserve function while improving clarity. This is distinct from periodic cleanup sprints.
- Meta-Agents (Audit Workflows): When running 20+ agents, a dedicated meta-agent that monitors agent runs, tracks costs, and surfaces failures becomes essential. The factory's Audit Workflows and Metrics Collector fill this role with 93 audit discussions created.
- Fault Investigation with Proposals: CI Doctor doesn't just open issues; it analyzes logs, proposes fixes, and opens PRs directly (69% merge rate in the factory).
- Skip-if-match Guard Rails: Workflows use `skip-if-match` to prevent flooding when a similar PR/issue already exists, avoiding duplication from repeated scheduling.
- Chained Workflows: Issue Monster feeds issues to Copilot coding agent; the coding agent creates PRs; Sub Issue Closer and Mergefest handle the aftermath. Coordination multiplies individual workflow value.
- Cache-Memory for Persistence: The Issue Duplication Detector uses cache-memory to store issue signatures across runs, avoiding expensive full re-scans each time.
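The cache-memory pattern in the last item can be sketched roughly as follows. This is only an illustration, not the factory's implementation: `signatureOf`, `checkDuplicate`, and the in-memory `Map` standing in for the persisted cache are all assumptions, and a real detector would likely use semantic comparison rather than title normalization.

```typescript
// Hypothetical sketch: the Map stands in for whatever cache-memory persists between runs.
type IssueSignatureCache = Map<string, number>; // signature -> first issue number seen

// Lowercase the title, drop bracketed prefixes such as "[agentics]",
// and collapse punctuation and whitespace into single spaces.
function signatureOf(title: string): string {
  return title
    .toLowerCase()
    .replace(/\[[^\]]*\]/g, " ")
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Returns the earlier issue number if the signature was already cached;
// otherwise records the new issue and returns undefined.
function checkDuplicate(
  cache: IssueSignatureCache,
  issueNumber: number,
  title: string,
): number | undefined {
  const signature = signatureOf(title);
  const seen = cache.get(signature);
  if (seen !== undefined) return seen;
  cache.set(signature, issueNumber);
  return undefined;
}
```

Because only signatures are stored, each run needs to process just the newly opened issue instead of re-reading the whole tracker.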
Patterns from the githubnext/agentics Repository
The agentics repo provides reference implementations for:
- daily-test-improver: identifies coverage gaps and implements new tests incrementally (directly comparable to this repo's test-coverage-improver)
- link-checker: finds and fixes broken links in documentation sites
- import-workflow: slash-command workflow for importing reference workflows into new repos
- daily-workflow-sync: keeps workflows synchronized with upstream templates
How This Repo Compares
This repository matches or exceeds the factory's implementation in:
- Security automation (red-team scanning, daily reviews, PR guard, dependency monitoring)
- Multi-engine testing (smoke tests for Claude, Codex, Copilot, and Chroot)
- Documentation maintenance (doc-maintainer, CLI flag checker)
This repository lags in:
- Workflow observability (no meta-agent monitoring other agents)
- Continuous code quality (no code simplifier or duplicate detector)
- Release automation (no changeset/version bump agent)
- Issue organization (no issue triage labels, no sub-issue arborist)
Current Agentic Workflow Inventory
| Workflow | Purpose | Trigger | Assessment |
|---|---|---|---|
| `build-test-{bun,cpp,deno,go,java,node,rust}` | Test AWF works as firewall for 8 ecosystems | PR | Strong: unique domain-specific coverage |
| `build-test-dotnet` | .NET ecosystem smoke test | PR | |
| `ci-cd-gaps-assessment` | Analyze CI/CD pipeline gaps | Daily | Good: feeds continuous improvement |
| `ci-doctor` | Investigate CI failures, propose fixes | `workflow_run` | |
| `cli-flag-consistency-checker` | Sync CLI flags vs docs | Weekly | Good: relevant for CLI tool |
| `dependency-security-monitor` | Dependency vulnerability monitoring | Daily | Strong: security-appropriate |
| `doc-maintainer` | Sync docs with code changes | Daily | Good: essential for this complex tool |
| `issue-duplication-detector` | Flag duplicate issues | On issue open | Good: uses cache-memory pattern correctly |
| `issue-monster` | Dispatch issues to Copilot coding agent | Hourly / on issue open | |
| `pelis-agent-factory-advisor` | Agentic maturity analysis | Daily | This report |
| `plan` | `/plan` slash command for issue breakdown | Slash command | Good: useful ChatOps |
| `secret-digger-{claude,codex,copilot}` | Hourly red-team credential scanning | Hourly (3×) | Excellent: domain-critical, multi-engine |
| `security-guard` | PR-level security review (Claude) | PR | Strong: essential for security tool |
| `security-review` | Daily comprehensive threat modeling | Daily | Strong: deep analysis |
| `smoke-{chroot,claude,codex,copilot}` | End-to-end firewall smoke tests | PR + Schedule | Strong: multi-engine validation |
| `test-coverage-improver` | Improve test coverage for security-critical paths | Weekly | Good: security-focused coverage |
| `update-release-notes` | Generate release notes on publish | On release | Good: automates release ceremony |
Actionable Recommendations
P0: Implement Immediately
P0.1: Fix 3 Uncompiled Workflows
What: Three workflows (ci-doctor, issue-monster, build-test-dotnet) show compiled: No in the workflow status. They are effectively disabled.
Why: CI Doctor is one of the most valuable workflows in the factory with a 69% PR merge rate. With 15+ open failure issues in the tracker, it should be working. Issue Monster is the dispatcher that feeds the entire Copilot coding agent pipeline.
How:
- Run `gh aw compile .github/workflows/ci-doctor.md`
- Run `gh aw compile .github/workflows/issue-monster.md`
- Run `gh aw compile .github/workflows/build-test-dotnet.md`
- Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts` after any smoke-related changes.
Effort: Low (configuration/tooling fix, no new code)
P0.2: Issue Triage Agent
What: Automatically label incoming issues (bug, feature, enhancement, documentation, question, help-wanted) and leave a friendly comment explaining the label.
Why: The open issues list shows many issues with no labels, for example "[agentics] Secret Digger failed", "firewall process takes 10sec to shutdown", and various CI failure issues. Manual triage is a bottleneck. Per the factory, this is the "hello world" of agentic automation and takes minutes to configure.
How: Add a new workflow triggered on issues: [opened, reopened]:
```markdown
---
name: Issue Triage Agent
on:
  issues:
    types: [opened, reopened]
  workflow_dispatch:
permissions:
  issues: read
tools:
  github:
    toolsets: [issues, labels]
safe-outputs:
  add-labels:
    allowed: [bug, feature, enhancement, documentation, question, security, ci-failure, help-wanted, good-first-issue]
  add-comment:
    max: 1
timeout-minutes: 5
---
Triage new issues in ${{ github.repository }}. For issue #${{ github.event.issue.number }},
analyze the title and body and add exactly one label from the allowed set.
Skip if the issue already has labels. After labeling, comment explaining why.
```
Effort: Low (copy from factory template, customize allowed labels)
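The triage rules in the prompt above (skip already-labeled issues, apply exactly one label from the allowed set) can also be expressed as a small guard, for example to sanity-check an agent's proposed output in a test. This is a minimal sketch under stated assumptions: `decideTriage` and `TriageDecision` are hypothetical names, not part of gh-aw; only the label list comes from the workflow above.

```typescript
// Allowed labels copied from the workflow's safe-outputs configuration.
const ALLOWED_LABELS: readonly string[] = [
  "bug", "feature", "enhancement", "documentation", "question",
  "security", "ci-failure", "help-wanted", "good-first-issue",
];

type TriageDecision =
  | { action: "skip"; reason: string }
  | { action: "label"; label: string };

// Hypothetical guard: accept a proposal only if the issue is unlabeled and
// exactly one of the proposed labels is in the allowed set.
function decideTriage(existingLabels: string[], proposedLabels: string[]): TriageDecision {
  if (existingLabels.length > 0) {
    return { action: "skip", reason: "issue already has labels" };
  }
  const valid = proposedLabels.filter((label) => ALLOWED_LABELS.includes(label));
  const only = valid[0];
  if (valid.length !== 1 || only === undefined) {
    return { action: "skip", reason: `expected exactly one allowed label, got ${valid.length}` };
  }
  return { action: "label", label: only };
}
```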
P1: Plan for Near-Term
P1.1: Audit Workflows (Meta-Agent)
What: A daily workflow that scans all agentic workflow runs, tracks costs, identifies failures, and surfaces patterns across the 27-workflow ecosystem.
Why: The factory's Audit Workflows created 93 discussion reports and opened 9 issues, 4 of which led to downstream PRs. With 27 workflows this repo has passed the complexity threshold where a meta-observer becomes essential. Currently there's no way to know if the agents collectively are healthy, efficient, or regressing.
How:
```markdown
---
name: Audit Workflows
description: Daily meta-agent that audits all workflow runs for errors, costs, and quality patterns
on:
  schedule: daily
  workflow_dispatch:
permissions:
  contents: read
  actions: read
tools:
  agentic-workflows:
  github:
    toolsets: [default, actions]
  cache-memory:
    key: audit-workflows
safe-outputs:
  create-discussion:
    title-prefix: "[Audit] "
    category: "general"
  create-issue:
    labels: [workflow-health]
    max: 3
timeout-minutes: 20
---
Analyze the last 24h of agentic workflow runs. For each workflow:
- Report success/failure rate
- Flag workflows with 0 successful runs in 48h
- Identify unusually long runs (potential hangs)
- Track turn counts and estimate costs
- Create issues for workflows with consistent failures
```
Effort: Medium (needs the agentic-workflows MCP tool, already available, and cache-memory for trend tracking)
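The per-workflow checks listed in the prompt above amount to a simple aggregation over run records. The sketch below illustrates the first three (success rate, no success in 48h, long runs). The `WorkflowRun` shape, the `summarizeRuns` name, and the 15-minute long-run threshold are assumptions for illustration and do not reflect the actual output of the agentic-workflows tool.

```typescript
// Hypothetical run record; real data would come from the workflow run history.
interface WorkflowRun {
  workflow: string;
  conclusion: "success" | "failure" | "cancelled";
  startedAt: number; // epoch milliseconds
  durationMs: number;
}

interface WorkflowHealth {
  workflow: string;
  total: number;          // runs in the last 24h
  successRate: number;    // 0..1 over the last 24h (0 when there were no runs)
  noSuccessIn48h: boolean;
  longRuns: number;       // runs in the last 24h exceeding the threshold
}

const HOUR_MS = 60 * 60 * 1000;

function summarizeRuns(
  runs: WorkflowRun[],
  now: number,
  longRunMs: number = 15 * 60 * 1000, // assumed threshold for "unusually long"
): WorkflowHealth[] {
  const byWorkflow = new Map<string, WorkflowRun[]>();
  for (const run of runs) {
    const list = byWorkflow.get(run.workflow) ?? [];
    list.push(run);
    byWorkflow.set(run.workflow, list);
  }
  const result: WorkflowHealth[] = [];
  byWorkflow.forEach((all, workflow) => {
    const last24h = all.filter((r) => now - r.startedAt <= 24 * HOUR_MS);
    const successes = last24h.filter((r) => r.conclusion === "success").length;
    const anySuccess48h = all.some(
      (r) => r.conclusion === "success" && now - r.startedAt <= 48 * HOUR_MS,
    );
    result.push({
      workflow,
      total: last24h.length,
      successRate: last24h.length === 0 ? 0 : successes / last24h.length,
      noSuccessIn48h: !anySuccess48h,
      longRuns: last24h.filter((r) => r.durationMs > longRunMs).length,
    });
  });
  return result;
}
```

Storing each day's summary in cache-memory would let later runs compare against earlier ones for trend tracking.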
P1.2: Breaking Change Checker
What: A workflow that runs on PRs and/or daily to detect backward-incompatible changes to the CLI interface, API contracts, or configuration format.
Why: This is a distributed CLI tool installed in CI pipelines across organizations. Breaking changes to --allow-domains behavior, exit codes, or Docker Compose configuration can silently break hundreds of pipelines. The factory's Breaking Change Checker created alert issues, for example flagging CLI version incompatibilities. This check is especially critical before releases.
How:
```markdown
---
name: Breaking Change Checker
on:
  pull_request:
    paths: ["src/cli.ts", "src/types.ts", "src/docker-manager.ts", "src/squid-config.ts"]
    types: [opened, synchronize]
  workflow_dispatch:
permissions:
  contents: read
  pull-requests: read
tools:
  github:
    toolsets: [default]
  bash: ["git log:*", "cat:*", "grep:*"]
safe-outputs:
  add-comment:
    max: 1
  create-issue:
    labels: [breaking-change]
    max: 1
timeout-minutes: 10
---
Analyze PR #${{ github.event.pull_request.number }} for breaking changes to:
- CLI flags (removals, renames, behavior changes in src/cli.ts)
- Exit code semantics
- Docker Compose configuration format
- Domain whitelist pattern matching behavior
- Environment variable names/semantics
Alert with a comment if breaking changes are detected. Create an issue if critical.
```
Effort: Medium (needs good domain knowledge baked into the prompt)
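One of the checks above, detecting removed or renamed CLI flags, could be approximated mechanically by comparing the long flags that appear in two versions of a source file. This is a rough sketch, not how the agent would necessarily do it: `extractFlags` and `findRemovedFlags` are hypothetical helpers, and the regex only recognizes lowercase `--kebab-case` flags.

```typescript
// Collect long flags such as --allow-domains from source or help text.
function extractFlags(text: string): Set<string> {
  return new Set(text.match(/--[a-z][a-z0-9-]*/g) ?? []);
}

// Flags present in the old text but absent from the new one.
// A rename shows up as a removal here, since the old name disappears.
function findRemovedFlags(before: string, after: string): string[] {
  const remaining = extractFlags(after);
  return Array.from(extractFlags(before))
    .filter((flag) => !remaining.has(flag))
    .sort();
}
```

A usage example, where `--log-level` and `--verbosity` are made-up flags used purely for illustration:

```typescript
const oldCli = ".option('--allow-domains <list>').option('--log-level <level>')";
const newCli = ".option('--allow-domains <list>').option('--verbosity <level>')";
const removed = findRemovedFlags(oldCli, newCli); // ["--log-level"]
```

Behavior changes behind an unchanged flag name are invisible to this kind of textual diff, which is why the prompt still asks the agent to reason about semantics.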
P1.3: Workflow Health Manager (ci-doctor Enhancement)
What: Add the missing workflows to ci-doctor's watch list and ensure ci-doctor is triggered for all new workflows added. Consider creating a dedicated Workflow Health Manager that monitors the health and activity of all agentic workflows.
Why: ci-doctor currently watches ~26 workflows, but build-test-dotnet (currently in the list) and other recently added workflows may be missing. Each time a new workflow is added, someone must manually update the ci-doctor watch list. A Workflow Health Manager would instead discover all workflows automatically.
How:
- Immediately: Verify ci-doctor's workflow list against actual workflows (add any missing)
- Near-term: Create a separate `workflow-health-manager.md` that uses `agenticworkflows-status` to detect unhealthy workflows without needing an explicit list
Effort: Low-Medium
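The "verify the watch list against actual workflows" step is a set difference in both directions. A minimal sketch, assuming the two lists of workflow names have already been collected; the function and field names are hypothetical:

```typescript
interface WatchListDrift {
  missing: string[]; // workflows that exist but are not watched
  stale: string[];   // watched names that no longer correspond to a workflow
}

function compareWatchList(watchList: string[], actualWorkflows: string[]): WatchListDrift {
  const watched = new Set(watchList);
  const present = new Set(actualWorkflows);
  return {
    missing: actualWorkflows.filter((name) => !watched.has(name)).sort(),
    stale: watchList.filter((name) => !present.has(name)).sort(),
  };
}
```

Running this comparison in CI would turn the manual update step into a failing check whenever `missing` is non-empty.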
P2: Consider for Roadmap
P2.1: Code Simplifier
What: A daily agent that analyzes recent commits for complexity and creates PRs proposing simplifications (early returns, extracted helpers, shorter expressions).
Why: The factory's Code Simplifier achieved 83% PR merge rate. TypeScript/Node.js is directly supported. src/docker-manager.ts (1500+ lines) and containers/agent/entrypoint.sh are prime candidates for simplification.
How: gh aw add-wizard https://github.com/github/gh-aw/blob/v0.45.5/.github/workflows/code-simplifier.md then customize for TypeScript.
Effort: Low (add wizard, customize)
P2.2: Changeset Generator (Release Automation)
What: A workflow that analyzes commits since the last release, proposes a version bump (major/minor/patch based on conventional commits), and opens a PR with updated CHANGELOG.
Why: The factory's Changeset workflow achieved 78% merge rate (22/28 PRs merged). This repo already uses conventional commits via commitlint, making version bump analysis straightforward.
How: gh aw add-wizard https://github.com/github/gh-aw/blob/v0.45.5/.github/workflows/changeset.md, then customize to trigger after a set of PRs merge to main.
Effort: Low-Medium
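Since the repo already uses conventional commits, the version bump proposal reduces to scanning commit messages since the last release. The sketch below follows the usual Conventional Commits mapping (breaking change to major, `feat` to minor, everything else to patch). `proposeBump` and `applyBump` are hypothetical helper names, not part of the factory's Changeset workflow.

```typescript
type Bump = "major" | "minor" | "patch";

// Pick the largest bump implied by any commit message in the range.
function proposeBump(messages: string[]): Bump {
  let bump: Bump = "patch";
  for (const message of messages) {
    // "feat!:" / "fix(scope)!:" or a BREAKING CHANGE footer signals a major bump.
    if (/^[a-z]+(\([^)]*\))?!:/.test(message) || message.includes("BREAKING CHANGE")) {
      return "major";
    }
    if (/^feat(\([^)]*\))?:/.test(message)) bump = "minor";
  }
  return bump;
}

// Apply the bump to a plain "major.minor.patch" version string.
function applyBump(version: string, bump: Bump): string {
  const [major = 0, minor = 0, patch = 0] = version.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```

For example, a range containing `fix: handle empty allowlist` and `feat(proxy): add token-based rate limiting via response parsing` would yield a minor bump, so an illustrative version `1.4.2` becomes `1.5.0`.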
P2.3: Documentation Link Checker
What: Weekly scan of docs-site/ for broken internal and external links, creating issues or PRs to fix them.
Why: The docs site at docs-site/src/content/docs/ contains ~20+ markdown pages with many external links. The agentics repo has a link-checker.md reference implementation. Broken links hurt user trust for a security-sensitive tool.
How: gh aw add-wizard githubnext/agentics/link-checker then configure for the docs-site directory.
Effort: Low (add wizard, configure paths)
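The first step of any link check is collecting the links and separating internal targets (resolvable against the docs tree) from external ones (requiring an HTTP request). A minimal sketch of that step, assuming inline markdown links only; `extractLinks` is a hypothetical helper and is not taken from the agentics link-checker:

```typescript
interface LinkRef {
  text: string;
  target: string;
  external: boolean; // true for http(s) URLs, false for relative paths and anchors
}

// Extract inline links of the form [text](target). Image links (![alt](src))
// are matched too, and reference-style links are ignored by this sketch.
function extractLinks(markdown: string): LinkRef[] {
  const links: LinkRef[] = [];
  const pattern = /\[([^\]]*)\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const text = match[1] ?? "";
    const target = match[2] ?? "";
    links.push({ text, target, external: /^https?:\/\//.test(target) });
  }
  return links;
}
```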
P2.4: PR Auto-Review Requester
What: When a PR is opened that touches security-critical files, automatically request review from the appropriate team/person.
Why: Security Guard and the smoke tests provide automated validation, but there's no automatic human review request for high-risk changes. Pairing automated security guard with human review assignment would close that gap.
Effort: Low (simple pull_request workflow with safe-outputs: request-review)
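The routing decision itself is a lookup from changed paths to reviewers. A sketch under stated assumptions: the `ReviewRule` shape and the reviewer handles in the example are hypothetical, and only the file paths come from this report.

```typescript
interface ReviewRule {
  prefix: string;      // path prefix considered security-critical
  reviewers: string[]; // who to request when a changed file matches
}

// Return the de-duplicated, sorted set of reviewers for a PR's changed files.
function reviewersFor(changedFiles: string[], rules: ReviewRule[]): string[] {
  const picked = new Set<string>();
  for (const file of changedFiles) {
    for (const rule of rules) {
      if (file.startsWith(rule.prefix)) {
        rule.reviewers.forEach((reviewer) => picked.add(reviewer));
      }
    }
  }
  return Array.from(picked).sort();
}

// Example rules; "security-team" and "container-owners" are placeholder handles.
const exampleRules: ReviewRule[] = [
  { prefix: "src/squid-config.ts", reviewers: ["security-team"] },
  { prefix: "src/docker-manager.ts", reviewers: ["security-team", "container-owners"] },
  { prefix: "containers/", reviewers: ["container-owners"] },
];
```

An empty result means the PR touches no security-critical paths and no extra review needs to be requested.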
P3: Future Ideas
P3.1: Container Base Image Security Monitor
What: Weekly check for new CVEs in the base images used (ubuntu/squid:latest, ubuntu:22.04). Create issues when critical CVEs are published against these base images. Note that ubuntu/squid:latest is a floating tag rather than a pinned one, so the image it resolves to can change between checks.
Why: This is a firewall tool where container security is critical. Docker Hub and security advisories publish CVE data for base images.
Effort: Medium (needs container registry API access or GHSA database queries)
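Before matching CVEs to images, it helps to know how firmly each image reference is pinned, because a floating tag such as `latest` can resolve to different images over time. The classifier below is a small sketch; `pinLevel` and the three-level scale are assumptions made for illustration.

```typescript
type PinLevel = "digest" | "tag" | "floating";

// Classify an image reference:
//   "digest"   - pinned by content hash (name@sha256:...)
//   "tag"      - an explicit tag other than "latest" (still mutable, but intentional)
//   "floating" - "latest" or no tag at all
function pinLevel(imageRef: string): PinLevel {
  if (/@sha256:[0-9a-f]{64}$/.test(imageRef)) return "digest";
  // Only look at the last path segment so a registry port (host:5000/img) is not read as a tag.
  const lastSegment = imageRef.slice(imageRef.lastIndexOf("/") + 1);
  const colon = lastSegment.indexOf(":");
  const tag = colon === -1 ? "latest" : lastSegment.slice(colon + 1);
  return tag === "latest" ? "floating" : "tag";
}
```

Applied to the two images named above, `ubuntu/squid:latest` is classified as floating and `ubuntu:22.04` as tag.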
P3.2: Performance Benchmark Tracker
What: Monthly or per-release tracking of AWF startup time and container launch latency as a discussion report.
Why: Users have opened issue #1103 about shutdown taking 10 seconds. A performance tracker would catch regressions like this before they reach users.
Effort: Medium (needs integration test infra to measure timing)
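Once timings are being collected, flagging a regression can be as simple as comparing medians against a stored baseline with some tolerance. A minimal sketch: the 20% tolerance and the function names are assumptions, and the sample values in the usage note are illustrative, not measurements.

```typescript
// Median of a non-empty list of samples (milliseconds).
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const upper = sorted[mid] as number;
  const lower = sorted[mid - 1];
  return sorted.length % 2 === 1 || lower === undefined ? upper : (lower + upper) / 2;
}

// True when the current median exceeds the baseline median by more than the tolerance.
function isRegression(baselineMs: number[], currentMs: number[], tolerance: number = 0.2): boolean {
  return median(currentMs) > median(baselineMs) * (1 + tolerance);
}
```

With a baseline of `[10000, 10200, 9800]` ms, a current run of `[12500, 13000, 12000]` ms would be flagged, while `[11000, 11500, 10500]` ms would not. Using the median rather than the mean keeps a single slow CI runner from triggering a false alarm.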
P3.3: Duplicate Code Detector
What: Daily semantic analysis of recently modified TypeScript files for duplicate patterns that could be extracted as shared utilities.
Why: src/docker-manager.ts (1500+ lines) likely has extraction opportunities. The factory's version achieved 79% merge rate.
Effort: High (requires Serena integration or language server setup)
Maturity Assessment
| Dimension | Current | Notes |
|---|---|---|
| Security Automation | 5/5 | Red-team scanning, PR guard, daily reviews, dependency monitoring; best-in-class |
| Testing Automation | 4/5 | Multi-engine smoke tests, coverage improver; missing: perf benchmarks |
| Documentation Automation | 4/5 | Daily doc sync, CLI flag checker; missing: link checker |
| Code Quality Automation | 3/5 | Missing: code simplifier, duplicate detector |
| Issue/PR Management | 3/5 | Have: monster, duplication; missing: triage, arborist, mergefest |
| Workflow Observability | 2/5 | No meta-agent monitoring the 27 workflows |
| Release Automation | 3/5 | Have: release notes; missing: changeset/version bump |
| Overall | 4/5 | Advanced: top 5% of repositories for agentic workflow adoption |
Current Level: 4. Advanced adopter with comprehensive automation across most dimensions.
Target Level: 4.5. Close remaining gaps in observability and code quality; fix broken compilations.
Gap Analysis:
- Quick wins (< 1 day each): Fix 3 uncompiled workflows; add issue triage agent
- Medium effort (1-3 days each): Audit Workflows meta-agent; Breaking Change Checker
- Longer term: Code Simplifier; Changeset Generator; Performance Benchmarks
Comparison with Best Practices
What This Repository Does Exceptionally Well
- Domain-specific smoke testing: Running smoke tests across three AI engines (Claude, Codex, Copilot) plus a chroot variant goes beyond anything in the standard factory and is genuinely innovative
- Red-team security: 3 concurrent hourly secret-digging agents across different engines is a unique security posture appropriate for a tool that handles AI agent credentials
- Security-domain specialization: Combining automated Security Guard (PR-level) + daily security review + dependency monitoring is a model other security-focused repos should emulate
- Import/shared patterns: The `shared/` directory with imported fragments (mcp-pagination, version-reporting, secret-audit) demonstrates good workflow modularity
What Could Improve
- Workflow observability gap: Running 27 workflows without an Audit Workflows meta-agent means failures can go unnoticed unless someone checks manually. The factory considers this essential at this scale.
- Code quality automation absent: No code simplifier or duplicate detector, which is somewhat surprising given the size of `docker-manager.ts` and other complex files
- Compilation hygiene: 3 workflows in `compiled: No` state suggests the compile step is sometimes forgotten after edits. Consider adding a CI check that fails if any `.md` workflow lacks a current `.lock.yml`
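The compilation hygiene check suggested above could look roughly like the following. This is a sketch under assumptions: the caller supplies a listing of top-level workflow files (excluding imported fragments such as those under `shared/`), and "current" is approximated by modification time, which is a hypothetical freshness signal.

```typescript
interface FileInfo {
  path: string;
  mtimeMs: number; // last modification time in milliseconds
}

// Return workflow sources (.md) whose compiled .lock.yml is missing or older than the source.
function findStaleWorkflows(files: FileInfo[]): string[] {
  const mtimeByPath = new Map<string, number>();
  files.forEach((file) => mtimeByPath.set(file.path, file.mtimeMs));
  return files
    .filter((file) => file.path.endsWith(".md"))
    .filter((file) => {
      const lockTime = mtimeByPath.get(file.path.replace(/\.md$/, ".lock.yml"));
      return lockTime === undefined || lockTime < file.mtimeMs;
    })
    .map((file) => file.path)
    .sort();
}
```

Modification times are often unreliable after a fresh CI checkout, so a real check may be better served by recompiling and failing if the resulting lock files differ from what is committed.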
Unique Opportunities Given the Security Domain
- AWF could validate itself: A workflow that runs AWF against itself (meta-firewall test) would be a powerful integration test
- Domain allowlist regression testing: An agent that monitors whether recent PRs accidentally weakened domain filtering semantics
- Container hardening monitor: Track drift in container security configuration (capabilities, seccomp, network mode) across releases
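Domain allowlist regression testing, mentioned above, lends itself to a table of hostnames with expected outcomes that is re-run on every PR. The sketch below uses a stand-in matcher that assumes an entry allows the exact domain and its subdomains; that rule is an assumption made for illustration and may not match AWF's actual semantics, so a real test would call the project's own matching code instead of `isAllowed`.

```typescript
// Stand-in matcher (assumed semantics): an allowlist entry permits the exact
// host and any subdomain of it, case-insensitively, ignoring a trailing dot.
function isAllowed(host: string, allowlist: string[]): boolean {
  const normalized = host.toLowerCase().replace(/\.$/, "");
  return allowlist.some((entry) => {
    const allowed = entry.toLowerCase();
    return normalized === allowed || normalized.endsWith("." + allowed);
  });
}

interface DomainCase {
  host: string;
  allowlist: string[];
  expected: boolean;
}

// Return every case whose actual result differs from the expected one.
function failingCases(cases: DomainCase[]): DomainCase[] {
  return cases.filter((c) => isAllowed(c.host, c.allowlist) !== c.expected);
}

// Cases chosen to catch common weakening mistakes such as suffix-only matching.
const regressionCases: DomainCase[] = [
  { host: "github.com", allowlist: ["github.com"], expected: true },
  { host: "api.github.com", allowlist: ["github.com"], expected: true },
  { host: "notgithub.com", allowlist: ["github.com"], expected: false },
  { host: "github.com.evil.example", allowlist: ["github.com"], expected: false },
];
```

A non-empty result from `failingCases(regressionCases)` would indicate that a change altered filtering behavior for at least one known case.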
Notes for Future Runs
Tracked in cache-memory at /tmp/gh-aw/cache-memory/advisor-notes.md
Items to track over time:
- Were the 3 uncompiled workflows fixed? (ci-doctor, issue-monster, build-test-dotnet)
- Was Issue Triage Agent added?
- Was Audit Workflows meta-agent added?
- Were Breaking Change Checker or CI Coach added?
- Did issue backlog size decrease (currently 15+ open issues)?
Observed since last report (Feb 2026 to Mar 2026):
- `test-coverage-improver` was added (new since last cycle)
- `security-review` was added (new since last cycle)
- Open issues grew: multiple `[aw]` build failures, smoke test failures visible in tracker
- PRs "feat(proxy): add token-based rate limiting via response parsing" #1066 and "feat(proxy): make Copilot API target configurable for enterprise environments" #1064 suggest active API proxy feature development
Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.
Generated by Pelis Agent Factory Advisor
- expires on Mar 8, 2026, 3:34 AM UTC