[Pelis Agent Factory Advisor] Agentic Workflow Maturity Report - Mar 2026 #1111

@github-actions

📊 Executive Summary

gh-aw-firewall has reached an impressive Level 4/5 agentic workflow maturity with ~27 active workflows spanning security, testing, documentation, CI/CD, and code quality automation. The most impactful near-term improvements are: fixing 3 uncompiled workflows that are currently broken, adding a workflow health meta-agent (Audit Workflows) to observe the growing agent ecosystem, and adding an Issue Triage agent to handle the evident backlog of unlabeled issues. Given the repository's security-critical domain, there is also a strong case for a Breaking Change Checker to protect CLI users.


🎓 Patterns Learned from Pelis Agent Factory

Key Patterns from the Documentation Site

The Pelis Agent Factory documents several workflow archetypes directly applicable here:

  1. Continuous Quality – Agents like Code Simplifier and Duplicate Code Detector run daily against recent commits, proposing PRs that preserve function while improving clarity. This is distinct from periodic cleanup sprints.

  2. Meta-Agents (Audit Workflows) – When running 20+ agents, a dedicated meta-agent that monitors agent runs, tracks costs, and surfaces failures becomes essential. The factory's Audit Workflows and Metrics Collector fill this role with 93 audit discussions created.

  3. Fault Investigation with Proposals – CI Doctor doesn't just open issues; it analyzes logs, proposes fixes, and opens PRs directly (69% merge rate in the factory).

  4. Skip-if-match Guard Rails – Workflows use skip-if-match to prevent flooding when a similar PR/issue already exists, avoiding duplication from repeated scheduling.

  5. Chained Workflows – Issue Monster feeds issues to Copilot coding agent; the coding agent creates PRs; Sub Issue Closer and Mergefest handle the aftermath. Coordination multiplies individual workflow value.

  6. Cache-Memory for Persistence – The Issue Duplication Detector uses cache-memory to store issue signatures across runs, avoiding expensive full re-scans each time.
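Pattern 6 (persisting issue signatures across runs) can be sketched roughly as follows. The signature scheme (lowercased, punctuation-stripped, deduplicated, sorted title tokens) and the `SignatureStore` shape are assumptions made for illustration; the report does not describe how the factory's Issue Duplication Detector actually computes or stores signatures.

```typescript
// Hypothetical sketch: the signature scheme and store shape are assumptions,
// not the factory's actual implementation.

// Maps a signature to the first issue number seen with it. In a real workflow
// this object would be loaded from and saved back to the cache-memory directory.
type SignatureStore = { [signature: string]: number | undefined };

// Build an order-insensitive signature from an issue title:
// lowercase, strip punctuation, drop repeated tokens, sort.
function issueSignature(title: string): string {
  const tokens: string[] = title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((t) => t.length > 0);
  const unique = tokens.filter((t, i) => tokens.indexOf(t) === i);
  return unique.sort().join(" ");
}

// Return the earlier issue number when the signature is already stored;
// otherwise record the new signature and return null.
function checkAndRecord(
  store: SignatureStore,
  issueNumber: number,
  title: string
): number | null {
  const signature = issueSignature(title);
  if (Object.prototype.hasOwnProperty.call(store, signature)) {
    const previous = store[signature];
    if (previous !== undefined) return previous;
  }
  store[signature] = issueNumber;
  return null;
}
```

Because the store only grows by one entry per new issue, each run compares a single incoming title against stored signatures instead of re-scanning every open issue.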

Patterns from the githubnext/agentics Repository

The agentics repo provides reference implementations for:

  • daily-test-improver – identifies coverage gaps and implements new tests incrementally (directly comparable to this repo's test-coverage-improver)
  • link-checker – finds and fixes broken links in documentation sites
  • import-workflow – slash-command workflow for importing reference workflows into new repos
  • daily-workflow-sync – keeps workflows synchronized with upstream templates

How This Repo Compares

This repository matches or exceeds the factory's implementation in:

  • Security automation (red-team scanning, daily reviews, PR guard, dependency monitoring)
  • Multi-engine testing (smoke tests for Claude, Codex, Copilot, and Chroot)
  • Documentation maintenance (doc-maintainer, CLI flag checker)

This repository lags in:

  • Workflow observability (no meta-agent monitoring other agents)
  • Continuous code quality (no code simplifier or duplicate detector)
  • Release automation (no changeset/version bump agent)
  • Issue organization (no issue triage labels, no sub-issue arborist)

📋 Current Agentic Workflow Inventory

| Workflow | Purpose | Trigger | Assessment |
| --- | --- | --- | --- |
| build-test-{bun,cpp,deno,go,java,node,rust} | Test AWF works as firewall for 8 ecosystems | PR | ✅ Strong: unique domain-specific coverage |
| build-test-dotnet | .NET ecosystem smoke test | PR | ⚠️ NOT COMPILED: broken |
| ci-cd-gaps-assessment | Analyze CI/CD pipeline gaps | Daily | ✅ Good: feeds continuous improvement |
| ci-doctor | Investigate CI failures, propose fixes | workflow_run | ⚠️ NOT COMPILED: broken; its watch list also misses some workflows |
| cli-flag-consistency-checker | Sync CLI flags vs docs | Weekly | ✅ Good: relevant for CLI tool |
| dependency-security-monitor | Dependency vulnerability monitoring | Daily | ✅ Strong: security-appropriate |
| doc-maintainer | Sync docs with code changes | Daily | ✅ Good: essential for this complex tool |
| issue-duplication-detector | Flag duplicate issues | On issue open | ✅ Good: uses cache-memory pattern correctly |
| issue-monster | Dispatch issues to Copilot coding agent | Hourly / on issue open | ⚠️ NOT COMPILED: broken |
| pelis-agent-factory-advisor | Agentic maturity analysis | Daily | ✅ This report |
| plan | /plan slash command for issue breakdown | Slash command | ✅ Good: useful ChatOps |
| secret-digger-{claude,codex,copilot} | Hourly red-team credential scanning | Hourly (3×) | ✅ Excellent: domain-critical, multi-engine |
| security-guard | PR-level security review (Claude) | PR | ✅ Strong: essential for security tool |
| security-review | Daily comprehensive threat modeling | Daily | ✅ Strong: deep analysis |
| smoke-{chroot,claude,codex,copilot} | End-to-end firewall smoke tests | PR + Schedule | ✅ Strong: multi-engine validation |
| test-coverage-improver | Improve test coverage for security-critical paths | Weekly | ✅ Good: security-focused coverage |
| update-release-notes | Generate release notes on publish | On release | ✅ Good: automates release ceremony |

🚀 Actionable Recommendations


P0: Implement Immediately

P0.1: Fix 3 Uncompiled Workflows

What: Three workflows (ci-doctor, issue-monster, build-test-dotnet) show compiled: No in the workflow status. They are effectively disabled.

Why: CI Doctor is one of the most valuable workflows in the factory with a 69% PR merge rate. With 15+ open failure issues in the tracker, it should be working. Issue Monster is the dispatcher that feeds the entire Copilot coding agent pipeline.

How:

  1. Run gh aw compile .github/workflows/ci-doctor.md
  2. Run gh aw compile .github/workflows/issue-monster.md
  3. Run gh aw compile .github/workflows/build-test-dotnet.md
  4. Run npx tsx scripts/ci/postprocess-smoke-workflows.ts after any smoke-related changes.

Effort: Low (configuration/tooling fix, no new code)


P0.2: Issue Triage Agent

What: Automatically label incoming issues (bug, feature, enhancement, documentation, question, help-wanted) and leave a friendly comment explaining the label.

Why: Many open issues have no labels, for example "[agentics] Secret Digger failed", "firewall process takes 10sec to shutdown", and the CI failure issues. Manual triage is a bottleneck. Per the factory, this is the "hello world" of agentic automation and takes minutes to configure.

How: Add a new workflow triggered on issues: [opened, reopened]:

---
name: Issue Triage Agent
on:
  issues:
    types: [opened, reopened]
  workflow_dispatch:
permissions:
  issues: read
tools:
  github:
    toolsets: [issues, labels]
safe-outputs:
  add-labels:
    allowed: [bug, feature, enhancement, documentation, question, security, ci-failure, help-wanted, good-first-issue]
  add-comment:
    max: 1
timeout-minutes: 5
---

Triage new issues in ${{ github.repository }}. For issue #${{ github.event.issue.number }},
analyze the title and body and add exactly one label from the allowed set.
Skip if the issue already has labels. After labeling, comment explaining why.

Effort: Low (copy from factory template, customize allowed labels)


P1: Plan for Near-Term

P1.1: Audit Workflows (Meta-Agent)

What: A daily workflow that scans all agentic workflow runs, tracks costs, identifies failures, and surfaces patterns across the 27-workflow ecosystem.

Why: The factory's Audit Workflows created 93 discussion reports and opened 9 issues, 4 of which led to downstream PRs. With 27 workflows this repo has passed the complexity threshold where a meta-observer becomes essential. Currently there's no way to know if the agents collectively are healthy, efficient, or regressing.

How:

---
name: Audit Workflows
description: Daily meta-agent that audits all workflow runs for errors, costs, and quality patterns
on:
  schedule: daily
  workflow_dispatch:
permissions:
  contents: read
  actions: read
tools:
  agentic-workflows:
  github:
    toolsets: [default, actions]
  cache-memory:
    key: audit-workflows
safe-outputs:
  create-discussion:
    title-prefix: "[Audit] "
    category: "general"
  create-issue:
    labels: [workflow-health]
    max: 3
timeout-minutes: 20
---
Analyze the last 24h of agentic workflow runs. For each workflow:
- Report success/failure rate
- Flag workflows with 0 successful runs in 48h
- Identify unusually long runs (potential hangs)
- Track turn counts and estimate costs
- Create issues for workflows with consistent failures

Effort: Medium (needs the agentic-workflows MCP tool, already available, and cache-memory for trend tracking)
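The per-workflow checks in the prompt above (success rate, zero successful runs, unusually long runs) amount to a simple aggregation. The sketch below illustrates that arithmetic only; the `WorkflowRun` record shape, its field names, and the `longRunMinutes` threshold are assumptions, not the output format of the agentic-workflows tool.

```typescript
// Hypothetical run record; field names are assumptions for illustration.
interface WorkflowRun {
  workflow: string;
  conclusion: "success" | "failure";
  durationMinutes: number;
}

interface WorkflowSummary {
  workflow: string;
  total: number;
  successes: number;
  successRate: number; // fraction between 0 and 1
  longRuns: number; // runs longer than the threshold (potential hangs)
  zeroSuccesses: boolean; // true when no run in the window succeeded
}

// Aggregate runs from one time window into a summary per workflow,
// sorted by workflow name.
function summarizeRuns(runs: WorkflowRun[], longRunMinutes: number): WorkflowSummary[] {
  const summaries: { [name: string]: WorkflowSummary | undefined } = Object.create(null);
  for (const run of runs) {
    let s = summaries[run.workflow];
    if (s === undefined) {
      s = {
        workflow: run.workflow,
        total: 0,
        successes: 0,
        successRate: 0,
        longRuns: 0,
        zeroSuccesses: false,
      };
      summaries[run.workflow] = s;
    }
    s.total += 1;
    if (run.conclusion === "success") s.successes += 1;
    if (run.durationMinutes > longRunMinutes) s.longRuns += 1;
  }
  return Object.keys(summaries)
    .sort()
    .map((name) => {
      const s = summaries[name] as WorkflowSummary;
      s.successRate = s.successes / s.total;
      s.zeroSuccesses = s.successes === 0;
      return s;
    });
}
```

Passing the runs from a 48h window and filtering on `zeroSuccesses` gives the list of workflows the audit would flag. Storing each day's summaries in cache-memory would allow trend comparison across runs.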


P1.2: Breaking Change Checker

What: A workflow that runs on PRs and/or daily to detect backward-incompatible changes to the CLI interface, API contracts, or configuration format.

Why: This is a distributed CLI tool installed in CI pipelines across organizations. Breaking changes to --allow-domains behavior, exit codes, or Docker Compose configuration can silently break hundreds of pipelines. The factory's Breaking Change Checker created alert issues, for example flagging CLI version incompatibilities. This check is especially critical before releases.

How:

---
name: Breaking Change Checker
on:
  pull_request:
    paths: ["src/cli.ts", "src/types.ts", "src/docker-manager.ts", "src/squid-config.ts"]
    types: [opened, synchronize]
  workflow_dispatch:
permissions:
  contents: read
  pull-requests: read
tools:
  github:
    toolsets: [default]
  bash: ["git log:*", "cat:*", "grep:*"]
safe-outputs:
  add-comment:
    max: 1
  create-issue:
    labels: [breaking-change]
    max: 1
timeout-minutes: 10
---
Analyze PR #${{ github.event.pull_request.number }} for breaking changes to:
- CLI flags (removals, renames, behavior changes in src/cli.ts)
- Exit code semantics
- Docker Compose configuration format
- Domain whitelist pattern matching behavior
- Environment variable names/semantics
Alert with a comment if breaking changes are detected. Create an issue if critical.

Effort: Medium (needs good domain knowledge baked into the prompt)
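The first item in the prompt, flag removals and renames, can also be approximated deterministically by comparing the flag names found in two versions of a source file. This is a minimal sketch that assumes flags appear in the source as literal `--name` strings; the real structure of src/cli.ts is not shown in this report, and every flag name other than `--allow-domains` is hypothetical.

```typescript
// Collect the distinct long-flag names (e.g. "--allow-domains") that appear
// as literal text in a source file. Assumes flags are written as literals.
function extractFlags(source: string): string[] {
  const matches: string[] = source.match(/--[a-z][a-z0-9-]*/g) || [];
  const unique = matches.filter((flag, i) => matches.indexOf(flag) === i);
  return unique.sort();
}

// Flags present in the base version but absent from the PR version.
// A rename shows up here as a removal plus an addition.
function removedFlags(baseSource: string, prSource: string): string[] {
  const after = extractFlags(prSource);
  return extractFlags(baseSource).filter((flag) => after.indexOf(flag) === -1);
}

// Flags present only in the PR version (additions are non-breaking).
function addedFlags(baseSource: string, prSource: string): string[] {
  const before = extractFlags(baseSource);
  return extractFlags(prSource).filter((flag) => before.indexOf(flag) === -1);
}
```

A non-empty result from `removedFlags` is a strong signal for the agent to comment on. Behavior changes behind an unchanged flag name are invisible to this check and still need the agent's analysis.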


P1.3: Workflow Health Manager (ci-doctor Enhancement)

What: Add the missing workflows to ci-doctor's watch list and ensure ci-doctor is triggered for all new workflows added. Consider creating a dedicated Workflow Health Manager that monitors the health and activity of all agentic workflows.

Why: Currently ci-doctor watches ~26 workflows but build-test-dotnet (currently in the list) and other recently added workflows may be missing. Each time a new workflow is added, someone must manually update the ci-doctor watch list. A Workflow Health Manager automatically discovers all workflows.

How:

  1. Immediately: Verify ci-doctor's workflow list against actual workflows (add any missing)
  2. Near-term: Create a separate workflow-health-manager.md that uses agenticworkflows-status to detect unhealthy workflows without needing an explicit list

Effort: Low-Medium


P2: Consider for Roadmap

P2.1: Code Simplifier

What: A daily agent that analyzes recent commits for complexity and creates PRs proposing simplifications (early returns, extracted helpers, shorter expressions).

Why: The factory's Code Simplifier achieved 83% PR merge rate. TypeScript/Node.js is directly supported. src/docker-manager.ts (1500+ lines) and containers/agent/entrypoint.sh are prime candidates for simplification.

How: gh aw add-wizard https://github.com/github/gh-aw/blob/v0.45.5/.github/workflows/code-simplifier.md then customize for TypeScript.

Effort: Low (add wizard, customize)


P2.2: Changeset Generator (Release Automation)

What: A workflow that analyzes commits since the last release, proposes a version bump (major/minor/patch based on conventional commits), and opens a PR with updated CHANGELOG.

Why: The factory's Changeset workflow achieved 78% merge rate (22/28 PRs merged). This repo already uses conventional commits via commitlint, making version bump analysis straightforward.

How: gh aw add-wizard https://github.com/github/gh-aw/blob/v0.45.5/.github/workflows/changeset.md then customize it to trigger after a set of PRs merge to main.

Effort: Low-Medium
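The version bump analysis described above follows the usual Conventional Commits mapping: a breaking change means major, a `feat` commit means minor, anything else means patch. The sketch below is an illustration under that assumption, not the factory's Changeset workflow; the function names are hypothetical, and treating all non-feat commits as patch-level is a simplification.

```typescript
type Bump = "major" | "minor" | "patch";

// Pick the highest bump implied by a list of commit messages.
// Breaking changes are marked with "!" before the colon in the header
// or with a "BREAKING CHANGE:" footer line.
function proposeBump(commitMessages: string[]): Bump {
  let bump: Bump = "patch";
  for (const message of commitMessages) {
    const header = message.split("\n")[0] || "";
    const breaking =
      /^[a-z]+(\([^)]*\))?!:/.test(header) ||
      /^BREAKING[ -]CHANGE:/m.test(message);
    if (breaking) return "major";
    if (/^feat(\([^)]*\))?:/.test(header)) bump = "minor";
  }
  return bump;
}

// Apply a bump to a "major.minor.patch" version string.
function applyBump(version: string, bump: Bump): string {
  const parts = version.split(".").map((p) => parseInt(p, 10) || 0);
  const major = parts[0] || 0;
  const minor = parts[1] || 0;
  const patch = parts[2] || 0;
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```

For example, commits `fix: a` and `feat(cli): b` on top of version 1.4.2 yield a minor bump to 1.5.0. The agent's value is in the cases this rule misses, such as a breaking change committed without the `!` marker.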


P2.3: Documentation Link Checker

What: Weekly scan of docs-site/ for broken internal and external links, creating issues or PRs to fix them.

Why: The docs site at docs-site/src/content/docs/ contains 20+ markdown pages with many external links. The agentics repo has a link-checker.md reference implementation. Broken links hurt user trust for a security-sensitive tool.

How: gh aw add-wizard githubnext/agentics/link-checker then configure for the docs-site directory.

Effort: Low (add wizard, configure paths)


P2.4: PR Auto-Review Requester

What: When a PR is opened that touches security-critical files, automatically request review from the appropriate team/person.

Why: Security Guard and the smoke tests provide automated validation, but there's no automatic human review request for high-risk changes. Pairing automated security guard with human review assignment would close that gap.

Effort: Low (simple pull_request workflow with safe-outputs: request-review)


P3: Future Ideas

P3.1: Container Base Image Security Monitor

What: Weekly check for new CVEs in the base images used (ubuntu/squid:latest, ubuntu:22.04). Create issues when critical CVEs are published against pinned base images.

Why: This is a firewall tool where container security is critical. Docker Hub and security advisories publish CVE data for base images.

Effort: Medium (needs container registry API access or GHSA database queries)


P3.2: Performance Benchmark Tracker

What: Monthly or per-release tracking of AWF startup time and container launch latency as a discussion report.

Why: Users have opened issue #1103 about shutdown taking 10sec. A performance tracker would catch regressions before they reach users.

Effort: Medium (needs integration test infra to measure timing)


P3.3: Duplicate Code Detector

What: Daily semantic analysis of recently modified TypeScript files for duplicate patterns that could be extracted as shared utilities.

Why: src/docker-manager.ts (1500+ lines) likely has extraction opportunities. The factory's version achieved 79% merge rate.

Effort: High (requires Serena integration or language server setup)


📈 Maturity Assessment

| Dimension | Current | Notes |
| --- | --- | --- |
| Security Automation | 5/5 | Red-team scanning, PR guard, daily reviews, dependency monitoring; best-in-class |
| Testing Automation | 4/5 | Multi-engine smoke tests, coverage improver; missing: perf benchmarks |
| Documentation Automation | 4/5 | Daily doc sync, CLI flag checker; missing: link checker |
| Code Quality Automation | 3/5 | Missing: code simplifier, duplicate detector |
| Issue/PR Management | 3/5 | Have: monster, duplication; missing: triage, arborist, mergefest |
| Workflow Observability | 2/5 | No meta-agent monitoring the 27 workflows |
| Release Automation | 3/5 | Have: release notes; missing: changeset/version bump |
| Overall | 4/5 | Advanced: top 5% of repositories for agentic workflow adoption |

Current Level: 4 β€” Advanced adopter with comprehensive automation across most dimensions.

Target Level: 4.5 β€” Close remaining gaps in observability and code quality; fix broken compilations.

Gap Analysis:

  1. Quick wins (< 1 day each): Fix 3 uncompiled workflows; add issue triage agent
  2. Medium effort (1-3 days each): Audit Workflows meta-agent; Breaking Change Checker
  3. Longer term: Code Simplifier; Changeset Generator; Performance Benchmarks

🔄 Comparison with Best Practices

What This Repository Does Exceptionally Well

  • Domain-specific smoke testing: Running 4 smoke-test variants (the Claude, Codex, and Copilot engines, plus Chroot) goes beyond anything in the standard factory and is genuinely innovative
  • Red-team security: 3 concurrent hourly secret-digging agents across different engines is a unique security posture appropriate for a tool that handles AI agent credentials
  • Security-domain specialization: Combining automated Security Guard (PR-level) + daily security review + dependency monitoring is a model other security-focused repos should emulate
  • Import/shared patterns: The shared/ directory with imported fragments (mcp-pagination, version-reporting, secret-audit) demonstrates good workflow modularity

What Could Improve

  • Workflow observability gap: Running 27 workflows without an Audit Workflows meta-agent means failures can go unnoticed unless someone checks manually. The factory considers this essential at this scale.
  • Code quality automation absent: No code simplifier or duplicate detector, which is somewhat surprising given the size of docker-manager.ts and other complex files
  • Compilation hygiene: 3 workflows in compiled: No state suggests the compile step is sometimes forgotten after edits. Consider adding a CI check that fails if any .md workflow lacks a current .lock.yml
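The CI check suggested under compilation hygiene could start from a presence test like the one below. It is a sketch only: it takes a list of repository paths rather than reading the file system, it checks that a `.lock.yml` exists but not that it is current (detecting staleness would need something extra, such as recompiling and diffing), and the `ignorePrefixes` parameter for skipping imported fragments is a hypothetical addition.

```typescript
// Return the workflow .md files that have no sibling .lock.yml file.
// `files` is a flat list of repository paths; `ignorePrefixes` lists
// directories (for example shared fragments) that are never compiled.
function findUncompiled(files: string[], ignorePrefixes: string[]): string[] {
  const present: { [path: string]: boolean | undefined } = Object.create(null);
  for (const file of files) present[file] = true;

  return files
    .filter((file) => /\.md$/.test(file))
    .filter((file) => !ignorePrefixes.some((prefix) => file.indexOf(prefix) === 0))
    .filter((file) => present[file.replace(/\.md$/, ".lock.yml")] !== true)
    .sort();
}
```

A CI step would build `files` from a directory listing of the workflows folder and fail the job when the returned array is non-empty, printing the offending paths.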

Unique Opportunities Given the Security Domain

  1. AWF could validate itself: A workflow that runs AWF against itself (meta-firewall test) would be a powerful integration test
  2. Domain allowlist regression testing: An agent that monitors whether recent PRs accidentally weakened domain filtering semantics
  3. Container hardening monitor: Track drift in container security configuration (capabilities, seccomp, network mode) across releases

📝 Notes for Future Runs

Tracked in cache-memory at /tmp/gh-aw/cache-memory/advisor-notes.md

Items to track over time:

  • Were the 3 uncompiled workflows fixed? (ci-doctor, issue-monster, build-test-dotnet)
  • Was Issue Triage Agent added?
  • Was Audit Workflows meta-agent added?
  • Were Breaking Change Checker or CI Coach added?
  • Did issue backlog size decrease (currently 15+ open issues)?

Observed since last report (Feb 2026 → Mar 2026):


Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.

Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.

Generated by Pelis Agent Factory Advisor

  • expires on Mar 8, 2026, 3:34 AM UTC
