-
Notifications
You must be signed in to change notification settings - Fork 247
Description
Performance Summary
- Agents analyzed: 26 distinct workflows (40 total runs, 48-hour window)
- Non-IM success rate: 97% (30/31) ↑ from 89% last period
- Overall quality score: 92/100 (→ stable, 20th consecutive zero-critical-issues period 🎉)
- Overall effectiveness score: 88/100 (→ stable)
- Total tokens: 36.6M | Estimated cost: ~$16.21
- Total safe items: 6 (↓ from 14 — fewer actionable findings this period)
- Critical issues: 0 (excluding P1 infrastructure)
- Top performers: The Great Escapi, AI Moderator, CI Failure Doctor
- P1 ongoing: Issue Monster (9/9 failures — infrastructure, not quality)
Critical Findings
GH_AW_GITHUB_TOKEN secret remains unset. Issue Monster fails on every scheduled run (~30-min cadence), generating ~50+ failures/day. This is a pure infrastructure failure — the agent code and prompt are fine. Tracking issue: #17414 (open since Feb 21).
- Impact: Inflates error statistics; skews overall success metrics
- Fix: Set
GH_AW_GITHUB_TOKENrepository secret - Priority: P1 — unchanged from previous periods
🛡️ Prompt Injection Attack — Detected and Blocked
The Great Escapi detected another injection attempt disguised as "security testing" (sandbox escape, DNS tunneling, network evasion, reconnaissance instructions). Agent correctly filed a noop and took no action. Security posture remains excellent.
🔧 CI Failure Doctor — 4 Reactive Runs in 48 Hours
CI Failure Doctor ran 4 times in 48 hours (compared to 5 in ~7 hours yesterday). The high reactive cadence suggests ongoing CI instability. While the agent is performing well (4/4 success), the underlying CI flakiness warrants attention.
View Agent Rankings & Detailed Scores
Top Performing Agents 🏆
| Rank | Agent | Quality | Effectiveness | Runs | Turns/run | Notes |
|---|---|---|---|---|---|---|
| 1 | The Great Escapi | 95/100 | 95/100 | 1 | 0 (noop) | Blocked prompt injection; security posture excellent |
| 2 | AI Moderator | 93/100 | 93/100 | 3 | 2 | 3/3 success, highest efficiency (~200K tokens/run, Codex) |
| 3 | CI Failure Doctor | 91/100 | 90/100 | 4 | ~5 | 4/4 success, reactive CI health responder |
| 4 | Daily Safe Outputs Conformance Checker | 90/100 | 89/100 | 1 | — | 8.6m, clean run, Claude |
| 5 | Contribution Check | 89/100 | 88/100 | 1 | — | 4.5m, clean, Copilot |
| 6 | Semantic Function Refactoring | 87/100 | 86/100 | 1 | — | 7.4m, Claude |
| 7 | Smoke suite (×5) | 88/100 | 88/100 | 5 | — | All pass: Copilot, Claude, Gemini, Project, Temp ID |
Agents Needing Improvement 📉
| Agent | Quality | Effectiveness | Issue |
|---|---|---|---|
| Issue Monster | N/A | 0/100 (infra) | 9/9 failures — GH_AW_GITHUB_TOKEN missing (#17414) |
Long-Running Agents (Monitor Efficiency)
| Agent | Duration | Engine | Notes |
|---|---|---|---|
| Chroma Issue Indexer | 19.4m | Copilot | Longest run this period — benchmark for regression |
| Daily Security Red Team Agent | 14.0m | Claude | Expected for deep analysis |
| Daily Safe Output Tool Optimizer | 11.1m | — | Acceptable for optimizer |
| Release | 11.1m | — | Expected |
View Effectiveness & Resource Metrics
Task Completion Rates (non-IM)
- High completion (>90%): 25 workflows — all non-IM agents succeeded
- Low completion (<50%): 1 workflow — Issue Monster (infrastructure failure only)
Resource Efficiency (48h window)
| Metric | Value |
|---|---|
| Total tokens | 36.6M |
| Estimated cost | $16.21 |
| Total turns | 304 |
| Avg run duration (non-IM) | 7.1m |
| Max run duration | 19.4m (Chroma Issue Indexer) |
| Safe items produced | 6 |
| Safe items/run | 0.19 |
Engine Distribution
| Engine | Runs | Notes |
|---|---|---|
| Copilot | ~11 | Issue Monster, Chroma, Workflow Skill Extractor, Plan, etc. |
| Claude | ~8 | Semantic Refactoring, Daily Checkers, Security Red Team, etc. |
| Codex | ~4 | AI Moderator (3), Agent Container Smoke |
| Mixed/smoke | ~5 | Smoke suite |
View Behavioral Patterns
Productive Patterns ✅
- Security reflexes: The Great Escapi correctly noop'd prompt injection without false positives (2 consecutive periods)
- Reactive CI healing: CI Failure Doctor triggers cleanly on CI failures with high success rate
- Event-driven efficiency: AI Moderator processes issue events in 2 turns with minimal footprint
- Smoke coverage: All 5 engine smoke tests passed (Copilot, Claude, Gemini, Project, Temp ID)
Patterns to Watch ⚠️
- Issue Monster volume: 9 failures/48h generating noise in error aggregates — skews ecosystem metrics
- Chroma duration creep: 19.4m is within acceptable range but should be monitored for upward drift
- CI reactive frequency: 4 CI Failure Doctor runs in 48h suggests CI is not stable — root cause may lie outside agent ecosystem
Collaboration Patterns
- Workflow Health Manager and Agent Performance Analyzer coordination is effective via shared-alerts.md
- No conflicting outputs detected between orchestrators this period
- Safe item volume reduction (6 vs 14) may indicate agents are correctly finding fewer actionable items (healthy) rather than reduced coverage
Recommendations
High Priority
-
[P1] Set
GH_AW_GITHUB_TOKENsecret — Resolves Issue Monster failures entirely- Issue #17414 open — escalate to repo admin
- Impact: ~50 fewer daily error logs; success rate jumps from 77% to 97%+
-
Investigate CI Instability — CI Failure Doctor running 4×/48h indicates systemic flakiness
- Review CI workflow failure patterns to find root cause
- Consider whether flakiness is increasing week-over-week
Medium Priority
- Benchmark Chroma Issue Indexer — 19.4m is the longest run; set a regression threshold (e.g., alert if >25m)
- Monitor safe item volume trend — 6 items this period vs. 14 last period; if trend continues, assess whether agent coverage is drifting
Low Priority
- Document prompt injection detection pattern — The Great Escapi's clean behavior is a model for other security-adjacent agents
Trends (6-period history)
| Period | Quality | Effectiveness | Success Rate | Critical Issues |
|---|---|---|---|---|
| Feb 22 | 92/100 | 88/100 | 97% (non-IM) | 0 ✅ |
| Feb 21 | 92/100 | 88/100 | 89% | 0 ✅ |
| (prior) | 91/100 | 85/100 | 71% | 0 ✅ |
Overall trend: stable quality, recovering success rate, persistent P1 infrastructure issue
Actions Taken This Run
- ✅ Analyzed 40 workflow runs across 26 agents (48h window)
- ✅ Verified P1 status — Issue Monster 9/9 failures, [P1] Lockdown mode failing: GH_AW_GITHUB_TOKEN not configured — 5 workflows affected #17414 still open
- ✅ Confirmed The Great Escapi blocked prompt injection (2nd confirmed detection)
- ✅ Updated
agent-performance-latest.mdin shared memory - ✅ Updated
shared-alerts.mdwith current period status - ℹ️ No new improvement issues created (no new quality failures detected)
Analysis period: February 21–22, 2026 (48-hour window, 40 runs)
Next report: February 23, 2026
References: §22281821807 · §22281624073 · §22281571358
Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.
Generated by Agent Performance Analyzer - Meta-Orchestrator
- expires on Feb 23, 2026, 5:35 PM UTC