Agent Performance Report — February 22, 2026 (#17764)

### Performance Summary

- **Agents analyzed:** 26 distinct workflows (40 total runs, 48-hour window)
- **Non-IM success rate:** 97% (30/31, excluding Issue Monster) ↑ from 89% last period
- **Overall quality score:** 92/100 (→ stable, 20th consecutive zero-critical-issues period 🎉)
- **Overall effectiveness score:** 88/100 (→ stable)
- **Total tokens:** 36.6M | **Estimated cost:** ~$16.21
- **Total safe items:** 6 (↓ from 14 — fewer actionable findings this period)
- **Critical issues:** 0 (excluding P1 infrastructure)
- **Top performers:** The Great Escapi, AI Moderator, CI Failure Doctor
- **P1 ongoing:** Issue Monster (9/9 failures — infrastructure, not quality)

---

### Critical Findings

**⚠️ [P1] Issue Monster — 100% Failure Rate (9/9 runs)**

The `GH_AW_GITHUB_TOKEN` secret remains unset. Issue Monster fails on every scheduled run (~30-min cadence), generating roughly 50 failures/day. This is a **pure infrastructure failure**: the agent code and prompt are fine. Tracking issue: [#17414](https://github.com/github/gh-aw/issues/17414) (open since Feb 21).

- **Impact:** Inflates error statistics; skews overall success metrics
- **Fix:** Set `GH_AW_GITHUB_TOKEN` repository secret
- **Priority:** P1 — unchanged from previous periods
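
The skew can be quantified from the run counts in this report. The snippet below is a minimal sketch, not part of the analyzer itself: the helper name `success_rate` is hypothetical, and the counts (40 total runs, 9 Issue Monster runs, 30 non-IM successes) are taken from the Performance Summary.

```python
def success_rate(successes: int, total: int) -> float:
    """Return the success rate as a percentage, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * successes / total, 1)


# Run counts taken from this report's summary.
TOTAL_RUNS = 40
IM_RUNS = 9            # Issue Monster runs, all failed
NON_IM_SUCCESSES = 30  # 30 of the 31 non-IM runs succeeded

non_im_runs = TOTAL_RUNS - IM_RUNS                    # 31
overall = success_rate(NON_IM_SUCCESSES, TOTAL_RUNS)  # Issue Monster included
non_im = success_rate(NON_IM_SUCCESSES, non_im_runs)  # Issue Monster excluded

print(f"overall: {overall}%  non-IM: {non_im}%")
# prints "overall: 75.0%  non-IM: 96.8%"
```

With Issue Monster included the rate is 75.0%; with it excluded the rate is 96.8%, which the summary rounds to 97%.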

**🛡️ Prompt Injection Attack — Detected and Blocked**

The Great Escapi detected another injection attempt disguised as "security testing" (sandbox escape, DNS tunneling, network evasion, reconnaissance instructions). The agent correctly filed a noop and took no action. Security posture remains excellent.

**🔧 CI Failure Doctor — 4 Reactive Runs in 48 Hours**

CI Failure Doctor ran 4 times in 48 hours (compared to 5 runs in ~7 hours in the previous report). The repeated reactive runs suggest ongoing CI instability. The agent itself is performing well (4/4 success), but the underlying CI flakiness warrants attention.
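
Because the two observation windows differ in length, the run counts are easier to compare on a common basis. The following is a rough sketch, assuming the "~7 hours" figure is approximate and that extrapolating such a short window to a daily rate is only indicative; `runs_per_day` is a hypothetical helper, not an existing tool.

```python
def runs_per_day(runs: int, window_hours: float) -> float:
    """Normalize a run count to a 24-hour rate."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return runs * 24 / window_hours


current = runs_per_day(4, 48)   # this period: 4 runs in 48 hours
previous = runs_per_day(5, 7)   # previous report: 5 runs in ~7 hours (approximate)

print(f"current: {current:.1f}/day, previous: {previous:.1f}/day")
# prints "current: 2.0/day, previous: 17.1/day"
```

Tracking this normalized rate across periods gives a simple signal for whether CI flakiness is rising or falling.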

---

<details>
<summary>View Agent Rankings & Detailed Scores</summary>

### Top Performing Agents 🏆

| Rank | Agent | Quality | Effectiveness | Runs | Turns/run | Notes |
|------|-------|---------|---------------|------|-----------|-------|
| 1 | **The Great Escapi** | 95/100 | 95/100 | 1 | 0 (noop) | Blocked prompt injection; security posture excellent |
| 2 | **AI Moderator** | 93/100 | 93/100 | 3 | 2 | 3/3 success, highest efficiency (~200K tokens/run, Codex) |
| 3 | **CI Failure Doctor** | 91/100 | 90/100 | 4 | ~5 | 4/4 success, reactive CI health responder |
| 4 | **Daily Safe Outputs Conformance Checker** | 90/100 | 89/100 | 1 | — | 8.6m, clean run, Claude |
| 5 | **Contribution Check** | 89/100 | 88/100 | 1 | — | 4.5m, clean, Copilot |
| 6 | **Semantic Function Refactoring** | 87/100 | 86/100 | 1 | — | 7.4m, Claude |
| 7 | **Smoke suite** (×5) | 88/100 | 88/100 | 5 | — | All pass: Copilot, Claude, Gemini, Project, Temp ID |

### Agents Needing Improvement 📉

| Agent | Quality | Effectiveness | Issue |
|-------|---------|---------------|-------|
| **Issue Monster** | N/A | 0/100 (infra) | 9/9 failures — `GH_AW_GITHUB_TOKEN` missing (#17414) |

### Long-Running Agents (Monitor Efficiency)

| Agent | Duration | Engine | Notes |
|-------|----------|--------|-------|
| Chroma Issue Indexer | 19.4m | Copilot | Longest run this period — benchmark for regression |
| Daily Security Red Team Agent | 14.0m | Claude | Expected for deep analysis |
| Daily Safe Output Tool Optimizer | 11.1m | — | Acceptable for optimizer |
| Release | 11.1m | — | Expected |

</details>

<details>
<summary>View Effectiveness & Resource Metrics</summary>

### Task Completion Rates (non-IM)

- **High completion (>90%):** 25 workflows — all non-IM agents succeeded
- **Low completion (<50%):** 1 workflow — Issue Monster (infrastructure failure only)

### Resource Efficiency (48h window)

| Metric | Value |
|--------|-------|
| Total tokens | 36.6M |
| Estimated cost | $16.21 |
| Total turns | 304 |
| Avg run duration (non-IM) | 7.1m |
| Max run duration | 19.4m (Chroma Issue Indexer) |
| Safe items produced | 6 |
| Safe items/run (non-IM) | 0.19 |
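
A few derived figures follow from the table. The sketch below uses only the values reported above; the assumption that "Safe items/run" is computed over the 31 non-IM runs is an inference from 6/31 ≈ 0.19, not something the report states.

```python
# Figures copied from the Resource Efficiency table.
total_tokens = 36.6e6
estimated_cost_usd = 16.21
total_turns = 304
safe_items = 6
non_im_runs = 31  # assumption: 40 total runs minus 9 Issue Monster runs

cost_per_million_tokens = estimated_cost_usd / (total_tokens / 1e6)
tokens_per_turn = total_tokens / total_turns
safe_items_per_run = safe_items / non_im_runs

print(f"cost per 1M tokens: ${cost_per_million_tokens:.2f}")
print(f"tokens per turn:    {tokens_per_turn:,.0f}")
print(f"safe items per run: {safe_items_per_run:.2f}")
```

This works out to roughly $0.44 per million tokens and about 120K tokens per turn.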

### Engine Distribution

| Engine | Runs | Notes |
|--------|------|-------|
| Copilot | ~11 | Issue Monster, Chroma, Workflow Skill Extractor, Plan, etc. |
| Claude | ~8 | Semantic Refactoring, Daily Checkers, Security Red Team, etc. |
| Codex | ~4 | AI Moderator (3), Agent Container Smoke |
| Mixed/smoke | ~5 | Smoke suite |

</details>

<details>
<summary>View Behavioral Patterns</summary>

### Productive Patterns ✅

- **Security reflexes:** The Great Escapi correctly noop'd prompt injection without false positives (2 consecutive periods)
- **Reactive CI healing:** CI Failure Doctor triggers cleanly on CI failures with high success rate
- **Event-driven efficiency:** AI Moderator processes issue events in 2 turns with minimal footprint
- **Smoke coverage:** All 5 smoke tests passed (Copilot, Claude, Gemini, Project, Temp ID)

### Patterns to Watch ⚠️

- **Issue Monster volume:** 9 failures/48h generating noise in error aggregates — skews ecosystem metrics
- **Chroma duration creep:** 19.4m is within acceptable range but should be monitored for upward drift
- **CI reactive frequency:** 4 CI Failure Doctor runs in 48h suggests CI is not stable — root cause may lie outside agent ecosystem

### Collaboration Patterns

- Coordination between Workflow Health Manager and Agent Performance Analyzer via `shared-alerts.md` is effective
- No conflicting outputs detected between orchestrators this period
- Safe item volume reduction (6 vs 14) may indicate agents are correctly finding fewer actionable items (healthy) rather than reduced coverage

</details>

---

### Recommendations

#### High Priority

1. **[P1] Set `GH_AW_GITHUB_TOKEN` secret** — Resolves Issue Monster failures entirely
   - Issue [#17414](https://github.com/github/gh-aw/issues/17414) is open; escalate to a repo admin
   - Impact: ~50 fewer daily error logs; overall success rate rises from about 75% (30/40 runs) to 97%+

2. **Investigate CI Instability** — CI Failure Doctor running 4×/48h indicates systemic flakiness
   - Review CI workflow failure patterns to find the root cause
   - Check whether flakiness is increasing week-over-week

#### Medium Priority

3. **Benchmark Chroma Issue Indexer** — 19.4m is the longest run; set a regression threshold (e.g., alert if >25m)
4. **Monitor safe item volume trend** — 6 items this period vs. 14 last period; if trend continues, assess whether agent coverage is drifting
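
The regression threshold in item 3 could be checked with something like the sketch below. It assumes the 25-minute example threshold and uses the durations from the Long-Running Agents table; `RunDuration` and `over_threshold` are hypothetical names, not existing gh-aw APIs.

```python
from dataclasses import dataclass


@dataclass
class RunDuration:
    agent: str
    minutes: float


def over_threshold(runs, threshold_minutes=25.0):
    """Return the runs whose duration exceeds the alert threshold."""
    return [r for r in runs if r.minutes > threshold_minutes]


# Durations from the Long-Running Agents table in this report.
runs = [
    RunDuration("Chroma Issue Indexer", 19.4),
    RunDuration("Daily Security Red Team Agent", 14.0),
    RunDuration("Daily Safe Output Tool Optimizer", 11.1),
    RunDuration("Release", 11.1),
]

alerts = over_threshold(runs)
print(alerts)  # empty this period: no run exceeds 25 minutes
```

A fixed threshold is the simplest option. Comparing against a rolling baseline would detect gradual drift earlier, at the cost of storing per-period history.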

#### Low Priority

5. **Document prompt injection detection pattern** — The Great Escapi's clean behavior is a model for other security-adjacent agents

---

### Trends (6-period history)

| Period | Quality | Effectiveness | Success Rate | Critical Issues |
|--------|---------|---------------|--------------|-----------------|
| Feb 22 | 92/100 | 88/100 | 97% (non-IM) | 0 ✅ |
| Feb 21 | 92/100 | 88/100 | 89% | 0 ✅ |
| (prior) | 91/100 | 85/100 | 71% | 0 ✅ |

Overall trend: **stable quality, recovering success rate, persistent P1 infrastructure issue**

---

### Actions Taken This Run

- ✅ Analyzed 40 workflow runs across 26 agents (48h window)
- ✅ Verified P1 status — Issue Monster 9/9 failures, #17414 still open
- ✅ Confirmed The Great Escapi blocked prompt injection (2nd confirmed detection)
- ✅ Updated `agent-performance-latest.md` in shared memory
- ✅ Updated `shared-alerts.md` with current period status
- ℹ️ No new improvement issues created (no new quality failures detected)

---

> **Analysis period:** February 21–22, 2026 (48-hour window, 40 runs)
> **Next report:** February 23, 2026
> **References:** [§22281821807](https://github.com/github/gh-aw/actions/runs/22281821807) · [§22281624073](https://github.com/github/gh-aw/actions/runs/22281624073) · [§22281571358](https://github.com/github/gh-aw/actions/runs/22281571358)

---

> **Note:** This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
>
> **Tip:** Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.




> Generated by [Agent Performance Analyzer - Meta-Orchestrator](https://github.com/github/gh-aw/actions/runs/22281821807)
> - [x] expires on Feb 23, 2026, 5:35 PM UTC
