[copilot-token-optimizer] Token Optimizer: Q workflow — runaway 103-turn failures drive 75% of daily spend

### 🔍 Optimization Target: Q — Agentic Workflow Optimizer

**Selected because**: Highest-token consumer (9.95M tokens today) not recently optimized  
**Analysis period**: 2026-04-07 → 2026-04-12 (4 snapshot days)  
**Runs analyzed**: 11 runs across 4 days

---

### 📊 Token Usage Profile

| Metric | Value |
|---|---|
| Total tokens (snapshot period) | 34,945,652 |
| Avg tokens/run | 3,176,877 |
| Avg turns/run (weighted) | 46.2 |
| Failure rate | 45% (5/11 runs) |
| Cache efficiency (per-turn) | ~96% cache-read hits |
| Cache write tokens | 0 across all 3 audited runs |
| Successful run avg duration | 4.4 min (266s) |
| Failed runaway run duration | 11.8 min (709s) |

> Cache write tokens = 0 indicates the system prompt and tool schemas are already fully warm in cache — per-turn uncached cost is ~2–3K tokens/turn, which is healthy.
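
The cache figures above can be re-derived from the audited-run table. The sketch below is a minimal check, assuming that everything not read from cache (including output tokens) counts as "uncached"; the function name `cache_profile` is illustrative and not part of any tool.

```python
# Audited runs from 2026-04-12: (turns, total tokens, cache-read tokens).
AUDITED_RUNS = {
    "run 2 (runaway failure)": (103, 9_117_621, 8_911_025),
    "run 3 (success)": (39, 2_896_536, 2_773_017),
}


def cache_profile(turns, total_tokens, cache_read_tokens):
    """Return (cache-read share of all tokens, uncached tokens per turn)."""
    uncached = total_tokens - cache_read_tokens
    return cache_read_tokens / total_tokens, uncached / turns


for name, run in AUDITED_RUNS.items():
    share, uncached_per_turn = cache_profile(*run)
    print(f"{name}: {share:.1%} cache reads, "
          f"{uncached_per_turn:,.0f} uncached tokens/turn")
```

Both runs land in the ~2–3K uncached tokens/turn range, with cache-read shares on either side of ~96%.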

---

### 🔧 Recommendations

#### 1. Reduce `timeout-minutes` from 15 to 10 — Est. savings: ~1.4M tokens/runaway run

The single most damaging observed run used 103 turns over 11.8 minutes before failing (9.1M tokens). The most recent successful run completed in just 39 turns / 4.4 minutes. The runaway therefore ran ~2.7× as long as the successful run, and the 15-minute timeout permits up to ~3.4× that duration (15 / 4.4).

At the observed rate of ~6.9s/turn, a 10-minute timeout caps a run at ~87 turns. That is still about 2× a typical successful run, but it cuts the final ~16 turns of the 103-turn runaway (103 − 87), saving approximately 1.4M tokens on that single run at ~88K tokens/turn.
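
The timeout arithmetic can be reproduced as follows. This is a rough sketch that assumes a constant per-turn duration and per-turn token cost taken from the runaway run; in practice later turns tend to carry more context, so the saving is an approximation.

```python
# Figures from the 103-turn runaway run (2026-04-12).
RUNAWAY_SECONDS = 709
RUNAWAY_TURNS = 103
RUNAWAY_TOKENS = 9_117_621
NEW_TIMEOUT_MINUTES = 10

seconds_per_turn = RUNAWAY_SECONDS / RUNAWAY_TURNS   # roughly 6.9 s
tokens_per_turn = RUNAWAY_TOKENS / RUNAWAY_TURNS     # roughly 88.5K

# Number of whole turns that fit inside the new timeout.
turn_cap = int(NEW_TIMEOUT_MINUTES * 60 / seconds_per_turn)
turns_trimmed = RUNAWAY_TURNS - turn_cap
tokens_saved = turns_trimmed * tokens_per_turn

print(f"turn cap: {turn_cap}")
print(f"turns trimmed from the runaway: {turns_trimmed}")
print(f"estimated tokens saved: {tokens_saved:,.0f}")
```

Under these assumptions the cap is 87 turns, 16 turns are trimmed, and the saving is about 1.4M tokens.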

**Evidence across all snapshots**:
- 2026-04-10: avg_turns = 68.7 (vs 46.2 overall) with 2/3 runs failing  
- 2026-04-12: one success (39 turns, 4.4 min), one failure (103 turns, 11.8 min), one quick failure (2 turns)

**Action**: Change in `q.md` frontmatter:
```yaml
timeout-minutes: 10  # was: 15; successful runs complete in ~4-5 min
```

> Note: `max-turns` is not supported for the Copilot engine. `timeout-minutes` is the correct control knob.

---

#### 2. Remove Serena (Go LSP) import — Est. savings: ~5% per-turn token reduction

The `imports: [shared/mcp/serena-go.md]` entry adds a full Serena LSP MCP server, which is designed for **Go semantic analysis** (`.go` files in `pkg/`). Q's purpose, however, is to optimize **agentic workflows**: it modifies `.md` workflow files, not Go source code. The serena-go.md import itself explicitly states:

> *"Only analyze `.go` files — Ignore all other file types"*  
> *"Focus on `pkg/` directory"*

Q uses `edit`, `bash`, and `github` tools to read and modify `.md` workflow files. Serena's LSP capabilities (go-to-definition, find-references, type inference) are not applicable to YAML/Markdown workflows.

Removing Serena eliminates one MCP server container and its tool schemas from every turn's context window, reducing per-turn input size across all 39–103 turns in a run.

**Estimated savings**: ~5K tokens/turn × 46 avg turns × 11 runs = ~2.5M tokens over the analysis period, plus eliminated MCP server startup overhead.
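
Because the ~5K tokens/turn figure is an estimate rather than a measurement, it helps to see how the total moves with it. The sketch below recomputes the estimate; the 2,500 and 7,500 values are hypothetical bounds chosen only to illustrate sensitivity, and `serena_savings` is an illustrative name.

```python
def serena_savings(schema_tokens_per_turn, avg_turns=46, runs=11):
    """Tokens saved over the analysis period if the schemas leave every turn."""
    return schema_tokens_per_turn * avg_turns * runs


# 5_000 is the issue's estimate; the other two values are hypothetical bounds.
for schema_tokens in (2_500, 5_000, 7_500):
    total = serena_savings(schema_tokens)
    print(f"{schema_tokens:>5} tokens/turn -> {total:,} tokens over 11 runs")
```

The middle case reproduces the ~2.5M figure (2,530,000 tokens).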

**Action**: Remove the Serena import from `q.md`:
```yaml
# Remove these lines:
imports:
  - shared/mcp/serena-go.md
```

If Q ever needs to search or inspect workflow files, `bash` with `grep` and shell glob patterns is sufficient for `.md` files.

---

#### 3. Add investigation depth guardrails in the prompt — Est. savings: variable

The Phase 1 prompt instructs downloading 10–20 runs of logs. Each full-run JSON log can be large — this content becomes part of the conversation context across subsequent turns. Constraining log downloads prevents context bloat.

**Action**: In the Phase 1 prompt, tighten the log download instruction:
```
- Count: 5 recent runs (reduced from 10-20)
- After analysis, summarize findings in <250 words before proceeding to Phase 2
````

This reduces input token growth per turn as the conversation lengthens.
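
To see why fewer log downloads matter, consider a simplified model in which each downloaded log stays in the conversation and is re-sent as input on every later turn. All numbers below (base context, log size, one download per turn) are hypothetical and chosen only to illustrate the shape of the growth.

```python
def cumulative_input_tokens(base_context, log_tokens, logs_downloaded, turns):
    """Total input tokens over a run in which downloaded logs stay in context.

    Assumes (hypothetically) one log is downloaded per turn, starting at turn 1.
    """
    total = 0
    context = base_context
    for turn in range(1, turns + 1):
        if turn <= logs_downloaded:
            context += log_tokens  # the log is re-sent on this and every later turn
        total += context
    return total


# Hypothetical inputs: 50K base context, 4K tokens per log, a 39-turn run.
with_20_logs = cumulative_input_tokens(50_000, 4_000, logs_downloaded=20, turns=39)
with_5_logs = cumulative_input_tokens(50_000, 4_000, logs_downloaded=5, turns=39)
print(f"20 logs: {with_20_logs:,} input tokens")
print(f" 5 logs: {with_5_logs:,} input tokens")
print(f"difference: {with_20_logs - with_5_logs:,}")
```

In this model the cost of a log scales with the number of turns remaining after it is downloaded, which is why trimming early downloads has an outsized effect.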

---

#### 4. Review `discussions` toolset necessity — Est. savings: minor

The GitHub MCP is configured with `default + actions + discussions` toolsets. Q does handle discussion triggers (the prompt has `{{#if discussion.number}}` branches), so the `discussions` toolset is justified. However, if telemetry shows Q is rarely triggered from discussions, removing it saves tool schema tokens on every invocation.

**Action**: Monitor discussion-trigger rate. If <10% of runs originate from discussions, remove the `discussions` toolset and let operators add it back if needed.
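
The monitoring rule could be expressed as a small check over the triggering event of each run. This is a sketch only: the sample event list is made up, the helper names are hypothetical, and it assumes each run's trigger is available as a GitHub event name such as `discussion` or `discussion_comment`.

```python
DISCUSSION_EVENTS = {"discussion", "discussion_comment"}


def discussion_trigger_rate(event_names):
    """Fraction of runs whose triggering event came from a discussion."""
    if not event_names:
        return 0.0
    hits = sum(1 for name in event_names if name in DISCUSSION_EVENTS)
    return hits / len(event_names)


def should_remove_discussions_toolset(event_names, threshold=0.10):
    """Apply the <10% rule; with no observed runs, make no recommendation."""
    if not event_names:
        return False
    return discussion_trigger_rate(event_names) < threshold


# Hypothetical sample: 11 runs, one of them triggered from a discussion comment.
sample = ["issue_comment"] * 9 + ["workflow_dispatch", "discussion_comment"]
print(f"discussion-trigger rate: {discussion_trigger_rate(sample):.1%}")
print("remove toolset:", should_remove_discussions_toolset(sample))
```

With a sample this small a single run moves the rate by about nine points, so the threshold is best applied over a longer window.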

---

<details>
<summary><b>Tool Usage Matrix</b></summary>

| Tool / Server | Configured | Evidence of Need | Recommendation |
|---|---|---|---|
| `github` (default toolset) | ✅ | Issues/PR access for context | **Keep** |
| `github` (actions toolset) | ✅ | Phase 1 log download via gh-aw MCP | **Keep** |
| `github` (discussions toolset) | ✅ | Discussion trigger handler in prompt | **Keep** (review if rarely triggered) |
| `agentic-workflows` MCP | ✅ | Core tool for logs/audit/compile | **Keep** |
| `edit` | ✅ | Writing workflow .md changes | **Keep** |
| `bash` | ✅ | Script execution in investigation | **Keep** |
| `cache-memory` | ✅ | Pattern storage across invocations | **Keep** |
| Serena (Go LSP) | ✅ configured | Not needed — Q modifies `.md` not `.go` | **Remove** |

</details>

<details>
<summary><b>Audited Runs Detail (2026-04-12)</b></summary>

| Run | Created | Turns | Tokens | Cache Read | Input/Output Ratio | Conclusion |
|---|---|---|---|---|---|---|
| 1 | 2026-04-12 10:24 | 2 | 92,874 | 46,103 | 99.7% / 0.3% | failure (quick) |
| 2 | 2026-04-12 16:39 | 103 | 9,117,621 | 8,911,025 | 99.6% / 0.4% | failure (runaway) |
| 3 | 2026-04-12 18:38 | 39 | 2,896,536 | 2,773,017 | 99.6% / 0.4% | success |

**Key observation**: Run 2 (103 turns) and Run 3 (39 turns) used the same model at a similar per-turn cost (~88K vs ~74K tokens/turn), so the gap between them comes mainly from how many turns the agent ran. Run 2 used **3.1× the tokens** of Run 3 and still failed.

**Multi-day pattern** (from audit snapshots):

| Date | Runs | Total Tokens | Avg Turns | Errors |
|---|---|---|---|---|
| 2026-04-07 | 4 | 13,880,345 | 42.8 | 1 |
| 2026-04-09 | 1 | 834,473 | 14.0 | 0 |
| 2026-04-10 | 3 | 10,284,760 | 68.7 | 2 |
| 2026-04-12 | 3 | 9,946,074 | 39.0 | 2 |
| **Total** | **11** | **34,945,652** | **46.2** | **5** |
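
The 46.2 figure in the Total row is a run-weighted average of the per-day averages, not a simple mean of the four daily values. A quick check, using only the numbers in the table above:

```python
# (date, runs, average turns per run) from the multi-day table.
SNAPSHOTS = [
    ("2026-04-07", 4, 42.8),
    ("2026-04-09", 1, 14.0),
    ("2026-04-10", 3, 68.7),
    ("2026-04-12", 3, 39.0),
]

total_runs = sum(runs for _, runs, _ in SNAPSHOTS)
weighted_avg_turns = sum(runs * avg for _, runs, avg in SNAPSHOTS) / total_runs
simple_mean = sum(avg for _, _, avg in SNAPSHOTS) / len(SNAPSHOTS)

print(f"runs: {total_runs}")
print(f"run-weighted average turns: {weighted_avg_turns:.1f}")
print(f"unweighted mean of daily averages: {simple_mean:.1f}")
```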

</details>

---

### ⚠️ Caveats

- Tool-level usage data (`tools_used` field) was `null` for all 3 audited runs — Serena removal recommendation is based on workflow design analysis, not observed call counts. Verify Serena is not actually being called before removing.
- The 103-turn failure cause is unknown — may be a complex legitimate task, not an infinite loop. The `timeout-minutes` reduction preserves the ability to complete ~87-turn tasks.
- These recommendations are based on 11 runs over 4 days. Edge cases (e.g., a complex code refactoring requested via `/q`) may benefit from Serena or longer timeouts.
- Quick 2-turn failure (run 1) appears to be an authentication or initialization error — separate issue, not addressed here.

**References:**  
- [Workflow run §24313949714](https://github.com/github/gh-aw/actions/runs/24313949714)







> Generated by [Copilot Token Usage Optimizer](https://github.com/github/gh-aw/actions/runs/24313949714/agentic_workflow) · ● 1.2M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fcopilot-token-optimizer%22&type=issues)
> - [x] expires on Apr 19, 2026, 7:13 PM UTC
