[copilot-token-optimizer] Token Optimizer: Q workflow - runaway 103-turn failures drive 75% of daily spend #25931

@github-actions

🔍 Optimization Target: Q (Agentic Workflow Optimizer)

Selected because: Highest-token consumer (9.95M tokens today) not recently optimized
Analysis period: 2026-04-07 → 2026-04-12 (4 snapshot days)
Runs analyzed: 11 runs across 4 days


📊 Token Usage Profile

| Metric | Value |
| --- | --- |
| Total tokens (snapshot period) | 34,945,652 |
| Avg tokens/run | 3,176,877 |
| Avg turns/run (weighted) | 46.2 |
| Failure rate | 45% (5/11 runs) |
| Cache efficiency (per-turn) | ~96% cache-read hits |
| Cache write tokens | 0 across all 3 audited runs |
| Successful run avg duration | 4.4 min (266s) |
| Failed runaway run duration | 11.8 min (709s) |

Cache write tokens = 0 indicates the system prompt and tool schemas are already fully warm in cache. Per-turn uncached cost is ~2–3K tokens/turn, which is healthy.


🔧 Recommendations

1. Reduce timeout-minutes from 15 to 10 (est. savings: ~1.4M tokens per runaway run)

The single most damaging observed run used 103 turns over 11.8 minutes before failing (9.1M tokens). The most recent successful run completed in just 39 turns / 4.4 minutes. The 15-minute timeout let the runaway run last ~2.7× as long as that successful run.

At the observed rate of ~6.9s/turn, a 10-minute timeout caps a run at ~87 turns. That is still roughly 2× a typical successful run, but it eliminates the final ~16 turns of the runaway (103 − 87), saving approximately 1.4M tokens on that single run at ~88K tokens/turn.
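The turn cap and savings estimate can be reproduced from the audited runaway run. The following is a minimal Python sketch, assuming turns are evenly paced and every turn costs the run's average (both simplifications); the function names are illustrative and not part of any gh-aw tooling.

```python
# Figures taken from the audited runaway run (2026-04-12, run 2).
RUNAWAY_TURNS = 103
RUNAWAY_SECONDS = 709
RUNAWAY_TOKENS = 9_117_621


def turn_cap(timeout_minutes: float, seconds_per_turn: float) -> int:
    """Whole turns that fit inside the timeout at a constant pace."""
    return int(timeout_minutes * 60 // seconds_per_turn)


def tokens_saved(timeout_minutes: float) -> int:
    """Tokens the timeout would have trimmed from the runaway run.

    Assumes each trimmed turn costs the run-wide average; this is an
    approximation, since per-turn cost can drift as context grows.
    """
    seconds_per_turn = RUNAWAY_SECONDS / RUNAWAY_TURNS
    tokens_per_turn = RUNAWAY_TOKENS / RUNAWAY_TURNS
    cap = turn_cap(timeout_minutes, seconds_per_turn)
    trimmed_turns = max(0, RUNAWAY_TURNS - cap)
    return round(trimmed_turns * tokens_per_turn)


if __name__ == "__main__":
    pace = RUNAWAY_SECONDS / RUNAWAY_TURNS
    print(f"pace: {pace:.1f} s/turn")
    print(f"cap at 10 min: {turn_cap(10, pace)} turns")
    print(f"saved at 10 min: {tokens_saved(10):,} tokens")
```

With a 10-minute timeout this gives a cap of 87 turns, 16 trimmed turns, and roughly 1.4M tokens saved; with the current 15-minute timeout nothing is trimmed.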

Evidence across all snapshots:

  • 2026-04-10: avg_turns = 68.7 (vs 46.2 overall) with 2/3 runs failing
  • 2026-04-12: one success (39 turns, 4.4 min), one failure (103 turns, 11.8 min), one quick failure (2 turns)

Action: Change in q.md frontmatter:

```yaml
timeout-minutes: 10  # was: 15; successful runs complete in ~4-5 min
```

Note: max-turns is not supported for the Copilot engine. timeout-minutes is the correct control knob.


2. Remove Serena (Go LSP) import (est. savings: ~5% per-turn token reduction)

The imports: [shared/mcp/serena-go.md] entry adds a full Serena LSP MCP server designed for Go semantic analysis (.go files in pkg/). But Q's purpose is to optimize agentic workflows: it modifies .md workflow files, not Go source code. The serena-go.md import itself explicitly states:

"Only analyze .go files - Ignore all other file types"
"Focus on pkg/ directory"

Q uses edit, bash, and github tools to read and modify .md workflow files. Serena's LSP capabilities (go-to-definition, find-references, type inference) are not applicable to YAML/Markdown workflows.

Removing Serena eliminates one MCP server container and its tool schemas from every turn's context window, reducing per-turn input size across all 39–103 turns in a run.

Estimated savings: ~5K tokens/turn × 46 avg turns × 11 runs = ~2.5M tokens over the analysis period, plus eliminated MCP server startup overhead.
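Because the ~5K tokens/turn figure is itself an estimate, it may help to see how the total moves with it. This is a small Python sketch of the same multiplication; the alternative schema sizes are hypothetical values chosen only to show sensitivity, not measurements.

```python
def schema_savings(tokens_per_turn: int, avg_turns: float, runs: int) -> int:
    """Tokens removed from context over a period: per-turn size x turns x runs."""
    return round(tokens_per_turn * avg_turns * runs)


# The issue's own estimate: ~5K tokens/turn, 46 avg turns, 11 runs.
estimate = schema_savings(5_000, 46, 11)

if __name__ == "__main__":
    print(f"issue estimate: {estimate:,} tokens")
    # Hypothetical schema sizes, to show how sensitive the total is.
    for size in (2_000, 5_000, 8_000):
        print(f"{size:>5} tokens/turn -> {schema_savings(size, 46, 11):,}")
```

The total scales linearly with the schema size, so measuring the real size of Serena's tool schemas would tighten the estimate directly.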

Action: Remove the Serena import from q.md:

```yaml
# Remove this entry (drop the imports key too if it would be left empty):
imports:
  - shared/mcp/serena-go.md
```

If Q ever needs code analysis, it can use `bash` with `grep`/`glob` for `.md` workflow files.

3. Add investigation depth guardrails in the prompt (est. savings: variable)

The Phase 1 prompt instructs downloading 10–20 runs of logs. Each full-run JSON log can be large, and this content becomes part of the conversation context across subsequent turns. Constraining log downloads prevents context bloat.

Action: In the Phase 1 prompt, tighten the log download instruction:

```
- Count: 5 recent runs (reduced from 10-20)
- After analysis, summarize findings in <250 words before proceeding to Phase 2
```

This reduces input token growth per turn as the conversation lengthens.


4. Review discussions toolset necessity (est. savings: minor)

The GitHub MCP is configured with default + actions + discussions toolsets. Q does handle discussion triggers (the prompt has {{#if discussion.number}} branches), so the discussions toolset is justified. However, if telemetry shows Q is rarely triggered from discussions, removing it saves tool schema tokens on every invocation.

Action: Monitor discussion-trigger rate. If <10% of runs originate from discussions, remove the discussions toolset and let operators add it back if needed.
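The monitoring rule can be sketched as a simple rate check. The Python below is a sketch under assumptions: the input is a hypothetical list of trigger event names, one per Q run, gathered however run metadata is available. It treats the GitHub Actions events discussion and discussion_comment as discussion triggers and is not an existing gh-aw feature.

```python
# GitHub Actions event names that indicate a discussion-originated run.
DISCUSSION_EVENTS = {"discussion", "discussion_comment"}


def discussion_trigger_rate(events: list[str]) -> float:
    """Share of runs triggered from discussions; 0.0 when there is no data."""
    if not events:
        return 0.0
    hits = sum(1 for event in events if event in DISCUSSION_EVENTS)
    return hits / len(events)


def should_remove_discussions_toolset(events: list[str], threshold: float = 0.10) -> bool:
    """Apply the <10% rule, but never recommend removal without any runs."""
    return bool(events) and discussion_trigger_rate(events) < threshold


if __name__ == "__main__":
    # Hypothetical sample: 19 comment-triggered runs, 1 discussion run.
    sample = ["issue_comment"] * 19 + ["discussion_comment"]
    print(f"rate: {discussion_trigger_rate(sample):.0%}")
    print("remove toolset:", should_remove_discussions_toolset(sample))
```

Refusing to recommend removal on an empty sample keeps the rule from firing before any telemetry has been collected.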


Tool Usage Matrix

| Tool / Server | Configured | Evidence of Need | Recommendation |
| --- | --- | --- | --- |
| github (default toolset) | ✅ | Issues/PR access for context | Keep |
| github (actions toolset) | ✅ | Phase 1 log download via gh-aw MCP | Keep |
| github (discussions toolset) | ✅ | Discussion trigger handler in prompt | Keep (review if rarely triggered) |
| agentic-workflows MCP | ✅ | Core tool for logs/audit/compile | Keep |
| edit | ✅ | Writing workflow .md changes | Keep |
| bash | ✅ | Script execution in investigation | Keep |
| cache-memory | ✅ | Pattern storage across invocations | Keep |
| Serena (Go LSP) | ✅ configured | Not needed: Q modifies .md, not .go | Remove |

Audited Runs Detail (2026-04-12)

| Run | Created | Turns | Tokens | Cache Read | Input/Output Ratio | Conclusion |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 2026-04-12 10:24 | 2 | 92,874 | 46,103 | 99.7% / 0.3% | failure (quick) |
| 2 | 2026-04-12 16:39 | 103 | 9,117,621 | 8,911,025 | 99.6% / 0.4% | failure (runaway) |
| 3 | 2026-04-12 18:38 | 39 | 2,896,536 | 2,773,017 | 99.6% / 0.4% | success |

Key observation: Run 2 (103 turns) and Run 3 (39 turns) used the same model at similar per-turn cost (~88K vs ~74K tokens/turn); the main difference is how long the agent ran. Run 2 used 3.1× more tokens than Run 3 and still failed.
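The per-turn figures can be derived from the audited-run numbers. This is a minimal Python sketch, assuming the Tokens column includes cache reads, so that tokens minus cache reads approximates uncached cost; the Run class is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    turns: int
    tokens: int      # total tokens, assumed to include cache reads
    cache_read: int  # tokens served from cache

    @property
    def tokens_per_turn(self) -> float:
        return self.tokens / self.turns

    @property
    def uncached_per_turn(self) -> float:
        return (self.tokens - self.cache_read) / self.turns

    @property
    def cache_read_share(self) -> float:
        return self.cache_read / self.tokens


# Runs 2 and 3 from the 2026-04-12 audit.
runaway = Run(turns=103, tokens=9_117_621, cache_read=8_911_025)
success = Run(turns=39, tokens=2_896_536, cache_read=2_773_017)

if __name__ == "__main__":
    for name, run in (("runaway", runaway), ("success", success)):
        print(
            f"{name}: {run.tokens_per_turn:,.0f} tokens/turn, "
            f"{run.uncached_per_turn:,.0f} uncached/turn, "
            f"{run.cache_read_share:.1%} cache reads"
        )
    print(f"token ratio: {runaway.tokens / success.tokens:.1f}x")
```

Both runs land in the 2–3K uncached tokens/turn range, which suggests the extra spend in the runaway run comes from the number of turns rather than from more expensive turns.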

Multi-day pattern (from audit snapshots):

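| Date | Runs | Total Tokens | Avg Turns | Errors |
| --- | --- | --- | --- | --- |
| 2026-04-07 | 4 | 13,880,345 | 42.8 | 1 |
| 2026-04-09 | 1 | 834,473 | 14.0 | 0 |
| 2026-04-10 | 3 | 10,284,760 | 68.7 | 2 |
| 2026-04-12 | 3 | 9,946,074 | 39.0 | 2 |
| Total | 11 | 34,945,652 | 46.2 | 5 |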
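The Total row is a run-weighted aggregate of the daily snapshots. A short Python sketch of that aggregation, using only the per-day figures from the multi-day table, shows where 46.2 turns, 45%, and the average tokens per run come from:

```python
# (date, runs, total_tokens, avg_turns, errors) from the daily snapshots.
SNAPSHOTS = [
    ("2026-04-07", 4, 13_880_345, 42.8, 1),
    ("2026-04-09", 1, 834_473, 14.0, 0),
    ("2026-04-10", 3, 10_284_760, 68.7, 2),
    ("2026-04-12", 3, 9_946_074, 39.0, 2),
]

total_runs = sum(runs for _, runs, _, _, _ in SNAPSHOTS)
total_tokens = sum(tokens for _, _, tokens, _, _ in SNAPSHOTS)
total_errors = sum(errors for _, _, _, _, errors in SNAPSHOTS)

# Weight each day's average by its run count so busy days count more.
weighted_avg_turns = sum(runs * avg for _, runs, _, avg, _ in SNAPSHOTS) / total_runs
avg_tokens_per_run = total_tokens / total_runs
failure_rate = total_errors / total_runs

if __name__ == "__main__":
    print(f"runs: {total_runs}, tokens: {total_tokens:,}")
    print(f"weighted avg turns: {weighted_avg_turns:.1f}")
    print(f"avg tokens/run: {avg_tokens_per_run:,.0f}")
    print(f"failure rate: {failure_rate:.0%}")
```

Weighting by run count matters here: a plain mean of the four daily averages would give about 41.1 turns instead of 46.2.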

⚠️ Caveats

  • Tool-level usage data (tools_used field) was null for all 3 audited runs, so the Serena removal recommendation is based on workflow design analysis, not observed call counts. Verify Serena is not actually being called before removing.
  • The 103-turn failure cause is unknown; it may be a complex legitimate task, not an infinite loop. The timeout-minutes reduction preserves the ability to complete ~87-turn tasks.
  • These recommendations are based on 11 runs over 4 days. Edge cases (e.g., a complex code refactoring requested via /q) may benefit from Serena or longer timeouts.
  • Quick 2-turn failure (run 1) appears to be an authentication or initialization error. That is a separate issue, not addressed here.

Generated by Copilot Token Usage Optimizer · ● 1.2M

  • expires on Apr 19, 2026, 7:13 PM UTC
