[copilot-cli-research] Copilot CLI Deep Research - April 12, 2026 #25936
This discussion has been marked as outdated by Copilot CLI Deep Research Agent. A newer discussion is available at Discussion #26089.
Executive Summary
Analysis Date: 2026-04-12 | Repository: github/gh-aw | Triggered by: @pelikhan
This is the third consecutive daily analysis comparing Copilot CLI capabilities against actual usage across 186 workflows (89 explicit engine: copilot, 26 defaulting to Copilot, 44 Claude, 8 Codex). The analysis reveals a pattern of feature under-adoption: while the Copilot engine exposes rich configuration options (env vars, custom agents, autopilot, sandbox tuning), the vast majority of workflows use only the basic engine: copilot string with default settings.
Key Finding: Zero workflows use engine.env, engine.args, or engine.api-target in their frontmatter. These features are documented and supported, but completely ignored in practice.
Most Positive Trend: playwright adoption grew from 13% → 17% and web-fetch from 19% → 22%, showing growing use of browsing capabilities.
🔴 Critical Issues (Immediate Attention)
Security gap: 73% of web-browsing workflows have no network firewall
Out of 20 workflows using tools.web-fetch, only ~5 pair it with sandbox.agent: awf. Fetching external URLs without a network firewall exposes the agent to SSRF risks and uncontrolled network access.
8/10 custom agent files are completely unused
The .github/agents/ directory has 10 specialized agent files, but only technical-doc-writer and ci-cleaner are referenced from production workflows. grumpy-reviewer, adr-writer, contribution-checker, w3c-specification-writer, and others sit unused.
🟡 Medium Priority Opportunities
max-continuations barely used (2/89 = 2%)
Copilot CLI's autopilot mode, where the agent runs multiple continuation passes, is Copilot-exclusive. Only smoke-copilot.md and test-quality-sentinel.md use it. Long-running analysis workflows (deep-report.md, daily-architecture-diagram.md, delight.md) would benefit significantly from max-continuations: 5+.
bare: true underused (2/89 = 2%)
Suppresses AGENTS.md loading via --no-custom-instructions. Useful for focused, context-clean workflows like daily-fact.md, poem-bot.md, and release.md where the repo's general instructions are irrelevant noise.
Feature Usage Matrix
Features tracked:
- engine.model
- engine.agent (custom file)
- sandbox.agent: awf (sandbox)
- engine.bare
- engine.max-continuations
- engine.env
- engine.args
- engine.api-target
- engine.version (pinning)
- features.copilot-requests
- features.mcp-gateway
- features.cli-proxy
- tools.web-fetch
- tools.playwright
- tools.repo-memory
- mcp-scripts
- sandbox.agent.memory
- sandbox.agent.mounts
🔴 High Priority Opportunities
Opportunity 1: engine.env – 0% Adoption
What: Custom environment variables injected into the Copilot CLI step.
Why It Matters: Enables runtime config without modifying workflow files. Useful for debug logging, custom API endpoints, integration-specific tuning, and A/B testing different configurations.
Affected Workflows: Any workflow needing debug visibility or custom runtime behavior.
Implementation:
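A minimal sketch of the extended engine object with injected environment variables, assuming engine.env accepts a plain key/value map as described above. The variable names and values are hypothetical placeholders for illustration, not settings the Copilot CLI is known to read.

```yaml
# Workflow frontmatter fragment (sketch; variable names are hypothetical)
engine:
  id: copilot                  # extended object form instead of the string "engine: copilot"
  env:
    EXAMPLE_DEBUG: "1"         # hypothetical variable, shown only to illustrate the shape
    EXAMPLE_API_BASE: "https://example.invalid"   # hypothetical endpoint override
```

Keeping such values in frontmatter means a debug run or an A/B variant differs from the baseline only in this block.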
Why likely unused: The extended engine object syntax (id: copilot) is less commonly known than the simple string form. Most workflow authors may not know env vars can be injected this way.
Opportunity 2: Web-Fetch Workflows Without Sandbox
What: 15-18 workflows use tools.web-fetch without sandbox.agent: awf.
Why It Matters: Without the AWF network firewall, the agent can reach arbitrary URLs. This creates SSRF risk and uncontrolled data exfiltration paths.
Affected Workflows: brave.md, research.md (already has AWF ✅), workflows importing shared/mcp/tavily.md, shared/mcp/brave.md.
Check: brave.md uses Brave MCP but has no sandbox: block. Add:
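For brave.md, the sandbox and network settings could look like the sketch below. It expands the inline form given in this report's quick-win recommendations into block YAML; whether these two keys are sufficient for the Brave MCP server in practice is an assumption.

```yaml
# brave.md frontmatter fragment (sketch based on the quick-win recommendation)
sandbox:
  agent: awf                   # run the agent behind the AWF network firewall
network:
  allowed:
    - defaults
    - api.search.brave.com     # Brave Search API host, as listed in the recommendation
```

An explicit allow-list keeps the reachable hosts visible in the workflow file itself.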
🟡 Medium Priority Opportunities
Opportunity 3: Custom Agent Files (8/10 Unused)
What: .github/agents/ contains specialized agent personas. Only engine.agent: wires them.
Unused agent files with clear candidates:
- grumpy-reviewer.agent.md → pr-nitpick-reviewer.md
- adr-writer.agent.md → daily-architecture-diagram.md
- contribution-checker.agent.md → contribution-check.md
- w3c-specification-writer.agent.md → layout-spec-maintainer.md
- agentic-workflows.agent.md → workflow-generator.md, craft.md
Impact: Specialized agent behaviors applied consistently without duplicating instructions in every workflow.
Implementation (example for pr-nitpick-reviewer.md):
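For pr-nitpick-reviewer.md, the wiring could be sketched as follows. The engine fields come from this report's quick-win recommendation; that the agent name resolves to .github/agents/grumpy-reviewer.agent.md is an assumption about how names map to files.

```yaml
# pr-nitpick-reviewer.md frontmatter fragment (sketch)
engine:
  id: copilot
  agent: grumpy-reviewer       # assumed to resolve to .github/agents/grumpy-reviewer.agent.md
```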
Opportunity 4: max-continuations for Complex Analysis Workflows
What: Autopilot mode allows Copilot to run in multiple passes, each building on the previous output.
Why It Matters: One-shot runs on complex analysis tasks (architecture review, issue clustering, deep research) often produce incomplete output. Multiple continuation passes improve depth and accuracy.
Candidate workflows (long-running analysis, currently limited to a single pass):
- deep-report.md (timeout: 30min) → add max-continuations: 5
- daily-architecture-diagram.md → max-continuations: 3
- delight.md (very complex analysis) → max-continuations: 5
- copilot-cli-deep-research.md (this workflow!) → max-continuations: 3
- portfolio-analyst.md → max-continuations: 3
Implementation:
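A minimal sketch for deep-report.md, using the value suggested in the candidate list above. The right number of passes for a given workflow is a judgment call that this fragment does not settle.

```yaml
# deep-report.md frontmatter fragment (sketch)
engine:
  id: copilot
  max-continuations: 5         # value suggested above for deep-report.md
```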
Opportunity 5: bare: true for Context-Clean Workflows
What: Passes --no-custom-instructions so AGENTS.md isn't loaded. Useful for workflows where the repo's development context is irrelevant.
Candidate workflows (simple, focused, self-contained):
- daily-fact.md – generates fun facts, doesn't need AGENTS.md
- poem-bot.md – creative writing, repo context is noise
- dictation-prompt.md – text correction task
- delight.md – independent creative agent
Implementation:
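A minimal sketch for daily-fact.md, taken from the quick-win recommendation in this report. The model line is optional and is included only because that recommendation pairs it with bare: true.

```yaml
# daily-fact.md frontmatter fragment (sketch)
engine:
  id: copilot
  bare: true                   # skip AGENTS.md via --no-custom-instructions
  model: gpt-5.1-codex-mini    # optional lighter model from the quick-win recommendation
```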
Opportunity 6: engine.version Pinning for Stability
What: Pin to a specific Copilot CLI version to avoid unexpected breakage from releases.
Why It Matters: gh-aw itself is tested against specific CLI versions. Uncontrolled upgrades can break workflows mid-run.
Candidate workflows (production critical, high frequency):
- daily-issues-report.md – daily production workflow
- auto-triage-issues.md – every 6h trigger
- hourly-ci-cleaner.md – hourly trigger
Implementation:
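A minimal sketch of version pinning, assuming engine.version takes a version string. The value below is a placeholder, not a real or recommended release; the report does not name a specific version.

```yaml
# Frontmatter fragment (sketch; the version value is a placeholder)
engine:
  id: copilot
  version: "0.0.0"             # placeholder: substitute the CLI version gh-aw was tested against
```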
🟢 Low Priority Opportunities
Opportunity 7: mcp-scripts for Structured Tool Calls
What: Allows defining shell script tools that the agent can call via MCP protocol. Only daily-cli-performance.md uses this.
Candidates: Any workflow that currently uses tools.bash with a small set of specific commands could convert to mcp-scripts for cleaner tool boundaries and better audit trails.
Opportunity 8: GitHub Toolset Specificity
What: Many workflows use toolsets: [default], which includes repos, issues, pull_requests, and context. Using specific toolsets reduces the agent's surface area.
Pattern observed: architecture-guardian.md correctly uses toolsets: [repos] (read-only repo access). More workflows could follow this pattern.
Opportunity 9: AWF Memory Limits
What: sandbox.agent.memory sets memory limits on the AWF container (e.g., "8g").
Why useful: Long-running workflows doing Python data analysis (daily-issues-report.md, python-data-charts.md) might benefit from explicit memory limits to prevent OOM.
Implementation:
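A rough sketch of a memory limit on the AWF container. The report gives only the dotted path sandbox.agent.memory and the example value "8g"; the id key used here to select AWF in the object form is an assumption and should be checked against the sandbox reference before use.

```yaml
# Frontmatter fragment (sketch; object form of sandbox.agent is assumed)
sandbox:
  agent:
    id: awf                    # assumption: object form selects the sandbox via an id key
    memory: "8g"               # example value from the description above
```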
Opportunity 10: engine.api-target for Multi-Tenant Testing
What: Routes Copilot CLI to a non-default API endpoint. Zero usage currently.
When relevant: Testing workflows against GHEC data-residency tenants or GHE Server instances. Not applicable to the current repo but worth noting for future use.
Specific Workflow Recommendations
Top 5 Quick-Win Recommendations
- pr-nitpick-reviewer.md → engine: { id: copilot, agent: grumpy-reviewer }
- contribution-check.md → engine: { id: copilot, agent: contribution-checker }
- brave.md → sandbox: { agent: awf } + network: { allowed: [defaults, api.search.brave.com] }
- deep-report.md → engine: { id: copilot, max-continuations: 5 }
- daily-fact.md → engine: { id: copilot, bare: true, model: gpt-5.1-codex-mini }
📈 Trends vs. Previous Run (2026-04-11)
Metrics tracked:
- playwright adoption
- web-fetch adoption
- mcp-scripts
- engine.agent (custom file)
- max-continuations
- engine.bare
- engine.env
- engine.args
* Previous run may have counted differently (body text vs. frontmatter-only). Current methodology: frontmatter-only checks.
Best Practice Guidelines
Based on this research, here are the recommended best practices for Copilot workflows:
- web-fetch with AWF sandbox: Always add sandbox: { agent: awf } when using external URL fetching to enforce network restrictions.
- .github/agents/ files via engine: { id: copilot, agent: <name> } instead of embedding role instructions in every workflow.
- max-continuations for complex analysis: Workflows with 30min+ timeouts doing deep analysis benefit from max-continuations: 3-5.
- bare: true for focused tasks: Creative/utility workflows that don't need repo context run cleaner without AGENTS.md loading.
- toolsets: [issues] instead of toolsets: [default] when only one resource type is needed.
- gpt-5.1-codex-mini for simple tasks: Simple workflows (fact generation, short summaries) don't need the full model. Pin to a lighter model for cost savings.
Action Items
Immediate (this week):
- Add sandbox: { agent: awf } to brave.md to restrict Brave MCP network access
- Add grumpy-reviewer agent to pr-nitpick-reviewer.md
- Add contribution-checker agent to contribution-check.md
Short-term (this month):
- Add max-continuations: 5 to deep-report.md and delight.md
- Add bare: true to daily-fact.md and poem-bot.md
- Add engine.model: gpt-5.1-codex-mini to simple utility workflows
- agentic-workflows.agent.md: wire to workflow-generator.md
Long-term (this quarter):
- shared/engine/defaults.md import with standard engine config patterns
- mcp-scripts for high-frequency tool-heavy workflows
- engine.env use cases in workflow examples
Research Methodology
Data Sources:
- .github/workflows/*.md: 186 workflow markdown files (non-lock)
- pkg/workflow/copilot_engine*.go: Engine implementation
- pkg/constants/feature_constants.go: Feature flag definitions
- docs/src/content/docs/reference/engines.md: Engine documentation
- .github/agents/: Custom agent file inventory
- /tmp/gh-aw/repo-memory/default/copilot-research-latest.json: Historical data
Analysis Scripts: Python3 + grep pattern analysis on frontmatter YAML sections.
Workflow Count Methodology:
- .md files in .github/workflows/ that don't end in .lock.yml
- engine: copilot (89) + extended object with id: copilot (~10) + no engine specified (26, defaults to Copilot) = ~125 effective
Limitations:
- .github/aw/imports/ not analyzed in full detail