Problem
When something goes wrong with a DevLake deployment, users must manually:
- Run gh devlake status and read the output
- Test each connection individually
- Check pipeline logs in the Config UI
- Correlate error messages across services
There's no single command that inspects the entire stack, identifies problems, and explains what's wrong with actionable remediation steps.
Proposed Solution
Add gh devlake diagnose, an AI-powered diagnostic command that runs all health checks, connection tests, and pipeline inspections, then synthesizes a diagnosis with remediation commands.
Command surface
```bash
# Full diagnostic
gh devlake diagnose

# Focus on a specific area
gh devlake diagnose --scope connections
gh devlake diagnose --scope pipelines
```
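The proposal does not say how --scope selects checks. Below is a minimal sketch that assumes each scope simply narrows the set of diagnostic tools registered with the session. The scopeTools table and toolsForScope helper are hypothetical. Only the tool names come from this proposal.

```go
package main

import (
	"fmt"
	"sort"
)

// Tool names from the proposal; the scope-to-tool mapping below is a hypothetical sketch.
const (
	toolEndpoints   = "check_all_endpoints"
	toolConnections = "test_all_connections"
	toolPipelines   = "get_recent_pipeline_errors"
)

// scopeTools maps each --scope value to the tools it enables.
// The empty string stands for "no --scope flag", i.e. the full diagnostic.
var scopeTools = map[string][]string{
	"":            {toolEndpoints, toolConnections, toolPipelines},
	"connections": {toolConnections},
	"pipelines":   {toolPipelines},
}

// toolsForScope validates the --scope flag and returns a copy of the tool list,
// so callers cannot mutate the shared table.
func toolsForScope(scope string) ([]string, error) {
	tools, ok := scopeTools[scope]
	if !ok {
		valid := make([]string, 0, len(scopeTools))
		for s := range scopeTools {
			if s != "" {
				valid = append(valid, s)
			}
		}
		sort.Strings(valid) // deterministic error message despite map iteration order
		return nil, fmt.Errorf("unknown --scope %q (valid: %v)", scope, valid)
	}
	return append([]string(nil), tools...), nil
}
```

With this table, gh devlake diagnose --scope connections would register only test_all_connections. An unknown value fails fast, before any LLM call is made.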
How it works
1. Gather data: run all checks programmatically (no user interaction needed):
   - Ping all endpoints (backend, Config UI, Grafana)
   - Test all saved connections across all plugins
   - Fetch recent pipeline runs and their error messages
   - Check DB connectivity
   - Read the state file for deployment context
2. Send to Copilot SDK: package all results into a structured context and send it to the LLM with a diagnostic prompt.
3. Render diagnosis: the LLM synthesizes the findings into a plain-language explanation with actionable gh devlake commands. The CLI prints this explanation once the full response has arrived (see Output mode).
Example output
```text
$ gh devlake diagnose
🔍 Running diagnostics...
✅ Backend API: http://localhost:8080 (healthy)
✅ Config UI: http://localhost:4000 (healthy)
✅ Grafana: http://localhost:3002 (healthy)
❌ Connection "GitHub - my-org" (github, id=1): 401 Unauthorized
✅ Connection "Copilot - my-ent" (gh-copilot, id=2): healthy
⚠️ Pipeline #12: FAILED (2 hours ago)

📋 Diagnosis:
Your GitHub connection "GitHub - my-org" is returning 401 Unauthorized.
This typically means the PAT has expired or been revoked.

To fix:
1. Generate a new PAT with scopes: repo, read:org, read:user
2. Update the connection:
   gh devlake configure connection update --plugin github --id 1 --token ghp_NEW_TOKEN

Pipeline #12 failed because it depends on this connection.
After updating the token, re-trigger collection:
   gh devlake configure project add --project-name my-team
```
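The status lines in the example follow a fixed pattern: an emoji, a label, then details. The sketch below shows formatters that reproduce two of those lines. The Status type and the formatEndpoint and formatConnection functions are hypothetical, not existing CLI helpers.

```go
package main

import "fmt"

// Status is a hypothetical tri-state for one check result.
type Status int

const (
	StatusOK Status = iota
	StatusFail
	StatusWarn
)

// icon returns the emoji prefix used in the example output.
func (s Status) icon() string {
	switch s {
	case StatusOK:
		return "✅"
	case StatusFail:
		return "❌"
	default:
		return "⚠️"
	}
}

// formatEndpoint renders a line such as: ✅ Backend API: http://localhost:8080 (healthy)
func formatEndpoint(s Status, name, url, detail string) string {
	return fmt.Sprintf("%s %s: %s (%s)", s.icon(), name, url, detail)
}

// formatConnection renders a line such as:
// ❌ Connection "GitHub - my-org" (github, id=1): 401 Unauthorized
func formatConnection(s Status, name, plugin string, id int, detail string) string {
	return fmt.Sprintf("%s Connection %q (%s, id=%d): %s", s.icon(), name, plugin, id, detail)
}
```

Keeping these formatters separate from the LLM call means the check lines always print in the same shape. Only the free-text diagnosis comes from the model.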
Architecture
Reuses the internal/copilot/ package from #63. Adds diagnostic-specific tools:
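```go
// Tool: test_all_connections
// Batch-tests every saved connection across all plugins and returns one result per connection.
// Relies on package-level apiURL, connectionRegistry, and ConnectionTestResult; imports errors and fmt.
var testAllConnectionsTool = copilot.DefineTool("test_all_connections",
	"Test all saved DevLake connections and return pass/fail status for each",
	func(params struct{}, inv copilot.ToolInvocation) (any, error) {
		client := devlake.NewClient(apiURL)
		var results []ConnectionTestResult
		for _, def := range connectionRegistry {
			conns, err := client.ListConnections(def.Plugin)
			if err != nil {
				// Surface the listing failure to the model instead of silently skipping the plugin.
				results = append(results, ConnectionTestResult{
					Plugin:  def.Plugin,
					Message: fmt.Sprintf("listing connections failed: %v", err),
				})
				continue
			}
			for _, conn := range conns {
				result := ConnectionTestResult{Plugin: def.Plugin, ID: conn.ID, Name: conn.Name}
				test, err := client.TestSavedConnection(def.Plugin, conn.ID)
				if err != nil {
					// A failed test call is itself a finding; never read fields of a missing result.
					result.Message = err.Error()
				} else {
					result.Healthy, result.Message = test.Success, test.Message
				}
				results = append(results, result)
			}
		}
		return results, nil
	})

// Tool: get_recent_pipeline_errors
// Fetches recent failed pipelines with error details
var getRecentPipelineErrorsTool = copilot.DefineTool("get_recent_pipeline_errors",
	"Get recent failed DevLake pipeline runs with error messages and timestamps",
	func(params struct{ Limit int `json:"limit,omitempty"` }, inv copilot.ToolInvocation) (any, error) {
		// ... fetch pipelines, filter for failures, include error details ...
		return nil, errors.New("get_recent_pipeline_errors: not implemented") // placeholder so the sketch compiles
	})

// Tool: check_all_endpoints
// Pings backend, Config UI, Grafana and returns status for each
var checkEndpointsTool = copilot.DefineTool("check_all_endpoints",
	"Check health of all DevLake endpoints (backend API, Config UI, Grafana)",
	func(params struct{}, inv copilot.ToolInvocation) (any, error) {
		// ... ping each endpoint from state file or discovery ...
		return nil, errors.New("check_all_endpoints: not implemented") // placeholder so the sketch compiles
	})
```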
Output mode
Unlike insights (which streams), diagnose uses batch mode: collect the full response, then render with the CLI's standard emoji/box-drawing formatting. This ensures the diagnostic output has consistent visual structure.
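```go
// Wait for the full response instead of streaming it
response, err := session.SendAndWait(ctx, copilot.MessageOptions{
	Prompt: diagnosticPrompt,
})
if err != nil {
	return fmt.Errorf("diagnosis request failed: %w", err)
}
// Format and print the response with standard CLI output conventions
```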
System prompt for diagnosis
The system message includes:
- DevLake architecture context (three-layer model, plugin structure)
- Available gh devlake commands for remediation
- Common failure patterns and their fixes
- The user's deployment type (local vs Azure) from the state file
Files to create/modify
| File | Change |
|---|---|
| cmd/diagnose.go | NEW: gh devlake diagnose command |
| internal/copilot/tools.go | ADD: diagnostic-specific tools (test_all_connections, get_recent_pipeline_errors, check_all_endpoints) |
| internal/copilot/system.go | ADD: diagnostic system prompt variant |
Acceptance Criteria
- gh devlake diagnose gathers all health/connection/pipeline data and produces a synthesis
- --scope connections limits diagnosis to connection health only
- --scope pipelines limits diagnosis to pipeline failures only
- Diagnosis includes actionable gh devlake commands for remediation
- Output uses batch mode rather than streaming (unlike insights)
- go build ./... and go test ./... pass

Target Version
v0.4.3: AI-powered operations within the active v0.4.x line.

Dependencies
- #63 Integrate Copilot SDK (Go): internal/copilot package + gh devlake insights. Provides the Copilot SDK integration (internal/copilot/ package, SDK dependency).
- #62 gh devlake query command with extensible query engine. Provides the query engine (for pipeline/metric data).
- #60 --json output flag to read commands. Provides the --json output flag (for --json mode, if desired).

References
- github/copilot-sdk/go: DefineTool, SendAndWait (batch mode), system messages
  - go/README.md: full API reference
  - go/definetool.go: type-safe tool definitions
  - go/session.go: Send, SendAndWait, event handling
- github/copilot-sdk: architecture, auth, custom tools
- apache/incubator-devlake/AGENTS.md: plugin structure, API routes
- cmd/status.go: existing health check logic to reuse
- cmd/configure_connection_test_cmd.go: existing connection test logic
- internal/devlake/client.go: Health(), TestSavedConnection(), ListConnections(), GetPipeline()
- internal/copilot/: shared SDK client from #63 (Integrate Copilot SDK (Go): internal/copilot package + gh devlake insights)