Automatically detects failed pods across all AKS clusters, fetches their logs, and uses Claude AI to explain what went wrong and how to fix it.
- Azure CLI installed and logged in (`az login`)
- Claude Code installed and authenticated
- Python 3.10+
- Access to an Azure resource group containing AKS clusters
```bash
git clone <repo-url>
cd Pidgeotto
python3 -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and set at minimum: AKS_RESOURCE_GROUP
python main.py
```

Pidgeotto will:
- Query Azure for all clusters in the resource group
- Skip stopped clusters and watch only the running ones
- Detect any pod failures in real time
- Fetch the pod logs and send them to Claude for analysis
- Print a summary panel to the terminal
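The "skip stopped clusters" step above can be sketched roughly as follows. This is a hypothetical illustration, not Pidgeotto's actual implementation; it assumes the JSON shape returned by `az aks list`, where each cluster carries a `powerState.code` field that is `"Running"` for started clusters.

```python
import json

def running_clusters(az_aks_list_json: str) -> list[str]:
    """Filter `az aks list` output down to running clusters only.

    Assumes each cluster object has a powerState.code field,
    "Running" for started clusters and "Stopped" otherwise.
    """
    clusters = json.loads(az_aks_list_json)
    return [
        c["name"]
        for c in clusters
        if c.get("powerState", {}).get("code") == "Running"
    ]

# Example: one running and one stopped cluster
sample = json.dumps([
    {"name": "prod-cluster", "powerState": {"code": "Running"}},
    {"name": "dev-cluster", "powerState": {"code": "Stopped"}},
])
print(running_clusters(sample))  # ['prod-cluster']
```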
For every failed pod you'll see two panels:
```
╭─ FAILED POD ──────────────────────────────╮
│ Cluster : my-cluster │
│ Name : my-pod-abc123 │
│ Namespace : my-namespace │
│ Reason : Error │ Exit code : 1 │
│ Time : 2026-03-25 17:02:36 UTC │
╰───────────────────────────────────────────╯
╭─ Claude Analysis ─────────────────────────╮
│ [WARNING] │
│ │
│ Root cause: ... │
│ Suggested fix: ... │
╰───────────────────────────────────────────╯
```
A full history of all analyses is saved to `state/analysis_log.md`.
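The restart-safe de-duplication of pod failures (tracked in `state/seen.json`) can be sketched like this. The `cluster/namespace/pod` key format is an assumption for illustration; the real `seen.json` schema may differ.

```python
import json
import tempfile
from pathlib import Path

def already_seen(state_dir: Path, cluster: str, namespace: str, pod: str) -> bool:
    """Return True if this pod failure was analysed before; otherwise record it.

    Keys use a hypothetical "cluster/namespace/pod" format.
    """
    seen_file = state_dir / "seen.json"
    seen = json.loads(seen_file.read_text()) if seen_file.exists() else []
    key = f"{cluster}/{namespace}/{pod}"
    if key in seen:
        return True
    seen.append(key)
    state_dir.mkdir(parents=True, exist_ok=True)
    seen_file.write_text(json.dumps(seen))
    return False

# Example usage against a temporary state directory
d = Path(tempfile.mkdtemp())
first = already_seen(d, "my-cluster", "my-namespace", "my-pod-abc123")
second = already_seen(d, "my-cluster", "my-namespace", "my-pod-abc123")
print(first, second)  # False True
```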
Copy .env.example to .env and edit as needed:
| Variable | Default | Description |
|---|---|---|
| `AKS_RESOURCE_GROUP` | (required) | Azure resource group to scan |
| `PLATFORM_NAME` | (none) | Your platform/org name, added to Claude's analysis context |
| `TIMEZONE` | `UTC` | Timezone for timestamps; any IANA name (e.g. `Asia/Kolkata`, `America/New_York`). Falls back to UTC if invalid. |
| `CLUSTER_SYNC_INTERVAL` | `300` | Seconds between cluster list refreshes |
| `WATCH_NAMESPACES` | (all) | Comma-separated namespaces to watch |
| `LOG_TAIL_LINES` | `150` | Log lines fetched per failed container |
| `STATE_DIR` | `./state` | Where `seen.json` and `analysis_log.md` are stored |
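The `TIMEZONE` fallback behaviour can be sketched with the standard-library `zoneinfo` module; this is a minimal sketch of the documented "falls back to UTC if invalid" rule, not the tool's actual code.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def resolve_timezone(name: str) -> ZoneInfo:
    """Return the requested IANA timezone, falling back to UTC if invalid."""
    try:
        return ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError):
        return ZoneInfo("UTC")

print(resolve_timezone("Asia/Kolkata").key)  # Asia/Kolkata
print(resolve_timezone("Not/A-Zone").key)    # UTC
```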
| Scenario | Behaviour |
|---|---|
| Stopped cluster | Skipped — no noise in logs |
| Cluster started after Pidgeotto is running | Picked up automatically within CLUSTER_SYNC_INTERVAL seconds |
| Cluster deleted from Azure | Watcher stopped cleanly |
| Same pod failure seen after restart | Skipped — tracked in state/seen.json |
| Cluster temporarily unreachable | Retries with exponential backoff (up to 60s) |
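The reconnect behaviour in the last row can be sketched as capped exponential backoff. The base delay and doubling factor here are illustrative assumptions; only the 60-second cap comes from the table above.

```python
def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Delays that double from `base` (assumed 1s) up to the 60s cap."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]

print(backoff_delays(8))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```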