Skip to content

git-pixel22/Pidgeotto

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Pidgeotto 🪽

Automatically detects failed pods across all AKS clusters, fetches their logs, and uses Claude AI to explain what went wrong and how to fix it.


Prerequisites

  • Azure CLI installed and logged in (az login)
  • Claude Code installed and authenticated
  • Python 3.10+
  • Access to an Azure resource group containing AKS clusters

Setup

git clone <repo-url>
cd Pidgeotto

python3 -m venv myenv
source myenv/bin/activate

pip install -r requirements.txt

cp .env.example .env
# Edit .env and set at minimum: AKS_RESOURCE_GROUP

Run

python main.py

Pidgeotto will:

  1. Query Azure for all clusters in the resource group
  2. Skip stopped clusters, watch only running ones
  3. Detect any pod failures in real time
  4. Fetch the pod logs and send them to Claude for analysis
  5. Print a summary panel to the terminal

Output

For every failed pod you'll see two panels:

╭─ FAILED POD ──────────────────────────────╮
│ Cluster   : my-cluster                    │
│ Name      : my-pod-abc123                 │
│ Namespace : my-namespace                  │
│ Reason    : Error  │  Exit code : 1       │
│ Time      : 2026-03-25 17:02:36 UTC       │
╰───────────────────────────────────────────╯
╭─ Claude Analysis ─────────────────────────╮
│ [WARNING]                                 │
│                                           │
│ Root cause: ...                           │
│ Suggested fix: ...                        │
╰───────────────────────────────────────────╯

A full history of all analyses is saved to state/analysis_log.md.


Configuration

Copy .env.example to .env and edit as needed:

Variable Default Description
AKS_RESOURCE_GROUP (required) Azure resource group to scan
PLATFORM_NAME (none) Your platform/org name, added to Claude's analysis context
TIMEZONE UTC Timestamps timezone — any IANA name (e.g. Asia/Kolkata, America/New_York). Falls back to UTC if invalid.
CLUSTER_SYNC_INTERVAL 300 Seconds between cluster list refreshes
WATCH_NAMESPACES (all) Comma-separated namespaces to watch
LOG_TAIL_LINES 150 Log lines fetched per failed container
STATE_DIR ./state Where seen.json and analysis_log.md are stored

How it handles edge cases

Scenario Behaviour
Stopped cluster Skipped — no noise in logs
Cluster started after Pidgeotto is running Picked up automatically within CLUSTER_SYNC_INTERVAL seconds
Cluster deleted from Azure Watcher stopped cleanly
Same pod failure seen after restart Skipped — tracked in state/seen.json
Cluster temporarily unreachable Retries with exponential backoff (up to 60s)

About

A minimal tool for monitoring failed pods across AKS clusters

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages