This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

- Install in editable mode: `python3 -m pip install -e .`
- Run all tests: `PYTHONPATH=src python3 -m unittest discover -s tests`
- Run one test file: `PYTHONPATH=src python3 -m unittest tests/test_supervisor.py`
- Run one test case: `PYTHONPATH=src python3 -m unittest tests.test_supervisor.TestSupervisor.test_run_completes`
- Byte-compile the sources as a syntax check: `python3 -m compileall src`

This is a Python CLI tool (`codex-loop`) that acts as an external supervisor for Codex CLI, enabling autonomous multi-iteration implementation workflows. It also ships a Codex skill at `skills/codex-loop/SKILL.md`.

- `src/codex_loop/cli.py`: the entire CLI. All subcommands (`init`, `run`, `status`, `doctor`, `daemon`, `service`, `snapshots`, `events`, `cleanup`, `logs`) are implemented here as a single `main()` function with `argparse`.
- `pyproject.toml`: declares the `codex-loop` console script pointing to `codex_loop.cli:main`.

| Module | Role |
|---|---|
| `config.py` | Dataclass config hierarchy loaded from `codex-loop.yaml`. Falls back to JSON if PyYAML is absent. |
| `state_store.py` | Reads/writes `.codex-loop/state.json`, the single source of truth for task status, blocker records, and loop metadata. |
| `task_graph.py` | Parses `tasks/*.md` files, tracks task status and dependencies. |
| `supervisor.py` | Core iteration loop: selects next task → runs Codex → verifies → handles circuit breakers → fires hooks. Returns `LoopOutcome.COMPLETED` or `LoopOutcome.BLOCKED`. |
| `codex_runner.py` | Wraps `codex exec` / `codex exec resume` subprocess calls, handles session resume fallback. |
| `verifier.py` | Runs verification commands from config after each iteration. |
| `hooks.py` | Executes `pre_iteration`, `post_iteration`, `on_completed`, `on_blocked` shell hooks with env vars. |
| `run_flow.py` | Orchestrates a full run: acquires lock, creates worktree, runs doctor, instantiates `Supervisor`, tears down. |
| `init_flow.py` | Scaffolds `codex-loop.yaml`, `spec/`, `plan/`, `tasks/`, `.codex-loop/state.json` from a prompt. |
| `doctor.py` | Validates and optionally repairs project state (backfills config defaults, checks task/state consistency). |
| `daemon_manager.py` | Manages a detached watchdog process around `run --continuous`. Uses `.codex-loop/daemon.json` and heartbeat files. |
| `service_manager.py` | Installs/uninstalls a macOS launchd agent under `~/Library/LaunchAgents`. |
| `reporting.py` | Formats all reporting output: status, snapshots, events, health, session inventory. |
| `cleanup.py` | Prunes artifact directories (`logs/`, `runs/`, `prompts/`) by count or age. |
| `metrics.py` | Accumulates loop metrics (iteration counts, blocker codes, watchdog restarts). |
| `watchdog_manager.py` | Restart-policy logic for the daemon watchdog. |
| `run_lock.py` | File-based lock to prevent concurrent runs against the same project dir. |
| `git_ops.py` | Creates/removes temporary worktrees for isolated execution. |

    cli.py
      └─ run_flow.py                # acquires lock, worktree, calls doctor
           └─ supervisor.py         # iteration loop
                ├─ task_graph.py    # selects next task
                ├─ codex_runner.py  # calls `codex exec`
                ├─ verifier.py      # runs verification commands
                └─ hooks.py         # fires lifecycle hooks

All persistent state lives under .codex-loop/ in the project directory. Task definitions live in tasks/*.md. Configuration lives in codex-loop.yaml.
The supervisor stops with BLOCKED when any of these thresholds are hit:

- `max_iterations`: total iterations
- `max_no_progress_iterations`: iterations with no task status change (measured by real git diff, not agent self-report)
- `max_consecutive_runner_failures`: consecutive Codex exec failures
- `max_consecutive_verification_failures`: consecutive verification failures

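
The threshold check can be pictured roughly as follows. This is a minimal sketch, not the code in `supervisor.py`: the `Limits` and `Counters` classes and the `tripped_breaker` function are hypothetical names introduced for illustration, and only the four threshold names come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    # Threshold names match the list above; values are supplied by the caller.
    max_iterations: int
    max_no_progress_iterations: int
    max_consecutive_runner_failures: int
    max_consecutive_verification_failures: int


@dataclass
class Counters:
    # Hypothetical running counters the loop would update each iteration.
    iterations: int = 0
    no_progress_iterations: int = 0
    consecutive_runner_failures: int = 0
    consecutive_verification_failures: int = 0


def tripped_breaker(limits: Limits, counters: Counters) -> Optional[str]:
    """Return the name of the first threshold that has been reached, else None."""
    checks = [
        ("max_iterations", counters.iterations, limits.max_iterations),
        ("max_no_progress_iterations",
         counters.no_progress_iterations, limits.max_no_progress_iterations),
        ("max_consecutive_runner_failures",
         counters.consecutive_runner_failures, limits.max_consecutive_runner_failures),
        ("max_consecutive_verification_failures",
         counters.consecutive_verification_failures,
         limits.max_consecutive_verification_failures),
    ]
    for name, value, limit in checks:
        if value >= limit:
            return name
    return None
```

In this sketch, a non-`None` result corresponds to the loop ending as `BLOCKED`, and the returned name is what a blocker record would mention.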
Several design invariants enforce correctness for unattended runs:

- Real diff over self-report: `Supervisor._real_files_changed()` uses `git diff --name-only HEAD` to measure actual file changes. The agent-reported `files_changed` field is only used as a fallback when git is unavailable. This prevents Codex from resetting the no-progress counter by lying about changes.
- Verification-gated completion: When all tasks self-report `done`, the supervisor runs a final verification pass before declaring `COMPLETED`. If verification fails, the last task is reopened and the loop continues.
- Verification output in prompt: The last failed verification's stdout/stderr (up to 1500 chars each) is injected into the next iteration's prompt under `## Last Verification Output (FAILED)`, giving Codex precise error context.
- Verification timeout: `VerificationConfig.timeout_seconds` (default 300) is enforced per command via `subprocess.run(timeout=...)`. Timed-out commands are recorded with `timed_out: True` and count as failures.
- Iteration heartbeat thread: When `heartbeat_path` is provided, `run_flow._run_supervisor_with_heartbeat()` starts a daemon thread that writes the heartbeat every 60 seconds throughout `supervisor.run()`. This prevents the watchdog (stale threshold: 300 s) from misreading a long `codex exec` call (up to 1800 s) as a dead process.
- Worktree persistence: `worktree_path` and `worktree_branch` are stored in `state.json` meta. Subsequent `run_project()` calls reuse the same worktree branch instead of creating a new one, preserving code changes across `--retry-blocked` cycles.
- Transient error classification: `Supervisor._is_transient_runner_error()` detects network timeouts, rate limits, and kill signals. Transient errors get backoff-and-retry without incrementing `consecutive_runner_failures` or `consecutive_task_failures`.
- Task-level circuit breaker: When a single task fails `max_consecutive_task_failures` times (default 5) consecutively, it is marked `blocked` and the loop continues to the next task instead of halting the entire run.
- Task dependencies: Task markdown files can declare `<!-- depends_on: 001-foo, 002-bar -->` (HTML comment anywhere) or `depends_on:` in YAML frontmatter. `_select_task()` skips any task whose dependencies are not yet `done`.

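
To make the task-dependency rule concrete, here is a small sketch of how the HTML-comment form could be parsed and used to gate task selection. It is an illustration under assumptions, not the code in `task_graph.py` or `supervisor.py`: `parse_dependencies` and `select_task` are hypothetical names, the task dictionary shape is invented for the example, and the YAML frontmatter form is not handled.

```python
import re
from typing import Dict, List, Optional

# Matches <!-- depends_on: 001-foo, 002-bar --> anywhere in a task file.
_DEPENDS_COMMENT = re.compile(r"<!--\s*depends_on:\s*(.*?)\s*-->", re.DOTALL)


def parse_dependencies(markdown: str) -> List[str]:
    """Return the task ids listed in the first depends_on comment, if any."""
    match = _DEPENDS_COMMENT.search(markdown)
    if not match:
        return []
    return [part.strip() for part in match.group(1).split(",") if part.strip()]


def select_task(tasks: Dict[str, dict]) -> Optional[str]:
    """Pick the first unfinished task (by id order) whose dependencies are all done.

    Each value is assumed to look like {"status": str, "depends_on": [ids]}.
    """
    for task_id in sorted(tasks):
        task = tasks[task_id]
        if task["status"] in ("done", "blocked"):
            continue
        deps_done = all(
            tasks.get(dep, {}).get("status") == "done"
            for dep in task["depends_on"]
        )
        if deps_done:
            return task_id
    return None
```

A dependency that names an unknown task id is treated as not done here, so the dependent task is skipped rather than run early; whether the real implementation does the same is not stated in this document.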

- `execution.max_consecutive_task_failures` (default 5): task-level circuit breaker threshold. When a single task fails this many times consecutively (runner or verification failures), it is marked `blocked` and skipped; the next pending task is promoted to `ready` and the loop continues.
- `execution.max_consecutive_task_failures = 0`: disables the task-level circuit breaker.
- `verification.timeout_seconds` (default 300): per-command timeout for verification commands. Timed-out commands are recorded with `timed_out: true` and count as failures.

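
The per-command timeout behaviour can be sketched with the standard library alone. This is a hedged illustration rather than the code in `verifier.py`: `run_verification_command` and the shape of the returned record are assumptions, and splitting the command with `shlex` instead of running it through a shell is a simplification chosen for the example.

```python
import shlex
import subprocess
from typing import Dict


def run_verification_command(command: str, timeout_seconds: float = 300) -> Dict[str, object]:
    """Run one verification command and record whether it passed or timed out."""
    try:
        completed = subprocess.run(
            shlex.split(command),     # assumption: no shell features are needed
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # raises subprocess.TimeoutExpired when exceeded
        )
    except subprocess.TimeoutExpired:
        # A timed-out command is recorded as such and counts as a failure.
        return {"command": command, "passed": False, "timed_out": True, "returncode": None}
    return {
        "command": command,
        "passed": completed.returncode == 0,
        "timed_out": False,
        "returncode": completed.returncode,
    }
```

`subprocess.run` kills the child process when the timeout expires before raising, so a hung verification command does not outlive the check in this sketch.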
`skills/codex-loop/SKILL.md` is a Codex skill definition. Reference docs are in `skills/codex-loop/references/`. This is separate from the Python implementation and describes when/how Codex itself should invoke `codex-loop`.