AgentOps is the operating system around your coding agent: it tracks the work, validates the plan and code, and feeds what was learned into the next session.
Start Here · Install · See It Work · Skills · CLI · FAQ · Newcomer Guide
Most coding-agent tools improve the session. AgentOps improves the repo around the session.
Your agent remembers what worked. Every session writes decisions, patterns, and learnings to .agents/ — git-tracked, auditable, diffable. Not opaque SQLite. Not vector embeddings on someone else's server. Plain files that compound across sessions and transfer between machines with git clone.
Your code reviewed by Claude AND GPT. /council spawns independent judges across vendors — no anchoring, no single-model blind spots. The only plugin that does cross-vendor consensus validation.
| Without AgentOps | With AgentOps |
|---|---|
| Partial memory each session | Repo-native memory via .agents/ — retrieval, freshness weighting, injection |
| Review after the fact | /pre-mortem before build, /vibe before commit |
| Untracked agent runs | Issues, dependency waves, worktrees, audit trails |
| Chat logs | Artifacts the next session can use |
# Claude Code (recommended): marketplace + plugin install
claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace
# Codex CLI (0.110.0+): installs the native plugin, archives stale raw mirrors if needed,
# and suppresses the unstable-plugins warning. Open a fresh Codex session afterward.
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.sh | bash
# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash
# Other Skills-compatible agents (agent-specific, install only what you need)
# Example (Cursor):
npx skills@latest add boshu2/agentops --cursor -g
On Linux, also install system bubblewrap so Codex does not warn that it is falling back to the vendored copy:
sudo apt-get install -y bubblewrap
Skills work standalone — no CLI required. The ao CLI is what unlocks the full repo-native layer: knowledge extraction, retrieval and injection, maturity scoring, goals, and control-plane style workflows.
brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops
brew install agentops
which ao
ao version
Or install via release binaries or build from source.
Then type /quickstart in your agent chat.
Three commands, zero methodology. Pick one and go:
/council validate this PR # Multi-model code review — immediate value
/research "how does auth work" # Codebase exploration with memory
/implement "fix the login bug" # Full lifecycle for one task
When you're ready for more:
/plan → /crank # Decompose into issues, parallel-execute
/rpi "add retry backoff" # Full pipeline: research → plan → build → validate → learn
/evolve # Fitness-scored improvement loop — walk away, come back to better code
Every skill works alone. Compose them however you want. Full catalog: Skills.
A phased lifecycle, not a bag of prompts:
| Phase | Primary skills | What gets locked in |
|---|---|---|
| Discovery | /brainstorm → /research → /plan → /pre-mortem | repo context, scoped work, known risks, execution packet |
| Implementation | /crank → /swarm → /implement | closed issues, validated wave outputs, ratchet checkpoints |
| Validation + learning | /validation → /vibe → /post-mortem → /retro → /forge | findings, learnings, next work, stronger prevention artifacts |
/rpi orchestrates all three phases. /evolve keeps running /rpi against GOALS.md so the worst fitness gap gets addressed next. The output is not just code — it is code + state + memory + gates.
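The /evolve selection policy (always address the worst fitness gap next) can be sketched in a few lines. This is an illustrative model only: the `Goal` type, `worst_gap` helper, and weights are assumptions for the sketch, not AgentOps internals.

```python
# Illustrative sketch of the /evolve selection policy: score every gate,
# then hand the highest-weight failing goal to the next /rpi cycle.
# Goal and worst_gap are hypothetical names, not AgentOps APIs.
from dataclasses import dataclass

@dataclass
class Goal:
    name: str
    weight: int      # importance of the gate
    passing: bool    # did the last measurement pass?

def worst_gap(goals):
    """Return the highest-weight failing goal, or None if all pass."""
    failing = [g for g in goals if not g.passing]
    return max(failing, key=lambda g: g.weight) if failing else None

goals = [
    Goal("wiring-closure", weight=6, passing=False),
    Goal("coverage", weight=4, passing=False),
    Goal("lint-clean", weight=2, passing=True),
]
target = worst_gap(goals)
# target.name == "wiring-closure": that gap gets the next /rpi cycle
```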
| Pattern | Chain | When |
|---|---|---|
| Quick fix | /implement | One issue, clear scope |
| Validated fix | /implement → /vibe | One issue, want confidence |
| Planned epic | /plan → /pre-mortem → /crank → /post-mortem | Multi-issue, structured |
| Full pipeline | /rpi (chains all above) | End-to-end, autonomous |
| Evolve loop | /evolve (chains /rpi repeatedly) | Fitness-scored improvement |
| PR contribution | /pr-research → /pr-plan → /pr-implement → /pr-validate → /pr-prep | External repo |
| Knowledge query | ao search → /research (if gaps) | Understanding before building |
| Standalone review | /council validate `<target>` | Ad-hoc multi-judge review |
- Mission and fitness: GOALS.md, ao goals, /evolve
- Discovery chain: /brainstorm → ao search / ao lookup → /research → /plan → /pre-mortem
- Execution chain: /crank → /swarm → /implement → /vibe → ratchet checkpoints
- Compiled prevention chain: findings registry → planning rules / pre-mortem checks / constraints → later planning and validation
- Continuity chain: session hooks + phased manifests + /handoff + /recover
That is the real architecture: a local operating layer around the agent, not just a prompt pack. See Primitive Chains for the audited map.
Each session automatically extracts knowledge and feeds it back — making the next session smarter:
┌───────────────────────────────────────────────────────────────────────┐
│ THE KNOWLEDGE FLYWHEEL │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ 1. WORK │─>│ 2. FORGE │─>│ 3. POOL │─>│ 4. PROMOTE │ │
│ │ Session │ │ Extract │ │ Score & │ │ Graduate │ │
│ │ │ │ │ │ Queue │ │ │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
│ ^ │ │
│ │ ┌────────────┐ ┌────────────┐ │ │
│ └─────────│ 6. INJECT │<─│5. LEARNINGS│<──────────┘ │
│ │ Surface │ │ Permanent │ │
│ │ & Cite │ │ Knowledge │ │
│ └────────────┘ └────────────┘ │
│ │
│ Each citation feeds back: utility scores update, high-utility │
│ knowledge surfaces more often, low-utility decays. This is the │
│ compounding effect — sessions get smarter because the best │
│ knowledge rises and the noise sinks. │
└───────────────────────────────────────────────────────────────────────┘
Forge extracts decisions, solutions, learnings, and failures from every session. Pool scores each on specificity, actionability, and novelty. Promote graduates the best after another session cites it. Inject surfaces the top knowledge at session start — each retrieval creates a citation that drives the feedback loop.
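The citation feedback loop can be modeled very simply: a retrieval that cites a learning nudges its utility up, and uncited learnings decay each session. The constants and function name below are assumptions for illustration, not the shipped scoring code.

```python
# Illustrative model of citation-driven utility. The boost and decay
# constants are made up for the sketch, not AgentOps defaults.
def update_utility(utility, cited, boost=0.2, decay=0.95):
    """One session tick: boost toward 1.0 on citation, otherwise decay."""
    return min(1.0, utility + boost) if cited else utility * decay

u = 0.5
for _ in range(3):
    u = update_utility(u, cited=True)   # cited three sessions in a row
# u has climbed to the 1.0 cap: this knowledge surfaces more often

v = 0.5
for _ in range(10):
    v = update_utility(v, cited=False)  # never cited
# v has decayed well below its starting point: this knowledge sinks
```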
> /research "retry backoff strategies"
[lookup] 3 prior learnings found (freshness-weighted):
- Token bucket with Redis (established, high confidence)
- Rate limit at middleware layer, not per-handler (pattern)
- /login endpoint was missing rate limiting (decision)
[research] Found prior art in your codebase + retrieved context
Recommends: exponential backoff with jitter, reuse existing Redis client
Session 50 starts with 50 sessions of accumulated wisdom — not from scratch. Stale insights decay automatically. Useful ones compound. Measure it: ao flywheel status.
Deep dive: The Knowledge Flywheel
- Local-only — no telemetry, no cloud, no accounts. Nothing phones home.
- Auditable — plans, verdicts, learnings, and patterns are plain files on disk.
- Multi-runtime — Claude Code and Codex CLI (first-class), Cursor and OpenCode (experimental).
- Harder to drift — tracked issues and validation gates mean the repo is less dependent on agent mood or memory.
Everything is open source — audit it yourself.
OpenCode — plugin + skills
Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md
Configuration — environment variables
All optional. AgentOps works out of the box with no configuration. Full reference: docs/ENV-VARS.md
What AgentOps touches:
| What | Where | Reversible? |
|---|---|---|
| Skills | Global skill homes (~/.agents/skills for Codex/OpenAI-documented installs, plus compatibility caches outside your repo; for Claude Code: ~/.claude/skills/) | rm -rf ~/.claude/skills/ ~/.agents/skills/ ~/.codex/skills/ ~/.codex/plugins/cache/agentops-marketplace/agentops/ |
| Knowledge artifacts | .agents/ in your repo (git-ignored by default) | rm -rf .agents/ |
| Hook registration | .claude/settings.json | Delete entries from .claude/settings.json |
Nothing modifies your source code.
Troubleshooting: docs/troubleshooting.md
1. One command — validate a PR:
> /council validate this PR
[council] 3 judges spawned (independent, no anchoring)
[judge-1] PASS — token bucket implementation correct
[judge-2] WARN — rate limiting missing on /login endpoint
[judge-3] PASS — Redis integration follows middleware pattern
Consensus: WARN — add rate limiting to /login before shipping
2. Full pipeline — research through post-mortem, one command:
> /rpi "add retry backoff to rate limiter"
[research] Found 3 prior learnings on rate limiting (injected)
[plan] 2 issues, 1 wave → epic ag-0058
[pre-mortem] Council validates plan → PASS (knew about Redis choice)
[crank] Parallel agents: Wave 1 ██ 2/2
[vibe] Council validates code → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel] Next: /rpi "add circuit breaker to external API calls"
3. The endgame — /evolve: define goals, walk away, come back to a better codebase:
> /evolve
[evolve] GOALS.md: 18 gates loaded, score 77.0% (14/18 passing)
[cycle-1] Worst: wiring-closure (weight 6) + 3 more
/rpi "Fix failing goals" → score 93.3% (25/28) ✓
── the agent naturally organizes into phases ──
[cycle-2-35] Coverage blitz: 17 packages from ~85% → ~97% avg
Table-driven tests, edge cases, error paths
[cycle-38-59] Benchmarks added to all 15 internal packages
[cycle-60-95] Complexity annihilation: zero functions >= 8
(was dozens >= 20 — extracted helpers, tested independently)
[cycle-96-116] Modernization: sentinel errors, exhaustive switches,
Go 1.26-compatible idioms (slices, cmp.Or, range-over-int)
[teardown] 203 files changed, 20K+ lines, 116 cycles
All tests pass. Go vet clean. Avg coverage 97%.
/post-mortem → 33 learnings extracted
Ready for next /evolve — the floor is now the ceiling.
That ran overnight — ~7 hours, unattended. Regression gates auto-reverted anything that broke a passing goal. The agent naturally organized into the right order: build a safety net (tests), refactor aggressively (complexity), then polish.
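The regression gating behaves roughly like the sketch below: snapshot which goals pass before a cycle, re-measure after, and revert the cycle if any previously-passing goal now fails. All names here are hypothetical; the real gate lives in the CLI.

```python
# Illustrative regression gate: a cycle is only kept if no goal that
# passed before the cycle fails after it. Names are hypothetical.
def regression_gate(before, after):
    """Return the goals that regressed: passing before, failing after."""
    return sorted(g for g in before if before[g] and not after.get(g, False))

before = {"tests": True, "vet": True, "coverage": False}
after  = {"tests": True, "vet": False, "coverage": True}
regressed = regression_gate(before, after)
# regressed == ["vet"]: the cycle's changes would be auto-reverted;
# coverage newly passing is progress and raises no flag
```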
More examples — swarm, session continuity, different workflows
Parallelize anything with /swarm:
> /swarm "research auth patterns, brainstorm rate limiting improvements"
[swarm] 3 agents spawned — each gets fresh context
[agent-1] /research auth — found JWT + session patterns, 2 prior learnings
[agent-2] /research rate-limiting — found token bucket, middleware pattern
[agent-3] /brainstorm improvements — 4 approaches ranked
[swarm] Complete — artifacts in .agents/
Session continuity across compaction or restart:
> /handoff
[handoff] Saved: 3 open issues, current branch, next action
Continuation prompt written to .agents/handoffs/
--- next session ---
> /recover
[recover] Found in-progress epic ag-0058 (2/5 issues closed)
Branch: feature/rate-limiter
Next: /implement ag-0058.3
Different developers, different setups:
| Workflow | Commands | What happens |
|---|---|---|
| PR reviewer | /council validate this PR | One command, actionable feedback, no setup |
| Team lead | /research → /plan → /council validate | Compose skills manually, stay in control |
| Solo dev | /rpi "add user auth" | Research through post-mortem, walk away |
| Platform team | /swarm + /evolve | Parallel pipelines + fitness-scored improvement loop |
Not sure which skill to run? See the Skill Router.
Every skill works alone. Compose them however you want.
Core skills — where most users spend their time:
| Skill | What it does |
|---|---|
| /council | Independent judges (Claude + Codex) debate, surface disagreement, converge. The validation primitive everything else builds on. |
| /research | Deep codebase exploration — produces structured findings with memory |
| /implement | Full lifecycle for one task — research, plan, build, validate, learn |
| /vibe | Code quality review — complexity + multi-model council + domain checklists |
| /evolve | Measure goals, fix the worst gap, regression-gate everything, repeat overnight |
Full catalog:
Judgment — the foundation everything validates against
| Skill | What it does |
|---|---|
| /council | Independent judges (Claude + Codex) debate, surface disagreement, converge. Auto-extracts findings into the flywheel. --preset=security-audit, --perspectives, --debate |
| /vibe | Code quality review — complexity + council + finding classification (CRITICAL vs INFORMATIONAL) + suppression framework + domain checklists (SQL, LLM, concurrency) |
| /pre-mortem | Validate plans — error/rescue mapping, scope modes (Expand/Hold/Reduce), temporal interrogation, prediction tracking with downstream correlation |
| /post-mortem | Wrap up work — council validates, prediction accuracy scoring (HIT/MISS/SURPRISE), session streak tracking, persistent retro history |
Execution — research, plan, build, ship
| Skill | What it does |
|---|---|
| /research | Deep codebase exploration — produces structured findings |
| /plan | Decompose a goal into trackable issues with dependency waves |
| /implement | Full lifecycle for one task — research, plan, build, validate, learn |
| /crank | Parallel agents in dependency-ordered waves, fresh context per worker |
| /swarm | Parallelize any skill — run research, brainstorms, implementations in parallel |
| /rpi | Full pipeline: discovery (research + plan + pre-mortem) → implementation (crank) → validation (vibe + post-mortem) |
| /evolve | The endgame: measure goals, fix the worst gap, regression-gate everything, learn, repeat overnight |
Knowledge — the flywheel that makes sessions compound
| Skill | What it does |
|---|---|
| /retro | Capture a decision, pattern, or lesson learned |
| /forge | Extract learnings from completed work into .agents/ |
| /flywheel | Monitor knowledge health — velocity, staleness, pool depths |
Supporting skills — onboarding, session, traceability, product, utility
| Category | Skills |
|---|---|
| Onboarding | /quickstart, /using-agentops |
| Session | /handoff, /recover, /status |
| Traceability | /trace, /provenance |
| Product | /product, /goals, /release, /readme, /doc |
| Utility | /brainstorm, /bug-hunt, /complexity |
Full reference: docs/SKILLS.md
Cross-runtime orchestration — mix Claude, Codex, OpenCode
AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.
| Spawning backend | How it works | Best for |
|---|---|---|
| Native teams | TeamCreate + SendMessage — built into Claude Code | Tight coordination, debate |
| Background tasks | Task(run_in_background=true) — last-resort fallback | When no team APIs are available |
| Codex sub-agents | /codex-team — Claude orchestrates Codex workers | Cross-vendor validation |
Custom agents — why AgentOps ships its own
Two read-only agents fill the gap between Claude Code's Explore (no commands) and general-purpose (full write, expensive):
| Agent | Model | Can do | Can't do |
|---|---|---|---|
| agentops:researcher | haiku | Read, search, run commands | Write or edit files |
| agentops:code-reviewer | sonnet | Read, search, git diff, structured findings | Write or edit files |
Skills spawn these automatically — /research uses the researcher, /vibe uses the code-reviewer.
.agents/ is an append-only ledger — every learning, verdict, pattern, and decision is a dated file. Write once, score by freshness, inject the best, prune the rest. The formal model is cache eviction with freshness decay. Full lifecycle: Context Lifecycle.
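That cache-eviction-with-freshness-decay model can be written down directly. A minimal sketch, assuming exponential decay by age; the half-life, field names, and `rank` helper are illustrative parameters, not shipped defaults.

```python
# Illustrative freshness weighting: an entry's weight halves every
# half_life_days, so old learnings fade unless their utility is high.
# half_life_days=30 is an assumed parameter for the sketch.
import math

def freshness(age_days, half_life_days=30.0):
    """Weight in (0, 1] that halves every half_life_days."""
    return math.exp(-math.log(2) * age_days / half_life_days)

def rank(entries, top_k=3):
    """Score = utility x freshness; inject the top_k, prune the rest."""
    scored = sorted(entries,
                    key=lambda e: e["utility"] * freshness(e["age"]),
                    reverse=True)
    return scored[:top_k]

entries = [
    {"id": "redis-token-bucket", "utility": 0.9, "age": 2},
    {"id": "old-auth-note",      "utility": 0.9, "age": 120},
    {"id": "middleware-pattern", "utility": 0.6, "age": 10},
]
top = rank(entries, top_k=2)
# The 120-day-old note loses to a lower-utility but fresher pattern.
```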
Phase details — what each step does
- /research — Explores your codebase. Produces a research artifact with findings and recommendations.
- /plan — Decomposes the goal into issues with dependency waves. Creates a beads epic (git-native issue tracking).
- /pre-mortem — Judges simulate failures before you write code. FAIL? Re-plan with feedback (max 3 retries).
- /crank — Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. --test-first for spec-first TDD.
- /vibe — Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).
- /post-mortem — Council validates the implementation. Retro extracts learnings. Suggests the next /rpi command.
/rpi "goal" runs all six end to end. Use --interactive for human gates at research and plan.
| Topic | Where |
|---|---|
| Phased RPI (fresh context per phase) | How It Works |
| Parallel RPI (N epics in isolated worktrees) | How It Works |
| Setting up /evolve (GOALS.md, fitness loop) | Evolve Setup |
| Science, systems theory, prior art | The Science |
Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL
Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)
The ao CLI adds the knowledge flywheel (extract, inject, decay, maturity) and terminal-based RPI that runs without an active chat session.
ao seed # Plant AgentOps in any repo (auto-detects project type)
ao rpi loop --supervisor --max-cycles 1 # Canonical autonomous cycle (policy-gated landing)
ao rpi loop --supervisor "fix auth bug" # Single explicit-goal supervised cycle
ao rpi phased --from=implementation "ag-058" # Resume a specific phased run at build phase
ao rpi parallel --manifest epics.json # Run N epics concurrently in isolated worktrees
ao rpi status --watch # Monitor active/terminal runs
Walk away, come back to committed code + extracted learnings.
ao search "query" # Search repo-local knowledge and forged session history
ao lookup --query "topic" # Retrieve specific knowledge artifacts by ID or relevance
ao notebook update # Merge latest session insights into MEMORY.md
ao memory sync # Sync session history to MEMORY.md (cross-runtime: Codex, OpenCode)
ao context assemble # Build 5-section context briefing for a task
ao feedback-loop # Close the MemRL feedback loop (citation → utility → maturity)
ao metrics health # Flywheel health: sigma, rho, delta, escape velocity
ao dedup # Detect near-duplicate learnings (--merge for auto-resolution)
ao contradict # Detect potentially contradictory learnings
ao demo # Interactive demo
ao search (built on CASS) searches the current repo's .agents/ao/sessions/ plus adjacent .agents/ knowledge surfaces such as learnings, patterns, findings, and research. Raw global Claude or Codex archives are not searched directly until they are forged into the repo.
Second Brain + Obsidian vault — semantic search over all your sessions
.agents/ is plain text — open it as an Obsidian vault for browsing and linking. For semantic search, pair with Smart Connections (local embeddings, MCP server for agent retrieval).
Full reference: CLI Commands
One recursive shape at every scale:
/implement ── one worker, one issue, one verify cycle
└── /crank ── waves of /implement (FIRE loop)
└── /rpi ── research → plan → crank → validate → learn
└── /evolve ── fitness-gated /rpi cycles
Each level treats the one below as a black box: spec in, validated result out. Workers get fresh context per wave (Ralph Wiggum Pattern), never commit (lead-only), and communicate through the filesystem. Orchestrators stay in the main session; workers fork into subagents. See SKILL-TIERS.md for the full classification.
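That "spec in, validated result out" contract can be sketched as one bounded verify loop reused at every level (purely illustrative; `run_level` and both callbacks are hypothetical names, and max_retries mirrors the "max 3" gates described above):

```python
# Illustrative recursion: each level drives the level below as a black
# box and retries until its own verification passes, up to a bound.
def run_level(do_work, verify, max_retries=3):
    """Spec in, validated result out, or failure after bounded retries."""
    for _ in range(max_retries):
        result = do_work()
        if verify(result):
            return result
    raise RuntimeError("failed validation after retries")

# e.g. a /crank-like lead treats each /implement-like worker as a
# black box: first attempt fails verification, the retry passes.
outputs = iter(["broken", "fixed"])
result = run_level(lambda: next(outputs), lambda r: r == "fixed")
```

The same shape nests: /evolve's do_work is a whole /rpi cycle, whose do_work is a /crank wave, and so on down to a single issue.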
| Topic | Where |
|---|---|
| Five pillars, operational invariants | Architecture |
| Brownian Ratchet, Ralph Wiggum, context windowing | How It Works |
| Orchestrator vs worker fork rules | Skill Tiers |
| Injection philosophy, freshness decay, MemRL | The Science |
| Primitive chains (audited map) | Primitive Chains |
| Context lifecycle, three-tier injection | Context Lifecycle |
| Alternative | What it does well | Where AgentOps focuses differently |
|---|---|---|
| GSD | Clean subagent spawning, fights context rot | Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions) |
| Compound Engineer | Knowledge compounding, structured loop | Multi-model councils and validation gates — independent judges debating before and after code ships |
docs/FAQ.md — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.
Issue tracking — Beads / bd
Git-native issues in .beads/. bd onboard (setup) · bd ready (find work) · bd show <id> · bd close <id> · bd vc status (optional Dolt state check; JSONL auto-sync is automatic). More: AGENTS.md
See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.
Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · Configuration · CLI Reference · Changelog
