AgentOps is the operating system around your coding agent: it tracks the work, validates the plan and code, and feeds what was learned into the next session.
How It Works · Install · See It Work · Skills · CLI · FAQ · Newcomer Guide
Most coding-agent tools improve the session. AgentOps improves the repo around the session.
| Without AgentOps | With AgentOps |
|---|---|
| Partial memory each session | Repo-native memory via .agents/ — retrieval, freshness weighting, injection |
| Review after the fact | /pre-mortem before build, /vibe before commit |
| Untracked agent runs | Issues, dependency waves, worktrees, audit trails |
| Chat logs | Artifacts the next session can use |
```bash
# Claude Code (recommended): marketplace + plugin install
claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace
```

```bash
# Codex CLI (0.110.0+ native plugin): installs the native plugin, archives
# stale raw mirrors if needed; open a fresh Codex session afterwards
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.sh | bash
```

```bash
# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash
```

```bash
# Other Skills-compatible agents (agent-specific, install only what you need)
# Example (Cursor):
npx skills@latest add boshu2/agentops --cursor -g
```

Skills work standalone — no CLI required. The `ao` CLI is what unlocks the full repo-native layer: knowledge extraction, retrieval and injection, maturity scoring, goals, and control-plane style workflows.

```bash
brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops
brew install agentops
which ao
ao version
```

Or install via release binaries or build from source.
Then type /quickstart in your agent chat.
A phased lifecycle, not a bag of prompts:
| Phase | Primary skills | What gets locked in |
|---|---|---|
| Discovery | `/brainstorm` -> `/research` -> `/plan` -> `/pre-mortem` | repo context, scoped work, known risks, execution packet |
| Implementation | `/crank` -> `/swarm` -> `/implement` | closed issues, validated wave outputs, ratchet checkpoints |
| Validation + learning | `/validation` -> `/vibe` -> `/post-mortem` -> `/retro` -> `/forge` | findings, learnings, next work, stronger prevention artifacts |
/rpi orchestrates all three phases. /evolve keeps running /rpi against GOALS.md so the worst fitness gap gets addressed next. The output is not just code — it is code + state + memory + gates.
| Pattern | Chain | When |
|---|---|---|
| Quick fix | `/implement` | One issue, clear scope |
| Validated fix | `/implement` → `/vibe` | One issue, want confidence |
| Planned epic | `/plan` → `/pre-mortem` → `/crank` → `/post-mortem` | Multi-issue, structured |
| Full pipeline | `/rpi` (chains all above) | End-to-end, autonomous |
| Evolve loop | `/evolve` (chains `/rpi` repeatedly) | Fitness-scored improvement |
| PR contribution | `/pr-research` → `/pr-plan` → `/pr-implement` → `/pr-validate` → `/pr-prep` | External repo |
| Knowledge query | `/knowledge` → `/research` (if gaps) | Understanding before building |
| Standalone review | `/council validate <target>` | Ad-hoc multi-judge review |
- Mission and fitness: `GOALS.md`, `ao goals`, `/evolve`
- Discovery chain: `/brainstorm` -> `ao search` / `ao lookup` -> `/research` -> `/plan` -> `/pre-mortem`
- Execution chain: `/crank` -> `/swarm` -> `/implement` -> `/vibe` -> ratchet checkpoints
- Compiled prevention chain: findings registry -> planning rules / pre-mortem checks / constraints -> later planning and validation
- Continuity chain: session hooks + phased manifests + `/handoff` + `/recover`
That is the real architecture: a local operating layer around the agent, not just a prompt pack. See Primitive Chains for the audited map.
Each session writes decisions and patterns to `.agents/`. The next relevant task starts with that context already loaded:

```text
> /research "retry backoff strategies"

[lookup] 3 prior learnings found (freshness-weighted):
  - Token bucket with Redis (established, high confidence)
  - Rate limit at middleware layer, not per-handler (pattern)
  - /login endpoint was missing rate limiting (decision)

[research] Found prior art in your codebase + retrieved context
           Recommends: exponential backoff with jitter, reuse existing Redis client
```
Session 5 started with scored, repo-specific context — not from scratch. Stale insights decay automatically. Useful ones compound.
- Local-only — no telemetry, no cloud, no accounts. Nothing phones home.
- Auditable — plans, verdicts, learnings, and patterns are plain files on disk.
- Multi-runtime — Claude Code, Codex CLI, Cursor, OpenCode.
- Harder to drift — tracked issues and validation gates mean the repo is less dependent on agent mood or memory.
Everything is open source — audit it yourself.
OpenCode — plugin + skills
Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md
Configuration — environment variables
All optional. AgentOps works out of the box with no configuration. Full reference: docs/ENV-VARS.md
What AgentOps touches:
| What | Where | Reversible? |
|---|---|---|
| Skills | Global skill homes (`~/.agents/skills` for Codex/OpenAI-documented installs, plus compatibility caches outside your repo; for Claude Code: `~/.claude/skills/`) | `rm -rf ~/.claude/skills/ ~/.agents/skills/ ~/.codex/skills/ ~/.codex/plugins/cache/agentops-marketplace/agentops/` |
| Knowledge artifacts | `.agents/` in your repo (git-ignored by default) | `rm -rf .agents/` |
| Hook registration | `.claude/settings.json` | Delete entries from `.claude/settings.json` |
Nothing modifies your source code.
Troubleshooting: docs/troubleshooting.md
1. One command — validate a PR:
```text
> /council validate this PR

[council] 3 judges spawned (independent, no anchoring)
[judge-1] PASS — token bucket implementation correct
[judge-2] WARN — rate limiting missing on /login endpoint
[judge-3] PASS — Redis integration follows middleware pattern

Consensus: WARN — add rate limiting to /login before shipping
```
2. Full pipeline — research through post-mortem, one command:
```text
> /rpi "add retry backoff to rate limiter"

[research]    Found 3 prior learnings on rate limiting (injected)
[plan]        2 issues, 1 wave → epic ag-0058
[pre-mortem]  Council validates plan → PASS (knew about Redis choice)
[crank]       Parallel agents: Wave 1 ██ 2/2
[vibe]        Council validates code → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel]    Next: /rpi "add circuit breaker to external API calls"
```
3. The endgame — /evolve: define goals, walk away, come back to a better codebase:
```text
> /evolve

[evolve]  GOALS.md: 18 gates loaded, score 77.0% (14/18 passing)
[cycle-1] Worst: wiring-closure (weight 6) + 3 more
          /rpi "Fix failing goals" → score 93.3% (25/28) ✓

── the agent naturally organizes into phases ──

[cycle-2-35]   Coverage blitz: 17 packages from ~85% → ~97% avg
               Table-driven tests, edge cases, error paths
[cycle-38-59]  Benchmarks added to all 15 internal packages
[cycle-60-95]  Complexity annihilation: zero functions >= 8
               (was dozens >= 20 — extracted helpers, tested independently)
[cycle-96-116] Modernization: sentinel errors, exhaustive switches,
               Go 1.26-compatible idioms (slices, cmp.Or, range-over-int)

[teardown] 203 files changed, 20K+ lines, 116 cycles
           All tests pass. Go vet clean. Avg coverage 97%.
           /post-mortem → 33 learnings extracted
           Ready for next /evolve — the floor is now the ceiling.
```
That ran overnight — ~7 hours, unattended. Regression gates auto-reverted anything that broke a passing goal. The agent naturally organized into the right order: build a safety net (tests), refactor aggressively (complexity), then polish.
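The gate-selection loop described above can be sketched in a few lines. This is a hypothetical model for illustration only: the field names, data shapes, and scoring rule are invented, and the real `GOALS.md` format and `/evolve` logic differ.

```python
# Hypothetical sketch of /evolve's selection rule: the score is the weighted
# fraction of passing gates, and the next /rpi cycle targets the heaviest
# failing gate. Field names and data model are invented for illustration.
def score(gates: list[dict]) -> float:
    total = sum(g["weight"] for g in gates)
    return sum(g["weight"] for g in gates if g["passing"]) / total

def worst_gap(gates: list[dict]) -> dict:
    return max((g for g in gates if not g["passing"]), key=lambda g: g["weight"])

gates = [
    {"name": "wiring-closure", "weight": 6, "passing": False},
    {"name": "coverage", "weight": 4, "passing": True},
    {"name": "complexity", "weight": 2, "passing": False},
]
print(worst_gap(gates)["name"])  # wiring-closure
```

The regression gate then simply re-runs `score` after each cycle and reverts any change that lowers it.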
More examples — swarm, session continuity, different workflows
Parallelize anything with /swarm:
```text
> /swarm "research auth patterns, brainstorm rate limiting improvements"

[swarm]   3 agents spawned — each gets fresh context
[agent-1] /research auth — found JWT + session patterns, 2 prior learnings
[agent-2] /research rate-limiting — found token bucket, middleware pattern
[agent-3] /brainstorm improvements — 4 approaches ranked
[swarm]   Complete — artifacts in .agents/
```
Session continuity across compaction or restart:
```text
> /handoff

[handoff] Saved: 3 open issues, current branch, next action
          Continuation prompt written to .agents/handoffs/

--- next session ---

> /recover

[recover] Found in-progress epic ag-0058 (2/5 issues closed)
          Branch: feature/rate-limiter
          Next: /implement ag-0058.3
```
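Conceptually, a handoff is just a plain file the next session reads back. A minimal sketch under an invented schema (the real artifacts in `.agents/handoffs/` carry more detail and a continuation prompt):

```python
import json
import os
import tempfile

# Hypothetical handoff schema -- illustrative only, not the real format.
def write_handoff(path: str, state: dict) -> None:
    """Persist session state so a fresh session can resume from it."""
    with open(path, "w") as f:
        json.dump(state, f, indent=2)

def recover(path: str) -> dict:
    """Read the saved state back in the next session."""
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "handoff.json")
write_handoff(path, {"epic": "ag-0058",
                     "branch": "feature/rate-limiter",
                     "next": "/implement ag-0058.3"})
print(recover(path)["next"])  # /implement ag-0058.3
```

Because the state lives on disk rather than in the chat transcript, it survives compaction, restarts, and even switching runtimes.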
Different developers, different setups:
| Workflow | Commands | What happens |
|---|---|---|
| PR reviewer | `/council validate this PR` | One command, actionable feedback, no setup |
| Team lead | `/research` → `/plan` → `/council validate` | Compose skills manually, stay in control |
| Solo dev | `/rpi "add user auth"` | Research through post-mortem, walk away |
| Platform team | `/swarm` + `/evolve` | Parallel pipelines + fitness-scored improvement loop |
Not sure which skill to run? See the Skill Router.
Every skill works alone. Compose them however you want.
Judgment — the foundation everything validates against:
| Skill | What it does |
|---|---|
| `/council` | Independent judges (Claude + Codex) debate, surface disagreement, converge. Auto-extracts findings into flywheel. `--preset=security-audit`, `--perspectives`, `--debate` |
| `/vibe` | Code quality review — complexity + council + finding classification (CRITICAL vs INFORMATIONAL) + suppression framework + domain checklists (SQL, LLM, concurrency) |
| `/pre-mortem` | Validate plans — error/rescue mapping, scope modes (Expand/Hold/Reduce), temporal interrogation, prediction tracking with downstream correlation |
| `/post-mortem` | Wrap up work — council validates, prediction accuracy scoring (HIT/MISS/SURPRISE), session streak tracking, persistent retro history |
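The judgment skills converge independent verdicts into one outcome. A minimal sketch of one plausible consensus rule (illustrative only; the actual `/council` logic debates, surfaces disagreement, and is more nuanced than a severity max):

```python
# Illustrative consensus rule: the most severe individual verdict wins.
# This is NOT the actual /council implementation, just a toy model.
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}

def consensus(verdicts: list[str]) -> str:
    """Return the most severe verdict among independent judges."""
    return max(verdicts, key=lambda v: SEVERITY[v])

print(consensus(["PASS", "WARN", "PASS"]))  # WARN
```

Under this rule, a single WARN or FAIL is never averaged away by other judges passing, which matches the spirit of gating a commit on the strictest finding.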
Execution — research, plan, build, ship:
| Skill | What it does |
|---|---|
| `/research` | Deep codebase exploration — produces structured findings |
| `/plan` | Decompose a goal into trackable issues with dependency waves |
| `/implement` | Full lifecycle for one task — research, plan, build, validate, learn |
| `/crank` | Parallel agents in dependency-ordered waves, fresh context per worker |
| `/swarm` | Parallelize any skill — run research, brainstorms, implementations in parallel |
| `/rpi` | Full pipeline: discovery (research + plan + pre-mortem) → implementation (crank) → validation (vibe + post-mortem) |
| `/evolve` | The endgame: measure goals, fix the worst gap, regression-gate everything, learn, repeat overnight |
Knowledge — the flywheel that makes sessions compound:
| Skill | What it does |
|---|---|
| `/knowledge` | Query learnings, patterns, and decisions across `.agents/` |
| `/retro` | Capture a decision, pattern, or lesson manually, or extract learnings from completed work |
| `/flywheel` | Monitor knowledge health — velocity, staleness, pool depths |
Supporting skills:
| Category | Skills |
|---|---|
| Onboarding | `/quickstart`, `/using-agentops` |
| Session | `/handoff`, `/recover`, `/status` |
| Traceability | `/trace`, `/provenance` |
| Product | `/product`, `/goals`, `/release`, `/readme`, `/doc` |
| Utility | `/brainstorm`, `/bug-hunt`, `/complexity` |
Full reference: docs/SKILLS.md
Cross-runtime orchestration — mix Claude, Codex, OpenCode
AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.
| Spawning Backend | How it works | Best for |
|---|---|---|
| Native teams | `TeamCreate` + `SendMessage` — built into Claude Code | Tight coordination, debate |
| Background tasks | `Task(run_in_background=true)` — last-resort fallback | When no team APIs available |
| Codex sub-agents | `/codex-team` — Claude orchestrates Codex workers | Cross-vendor validation |
Custom agents — why AgentOps ships its own
Two read-only agents fill the gap between Claude Code's Explore (no commands) and general-purpose (full write, expensive):
| Agent | Model | Can do | Can't do |
|---|---|---|---|
| `agentops:researcher` | haiku | Read, search, run commands | Write or edit files |
| `agentops:code-reviewer` | sonnet | Read, search, git diff, structured findings | Write or edit files |
Skills spawn these automatically — /research uses the researcher, /vibe uses the code-reviewer.
.agents/ is an append-only ledger — every learning, verdict, pattern, and decision is a dated file. Write once, score by freshness, inject the best, prune the rest. The formal model is cache eviction with freshness decay. Full lifecycle: Context Lifecycle.
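One way to picture "write once, score by freshness" is exponential decay over artifact age. This is an illustrative model with a guessed half-life, not `ao`'s actual weighting function:

```python
import math

# Illustrative freshness model: a learning's retrieval score is its base
# relevance damped by exponential age decay. The 30-day half-life is an
# assumption for this sketch, not AgentOps' real parameter.
HALF_LIFE_DAYS = 30.0

def freshness_score(relevance: float, age_days: float) -> float:
    """Decay base relevance by half for every HALF_LIFE_DAYS of age."""
    decay = math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)
    return relevance * decay

# A 30-day-old learning scores half its base relevance.
print(round(freshness_score(1.0, 30.0), 3))  # 0.5
```

Under any model of this shape, injection picks the top-scoring artifacts, so stale entries fade out of context without ever being deleted from the ledger.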
Phase details — what each step does
- `/research` — Explores your codebase. Produces a research artifact with findings and recommendations.
- `/plan` — Decomposes the goal into issues with dependency waves. Creates a beads epic (git-native issue tracking).
- `/pre-mortem` — Judges simulate failures before you write code. FAIL? Re-plan with feedback (max 3 retries).
- `/crank` — Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. `--test-first` for spec-first TDD.
- `/vibe` — Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).
- `/post-mortem` — Council validates the implementation. Retro extracts learnings. Suggests the next `/rpi` command.

`/rpi "goal"` runs all six end to end. Use `--interactive` for human gates at research and plan.
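The dependency waves that `/plan` emits and `/crank` executes can be sketched as layered topological sorting: every issue's dependencies land in an earlier wave, and everything within a wave can run in parallel. This is a generic algorithm sketch, not the actual scheduler:

```python
# Generic sketch: group issues into waves so that each issue's dependencies
# are all in earlier waves (a layered variant of Kahn's algorithm).
def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    remaining = {issue: set(d) for issue, d in deps.items()}
    result = []
    while remaining:
        # Issues with no unmet dependencies form the next wave.
        ready = sorted(k for k, v in remaining.items() if not v)
        if not ready:
            raise ValueError("dependency cycle detected")
        result.append(ready)
        for k in ready:
            del remaining[k]
        for v in remaining.values():
            v.difference_update(ready)
    return result

print(waves({"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}))
# [['a'], ['b', 'c'], ['d']]
```

Here `b` and `c` share a wave and can be cranked by parallel workers, while `d` waits for both.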
| Topic | Where |
|---|---|
| Phased RPI (fresh context per phase) | How It Works |
| Parallel RPI (N epics in isolated worktrees) | How It Works |
| Setting up `/evolve` (GOALS.md, fitness loop) | Evolve Setup |
| Science, systems theory, prior art | The Science |
Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL
Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)
The ao CLI adds the knowledge flywheel (extract, inject, decay, maturity) and terminal-based RPI that runs without an active chat session.
```bash
ao seed                                        # Plant AgentOps in any repo (auto-detects project type)
ao rpi loop --supervisor --max-cycles 1        # Canonical autonomous cycle (policy-gated landing)
ao rpi loop --supervisor "fix auth bug"        # Single explicit-goal supervised cycle
ao rpi phased --from=implementation "ag-058"   # Resume a specific phased run at build phase
ao rpi parallel --manifest epics.json          # Run N epics concurrently in isolated worktrees
ao rpi status --watch                          # Monitor active/terminal runs
```

Walk away, come back to committed code + extracted learnings.
```bash
ao search "query"           # Search knowledge across files and chat history
ao lookup --query "topic"   # Retrieve specific knowledge artifacts by ID or relevance
ao notebook update          # Merge latest session insights into MEMORY.md
ao memory sync              # Sync session history to MEMORY.md (cross-runtime: Codex, OpenCode)
ao context assemble         # Build 5-section context briefing for a task
ao feedback-loop            # Close the MemRL feedback loop (citation → utility → maturity)
ao metrics health           # Flywheel health: sigma, rho, delta, escape velocity
ao dedup                    # Detect near-duplicate learnings (--merge for auto-resolution)
ao contradict               # Detect potentially contradictory learnings
ao demo                     # Interactive demo
```

`ao search` (built on CASS) indexes every chat session from every runtime that writes to `.agents/ao/sessions/`.
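Near-duplicate detection of the kind `ao dedup` performs could be approximated with token-set (Jaccard) similarity. This is an illustrative heuristic, not the actual implementation:

```python
# Illustrative near-duplicate check: Jaccard similarity over word sets.
# Real dedup would likely normalize punctuation and use better embeddings.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

x = "rate limit at middleware layer"
y = "rate limit at the middleware layer"
print(jaccard(x, y) > 0.5)  # True: likely near-duplicates
```

Pairs above a chosen threshold are surfaced as merge candidates; `--merge` would then collapse them into one learning.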
Second Brain + Obsidian vault — semantic search over all your sessions
.agents/ is plain text — open it as an Obsidian vault for browsing and linking. For semantic search, pair with Smart Connections (local embeddings, MCP server for agent retrieval).
Full reference: CLI Commands
One recursive shape at every scale:
```text
/implement ── one worker, one issue, one verify cycle
  └── /crank ── waves of /implement (FIRE loop)
        └── /rpi ── research → plan → crank → validate → learn
              └── /evolve ── fitness-gated /rpi cycles
```
Each level treats the one below as a black box: spec in, validated result out. Workers get fresh context per wave (Ralph Wiggum Pattern), never commit (lead-only), and communicate through the filesystem. Orchestrators stay in the main session; workers fork into subagents. See SKILL-TIERS.md for the full classification.
| Topic | Where |
|---|---|
| Five pillars, operational invariants | Architecture |
| Brownian Ratchet, Ralph Wiggum, context windowing | How It Works |
| Orchestrator vs worker fork rules | Skill Tiers |
| Injection philosophy, freshness decay, MemRL | The Science |
| Primitive chains (audited map) | Primitive Chains |
| Context lifecycle, three-tier injection | Context Lifecycle |
| Alternative | What it does well | Where AgentOps focuses differently |
|---|---|---|
| GSD | Clean subagent spawning, fights context rot | Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions) |
| Compound Engineer | Knowledge compounding, structured loop | Multi-model councils and validation gates — independent judges debating before and after code ships |
docs/FAQ.md — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.
Issue tracking — Beads / bd
Git-native issues in .beads/. bd onboard (setup) · bd ready (find work) · bd show <id> · bd close <id> · bd vc status (optional Dolt state check; JSONL auto-sync is automatic). More: AGENTS.md
See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.
Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · Configuration · CLI Reference · Changelog
