AgentOps


The local DevOps layer for coding agents.

AgentOps is the operating system around your coding agent: it tracks the work, validates the plan and code, and feeds what was learned into the next session.

How It Works · Install · See It Work · Skills · CLI · FAQ · Newcomer Guide

Agents running full development cycles in parallel with validation gates and a coordinating team leader


Why AgentOps Exists

Most coding-agent tools improve the session. AgentOps improves the repo around the session.

| Without AgentOps | With AgentOps |
| --- | --- |
| Partial memory each session | Repo-native memory via .agents/ — retrieval, freshness weighting, injection |
| Review after the fact | /pre-mortem before build, /vibe before commit |
| Untracked agent runs | Issues, dependency waves, worktrees, audit trails |
| Chat logs | Artifacts the next session can use |

Install

# Claude Code (recommended): marketplace + plugin install
claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace

# Codex CLI (0.110.0+): installs the native plugin, archives stale raw mirrors if needed; open a fresh Codex session afterward
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.sh | bash

# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash

# Other Skills-compatible agents (agent-specific, install only what you need)
# Example (Cursor):
npx skills@latest add boshu2/agentops --cursor -g

Install ao CLI (optional)

Skills work standalone — no CLI required. The ao CLI is what unlocks the full repo-native layer: knowledge extraction, retrieval and injection, maturity scoring, goals, and control-plane style workflows.

Homebrew (recommended)

brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops
brew install agentops
which ao
ao version

Or install via release binaries or build from source.

Then type /quickstart in your agent chat.


How It Works

A phased lifecycle, not a bag of prompts:

| Phase | Primary skills | What gets locked in |
| --- | --- | --- |
| Discovery | /brainstorm -> /research -> /plan -> /pre-mortem | repo context, scoped work, known risks, execution packet |
| Implementation | /crank -> /swarm -> /implement | closed issues, validated wave outputs, ratchet checkpoints |
| Validation + learning | /validation -> /vibe -> /post-mortem -> /retro -> /forge | findings, learnings, next work, stronger prevention artifacts |

/rpi orchestrates all three phases. /evolve keeps running /rpi against GOALS.md so the worst fitness gap gets addressed next. The output is not just code — it is code + state + memory + gates.

| Pattern | Chain | When |
| --- | --- | --- |
| Quick fix | /implement | One issue, clear scope |
| Validated fix | /implement -> /vibe | One issue, want confidence |
| Planned epic | /plan -> /pre-mortem -> /crank -> /post-mortem | Multi-issue, structured |
| Full pipeline | /rpi (chains all above) | End-to-end, autonomous |
| Evolve loop | /evolve (chains /rpi repeatedly) | Fitness-scored improvement |
| PR contribution | /pr-research -> /pr-plan -> /pr-implement -> /pr-validate -> /pr-prep | External repo |
| Knowledge query | /knowledge -> /research (if gaps) | Understanding before building |
| Standalone review | /council validate <target> | Ad-hoc multi-judge review |

Primitive chains underneath it

  • Mission and fitness: GOALS.md, ao goals, /evolve
  • Discovery chain: /brainstorm -> ao search / ao lookup -> /research -> /plan -> /pre-mortem
  • Execution chain: /crank -> /swarm -> /implement -> /vibe -> ratchet checkpoints
  • Compiled prevention chain: findings registry -> planning rules / pre-mortem checks / constraints -> later planning and validation
  • Continuity chain: session hooks + phased manifests + /handoff + /recover

That is the real architecture: a local operating layer around the agent, not just a prompt pack. See Primitive Chains for the audited map.

Why the memory matters

Each session writes decisions and patterns to .agents/. The next relevant task starts with that context already loaded:

> /research "retry backoff strategies"

[lookup] 3 prior learnings found (freshness-weighted):
  - Token bucket with Redis (established, high confidence)
  - Rate limit at middleware layer, not per-handler (pattern)
  - /login endpoint was missing rate limiting (decision)
[research] Found prior art in your codebase + retrieved context
           Recommends: exponential backoff with jitter, reuse existing Redis client

Session 5 started with scored, repo-specific context — not from scratch. Stale insights decay automatically. Useful ones compound.

Why engineers buy in

  • Local-only — no telemetry, no cloud, no accounts. Nothing phones home.
  • Auditable — plans, verdicts, learnings, and patterns are plain files on disk.
  • Multi-runtime — Claude Code, Codex CLI, Cursor, OpenCode.
  • Harder to drift — tracked issues and validation gates mean the repo is less dependent on agent mood or memory.

Everything is open source — audit it yourself.


OpenCode — plugin + skills

Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md

Configuration — environment variables

All optional. AgentOps works out of the box with no configuration. Full reference: docs/ENV-VARS.md

What AgentOps touches:

| What | Where | Reversible? |
| --- | --- | --- |
| Skills | Global skill homes (~/.agents/skills for Codex/OpenAI-documented installs, plus compatibility caches outside your repo; for Claude Code: ~/.claude/skills/) | rm -rf ~/.claude/skills/ ~/.agents/skills/ ~/.codex/skills/ ~/.codex/plugins/cache/agentops-marketplace/agentops/ |
| Knowledge artifacts | .agents/ in your repo (git-ignored by default) | rm -rf .agents/ |
| Hook registration | .claude/settings.json | Delete entries from .claude/settings.json |

Nothing modifies your source code.

Troubleshooting: docs/troubleshooting.md


See It Work

1. One command — validate a PR:

> /council validate this PR

[council] 3 judges spawned (independent, no anchoring)
[judge-1] PASS — token bucket implementation correct
[judge-2] WARN — rate limiting missing on /login endpoint
[judge-3] PASS — Redis integration follows middleware pattern
Consensus: WARN — add rate limiting to /login before shipping
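The worst-verdict-wins combination shown above can be sketched in a few lines. The severity ordering and function name here are illustrative assumptions, not AgentOps internals:

```python
# Sketch: combine independent judge verdicts into a council consensus.
# The PASS < WARN < FAIL ladder is an assumption for illustration only.
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}

def consensus(verdicts: list[str]) -> str:
    """The worst individual verdict becomes the consensus."""
    return max(verdicts, key=lambda v: SEVERITY[v])

print(consensus(["PASS", "WARN", "PASS"]))  # -> WARN
```

Because each judge runs independently before verdicts are combined, no judge anchors on another's reasoning.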

2. Full pipeline — research through post-mortem, one command:

> /rpi "add retry backoff to rate limiter"

[research]    Found 3 prior learnings on rate limiting (injected)
[plan]        2 issues, 1 wave → epic ag-0058
[pre-mortem]  Council validates plan → PASS (knew about Redis choice)
[crank]       Parallel agents: Wave 1 ██ 2/2
[vibe]        Council validates code → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel]    Next: /rpi "add circuit breaker to external API calls"

3. The endgame — /evolve: define goals, walk away, come back to a better codebase:

> /evolve

[evolve] GOALS.md: 18 gates loaded, score 77.0% (14/18 passing)

[cycle-1]     Worst: wiring-closure (weight 6) + 3 more
              /rpi "Fix failing goals" → score 93.3% (25/28) ✓

              ── the agent naturally organizes into phases ──

[cycle-2-35]  Coverage blitz: 17 packages from ~85% → ~97% avg
              Table-driven tests, edge cases, error paths
[cycle-38-59] Benchmarks added to all 15 internal packages
[cycle-60-95] Complexity annihilation: zero functions with complexity >= 8
              (was dozens >= 20 — extracted helpers, tested independently)
[cycle-96-116] Modernization: sentinel errors, exhaustive switches,
              Go 1.26-compatible idioms (slices, cmp.Or, range-over-int)

[teardown]    203 files changed, 20K+ lines, 116 cycles
              All tests pass. Go vet clean. Avg coverage 97%.
              /post-mortem → 33 learnings extracted
              Ready for next /evolve — the floor is now the ceiling.

That ran overnight — ~7 hours, unattended. Regression gates auto-reverted anything that broke a passing goal. The agent naturally organized into the right order: build a safety net (tests), refactor aggressively (complexity), then polish.
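The regression-gating invariant can be sketched as a single rule: a change may only land if every gate that passed before still passes after. All names below are illustrative, not the actual gate runner:

```python
# Sketch of a regression gate: reject (auto-revert) any change that breaks
# a previously passing goal, even if it fixes other goals. Illustrative only.
def passes_regression_gate(before: dict[str, bool], after: dict[str, bool]) -> bool:
    """True only if every gate that passed before still passes after."""
    return all(after[gate] for gate, passed in before.items() if passed)

before = {"tests": True, "vet": True, "coverage": False}
after  = {"tests": True, "vet": False, "coverage": True}
print(passes_regression_gate(before, after))  # -> False: vet regressed, so revert
```

This is what lets the loop run unattended: progress can only ratchet forward, never trade a passing goal for a failing one.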

More examples — swarm, session continuity, different workflows

Parallelize anything with /swarm:

> /swarm "research auth patterns, brainstorm rate limiting improvements"

[swarm] 3 agents spawned — each gets fresh context
[agent-1] /research auth — found JWT + session patterns, 2 prior learnings
[agent-2] /research rate-limiting — found token bucket, middleware pattern
[agent-3] /brainstorm improvements — 4 approaches ranked
[swarm] Complete — artifacts in .agents/

Session continuity across compaction or restart:

> /handoff
[handoff] Saved: 3 open issues, current branch, next action
         Continuation prompt written to .agents/handoffs/

--- next session ---

> /recover
[recover] Found in-progress epic ag-0058 (2/5 issues closed)
          Branch: feature/rate-limiter
          Next: /implement ag-0058.3
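A handoff artifact is just a file on disk that a later session re-reads. A minimal sketch, assuming a hypothetical JSON schema and filename (the real artifacts are the continuation prompts under .agents/handoffs/):

```python
import json
import pathlib

# Sketch: write a handoff artifact one session, recover it in the next.
# Field names and the "latest.json" path are illustrative assumptions.
handoff = {
    "epic": "ag-0058",
    "branch": "feature/rate-limiter",
    "open_issues": ["ag-0058.3", "ag-0058.4", "ag-0058.5"],
    "next_action": "/implement ag-0058.3",
}
path = pathlib.Path(".agents/handoffs/latest.json")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(handoff, indent=2))

# A later session (or a different runtime) recovers by reading the same file.
state = json.loads(path.read_text())
print(state["next_action"])  # -> /implement ag-0058.3
```

Because the state lives in plain files rather than chat history, recovery survives compaction, restarts, and even switching runtimes.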

Different developers, different setups:

| Workflow | Commands | What happens |
| --- | --- | --- |
| PR reviewer | /council validate this PR | One command, actionable feedback, no setup |
| Team lead | /research -> /plan -> /council validate | Compose skills manually, stay in control |
| Solo dev | /rpi "add user auth" | Research through post-mortem, walk away |
| Platform team | /swarm + /evolve | Parallel pipelines + fitness-scored improvement loop |

Not sure which skill to run? See the Skill Router.


Skills

Every skill works alone. Compose them however you want.

Judgment — the foundation everything validates against:

| Skill | What it does |
| --- | --- |
| /council | Independent judges (Claude + Codex) debate, surface disagreement, converge. Auto-extracts findings into the flywheel. --preset=security-audit, --perspectives, --debate |
| /vibe | Code quality review — complexity + council + finding classification (CRITICAL vs INFORMATIONAL) + suppression framework + domain checklists (SQL, LLM, concurrency) |
| /pre-mortem | Validate plans — error/rescue mapping, scope modes (Expand/Hold/Reduce), temporal interrogation, prediction tracking with downstream correlation |
| /post-mortem | Wrap up work — council validates, prediction accuracy scoring (HIT/MISS/SURPRISE), session streak tracking, persistent retro history |

Execution — research, plan, build, ship:

| Skill | What it does |
| --- | --- |
| /research | Deep codebase exploration — produces structured findings |
| /plan | Decompose a goal into trackable issues with dependency waves |
| /implement | Full lifecycle for one task — research, plan, build, validate, learn |
| /crank | Parallel agents in dependency-ordered waves, fresh context per worker |
| /swarm | Parallelize any skill — run research, brainstorms, implementations in parallel |
| /rpi | Full pipeline: discovery (research + plan + pre-mortem) → implementation (crank) → validation (vibe + post-mortem) |
| /evolve | The endgame: measure goals, fix the worst gap, regression-gate everything, learn, repeat overnight |

Knowledge — the flywheel that makes sessions compound:

| Skill | What it does |
| --- | --- |
| /knowledge | Query learnings, patterns, and decisions across .agents/ |
| /retro | Capture decisions, patterns, and lessons; extract learnings from completed work |
| /flywheel | Monitor knowledge health — velocity, staleness, pool depths |

Supporting skills:

  • Onboarding: /quickstart, /using-agentops
  • Session: /handoff, /recover, /status
  • Traceability: /trace, /provenance
  • Product: /product, /goals, /release, /readme, /doc
  • Utility: /brainstorm, /bug-hunt, /complexity

Full reference: docs/SKILLS.md

Cross-runtime orchestration — mix Claude, Codex, OpenCode

AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.

| Spawning backend | How it works | Best for |
| --- | --- | --- |
| Native teams | TeamCreate + SendMessage — built into Claude Code | Tight coordination, debate |
| Background tasks | Task(run_in_background=true) — last-resort fallback | When no team APIs available |
| Codex sub-agents | /codex-team — Claude orchestrates Codex workers | Cross-vendor validation |
Custom agents — why AgentOps ships its own

Two read-only agents fill the gap between Claude Code's Explore (no commands) and general-purpose (full write, expensive):

| Agent | Model | Can do | Can't do |
| --- | --- | --- | --- |
| agentops:researcher | haiku | Read, search, run commands | Write or edit files |
| agentops:code-reviewer | sonnet | Read, search, git diff, structured findings | Write or edit files |

Skills spawn these automatically — /research uses the researcher, /vibe uses the code-reviewer.


Deep Dive

.agents/ is an append-only ledger — every learning, verdict, pattern, and decision is a dated file. Write once, score by freshness, inject the best, prune the rest. The formal model is cache eviction with freshness decay. Full lifecycle: Context Lifecycle.
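A minimal sketch of scoring under that model, assuming exponential decay with an illustrative half-life (the constants and function are assumptions, not AgentOps's actual implementation):

```python
import math

# Sketch: score = confidence * exp(-age / half_life), i.e. a learning's weight
# halves every HALF_LIFE_DAYS. Inject the top-scored artifacts, prune the rest.
# The 30-day half-life is an illustrative assumption.
HALF_LIFE_DAYS = 30.0

def freshness_score(confidence: float, age_days: float) -> float:
    return confidence * math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)

# (name, confidence, age in days) — hypothetical artifacts.
learnings = [("redis-token-bucket", 0.9, 10), ("old-retry-note", 0.8, 120)]
scored = sorted(
    ((name, freshness_score(conf, age)) for name, conf, age in learnings),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in scored:
    print(f"{name}: {score:.2f}")
```

Under this kind of policy a recent, moderately confident learning outranks an older one that was once stronger, which is how stale insights fall out of injection without anyone deleting them.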

Phase details — what each step does
  1. /research — Explores your codebase. Produces a research artifact with findings and recommendations.

  2. /plan — Decomposes the goal into issues with dependency waves. Creates a beads epic (git-native issue tracking).

  3. /pre-mortem — Judges simulate failures before you write code. FAIL? Re-plan with feedback (max 3 retries).

  4. /crank — Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. --test-first for spec-first TDD.

  5. /vibe — Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).

  6. /post-mortem — Council validates the implementation. Retro extracts learnings. Suggests the next /rpi command.

/rpi "goal" runs all six end to end. Use --interactive for human gates at research and plan.
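Dependency-ordered waves are essentially topological layers: every issue in a wave depends only on issues closed in earlier waves. A minimal sketch of the idea (not the planner's actual algorithm; issue IDs are hypothetical):

```python
# Sketch: group issues into waves via Kahn-style layered topological sort.
# Each wave contains only issues whose dependencies are already done.
def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    done: set[str] = set()
    out: list[list[str]] = []
    while len(done) < len(deps):
        wave = sorted(i for i, d in deps.items() if i not in done and d <= done)
        if not wave:
            raise ValueError("dependency cycle")
        out.append(wave)
        done.update(wave)
    return out

# ag-1 and ag-2 are independent; ag-3 needs both; ag-4 needs ag-3.
print(waves({"ag-1": set(), "ag-2": set(), "ag-3": {"ag-1", "ag-2"}, "ag-4": {"ag-3"}}))
# -> [['ag-1', 'ag-2'], ['ag-3'], ['ag-4']]
```

Everything inside one wave can run as parallel workers with fresh context, because by construction nothing in a wave blocks anything else in it.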

| Topic | Where |
| --- | --- |
| Phased RPI (fresh context per phase) | How It Works |
| Parallel RPI (N epics in isolated worktrees) | How It Works |
| Setting up /evolve (GOALS.md, fitness loop) | Evolve Setup |
| Science, systems theory, prior art | The Science |
Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL

Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)


The ao CLI

The ao CLI adds the knowledge flywheel (extract, inject, decay, maturity) and terminal-based RPI that runs without an active chat session.

ao seed                                        # Plant AgentOps in any repo (auto-detects project type)
ao rpi loop --supervisor --max-cycles 1        # Canonical autonomous cycle (policy-gated landing)
ao rpi loop --supervisor "fix auth bug"        # Single explicit-goal supervised cycle
ao rpi phased --from=implementation "ag-058"   # Resume a specific phased run at build phase
ao rpi parallel --manifest epics.json          # Run N epics concurrently in isolated worktrees
ao rpi status --watch                          # Monitor active/terminal runs

Walk away, come back to committed code + extracted learnings.

ao search "query"              # Search knowledge across files and chat history
ao lookup --query "topic"      # Retrieve specific knowledge artifacts by ID or relevance
ao notebook update             # Merge latest session insights into MEMORY.md
ao memory sync                 # Sync session history to MEMORY.md (cross-runtime: Codex, OpenCode)
ao context assemble            # Build 5-section context briefing for a task
ao feedback-loop               # Close the MemRL feedback loop (citation → utility → maturity)
ao metrics health              # Flywheel health: sigma, rho, delta, escape velocity
ao dedup                       # Detect near-duplicate learnings (--merge for auto-resolution)
ao contradict                  # Detect potentially contradictory learnings
ao demo                        # Interactive demo

ao search (built on CASS) indexes every chat session from every runtime that writes to .agents/ao/sessions/.

Second Brain + Obsidian vault — semantic search over all your sessions

.agents/ is plain text — open it as an Obsidian vault for browsing and linking. For semantic search, pair with Smart Connections (local embeddings, MCP server for agent retrieval).

Full reference: CLI Commands


Architecture

One recursive shape at every scale:

/implement ── one worker, one issue, one verify cycle
    └── /crank ── waves of /implement (FIRE loop)
        └── /rpi ── research → plan → crank → validate → learn
            └── /evolve ── fitness-gated /rpi cycles

Each level treats the one below as a black box: spec in, validated result out. Workers get fresh context per wave (Ralph Wiggum Pattern), never commit (lead-only), and communicate through the filesystem. Orchestrators stay in the main session; workers fork into subagents. See SKILL-TIERS.md for the full classification.

| Topic | Where |
| --- | --- |
| Five pillars, operational invariants | Architecture |
| Brownian Ratchet, Ralph Wiggum, context windowing | How It Works |
| Orchestrator vs worker fork rules | Skill Tiers |
| Injection philosophy, freshness decay, MemRL | The Science |
| Primitive chains (audited map) | Primitive Chains |
| Context lifecycle, three-tier injection | Context Lifecycle |

How AgentOps Fits With Other Tools

| Alternative | What it does well | Where AgentOps focuses differently |
| --- | --- | --- |
| GSD | Clean subagent spawning, fights context rot | Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions) |
| Compound Engineer | Knowledge compounding, structured loop | Multi-model councils and validation gates — independent judges debating before and after code ships |

Detailed comparisons →


FAQ

docs/FAQ.md — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.


Contributing

Issue tracking — Beads / bd

Git-native issues in .beads/. bd onboard (setup) · bd ready (find work) · bd show <id> · bd close <id> · bd vc status (optional Dolt state check; JSONL auto-sync is automatic). More: AGENTS.md

See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.

License

Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · Configuration · CLI Reference · Changelog