Opus Nx turns model reasoning traces into persistent, inspectable artifacts you can explore, verify, rerun, and improve. Every thinking step becomes a node in a navigable graph, not a black box.
This repository is positioned for research and open-source collaboration. Run it locally with your own credentials.
A live reasoning session: 17 thinking nodes, typed edges (influence, support, contradiction, refinement), compaction boundaries, and fork branches, all persisted and queryable.
Most AI workflows keep only final answers. Opus Nx keeps the entire reasoning path and supports policy improvement over time.
| Standard AI UX | Opus Nx |
|---|---|
| Final answer only | Persistent reasoning graph artifacts |
| Single perspective | 6-agent swarm + 4-style branching workflows |
| Limited traceability | Decision points, typed edges, step verification, lifecycle state |
| Prompt-only iteration | Promote → rerun → compare → retain loops |
| Ephemeral context | 3-tier memory hierarchy (working → recall → archival) |
Full visual walkthrough with all screenshots: docs/features.md
Every extended thinking session becomes a graph of discrete reasoning steps: nodes are scored for confidence and connected by typed edges showing how ideas flow, branch, and build on each other.
Reasoning nodes colored by type (thinking, compaction, fork branch) with edges showing influence, support, contradiction, and refinement relationships. Minimap in corner for navigation.
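The graph model described above can be sketched in a few lines. This is a minimal illustration, not the project's actual schema: the class names, field names, and the 0.0 to 1.0 confidence range are assumptions; only the four edge types come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeType(Enum):
    # The four edge types named in the text above.
    INFLUENCE = "influence"
    SUPPORT = "support"
    CONTRADICTION = "contradiction"
    REFINEMENT = "refinement"


@dataclass
class ThinkingNode:
    node_id: str
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


@dataclass
class ReasoningGraph:
    nodes: dict[str, ThinkingNode] = field(default_factory=dict)
    edges: list[tuple[str, str, EdgeType]] = field(default_factory=list)

    def add_node(self, node: ThinkingNode) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, source: str, target: str, edge_type: EdgeType) -> None:
        # Reject edges that point at nodes the graph does not contain.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source, target, edge_type))

    def incoming(self, target: str, edge_type: EdgeType) -> list[ThinkingNode]:
        # Query: which nodes relate to `target` through this edge type?
        return [self.nodes[s] for s, t, e in self.edges if t == target and e is edge_type]


graph = ReasoningGraph()
graph.add_node(ThinkingNode("n1", "Caching reduces latency", 0.82))
graph.add_node(ThinkingNode("n2", "Cache invalidation adds risk", 0.64))
graph.add_node(ThinkingNode("n3", "Use caching with short TTLs", 0.77))
graph.add_edge("n1", "n3", EdgeType.SUPPORT)
graph.add_edge("n2", "n3", EdgeType.CONTRADICTION)

# Which steps contradict the conclusion n3?
challengers = graph.incoming("n3", EdgeType.CONTRADICTION)
```

Keeping edges as typed triples makes "show me everything that contradicts this step" a simple filter, which is the kind of query a persisted reasoning graph is meant to support.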
Deploy a swarm of 6 specialized AI agents that collaborate in real time via WebSocket streaming. Maestro decomposes the problem, DeepThinker analyzes, Contrarian challenges, Verifier validates, Synthesizer merges, and Metacognition audits.
The Synthesizer agent merging perspectives from all agents into a coherent framework, with live session stats (17 nodes, 62K tokens) and human-in-the-loop checkpoints.
Explore problems using arbitrary reasoning graphs with BFS, DFS, or best-first search. Thoughts branch, aggregate, and get verified at each depth level, a visual implementation of Graph of Thoughts (Besta et al., 2023).
A 4-depth GoT tree with 8 branches. Each node shows its thought, confidence score (40%-94%), verification status (Verified/Aggregated), and reasoning path. Color-coded by depth level.
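Of the three search strategies mentioned, best-first is the least obvious, so here is a hedged sketch of it. It assumes each thought carries a confidence score and that "best" means highest confidence; the `Thought` class and `best_first` function are illustrative names, and aggregation and verification are left out.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Thought:
    text: str
    confidence: float
    children: list["Thought"] = field(default_factory=list)


def best_first(root: Thought, max_depth: int) -> list[str]:
    """Visit thoughts in order of highest confidence first, down to max_depth."""
    order: list[str] = []
    counter = 0  # tie-breaker so the heap never compares Thought objects
    # heapq is a min-heap, so confidence is negated to pop the best node first.
    frontier = [(-root.confidence, counter, 1, root)]
    while frontier:
        _, _, depth, thought = heapq.heappop(frontier)
        order.append(thought.text)
        if depth >= max_depth:
            continue  # do not expand below the depth limit
        for child in thought.children:
            counter += 1
            heapq.heappush(frontier, (-child.confidence, counter, depth + 1, child))
    return order


tree = Thought("root", 0.90, [
    Thought("a", 0.60, [Thought("a1", 0.94)]),
    Thought("b", 0.80, [Thought("b1", 0.40)]),
])
visited = best_first(tree, max_depth=3)
# visited == ["root", "b", "a", "a1", "b1"]
```

Note that `a1` is reached before `b1` even though `b` was expanded first: the frontier is global, so a strong thought on a weaker branch can still jump ahead.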
A Process Reward Model (PRM) verifies each reasoning step independently. See structured steps (CONSIDERATION → HYPOTHESIS → EVALUATION → CONCLUSION) with confidence scores, decision counts, and edge relationships.
13 structured reasoning steps extracted from a single thinking pass. Each step typed (Consideration, Hypothesis, Evaluation, Conclusion) with 1.6K thinking tokens and 13 decision points persisted.
Final steps: EVALUATION → MAIN CONCLUSION → MODEL OUTPUT. The reasoning chain ends with a persisted artifact showing both the internal deliberation and the final structured response.
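Step-level verification can be illustrated as follows. This sketch assumes a scoring function that rates one step at a time and a pass threshold of 0.5; `verify_chain`, `toy_scorer`, and the threshold are hypothetical, and the toy scorer stands in for a real verifier model. Only the four step types come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class StepType(Enum):
    CONSIDERATION = "consideration"
    HYPOTHESIS = "hypothesis"
    EVALUATION = "evaluation"
    CONCLUSION = "conclusion"


@dataclass
class Step:
    step_type: StepType
    text: str


def verify_chain(
    steps: list[Step],
    score_step: Callable[[Step], float],
    threshold: float = 0.5,
) -> tuple[list[float], Optional[int]]:
    """Score every step on its own and report the index of the first failure."""
    scores = [score_step(step) for step in steps]
    first_failure = next((i for i, s in enumerate(scores) if s < threshold), None)
    return scores, first_failure


def toy_scorer(step: Step) -> float:
    # Hypothetical stand-in for a verifier model: distrust hedged wording.
    return 0.2 if "probably" in step.text else 0.9


chain = [
    Step(StepType.CONSIDERATION, "The queue is bounded at 100 items"),
    Step(StepType.HYPOTHESIS, "Producers probably never exceed that"),
    Step(StepType.EVALUATION, "Load tests peak at 60 items"),
    Step(StepType.CONCLUSION, "The bound is sufficient"),
]
scores, first_failure = verify_chain(chain, toy_scorer)
```

Returning the index of the first failing step, rather than a single pass/fail verdict for the whole chain, is what distinguishes process supervision from checking only the final answer.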
Fork any question into 4 concurrent reasoning styles: conservative, aggressive, balanced, and contrarian. Each branch reasons independently, then results are compared with confidence scores and key points.
Left: Fork analysis showing 4 perspectives with confidence scores (45%-82%) and synthesis. Right: Metacognitive Insights panel with 3 biases, 3 patterns, and 1 improvement idea detected.
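The concurrent fork can be sketched with `asyncio.gather`. The `reason` coroutine below is a placeholder, not a real model call, and the result shape is an assumption; only the four style names come from the text.

```python
import asyncio

STYLES = ("conservative", "aggressive", "balanced", "contrarian")


async def reason(question: str, style: str) -> dict:
    # Placeholder for a model call; a real branch would send a style-specific prompt.
    await asyncio.sleep(0)
    return {"style": style, "answer": f"[{style}] view on: {question}"}


async def fork(question: str) -> list[dict]:
    # gather() runs the four branches concurrently and preserves argument order.
    return await asyncio.gather(*(reason(question, style) for style in STYLES))


branches = asyncio.run(fork("Should we migrate to a monorepo?"))
```

Because `gather` returns results in the order the coroutines were passed, each result can be matched back to its style without extra bookkeeping before the comparison step.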
A 3-tier memory system: working context (active reasoning), recall buffer (recent history), and archival storage (long-term knowledge). Entries persist across sessions with semantic search and importance scoring.
Left: Memory hierarchy showing 4 entries in Main Context, 4 in Recall, with importance scores and source types. Right: Session stats with confidence breakdown and token usage visualization.
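One way to picture the three tiers is as bounded lists that spill downward. The sketch below assumes fixed capacities and eviction of the least important entry; the class and method names and the capacity values are hypothetical, and semantic search is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    text: str
    importance: float


@dataclass
class MemoryHierarchy:
    working_capacity: int = 2  # assumed limits, chosen small for the demo
    recall_capacity: int = 2
    working: list[Entry] = field(default_factory=list)
    recall: list[Entry] = field(default_factory=list)
    archival: list[Entry] = field(default_factory=list)

    def add(self, entry: Entry) -> None:
        # New entries always land in working context, then overflow cascades down.
        self.working.append(entry)
        self._spill(self.working, self.working_capacity, self.recall)
        self._spill(self.recall, self.recall_capacity, self.archival)

    @staticmethod
    def _spill(tier: list[Entry], capacity: int, lower: list[Entry]) -> None:
        # Move the least important entries down one tier until the tier fits.
        while len(tier) > capacity:
            victim = min(tier, key=lambda e: e.importance)
            tier.remove(victim)
            lower.append(victim)


memory = MemoryHierarchy()
for text, importance in [("goal", 0.9), ("aside", 0.2), ("constraint", 0.7),
                         ("detail", 0.5), ("noise", 0.1)]:
    memory.add(Entry(text, importance))
```

After the five inserts, the two most important entries remain in working context, the next two sit in recall, and the least important one has been pushed to archival storage.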
- ThinkGraph: Persistent reasoning graphs with queryable nodes and typed edges
- ThinkFork: 4-style branching, steering, and debate mode
- PRM Verification: Step-level verification with structured reasoning extraction
- Agent Swarm: 6-agent orchestration with WebSocket streaming
- Graph of Thoughts: BFS/DFS/best-first search over thought trees
- Metacognitive Insights: Bias detection, pattern recognition, improvement hypotheses
- Memory Hierarchy: 3-tier MemGPT-style memory with semantic retrieval
- Hypothesis Lifecycle: Promote → rerun → compare → retain loops
- Session Sharing: Persistent sessions with replay and sharing
- Evaluation Harnesses: Retrieval benchmarks and quality metrics
Two-service runtime with shared persistence:
Browser
-> Next.js web app (apps/web)
-> packages/core reasoning modules
-> packages/db Supabase access
-> swarm proxy routes
-> Python FastAPI swarm service (agents)
-> Supabase Postgres + pgvector
See full architecture details: docs/architecture.md
git clone https://github.com/omerakben/opus-nx.git
cd opus-nx
./scripts/dev-start.sh
The setup script handles everything: prerequisites check, dependency install, env bootstrap, connection verify, build, and launch. It will prompt you for API credentials on first run.
Run everything locally with just an Anthropic API key; no Supabase cloud account is needed. Data stays on your machine.
Prerequisites: Docker, Node.js 22+, pnpm. Optional: Python 3.12+ and uv for the agent swarm.
git clone https://github.com/omerakben/opus-nx.git
cd opus-nx
./scripts/docker-start.sh
The script handles everything: checks prerequisites, copies .env.docker to .env, prompts for your Anthropic API key, starts a local PostgreSQL + pgvector database in Docker, installs all dependencies, builds the project, and launches the dev servers.
When it's done, open http://localhost:3000 in your browser.
Or step by step:
cp .env.docker .env
# Edit .env and add your ANTHROPIC_API_KEY
docker compose -f docker-compose.local.yml up -d # Start local DB
pnpm install && pnpm build && pnpm dev # Install, build, run
| Service | URL | Purpose |
|---|---|---|
| Dashboard | http://localhost:3000 | Next.js web app; open this in your browser |
| Agent Swarm | http://localhost:8000 | Python FastAPI backend (auto-starts if uv is installed) |
| REST API | http://localhost:54321 | Supabase-compatible DB API (used internally) |
| PostgreSQL | localhost:54322 | Direct DB access (psql, pgAdmin) |
# Lifecycle
./scripts/docker-start.sh --stop # Stop everything (dev servers + database)
./scripts/docker-start.sh --reset # Wipe database and start fresh
./scripts/docker-start.sh --db-only # Start only the database (no dev servers)
# Database access
docker exec -it opus-nx-postgres psql -U postgres -d opus_nx # Direct SQL access
docker compose -f docker-compose.local.yml logs -f postgres # Stream DB logs
- Node.js >= 22
- pnpm 9.x
- Python 3.12+
- uv
git clone https://github.com/omerakben/opus-nx.git
cd opus-nx
pnpm install
pnpm setup
This creates .env and agents/.env if missing and aligns AUTH_SECRET across both files.
Required values:
- ANTHROPIC_API_KEY
- AUTH_SECRET
- SUPABASE_URL
- SUPABASE_SERVICE_ROLE_KEY
- SUPABASE_ANON_KEY
pnpm setup:verify
pnpm dev
Optional local swarm backend:
cd agents
uv run uvicorn src.main:app --reload --port 8000
Use your own provider accounts and keys.
- Do not rely on maintainer personal credentials
- Keep AUTH_SECRET consistent across web and agents
- Treat demo mode as optional (DEMO_MODE=true only when intentionally enabled)
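If you edit the env files by hand, a quick check that AUTH_SECRET still matches in both can save a confusing auth failure. The sketch below is a standalone illustration, not part of the repository: `parse_env` and `secrets_match` are hypothetical helpers, and the parser only handles plain KEY=VALUE lines. It takes file contents as strings so it can be tried without touching real files.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def secrets_match(web_env: str, agents_env: str) -> bool:
    # True only when both files define AUTH_SECRET and the values are identical.
    web = parse_env(web_env).get("AUTH_SECRET")
    agents = parse_env(agents_env).get("AUTH_SECRET")
    return web is not None and web == agents


web_env = "# web\nANTHROPIC_API_KEY=example\nAUTH_SECRET=abc123\n"
agents_env = 'AUTH_SECRET="abc123"\n'
ok = secrets_match(web_env, agents_env)
```

To run it against a checkout, read .env and agents/.env with `pathlib.Path(...).read_text()` and pass the two strings in.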
pnpm dev # Start all dev servers
pnpm lint # Lint all packages
pnpm typecheck # Type-check all packages
pnpm test # Run tests
pnpm db:migrate # Run Supabase migrations
pnpm setup # Bootstrap env files
pnpm setup:verify # Verify API connections
./scripts/dev-start.sh # Full setup + launch (recommended)
./scripts/docker-start.sh # Docker local DB + dev servers
./scripts/docker-start.sh --db-only # Docker DB only (no dev servers)
Agent tests:
cd agents
uv run pytest
- POST /api/thinking: Extended thinking request
- POST /api/thinking/stream: SSE streaming for thinking deltas
- POST /api/fork: ThinkFork parallel reasoning
- POST /api/verify: PRM step-by-step verification
- POST /api/got: Graph of Thoughts reasoning
- GET/POST /api/sessions: List/create sessions
- GET /api/sessions/[sessionId]/nodes: Get thinking nodes
- GET /api/reasoning/[id]: Get reasoning node details
- POST /api/swarm: Initiate multi-agent swarm
- GET /api/swarm/token: WebSocket auth token
- POST /api/swarm/[sessionId]/checkpoint: Human-in-the-loop checkpoint
- GET/POST /api/memory: Hierarchical memory operations
- GET/POST /api/insights: Metacognitive insights
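A client of the streaming route listed above has to split a Server-Sent Events body into events. The sketch below follows the generic SSE wire format (events separated by blank lines, payload carried in `data:` fields). The payload shape this API emits is not documented here, so events are treated as opaque strings, and `parse_sse` is a hypothetical helper name.

```python
def parse_sse(stream_text: str) -> list[str]:
    """Return the data payload of each complete event in an SSE body."""
    events: list[str] = []
    data_lines: list[str] = []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line ends the current event; events with no data are skipped.
            if data_lines:
                events.append("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored here.
    # An unterminated trailing event is discarded, as the SSE format specifies.
    return events


sample = "data: first\n\ndata: second\ndata: line\n\n: keep-alive\n\n"
events = parse_sse(sample)
```

Multiple `data:` lines within one event are joined with newlines, so the sample yields two events, the second spanning two lines.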
| Layer | Technology |
|---|---|
| LLM | Claude Opus 4.6 (50K extended thinking budget) |
| Dashboard | Next.js 16, React 19, Tailwind CSS 4, shadcn/ui |
| Agent Swarm | Python 3.12, FastAPI, Anthropic SDK, NetworkX |
| Database | Supabase (PostgreSQL + pgvector with HNSW indexes) |
| Visualization | @xyflow/react (react-flow) |
| Deployment | Vercel (dashboard) + Fly.io (agents) |
| Runtime | Node.js 22+, TypeScript 5.7+ |
| Testing | Vitest 4, Playwright, pytest |
Implemented concepts are grounded in:
| Paper | Module | Key Contribution |
|---|---|---|
| Tree of Thoughts (Yao et al., 2023) | ThinkFork | BFS/DFS search over reasoning trees with state evaluation |
| Let's Verify Step by Step (Lightman et al., 2023) | PRM Verifier | Process supervision: verify each reasoning step independently |
| Graph of Thoughts (Besta et al., 2023) | GoT Engine | Arbitrary thought graph topology with aggregation and refinement |
| MemGPT (Packer et al., 2023) | Memory Hierarchy | 3-tier memory hierarchy with paging and auto-eviction |
See:
- Visual feature guide: docs/features.md
- Canonical docs index: docs/README.md
- PRD: docs/prd.md
- Architecture: docs/architecture.md
- Runbooks: docs/runbooks/
- Historical docs archive: docs/archive/build-history/
Contributions are welcome.
- Contribution guide: CONTRIBUTING.md
- Code of conduct: CODE_OF_CONDUCT.md
Priority areas:
- Reasoning quality and evaluation rigor
- Setup ergonomics and onboarding
- Lifecycle and experiment UX
- Reliability and observability
Ozzy: AI Engineer & Full-Stack Developer
TUEL AI: AI Research Platform
Claude: AI Research Partner (Anthropic)
A human + AI collaboration exploring persistent reasoning artifacts.
MIT