🧠 Open-source AI reasoning research platform: persistent reasoning graphs, multi-agent swarms, Graph-of-Thoughts, and MemGPT-style memory. Built with Claude Opus 4.6 by Anthropic. Explore, verify, and fork AI thinking in real-time.

omerakben/opus-nx

Opus Nx

Open-source platform for persistent AI reasoning research

Opus Nx turns model reasoning traces into persistent, inspectable artifacts you can explore, verify, rerun, and improve. Every thinking step becomes a node in a navigable graph, not a black box.

This repository is positioned for research and open-source collaboration. Run it locally with your own credentials.

Opus Nx: Persistent Reasoning Graph
A live reasoning session: 17 thinking nodes, typed edges (influence, support, contradiction, refinement), compaction boundaries, and fork branches, all persisted and queryable.


Why Opus Nx

Most AI workflows keep only final answers. Opus Nx keeps the entire reasoning path and supports policy improvement over time.

| Standard AI UX | Opus Nx |
| --- | --- |
| Final answer only | Persistent reasoning graph artifacts |
| Single perspective | 6-agent swarm + 4-style branching workflows |
| Limited traceability | Decision points, typed edges, step verification, lifecycle state |
| Prompt-only iteration | Promote → rerun → compare → retain loops |
| Ephemeral context | 3-tier memory hierarchy (working → recall → archival) |

See It In Action

Full visual walkthrough with all screenshots: docs/features.md

Persistent Reasoning Graphs (ThinkGraph)

Every extended thinking session becomes a graph of discrete reasoning steps: nodes scored for confidence, connected by typed edges showing how ideas flow, branch, and build on each other.

ThinkGraph: Reasoning nodes with typed edges
Reasoning nodes colored by type (thinking, compaction, fork branch) with edges showing influence, support, contradiction, and refinement relationships. Minimap in corner for navigation.
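
To make the graph idea concrete, here is a minimal in-memory sketch of nodes with confidence scores and typed edges. The edge types come from the description above; the class names, field names, and sample thoughts are hypothetical and are not taken from the Opus Nx codebase, which persists its graphs in Postgres.

```python
from dataclasses import dataclass, field

# Edge types named in this README. Everything else here is illustrative.
EDGE_TYPES = {"influence", "support", "contradiction", "refinement"}


@dataclass
class ThinkNode:
    node_id: str
    text: str
    confidence: float  # 0.0 to 1.0
    node_type: str = "thinking"  # hypothetical values: "compaction", "fork_branch"


@dataclass
class ThinkGraph:
    nodes: dict[str, ThinkNode] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, dst, type)

    def add_node(self, node: ThinkNode) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, dst: str, edge_type: str) -> None:
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges.append((src, dst, edge_type))

    def neighbors(self, node_id: str, edge_type: str) -> list[str]:
        """Return targets reachable from node_id via edges of one type."""
        return [dst for src, dst, kind in self.edges
                if src == node_id and kind == edge_type]


graph = ThinkGraph()
graph.add_node(ThinkNode("n1", "Caching reduces latency", 0.82))
graph.add_node(ThinkNode("n2", "Cache invalidation adds risk", 0.64))
graph.add_node(ThinkNode("n3", "Use a short TTL", 0.71))
graph.add_edge("n2", "n1", "contradiction")
graph.add_edge("n3", "n1", "refinement")
contradicted = graph.neighbors("n2", "contradiction")
```

Keeping the edge type as data rather than as separate edge lists means a query such as "what does this step contradict?" is a simple filter.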

6-Agent Swarm Orchestration

Deploy a swarm of 6 specialized AI agents that collaborate in real-time via WebSocket streaming. Maestro decomposes the problem, DeepThinker analyzes, Contrarian challenges, Verifier validates, Synthesizer merges, and Metacognition audits.

Agent Swarm: 6 specialists collaborating
The Synthesizer agent merging perspectives from all agents into a coherent framework, with live session stats (17 nodes, 62K tokens) and human-in-the-loop checkpoints.
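
The division of labor among the six roles can be sketched as a simple sequential pipeline. This is a simplification under stated assumptions: the real swarm streams over WebSocket and may run agents concurrently, and `make_agent` and `run_swarm` are hypothetical names with a string stub standing in for each model call.

```python
from typing import Callable

# An agent takes the problem and the notes written so far, and returns one note.
Agent = Callable[[str, list[str]], str]


def make_agent(role: str, verb: str) -> Agent:
    """Build a stub agent; a real agent would call a model here."""
    def run(problem: str, notes: list[str]) -> str:
        return f"{role} {verb} '{problem}' using {len(notes)} prior notes"
    return run


# Roles and their duties as listed in this README, in the order described.
PIPELINE: list[tuple[str, Agent]] = [
    ("Maestro", make_agent("Maestro", "decomposes")),
    ("DeepThinker", make_agent("DeepThinker", "analyzes")),
    ("Contrarian", make_agent("Contrarian", "challenges")),
    ("Verifier", make_agent("Verifier", "validates")),
    ("Synthesizer", make_agent("Synthesizer", "merges")),
    ("Metacognition", make_agent("Metacognition", "audits")),
]


def run_swarm(problem: str) -> list[str]:
    """Run each agent in turn, letting it see every earlier agent's note."""
    notes: list[str] = []
    for _role, agent in PIPELINE:
        notes.append(agent(problem, notes))
    return notes


notes = run_swarm("Should we cache API responses?")
```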

Graph of Thoughts (GoT)

Explore problems using arbitrary reasoning graphs with BFS, DFS, or best-first search. Thoughts branch, aggregate, and get verified at each depth level, in a visual implementation of Besta et al. (2023).

Graph of Thoughts: Tree search visualization
A 4-depth GoT tree with 8 branches. Each node shows its thought, confidence score (40%-94%), verification status (Verified/Aggregated), and reasoning path. Color-coded by depth level.
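
Of the three search strategies, best-first is the least obvious, so here is a minimal sketch using a priority queue. It assumes a thought is a string, that `expand` generates child thoughts, and that `score` returns a confidence value. All of these names and the toy tree are hypothetical; the actual GoT engine also aggregates and verifies thoughts, which this sketch omits.

```python
import heapq
from itertools import count
from typing import Callable


def best_first(
    root: str,
    expand: Callable[[str], list[str]],
    score: Callable[[str], float],
    max_depth: int,
    max_expansions: int = 50,
) -> tuple[str, float]:
    """Always expand the highest-scoring open thought; return the best one seen."""
    tie = count()  # tie-breaker so equal scores never compare the strings
    frontier = [(-score(root), next(tie), 0, root)]  # min-heap on negated score
    best = (root, score(root))
    expansions = 0
    while frontier and expansions < max_expansions:
        neg_score, _, depth, thought = heapq.heappop(frontier)
        if -neg_score > best[1]:
            best = (thought, -neg_score)
        if depth >= max_depth:
            continue  # depth limit reached: do not expand further
        expansions += 1
        for child in expand(thought):
            heapq.heappush(frontier, (-score(child), next(tie), depth + 1, child))
    return best


# Toy tree and scores standing in for model-generated thoughts.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
SCORES = {"root": 0.40, "a": 0.55, "b": 0.70, "a1": 0.94, "a2": 0.50, "b1": 0.60}

result = best_first("root", lambda t: TREE.get(t, []), lambda t: SCORES[t], max_depth=2)
```

Swapping the heap for a FIFO queue or a stack would give BFS or DFS over the same `expand` function.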

Step-by-Step Verification (PRM)

Process Reward Model verifies each reasoning step independently. See structured steps (CONSIDERATION → HYPOTHESIS → EVALUATION → CONCLUSION) with confidence scores, decision counts, and edge relationships.

Structured Reasoning: Step verification
13 structured reasoning steps extracted from a single thinking pass. Each step typed (Consideration, Hypothesis, Evaluation, Conclusion) with 1.6K thinking tokens and 13 decision points persisted.

Reasoning conclusion and model output
Final steps: EVALUATION → MAIN CONCLUSION → MODEL OUTPUT. The reasoning chain ends with a persisted artifact showing both the internal deliberation and the final structured response.
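
Step-level verification can be sketched as scoring each step on its own and reporting the first step that falls below a threshold. This is an illustration only: the threshold, the product-of-scores aggregation, and the names `Step` and `verify_chain` are assumptions, and in Opus Nx the per-step scores would come from a model rather than being supplied by hand.

```python
from dataclasses import dataclass
from math import prod

# Step types named in this README.
STEP_TYPES = ("consideration", "hypothesis", "evaluation", "conclusion")


@dataclass
class Step:
    kind: str
    text: str
    score: float  # verifier's estimate that this step is correct, 0.0 to 1.0


def verify_chain(steps: list[Step], threshold: float = 0.5) -> dict:
    """Check every step independently and summarize the whole chain."""
    first_bad = next(
        (i for i, step in enumerate(steps) if step.score < threshold), None
    )
    return {
        "valid": first_bad is None,
        "first_failed_step": first_bad,
        # Product of step scores: one weak step drags the whole chain down.
        "chain_score": prod(step.score for step in steps),
    }


chain = [
    Step("consideration", "Traffic is read-heavy", 0.9),
    Step("hypothesis", "A cache will cut latency", 0.8),
    Step("evaluation", "Invalidation is trivial here", 0.4),
    Step("conclusion", "Ship the cache", 0.7),
]
report = verify_chain(chain)
```

Reporting the index of the first failing step, rather than only a pass/fail verdict, is what distinguishes process supervision from checking the final answer alone.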

ThinkFork β€” 4-Style Divergent Analysis

Fork any question into 4 concurrent reasoning styles: conservative, aggressive, balanced, and contrarian. Each branch reasons independently, then results are compared with confidence scores and key points.

ThinkFork: 4 divergent perspectives | Metacognitive Insights: Improvement ideas
Left: Fork analysis showing 4 perspectives with confidence scores (45%–82%) and synthesis. Right: Metacognitive Insights panel with 3 biases, 3 patterns, and 1 improvement idea detected.
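
The fork-then-compare flow can be sketched with `asyncio.gather`, which runs the four branches concurrently and collects their results. The `reason` coroutine is a hypothetical stand-in for a model call, and the confidence values are hard-coded placeholders, not measured results.

```python
import asyncio

# The four reasoning styles named in this README.
STYLES = ("conservative", "aggressive", "balanced", "contrarian")

# Placeholder confidences; a real branch would derive these from model output.
PLACEHOLDER_CONFIDENCE = {
    "conservative": 0.82,
    "aggressive": 0.60,
    "balanced": 0.70,
    "contrarian": 0.45,
}


async def reason(style: str, question: str) -> dict:
    """Stand-in for one independent reasoning branch."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {
        "style": style,
        "confidence": PLACEHOLDER_CONFIDENCE[style],
        "key_point": f"{style} view of: {question}",
    }


async def think_fork(question: str) -> list[dict]:
    """Run all styles concurrently, then rank branches by confidence."""
    branches = await asyncio.gather(*(reason(s, question) for s in STYLES))
    return sorted(branches, key=lambda b: b["confidence"], reverse=True)


results = asyncio.run(think_fork("Should we cache API responses?"))
```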

Memory Hierarchy (MemGPT-inspired)

A 3-tier memory system: working context (active reasoning), recall buffer (recent history), and archival storage (long-term knowledge). Entries persist across sessions with semantic search and importance scoring.

Memory Hierarchy: 3-tier MemGPT system | Session Stats: Token breakdown
Left: Memory hierarchy showing 4 entries in Main Context, 4 in Recall, with importance scores and source types. Right: Session stats with confidence breakdown and token usage visualization.
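
A minimal sketch of the three tiers follows, assuming fixed tier capacities and eviction of the lowest-importance entry to the next tier down. The capacities, the eviction rule, and the substring search (a stand-in for semantic search over embeddings) are all assumptions for illustration, not the project's actual policy.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    text: str
    importance: float  # 0.0 to 1.0


@dataclass
class MemoryHierarchy:
    working_cap: int = 2  # hypothetical capacities
    recall_cap: int = 2
    working: list[Entry] = field(default_factory=list)
    recall: list[Entry] = field(default_factory=list)
    archival: list[Entry] = field(default_factory=list)  # unbounded

    def add(self, entry: Entry) -> None:
        """Insert into working context, then cascade overflow downward."""
        self.working.append(entry)
        self._evict(self.working, self.working_cap, self.recall)
        self._evict(self.recall, self.recall_cap, self.archival)

    @staticmethod
    def _evict(tier: list[Entry], cap: int, lower: list[Entry]) -> None:
        while len(tier) > cap:
            victim = min(tier, key=lambda e: e.importance)
            tier.remove(victim)
            lower.append(victim)

    def search(self, term: str) -> list[str]:
        """Substring match across all tiers (stand-in for semantic search)."""
        term = term.lower()
        tiers = (self.working, self.recall, self.archival)
        return [e.text for tier in tiers for e in tier if term in e.text.lower()]


memory = MemoryHierarchy()
for text, importance in [
    ("User prefers TypeScript", 0.9),
    ("Greeting exchanged", 0.2),
    ("Project uses pgvector", 0.5),
    ("Deadline is Friday", 0.7),
    ("Weather small talk", 0.1),
]:
    memory.add(Entry(text, importance))
```

After these five inserts, the two most important entries remain in working context and the least important one has been paged out to archival storage, yet `search` still finds it.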


Current Capabilities

  1. ThinkGraph: Persistent reasoning graphs with queryable nodes and typed edges
  2. ThinkFork: 4-style branching, steering, and debate mode
  3. PRM Verification: Step-level verification with structured reasoning extraction
  4. Agent Swarm: 6-agent orchestration with WebSocket streaming
  5. Graph of Thoughts: BFS/DFS/best-first search over thought trees
  6. Metacognitive Insights: Bias detection, pattern recognition, improvement hypotheses
  7. Memory Hierarchy: 3-tier MemGPT-style memory with semantic retrieval
  8. Hypothesis Lifecycle: Promote → rerun → compare → retain loops
  9. Session Sharing: Persistent sessions with replay and sharing
  10. Evaluation Harnesses: Retrieval benchmarks and quality metrics

Architecture

Two-service runtime with shared persistence:

Browser
  -> Next.js web app (apps/web)
      -> packages/core reasoning modules
      -> packages/db Supabase access
      -> swarm proxy routes
  -> Python FastAPI swarm service (agents)
  -> Supabase Postgres + pgvector

See full architecture details: docs/architecture.md

Quick Start

One-Command Setup

git clone https://github.com/omerakben/opus-nx.git
cd opus-nx
./scripts/dev-start.sh

The setup script handles everything: it checks prerequisites, installs dependencies, bootstraps the env files, verifies connections, builds the project, and launches it. It will prompt you for API credentials on first run.

Docker Quick Start (Local Database)

Run everything locally with just an Anthropic API key; no Supabase cloud account is needed. Data stays on your machine.

Prerequisites: Docker, Node.js 22+, pnpm. Optional: Python 3.12+ and uv for the agent swarm.

git clone https://github.com/omerakben/opus-nx.git
cd opus-nx
./scripts/docker-start.sh

The script handles everything: checks prerequisites, copies .env.docker to .env, prompts for your Anthropic API key, starts a local PostgreSQL + pgvector database in Docker, installs all dependencies, builds the project, and launches the dev servers.

When it's done, open http://localhost:3000 in your browser.

Or step by step:

cp .env.docker .env
# Edit .env → add your ANTHROPIC_API_KEY

docker compose -f docker-compose.local.yml up -d    # Start local DB
pnpm install && pnpm build && pnpm dev              # Install, build, run

| Service | URL | Purpose |
| --- | --- | --- |
| Dashboard | http://localhost:3000 | Next.js web app (open this in your browser) |
| Agent Swarm | http://localhost:8000 | Python FastAPI backend (auto-starts if uv is installed) |
| REST API | http://localhost:54321 | Supabase-compatible DB API (used internally) |
| PostgreSQL | localhost:54322 | Direct DB access (psql, pgAdmin) |

# Lifecycle
./scripts/docker-start.sh --stop       # Stop everything (dev servers + database)
./scripts/docker-start.sh --reset      # Wipe database and start fresh
./scripts/docker-start.sh --db-only    # Start only the database (no dev servers)

# Database access
docker exec -it opus-nx-postgres psql -U postgres -d opus_nx  # Direct SQL access
docker compose -f docker-compose.local.yml logs -f postgres   # Stream DB logs

Manual Setup

1) Prerequisites

  1. Node.js >= 22
  2. pnpm 9.x
  3. Python 3.12+
  4. uv

2) Install

git clone https://github.com/omerakben/opus-nx.git
cd opus-nx
pnpm install

3) Bootstrap local env

pnpm setup

This creates .env and agents/.env if missing and aligns AUTH_SECRET across both files.
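
If you edit the env files by hand later, a quick check that the secret still matches can save a confusing auth failure. The sketch below parses two env-file texts and compares `AUTH_SECRET`; the parser handles only simple `KEY=value` lines, and the function names are hypothetical rather than part of `pnpm setup`.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def secrets_aligned(web_env: str, agents_env: str, key: str = "AUTH_SECRET") -> bool:
    """True only when the key is present in both files with the same value."""
    web_value = parse_env(web_env).get(key)
    agents_value = parse_env(agents_env).get(key)
    return web_value is not None and web_value == agents_value


# Inline samples; in practice, read .env and agents/.env from disk instead.
web_sample = "# web\nAUTH_SECRET=abc123\nANTHROPIC_API_KEY=placeholder\n"
agents_ok = 'AUTH_SECRET="abc123"\n'
agents_bad = "AUTH_SECRET=different\n"

aligned = secrets_aligned(web_sample, agents_ok)
misaligned = secrets_aligned(web_sample, agents_bad)
```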

4) Add your own credentials

Required values:

  1. ANTHROPIC_API_KEY
  2. AUTH_SECRET
  3. SUPABASE_URL
  4. SUPABASE_SERVICE_ROLE_KEY
  5. SUPABASE_ANON_KEY
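
As a lightweight pre-flight check, the following sketch reports which of the required values above are missing or blank. It is not a replacement for `pnpm setup:verify`, and `missing_required` is a hypothetical helper name.

```python
import os
from typing import Mapping

# The five required values listed in this README.
REQUIRED = (
    "ANTHROPIC_API_KEY",
    "AUTH_SECRET",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
    "SUPABASE_ANON_KEY",
)


def missing_required(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required names that are unset or blank in env."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
sample_env = {"ANTHROPIC_API_KEY": "placeholder", "AUTH_SECRET": "   "}
missing = missing_required(sample_env)
```

Passing the mapping explicitly keeps the check testable; calling `missing_required()` with no arguments inspects the current process environment.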

5) Verify setup

pnpm setup:verify

6) Run

pnpm dev

Optional local swarm backend:

cd agents
uv run uvicorn src.main:app --reload --port 8000

Credential Ownership Model

Use your own provider accounts and keys.

  • Do not rely on maintainer personal credentials
  • Keep AUTH_SECRET consistent across web and agents
  • Treat demo mode as optional (DEMO_MODE=true only when intentionally enabled)

Key Commands

pnpm dev                        # Start all dev servers
pnpm lint                       # Lint all packages
pnpm typecheck                  # Type-check all packages
pnpm test                       # Run tests
pnpm db:migrate                 # Run Supabase migrations
pnpm setup                      # Bootstrap env files
pnpm setup:verify               # Verify API connections
./scripts/dev-start.sh          # Full setup + launch (recommended)
./scripts/docker-start.sh       # Docker local DB + dev servers
./scripts/docker-start.sh --db-only  # Docker DB only (no dev servers)

Agent tests:

cd agents
uv run pytest

API Groups

Reasoning

  • POST /api/thinking: Extended thinking request
  • POST /api/thinking/stream: SSE streaming for thinking deltas
  • POST /api/fork: ThinkFork parallel reasoning
  • POST /api/verify: PRM step-by-step verification
  • POST /api/got: Graph of Thoughts reasoning
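
For clients consuming the streaming endpoint, the sketch below shows how server-sent events are framed: each event is one or more `data:` lines terminated by a blank line. The framing follows the SSE specification, but the JSON payload shape with a `delta` field is an assumption made for illustration; check the actual route for the real event format.

```python
import json


def parse_sse(stream_text: str) -> list[str]:
    """Return the data payload of each complete event in an SSE stream."""
    events: list[str] = []
    data_lines: list[str] = []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                events.append("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # the spec strips one leading space
            data_lines.append(value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored here.
    return events  # a trailing event with no blank line is dropped


# Hypothetical stream; the {"delta": ...} shape is an assumption.
sample_stream = (
    'data: {"delta": "First, "}\n'
    "\n"
    ": keep-alive comment\n"
    'data: {"delta": "consider caching."}\n'
    "\n"
)

payloads = parse_sse(sample_stream)
thinking_text = "".join(json.loads(p)["delta"] for p in payloads)
```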

Sessions and Artifacts

  • GET/POST /api/sessions: List/create sessions
  • GET /api/sessions/[sessionId]/nodes: Get thinking nodes
  • GET /api/reasoning/[id]: Get reasoning node details

Swarm

  • POST /api/swarm: Initiate multi-agent swarm
  • GET /api/swarm/token: WebSocket auth token
  • POST /api/swarm/[sessionId]/checkpoint: Human-in-the-loop checkpoint

Memory

  • GET/POST /api/memory: Hierarchical memory operations
  • GET/POST /api/insights: Metacognitive insights

Tech Stack

| Layer | Technology |
| --- | --- |
| LLM | Claude Opus 4.6 (50K extended thinking budget) |
| Dashboard | Next.js 16, React 19, Tailwind CSS 4, shadcn/ui |
| Agent Swarm | Python 3.12, FastAPI, Anthropic SDK, NetworkX |
| Database | Supabase (PostgreSQL + pgvector with HNSW indexes) |
| Visualization | @xyflow/react (react-flow) |
| Deployment | Vercel (dashboard) + Fly.io (agents) |
| Runtime | Node.js 22+, TypeScript 5.7+ |
| Testing | Vitest 4, Playwright, pytest |

Research Foundation

Implemented concepts are grounded in:

| Paper | Module | Key Contribution |
| --- | --- | --- |
| Tree of Thoughts (Yao et al., 2023) | ThinkFork | BFS/DFS search over reasoning trees with state evaluation |
| Let's Verify Step by Step (Lightman et al., 2023) | PRM Verifier | Process supervision: verify each reasoning step independently |
| Graph of Thoughts (Besta et al., 2023) | GoT Engine | Arbitrary thought graph topology with aggregation and refinement |
| MemGPT (Packer et al., 2023) | Memory Hierarchy | 3-tier memory hierarchy with paging and auto-eviction |

See:

Documentation Map

Contributing

Contributions are welcome.

Priority areas:

  1. Reasoning quality and evaluation rigor
  2. Setup ergonomics and onboarding
  3. Lifecycle and experiment UX
  4. Reliability and observability

Built By

Ozzy: AI Engineer & Full-Stack Developer

TUEL AI: AI Research Platform

Claude: AI Research Partner (Anthropic)

A human + AI collaboration exploring persistent reasoning artifacts.

License

MIT
