Plexus is an AI Agent Operating System designed for analyzing content streams, orchestrating complex workflows, and taking action at scale.
It transforms the chaotic process of managing hundreds of AI prompts and classifiers into a structured, engineering-grade discipline. By combining a robust backend, a real-time dashboard, and deep integration with AI agents, Plexus enables teams to build, deploy, and improve AI solutions without managing low-level infrastructure.
- AI Agent Integration (MCP): First-class support for AI agents (like Claude/Cursor) to interact with the system, create configurations, and run analyses via the Model Context Protocol.
- Real-Time Dashboard: A modern Next.js application for monitoring activity, managing scorecards, and visualizing performance metrics.
- Scorecard System: Organize disparate classification tasks into versioned, managed "Scorecards" with clear lineage and configuration.
- Feedback Alignment: A closed-loop system for capturing human feedback, analyzing disagreements, and continuously improving AI performance.
- Evaluation Framework: Comprehensive tools for running accuracy tests, regression testing, and performance benchmarking.
- Procedures & Experiments: Orchestrate complex, multi-step AI workflows (using LangGraph) that go beyond simple classification.
Plexus encapsulates a robust cognitive framework that standardizes how AI systems process information. Instead of writing raw code or prompting LLMs in isolation, you define your domain using Plexus's specialized vocabulary and Domain-Specific Languages (DSLs).
The system is built around a set of strong, opinionated primitives that form the "grammar" of AI operations:
- Item: The fundamental unit of content (text, audio transcript, document) to be processed.
- Scorecard: A collection of related scoring logic and configurations.
- Score: A specific cognitive task (e.g., "Is this customer angry?", "Extract the purchase date").
- Score Result: The outcome of applying a Score to an Item (includes value, confidence, explanation, and metadata).
- Prediction: A tentative Score Result generated by an AI model, waiting for validation.
- Evaluation: The process of measuring the accuracy of Predictions against a "Gold Set" or human judgment.
- Feedback: Human correction or validation of a Prediction.
- Feedback Item: A specific instance where human judgment diverged from (or confirmed) AI judgment, used for training alignment.
- Dataset: A curated collection of Items used for training, testing, or benchmarking.
- Data Source: The origin stream for Items (e.g., database query, API feed).
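To make the vocabulary concrete, here is an illustrative sketch of how the primitives relate, expressed as YAML. The field names are hypothetical, chosen for this sketch; they are not Plexus's actual schema:

```yaml
# Illustrative only: field names are hypothetical, not Plexus's actual schema.
item:
  id: item-001
  type: audio_transcript            # the fundamental unit of content
  text: "I've been on hold for an hour and nobody can help me."
  data_source: call-center-feed     # the origin stream this Item came from

score_result:
  score: "Is this customer angry?"  # the Score (cognitive task) being applied
  item_id: item-001
  value: "Yes"
  confidence: 0.92
  explanation: "The caller expresses frustration about a long hold time."

feedback_item:
  item_id: item-001
  human_value: "Yes"                # human confirmed the Prediction
  agreement: true                   # used for training alignment
```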
Plexus uses configuration-as-code to define cognitive processes:
- Score Configuration (YAML): Defines how to think.
  - Specifies the model provider (OpenAI, Anthropic), parameters, and prompt structure.
  - Defines the graph of cognitive steps (using LangGraph-compatible nodes).
  - Gives detailed control over inputs, outputs, and processing logic.
- Data Source Configuration (YAML): Defines what to process.
  - Specifies the SQL or API queries used to fetch Items.
  - Defines filtering and windowing logic for creating Datasets.
- Lua Scripting (Embedded): Defines programmatic logic (see the sketch after this list).
  - Embedded directly within YAML configurations for lightweight, fast execution.
  - Preferred over Python for its simpler syntax and its suitability for adaptation into a domain-specific dialect.
  - Allows complex conditional logic and data transformation without leaving the configuration file.
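As a hedged illustration of how these pieces fit together, here is a hypothetical Data Source configuration with an embedded Lua filter. The key names (`query`, `window`, `filter`) are assumptions made for this sketch, not the actual Plexus DSL:

```yaml
# Hypothetical Data Source configuration: key names (query, window, filter)
# are illustrative, not the actual Plexus DSL.
name: support-calls
query: |
  SELECT id, transcript, created_at
  FROM calls
  WHERE created_at > :since
window:
  order_by: created_at
  limit: 1000                 # cap the Dataset at the most recent 1,000 Items
filter: |
  -- embedded Lua: keep only transcripts long enough to classify meaningfully
  return string.len(item.transcript) > 200
```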
By using these DSLs, you elevate your work from "writing scripts" to "architecting cognitive systems," ensuring reproducibility, version control, and scalability.
Plexus embodies the "Everything as Code" architectural principle from top to bottom. This is not just for DevOps; it is the foundational strategy for integrating AI at every level.
- Infrastructure as Code (IaC): The entire AWS cloud environment (Lambda, DynamoDB, SQS) is defined in code (CDK), making the substrate itself versionable and reproducible.
- Cognition as Code: AI behaviors, prompts, and logic are defined in DSLs (YAML/Lua) rather than opaque model weights.
AI agents excel at reading and iteratively editing code. By structuring the entire system as code, Plexus enables:
- Self-Evolving Agents: AI agents can improve their own performance by iteratively editing their own configuration code (DSLs).
- Data Flywheels: The system supports online learning and human-in-the-loop patterns where human feedback directly informs the next iteration of the configuration code.
- Self-Alignment: Over time, the system "aligns" itself to human intent by continually refining its logic based on feedback, growing steadily smarter and more accurate without manual intervention.
Plexus is built on a modern, scalable stack:
- Frontend: Next.js 14, AWS Amplify Gen2, Shadcn UI, Tailwind CSS
- Backend: Python 3.11, GraphQL (AWS AppSync), Celery
- Infrastructure: AWS CDK (Lambda, SQS, DynamoDB), Docker
- AI Orchestration: LangChain, LangGraph
- AI Providers: OpenAI, Anthropic, AWS Bedrock
- Python 3.11 (Required)
- Node.js 18+
- AWS Account (for deployment)
- Clone the repository:

  ```bash
  git clone https://github.com/AnthusAI/Plexus.git
  cd Plexus
  ```

- Install Python dependencies:

  ```bash
  pip install -e .
  ```

- Set up configuration: copy the example config and update it with your credentials:

  ```bash
  cp plexus.yaml.example .plexus/config.yaml
  ```

The dashboard is a standard Next.js application located in the dashboard/ directory.

```bash
cd dashboard
npm install
npm run dev
```

See dashboard/README.md for full details.
Scorecards are the top-level containers for your classification logic. You can create them via the Dashboard or using AI agents via MCP.
Scores are defined using YAML configuration files that specify the model, prompt, and logic.
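For a sense of the shape (the exact schema is defined in the repository), a Score configuration might look roughly like the sketch below; all key names here are hypothetical:

```yaml
# Hypothetical Score configuration: key names and structure are illustrative.
name: customer-anger
model:
  provider: anthropic         # or openai, aws-bedrock
  parameters:
    temperature: 0.0
graph:                        # LangGraph-compatible cognitive steps
  - node: classify
    prompt: |
      Read the transcript and answer Yes or No:
      is the customer expressing anger?
    output: value
  - node: explain
    prompt: Briefly justify the classification above.
    output: explanation
```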
Tip: Use the plexus-score-config-updater agent to safely manage these configurations.
Validate your scores against ground-truth data (or human feedback) to ensure accuracy.
```bash
plexus evaluate accuracy --scorecard-name "My Scorecard" --score-name "My Score"
```

Or use the plexus_evaluation_run MCP tool.
The "Flywheel" of Plexus:
1. AI makes a prediction
2. Human reviews and corrects (if wrong)
3. Plexus captures the feedback
4. AI analyzes the error patterns
5. Configuration is updated to fix the error
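Because the logic lives in configuration code, step 5 is an ordinary, reviewable edit. As a purely hypothetical sketch (using the same illustrative schema as above), a feedback-driven fix might be a one-line prompt refinement:

```yaml
# Hypothetical: feedback analysis showed sarcastic praise was being missed,
# so the classify prompt is refined and the Score re-versioned.
graph:
  - node: classify
    prompt: |
      Read the transcript and answer Yes or No:
      is the customer expressing anger?
      Treat sarcastic praise ("great, just great") as anger.
    output: value
```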
Plexus is designed to be operated by AI agents as much as by humans. The /MCP directory contains a fully-featured Model Context Protocol server.
- AGENTS.md: Read the full Agent Integration Guide
- Capabilities: Agents can read data, update configurations, run tests, and analyze results.
- Safety: The system includes specialized "Agents" and "Skills" in .claude/ to ensure safe operations (e.g., validating YAML before pushing).
- Python: We use pytest for testing.

  ```bash
  pytest
  ```

- TypeScript: We use Jest for frontend testing.

  ```bash
  cd dashboard && npm run test
  ```

- Infrastructure: Deployed via AWS CDK.

  ```bash
  cd infrastructure && cdk deploy
  ```
- Agent Integration Guide - How to use Plexus with AI agents.
- Dashboard Documentation - Frontend setup and features.
- MCP Server - Technical details of the MCP implementation.
Plexus is open-source under the MIT license.