Plexus is an AI Agent Operating System designed for analyzing content streams, orchestrating complex workflows, and taking action at scale.
It transforms the chaotic process of managing hundreds of AI prompts and classifiers into a structured, engineering-grade discipline. By combining a robust backend, a real-time dashboard, and deep integration with AI agents, Plexus enables teams to build, deploy, and improve AI solutions without managing low-level infrastructure.
- AI Agent Integration (MCP): First-class support for AI agents (like Claude/Cursor) to interact with the system, create configurations, and run analyses via the Model Context Protocol.
- Real-Time Dashboard: A modern Next.js application for monitoring activity, managing scorecards, and visualizing performance metrics.
- Scorecard System: Organize disparate classification tasks into versioned, managed "Scorecards" with clear lineage and configuration.
- Feedback Alignment: A closed-loop system for capturing human feedback, analyzing disagreements, and continuously improving AI performance.
- Evaluation Framework: Comprehensive tools for running accuracy tests, regression testing, and performance benchmarking.
- Procedures & Experiments: Orchestrate complex, multi-step AI workflows (using LangGraph) that go beyond simple classification.
Plexus encapsulates a robust cognitive framework that standardizes how AI systems process information. Instead of writing raw code or prompting LLMs in isolation, you define your domain using Plexus's specialized vocabulary and Domain-Specific Languages (DSLs).
The system is built around a set of strong, opinionated primitives that form the "grammar" of AI operations:
- Item: The fundamental unit of content (text, audio transcript, document) to be processed.
- Scorecard: A collection of related scoring logic and configurations.
- Score: A specific cognitive task (e.g., "Is this customer angry?", "Extract the purchase date").
- Score Result: The outcome of applying a Score to an Item (includes value, confidence, explanation, and metadata).
- Prediction: A tentative Score Result generated by an AI model, waiting for validation.
- Evaluation: The process of measuring the accuracy of Predictions against a "Gold Set" or human judgment.
- Feedback: Human correction or validation of a Prediction.
- Feedback Item: A specific instance where human judgment diverged from (or confirmed) AI judgment, used for training alignment.
- Dataset: A curated collection of Items used for training, testing, or benchmarking.
- Data Source: The origin stream for Items (e.g., database query, API feed).
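To make the vocabulary concrete, here is an illustrative sketch of how the primitives relate, expressed as YAML. The field names are hypothetical, chosen for this sketch; they are not Plexus's actual schema:

```yaml
# Illustrative only: field names are hypothetical, not Plexus's actual schema.
item:
  id: item-001
  type: audio_transcript            # the fundamental unit of content
  text: "I've been on hold for an hour and nobody can help me."
  data_source: call-center-feed     # the origin stream this Item came from

score_result:
  score: "Is this customer angry?"  # the Score (cognitive task) being applied
  item_id: item-001
  value: "Yes"
  confidence: 0.92
  explanation: "The caller expresses frustration about a long hold time."

feedback_item:
  item_id: item-001
  human_value: "Yes"                # human confirmed the Prediction
  agreement: true                   # used for training alignment
```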
Plexus uses configuration-as-code to define cognitive processes:
- Score Configuration (YAML): Defines how to think.
  - Specifies the model provider (OpenAI, Anthropic), parameters, and prompt structure.
  - Defines the graph of cognitive steps (using LangGraph-compatible nodes).
  - Gives detailed control over inputs, outputs, and processing logic.
- Data Source Configuration (YAML): Defines what to process.
  - Specifies the SQL or API queries used to fetch Items.
  - Defines filtering and windowing logic for creating Datasets.
- Lua Scripting (Embedded): Defines programmatic logic (see the sketch after this list).
  - Embedded directly within YAML configurations for lightweight, fast execution.
  - Preferred over Python for its simpler syntax and its suitability for adaptation into a domain-specific dialect.
  - Allows complex conditional logic and data transformation without leaving the configuration file.
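As a hedged illustration of how these pieces fit together, here is a hypothetical Data Source configuration with an embedded Lua filter. The key names (`query`, `window`, `filter`) are assumptions made for this sketch, not the actual Plexus DSL:

```yaml
# Hypothetical Data Source configuration: key names (query, window, filter)
# are illustrative, not the actual Plexus DSL.
name: support-calls
query: |
  SELECT id, transcript, created_at
  FROM calls
  WHERE created_at > :since
window:
  order_by: created_at
  limit: 1000                 # cap the Dataset at the most recent 1,000 Items
filter: |
  -- embedded Lua: keep only transcripts long enough to classify meaningfully
  return string.len(item.transcript) > 200
```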
By using these DSLs, you elevate your work from "writing scripts" to "architecting cognitive systems," ensuring reproducibility, version control, and scalability.
Plexus embodies the "Everything as Code" architectural principle from top to bottom. This is not just for DevOps; it is the foundational strategy for integrating AI at every level.
- Infrastructure as Code (IaC): The entire AWS cloud environment (Lambda, DynamoDB, SQS) is defined in code (CDK), making the substrate itself versionable and reproducible.
- Cognition as Code: AI behaviors, prompts, and logic are defined in DSLs (YAML/Lua) rather than opaque model weights.
AI agents excel at reading and iteratively editing code. By structuring the entire system as code, Plexus enables:
- Self-Evolving Agents: AI agents can improve their own performance by iteratively editing their own configuration code (DSLs).
- Data Flywheels: The system supports online learning and human-in-the-loop patterns where human feedback directly informs the next iteration of the configuration code.
- Self-Alignment: Over time, the system "aligns" itself to human intent by continually refining its logic based on feedback, growing steadily smarter and more accurate without manual intervention.
Plexus is built on a modern, scalable stack:
- Frontend: Next.js 14, AWS Amplify Gen2, Shadcn UI, Tailwind CSS
- Backend: Python 3.11, GraphQL (AWS AppSync), Celery
- Infrastructure: AWS CDK (Lambda, SQS, DynamoDB), Docker
- AI Orchestration: LangChain, LangGraph
- AI Providers: OpenAI, Anthropic, AWS Bedrock
- Python 3.11 (Required)
- Node.js 18+
- AWS Account (for deployment)
- Clone the repository:

  ```bash
  git clone https://github.com/AnthusAI/Plexus.git
  cd Plexus
  ```

- Install Python dependencies:

  ```bash
  pip install -e .
  ```

- Set up configuration: copy the example config and update it with your credentials:

  ```bash
  cp plexus.yaml.example .plexus/config.yaml
  ```

The dashboard is a standard Next.js application located in the dashboard/ directory.

```bash
cd dashboard
npm install
npm run dev
```

See dashboard/README.md for full details.
Scorecards are the top-level containers for your classification logic. You can create them via the Dashboard or using AI agents via MCP.
Scores are defined using YAML configuration files that specify the model, prompt, and logic.
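For a sense of the shape (the exact schema is defined in the repository), a Score configuration might look roughly like the sketch below; all key names here are hypothetical:

```yaml
# Hypothetical Score configuration: key names and structure are illustrative.
name: customer-anger
model:
  provider: anthropic         # or openai, aws-bedrock
  parameters:
    temperature: 0.0
graph:                        # LangGraph-compatible cognitive steps
  - node: classify
    prompt: |
      Read the transcript and answer Yes or No:
      is the customer expressing anger?
    output: value
  - node: explain
    prompt: Briefly justify the classification above.
    output: explanation
```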
Tip: Use the plexus-score-config-updater agent to safely manage these configurations.
Validate your scores against ground-truth data (or human feedback) to ensure accuracy.
```bash
plexus evaluate accuracy --scorecard-name "My Scorecard" --score-name "My Score"
```

Or use the plexus_evaluation_run MCP tool.
The "Flywheel" of Plexus:
1. AI makes a prediction
2. Human reviews and corrects (if wrong)
3. Plexus captures the feedback
4. AI analyzes the error patterns
5. Configuration is updated to fix the error
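Because the logic lives in configuration code, step 5 is an ordinary, reviewable edit. As a purely hypothetical sketch (using the same illustrative schema as above), a feedback-driven fix might be a one-line prompt refinement:

```yaml
# Hypothetical: feedback analysis showed sarcastic praise was being missed,
# so the classify prompt is refined and the Score re-versioned.
graph:
  - node: classify
    prompt: |
      Read the transcript and answer Yes or No:
      is the customer expressing anger?
      Treat sarcastic praise ("great, just great") as anger.
    output: value
```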
Plexus is designed to be operated by AI agents as much as by humans. The /MCP directory contains a fully-featured Model Context Protocol server.
- AGENTS.md: Read the full Agent Integration Guide
- Capabilities: Agents can read data, update configurations, run tests, and analyze results.
- Safety: The system includes specialized "Agents" and "Skills" in .claude/ to ensure safe operations (e.g., validating YAML before pushing).
- Python: We use pytest for testing.

  ```bash
  pytest
  ```

- TypeScript: We use Jest for frontend testing.

  ```bash
  cd dashboard && npm run test
  ```

- Infrastructure: Deployed via AWS CDK.

  ```bash
  cd infrastructure && cdk deploy
  ```
- Agent Integration Guide - How to use Plexus with AI agents.
- Dashboard Documentation - Frontend setup and features.
- MCP Server - Technical details of the MCP implementation.
Plexus is open-source under the MIT license.