Plexus

Overview

Plexus is an AI Agent Operating System designed for analyzing content streams, orchestrating complex workflows, and taking action at scale.

It transforms the chaotic process of managing hundreds of AI prompts and classifiers into a structured, engineering-grade discipline. By combining a robust backend, a real-time dashboard, and deep integration with AI agents, Plexus enables teams to build, deploy, and improve AI solutions without managing low-level infrastructure.

Core Features

  • AI Agent Integration (MCP): First-class support for AI agents (like Claude/Cursor) to interact with the system, create configurations, and run analyses via the Model Context Protocol.
  • Real-Time Dashboard: A modern Next.js application for monitoring activity, managing scorecards, and visualizing performance metrics.
  • Scorecard System: Organize disparate classification tasks into versioned, managed "Scorecards" with clear lineage and configuration.
  • Feedback Alignment: A closed-loop system for capturing human feedback, analyzing disagreements, and continuously improving AI performance.
  • Evaluation Framework: Comprehensive tools for running accuracy tests, regression testing, and performance benchmarking.
  • Procedures & Experiments: Orchestrate complex, multi-step AI workflows (using LangGraph) that go beyond simple classification.

Domain-Specific Cognitive Framework

Plexus encapsulates a robust cognitive framework that standardizes how AI systems process information. Instead of writing raw code or prompting LLMs in isolation, you define your domain using Plexus's specialized vocabulary and Domain-Specific Languages (DSLs).

The Building Blocks (Nouns & Verbs)

The system is built around a set of strong, opinionated primitives that form the "grammar" of AI operations:

  • Item: The fundamental unit of content (text, audio transcript, document) to be processed.
  • Scorecard: A collection of related scoring logic and configurations.
  • Score: A specific cognitive task (e.g., "Is this customer angry?", "Extract the purchase date").
  • Score Result: The outcome of applying a Score to an Item (includes value, confidence, explanation, and metadata).
  • Prediction: A tentative Score Result generated by an AI model, waiting for validation.
  • Evaluation: The process of measuring the accuracy of Predictions against a "Gold Set" or human judgment.
  • Feedback: Human correction or validation of a Prediction.
  • Feedback Item: A specific instance where human judgment diverged from (or confirmed) AI judgment, used for training alignment.
  • Dataset: A curated collection of Items used for training, testing, or benchmarking.
  • Data Source: The origin stream for Items (e.g., database query, API feed).
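
To make the vocabulary concrete, here is an illustrative sketch of a Score Result for a single Item, using the fields named above (value, confidence, explanation, metadata). The exact field names in Plexus may differ; this is an assumption for illustration only:

    # Hypothetical Score Result for one Item; field names are illustrative
    item_id: item-2024-0412-001
    scorecard: "Customer Service QA"
    score: "Is this customer angry?"
    result:
      value: "Yes"
      confidence: 0.87
      explanation: "The caller uses hostile language and demands escalation."
      metadata:
        model: anthropic            # provider that produced the underlying Prediction
        feedback: confirmed         # human review agreed with the AI judgment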

Domain-Specific Languages (DSLs)

Plexus uses configuration-as-code to define cognitive processes:

  1. Score Configuration (YAML): Defines how to think (see the first sketch after this list).

    • Specifies the model provider (OpenAI, Anthropic), parameters, and prompt structure.
    • Defines the graph of cognitive steps (using LangGraph-compatible nodes).
    • Provides detailed control over inputs, outputs, and processing logic.
  2. Data Source Configuration (YAML): Defines what to process (see the second sketch below).

    • Specifies the SQL or API queries used to fetch Items.
    • Defines filtering and windowing logic for creating Datasets.
  3. Lua Scripting (Embedded): Defines programmatic logic (illustrated in the second sketch).

    • Embedded directly within YAML configurations for lightweight, fast execution.
    • Preferred over Python for its simpler syntax and suitability for adaptation into a domain-specific dialect.
    • Allows complex conditional logic and data transformation without leaving the configuration file.
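
As a concrete illustration of the first DSL, a minimal Score Configuration might look like the sketch below. The schema is hypothetical: the field names, node types, and structure are assumptions for illustration, not the exact Plexus format.

    # Hypothetical Score Configuration; schema is illustrative only
    name: "Is this customer angry?"
    model:
      provider: anthropic           # or openai, bedrock
      parameters:
        temperature: 0.0
    graph:                          # LangGraph-compatible cognitive steps
      - node: classify
        prompt: |
          Read the transcript and answer Yes or No:
          is the customer expressing anger?
      - node: explain
        prompt: |
          Briefly justify the classification above.

Because a configuration like this is plain text under version control, an AI agent can propose a diff to a prompt or a graph node just as it would to source code.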

By using these DSLs, you elevate your work from "writing scripts" to "architecting cognitive systems," ensuring reproducibility, version control, and scalability.
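
Similarly, a hypothetical Data Source Configuration with an embedded Lua filter might look like this. The query and filter keys, and the use of Lua at this particular hook point, are assumptions for illustration:

    # Hypothetical Data Source Configuration; illustrative only
    name: support-calls-last-30-days
    query: |
      SELECT id, transcript, created_at
      FROM calls
      WHERE created_at > NOW() - INTERVAL '30 days'
    filter: |
      -- Lua: keep only transcripts long enough to classify reliably
      return string.len(item.transcript) > 200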

Everything as Code & AI-Native Architecture

Plexus embodies the "Everything as Code" architectural principle from top to bottom. This is not just for DevOps; it is the foundational strategy for integrating AI at every level.

  • Infrastructure as Code (IaC): The entire AWS cloud environment (Lambda, DynamoDB, SQS) is defined in code (CDK), making the substrate itself versionable and reproducible.
  • Cognition as Code: AI behaviors, prompts, and logic are defined in DSLs (YAML/Lua) rather than opaque model weights.

Why This Matters for AI

AI agents excel at reading and iteratively editing code. By structuring the entire system as code, Plexus enables:

  • Self-Evolving Agents: AI agents can improve their own performance by iteratively editing their own configuration code (DSLs).
  • Data Flywheels: The system supports online learning and human-in-the-loop patterns where human feedback directly informs the next iteration of the configuration code.
  • Self-Alignment: Over time, the system "aligns" itself to human intent by constantly refining its logic based on feedback, creating a system that gets smarter and more accurate automatically.

Architecture

Plexus is built on a modern, scalable stack:

  • Frontend: Next.js 14, AWS Amplify Gen2, Shadcn UI, Tailwind CSS
  • Backend: Python 3.11, GraphQL (AWS AppSync), Celery
  • Infrastructure: AWS CDK (Lambda, SQS, DynamoDB), Docker
  • AI Orchestration: LangChain, LangGraph
  • AI Providers: OpenAI, Anthropic, AWS Bedrock

Getting Started

Prerequisites

  • Python 3.11 (Required)
  • Node.js 18+
  • AWS Account (for deployment)

Installation

  1. Clone the repository:

    git clone https://github.com/AnthusAI/Plexus.git
    cd Plexus
  2. Install Python dependencies:

    pip install -e .
  3. Set up configuration: Copy the example config and update it with your credentials:

    cp plexus.yaml.example .plexus/config.yaml
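
The exact contents of config.yaml depend on your deployment, but it generally holds credentials for the AI providers and AWS services listed above. The key names below are illustrative assumptions, not the actual schema; consult plexus.yaml.example for the real keys:

    # .plexus/config.yaml; key names are assumptions, see plexus.yaml.example
    openai_api_key: sk-...
    anthropic_api_key: sk-ant-...
    aws:
      region: us-east-1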

Running the Dashboard

The dashboard is a standard Next.js application located in the dashboard/ directory.

cd dashboard
npm install
npm run dev

See dashboard/README.md for full details.

Key Workflows

1. Creating Scorecards

Scorecards are the top-level containers for your classification logic. You can create them via the Dashboard or using AI agents via MCP.

2. Configuring Scores

Scores are defined using YAML configuration files that specify the model, prompt, and logic. Tip: Use the plexus-score-config-updater agent to safely manage these configurations.

3. Running Evaluations

Validate your scores against ground-truth data (or human feedback) to ensure accuracy.

plexus evaluate accuracy --scorecard-name "My Scorecard" --score-name "My Score"

Or use the plexus_evaluation_run MCP tool.

4. Feedback Alignment

The "Flywheel" of Plexus:

  1. AI makes a prediction
  2. Human reviews and corrects (if wrong)
  3. Plexus captures the feedback
  4. AI analyzes the error patterns
  5. Configuration is updated to fix the error

AI Agent Integration

Plexus is designed to be operated by AI agents as much as by humans. The /MCP directory contains a fully-featured Model Context Protocol server.

  • AGENTS.md: Read the full Agent Integration Guide
  • Capabilities: Agents can read data, update configurations, run tests, and analyze results.
  • Safety: The system includes specialized "Agents" and "Skills" in .claude/ to ensure safe operations (e.g., validating YAML before pushing).

Development

  • Python: We use pytest for testing.
    pytest
  • TypeScript: We use Jest for frontend testing.
    cd dashboard && npm run test
  • Infrastructure: Deployed via AWS CDK.
    cd infrastructure && cdk deploy

Documentation Links

  • AGENTS.md: The full Agent Integration Guide.
  • dashboard/README.md: Setup and development details for the dashboard.

License

Plexus is open-source under the MIT license.
