A multi-agent Retrieval-Augmented Generation (RAG) system built with modern architecture patterns. The system processes complex queries through specialized AI agents, featuring real-time streaming responses, hybrid AI provider support, persistent memory management, and advanced tool orchestration.
Built with enterprise-grade reliability using clean architecture principles, the system separates business logic from presentation layers, enabling excellent testability and maintainability.
- 🤖 Multi-Agent Orchestration: Specialized agents handle query decomposition, information gathering, extraction, and response generation
- ☁️ Hybrid AI Provider Support: Seamlessly switch between local (Ollama), cloud (Groq/OpenAI), and Anthropic Claude providers
- 🧠 Persistent Memory System: Cross-session memory using FAISS vector storage and OpenAI embeddings
- ⚙️ Enterprise Configuration: Centralized TOML-based configuration with runtime reloading and environment management
- ⚡ Real-time Streaming: Asynchronous streaming responses with "thinking" process visualization and non-blocking execution
- 🔍 Advanced Search Integration: Semantic search, full-text search, and metadata filtering through Elasticsearch
- 🛠️ Intelligent Tool System: Automatic tool calling with Google-style docstring parsing and retry mechanisms
- 💬 Modern Chat Interface: Streamlit-powered interface with structured reasoning visualization and responsive design
- 🏗️ Clean Architecture: Business logic separated from UI for excellent testability and maintainability
py-jrag implements a clean, modular architecture with the RAGOrchestrator as the core business logic component. The system processes complex queries through multiple specialized agents with support for both streaming and non-streaming responses.
- RAGOrchestrator (`app/orchestrator.py`) - Core business logic orchestrating the entire RAG workflow
- Specialized Agents (`agents/`) - Domain-specific AI agents for different processing tasks
- Memory System (`memory/`) - Persistent conversation memory with FAISS vector storage
- AI Clients (`configuration/`) - Multi-provider AI integration (OpenAI, Groq, Ollama, Anthropic Claude)
- Configuration System (`configuration/configuration.py`) - Enterprise-grade TOML-based configuration management
```mermaid
flowchart TD
    A[👤 User Query] --> B[🎭 RAGOrchestrator]
    B --> C[🔍 Decoupler Agent]
    C --> D[❓ Sub-Questions]
    D --> E[🛠️ Crafter Agent]
    E --> F1[🔍 Semantic Search]
    E --> F2[📄 Full-Text Search]
    E --> F3[🏷️ Metadata Search]
    F1 --> G[📊 Elasticsearch]
    F2 --> G
    F3 --> G
    G --> H[📝 Extractor Agent]
    H --> I[🗣️ Speaker Agent]
    I --> J[💬 Streaming Response]
    K[🌿 Linden Framework] -.-> C
    K -.-> E
    K -.-> H
    K -.-> I
    L[🤖 AI Providers] -.-> K
    M[💾 FAISS Memory] -.-> K
    style A fill:#e1f5fe
    style J fill:#e8f5e8
    style K fill:#fff3e0
    style L fill:#fce4ec
    style M fill:#f3e5f5
```
Flow: User Query → Decoupler → Crafter → Extractor → Speaker → Response
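The flow above can be sketched as a plain function. The four "agents" below are hypothetical stand-ins for the real LLM-backed, asynchronous ones; only the pipeline shape is taken from the source.

```python
# Minimal sketch of the Decoupler → Crafter → Extractor → Speaker flow.
# All four stages are illustrative stand-ins for the LLM-backed agents.

def decouple(query: str) -> list[str]:
    # Decoupler: split a compound query into independent sub-questions.
    return [q.strip() + "?" for q in query.rstrip("?").split(" and ")]

def craft(sub_question: str) -> list[str]:
    # Crafter: gather documents (the real agent calls the Elasticsearch tools).
    return [f"doc for: {sub_question}"]

def extract(docs: list[str]) -> str:
    # Extractor: condense the retrieved context.
    return " | ".join(docs)

def speak(query: str, context: str) -> str:
    # Speaker: generate the final answer from the condensed context.
    return f"Answer to '{query}' based on [{context}]"

def run_pipeline(query: str) -> str:
    contexts = [extract(craft(sq)) for sq in decouple(query)]
    return speak(query, " ; ".join(contexts))
```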
The RAGOrchestrator is the heart of the system, providing clean separation between business logic and UI.
Key Methods:
- `process_query()`: Complete RAG processing pipeline that returns a final response
- `process_query_streaming()`: RAG processing with real-time token streaming for interactive UIs
- `reset_crafter()`: Clears crafter agent state between queries to prevent context contamination
Benefits of the Architecture:
- ✅ Testable: Business logic can be unit tested independently from UI components
- ✅ Maintainable: Clear separation of concerns with modular agent system
- ✅ Reusable: Orchestrator works with any interface (Streamlit, API, CLI)
- ✅ Type-Safe: Full type annotations and Pydantic models for reliable data handling
Built on Linden Framework: py-jrag leverages the Linden framework's AgentRunner infrastructure, extending it with specialized agents and domain-specific tools.
py-jrag uses Linden's core capabilities:
- AgentRunner: Base class providing agent lifecycle management, streaming, and tool integration
- Provider Interface: Unified interface for multiple AI providers (Claude, OpenAI, Groq, Ollama)
- Configuration System: TOML-based configuration with type safety and validation
- Memory Management: Integrated FAISS-based persistent memory across sessions
Each agent extends Linden's AgentRunner with domain-specific functionality:
Decoupler Agent
Purpose: Intelligent query decomposition with conversational context awareness
Key Capabilities:
- Context Resolution: Resolves pronouns and vague references using conversation history
- Smart Decomposition: Breaks complex questions into independent, actionable sub-questions
- Session Awareness: Handles session-specific queries and comparisons intelligently
- Minimal Splitting: Uses the minimum number of sub-questions required
- Data Compliance: Includes relevant metadata keywords (session_id, product_id, user_id, start_time)
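A hypothetical sketch of what the Decoupler's output might look like; the `SubQuestion` class and field names are illustrative, not the project's actual data model, but the metadata keywords come from the list above.

```python
from dataclasses import dataclass, field

# Hypothetical shape of the Decoupler's output: each sub-question carries the
# metadata keywords (session_id, product_id, user_id, start_time) relevant
# for downstream filtering. The class and field names are illustrative.
@dataclass
class SubQuestion:
    text: str
    metadata_keywords: list[str] = field(default_factory=list)

# "What did user U-42 buy, and when did that session start?" might decompose to:
sub_questions = [
    SubQuestion("Which products did user U-42 purchase?",
                ["user_id", "product_id"]),
    SubQuestion("What is the start_time of the purchase session?",
                ["session_id", "start_time"]),
]
```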
Crafter Agent
Purpose: Intelligent tool orchestration for comprehensive information retrieval
Search Tools:
- Semantic Search
  - Semantic Understanding: Uses advanced embeddings for concept-based search
  - Cosine Similarity: Advanced similarity matching for contextual relevance
  - Natural Language: Supports complex natural language queries
  - Configurable Results: Flexible result count with performance optimization
- Full Text Search
  - Exact Matching: Precise keyword and phrase matching
  - Elasticsearch Integration: Leverages full-text search capabilities
  - High Precision: Ideal for specific term searches
  - Performance Optimized: Fast retrieval with indexed search
- Metadata Search
  - Structured Filtering: Multi-field metadata filtering with boolean logic
  - Schema Validation: Strict input validation for supported metadata fields
  - Flexible Queries: Supports partial and complete metadata combinations
  - Session Tracking: Specialized for session-based data analysis
Extractor Agent
Purpose: Context processing and information synthesis for long-form content
Key Features:
- Content Summarization: Processes large context blocks when information exceeds manageable size
- Intelligent Filtering: Automatically triggered when context contains more than 2 document segments
- Structured Output: Returns processed information with summary field
- Context Awareness: Maintains question relevance while condensing information
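The trigger rule above (summarize only when context exceeds two document segments) can be sketched as follows; the summarizer is a hypothetical stand-in for the LLM call, and the return shape only mirrors the "summary field" described above.

```python
# Sketch of the Extractor's trigger rule: summarization kicks in only when
# the retrieved context exceeds two document segments. summarize() is a
# placeholder for the real LLM-backed condensation step.

def summarize(segments: list[str]) -> str:
    # Placeholder: the real Extractor delegates this to an LLM.
    return f"{len(segments)} segments condensed"

def process_context(segments: list[str], max_segments: int = 2) -> dict:
    if len(segments) <= max_segments:
        # Small contexts pass through untouched.
        return {"summary": None, "segments": segments}
    # Larger contexts are condensed into a single summary field.
    return {"summary": summarize(segments), "segments": segments}
```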
Speaker Agent
Purpose: Final response generation with streaming support and advanced reasoning
Advanced Capabilities:
- Real-time Streaming: Token-level streaming for immediate user feedback with proper error handling
- Thinking Process: Special `<think>` tags for reasoning visualization in collapsible UI steps
- High-Quality Models: Uses configurable models (Claude Sonnet 4 by default) for superior response quality
- Context Integration: Synthesizes information from all previous agents with conversation awareness
- Memory-Aware Responses: Maintains chat history for contextual and personalized responses
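A minimal sketch of how a UI might separate the Speaker's `<think>` reasoning from the visible answer for collapsible display; the function name is illustrative, not the project's API.

```python
import re

# Illustrative helper: split a <think>-tagged response into the reasoning
# (shown in a collapsible step) and the visible answer.

def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a <think>-tagged response."""
    reasoning = "\n".join(re.findall(r"<think>(.*?)</think>", text, re.DOTALL))
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return reasoning, answer

thoughts, answer = split_thinking("<think>check the index</think>42 results.")
```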
Powered by Linden's Multi-Provider System: py-jrag leverages Linden's unified provider interface:
Anthropic Claude
- Latest Models: Claude Sonnet 4 support with advanced reasoning capabilities
- Streaming Support: Real-time token streaming with proper error handling
- Tool Integration: Native function calling with structured JSON responses
- Production Ready: Enterprise-grade error handling and rate limiting
Ollama (Local)
- Privacy-First Local Execution: Complete data privacy with local model hosting
- Real-time Streaming: Token-level streaming via chunked HTTP responses
- Resource Optimization: Efficient local inference with customizable parameters
Groq
- Ultra-Fast Cloud Inference: Specialized infrastructure for rapid responses
- Production-Grade Streaming: Real-time streaming with chunked delivery and error recovery
- Robust Error Handling: Comprehensive error recovery and automatic retry mechanisms
OpenAI
- GPT Integration: Full support for GPT-4 and GPT-3.5-turbo models
- Advanced Embedding Support: Using text-embedding-3-small for the memory system
- Enterprise Configuration: Comprehensive API key management and billing control
The system uses a centralized TOML-based configuration with the ConfigManager class:
```toml
[models]
dec = "claude-sonnet-4-20250514"       # Decoupler agent model
tool = "claude-sonnet-4-20250514"      # Crafter agent model
extractor = "claude-sonnet-4-20250514" # Extractor agent model
speaker = "claude-sonnet-4-20250514"   # Speaker agent model

[groq]
base_url = "https://api.groq.com/"
api_key = ""  # Required for Groq provider
timeout = 120

[ollama]
timeout = 120  # Local model timeout

[openai]
api_key = ""  # Required for embeddings and OpenAI models
timeout = 120

[anthropic]
api_key = ""  # Required for Claude models
timeout = 120
max_tokens = 2048

[elasticsearch]
scheme = "https"
host = "localhost"
port = 9200
auth_name = "elastic"
auth_pwd = "changeme"
index_name = "webflow"

[memory]
path = "./memory/faiss/faiss_memories"  # FAISS vector storage path
collection_name = "py-jrag"
```

- Singleton Pattern: Single configuration instance across the application
- Runtime Reloading: `ConfigManager.reload()` for configuration updates
- Environment Variables: Support for overriding configuration via environment variables
- Type Safety: Strongly typed configuration with validation
- Default Values: Sensible defaults for non-critical settings
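The environment-variable override could work along these lines; the `PYJRAG_` prefix and the `SECTION_KEY` naming scheme are assumptions, not the actual `ConfigManager` convention.

```python
import os

# Sketch of environment-variable overrides for a TOML-derived config dict.
# The PYJRAG_ prefix and SECTION_KEY naming are assumed, not the project's
# actual convention.

def apply_env_overrides(conf: dict, prefix: str = "PYJRAG") -> dict:
    overridden = {section: dict(values) for section, values in conf.items()}
    for section, values in overridden.items():
        for key in values:
            env_name = f"{prefix}_{section}_{key}".upper()
            if env_name in os.environ:
                values[key] = os.environ[env_name]
    return overridden

os.environ["PYJRAG_OPENAI_API_KEY"] = "sk-demo"
conf = apply_env_overrides({"openai": {"api_key": "", "timeout": 120}})
```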
Built on Linden's Memory Infrastructure: py-jrag uses Linden's integrated memory management:
Key Components:
- Vector Storage: FAISS-based vector database for semantic memory retrieval
- Embeddings: OpenAI text-embedding-3-small for high-quality semantic understanding
- Agent Isolation: Per-agent memory spaces prevent cross-contamination
- Conversation Retrieval: Retrieves relevant past interactions
- Automatic Recording: Stores interactions with context inference
- Memory Reset: Clean slate functionality per agent
Memory Configuration:
```python
config = {
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small",
            "api_key": conf.openai.api_key
        }
    },
    "vector_store": {
        "provider": "faiss",
        "config": {
            "collection_name": "py-jrag",
            "path": conf.memory.path
        }
    }
}
```

The system underwent comprehensive evaluation using a golden dataset of 34 queries across 4 distinct categories:
| Metric Category | Average Score | Performance Level |
|---|---|---|
| Context Precision | 0.8824 | Excellent (88.24% of retrieved docs relevant) |
| Context Recall | 0.6690 | Good (66.90% of relevant info retrieved) |
| Human Faithfulness | 4.9706 | Outstanding (Nearly perfect fidelity) |
| LLM Faithfulness | 4.7353 | Excellent (High automated agreement) |
| Human Relevancy | 4.4118 | Very Good (Highly relevant responses) |
| LLM Relevancy | 4.6765 | Excellent (Strong automated relevance) |
| Human Completeness | 4.2059 | Good (Comprehensive information coverage) |
| LLM Completeness | 4.0000 | Good (Adequate automated completeness) |
| Human Clarity | 4.5882 | Excellent (Very clear communication) |
| LLM Clarity | 4.9118 | Outstanding (Exceptional automated clarity) |
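The two retrieval metrics in the table follow the standard definitions, sketched below: context precision is the fraction of retrieved documents that are relevant, and context recall is the fraction of relevant documents that were retrieved.

```python
# Standard definitions behind the retrieval metrics above. Documents are
# represented by identifiers for illustration.

def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    return sum(doc in relevant for doc in retrieved) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant documents that were retrieved."""
    return sum(doc in relevant for doc in set(retrieved)) / len(relevant)
```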
| Category | Precision | Recall | Human Avg | LLM Avg | Performance Notes |
|---|---|---|---|---|---|
| Product Queries | 1.000 | 0.958 | 4.42 | 4.54 | 🏆 Best Overall: Perfect precision, excellent recall |
| User Queries | 1.000 | 0.667 | 4.58 | 4.78 | 🎯 High Precision: Perfect document relevance |
| Generic Queries | 0.833 | 0.542 | 4.74 | 4.59 | 📚 Balanced: Good general performance |
| Session Queries | 0.714 | 0.643 | 4.35 | 4.25 | ⚡ Challenging: Complex temporal queries |
- Python 3.10+
- Elasticsearch cluster (local or cloud)
- API keys for chosen providers:
- Anthropic API key (recommended for production)
- OpenAI API key (required for embeddings)
- Groq API key (optional, for high-speed inference)
- Ollama (optional, for local inference)
1. Clone Repository

   ```shell
   git clone <repository-url>
   cd py-jrag
   ```

2. Install Dependencies

   ```shell
   pip install -r requirements.txt
   ```

3. Configure Application
   - Copy `config.toml` and update it with your API keys and settings
   - Configure Elasticsearch connection details
   - Set the memory storage path

4. Setup Elasticsearch
   - Start the Elasticsearch cluster
   - Create the index with appropriate mappings
   - Configure authentication credentials

5. Setup Local Models (Optional)
   - Install Ollama if using local inference
   - Pull desired models (e.g., `ollama pull llama2`)

6. Run Application

   ```shell
   streamlit run streamlit_app.py
   ```
1. Start Streamlit Server

   ```shell
   streamlit run streamlit_app.py
   ```

2. Access Web Interface
   - Open a browser to `http://localhost:8501`
   - Start chatting with the py-jrag system

3. Monitor Logs
   - Check the console for detailed agent execution logs
   - Debug mode provides comprehensive tracking and performance metrics
The RAGOrchestrator can be imported and used directly in Python code:
```python
from app.orchestrator import RAGOrchestrator
from configuration.configuration import ConfigManager

# Initialize configuration
ConfigManager.initialize(config_path="config.toml")

# Create orchestrator
orchestrator = RAGOrchestrator()

# Process query
response = await orchestrator.process_query("Your question here")
print(response.content)

# Or use streaming
async for chunk in orchestrator.process_query_streaming("Your question here"):
    if chunk.content:
        print(chunk.content, end="")
```

```shell
# Run unit tests
python -m pytest test/app/test_orchestrator.py -v

# Run integration tests (requires proper setup)
python -m pytest test/ -m integration

# Run evaluation suite
python evaluation/1_rag_evaluation_runner.py  # Generate responses
python evaluation/2_llm_as_judge.py           # LLM evaluation
python evaluation/3_metrics.py                # Calculate metrics

# Validate architecture setup
python validate_architecture.py
```

```
py-jrag/
├── app/                    # Core business logic
│   ├── orchestrator.py     # Main RAG orchestrator
│   └── models.py           # Data structures
├── agents/                 # Specialized AI agents
│   ├── decoupler.py        # Query decomposition
│   ├── crafter.py          # Information gathering
│   ├── extractor.py        # Context processing
│   └── speaker.py          # Response generation
├── configuration/          # Configuration management
│   └── configuration.py    # Config system
├── elastic/                # Elasticsearch integration
│   └── elastic.py          # Search client
├── memory/                 # Persistent memory storage
├── evaluation/             # Testing and evaluation
├── test/                   # Unit and integration tests
├── assets/                 # Static assets (logo, etc.)
├── streamlit_app.py        # Web interface
├── config.toml             # Configuration file
└── requirements.txt        # Dependencies
```

