py-jrag

py-jrag Logo

A multi-agent Retrieval-Augmented Generation (RAG) system built with modern architecture patterns. The system processes complex queries through specialized AI agents, featuring real-time streaming responses, hybrid AI provider support, persistent memory management, and advanced tool orchestration.

Built with enterprise-grade reliability using clean architecture principles, the system separates business logic from presentation layers, enabling excellent testability and maintainability.

✨ Key Features

  • 🤖 Multi-Agent Orchestration: Specialized agents handle query decomposition, information gathering, extraction, and response generation
  • ☁️ Hybrid AI Provider Support: Seamlessly switch between local (Ollama), cloud (Groq/OpenAI), and Anthropic Claude providers
  • 🧠 Persistent Memory System: Cross-session memory using FAISS vector storage and OpenAI embeddings
  • ⚙️ Enterprise Configuration: Centralized TOML-based configuration with runtime reloading and environment management
  • ⚡ Real-time Streaming: Asynchronous streaming responses with "thinking" process visualization and non-blocking execution
  • 🔍 Advanced Search Integration: Semantic search, full-text search, and metadata filtering through Elasticsearch
  • 🛠️ Intelligent Tool System: Automatic tool calling with Google-style docstring parsing and retry mechanisms
  • 💬 Modern Chat Interface: Streamlit-powered interface with structured reasoning visualization and responsive design
  • 🏗️ Clean Architecture: Business logic separated from UI for excellent testability and maintainability

🏛️ Architecture Overview

py-jrag implements a clean, modular architecture with the RAGOrchestrator as the core business logic component. The system processes complex queries through multiple specialized agents with support for both streaming and non-streaming responses.

Core Components

  1. RAGOrchestrator (app/orchestrator.py) - Core business logic orchestrating the entire RAG workflow
  2. Specialized Agents (agents/) - Domain-specific AI agents for different processing tasks
  3. Memory System (memory/) - Persistent conversation memory with FAISS vector storage
  4. AI Clients (configuration/) - Multi-provider AI integration (OpenAI, Groq, Ollama, Anthropic Claude)
  5. Configuration System (configuration/configuration.py) - Enterprise-grade TOML-based configuration management

Processing Pipeline

flowchart TD
    A[👤 User Query] --> B[🎭 RAGOrchestrator]
    B --> C[🔍 Decoupler Agent]
    C --> D[❓ Sub-Questions]
    D --> E[🛠️ Crafter Agent]
    
    E --> F1[🔍 Semantic Search]
    E --> F2[📄 Full-Text Search] 
    E --> F3[🏷️ Metadata Search]
    
    F1 --> G[📊 Elasticsearch]
    F2 --> G
    F3 --> G
    
    G --> H[📝 Extractor Agent]
    H --> I[🗣️ Speaker Agent]
    I --> J[💬 Streaming Response]
    
    K[🌿 Linden Framework] -.-> C
    K -.-> E
    K -.-> H
    K -.-> I
    
    L[🤖 AI Providers] -.-> K
    M[💾 FAISS Memory] -.-> K
    
    style A fill:#e1f5fe
    style J fill:#e8f5e8
    style K fill:#fff3e0
    style L fill:#fce4ec
    style M fill:#f3e5f5

Flow: User Query → Decoupler → Crafter → Extractor → Speaker → Response

RAGOrchestrator - Core Business Logic

The RAGOrchestrator is the heart of the system, providing clean separation between business logic and UI.

Key Methods:

  • process_query(): Complete RAG processing pipeline that returns a final response
  • process_query_streaming(): RAG processing with real-time token streaming for interactive UIs
  • reset_crafter(): Resets the crafter agent's state between queries to prevent context contamination

Benefits of the Architecture:

  • Testable: Business logic can be unit tested independently from UI components
  • Maintainable: Clear separation of concerns with modular agent system
  • Reusable: Orchestrator works with any interface (Streamlit, API, CLI)
  • Type-Safe: Full type annotations and Pydantic models for reliable data handling
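The surface described above can be sketched as follows. The method names come from this README; the signatures, the `Response` type, and the stubbed bodies are assumptions for illustration only — the real pipeline delegates to the Decoupler, Crafter, Extractor, and Speaker agents.

```python
# Hypothetical sketch of the RAGOrchestrator surface. Method names match
# the README; everything else is illustrative, not the actual implementation.
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class Response:
    content: str


class RAGOrchestrator:
    async def process_query(self, query: str) -> Response:
        # Real pipeline: Decoupler -> Crafter -> Extractor -> Speaker.
        # Stubbed here to keep the sketch self-contained.
        return Response(content=f"answer to: {query}")

    async def process_query_streaming(self, query: str) -> AsyncIterator[Response]:
        # Token-level streaming: yields partial chunks as they are produced.
        for token in ["answer ", "to: ", "ping"]:
            yield Response(content=token)

    def reset_crafter(self) -> None:
        # Clears crafter state between queries to avoid context bleed.
        pass


async def demo() -> tuple[str, str]:
    orch = RAGOrchestrator()
    full = await orch.process_query("ping")
    parts = [c.content async for c in orch.process_query_streaming("ping")]
    orch.reset_crafter()
    return full.content, "".join(parts)
```

Because the orchestrator exposes plain async methods with typed results, any front end (Streamlit, an HTTP API, a CLI) can drive it the same way.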

🤖 Agent System Architecture

Built on Linden Framework: py-jrag leverages the Linden framework's AgentRunner infrastructure, extending it with specialized agents and domain-specific tools.

🌿 Linden Framework Integration

py-jrag uses Linden's core capabilities:

  • AgentRunner: Base class providing agent lifecycle management, streaming, and tool integration
  • Provider Interface: Unified interface for multiple AI providers (Claude, OpenAI, Groq, Ollama)
  • Configuration System: TOML-based configuration with type safety and validation
  • Memory Management: Integrated FAISS-based persistent memory across sessions

Specialized Agent Implementation

Each agent extends Linden's AgentRunner with domain-specific functionality:

1. Decoupler Agent (agents/decoupler.py)

Purpose: Intelligent query decomposition with conversational context awareness

Key Capabilities:

  • Context Resolution: Resolves pronouns and vague references using conversation history
  • Smart Decomposition: Breaks complex questions into independent, actionable sub-questions
  • Session Awareness: Handles session-specific queries and comparisons intelligently
  • Minimal Splitting: Uses the minimum number of sub-questions required
  • Data Compliance: Includes relevant metadata keywords (session_id, product_id, user_id, start_time)
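The decomposition contract can be pictured with a toy sketch. The field names and the hard-coded split below are illustrative assumptions; the real agent in agents/decoupler.py is LLM-driven and context-aware.

```python
# Toy illustration of the Decoupler's output shape: minimal, independent
# sub-questions, each tagged with the metadata keywords the README lists.
from dataclasses import dataclass, field


@dataclass
class SubQuestion:
    text: str
    # Metadata keywords for data compliance (session_id, product_id, ...).
    metadata_keys: list[str] = field(default_factory=list)


def decompose(query: str) -> list[SubQuestion]:
    # The real agent uses an LLM with conversation history; this
    # hard-coded split only shows the intended output shape.
    if "and" in query:
        return [SubQuestion(part.strip(), ["session_id"])
                for part in query.split("and")]
    return [SubQuestion(query, [])]


subs = decompose("What did user 42 buy and when did the session start?")
```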

2. Crafter Agent (agents/crafter.py)

Purpose: Intelligent tool orchestration for comprehensive information retrieval

Search Tools:

  1. Semantic Search

    • Semantic Understanding: Uses advanced embeddings for concept-based search
    • Cosine Similarity: Advanced similarity matching for contextual relevance
    • Natural Language: Supports complex natural language queries
    • Configurable Results: Flexible result count with performance optimization
  2. Full-Text Search

    • Exact Matching: Precise keyword and phrase matching
    • Elasticsearch Integration: Leverages full-text search capabilities
    • High Precision: Ideal for specific term searches
    • Performance Optimized: Fast retrieval with indexed search
  3. Metadata Search

    • Structured Filtering: Multi-field metadata filtering with boolean logic
    • Schema Validation: Strict input validation for supported metadata fields
    • Flexible Queries: Supports partial and complete metadata combinations
    • Session Tracking: Specialized for session-based data analysis
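Since tool calling is driven by Google-style docstring parsing, a Crafter tool roughly looks like the sketch below. The tool body and the minimal parser are illustrative assumptions, not Linden's actual implementation.

```python
# Sketch of a Crafter-style search tool documented with a Google-style
# docstring, plus a minimal parser that turns the Args: section into a
# name -> description map (a stand-in for Linden's docstring parsing).
import re


def metadata_search(session_id: str = "", product_id: str = "") -> list:
    """Search documents by structured metadata filters.

    Args:
        session_id: Filter results to a single session.
        product_id: Filter results to a single product.

    Returns:
        Matching document identifiers.
    """
    return []  # The real tool queries Elasticsearch.


def parse_google_args(func) -> dict:
    # Extract "name: description" pairs from the Args: section.
    doc = func.__doc__ or ""
    args_block = doc.split("Args:")[1].split("Returns:")[0]
    return {
        m.group(1): m.group(2).strip()
        for m in re.finditer(r"^\s*(\w+): (.+)$", args_block, re.MULTILINE)
    }


schema = parse_google_args(metadata_search)
```

Keeping the argument descriptions in the docstring means the tool schema shown to the model stays in sync with the code.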

3. Extractor Agent (agents/extractor.py)

Purpose: Context processing and information synthesis for long-form content

Key Features:

  • Content Summarization: Processes large context blocks when information exceeds manageable size
  • Intelligent Filtering: Automatically triggered when context contains more than 2 document segments
  • Structured Output: Returns processed information with summary field
  • Context Awareness: Maintains question relevance while condensing information
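The trigger rule above — condense only when the context holds more than 2 document segments — can be sketched as follows. The summarizer is stubbed; the real agent performs LLM summarization keyed to the question.

```python
# Minimal sketch of the Extractor's trigger logic. The condensation step
# is a placeholder; the real agent calls an LLM to summarize.

def extract(question: str, segments: list[str]) -> dict:
    if len(segments) > 2:
        # Placeholder condensation; real version summarizes with an LLM
        # while staying relevant to the question.
        summary = " ".join(s[:20] for s in segments)
        return {"summary": summary, "condensed": True}
    # Small contexts pass through unchanged.
    return {"summary": " ".join(segments), "condensed": False}


short = extract("q", ["a", "b"])
long_ctx = extract("q", ["a", "b", "c"])
```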

4. Speaker Agent (agents/speaker.py)

Purpose: Final response generation with streaming support and advanced reasoning

Advanced Capabilities:

  • Real-time Streaming: Token-level streaming for immediate user feedback with proper error handling
  • Thinking Process: Special <think> tags for reasoning visualization in collapsible UI steps
  • High-Quality Models: Uses configurable models (Claude Sonnet 4 by default) for superior response quality
  • Context Integration: Synthesizes information from all previous agents with conversation awareness
  • Memory-Aware Responses: Maintains chat history for contextual and personalized responses
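A UI consuming the Speaker stream has to separate the reasoning inside `<think>` tags from the final answer. The splitting below is an assumption about how such a consumer might work, not the project's actual parsing code.

```python
# Sketch: split a Speaker output into "thinking" text (shown in a
# collapsible UI step) and the visible answer, using the <think> tags
# mentioned above.
import re


def split_thinking(raw: str) -> tuple[str, str]:
    thoughts = "".join(re.findall(r"<think>(.*?)</think>", raw, re.DOTALL))
    answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
    return thoughts, answer


thoughts, answer = split_thinking("<think>check recall</think>The recall is 0.669.")
```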

🤖 AI Provider Support

Powered by Linden's Multi-Provider System: py-jrag leverages Linden's unified provider interface:

1. Anthropic Claude

  • Latest Models: Claude Sonnet 4 support with advanced reasoning capabilities
  • Streaming Support: Real-time token streaming with proper error handling
  • Tool Integration: Native function calling with structured JSON responses
  • Production Ready: Enterprise-grade error handling and rate limiting

2. Ollama (Local Inference)

  • Privacy-First Local Execution: Complete data privacy with local model hosting
  • Real-time Streaming: Token-level streaming via chunked HTTP responses
  • Resource Optimization: Efficient local inference with customizable parameters

3. Groq (High-Performance Cloud)

  • Ultra-Fast Cloud Inference: Specialized infrastructure for rapid responses
  • Production-Grade Streaming: Real-time streaming with chunked delivery and error recovery
  • Robust Error Handling: Comprehensive error recovery and automatic retry mechanisms

4. OpenAI (Enterprise Ready)

  • GPT Integration: Full support for GPT-4 and GPT-3.5-turbo models
  • Advanced Embedding Support: Using text-embedding-3-small for memory system
  • Enterprise Configuration: Comprehensive API key management and billing control

⚙️ Configuration System

The system uses a centralized TOML-based configuration with the ConfigManager class:

Configuration Structure

[models]
dec = "claude-sonnet-4-20250514"          # Decoupler agent model
tool = "claude-sonnet-4-20250514"         # Crafter agent model
extractor = "claude-sonnet-4-20250514"    # Extractor agent model
speaker = "claude-sonnet-4-20250514"      # Speaker agent model

[groq]
base_url = "https://api.groq.com/"
api_key = ""                              # Required for Groq provider
timeout = 120

[ollama]
timeout = 120                             # Local model timeout

[openai]
api_key = ""                              # Required for embeddings and OpenAI models
timeout = 120

[anthropic]
api_key = ""                              # Required for Claude models
timeout = 120
max_tokens = 2048

[elasticsearch]
scheme = "https"
host = "localhost"
port = 9200
auth_name = "elastic"
auth_pwd = "changeme"
index_name = "webflow"

[memory]
path = "./memory/faiss/faiss_memories"    # FAISS vector storage path
collection_name = "py-jrag"

Configuration Management Features

  • Singleton Pattern: Single configuration instance across the application
  • Runtime Reloading: ConfigManager.reload() for configuration updates
  • Environment Variables: Support for overriding configuration via environment variables
  • Type Safety: Strongly typed configuration with validation
  • Default Values: Sensible defaults for non-critical settings

🧠 Memory System Architecture

Built on Linden's Memory Infrastructure: py-jrag uses Linden's integrated memory management:

AgentMemory Features

Key Components:

  • Vector Storage: FAISS-based vector database for semantic memory retrieval
  • Embeddings: OpenAI text-embedding-3-small for high-quality semantic understanding
  • Agent Isolation: Per-agent memory spaces prevent cross-contamination
  • Conversation Retrieval: Retrieves relevant past interactions
  • Automatic Recording: Stores interactions with context inference
  • Memory Reset: Clean slate functionality per agent

Memory Configuration:

config = {
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small",
            "api_key": conf.openai.api_key
        }
    },
    "vector_store": {
        "provider": "faiss",
        "config": {
            "collection_name": "py-jrag",
            "path": conf.memory.path
        }
    }
}
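The retrieval step this configuration wires up — embed a query, rank stored memories by similarity — can be illustrated offline. The real system uses FAISS with OpenAI text-embedding-3-small and cosine similarity; a toy word-overlap score stands in here so the sketch runs without any services.

```python
# Toy stand-in for semantic memory retrieval: Jaccard word overlap
# replaces embedding cosine similarity so no API key is needed.
import re


def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def similarity(a: str, b: str) -> float:
    # Jaccard overlap as a stand-in for embedding cosine similarity.
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


memories = ["user asked about session 17", "product catalog refreshed"]
query = "what happened in session 17?"
ranked = sorted(memories, key=lambda m: similarity(query, m), reverse=True)
```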

📊 Performance Metrics

Evaluation Results

The system underwent comprehensive evaluation using a golden dataset of 34 queries across 4 distinct categories:

| Metric Category | Average Score | Performance Level |
| --- | --- | --- |
| Context Precision | 0.8824 | Excellent (88.24% of retrieved docs relevant) |
| Context Recall | 0.6690 | Good (66.90% of relevant info retrieved) |
| Human Faithfulness | 4.9706 | Outstanding (Nearly perfect fidelity) |
| LLM Faithfulness | 4.7353 | Excellent (High automated agreement) |
| Human Relevancy | 4.4118 | Very Good (Highly relevant responses) |
| LLM Relevancy | 4.6765 | Excellent (Strong automated relevance) |
| Human Completeness | 4.2059 | Good (Comprehensive information coverage) |
| LLM Completeness | 4.0000 | Good (Adequate automated completeness) |
| Human Clarity | 4.5882 | Excellent (Very clear communication) |
| LLM Clarity | 4.9118 | Outstanding (Exceptional automated clarity) |

Category-Specific Performance

| Category | Precision | Recall | Human Avg | LLM Avg | Performance Notes |
| --- | --- | --- | --- | --- | --- |
| Product Queries | 1.000 | 0.958 | 4.42 | 4.54 | 🏆 Best Overall: Perfect precision, excellent recall |
| User Queries | 1.000 | 0.667 | 4.58 | 4.78 | 🎯 High Precision: Perfect document relevance |
| Generic Queries | 0.833 | 0.542 | 4.74 | 4.59 | 📚 Balanced: Good general performance |
| Session Queries | 0.714 | 0.643 | 4.35 | 4.25 | ⚡ Challenging: Complex temporal queries |

🚀 Installation & Setup

Prerequisites

  • Python 3.10+
  • Elasticsearch cluster (local or cloud)
  • API keys for chosen providers:
    • Anthropic API key (recommended for production)
    • OpenAI API key (required for embeddings)
    • Groq API key (optional, for high-speed inference)
    • Ollama (optional, for local inference)

Installation Steps

  1. Clone Repository

    git clone <repository-url>
    cd py-jrag
  2. Install Dependencies

    pip install -r requirements.txt
  3. Configure Application

    • Copy config.toml and update with your API keys and settings
    • Configure Elasticsearch connection details
    • Set memory storage path
  4. Setup Elasticsearch

    • Start Elasticsearch cluster
    • Create index with appropriate mappings
    • Configure authentication credentials
  5. Setup Local Models (Optional)

    • Install Ollama if using local inference
    • Pull desired models (e.g., ollama pull llama2)
  6. Run Application

    streamlit run streamlit_app.py

💻 Usage

Running the Application

  1. Start Streamlit Server

    streamlit run streamlit_app.py
  2. Access Web Interface

    • Open browser to http://localhost:8501
    • Start chatting with the py-jrag system
  3. Monitor Logs

    • Check console for detailed agent execution logs
    • Debug mode provides comprehensive tracking and performance metrics

Programmatic Usage

The RAGOrchestrator can be imported and used directly in Python code:

import asyncio

from app.orchestrator import RAGOrchestrator
from configuration.configuration import ConfigManager

# Initialize configuration
ConfigManager.initialize(config_path="config.toml")

# Create orchestrator
orchestrator = RAGOrchestrator()

async def main():
    # Process query and print the final response
    response = await orchestrator.process_query("Your question here")
    print(response.content)

    # Or stream tokens as they arrive
    async for chunk in orchestrator.process_query_streaming("Your question here"):
        if chunk.content:
            print(chunk.content, end="")

asyncio.run(main())

Note that await is only valid inside an async function, hence the main() wrapper.

🧪 Testing & Evaluation

Running Tests

# Run unit tests
python -m pytest test/app/test_orchestrator.py -v

# Run integration tests (requires proper setup)
python -m pytest test/ -m integration

# Run evaluation suite
python evaluation/1_rag_evaluation_runner.py  # Generate responses
python evaluation/2_llm_as_judge.py           # LLM evaluation
python evaluation/3_metrics.py                # Calculate metrics

Validation

# Validate architecture setup
python validate_architecture.py

🏗️ Project Structure

py-jrag/
├── app/                    # Core business logic
│   ├── orchestrator.py     # Main RAG orchestrator
│   └── models.py          # Data structures
├── agents/                # Specialized AI agents
│   ├── decoupler.py       # Query decomposition
│   ├── crafter.py         # Information gathering
│   ├── extractor.py       # Context processing
│   └── speaker.py         # Response generation
├── configuration/         # Configuration management
│   └── configuration.py   # Config system
├── elastic/              # Elasticsearch integration
│   └── elastic.py        # Search client
├── memory/               # Persistent memory storage
├── evaluation/           # Testing and evaluation
├── test/                # Unit and integration tests
├── assets/              # Static assets (logo, etc.)
├── streamlit_app.py     # Web interface
├── config.toml          # Configuration file
└── requirements.txt     # Dependencies

About

An agentic RAG framework in Python, developed as the thesis project for a Master's degree in Data Analytics. Based on the Linden library.
