Raggamuffin is a high-performance web platform designed to transform static PDF documents into interactive, semantic knowledge bases. It combines advanced PDF merging capabilities with a Retrieval-Augmented Generation (RAG) engine powered by Google Gemini.

Raggamuffin: The Semantic PDF Orchestrator

Raggamuffin transforms static PDF documents into interactive, semantic knowledge bases. It combines intelligent PDF merging with a powerful Retrieval-Augmented Generation (RAG) engine powered by Google Gemini, enabling you to chat with your documents using natural language.


✨ Key Features

Core Functionality

  • Intelligent PDF Merging: Combine multiple PDF documents into a single file
  • Semantic Document Chat: Upload a PDF and ask questions about it in natural language (Gemini 1.5 Flash)
  • Text Extraction & Chunking: Automatically extracts PDF text and splits it into chunks that are embedded and retrieved as context for each answer
  • Audio Transcription: Convert speech to text using Gemini's audio capabilities
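The README does not state how large the chunks are or whether they are measured in characters or tokens. A minimal sketch of fixed-size chunking with overlap, assuming character-based windows; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names and values, not taken from rag_engine.py:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows (hypothetical defaults)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    # Collapse the runs of whitespace and line breaks that PDF extraction leaves behind.
    text = " ".join(text.split())
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` returns `['abcd', 'defg', 'ghij']`. The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk.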

Integrations

  • ElevenLabs Voice: Stream chat answers as audio using natural-sounding ElevenLabs voices
  • Google Drive Export: Upload merged PDFs directly to Google Drive (local OAuth flow)
  • Confluence Export: Publish chat results to Confluence pages
  • Observability: Built-in support for Datadog APM, Sentry, and PostHog (optional)

🚀 Quick Start

Prerequisites

Before you begin, make sure you have the following:

  • Python 3.10+ with pip
  • Git
  • A Google Gemini API key (GOOGLE_API_KEY)

Optional:

  • Docker (for containerized deployment)
  • gcloud CLI (for Google Cloud Run deployment)

Installation

  1. Clone the repository

    git clone https://github.com/josedaniel-dev/rgmfn.git
    cd rgmfn
  2. Create a virtual environment (recommended)

    python -m venv .venv
    
    # On Windows
    .venv\Scripts\activate
    
    # On macOS/Linux
    source .venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Set up environment variables

    # Copy the example environment file
    cp .env.example .env
    
    # Edit .env and add your API keys
    # At minimum, you need:
    # GOOGLE_API_KEY=your_gemini_api_key_here

βš™οΈ Configuration

Edit the .env file to configure your API keys and services:

Required Configuration

# Google Gemini (REQUIRED for core functionality)
GOOGLE_API_KEY=your_gemini_key_here

Optional Integrations

# ElevenLabs (for voice responses)
ELEVENLABS_API_KEY=your_elevenlabs_key_here

# Datadog (for monitoring)
DD_API_KEY=your_dd_api_key
DD_APP_KEY=your_dd_app_key
DD_SITE=datadoghq.com
DD_SERVICE=raggamuffin
DD_ENV=dev

# CORS Settings (for GitHub Pages frontend)
CORS_ORIGINS=http://localhost:8000,https://youruser.github.io

# Sentry (error tracking)
SENTRY_DSN=your_sentry_dsn

# PostHog (analytics)
POSTHOG_API_KEY=your_posthog_key
POSTHOG_HOST=https://app.posthog.com

# Google Drive (for PDF uploads)
GOOGLE_DRIVE_TOKEN=token.json
GOOGLE_DRIVE_CLIENT_SECRETS=credentials.json

# Confluence (for exporting chat results)
CONFLUENCE_URL=https://your-domain.atlassian.net
CONFLUENCE_USERNAME=your_email@example.com
CONFLUENCE_API_TOKEN=your_confluence_token
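Every integration in this listing is optional. A sketch of how an app might decide which ones to switch on from the environment; `INTEGRATION_KEYS` and `enabled_integrations` are hypothetical, built only from the variable names above, and are not the logic in monitoring_service.py or the other service modules:

```python
import os
from typing import Mapping

# Hypothetical mapping: integration name -> variables that must all be non-empty.
INTEGRATION_KEYS = {
    "elevenlabs": ("ELEVENLABS_API_KEY",),
    "datadog": ("DD_API_KEY",),
    "sentry": ("SENTRY_DSN",),
    "posthog": ("POSTHOG_API_KEY",),
    "confluence": ("CONFLUENCE_URL", "CONFLUENCE_USERNAME", "CONFLUENCE_API_TOKEN"),
}


def enabled_integrations(env: Mapping[str, str] = os.environ) -> set[str]:
    """Return the integrations whose required variables are all set and non-empty."""
    return {
        name
        for name, keys in INTEGRATION_KEYS.items()
        if all(env.get(key, "").strip() for key in keys)
    }
```

Requiring every variable of a group (as for Confluence) avoids half-configured integrations failing at request time instead of at startup.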

🎯 Usage

Raggamuffin offers three ways to interact with the application:

1. Web Interface (Recommended)

Start the local server:

# Using the CLI tool
python CLI.py run

# Or directly with uvicorn
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Then open your browser to http://localhost:8000.

Using the Dashboard

  1. Merge PDFs:

    • Navigate to the "Merge PDFs" section
    • Upload 2 or more PDF files
    • Click "Merge" to combine them
    • Download the merged PDF
  2. Chat with PDFs:

    • Navigate to the "Chat with PDF" section
    • Upload a single PDF document
    • Wait for processing to complete
    • Type your questions in the chat interface
    • Optionally enable voice responses (requires ElevenLabs API key)
  3. Export Options:

    • Save merged PDFs to Google Drive
    • Export chat conversations to Confluence

2. API Endpoints

Raggamuffin provides a RESTful API. Here are the main endpoints:

PDF Merging

# Merge multiple PDFs
curl -X POST "http://localhost:8000/merge" \
  -F "files=@document1.pdf" \
  -F "files=@document2.pdf" \
  -F "files=@document3.pdf"

# Response:
# {
#   "status": "success",
#   "file_path": "uploads/abc123/merged.pdf",
#   "filename": "raggamuffin_merged.pdf"
# }

# Download the merged PDF
curl "http://localhost:8000/download?path=uploads/abc123/merged.pdf&filename=merged.pdf" \
  --output merged.pdf
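The same merge-then-download flow can be driven from Python without third-party packages. A stdlib-only sketch, assuming the endpoints behave as in the curl examples; `build_multipart`, `merge_request`, and `download_url` are hypothetical helpers, and each repeated `files` part mirrors one `-F "files=@..."` flag:

```python
import uuid
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request


def build_multipart(field: str, files: list[tuple[str, bytes]],
                    content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode (filename, data) pairs under one form field as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for filename, data in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def merge_request(base_url: str, pdf_paths: list[str]) -> Request:
    """Build a POST /merge request carrying each PDF as a 'files' part."""
    files = [(Path(p).name, Path(p).read_bytes()) for p in pdf_paths]
    body, ctype = build_multipart("files", files)
    return Request(f"{base_url}/merge", data=body,
                   headers={"Content-Type": ctype}, method="POST")


def download_url(base_url: str, file_path: str, filename: str) -> str:
    """URL for GET /download using the file_path returned by /merge."""
    return f"{base_url}/download?" + urlencode({"path": file_path, "filename": filename})
```

Sending the request with `urllib.request.urlopen(merge_request(...))` and reading the JSON body yields the `file_path` to pass to `download_url`. The same encoder fits `/upload-for-chat` (field `file`) and `/transcribe` (with an audio content type).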

Document Chat (RAG)

# 1. Upload a PDF for chat
curl -X POST "http://localhost:8000/upload-for-chat" \
  -F "file=@document.pdf"

# Response:
# {
#   "session_id": "abc123def456",
#   "chunks": 42
# }

# 2. Ask questions about the document
curl -X POST "http://localhost:8000/chat" \
  -F "query=What is the main topic of this document?" \
  -F "session_id=abc123def456"

# Response:
# {
#   "answer": "The main topic is..."
# }
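The curl call sends `query` and `session_id` as form fields. A stdlib-only client sketch, assuming `/chat` declares them as FastAPI `Form` fields (which accept URL-encoded bodies as well as multipart) and returns JSON with an `answer` key as in the sample response; `chat_request` and `ask` are hypothetical names:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def chat_request(base_url: str, query: str, session_id: str) -> Request:
    """Build a POST /chat request with URL-encoded form fields."""
    body = urlencode({"query": query, "session_id": session_id}).encode()
    return Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def ask(base_url: str, query: str, session_id: str, timeout: float = 60.0) -> str:
    """Send the question and return the 'answer' field of the JSON response."""
    with urlopen(chat_request(base_url, query, session_id), timeout=timeout) as resp:
        return json.load(resp)["answer"]
```

Usage: `ask("http://localhost:8000", "What is the main topic of this document?", "abc123def456")`. If the server has restarted, the session is gone and the call fails until the PDF is re-uploaded (see Troubleshooting).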

Audio Features

# Transcribe audio to text
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@audio.webm"

# Get audio response (text-to-speech)
curl "http://localhost:8000/stream-audio?text=Hello%20world&voice_id=21m00Tcm4TlvDq8ikWAM" \
  --output response.mp3
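Notes and Constraints states that text-to-speech is limited to 300 characters, but not whether longer text is truncated or rejected. A client-side sketch that trims on a word boundary before building the `/stream-audio` URL; `clip_tts_text` and `stream_audio_url` are hypothetical helpers:

```python
from urllib.parse import quote, urlencode

MAX_TTS_CHARS = 300  # limit stated under Notes and Constraints


def clip_tts_text(text: str, limit: int = MAX_TTS_CHARS) -> str:
    """Trim text to at most `limit` characters, preferring to cut at a space."""
    text = text.strip()
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit + 1)  # last space at index <= limit
    return text[:cut if cut > 0 else limit].rstrip()


def stream_audio_url(base_url: str, text: str, voice_id: str) -> str:
    """Build the GET /stream-audio URL; quote() encodes spaces as %20 like the curl example."""
    query = urlencode({"text": clip_tts_text(text), "voice_id": voice_id}, quote_via=quote)
    return f"{base_url}/stream-audio?{query}"
```

`stream_audio_url("http://localhost:8000", "Hello world", "21m00Tcm4TlvDq8ikWAM")` reproduces the URL used in the curl example.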

Export Features

# Save to Google Drive
curl -X POST "http://localhost:8000/save-to-drive" \
  -H "Content-Type: application/json" \
  -d '{"file_path": "uploads/abc123/merged.pdf"}'

# Export to Confluence
curl -X POST "http://localhost:8000/export-to-confluence" \
  -H "Content-Type: application/json" \
  -d '{
    "space": "TEAM",
    "title": "Meeting Notes",
    "content": "Summary of the document..."
  }'

3. Command Line Interface (CLI)

The CLI.py tool provides convenient commands for common tasks:

# Start the development server
python CLI.py run

# Run all tests
python CLI.py test

# Deploy to Google Cloud Run
python CLI.py deploy

# Build Docker image
python CLI.py docker-build --tag raggamuffin:latest

# Run Docker container
python CLI.py docker-run --tag raggamuffin:latest --port 8000
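CLI.py's source is not shown here. A hypothetical sketch of how such a wrapper can map subcommands to the commands documented elsewhere in this README (uvicorn, pytest, gcloud, docker); the port mapping for `docker-run` is an assumption:

```python
import argparse


def build_command(argv: list[str]) -> list[str]:
    """Translate CLI.py-style arguments into the shell command to run (hypothetical)."""
    parser = argparse.ArgumentParser(prog="CLI.py")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("run")
    sub.add_parser("test")
    sub.add_parser("deploy")
    build = sub.add_parser("docker-build")
    build.add_argument("--tag", default="raggamuffin:latest")
    run = sub.add_parser("docker-run")
    run.add_argument("--tag", default="raggamuffin:latest")
    run.add_argument("--port", type=int, default=8000)
    args = parser.parse_args(argv)

    if args.command == "run":
        return ["uvicorn", "main:app", "--reload", "--host", "0.0.0.0", "--port", "8000"]
    if args.command == "test":
        return ["pytest", "tests/"]
    if args.command == "deploy":
        return ["gcloud", "run", "deploy", "raggamuffin-api", "--source", ".",
                "--platform", "managed", "--region", "us-central1",
                "--allow-unauthenticated"]
    if args.command == "docker-build":
        return ["docker", "build", "-t", args.tag, "."]
    # docker-run: publish the chosen host port to port 8000 in the container (assumed).
    return ["docker", "run", "-p", f"{args.port}:8000", "--env-file", ".env", args.tag]
```

A wrapper would then execute the result with `subprocess.run(build_command(sys.argv[1:]), check=True)`. Returning the argument list instead of running it keeps the mapping easy to test.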

🧪 Testing

Raggamuffin includes a comprehensive test suite covering unit tests, API integration tests, and end-to-end tests.

Run All Tests

python CLI.py test

Or directly with pytest:

pytest tests/

Test Categories

  • Unit Tests: tests/test_core.py - Core functionality (PDF merging, text extraction)
  • API Tests: tests/test_api.py - API endpoint validation
  • Integration Tests: tests/test_api_integration.py - Full workflow tests
  • Service Tests: tests/test_services.py - External service integrations
  • E2E Tests: tests/test_frontend_e2e.py - Browser-based UI tests (requires Playwright)

End-to-End Testing with Playwright

For browser-based testing:

# Install Playwright
pip install playwright pytest-playwright
python -m playwright install

# Run E2E tests
pytest tests/test_frontend_e2e.py

🚒 Deployment

Docker Deployment

  1. Build the Docker image:

    docker build -t raggamuffin:latest .
  2. Run the container:

    docker run -p 8000:8000 --env-file .env raggamuffin:latest

Google Cloud Run Deployment

  1. Ensure gcloud CLI is installed and configured:

    gcloud auth login
    gcloud config set project YOUR_PROJECT_ID
  2. Deploy using the CLI tool:

    python CLI.py deploy

    Or manually:

    gcloud run deploy raggamuffin-api \
      --source . \
      --platform managed \
      --region us-central1 \
      --allow-unauthenticated \
      --set-env-vars GOOGLE_API_KEY=your_key_here
  3. Set environment variables in the Cloud Run console or from the command line:

    gcloud run services update raggamuffin-api \
      --update-env-vars GOOGLE_API_KEY=your_key,ELEVENLABS_API_KEY=your_key

Static Frontend Deployment (GitHub Pages)

The docs/ directory contains a static HTML frontend that can be deployed to GitHub Pages:

  1. Update the API URL in docs/config.js:

    const API_BASE_URL = 'https://your-cloud-run-url.run.app';
  2. Enable GitHub Pages in your repository settings:

    • Go to Settings → Pages
    • Source: Deploy from a branch
    • Branch: main, folder: /docs
  3. Access your frontend at: https://yourusername.github.io/rgmfn


πŸ—οΈ System Architecture

Technology Stack

  • Frontend: HTML5/CSS3 with Tailwind CSS and glassmorphism UI
  • Backend: FastAPI (Python 3.10+) with async endpoints
  • LLM Engine: Google GenAI (Gemini 1.5 Flash)
  • Vector Search: Gemini embeddings with dot-product retrieval
  • PDF Processing: pypdf library
  • Voice: ElevenLabs TTS API
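Dot-product retrieval scores every stored chunk embedding against the query embedding and keeps the highest-scoring chunks. A minimal pure-Python sketch, assuming the embeddings are already computed (the Gemini embedding call is omitted) and that `k=3` is a hypothetical default:

```python
import heapq
from typing import Sequence


def top_k_by_dot(query_vec: Sequence[float],
                 chunk_vecs: Sequence[Sequence[float]],
                 k: int = 3) -> list[int]:
    """Return indices of the k chunk vectors with the largest dot product with the query."""
    scores = []
    for vec in chunk_vecs:
        if len(vec) != len(query_vec):
            raise ValueError("embedding dimensions differ")
        scores.append(sum(q * c for q, c in zip(query_vec, vec)))
    # nlargest returns indices ordered from best to worst score.
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
```

When the vectors are unit-normalized, ranking by dot product gives the same order as cosine similarity. With NumPy the same ranking is `np.argsort(-(matrix @ query))[:k]`.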

Directory Structure

raggamuffin/
├── docs/                    # Static frontend + deployment docs
│   ├── index.html           # Static frontend entry point
│   └── config.js            # API base URL configuration
├── templates/               # Jinja2 templates for FastAPI UI
├── tests/                   # Test suite (unit, API, E2E)
├── uploads/                 # Session-based PDF storage
├── datadog_config/          # Datadog dashboard and monitor JSON
├── static/                  # Static assets
├── CLI.py                   # CLI tool for run/test/deploy
├── main.py                  # FastAPI entry point and routes
├── rag_engine.py            # Gemini embeddings + RAG logic
├── pdf_engine.py            # PDF merge logic
├── drive_service.py         # Google Drive upload service
├── confluence_service.py    # Confluence export service
├── elevenlabs_service.py    # ElevenLabs TTS integration
├── monitoring_service.py    # Datadog/Sentry/PostHog helpers
├── requirements.txt         # Python dependencies
├── Dockerfile               # Container configuration
└── .env.example             # Environment variables template

Data Flow

  1. Ingestion: User uploads PDFs via the dashboard or API
  2. Processing:
    • Merge: pdf_engine.py concatenates PDFs
    • Chat: rag_engine.py extracts text, chunks it, and builds embeddings
  3. Retrieval: Vector search finds relevant chunks for user queries
  4. Generation: Gemini generates contextual responses
  5. Delivery:
    • File download
    • Optional Drive/Confluence export
    • Optional audio streaming
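The README does not show the prompt that rag_engine.py sends to Gemini in the Generation step. A minimal sketch, assuming the retrieved chunks are packed, best first, into a grounded prompt under a character budget; `build_prompt`, its wording, and `max_chars` are hypothetical:

```python
def build_prompt(question: str, chunks: list[str], max_chars: int = 8000) -> str:
    """Assemble a context-grounded prompt from retrieved chunks (hypothetical template)."""
    context_parts: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] {chunk}"
        # Stop adding chunks once the context budget would be exceeded.
        if used + len(block) > max_chars:
            break
        context_parts.append(block)
        used += len(block)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the chunks lets the model (or a later UI) point back to the passage an answer came from.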

🔧 Troubleshooting

Common Issues

"Session expired" error when chatting

Problem: The server restarted and lost the in-memory RAG session.

Solution: Re-upload your PDF to create a new session. For production, consider implementing persistent session storage.
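One way to make sessions survive a restart is to persist each session's chunks and embeddings. A minimal sketch using the standard-library `sqlite3` module; `SessionStore` and its JSON layout are hypothetical and not part of the current codebase:

```python
import json
import sqlite3


class SessionStore:
    """Persist RAG sessions (chunks + embeddings) in SQLite (hypothetical design)."""

    def __init__(self, path: str = "sessions.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        self.conn.commit()

    def save(self, session_id: str, chunks: list[str], embeddings: list[list[float]]) -> None:
        payload = json.dumps({"chunks": chunks, "embeddings": embeddings})
        with self.conn:  # commits the transaction on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
                (session_id, payload),
            )

    def load(self, session_id: str) -> dict:
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"Session expired or unknown: {session_id}")
        return json.loads(row[0])
```

By default a `sqlite3` connection may only be used from the thread that created it, and FastAPI runs synchronous endpoints in a thread pool, so a real integration would need `check_same_thread=False` plus a lock, or one connection per request.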

Google Drive upload returns a stub ID

Problem: Google Drive OAuth requires a local browser flow that doesn't work on Cloud Run.

Solution:

  • For local development: Run the OAuth flow locally to generate token.json
  • For production: Use a service account instead of OAuth

CORS errors when using the static frontend

Problem: The API is rejecting requests from your GitHub Pages domain.

Solution: Update the CORS_ORIGINS environment variable to include your GitHub Pages URL:

CORS_ORIGINS=https://yourusername.github.io
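A common mistake is pasting the full Pages URL, such as https://yourusername.github.io/rgmfn. A browser's `Origin` header contains only the scheme, host, and port, never a path, so that entry would never match. A sketch of a parser that normalizes each entry, assuming CORS_ORIGINS is a comma-separated list as in the configuration example; `parse_cors_origins` is hypothetical:

```python
from typing import Optional
from urllib.parse import urlsplit


def parse_cors_origins(raw: Optional[str]) -> list[str]:
    """Turn a comma-separated CORS_ORIGINS value into origins (scheme://host[:port])."""
    origins: list[str] = []
    for item in (raw or "").split(","):
        item = item.strip()
        if item == "*":  # keep an explicit wildcard as-is
            origins.append(item)
            continue
        parts = urlsplit(item)
        # Drop any path (e.g. /rgmfn) because the Origin header never carries one.
        if parts.scheme and parts.netloc:
            origins.append(f"{parts.scheme}://{parts.netloc}")
    return origins
```

The resulting list can be passed to FastAPI's `CORSMiddleware` with `app.add_middleware(CORSMiddleware, allow_origins=origins)`.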

"API key not found" errors

Problem: The GOOGLE_API_KEY environment variable is not set.

Solution:

  1. Verify that the .env file exists and contains GOOGLE_API_KEY=your_key
  2. If using Docker, ensure --env-file .env is included
  3. If using Cloud Run, set environment variables in the console
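To check the key before starting the server, a small script can read .env and flag values that are missing or still set to a placeholder. This sketch handles only simple KEY=VALUE lines (no `export` prefixes or inline comments) and is not the loader the app uses; the helper names and placeholder set are hypothetical:

```python
from pathlib import Path

# Placeholder values that appear in this README's examples.
PLACEHOLDERS = {"", "your_key", "your_key_here", "your_gemini_key_here", "your_gemini_api_key_here"}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments, stripping surrounding quotes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def read_env_file(path: str = ".env") -> dict[str, str]:
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def missing_required(env: dict[str, str], required: tuple[str, ...] = ("GOOGLE_API_KEY",)) -> list[str]:
    """Return required keys that are absent, empty, or still a placeholder."""
    return [key for key in required if env.get(key, "").strip() in PLACEHOLDERS]
```

For example, `missing_required({**read_env_file(".env"), **os.environ})` lets real environment variables (as set by Docker or Cloud Run) override the file, and it returns an empty list when the key is configured.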

Tests failing with import errors

Problem: Missing test dependencies.

Solution: Install development dependencies:

pip install -r requirements-dev.txt

ElevenLabs voice not working

Problem: Missing or invalid ElevenLabs API key.

Solution:

  1. Get an API key from ElevenLabs
  2. Add it to .env: ELEVENLABS_API_KEY=your_key
  3. Restart the server

📚 Additional Documentation

  • Full Setup Guide: See docs/manual_steps.md for detailed step-by-step instructions
  • Deployment Reference: See docs/deployment.md for deployment configuration
  • API Documentation: Visit the /docs endpoint while the server is running for the interactive Swagger UI
  • Future Roadmap: See docs/future_upgrades.md for planned features

πŸ“ Notes and Constraints

  • Google Drive: Uploads use a local OAuth flow and will not work on Cloud Run without a service account or alternate auth flow. When credentials are missing, the upload returns a stub ID for demo purposes.
  • FAISS: Included as an optional dependency but not required for the default embedding flow.
  • Static Frontend: Reads the API base URL from docs/config.js. Ensure it matches your backend URL and CORS settings.
  • Session Storage: RAG sessions are stored in-memory and will be lost on server restart. For production, implement persistent storage.
  • Voice Limits: Text-to-speech is limited to 300 characters to prevent excessive API usage.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


📄 License

See the LICENSE file for details.


🆘 Support

For issues and questions:

  • Open an issue on GitHub
  • Check existing documentation in the docs/ folder
  • Review the API documentation at the /docs endpoint

Built with ❤️ using Google Gemini, FastAPI, and modern web technologies.
