Raggamuffin transforms static PDF documents into interactive, semantic knowledge bases. It combines intelligent PDF merging with a powerful Retrieval-Augmented Generation (RAG) engine powered by Google Gemini, enabling you to chat with your documents using natural language.
- Intelligent PDF Merging: Combine multiple PDF documents into a single file
- Semantic Document Chat: Upload PDFs and interact with them using conversational AI (Gemini 1.5 Flash)
- Text Extraction & Chunking: Automatically extracts and chunks PDF text for optimal RAG context
- Audio Transcription: Convert speech to text using Gemini's audio capabilities
- ElevenLabs Voice: Stream audio answers from chat responses with natural-sounding voices
- Google Drive Export: Upload merged PDFs directly to Google Drive (local OAuth flow)
- Confluence Export: Publish chat results to Confluence pages
- Observability: Built-in support for Datadog APM, Sentry, and PostHog (optional)
Before you begin, make sure you have the following:
- Python 3.10+
- pip (comes with Python)
- Git
- Google Gemini API key
Optional:
- Docker (for containerized deployment)
- gcloud CLI (for Google Cloud Run deployment)
- Clone the repository:
git clone https://github.com/josedaniel-dev/rgmfn.git
cd rgmfn
- Create a virtual environment (recommended):
python -m venv .venv
# On Windows
.venv\Scripts\activate
# On macOS/Linux
source .venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
# Copy the example environment file
cp .env.example .env
# Edit .env and add your API keys
# At minimum, you need:
# GOOGLE_API_KEY=your_gemini_api_key_here
Edit the .env file to configure your API keys and services:
# Google Gemini (REQUIRED for core functionality)
GOOGLE_API_KEY=your_gemini_key_here
# ElevenLabs (for voice responses)
ELEVENLABS_API_KEY=your_elevenlabs_key_here
# Datadog (for monitoring)
DD_API_KEY=your_dd_api_key
DD_APP_KEY=your_dd_app_key
DD_SITE=datadoghq.com
DD_SERVICE=raggamuffin
DD_ENV=dev
# CORS Settings (for GitHub Pages frontend)
CORS_ORIGINS=http://localhost:8000,https://youruser.github.io
# Sentry (error tracking)
SENTRY_DSN=your_sentry_dsn
# PostHog (analytics)
POSTHOG_API_KEY=your_posthog_key
POSTHOG_HOST=https://app.posthog.com
# Google Drive (for PDF uploads)
GOOGLE_DRIVE_TOKEN=token.json
GOOGLE_DRIVE_CLIENT_SECRETS=credentials.json
# Confluence (for exporting chat results)
CONFLUENCE_URL=https://your-domain.atlassian.net
CONFLUENCE_USERNAME=your_email@example.com
CONFLUENCE_API_TOKEN=your_confluence_token
Raggamuffin offers three ways to interact with the application:
Start the local server:
# Using the CLI tool
python CLI.py run
# Or directly with uvicorn
uvicorn main:app --reload --host 0.0.0.0 --port 8000
Then open your browser to:
- Dashboard: http://localhost:8000/dashboard
- API Docs: http://localhost:8000/docs (interactive Swagger UI)
- Merge PDFs:
  - Navigate to the "Merge PDFs" section
  - Upload 2 or more PDF files
  - Click "Merge" to combine them
  - Download the merged PDF
- Chat with PDFs:
  - Navigate to the "Chat with PDF" section
  - Upload a single PDF document
  - Wait for processing to complete
  - Type your questions in the chat interface
  - Optionally enable voice responses (requires ElevenLabs API key)
- Export Options:
  - Save merged PDFs to Google Drive
  - Export chat conversations to Confluence
Raggamuffin provides a RESTful API. Here are the main endpoints:
# Merge multiple PDFs
curl -X POST "http://localhost:8000/merge" \
-F "files=@document1.pdf" \
-F "files=@document2.pdf" \
-F "files=@document3.pdf"
# Response:
# {
# "status": "success",
# "file_path": "uploads/abc123/merged.pdf",
# "filename": "raggamuffin_merged.pdf"
# }
# Download the merged PDF
curl "http://localhost:8000/download?path=uploads/abc123/merged.pdf&filename=merged.pdf" \
--output merged.pdf
# 1. Upload a PDF for chat
curl -X POST "http://localhost:8000/upload-for-chat" \
-F "file=@document.pdf"
# Response:
# {
# "session_id": "abc123def456",
# "chunks": 42
# }
# 2. Ask questions about the document
curl -X POST "http://localhost:8000/chat" \
-F "query=What is the main topic of this document?" \
-F "session_id=abc123def456"
# Response:
# {
# "answer": "The main topic is..."
# }
# Transcribe audio to text
curl -X POST "http://localhost:8000/transcribe" \
-F "file=@audio.webm"
# Get audio response (text-to-speech)
curl "http://localhost:8000/stream-audio?text=Hello%20world&voice_id=21m00Tcm4TlvDq8ikWAM" \
--output response.mp3
# Save to Google Drive
curl -X POST "http://localhost:8000/save-to-drive" \
-H "Content-Type: application/json" \
-d '{"file_path": "uploads/abc123/merged.pdf"}'
# Export to Confluence
curl -X POST "http://localhost:8000/export-to-confluence" \
-H "Content-Type: application/json" \
-d '{
"space": "TEAM",
"title": "Meeting Notes",
"content": "Summary of the document..."
}'
The CLI.py tool provides convenient commands for common tasks:
# Start the development server
python CLI.py run
# Run all tests
python CLI.py test
# Deploy to Google Cloud Run
python CLI.py deploy
# Build Docker image
python CLI.py docker-build --tag raggamuffin:latest
# Run Docker container
python CLI.py docker-run --tag raggamuffin:latest --port 8000
Raggamuffin includes a test suite covering unit tests, API integration tests, and end-to-end tests.
python CLI.py test
Or directly with pytest:
pytest tests/
- Unit Tests: tests/test_core.py - Core functionality (PDF merging, text extraction)
- API Tests: tests/test_api.py - API endpoint validation
- Integration Tests: tests/test_api_integration.py - Full workflow tests
- Service Tests: tests/test_services.py - External service integrations
- E2E Tests: tests/test_frontend_e2e.py - Browser-based UI tests (requires Playwright)
For browser-based testing:
# Install Playwright
pip install playwright pytest-playwright
python -m playwright install
# Run E2E tests
pytest tests/test_frontend_e2e.py
- Build the Docker image:
docker build -t raggamuffin:latest .
- Run the container:
docker run -p 8000:8000 --env-file .env raggamuffin:latest
- Ensure gcloud CLI is installed and configured:
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
- Deploy using the CLI tool:
python CLI.py deploy
Or manually:
gcloud run deploy raggamuffin-api \
--source . \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars GOOGLE_API_KEY=your_key_here
- Set environment variables in Cloud Run console or via command:
gcloud run services update raggamuffin-api \
--update-env-vars GOOGLE_API_KEY=your_key,ELEVENLABS_API_KEY=your_key
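One pitfall with the comma-separated `--update-env-vars` form is that a value such as `CORS_ORIGINS` itself contains commas. The sketch below builds the flag value from `.env`-style lines and, when a comma appears in any value, switches to the alternate-delimiter syntax described in `gcloud topic escaping`. The helper names `parse_env_lines` and `to_env_vars_flag` are hypothetical and not part of `CLI.py`; treat the output as something to verify against your gcloud version before relying on it.

```python
from typing import Dict, Iterable

# Single-character delimiters to try when a value contains a comma (an assumption;
# any character that appears in no key or value would do).
_ALT_DELIMITERS = "@;|#"


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first "=" only
        env[key.strip()] = value.strip()
    return env


def to_env_vars_flag(env: Dict[str, str]) -> str:
    """Build the value for --set-env-vars / --update-env-vars."""
    pairs = [f"{key}={value}" for key, value in env.items()]
    if not any("," in pair for pair in pairs):
        return ",".join(pairs)
    # A value contains a comma: use the ^DELIM^ prefix with an unused delimiter.
    for delim in _ALT_DELIMITERS:
        if not any(delim in pair for pair in pairs):
            return f"^{delim}^" + delim.join(pairs)
    raise ValueError("no safe delimiter found; set these variables in the console")
```

For example, `to_env_vars_flag(parse_env_lines(open(".env")))` would produce a single string to pass after `--update-env-vars`, quoted for your shell.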
The docs/ directory contains a static HTML frontend that can be deployed to GitHub Pages:
- Update the API URL in docs/config.js:
const API_BASE_URL = 'https://your-cloud-run-url.run.app';
- Enable GitHub Pages in your repository settings:
  - Go to Settings → Pages
  - Source: Deploy from a branch
  - Branch: main, folder /docs
- Access your frontend at: https://yourusername.github.io/rgmfn
- Frontend: HTML5/CSS3 with Tailwind CSS and glassmorphism UI
- Backend: FastAPI (Python 3.10+) with async endpoints
- LLM Engine: Google GenAI (Gemini 1.5 Flash)
- Vector Search: Gemini embeddings with dot-product retrieval
- PDF Processing: pypdf library
- Voice: ElevenLabs TTS API
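To make "dot-product retrieval" concrete, here is a minimal sketch in plain Python. It assumes each chunk and the query have already been embedded as equal-length lists of floats; the function names `dot` and `top_k` are illustrative and may not match what `rag_engine.py` actually does.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k highest-scoring chunks."""
    scores = [(i, dot(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scores[:k]
```

The returned indices can then be used to look up the original chunk text that is passed to the model as context. With unit-length embeddings the dot product ranks chunks the same way cosine similarity would.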
raggamuffin/
├── docs/                   # Static frontend + deployment docs
│   ├── index.html          # Static frontend entry point
│   └── config.js           # API base URL configuration
├── templates/              # Jinja2 templates for FastAPI UI
├── tests/                  # Test suite (unit, API, E2E)
├── uploads/                # Session-based PDF storage
├── datadog_config/         # Datadog dashboard and monitor JSON
├── static/                 # Static assets
├── CLI.py                  # CLI tool for run/test/deploy
├── main.py                 # FastAPI entry point and routes
├── rag_engine.py           # Gemini embeddings + RAG logic
├── pdf_engine.py           # PDF merge logic
├── drive_service.py        # Google Drive upload service
├── confluence_service.py   # Confluence export service
├── elevenlabs_service.py   # ElevenLabs TTS integration
├── monitoring_service.py   # Datadog/Sentry/PostHog helpers
├── requirements.txt        # Python dependencies
├── Dockerfile              # Container configuration
└── .env.example            # Environment variables template
- Ingestion: User uploads PDFs via the dashboard or API
- Processing:
  - Merge: pdf_engine.py concatenates PDFs
  - Chat: rag_engine.py extracts text, chunks it, and builds embeddings
- Retrieval: Vector search finds relevant chunks for user queries
- Generation: Gemini generates contextual responses
- Delivery:
  - File download
  - Optional Drive/Confluence export
  - Optional audio streaming
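The chunking step in the flow above can be sketched as a fixed-size window with overlap, so that a sentence cut at one chunk boundary still appears whole in the neighbouring chunk. This is an illustration only: the function name `chunk_text` and the character-based `size` and `overlap` defaults are assumptions, not values taken from `rag_engine.py`.

```python
from typing import List


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():  # skip whitespace-only windows
            chunks.append(piece)
        if start + size >= len(text):  # this window reached the end of the text
            break
    return chunks
```

Each resulting chunk would then be embedded once at upload time, which is why the upload response reports a `chunks` count.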
Problem: The server restarted and lost the in-memory RAG session.
Solution: Re-upload your PDF to create a new session. For production, consider implementing persistent session storage.
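One possible shape for persistent session storage is a small SQLite-backed store keyed by `session_id`. The sketch below is an assumption about how that could look, not the project's implementation: the `SessionStore` class, the table layout, and the idea that a session consists of chunk texts plus their embeddings are all hypothetical.

```python
import json
import sqlite3
from typing import List, Optional


class SessionStore:
    """Persist RAG sessions (chunks + embeddings) in a SQLite file."""

    def __init__(self, path: str = "sessions.db") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "session_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )
        self._conn.commit()

    def save(self, session_id: str, chunks: List[str],
             embeddings: List[List[float]]) -> None:
        payload = json.dumps({"chunks": chunks, "embeddings": embeddings})
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO sessions VALUES (?, ?)",
                (session_id, payload),
            )

    def load(self, session_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT payload FROM sessions WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        return json.loads(row[0]) if row else None
```

A file-based store like this survives a local server restart. On Cloud Run the container filesystem is ephemeral, so an external database or object store would be needed there instead.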
Problem: Google Drive OAuth requires a local browser flow that doesn't work on Cloud Run.
Solution:
- For local development: Run the OAuth flow locally to generate token.json
- For production: Use a service account instead of OAuth
Problem: The API is rejecting requests from your GitHub Pages domain.
Solution: Update the CORS_ORIGINS environment variable to include your GitHub Pages URL:
CORS_ORIGINS=https://yourusername.github.io
Problem: The GOOGLE_API_KEY environment variable is not set.
Solution:
- Verify the .env file exists and contains GOOGLE_API_KEY=your_key
- If using Docker, ensure --env-file .env is included
- If using Cloud Run, set environment variables in the console
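A quick way to confirm which of these cases applies is to check the environment before starting the server. The following is a minimal sketch; `missing_vars` is a hypothetical helper, and treating values that start with `your_` as unset is an assumption based on the placeholders in `.env.example`.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Only GOOGLE_API_KEY is documented as required; the rest are optional.
REQUIRED_VARS = ("GOOGLE_API_KEY",)


def missing_vars(
    required: Sequence[str] = REQUIRED_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variables that are unset, empty, or still placeholders."""
    env = os.environ if environ is None else environ
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value.startswith("your_"):  # placeholder left from the template
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_vars()
    if problems:
        raise SystemExit("Missing environment variables: " + ", ".join(problems))
    print("Environment looks complete.")
```

Note that this reads the process environment only, so when running locally the values from `.env` must already be loaded or exported.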
Problem: Missing test dependencies.
Solution: Install development dependencies:
pip install -r requirements-dev.txt
Problem: Missing or invalid ElevenLabs API key.
Solution:
- Get an API key from ElevenLabs
- Add it to .env: ELEVENLABS_API_KEY=your_key
- Restart the server
- Full Setup Guide: See docs/manual_steps.md for detailed step-by-step instructions
- Deployment Reference: See docs/deployment.md for deployment configuration
- API Documentation: Visit the /docs endpoint when the server is running for the interactive Swagger UI
- Future Roadmap: See docs/future_upgrades.md for planned features
- Google Drive: Uploads use a local OAuth flow and will not work on Cloud Run without a service account or alternate auth flow. When credentials are missing, the upload returns a stub ID for demo purposes.
- FAISS: Included as an optional dependency but not required for the default embedding flow.
- Static Frontend: Reads the API base URL from docs/config.js. Ensure it matches your backend URL and CORS settings.
- Session Storage: RAG sessions are stored in memory and will be lost on server restart. For production, implement persistent storage.
- Voice Limits: Text-to-speech is limited to 300 characters to prevent excessive API usage.
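If you call the text-to-speech endpoint from your own client, it can help to shorten the text before sending it. The sketch below clips at a word boundary so the audio does not end mid-word. How the server itself enforces the 300-character limit (truncating or rejecting) is not specified here, and `clip_for_tts` is a hypothetical client-side helper.

```python
MAX_TTS_CHARS = 300  # limit stated in the notes above


def clip_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> str:
    """Collapse whitespace and clip text to at most `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    text = " ".join(text.split())  # normalise runs of whitespace
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if text[limit] == " ":  # the cut already falls on a word boundary
        return cut.rstrip()
    head, sep, _ = cut.rpartition(" ")
    # Drop the partial last word; fall back to a hard cut for one long word.
    return head if sep else cut
```

The clipped string can then be URL-encoded and passed as the `text` query parameter of `/stream-audio`.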
Contributions are welcome! Please feel free to submit a Pull Request.
See LICENSE file for details.
For issues and questions:
- Open an issue on GitHub
- Check existing documentation in the docs/ folder
- Review the API documentation at the /docs endpoint
Built with ❤️ using Google Gemini, FastAPI, and modern web technologies.