Raggamuffin: The Semantic PDF Orchestrator

Raggamuffin turns static PDF documents into interactive, semantic knowledge bases. It combines PDF merging with a Retrieval-Augmented Generation (RAG) engine powered by Google Gemini, so you can query your documents in natural language.

✨ Key Features

Core Functionality

Intelligent PDF Merging: Combine multiple PDF documents into a single file
Semantic Document Chat: Upload PDFs and interact with them using conversational AI (Gemini 1.5 Flash)
Text Extraction & Chunking: Automatically extracts and chunks PDF text for optimal RAG context
Audio Transcription: Convert speech to text using Gemini's audio capabilities
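
As a rough illustration of the chunking step, the sketch below splits extracted text into overlapping, fixed-size character windows. The function name `chunk_text` and the `chunk_size` and `overlap` defaults are assumptions made for this example; the actual strategy and parameters in `rag_engine.py` may differ.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of embedding some text twice.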

Integrations

ElevenLabs Voice: Stream audio answers from chat responses with natural-sounding voices
Google Drive Export: Upload merged PDFs directly to Google Drive (local OAuth flow)
Confluence Export: Publish chat results to Confluence pages
Observability: Built-in support for Datadog APM, Sentry, and PostHog (optional)

🚀 Quick Start

Prerequisites

Before you begin, ensure you have the following installed:

Python 3.10+
pip (comes with Python)
Git
Google Gemini API key

Optional:

Docker (for containerized deployment)
gcloud CLI (for Google Cloud Run deployment)

Installation

Clone the repository

git clone https://github.com/josedaniel-dev/rgmfn.git
cd rgmfn

Create a virtual environment (recommended)

python -m venv .venv

# On Windows
.venv\Scripts\activate

# On macOS/Linux
source .venv/bin/activate

Install dependencies
```
pip install -r requirements.txt
```

Set up environment variables

# Copy the example environment file
cp .env.example .env

# Edit .env and add your API keys
# At minimum, you need:
# GOOGLE_API_KEY=your_gemini_api_key_here

⚙️ Configuration

Edit the .env file to configure your API keys and services:

Required Configuration

# Google Gemini (REQUIRED for core functionality)
GOOGLE_API_KEY=your_gemini_key_here

Optional Integrations

# ElevenLabs (for voice responses)
ELEVENLABS_API_KEY=your_elevenlabs_key_here

# Datadog (for monitoring)
DD_API_KEY=your_dd_api_key
DD_APP_KEY=your_dd_app_key
DD_SITE=datadoghq.com
DD_SERVICE=raggamuffin
DD_ENV=dev

# CORS Settings (for GitHub Pages frontend)
CORS_ORIGINS=http://localhost:8000,https://youruser.github.io

# Sentry (error tracking)
SENTRY_DSN=your_sentry_dsn

# PostHog (analytics)
POSTHOG_API_KEY=your_posthog_key
POSTHOG_HOST=https://app.posthog.com

# Google Drive (for PDF uploads)
GOOGLE_DRIVE_TOKEN=token.json
GOOGLE_DRIVE_CLIENT_SECRETS=credentials.json

# Confluence (for exporting chat results)
CONFLUENCE_URL=https://your-domain.atlassian.net
CONFLUENCE_USERNAME=your_email@example.com
CONFLUENCE_API_TOKEN=your_confluence_token

🎯 Usage

Raggamuffin offers three ways to interact with the application:

1. Web Interface (Recommended)

Start the local server:

# Using the CLI tool
python CLI.py run

# Or directly with uvicorn
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Then open your browser to:

Dashboard: http://localhost:8000/dashboard
API Docs: http://localhost:8000/docs (interactive Swagger UI)

Using the Dashboard

Merge PDFs:
- Navigate to the "Merge PDFs" section
- Upload 2 or more PDF files
- Click "Merge" to combine them
- Download the merged PDF

Chat with PDFs:
- Navigate to the "Chat with PDF" section
- Upload a single PDF document
- Wait for processing to complete
- Type your questions in the chat interface
- Optionally enable voice responses (requires ElevenLabs API key)

Export Options:
- Save merged PDFs to Google Drive
- Export chat conversations to Confluence

2. API Endpoints

Raggamuffin provides a RESTful API. Here are the main endpoints:

PDF Merging

# Merge multiple PDFs
curl -X POST "http://localhost:8000/merge" \
  -F "files=@document1.pdf" \
  -F "files=@document2.pdf" \
  -F "files=@document3.pdf"

# Response:
# {
#   "status": "success",
#   "file_path": "uploads/abc123/merged.pdf",
#   "filename": "raggamuffin_merged.pdf"
# }

# Download the merged PDF
curl "http://localhost:8000/download?path=uploads/abc123/merged.pdf&filename=merged.pdf" \
  --output merged.pdf
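
When scripting against the API from Python, the download URL can be derived from the merge response instead of being typed by hand. This is a minimal sketch; `build_download_url` is a hypothetical helper, and it assumes the response shape shown above (a `file_path` field) and the `path` and `filename` query parameters from the curl example.

```python
from urllib.parse import urlencode


def build_download_url(base_url: str, merge_response: dict,
                       filename: str = "merged.pdf") -> str:
    """Build the /download URL for a file returned by /merge (hypothetical helper)."""
    # urlencode percent-encodes the slashes in the path so it survives as one parameter
    query = urlencode({"path": merge_response["file_path"], "filename": filename})
    return f"{base_url.rstrip('/')}/download?{query}"
```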

Document Chat (RAG)

# 1. Upload a PDF for chat
curl -X POST "http://localhost:8000/upload-for-chat" \
  -F "file=@document.pdf"

# Response:
# {
#   "session_id": "abc123def456",
#   "chunks": 42
# }

# 2. Ask questions about the document
curl -X POST "http://localhost:8000/chat" \
  -F "query=What is the main topic of this document?" \
  -F "session_id=abc123def456"

# Response:
# {
#   "answer": "The main topic is..."
# }
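
The same chat request can be sent from Python using only the standard library. The sketch below mirrors what `curl -F` does for plain text fields by encoding them as `multipart/form-data`. The helper names `encode_form_fields` and `ask` are assumptions for illustration; the field names and the `answer` key come from the example above.

```python
import json
import uuid
from urllib.request import Request, urlopen


def encode_form_fields(fields: dict[str, str], boundary: str) -> bytes:
    """Encode plain text fields as multipart/form-data, as curl -F does."""
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing boundary
    lines.append("")
    return "\r\n".join(lines).encode("utf-8")


def ask(base_url: str, session_id: str, query: str) -> str:
    """POST a question to /chat and return the answer text (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    body = encode_form_fields({"query": query, "session_id": session_id}, boundary)
    request = Request(
        f"{base_url.rstrip('/')}/chat",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.load(response)["answer"]
```

With the server running, `ask("http://localhost:8000", "abc123def456", "What is the main topic?")` would return the answer string, using the `session_id` obtained from `/upload-for-chat`.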

Audio Features

# Transcribe audio to text
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@audio.webm"

# Get audio response (text-to-speech)
curl "http://localhost:8000/stream-audio?text=Hello%20world&voice_id=21m00Tcm4TlvDq8ikWAM" \
  --output response.mp3

Export Features

# Save to Google Drive
curl -X POST "http://localhost:8000/save-to-drive" \
  -H "Content-Type: application/json" \
  -d '{"file_path": "uploads/abc123/merged.pdf"}'

# Export to Confluence
curl -X POST "http://localhost:8000/export-to-confluence" \
  -H "Content-Type: application/json" \
  -d '{
    "space": "TEAM",
    "title": "Meeting Notes",
    "content": "Summary of the document..."
  }'

3. Command Line Interface (CLI)

The CLI.py tool provides convenient commands for common tasks:

# Start the development server
python CLI.py run

# Run all tests
python CLI.py test

# Deploy to Google Cloud Run
python CLI.py deploy

# Build Docker image
python CLI.py docker-build --tag raggamuffin:latest

# Run Docker container
python CLI.py docker-run --tag raggamuffin:latest --port 8000

🧪 Testing

Raggamuffin includes a comprehensive test suite covering unit tests, API integration tests, and end-to-end tests.

Run All Tests

python CLI.py test

Or directly with pytest:

pytest tests/

Test Categories

Unit Tests: tests/test_core.py - Core functionality (PDF merging, text extraction)
API Tests: tests/test_api.py - API endpoint validation
Integration Tests: tests/test_api_integration.py - Full workflow tests
Service Tests: tests/test_services.py - External service integrations
E2E Tests: tests/test_frontend_e2e.py - Browser-based UI tests (requires Playwright)

End-to-End Testing with Playwright

For browser-based testing:

# Install Playwright
pip install playwright pytest-playwright
python -m playwright install

# Run E2E tests
pytest tests/test_frontend_e2e.py

🚢 Deployment

Docker Deployment

Build the Docker image:
```
docker build -t raggamuffin:latest .
```

Run the container:

docker run -p 8000:8000 --env-file .env raggamuffin:latest

Google Cloud Run Deployment

Ensure gcloud CLI is installed and configured:

gcloud auth login
gcloud config set project YOUR_PROJECT_ID

Deploy using the CLI tool:

python CLI.py deploy

Or manually:

gcloud run deploy raggamuffin-api \
  --source . \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GOOGLE_API_KEY=your_key_here

Set environment variables in the Cloud Run console or with gcloud:

gcloud run services update raggamuffin-api \
  --update-env-vars GOOGLE_API_KEY=your_key,ELEVENLABS_API_KEY=your_key

Static Frontend Deployment (GitHub Pages)

The docs/ directory contains a static HTML frontend that can be deployed to GitHub Pages:

Update the API URL in docs/config.js:

const API_BASE_URL = 'https://your-cloud-run-url.run.app';

Enable GitHub Pages in your repository settings:
- Go to Settings → Pages
- Source: Deploy from a branch
- Branch: main, folder /docs
Access your frontend at: https://yourusername.github.io/rgmfn

🏗️ System Architecture

Technology Stack

Frontend: HTML5/CSS3 with Tailwind CSS and glassmorphism UI
Backend: FastAPI (Python 3.10+) with async endpoints
LLM Engine: Google GenAI (Gemini 1.5 Flash)
Vector Search: Gemini embeddings with dot-product retrieval
PDF Processing: pypdf library
Voice: ElevenLabs TTS API
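
To make "dot-product retrieval" concrete, here is a minimal pure-Python sketch that ranks chunk embeddings against a query embedding. The names `dot` and `top_k` are assumptions for illustration, and the vectors are toy values; in the application the embeddings come from Gemini and the real code in `rag_engine.py` may be organized differently.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k chunks with the highest dot-product score."""
    scored = [(dot(query_vec, vec), index) for index, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [index for _, index in scored[:k]]
```

When the embeddings are normalized to unit length, the dot product equals cosine similarity, so the ranking is the same for both measures.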

Directory Structure

raggamuffin/
├── docs/                    # Static frontend + deployment docs
│   ├── index.html           # Static frontend entry point
│   └── config.js            # API base URL configuration
├── templates/               # Jinja2 templates for FastAPI UI
├── tests/                   # Test suite (unit, API, E2E)
├── uploads/                 # Session-based PDF storage
├── datadog_config/          # Datadog dashboard and monitor JSON
├── static/                  # Static assets
├── CLI.py                   # CLI tool for run/test/deploy
├── main.py                  # FastAPI entry point and routes
├── rag_engine.py            # Gemini embeddings + RAG logic
├── pdf_engine.py            # PDF merge logic
├── drive_service.py         # Google Drive upload service
├── confluence_service.py    # Confluence export service
├── elevenlabs_service.py    # ElevenLabs TTS integration
├── monitoring_service.py    # Datadog/Sentry/PostHog helpers
├── requirements.txt         # Python dependencies
├── Dockerfile               # Container configuration
└── .env.example             # Environment variables template

Data Flow

Ingestion: User uploads PDFs via the dashboard or API
Processing:
- Merge: pdf_engine.py concatenates PDFs
- Chat: rag_engine.py extracts text, chunks it, and builds embeddings
Retrieval: Vector search finds relevant chunks for user queries
Generation: Gemini generates contextual responses
Delivery:
- File download
- Optional Drive/Confluence export
- Optional audio streaming
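
Between the Retrieval and Generation steps, the retrieved chunks have to be packed into a prompt for the model. The sketch below shows one common way to do that. The function `build_prompt`, the prompt wording, and the `max_chars` budget are all assumptions for illustration and are not taken from the project's code.

```python
def build_prompt(query: str, chunks: list[str], max_chars: int = 6000) -> str:
    """Assemble a grounded prompt from retrieved chunks (hypothetical helper)."""
    context_parts = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        entry = f"[{number}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the context budget
        context_parts.append(entry)
        used += len(entry)

    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
```

Because chunks are added in ranked order, the budget drops the least relevant chunks first.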

🔧 Troubleshooting

Common Issues

"Session expired" error when chatting

Problem: The server restarted and lost the in-memory RAG session.

Solution: Re-upload your PDF to create a new session. For production, consider implementing persistent session storage.

Google Drive upload returns a stub ID

Problem: Google Drive OAuth requires a local browser flow that doesn't work on Cloud Run.

Solution:

For local development: Run the OAuth flow locally to generate token.json
For production: Use a service account instead of OAuth

CORS errors when using the static frontend

Problem: The API is rejecting requests from your GitHub Pages domain.

Solution: Update the CORS_ORIGINS environment variable to include your GitHub Pages URL:

CORS_ORIGINS=https://yourusername.github.io
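
Browsers send the origin as scheme plus host (and port, if any), with no path and no trailing slash, so the value must match that form exactly. The sketch below shows how a comma-separated `CORS_ORIGINS` value could be normalized before being handed to the CORS middleware. `parse_cors_origins` is a hypothetical helper, not the project's own parsing code.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value into clean origins (hypothetical helper)."""
    origins = []
    for item in raw.split(","):
        origin = item.strip().rstrip("/")  # drop stray spaces and trailing slashes
        if origin and origin not in origins:  # skip empty entries and duplicates
            origins.append(origin)
    return origins
```

For a GitHub Pages site served at `https://yourusername.github.io/rgmfn`, the origin to list is `https://yourusername.github.io`, without the repository path.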

"API key not found" errors

Problem: The GOOGLE_API_KEY environment variable is not set.

Solution:

Verify that the .env file exists and contains GOOGLE_API_KEY=your_key
If using Docker, ensure --env-file .env is included
If using Cloud Run, set environment variables in the console

Tests failing with import errors

Problem: Missing test dependencies.

Solution: Install development dependencies:

pip install -r requirements-dev.txt

ElevenLabs voice not working

Problem: Missing or invalid ElevenLabs API key.

Solution:

Get an API key from ElevenLabs
Add it to .env: ELEVENLABS_API_KEY=your_key
Restart the server

📚 Additional Documentation

Full Setup Guide: See docs/manual_steps.md for detailed step-by-step instructions
Deployment Reference: See docs/deployment.md for deployment configuration
API Documentation: Visit the /docs endpoint while the server is running for the interactive Swagger UI
Future Roadmap: See docs/future_upgrades.md for planned features

📝 Notes and Constraints

Google Drive: Uploads use a local OAuth flow and will not work on Cloud Run without a service account or alternate auth flow. When credentials are missing, the upload returns a stub ID for demo purposes.
FAISS: Included as an optional dependency but not required for the default embedding flow.
Static Frontend: Reads the API base URL from docs/config.js. Ensure it matches your backend URL and CORS settings.
Session Storage: RAG sessions are stored in-memory and will be lost on server restart. For production, implement persistent storage.
Voice Limits: Text-to-speech is limited to 300 characters to prevent excessive API usage.
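
Clients that call `/stream-audio` directly may want to shorten long answers before sending them. The sketch below clips text to the 300-character limit at a word boundary. `clip_for_tts` is a hypothetical client-side helper; whether the server itself truncates or rejects longer text is not specified here.

```python
MAX_TTS_CHARS = 300  # limit stated in the note above


def clip_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> str:
    """Collapse whitespace and clip text to the limit at a word boundary."""
    text = " ".join(text.split())  # normalize runs of whitespace
    if len(text) <= limit:
        return text
    if text[limit] == " ":
        return text[:limit]  # the cut already falls between two words
    head, sep, _ = text[:limit].rpartition(" ")
    # fall back to a hard cut when the clipped text is one long word
    return head if sep else text[:limit]
```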

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

See LICENSE file for details.

🆘 Support

For issues and questions:

Open an issue on GitHub
Check existing documentation in the docs/ folder
Review the API documentation at the /docs endpoint

Built with ❤️ using Google Gemini, FastAPI, and modern web technologies.
