PDFChat-RAG

A simple example of a Retrieval-Augmented Generation (RAG) application with basic functionality.
PDFChat-RAG is a modular, containerized application that enables semantic search and chat over PDF documents using RAG. Built with Flask, Celery, Ollama, ChromaDB, Redis, and MongoDB, it provides an end-to-end pipeline for uploading PDF documents and conversing with them using local language models.

🚀 Features

  • 📄 Upload PDFs and extract embedded text
  • 🔍 Semantic search over vector embeddings using ChromaDB
  • 💬 Chat interface for querying document content via RAG
  • 🧠 LLM integration using Ollama
  • 🧰 Background processing with Celery & Redis
  • 🗂️ Metadata storage in MongoDB
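The retrieval step behind these features can be sketched in plain Python. This is only an illustration of the general RAG pattern, not the project's code: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for ChromaDB, and every function name and chunk size here is an assumption.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (sizes are illustrative)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    joined = "\n---\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )
```

In a full pipeline, the string returned by `build_prompt` would be sent to the model served by Ollama, and the chunks would come from text extracted from the uploaded PDF.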

🧱 Architecture

[Figure: Overall architecture]

🐳 Dockerized Services

| Container | Role                                 |
|-----------|--------------------------------------|
| flask-app | PDF upload, RAG chat UI/API          |
| ollama    | Hosts and serves LLM models          |
| celery    | Runs background tasks                |
| redis     | Pub/sub and Celery broker            |
| chromadb  | Stores and queries vector embeddings |
| mongodb   | Stores app metadata and user sessions |
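The split between the web container and the background worker can be pictured with a standard-library sketch. It deliberately swaps in `queue.Queue` for the Redis broker, a thread for the Celery worker, and a dict for MongoDB, so it shows only the hand-off pattern. All names and the `indexed:` status format are hypothetical and are not taken from the project.

```python
import queue
import threading
import uuid

jobs = queue.Queue()        # stand-in for the Redis-backed task queue
status = {}                 # stand-in for metadata stored in MongoDB
lock = threading.Lock()     # guards the shared status dict


def submit_pdf(path: str) -> str:
    """What an upload endpoint might do: record the job and return at once."""
    job_id = uuid.uuid4().hex
    with lock:
        status[job_id] = "queued"
    jobs.put((job_id, path))
    return job_id


def worker() -> None:
    """What a background worker might do: pull jobs and process them."""
    while True:
        job_id, path = jobs.get()
        try:
            # Hypothetical processing step: extract text, embed, store vectors.
            with lock:
                status[job_id] = f"indexed:{path}"
        finally:
            jobs.task_done()


def start_worker() -> threading.Thread:
    """Start one daemon worker thread and return it."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    start_worker()
    job = submit_pdf("uploads/example.pdf")
    jobs.join()             # block until the worker has finished the job
    print(job, status[job])
```

The point of the pattern is that the upload request returns immediately with a job id while the slow indexing work happens elsewhere; the UI can then poll the stored status.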

Clone the Repo

git clone https://github.com/darshanz/pdfchat-rag.git
cd pdfchat-rag
  • Note: Create directories on the host machine for storing the uploaded files and map them into the flask-app Docker container. Similarly, create and map host directories for the mongodb data and the ollama models.
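A small script such as the following could create the host directories before the first run. The directory names and the `./data` base path are placeholders, not values from the repository; use whatever paths the volume mappings in your compose file expect.

```python
from pathlib import Path

# Hypothetical directory names; match them to your own volume mappings.
HOST_DIRS = ("uploads", "mongodb-data", "ollama-models")


def create_host_dirs(base: Path, names=HOST_DIRS) -> list[Path]:
    """Create each directory under base if it does not already exist."""
    created = []
    for name in names:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        created.append(path)
    return created


if __name__ == "__main__":
    for directory in create_host_dirs(Path("./data")):
        print(directory.resolve())
```

Each printed path can then be used as the host side of a volume mapping for the corresponding container.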

Start the App

docker-compose up --build

This will start all containers. Access the app at http://localhost:5000
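Because the first build and model downloads can take a while, it may help to wait until the app's port accepts connections before opening the browser. The sketch below is one way to do that with the standard library; port 5000 comes from the URL above, while the function name and the timeout values are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    ok = wait_for_port("localhost", 5000)
    print("app is reachable" if ok else "app did not come up in time")
```

A successful TCP connection only shows that something is listening on the port; it does not confirm that the other containers have finished starting.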

TODO

  • Add support for multiple files
  • User login/auth
  • Save chat history for a given PDF
  • Extract images and tables from the document and include them in chat responses

📜 License

This project is licensed under the MIT License.
