ceperezegma/Simple-RAG

RAG Tutorial App

A minimal Retrieval-Augmented Generation (RAG) pipeline using OpenAI and ChromaDB to answer questions from your own PDF documents.

Features

  • Initialize OpenAI embeddings and chat models
  • Ingest PDF documents and extract text
  • Split text into overlapping chunks
  • Store embeddings in a persistent ChromaDB collection
  • Load ChromaDB and retrieve relevant chunks
  • Generate responses to user questions using OpenAI Chat API
  • .env support with .gitignore to keep secrets out of version control
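The chunking step above can be sketched as a simple character-based splitter with overlap. This is a minimal illustration (the function name is hypothetical, and the defaults mirror the CHUNK_SIZE/CHUNK_OVERLAP settings documented below); the repo's actual splitter may differ:

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - chunk_overlap  # each chunk starts `step` chars after the previous one
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Overlap keeps sentences that straddle a chunk boundary retrievable from both sides of the cut.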

Setup

  1. Create and activate a virtual environment (optional but recommended).
  2. Install dependencies: pip install -r requirements.txt
  3. Create a .env file in the project root with your OpenAI API key: OPENAI_API_KEY=sk-...

    For backward compatibility, the legacy variable name is also accepted if your .env already uses it:

    OPENAI_KEY=sk-...

Optional environment variables:

  • OPENAI_EMBEDDING_MODEL (default: text-embedding-3-small)
  • OPENAI_CHAT_MODEL (default: gpt-4o-mini)
  • CHROMA_DIR (default: ./chroma)
  • CHROMA_COLLECTION (default: rag_docs)
  • CHUNK_SIZE (default: 1000)
  • CHUNK_OVERLAP (default: 200)
  • TOP_K (default: 3)
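These settings can all be read with plain environment lookups. A hedged sketch (the `load_config` helper is hypothetical; the repo's actual config code may differ), including the legacy OPENAI_KEY fallback noted above:

```python
import os

def load_config() -> dict:
    """Read pipeline settings from the environment, falling back to the documented defaults."""
    return {
        # OPENAI_KEY is accepted as a legacy fallback for OPENAI_API_KEY
        "api_key": os.getenv("OPENAI_API_KEY") or os.getenv("OPENAI_KEY"),
        "embedding_model": os.getenv("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small"),
        "chat_model": os.getenv("OPENAI_CHAT_MODEL", "gpt-4o-mini"),
        "chroma_dir": os.getenv("CHROMA_DIR", "./chroma"),
        "collection": os.getenv("CHROMA_COLLECTION", "rag_docs"),
        "chunk_size": int(os.getenv("CHUNK_SIZE", "1000")),
        "chunk_overlap": int(os.getenv("CHUNK_OVERLAP", "200")),
        "top_k": int(os.getenv("TOP_K", "3")),
    }
```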

Usage

Ingest PDFs from a folder: python -m src.main ingest path\to\pdf_folder

Ask a question: python -m src.main ask "What are the key points?"
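Under the hood, `ask` embeds the question, retrieves the TOP_K most similar stored chunks, and passes them to the chat model as context. The ranking step can be sketched with plain cosine similarity (a toy illustration with hypothetical names; in this project the nearest-neighbour search is delegated to ChromaDB):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k stored chunks most similar to the query embedding."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]
```

The retrieved chunks are then concatenated into the prompt so the chat model answers from your documents rather than from memory alone.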

Notes

  • The Chroma vector store is persisted under CHROMA_DIR, which you may want to add to .gitignore if committing publicly.
  • Ensure your .env is listed in .gitignore (it is included by default in this repo).

About

A simple RAG system to answer questions about a single article.
