New Rag Test App 🤖

An intelligent AI-powered assistant that helps you understand and analyze data models through natural conversation. Built with Streamlit and powered by OpenAI's language models.

🌟 Features

- PDF Document Processing: Automatically extracts and processes text from uploaded PDF files
- Conversational Interface: Natural dialogue with context-aware responses
- Vector Database: Uses ChromaDB with OpenAI embeddings for efficient storage/retrieval
- Streaming Responses: Real-time answer generation with typing indicator
- Session Persistence: Maintains conversation history and document context
- Debug Tools: Inspect retrieved document chunks during conversations

🚀 Quick Start

Clone Repository

```
git clone https://github.com/yourusername/pdfrag.git
cd pdfrag
```

Install Dependencies
```
pip install -r requirements.txt
```

Configure Environment

```
echo 'OPENAI_API_KEY="your-api-key-here"' > .env
```

Organize PDFs
```
mkdir -p data  # Place PDF files here
```

Launch Application
```
streamlit run app.py
```

📋 Requirements

- Python 3.10+
- OpenAI API key
- PDF documents in data/ directory

🔧 Technical Stack

- Framework: Streamlit
- Language Model: GPT-4 via OpenAI API
- Embeddings: text-embedding-3-large
- Vector Database: ChromaDB
- PDF Processing: PyPDFDirectoryLoader

🛠️ Implementation Details

The application follows these key steps:

Document Ingestion:
- Load PDFs from data/ directory
- Split documents into 2000-character chunks
- Generate vector embeddings for each chunk
Vector Storage:
- Store embeddings in ChromaDB with persistent storage
- Enable efficient similarity search
Conversation Flow:
- Maintain chat history in session state
- Combine retrieved documents with LLM context
- Stream responses character-by-character
Error Handling:
- Validate PDF directory existence
- Catch and display processing errors
- Handle missing credentials gracefully
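
Two of the steps above, fixed-size chunking and character-by-character streaming, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in app.py: `chunk_text` and `stream_chars` are hypothetical names, the splitter is a plain fixed-width slice with no overlap (the app's actual splitter and any overlap setting are not documented here), and the delay parameter is arbitrary.

```python
import time
from typing import Iterator


def chunk_text(text: str, chunk_size: int = 2000) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def stream_chars(answer: str, delay: float = 0.0) -> Iterator[str]:
    """Yield the answer so far, one more character each step, for a typing effect."""
    shown = ""
    for ch in answer:
        shown += ch
        if delay:
            time.sleep(delay)  # Optional pause between characters (assumed value).
        yield shown


if __name__ == "__main__":
    chunks = chunk_text("x" * 4500)
    print([len(c) for c in chunks])  # [2000, 2000, 500]
    for partial in stream_chars("Hi!"):
        print(partial)
```

In a Streamlit app, each yielded string could be written into a placeholder created with `st.empty()` so the answer appears to grow in place. Whether app.py does it this way is an assumption.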

🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests.

📝 License

This project is licensed under the terms included in the LICENSE file.

🆘 Troubleshooting

- Ensure all dependencies are correctly installed
- Check that your OpenAI API key is valid and properly set in the .env file
- For PDF processing issues, ensure your PDFs are text-based and not scanned images
- If the vector store isn't loading, check file permissions and storage space
