A revolutionary PDF chatbot that uses no vector embeddings or traditional RAG. Instead, it leverages Large Language Models for intelligent document selection and page relevance detection, providing a completely stateless and privacy-first experience.
Traditional PDF chatbots convert documents into vector embeddings for semantic search. This approach:
- ❌ Requires expensive vector databases
- ❌ Needs pre-processing and indexing
- ❌ Stores document data on servers
- ❌ Loses context and nuance in embeddings
Our No-Vector approach:
- ✅ Uses LLM reasoning instead of vectors
- ✅ Processes documents in real-time
- ✅ Completely stateless - no server storage
- ✅ Preserves full document context
- ✅ Privacy-first - documents stay in your browser
```mermaid
graph TD
A[📄 User uploads PDFs] --> B[🧠 LLM Document Selection]
B --> C[🎯 LLM Page Relevance Detection]
C --> D[💬 Contextual Answer Generation]
B --> B1[Analyzes collection description<br/>+ document filenames<br/>+ user question]
C --> C1[Examines actual page content<br/>from selected documents<br/>in parallel processing]
D --> D1[Generates comprehensive answer<br/>with proper citations]
```
**Step 1: Document Selection**
- LLM reads your collection description and document filenames
- Intelligently selects which documents are likely to contain relevant information
- No embeddings needed: uses reasoning and context understanding (see the sketch after this list)

**Step 2: Page Relevance Detection**
- LLM examines actual page content from the selected documents
- Processes multiple documents in parallel for speed
- Identifies the most relevant pages based on the question's context

**Step 3: Answer Generation**
- Uses only the relevant pages to generate accurate answers
- Maintains full document context and nuance
- Provides proper citations and references
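A minimal sketch of the document-selection step, assuming the OpenAI Python SDK; the function name, prompt wording, and model choice are illustrative, not the project's actual code:

```python
# Sketch of step 1 (document selection). Illustrative only: the function
# name, prompt wording, and model choice are assumptions, not project code.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def select_documents(question: str, description: str, filenames: list[str]) -> list[str]:
    """Ask the LLM which files likely contain the answer -- no embeddings involved."""
    prompt = (
        f"Collection description: {description}\n"
        f"Available documents: {json.dumps(filenames)}\n"
        f"Question: {question}\n"
        "Reply with a JSON array of the filenames most likely to contain the answer."
    )
    response = client.chat.completions.create(
        model="gpt-5-mini",  # any of the supported models works here
        messages=[{"role": "user", "content": prompt}],
    )
    # Real code should validate the reply instead of trusting it blindly.
    return json.loads(response.choices[0].message.content)
```

Because the model sees the full description and filename list at once, there is no index or embedding store to keep in sync.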
- Zero Server Storage: Documents processed and stored entirely in your browser
- LocalStorage Persistence: Your documents persist across browser sessions
- No Data Leakage: Document content never persists on servers
- Serverless-Friendly: Perfect for Vercel/Netlify deployments
- Up to 100 PDF documents per session
- Chunked Upload System: Automatically handles large file sets (>4.5MB)
- 4.5MB per file limit - processes substantial documents
- Real-time Processing: No pre-indexing required
- Multi-Model Support: GPT-5, GPT-5-mini, and more
- Parallel Processing: Multiple documents analyzed simultaneously
- Context Preservation: Full document context maintained throughout
- Dynamic Descriptions: Edit collection descriptions anytime
- Responsive Design: Works on desktop and mobile
- Real-time Progress: Visual feedback during uploads and processing
- GitHub Integration: Easy access to source code
- Error Handling: Comprehensive error messages and recovery
- React + Vite: SPA build with lightning-fast dev server and static output
- TypeScript: Type safety and better DX
- Tailwind CSS: Utility-first styling (via PostCSS plugin)
- Lucide React: Clean, consistent icon set
- FastAPI + Uvicorn: High-performance Python API
- PyPDF2: Robust PDF text extraction (see the sketch after this list)
- OpenAI: LLM reasoning for doc/page selection and answers
- Chunked Processing: Efficient multi-file uploads
- Docker Compose: One command to run frontend (Nginx) and backend
- No Databases: Completely stateless architecture
- Nginx: Serves built React app and proxies API
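For illustration, per-page text extraction with PyPDF2 might look like the following; the function name and return shape are assumptions, though `PdfReader` and `extract_text` are the library's real API:

```python
# Illustrative per-page extraction with PyPDF2; names and the return shape
# are assumptions, but PdfReader/extract_text are the library's real API.
from PyPDF2 import PdfReader

def extract_pages(path: str) -> list[dict]:
    """Return one {page, text} record per page for LLM relevance checks."""
    reader = PdfReader(path)
    return [
        {"page": i + 1, "text": page.extract_text() or ""}
        for i, page in enumerate(reader.pages)
    ]
```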
- Node.js 18+ and npm
- Python 3.10+ (if you need to start the backend manually)
- Docker (optional, to run with containers)
- OpenAI API key (required for full functionality)
```bash
git clone https://github.com/yansalim/no-vector_local.git
cd no-vector_local
npm install
```

- Frontend (dev): uses `VITE_API_BASE_URL` to point to the backend
- Backend: uses `OPENAI_API_KEY` (required)

Create `.env` in the backend's root directory (or export the variable):

```bash
OPENAI_API_KEY=your_openai_api_key_here
```

Run the app:

```bash
# starts frontend (Vite) + backend (Uvicorn) in parallel (Windows-friendly)
npm run dev

# frontend only (port 3000)
npm run web

# backend only (port 8000)
npm run backend:win
```

Visit http://localhost:3000 (or 3001 if 3000 is in use).

If the backend is at a different URL on your machine, export it before building the frontend:

```bash
set VITE_API_BASE_URL=http://localhost:8000 && npm run web
```

Or run with Docker:

```bash
docker compose build --no-cache
docker compose up -d
```

Useful environment variables (via `docker-compose.yml`):

- `BACKEND_URL` (passed to the build as `VITE_API_BASE_URL` for the frontend)
- `OPENAI_API_KEY` (backend)
URLs:
- Frontend: http://localhost:3000
- Backend: http://localhost:8000/health
- Click "Add Your First Document" or "Add Files"
- Select up to 100 PDF files (4.5MB each max)
- Add a description of your document collection
- Large uploads are automatically chunked for reliability
- Ask questions about your documents in natural language
- Watch the 3-step process: Document Selection → Page Detection → Answer Generation
- Get detailed answers with timing and cost breakdowns
- Add more documents anytime
- Edit collection descriptions
- Start new sessions as needed
- All data stays in your browser
```
┌─────────────────┐    ┌───────────────┐    ┌─────────────────┐
│     Browser     │    │    FastAPI    │    │   OpenAI API    │
│                 │    │               │    │                 │
│ • LocalStorage  │◄──►│ • /upload     │◄──►│ • GPT Models    │
│ • Document Data │    │ • /chat/stream│    │ • Real-time     │
│ • Chat History  │    │ • /health     │    │   Processing    │
│ • Session State │    │ • Stateless   │    │                 │
└─────────────────┘    └───────────────┘    └─────────────────┘
```
When uploading large document sets (the grouping rule is sketched after this list):
- Size Detection: Frontend calculates total upload size
- Automatic Chunking: Splits into 3.5MB chunks if needed
- Parallel Processing: Each chunk processed independently
- Progressive Results: Documents become available as chunks complete
- Error Recovery: Failed chunks can be retried individually
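The greedy grouping rule behind automatic chunking, sketched in Python for consistency with the other sketches (the real logic lives in the TypeScript frontend); the constant and function names are illustrative:

```python
# The greedy grouping rule behind "Automatic Chunking", sketched in Python
# (the real logic lives in the TypeScript frontend); names are illustrative.
CHUNK_LIMIT = int(3.5 * 1024 * 1024)  # 3.5 MB per chunk

def plan_chunks(file_sizes: list[int]) -> list[list[int]]:
    """Group file indices into upload batches that stay under the limit."""
    chunks: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for i, size in enumerate(file_sizes):
        if current and current_size + size > CHUNK_LIMIT:
            chunks.append(current)  # close the batch before it overflows
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```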
`/upload`: Upload and process PDF documents (a handler sketch follows the list)
- Input: FormData with files and description
- Output: Processed documents with extracted text
- Features: Automatic chunking, progress tracking
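A hedged sketch of the upload handler's shape in FastAPI; only the `/upload` path and the `files`/`description` fields come from the description above, the rest is illustrative:

```python
# Hedged sketch of the upload handler's shape; only the /upload path and the
# files/description fields come from the docs, the rest is illustrative.
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()

@app.post("/upload")
async def upload(
    files: list[UploadFile] = File(...),
    description: str = Form(...),
):
    docs = []
    for f in files:
        data = await f.read()  # processed in memory; nothing is written to disk
        docs.append({"filename": f.filename, "size": len(data)})
    return {"description": description, "documents": docs}
```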
`/chat/stream`: Stream chat responses in real time (a streaming sketch follows the list)
- Input: Question, documents, chat history
- Output: Server-sent events with processing steps
- Features: Real-time progress, cost tracking, citations
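Server-sent events map naturally onto this endpoint; a minimal sketch, where the HTTP method, event names, and payload shape are illustrative assumptions:

```python
# Minimal server-sent-events sketch for /chat/stream; the HTTP method,
# event names, and payload shape are illustrative assumptions.
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/chat/stream")
async def chat_stream(payload: dict):
    async def events():
        # Each processing stage is pushed to the browser as it completes.
        for step in ("document_selection", "page_detection", "answer"):
            yield f"data: {json.dumps({'step': step})}\n\n"
    return StreamingResponse(events(), media_type="text/event-stream")
```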
`/health`: Service health check
- Output: System status and mode information
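The health endpoint can be as small as this sketch; the `/health` path is real, while the exact response fields are assumptions based on the status-and-mode description above:

```python
# Health-check sketch; the /health path is real, the response fields are
# assumptions based on the "status and mode" description above.
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    # Nothing external to probe: no database, no index, no stored documents.
    return {"status": "ok", "mode": "no-vector"}
```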
| Traditional RAG | No-Vector Approach |
|---|---|
| 🗄️ Requires vector database | 🚫 No database needed |
| 📊 Pre-processes to embeddings | 🔄 Real-time processing |
| 💰 Expensive infrastructure | 💸 Serverless & cost-effective |
| 🔒 Stores data on servers | 🛡️ Browser-only storage |
| 📏 Limited by embedding dimensions | 🧠 Full context understanding |
| ⚡ Fast retrieval, lossy context | 🎯 Accurate reasoning, full context |
- Upload: Marketing team uploads 50 company PDFs
- Describe: "Company policies, procedures, and guidelines"
- Ask: "What is our remote work policy?"
- Process:
  - 🧠 LLM selects "HR Handbook" and "Remote Work Guidelines"
  - 🎯 Identifies relevant pages about remote work
  - 💬 Generates comprehensive answer with citations
- Result: Accurate answer in ~15 seconds with cost breakdown
- Multi-format Support: Word docs, PowerPoint, Excel
- Advanced Citations: Highlight exact text passages
- Collaboration Features: Share sessions with team members
- Analytics Dashboard: Usage patterns and insights
- Custom Models: Support for local and custom LLMs
- Batch Operations: Process multiple questions simultaneously
We welcome contributions! This project showcases how modern LLMs can replace traditional vector-based approaches while providing better accuracy and user experience.
MIT License - see LICENSE file for details.
⭐ Star us on GitHub if you find this no-vector approach interesting!
Built with ❤️ by ROE AI Inc.