A vectorless RAG (Retrieval-Augmented Generation) implementation that uses tools such as ripgrep and semantic document analysis instead of a traditional vector database.
- PDF Document Processing: Upload a PDF; the system extracts each page's text and uses AI to generate a summary, tags, and a list of references for that page
- AI-Powered Query Processing: GPT-4 uses tool calling to search the document, deciding which tools to run to gather the context a question needs
- Semantic Search: Find relevant content using natural language queries
- Reference Following: Automatically discover and follow cross-references between document pages
- Type-Safe API: Built with tRPC for end-to-end type safety
- Runtime: Bun
- Framework: Next.js 14 with App Router
- Frontend: React with Tailwind CSS
- Backend: Next.js API Routes with tRPC
- AI: Vercel AI SDK with OpenAI GPT-4
- PDF Processing: pdf-parse
- Install dependencies:
  bun install
- Environment setup: copy .env.example to .env.local and add your OpenAI API key:
  cp .env.example .env.local
  Then edit .env.local:
  OPENAI_API_KEY=your_openai_api_key_here
- Development:
  bun run dev
- Production:
  bun run build
  bun run start
- Upload: The user uploads a PDF document (100+ pages)
- Processing: The backend extracts the text of each page and creates a DocumentPage structure for it
- Analysis: The AI generates a summary and tags for each page and extracts its references
- Query: The user asks questions about the document
- Search: The AI agent calls tools to gather relevant context across multiple pages
- Answer: The system returns an answer that cites the source pages
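In the analysis step, the AI model extracts each page's references. As a hypothetical cheap pre-pass (not something the project is described as doing), explicit forms such as "Section 4.2", "§ 7", or "Chapter 3" can also be caught with regular expressions before the model runs. The patterns and the `extractReferences` name below are assumptions; a minimal TypeScript sketch:

```typescript
// Hypothetical regex pre-pass for explicit cross-references.
// The project itself relies on the AI model for reference extraction.
// Each entry: normalized label + pattern whose first capture group is the number.
const REFERENCE_PATTERNS: Array<[label: string, pattern: RegExp]> = [
  ["Section", /(?:\bSection|\bSec\.|§)\s*(\d+(?:\.\d+)*)/gi],
  ["Chapter", /\bChapter\s+(\d+)/gi],
  ["Appendix", /\bAppendix\s+([A-Z])\b/g],
];

// Returns deduplicated, sorted references such as "Section 4.2" or "Chapter 3".
function extractReferences(text: string): string[] {
  const found = new Set<string>();
  for (const [label, pattern] of REFERENCE_PATTERNS) {
    // matchAll requires the global flag, which every pattern above sets.
    for (const match of Array.from(text.matchAll(pattern))) {
      found.add(`${label} ${match[1]}`);
    }
  }
  return Array.from(found).sort();
}

// Usage: extractReferences("See Section 4.2 and Chapter 3.") returns ["Chapter 3", "Section 4.2"]
```

Normalizing "§ 7" and "Sec. 7" to "Section 7" gives the search tools one canonical string to match on; anything less regular (for example "the preceding paragraph") would still need the model.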
Each page contains:
- pageNbr: PDF page number
- pageContentNbr: Page number shown in the document (optional)
- content: Extracted text content
- summary: AI-generated 1-5 sentence summary
- tags: Semantic topic tags for retrieval
- references: Section numbers, chapter references, etc.
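A minimal TypeScript sketch of this page record, assuming the field names map directly onto an interface. The concrete types (for example `string` for `pageContentNbr`, so that printed labels like "iv" fit) and the hypothetical `createPendingPages` helper are assumptions; the helper only shows where the processing step hands pages off to the AI analysis step.

```typescript
// Sketch of the per-page record. Field names follow the list above;
// the concrete types are assumptions.
interface DocumentPage {
  pageNbr: number;         // 1-based page index within the PDF file
  pageContentNbr?: string; // label printed on the page, e.g. "iv" or "12" (optional)
  content: string;         // extracted text
  summary: string;         // AI-generated, 1-5 sentences
  tags: string[];          // semantic topic tags
  references: string[];    // e.g. "Section 4.2", "Chapter 3"
}

// Hypothetical helper: wrap the raw text of each page in a DocumentPage whose
// AI-generated fields are still empty; the analysis step fills them in later.
// How the PDF text is split into pages is not shown here.
function createPendingPages(pageTexts: string[]): DocumentPage[] {
  return pageTexts.map((text, index) => ({
    pageNbr: index + 1,
    content: text.trim(),
    summary: "",
    tags: [],
    references: [],
  }));
}
```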
The query agent has access to:
- search_pages: Find pages containing keywords
- get_page_content: Retrieve full page content
- search_references: Find pages referencing specific sections
- get_related_pages: Find semantically related pages
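The four tools can be sketched as plain functions over an in-memory array of pages. This is an illustration under stated assumptions, not the project's code: case-insensitive substring matching stands in for ripgrep, and `get_related_pages` ranks pages by tag overlap (Jaccard similarity), a swapped-in heuristic that may differ from how the project defines "semantically related". The camelCase function names are hypothetical.

```typescript
// Minimal subset of the DocumentPage fields these tools need.
interface DocumentPage {
  pageNbr: number;
  content: string;
  tags: string[];
  references: string[];
}

// search_pages: pages whose content contains every keyword (case-insensitive).
function searchPages(pages: DocumentPage[], keywords: string[]): number[] {
  const needles = keywords.map((k) => k.toLowerCase());
  return pages
    .filter((p) => {
      const haystack = p.content.toLowerCase();
      return needles.every((n) => haystack.includes(n));
    })
    .map((p) => p.pageNbr);
}

// get_page_content: full text of one page, or undefined if it does not exist.
function getPageContent(pages: DocumentPage[], pageNbr: number): string | undefined {
  return pages.find((p) => p.pageNbr === pageNbr)?.content;
}

// search_references: pages whose extracted references include the given one.
function searchReferences(pages: DocumentPage[], reference: string): number[] {
  const target = reference.toLowerCase();
  return pages
    .filter((p) => p.references.some((r) => r.toLowerCase() === target))
    .map((p) => p.pageNbr);
}

// get_related_pages (assumed heuristic): other pages ranked by Jaccard
// similarity of tag sets; pages sharing no tags are dropped.
function getRelatedPages(pages: DocumentPage[], pageNbr: number, limit = 3): number[] {
  const source = pages.find((p) => p.pageNbr === pageNbr);
  if (!source) return [];
  const sourceTags = new Set(source.tags);
  return pages
    .filter((p) => p.pageNbr !== pageNbr)
    .map((p) => {
      const shared = Array.from(new Set(p.tags)).filter((t) => sourceTags.has(t)).length;
      const union = new Set([...source.tags, ...p.tags]).size;
      return { pageNbr: p.pageNbr, score: union === 0 ? 0 : shared / union };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score || a.pageNbr - b.pageNbr)
    .map((s) => s.pageNbr)
    .slice(0, limit);
}
```

In the actual agent, functions like these would be registered as tools with the Vercel AI SDK so that GPT-4 can call them; that wiring is omitted here.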
The tRPC API provides:
- uploadDocument: Process and store PDF documents
- queryDocument: Query a document in natural language
- getDocuments: List all uploaded documents
Perfect for:
- Legal documents with complex cross-references
- Technical manuals with section references
- Academic papers with citations
- Regulatory documents with circular references
The system creates an Abstract Syntax Tree-like structure for documents, enabling:
- Semantic understanding without vectors
- Tool-based search and retrieval
- Cross-reference following
- Multi-step reasoning across document sections
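Cross-reference following can be modeled as a graph traversal over this structure. The sketch below assumes a hypothetical `SectionIndex` that maps a reference such as "Section 4.2" to the pages where that section is defined (the README does not say how such an index is built), and walks references breadth-first up to `maxDepth` hops. The visited set keeps circular references, common in regulatory documents, from looping forever.

```typescript
// Only the fields needed for traversal.
interface PageNode {
  pageNbr: number;
  references: string[];
}

// Hypothetical index: reference string -> pages where that section is defined.
type SectionIndex = Map<string, number[]>;

// Breadth-first traversal of cross-references starting at one page.
// Returns page numbers in discovery order, starting page first.
function followReferences(
  pages: PageNode[],
  sectionIndex: SectionIndex,
  startPage: number,
  maxDepth = 2,
): number[] {
  const byNbr = new Map(pages.map((p) => [p.pageNbr, p] as const));
  const visited = new Set<number>([startPage]);
  const order: number[] = [startPage];
  let frontier: number[] = [startPage];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: number[] = [];
    for (const pageNbr of frontier) {
      const page = byNbr.get(pageNbr);
      if (!page) continue;
      for (const ref of page.references) {
        for (const target of sectionIndex.get(ref) ?? []) {
          // Skipping already-visited pages is what breaks reference cycles.
          if (!visited.has(target)) {
            visited.add(target);
            order.push(target);
            next.push(target);
          }
        }
      }
    }
    frontier = next;
  }
  return order;
}
```

Capping the depth bounds how many pages end up in the model's context; the agent can call the traversal again from a later page if it needs to go further.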