DocuChat is a Next.js app for document-grounded chat. Upload files, index them in Pinecone, and ask questions against retrieved context.
- Upload flow for PDF, DOCX, TXT, MD, and image files (PNG/JPG/JPEG)
- Text extraction pipeline:
  - PDF via pdf-parse
  - DOCX via mammoth
  - Images via tesseract.js OCR
  - TXT/MD as UTF-8 text
- Chunking and embedding pipeline with LangChain
- Pinecone-backed vector retrieval
- Chat endpoint grounded in retrieved chunks with source metadata
- Sources panel in the UI showing file + chunk provenance
- BYOK (Bring Your Own Key) in the UI via local storage
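The extraction and chunking steps listed above can be sketched roughly as follows. This is an illustration only, not the project's actual code: `extractorFor`, `chunkText`, the `Chunk` shape, and the default chunk size and overlap are all hypothetical, and the fixed-size character window stands in for whichever LangChain text splitter the app really uses.

```typescript
// Hypothetical names throughout; the real project may organize this differently.
type Extractor = "pdf-parse" | "mammoth" | "tesseract.js" | "utf8";

// Lowercase file extension -> extractor named in the feature list.
const EXTRACTORS = new Map<string, Extractor>([
  ["pdf", "pdf-parse"],
  ["docx", "mammoth"],
  ["png", "tesseract.js"],
  ["jpg", "tesseract.js"],
  ["jpeg", "tesseract.js"],
  ["txt", "utf8"],
  ["md", "utf8"],
]);

// Pick an extractor from the file name, rejecting unsupported types.
function extractorFor(filename: string): Extractor {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  const extractor = EXTRACTORS.get(ext);
  if (extractor === undefined) {
    throw new Error(`Unsupported file type: ${filename}`);
  }
  return extractor;
}

// One indexed piece of a document, carrying the provenance shown in the sources panel.
interface Chunk {
  source: string; // file the chunk came from
  index: number; // position of the chunk within that file
  text: string;
}

// Split text into fixed-size character windows that overlap by `overlap` characters.
// The defaults are assumed values, not the app's settings.
function chunkText(source: string, text: string, size = 1000, overlap = 200): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("size must be positive and overlap must be in [0, size)");
  }
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ source, index: chunks.length, text: text.slice(start, start + size) });
    // Stop once a window reaches the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

Keeping `source` and `index` on every chunk is what would let a chat answer point back to a specific file and chunk.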
- Next.js (App Router) + React + TypeScript
- LangChain
- Pinecone
- OpenAI chat + embeddings
- Tailwind CSS + Framer Motion
- Node.js 20+
- A Pinecone project and index
- OpenAI API key (server key and/or BYOK key in the UI)
Create .env.local in the project root:
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX=docuchat-index
PINECONE_NAMESPACE=optional_namespace
OPENAI_API_KEY=your_openai_api_key
OPENAI_CHAT_MODEL=gpt-4o-mini
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_EMBEDDING_DIMENSION=1536
Notes:
- PINECONE_API_KEY is required.
- If OPENAI_API_KEY is not set on the server, users must set a BYOK key in the app before upload/chat will work.
- If you set OPENAI_EMBEDDING_DIMENSION, it should match the dimension of your Pinecone index.
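The rules in these notes could be checked at startup with something like the sketch below. It is an assumption-laden example rather than the app's real loader: `loadConfig`, `DocuChatConfig`, and the `requiresByok` flag are hypothetical, and the fallback values simply reuse the sample values from the environment file above.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface DocuChatConfig {
  pineconeApiKey: string;
  pineconeIndex: string;
  pineconeNamespace?: string;
  openaiApiKey?: string;
  chatModel: string;
  embeddingModel: string;
  embeddingDimension: number;
  requiresByok: boolean; // true when no server-side OpenAI key is configured
}

// Return the trimmed value, or undefined when the variable is missing or blank.
function readVar(env: Record<string, string | undefined>, name: string): string | undefined {
  const value = env[name]?.trim();
  return value ? value : undefined;
}

function loadConfig(env: Record<string, string | undefined>): DocuChatConfig {
  const pineconeApiKey = readVar(env, "PINECONE_API_KEY");
  if (!pineconeApiKey) {
    throw new Error("PINECONE_API_KEY is required");
  }

  // Assumed fallback: the sample dimension from the environment file.
  const rawDimension = readVar(env, "OPENAI_EMBEDDING_DIMENSION") ?? "1536";
  const embeddingDimension = Number(rawDimension);
  if (!Number.isInteger(embeddingDimension) || embeddingDimension <= 0) {
    throw new Error(`OPENAI_EMBEDDING_DIMENSION must be a positive integer, got "${rawDimension}"`);
  }

  const openaiApiKey = readVar(env, "OPENAI_API_KEY");
  return {
    pineconeApiKey,
    // Assumed fallbacks: the sample values from the environment file.
    pineconeIndex: readVar(env, "PINECONE_INDEX") ?? "docuchat-index",
    pineconeNamespace: readVar(env, "PINECONE_NAMESPACE"),
    openaiApiKey,
    chatModel: readVar(env, "OPENAI_CHAT_MODEL") ?? "gpt-4o-mini",
    embeddingModel: readVar(env, "OPENAI_EMBEDDING_MODEL") ?? "text-embedding-3-small",
    embeddingDimension,
    requiresByok: openaiApiKey === undefined,
  };
}
```

On the server this would typically be called as `loadConfig(process.env)`. Failing fast on a missing Pinecone key or a malformed dimension tends to surface configuration problems before the first upload rather than during it.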
npm install
npm run dev
Open http://localhost:3000.
- Open the Upload view and import a document.
- Wait for processing + indexing to complete.
- Open Chat and ask questions about the uploaded content.
- Optional: use the BYOK button in the header to store your OpenAI key in your browser.
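For the optional BYOK step, storing the key in the browser might look roughly like this. The sketch is hypothetical: the storage key name `docuchat.openaiKey`, the helper names, and the choice to clear the entry on blank input are assumptions, not taken from the app. The helpers accept any object with the three Web Storage methods used here, so `window.localStorage` works in the browser and the in-memory store works elsewhere.

```typescript
// Minimal subset of the Web Storage API, so the helpers also run outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical storage key name; the app may use a different one.
const BYOK_STORAGE_KEY = "docuchat.openaiKey";

// Save the user's OpenAI key; blank input clears any stored key instead.
function saveByokKey(store: KeyValueStore, apiKey: string): void {
  const trimmed = apiKey.trim();
  if (trimmed === "") {
    store.removeItem(BYOK_STORAGE_KEY);
    return;
  }
  store.setItem(BYOK_STORAGE_KEY, trimmed);
}

// Read the stored key, or null when the user has not set one.
function loadByokKey(store: KeyValueStore): string | null {
  return store.getItem(BYOK_STORAGE_KEY);
}

// In-memory stand-in for localStorage, useful in tests or server-side rendering.
class MemoryStore implements KeyValueStore {
  private items = new Map<string, string>();

  getItem(key: string): string | null {
    return this.items.get(key) ?? null;
  }

  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }

  removeItem(key: string): void {
    this.items.delete(key);
  }
}
```

In a client component the same helpers could be called with `window.localStorage` in place of `new MemoryStore()`.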
- npm run dev - start development server
- npm run build - create production build
- npm run start - run production server
- npm run lint - run ESLint