A personal tool for summarizing large English PDFs with AI. SkimPDF breaks your documents into manageable chunks and uses a transformer model to generate concise summaries, which makes it handy for skimming research papers, reports, or any lengthy document.
- ✅ Upload any English PDF and get a summarized output
- ✅ Smart token-aware chunking to handle large files
- ✅ Abstractive summarization powered by `facebook/bart-large-cnn`
- ✅ Markdown-based parsing with `pymupdf4llm` for structured extraction
- ✅ Sleek React-based frontend (in a separate repo)
- ✅ CORS-enabled API for easy frontend integration
| Layer | Tech |
|---|---|
| Backend | FastAPI, HuggingFace Transformers, PyMuPDF, pymupdf4llm |
| Frontend | React (modern, clean UI) |
| Model | facebook/bart-large-cnn |

    git clone https://github.com/Anand-Raut/skimpdf.git
    cd skimpdf
    pip install -r requirements.txt
    uvicorn main:app --reload

Returns:

    { "isrunning": true }

Body: `multipart/form-data` with field `pdfFile`

Response:

    {
      "filename": "your_uploaded_file.pdf",
      "status": "received",
      "summary": "Summarized text here..."
    }

- User uploads a PDF from the React frontend
- Backend converts PDF to Markdown via `pymupdf4llm`
- Text is chunked based on token count using tokenizer logic
- Each chunk is summarized using `facebook/bart-large-cnn`
- Summaries are merged and sent back as the final response
SkimPDF is designed for local use only. Run it on your machine, connect it with the frontend, and you’re good to go.
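For testing the local API without the React frontend, a small client along these lines may help. It is a sketch under assumptions: the field name `pdfFile` and the `summary` key come from the API description above, but the upload route is not stated in this README, so the `/upload` path in the usage comment is a guess, and the helper names `build_multipart` and `upload_pdf` are made up for illustration. Port 8000 is uvicorn's default. Only the standard library is used.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name: str, filename: str, data: bytes,
                    content_type: str = "application/pdf"):
    """Build a single-file multipart/form-data body and its Content-Type header."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(url: str, pdf_path: str) -> dict:
    """POST a PDF under the `pdfFile` field and return the parsed JSON response."""
    with open(pdf_path, "rb") as fh:
        body, ctype = build_multipart("pdfFile", os.path.basename(pdf_path), fh.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (the "/upload" route is an assumption; check main.py for the real path):
#   result = upload_pdf("http://127.0.0.1:8000/upload", "paper.pdf")
#   print(result["summary"])
```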
- Async background summarization
- Smarter semantic chunking
- PDF upload frontend (styled, responsive)
- Optional Docker support
Anand Raut ([GitHub Profile](https://github.com/Anand-Raut))
MIT License — Open source, use it, tweak it, improve it. If it breaks, you get to keep both pieces.