Document extraction and semantic search CLI with MCP integration. Extract structured data from invoices, receipts, and bank statements using Vision AI.
- 🔍 Document Extraction: Extract structured data from PDFs and images using Vision AI
- 🦙 Ollama-First: Privacy-first default using the local llama3.2-vision model
- 🔧 Zero Setup: Auto-installs Ollama via Homebrew if needed, auto-pulls models
- 📄 Multi-Format: Supports PDFs and images (PNG, JPEG, WebP)
- 🔬 OCR-Enhanced: Uses Tesseract.js for accurate text extraction from receipts
- 💾 Local Storage: All data persists to local SQLite database
- 🔎 Semantic Search: Natural language search over indexed documents (coming soon)
- 🤖 MCP Integration: Use via Claude Desktop or any MCP-compatible assistant
- 🔒 Privacy-First: Data stays on your machine (unless you opt for cloud AI)
npm install -g doc-agent
Extract document data (uses Ollama by default):
doc extract invoice.pdf
💡 Don't have Ollama? No problem! The CLI will offer to install it for you via Homebrew.
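To run the extraction over several files from a script, one option is to build one `doc extract <file>` invocation per supported file. The sketch below is only an illustration: `isSupported` and `buildExtractCommands` are hypothetical helpers, not part of doc-agent, and the extension list is an assumption derived from the formats named in the feature list.

```typescript
// Formats the README lists as supported: PDFs and images (PNG, JPEG, WebP).
// Assumption: these extension spellings; the CLI may detect formats differently.
const SUPPORTED_EXTENSIONS = [".pdf", ".png", ".jpg", ".jpeg", ".webp"];

// Hypothetical helper: true when the file name ends in a supported extension.
function isSupported(file: string): boolean {
  const lower = file.toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Hypothetical helper: one argv array per supported file, mirroring the
// `doc extract <file>` command shown above. Unsupported files are skipped.
function buildExtractCommands(files: string[]): string[][] {
  return files.filter(isSupported).map((file) => ["doc", "extract", file]);
}

const commands = buildExtractCommands(["invoice.pdf", "receipt.JPG", "notes.txt"]);
// commands now holds two entries: one for invoice.pdf and one for receipt.JPG.
```

Each array can then be handed to Node's `spawnSync(cmd[0], cmd.slice(1), { stdio: "inherit" })` from `node:child_process`. Keeping the command construction separate from the process call makes the filtering easy to test without Ollama installed.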
With Gemini (cloud, higher accuracy):
export GEMINI_API_KEY=your_key_here
doc extract invoice.pdf --provider gemini
Start MCP server:
doc mcp
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "doc-agent": {
      "command": "npx",
      "args": ["-y", "doc-agent", "mcp"],
      "env": {
        "GEMINI_API_KEY": "your_key_here"
      }
    }
  }
}
Then in Claude Desktop:
"Extract data from ~/Downloads/invoice.pdf"
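If claude_desktop_config.json already lists other MCP servers, the doc-agent entry has to be merged in rather than pasted over the file. A minimal sketch of that merge follows; `withDocAgent` and the two interfaces are hypothetical names introduced here, and omitting the `env` block when no key is supplied is an assumption based on Ollama being the default provider.

```typescript
// Shape of one entry under "mcpServers", matching the JSON shown above.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Only the "mcpServers" key is modeled; other keys are passed through untouched.
interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: return a copy of `config` with the doc-agent server added.
function withDocAgent(
  config: ClaudeDesktopConfig,
  geminiApiKey?: string,
): ClaudeDesktopConfig {
  const entry: McpServerEntry = {
    command: "npx",
    args: ["-y", "doc-agent", "mcp"],
  };
  // Assumption: the env block is only needed when the Gemini provider is used.
  if (geminiApiKey) {
    entry.env = { GEMINI_API_KEY: geminiApiKey };
  }
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), "doc-agent": entry },
  };
}

// Example: an existing config with one unrelated server keeps that server.
const existing: ClaudeDesktopConfig = {
  mcpServers: { other: { command: "node", args: ["server.js"] } },
};
const merged = withDocAgent(existing, "your_key_here");
const mergedJson = JSON.stringify(merged, null, 2);
```

The function returns a new object instead of mutating its input, so the original file contents can be kept as a backup before `mergedJson` is written out.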
# Clone and install dependencies
git clone https://github.com/prosdevlab/doc-agent
cd doc-agent
pnpm install
# Build the project
pnpm build
# Run CLI locally
pnpm dev extract examples/invoice.pdf
# Run tests
pnpm test
# Start MCP server
pnpm mcp
The CLI is built with Ink (React for CLIs) for rich interactive output. The code is split into the following packages:
packages/
├── cli/ # Ink-based CLI with services, hooks, and components
├── core/ # Shared types and interfaces
├── extract/ # Document extraction (Gemini, Ollama) + OCR
├── storage/ # SQLite persistence (Drizzle ORM)
└── vector-store/ # Vector database for semantic search
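One way to picture how these packages could fit together is a shared extractor interface in `core`, provider implementations in `extract`, and a persistence interface backed by `storage`. The sketch below is purely illustrative: every type and function name in it is hypothetical and not taken from the doc-agent source, and the in-memory store stands in for the SQLite layer.

```typescript
// Hypothetical shared types, of the kind packages/core might hold.
type ProviderName = "ollama" | "gemini";

interface ExtractedDocument {
  type: "invoice" | "receipt" | "bank_statement";
  fields: Record<string, string | number>;
}

// Hypothetical contract each provider in packages/extract would implement.
interface Extractor {
  readonly name: ProviderName;
  extract(filePath: string): Promise<ExtractedDocument>;
}

// Hypothetical contract for packages/storage; returns the new record's id.
interface DocumentStore {
  save(doc: ExtractedDocument): Promise<number>;
}

// In-memory stand-in for the SQLite-backed store, for illustration only.
class InMemoryStore implements DocumentStore {
  readonly docs: ExtractedDocument[] = [];
  async save(doc: ExtractedDocument): Promise<number> {
    this.docs.push(doc);
    return this.docs.length; // 1-based id
  }
}

// Hypothetical glue of the kind the CLI layer might provide:
// run one extraction, then persist the result.
async function extractAndStore(
  extractor: Extractor,
  store: DocumentStore,
  filePath: string,
): Promise<number> {
  const doc = await extractor.extract(filePath);
  return store.save(doc);
}
```

Coding against interfaces like these would let the CLI swap Ollama for Gemini, or a test double for either, without touching the storage code.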
See ROADMAP.md for the project plan.
License: MIT