doc-agent

Document extraction and semantic search CLI with MCP integration. Extract structured data from invoices, receipts, and bank statements using Vision AI.

Features

🔍 Document Extraction: Extract structured data from PDFs and images using Vision AI
🦙 Ollama-First: Privacy-first default using local llama3.2-vision model
🔧 Zero Setup: Offers to install Ollama via Homebrew if it is missing, and pulls models automatically
📄 Multi-Format: Supports PDFs and images (PNG, JPEG, WebP)
🔬 OCR-Enhanced: Uses Tesseract.js for accurate text extraction from receipts
💾 Local Storage: All data persists to local SQLite database
🔎 Semantic Search: Natural language search over indexed documents (coming soon)
🤖 MCP Integration: Use via Claude Desktop or any MCP-compatible assistant
🔒 Privacy-First: Data stays on your machine (unless you opt for cloud AI)
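As a rough illustration of the Multi-Format feature, the sketch below checks whether a file name has one of the listed formats before it is handed to the CLI. The function name `isSupportedDocument` and the inclusion of the `.jpg` spelling are assumptions made for illustration; this is not doc-agent's actual validation logic.

```typescript
// Extensions taken from the feature list above (PDF, PNG, JPEG, WebP).
// "jpg" is added as an assumed alias for JPEG.
const SUPPORTED_EXTENSIONS = new Set(["pdf", "png", "jpeg", "jpg", "webp"]);

// Hypothetical helper: returns true when the file name ends in a supported extension.
function isSupportedDocument(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) {
    return false; // no extension at all
  }
  const extension = fileName.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.has(extension);
}
```

A script could use this to filter a folder of downloads before calling `doc extract` on each file.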

Quick Start

Installation

npm install -g doc-agent

Usage

Extract document data (uses Ollama by default):

doc extract invoice.pdf

💡 Don't have Ollama? No problem! The CLI will offer to install it for you via Homebrew.

Extract with Gemini instead (cloud provider, higher accuracy, requires an API key):

export GEMINI_API_KEY=your_key_here
doc extract invoice.pdf --provider gemini
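When scripting the two invocations above, the provider choice can be captured in a small helper. The following is a minimal sketch, not part of doc-agent: `buildExtractArgs` and the `Provider` type are hypothetical names, and the only CLI syntax it relies on is what this README shows (`doc extract <file>` and `--provider gemini`). It omits the flag for Ollama because that is the documented default.

```typescript
// Hypothetical helper for scripts that wrap the `doc` CLI.
type Provider = "ollama" | "gemini";

function buildExtractArgs(
  file: string,
  provider: Provider = "ollama",
  env: Record<string, string | undefined> = {},
): string[] {
  // The Gemini example above exports GEMINI_API_KEY first, so fail early without it.
  if (provider === "gemini" && !env.GEMINI_API_KEY) {
    throw new Error("GEMINI_API_KEY must be set to use the gemini provider");
  }
  // Ollama is the default provider, so no flag is passed for it.
  return provider === "gemini"
    ? ["extract", file, "--provider", "gemini"]
    : ["extract", file];
}
```

The resulting array can be passed to Node's `execFile("doc", args)` from `node:child_process`, with `process.env` supplied as the `env` argument.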

Start MCP server:

doc mcp

Claude Desktop Integration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "doc-agent": {
      "command": "npx",
      "args": ["-y", "doc-agent", "mcp"],
      "env": {
        "GEMINI_API_KEY": "your_key_here"
      }
    }
  }
}
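If claude_desktop_config.json already lists other MCP servers, the doc-agent entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on an already-parsed config object. The names `withDocAgent`, `McpServerEntry`, and `ClaudeDesktopConfig` are hypothetical, and leaving out `env` when no Gemini key is given is an assumption based on Ollama being the default provider.

```typescript
// Shapes inferred from the JSON snippet above; illustrative only.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Returns a new config with the doc-agent server added; the input is not mutated.
function withDocAgent(
  config: ClaudeDesktopConfig,
  geminiApiKey?: string,
): ClaudeDesktopConfig {
  const entry: McpServerEntry = {
    command: "npx",
    args: ["-y", "doc-agent", "mcp"],
  };
  if (geminiApiKey) {
    entry.env = { GEMINI_API_KEY: geminiApiKey };
  }
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), "doc-agent": entry },
  };
}
```

Reading the file with `JSON.parse`, applying `withDocAgent`, and writing the result back with `JSON.stringify(config, null, 2)` keeps any existing servers intact.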

Then in Claude Desktop:

"Extract data from ~/Downloads/invoice.pdf"

Development

# Clone and install dependencies
git clone https://github.com/prosdevlab/doc-agent
cd doc-agent
pnpm install

# Build the project
pnpm build

# Run CLI locally
pnpm dev extract examples/invoice.pdf

# Run tests
pnpm test

# Start MCP server
pnpm mcp

Architecture

The CLI is built with Ink (React for CLIs) for rich interactive output. The monorepo is split into the following packages:

packages/
├── cli/           # Ink-based CLI with services, hooks, and components
├── core/          # Shared types and interfaces
├── extract/       # Document extraction (Gemini, Ollama) + OCR
├── storage/       # SQLite persistence (Drizzle ORM)
└── vector-store/  # Vector database for semantic search
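To make the split between `core` and `extract` more concrete, the sketch below shows how a shared interface could let the CLI swap between the Ollama and Gemini providers. Every name here (`Extractor`, `ExtractedDocument`, `StubExtractor`, `pickExtractor`) is hypothetical and chosen for illustration; the real types in packages/core may look quite different.

```typescript
// Hypothetical shared types, of the kind a `core` package might export.
interface ExtractedDocument {
  type: "invoice" | "receipt" | "bank_statement";
  fields: Record<string, string>;
}

interface Extractor {
  readonly name: string;
  extract(filePath: string): Promise<ExtractedDocument>;
}

// Stand-in provider; a real one would call Ollama or Gemini here.
class StubExtractor implements Extractor {
  constructor(public readonly name: string) {}

  extract(filePath: string): Promise<ExtractedDocument> {
    return Promise.resolve({ type: "invoice", fields: { source: filePath } });
  }
}

// Looks up a provider by name, failing loudly on an unknown one.
function pickExtractor(extractors: Extractor[], name: string): Extractor {
  const match = extractors.find((extractor) => extractor.name === name);
  if (!match) {
    throw new Error(`Unknown provider: ${name}`);
  }
  return match;
}
```

With this shape, the CLI layer only depends on the `Extractor` interface, so adding another provider would not require changes to the commands that call it.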

Roadmap

See ROADMAP.md for the project plan.

License

MIT
