A robust open source Model Context Protocol (MCP) server for reading and analyzing PDF documents. This server enables AI assistants and tools to seamlessly interact with PDF files through a standardized protocol.
Open Source & Community Driven - Built with ❤️ by the community, for the community.
- Smart Content Analysis: Intelligent PDF content type detection (text, scanned images, mixed, or no content)
- Server Intelligence: New pdf_server_info tool provides comprehensive setup guidance and directory insights
- Enhanced PDF Processing: Read, validate, and extract text with automatic recommendations for next steps
- Workflow Guidance: Context-aware suggestions on when to use asset extraction based on content analysis
- Visual Asset Extraction: Detect and extract images from PDFs with format identification
- Smart Search: Find PDF files with fuzzy search capabilities
- Statistics: Get comprehensive directory and file statistics
- Structured Data Extraction: Extract content with positioning coordinates, formatting, and semantic relationships
- Table Detection: Intelligent table structure recognition and data extraction
- Content Querying: Search and filter extracted content using flexible criteria
- Comprehensive Metadata: Extract document properties, page information, and custom metadata
- Dual Mode Support:
  - Stdio Mode: Standard MCP protocol for AI assistants (Zed, Claude Desktop, etc.)
  - Server Mode: HTTP REST API with SSE transport for web integration
- Production Ready: Comprehensive error handling, logging, and graceful shutdown
- Well Tested: 65-76% test coverage with unit and integration tests
- Easy Integration: Simple installation and configuration
- AI Code Editors: Integrate with Zed editor for PDF document analysis
- Documentation Tools: Extract and analyze technical documentation with structure preservation
- Research Assistants: Process academic papers and research documents with semantic understanding
- Data Extraction: Extract structured data from forms, tables, and formatted documents
- Content Management: Organize and search large PDF collections with intelligent querying
- Web Applications: HTTP API for web-based PDF processing and analysis
If you have Go installed, you can install directly:
# Install directly from GitHub
go install github.com/a3tai/mcp-pdf-reader/cmd/mcp-pdf-reader@latest
# Verify installation
mcp-pdf-reader --help

# Clone the repository
git clone https://github.com/a3tai/mcp-pdf-reader.git
cd mcp-pdf-reader
# Build and install using Go's standard install method
make install
# Ensure Go's bin directory is in your PATH (usually already is)
export PATH="$(go env GOPATH)/bin:$PATH"
# Verify installation
mcp-pdf-reader --help

# Build from source (creates local binary)
make build
# Or install Go dependencies and build locally
go mod tidy
go build -o mcp-pdf-reader cmd/mcp-pdf-reader/main.go
# Or install directly with Go (installs to GOPATH/bin)
go install github.com/a3tai/mcp-pdf-reader/cmd/mcp-pdf-reader@latest

- Go 1.21+ for building from source
- Linux, macOS, or Windows (tested on all platforms)
Perfect for AI assistants and editors like Zed:
# Use current directory for PDFs (default)
mcp-pdf-reader
# Specify PDF directory
mcp-pdf-reader --dir=/path/to/documents
# Debug mode
mcp-pdf-reader --dir=/path/to/documents --log-level=debug

For web applications and REST API access:
# Start HTTP server
mcp-pdf-reader --mode=server --dir=/path/to/documents
# Custom host and port
mcp-pdf-reader --mode=server --host=0.0.0.0 --port=9090 --dir=/docs
# Health check
curl http://localhost:8080/health

| Flag | Default | Description |
|---|---|---|
| --mode | stdio | Server mode: stdio or server |
| --dir | current directory | Directory containing PDF files |
| --host | 127.0.0.1 | Server host (server mode only) |
| --port | 8080 | Server port (server mode only) |
| --log-level | info | Log level: debug, info, warn, error |
| --max-file-size | 104857600 | Maximum PDF file size in bytes (100MB) |
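
The flags in the table compose into a single command line. As a minimal sketch, a launcher script could assemble the arguments as shown here. The `build_command` helper and its `DEFAULTS` mapping are hypothetical and written in Python purely for illustration; only the flag names and default values come from the table.

```python
import shlex

# Defaults copied from the flag table; --dir has no fixed default
# (it falls back to the current directory), so it is not listed here.
DEFAULTS = {
    "mode": "stdio",
    "host": "127.0.0.1",
    "port": 8080,
    "log-level": "info",
    "max-file-size": 104857600,
}


def build_command(binary="mcp-pdf-reader", **options):
    """Return an argv list, skipping flags that equal the documented default."""
    argv = [binary]
    for name, value in options.items():
        flag = name.replace("_", "-")  # log_level -> log-level
        if flag != "dir" and flag not in DEFAULTS:
            raise ValueError(f"unknown flag: --{flag}")
        if DEFAULTS.get(flag) == value:
            continue  # documented default, no need to pass it
        argv.append(f"--{flag}={value}")
    return argv


if __name__ == "__main__":
    cmd = build_command(mode="server", dir="/path/to/documents", port=9090)
    print(shlex.join(cmd))
```

Passing the resulting list to a process launcher (rather than a shell string) avoids quoting problems with directory names that contain spaces.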
# Basic usage (stdio mode for MCP clients) - uses current directory
mcp-pdf-reader
# Specify custom directory
mcp-pdf-reader --dir=/path/to/pdfs
# Server mode for testing/debugging
mcp-pdf-reader --mode=server --dir=./docs
# Custom port and host
mcp-pdf-reader --mode=server --host=0.0.0.0 --port=9090
# Debug mode
mcp-pdf-reader --mode=server --log-level=debug --dir=./docs
# Larger file size limit (200MB)
mcp-pdf-reader --max-file-size=209715200 --dir=./docs
# Environment variables (alternative to flags)
MCP_PDF_DIR=/path/to/pdfs mcp-pdf-reader
MCP_PDF_MODE=server MCP_PDF_PORT=9090 mcp-pdf-reader

| Editor | Config File | Configuration |
|---|---|---|
| Zed | ~/.config/zed/settings.json | "mcp-pdf-reader": {"command": {"path": "mcp-pdf-reader", "args": []}} |
| Cursor | ~/.cursor/settings.json | "mcp-pdf-reader": {"command": "mcp-pdf-reader", "args": ["--dir=${workspaceFolder}"]} |
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json | "mcp-pdf-reader": {"command": "mcp-pdf-reader", "args": ["--dir=/path/to/docs"]} |
| VS Code | .vscode/settings.json | "claude.mcpServers": {"mcp-pdf-reader": {"command": "mcp-pdf-reader", "args": ["--dir=${workspaceFolder}"]}} |
# 1. Verify installation
mcp-pdf-reader --help
# 2. Test with sample directory
mkdir -p ~/test-pdfs
mcp-pdf-reader -mode=server -pdfdir="$HOME/test-pdfs"
# 3. Check health endpoint (server mode)
curl http://localhost:8080/health
# 4. Test MCP tools
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | mcp-pdf-reader

The server provides comprehensive PDF analysis tools via the MCP protocol, including both basic extraction and advanced structured analysis:
Extract text content from a PDF file.
Parameters:
- path (string): Full path to the PDF file
Example:
{
"path": "/home/user/documents/research.pdf"
}

Extract visual assets like images from a PDF file.
Parameters:
- path (string): Full path to the PDF file
Example:
{
"path": "/home/user/documents/presentation.pdf"
}

Validate if a file is a readable PDF.
Parameters:
- path (string): Full path to the PDF file
Example:
{
"path": "/home/user/documents/document.pdf"
}

Get detailed statistics about a PDF file including metadata.
Parameters:
- path (string): Full path to the PDF file
Example:
{
"path": "/home/user/documents/report.pdf"
}

List and search PDF files in a directory with optional fuzzy search.
Parameters:
- directory (string): Directory path to search
- query (string): Optional fuzzy search query
Example:
{
"directory": "/home/user/documents",
"query": "machine learning"
}

Get statistics about PDF files in a directory.
Parameters:
- directory (string): Directory path to analyze
Example:
{
"directory": "/home/user/documents"
}

Extract structured content with positioning coordinates and formatting information.
Parameters:
- path (string): Full path to the PDF file
- mode (string): Extraction mode - "raw", "structured", "semantic", "table", or "complete" (default: "structured")
- config (object): Configuration options
  - extract_text (bool): Extract text content
  - extract_images (bool): Extract images
  - extract_tables (bool): Extract tables
  - extract_forms (bool): Extract form fields
  - extract_annotations (bool): Extract annotations
  - include_coordinates (bool): Include positioning coordinates
  - include_formatting (bool): Include formatting information
  - pages (array): Specific pages to extract (default: all)
  - min_confidence (number): Minimum confidence threshold
Example:
{
"path": "/home/user/documents/form.pdf",
"mode": "structured",
"config": {
"extract_text": true,
"include_coordinates": true,
"include_formatting": true,
"pages": [1, 2, 3]
}
}

Extract tabular data from PDF with structure preservation and cell-level analysis.
Parameters:
- path (string): Full path to the PDF file
- config (object): Configuration options
  - include_coordinates (bool): Include positioning coordinates
  - pages (array): Specific pages to extract (default: all)
  - min_confidence (number): Minimum confidence threshold
Example:
{
"path": "/home/user/documents/spreadsheet.pdf",
"config": {
"include_coordinates": true,
"min_confidence": 0.7
}
}

Extract content with semantic grouping and relationship detection.
Parameters:
- path (string): Full path to the PDF file
- config (object): Configuration options
  - include_coordinates (bool): Include positioning coordinates
  - include_formatting (bool): Include formatting information
  - pages (array): Specific pages to extract (default: all)
  - min_confidence (number): Minimum confidence threshold
Example:
{
"path": "/home/user/documents/document.pdf",
"config": {
"include_coordinates": true,
"include_formatting": true
}
}

Comprehensive extraction of all content types (text, images, tables, forms, annotations).
Parameters:
- path (string): Full path to the PDF file
- config (object): Configuration options
  - pages (array): Specific pages to extract (default: all)
  - min_confidence (number): Minimum confidence threshold
Example:
{
"path": "/home/user/documents/complex.pdf",
"config": {
"pages": [1, 2, 3],
"min_confidence": 0.8
}
}

Query and filter extracted PDF content using flexible search criteria.
Parameters:
- path (string): Full path to the PDF file
- query (object): Query criteria for filtering content
  - content_types (array): Content types to filter ("text", "image", "table", "form", "annotation")
  - pages (array): Pages to search
  - text_query (string): Text search query
  - min_confidence (number): Minimum confidence threshold
  - bounding_box (object): Spatial filter area
    - x (number): X coordinate
    - y (number): Y coordinate
    - width (number): Width
    - height (number): Height
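
The parameters above nest into a single arguments object. The following Python sketch shows one way a client might assemble and sanity-check that object before sending it. The `build_query_arguments` helper is hypothetical; the field names and allowed content types are the ones listed above, and any further validation the server performs is not known here.

```python
ALLOWED_CONTENT_TYPES = {"text", "image", "table", "form", "annotation"}
BOX_KEYS = ("x", "y", "width", "height")


def build_query_arguments(path, content_types=None, text_query=None,
                          pages=None, min_confidence=None, bounding_box=None):
    """Assemble the arguments object, including only the filters that were set."""
    query = {}
    if content_types is not None:
        unknown = sorted(set(content_types) - ALLOWED_CONTENT_TYPES)
        if unknown:
            raise ValueError(f"unknown content types: {unknown}")
        query["content_types"] = list(content_types)
    if text_query is not None:
        query["text_query"] = text_query
    if pages is not None:
        query["pages"] = list(pages)
    if min_confidence is not None:
        query["min_confidence"] = float(min_confidence)
    if bounding_box is not None:
        missing = [key for key in BOX_KEYS if key not in bounding_box]
        if missing:
            raise ValueError(f"bounding_box is missing: {missing}")
        query["bounding_box"] = {key: bounding_box[key] for key in BOX_KEYS}
    return {"path": path, "query": query}


if __name__ == "__main__":
    print(build_query_arguments(
        "/home/user/documents/report.pdf",
        content_types=["text", "table"],
        text_query="revenue",
        bounding_box={"x": 0, "y": 0, "width": 300, "height": 200},
    ))
```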
Example:
{
"path": "/home/user/documents/report.pdf",
"query": {
"content_types": ["text", "table"],
"text_query": "revenue",
"pages": [1, 2, 3],
"min_confidence": 0.7
}
}

Get detailed information about PDF pages including dimensions, layout, and properties.
Parameters:
- path (string): Full path to the PDF file
Example:
{
"path": "/home/user/documents/document.pdf"
}

Extract comprehensive document metadata and properties.
Parameters:
- path (string): Full path to the PDF file
Example:
{
"path": "/home/user/documents/document.pdf"
}

The PDF reader now provides intelligent content type detection and recommendations:
A new tool that provides comprehensive server information and usage guidance.
What it provides:
- Server capabilities and configuration
- Current directory contents (PDF files found)
- Complete list of available tools with usage guidance
- Step-by-step workflow recommendations
- Supported image formats for asset extraction
Usage:
{
"name": "pdf_server_info",
"arguments": {}
}

Why use it: Start here to understand what PDFs are available and how to best analyze them.
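
On the wire, a tool invocation like the one above is wrapped in a JSON-RPC 2.0 request with the MCP method `tools/call`. A minimal Python sketch of that wrapping follows. The helper names `tools_call_request` and `encode_line` are hypothetical, and the newline-delimited framing mirrors the `echo ... | mcp-pdf-reader` examples elsewhere in this README.

```python
import itertools
import json

_request_ids = itertools.count(1)  # simple monotonically increasing request ids


def tools_call_request(name, arguments=None, request_id=None):
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids) if request_id is None else request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


def encode_line(message):
    """Serialize a message as one newline-terminated JSON line for stdio mode."""
    return json.dumps(message, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    request = tools_call_request("pdf_server_info")
    print(encode_line(request), end="")
```

Writing the encoded line to the server's standard input and reading one line back is enough for quick experiments; a full client would also perform the MCP initialize handshake first.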
The pdf_read_file tool now provides smart content analysis:
Content Type Detection:
- text - PDF contains readable text content
- scanned_images - PDF contains scanned images with minimal text
- mixed - PDF contains both text and images
- no_content - PDF appears empty or unreadable
Smart Recommendations:
- Automatic guidance on whether to use pdf_assets_file
- Image count detection - know if images are present before extraction
- Next step suggestions based on content type
Enhanced Response Format:
Successfully read PDF: /path/to/document.pdf
Pages: 15
Size: 2458392 bytes
Content Type: mixed
Has Images: true
Image Count: 8
INFO: This PDF contains both text and images. You may want to use 'pdf_assets_file' to extract the images as well.
Content:
[extracted text content...]
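
A client that wants the header fields as data can split the response at the `Content:` line. The Python sketch below assumes the response looks exactly like the sample above, which is taken from this README rather than from a format specification; `parse_read_response` is a hypothetical helper, and the real output may change between versions.

```python
def parse_read_response(text):
    """Split a pdf_read_file response into header fields and extracted content."""
    header, _, content = text.partition("\nContent:\n")
    fields = {}
    for line in header.splitlines():
        key, separator, value = line.partition(": ")
        if separator:  # skip lines that are not "Key: value" pairs
            fields[key.strip()] = value.strip()
    return {
        "path": fields.get("Successfully read PDF"),
        "pages": int(fields["Pages"]) if "Pages" in fields else None,
        # "Size" is reported as "<n> bytes"; keep only the number.
        "size_bytes": int(fields["Size"].split()[0]) if "Size" in fields else None,
        "content_type": fields.get("Content Type"),
        "has_images": fields.get("Has Images") == "true",
        "image_count": int(fields.get("Image Count", "0")),
        "content": content,
    }


if __name__ == "__main__":
    sample = (
        "Successfully read PDF: /path/to/document.pdf\n"
        "Pages: 15\n"
        "Size: 2458392 bytes\n"
        "Content Type: mixed\n"
        "Has Images: true\n"
        "Image Count: 8\n"
        "Content:\n"
        "[extracted text content...]"
    )
    print(parse_read_response(sample))
```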
The system now provides contextual recommendations:
- For text-based PDFs: Content is ready to use, no further action needed
- For scanned documents: Recommends using pdf_assets_file to extract images
- For mixed content: Suggests optional image extraction based on your needs
- For problematic files: Provides specific troubleshooting guidance
- Proactive validation - tools suggest when files might not be readable
- Rich context - understand your PDF directory contents upfront
- Targeted recommendations - know which tools to use when
- Comprehensive guidance - built-in usage instructions and examples
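
The recommendations for each content type can be mirrored on the client side as a small decision function. This Python sketch is only an illustration of the guidance described above, not the server's own logic; `suggest_next_tool` and its `want_images` parameter are hypothetical.

```python
CONTENT_TYPES = ("text", "scanned_images", "mixed", "no_content")


def suggest_next_tool(content_type, want_images=False):
    """Return the tool to call next, or None when no follow-up call is suggested."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type}")
    if content_type == "scanned_images":
        return "pdf_assets_file"  # the page images are the only usable content
    if content_type == "mixed" and want_images:
        return "pdf_assets_file"  # image extraction is optional for mixed PDFs
    return None  # text is ready to use; no_content needs troubleshooting instead


if __name__ == "__main__":
    for kind in CONTENT_TYPES:
        print(kind, "->", suggest_next_tool(kind, want_images=True))
```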
Add to your Zed settings (~/.config/zed/settings.json):
{
"context_servers": {
"mcp-pdf-reader": {
"command": {
"path": "mcp-pdf-reader",
"args": ["-pdfdir=${workspaceFolder}"],
"env": null
},
"settings": {}
}
}
}

Project-specific Zed configuration (.zed/settings.json in your project):
{
"context_servers": {
"mcp-pdf-reader": {
"command": {
"path": "mcp-pdf-reader",
"args": ["-pdfdir=./docs"],
"env": null
},
"settings": {}
}
}
}

Add to your Cursor settings (~/.cursor/settings.json):
{
"mcpServers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "${workspaceFolder}"],
"env": {}
}
}
}

For specific PDF directories:
{
"mcpServers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "/path/to/your/documents"],
"env": {}
}
}
}

Add to your Windsurf configuration (~/.windsurf/settings.json):
{
"mcp": {
"servers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "${workspaceRoot}"],
"env": {}
}
}
}
}

Project-specific Windsurf config (.windsurf/settings.json):
{
"mcp": {
"servers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "./documentation"],
"env": {}
}
}
}
}

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):
{
"mcpServers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "/path/to/your/documents"]
}
}
}

For multiple document directories:
{
"mcpServers": {
"mcp-pdf-reader-docs": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "/Users/yourname/Documents"]
},
"mcp-pdf-reader-research": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "/Users/yourname/Research/papers"]
}
}
}

Add to your VS Code settings (settings.json):
{
"claude.mcpServers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "${workspaceFolder}"],
"env": {}
}
}
}

Workspace-specific settings (.vscode/settings.json):
{
"claude.mcpServers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "./docs"],
"env": {}
}
}
}

Add to your Roo configuration (~/.roo/config.json):
{
"mcpServers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "{{workspace}}"],
"cwd": "{{workspace}}"
}
}
}

For specific directories:
{
"mcpServers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "/path/to/pdfs"],
"cwd": "/path/to/pdfs"
}
}
}

Add to your Cline settings in VS Code (settings.json):
{
"cline.mcpServers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "${workspaceFolder}/docs"],
"env": {}
}
}
}

Global Cline configuration:
{
"cline.mcpServers": {
"mcp-pdf-reader": {
"command": "mcp-pdf-reader",
"args": ["-pdfdir", "${env:HOME}/Documents"],
"env": {}
}
}
}

# Most editors support workspace variables
-pdfdir=${workspaceFolder} # Zed, VS Code-based
-pdfdir=${workspaceRoot} # Windsurf
-pdfdir={{workspace}} # Roo

# For documentation in your project
-pdfdir=./docs
-pdfdir=./documentation
-pdfdir=./papers

# For personal document collections
-pdfdir=${env:HOME}/Documents
-pdfdir=/Users/yourname/Documents # macOS
-pdfdir=/home/yourname/Documents # Linux
-pdfdir=C:\Users\yourname\Documents # Windows

You can run multiple instances for different directories:
{
"context_servers": {
"mcp-pdf-reader-docs": {
"command": {
"path": "mcp-pdf-reader",
"args": ["-pdfdir=./docs", "-port=8080"]
}
},
"mcp-pdf-reader-research": {
"command": {
"path": "mcp-pdf-reader",
"args": ["-pdfdir=/path/to/research", "-port=8081"]
}
}
}
}

- After Installation: The mcp-pdf-reader binary will be globally available if $(go env GOPATH)/bin is in your PATH (default with Go installations).
- Verify Installation: Run mcp-pdf-reader --help to ensure it's working.
- Test Configuration: Start with stdio mode (default) for MCP clients, use server mode for debugging.
- Path Variables: Most editors support workspace variables - use them for portable configurations.
- Multiple Directories: Create separate MCP server instances for different PDF collections.
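
When several PDF collections are involved, the per-collection server entries can be generated rather than written by hand. The Python sketch below produces an `mcpServers` block shaped like the Claude Desktop examples in this README. The `multi_instance_config` helper is hypothetical, and the directory flag is a parameter because this README shows both `--dir` (flag table) and `-pdfdir` (editor examples); use whichever your installed version accepts.

```python
import json


def multi_instance_config(collections, command="mcp-pdf-reader", dir_flag="--dir"):
    """Build an mcpServers mapping with one server entry per PDF collection.

    collections maps a short label (for example "docs") to a directory path.
    """
    servers = {}
    for label, directory in collections.items():
        servers[f"mcp-pdf-reader-{label}"] = {
            "command": command,
            "args": [f"{dir_flag}={directory}"],
        }
    return {"mcpServers": servers}


if __name__ == "__main__":
    config = multi_instance_config({
        "docs": "/Users/yourname/Documents",
        "research": "/Users/yourname/Research/papers",
    })
    print(json.dumps(config, indent=2))
```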
Problem: After installation, the binary is not found in PATH.
Solutions:
# Check if Go's bin directory is in your PATH
echo $PATH | grep $(go env GOPATH)/bin
# If not found, add to your shell profile
echo 'export PATH="$(go env GOPATH)/bin:$PATH"' >> ~/.bashrc # Linux/WSL
echo 'export PATH="$(go env GOPATH)/bin:$PATH"' >> ~/.zshrc # macOS (if using zsh)
# Reload your shell
source ~/.bashrc # or ~/.zshrc

Problem: Installation fails with permission errors.
Solutions:
# Don't use sudo with go install - it should install to your user directory
go install github.com/a3tai/mcp-pdf-reader/cmd/mcp-pdf-reader@latest
# If still having issues, check your GOPATH
go env GOPATH
go env GOBIN

Problem: Build fails with module or dependency errors.
Solutions:
# Clean module cache and retry
go clean -modcache
go install github.com/a3tai/mcp-pdf-reader/cmd/mcp-pdf-reader@latest
# Or build from source
git clone https://github.com/a3tai/mcp-pdf-reader.git
cd mcp-pdf-reader
go mod tidy
make install

Problem: Editor can't connect to the MCP server.
Solutions:
- Verify binary is accessible:
  which mcp-pdf-reader
  mcp-pdf-reader --help
- Test in stdio mode:
  echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' | mcp-pdf-reader
- Check editor-specific config location:
  - Zed: ~/.config/zed/settings.json
  - Cursor: ~/.cursor/settings.json
  - Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
  - VS Code: .vscode/settings.json (workspace) or user settings
Problem: PDF directory path is invalid.
Solutions:
# Use absolute paths
"args": ["-pdfdir=/home/user/Documents"]
# Or verify workspace variables work in your editor
"args": ["-pdfdir=${workspaceFolder}/docs"]
# Create the directory if it doesn't exist
mkdir -p ~/Documents/pdfs

Problem: Server can't find PDFs in the specified directory.
Solutions:
- Check file extensions (must be .pdf):
  ls -la /path/to/pdfs/*.pdf
- Test directory access:
  mcp-pdf-reader -mode=server -pdfdir=/path/to/pdfs
  # Then visit http://localhost:8080/health
- Check permissions:
  ls -la /path/to/pdfs/
  # Ensure read permissions on directory and files
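
The extension, permission, and size checks above can be combined into one script. This is a rough Python sketch: `inspect_pdf_dir` is a hypothetical helper, the case-sensitive `.pdf` match mirrors the `ls *.pdf` command (the server's own matching rule is an assumption), and the size limit is the documented default of 104857600 bytes.

```python
import os
from pathlib import Path

MAX_FILE_SIZE = 104857600  # documented default for the maximum file size (100MB)


def inspect_pdf_dir(directory, max_size=MAX_FILE_SIZE):
    """List .pdf files in a directory and flag unreadable or oversized ones."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(str(directory))
    report = []
    for entry in sorted(root.iterdir()):
        # Assumption: only files ending in lowercase ".pdf" are picked up.
        if not entry.is_file() or entry.suffix != ".pdf":
            continue
        report.append({
            "name": entry.name,
            "readable": os.access(entry, os.R_OK),
            "too_large": entry.stat().st_size > max_size,
        })
    return report


if __name__ == "__main__":
    for item in inspect_pdf_dir("."):
        print(item)
```

An empty result for a directory that visibly contains documents usually points at a wrong path or a different file extension.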
Problem: MCP server terminates unexpectedly.
Solutions:
- Run in server mode for debugging:
  mcp-pdf-reader -mode=server -pdfdir=./docs -loglevel=debug
- Check for port conflicts (server mode):
  lsof -i :8080 # Check if port 8080 is in use
  mcp-pdf-reader -mode=server -port=8081 # Try different port
- Verify PDF directory permissions:
  # Test with a simple directory
  mkdir -p ~/test-pdfs
  mcp-pdf-reader -mode=server -pdfdir="$HOME/test-pdfs"
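
Where `lsof` is not available (for example on Windows), the port-conflict check can be done with a plain TCP connection attempt. A minimal Python sketch, with hypothetical helper names `port_in_use` and `first_free_port`:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=8080, attempts=10, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) with no listener, or None."""
    for port in range(start, start + attempts):
        if not port_in_use(port, host):
            return port
    return None


if __name__ == "__main__":
    print("8080 in use:", port_in_use(8080))
    print("first free port:", first_free_port())
```

The port found this way can then be passed to the server's port flag; another process could still claim it in the meantime, so treat the result as a hint.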
Problem: "File too large" or memory errors.
Solutions:
# Increase file size limit (default: 100MB)
mcp-pdf-reader -maxfilesize=209715200 # 200MB
# Check file sizes
ls -lh /path/to/pdfs/*.pdf

Problem: PDF content appears empty or garbled.
Solutions:
- Test with different PDFs (some PDFs may be image-only or encrypted)
- Use validation tool:
  mcp-pdf-reader -mode=server -pdfdir=./docs
  # Then test with the validate_pdf tool
- Restart Zed after config changes
- Check Zed's output panel for MCP errors
- Use absolute paths if workspace variables don't work
- Restart Cursor after configuration changes
- Check the "Output" tab for MCP-related logs
- Ensure the MCP extension is enabled
- Restart Claude Desktop after config changes
- Check ~/Library/Logs/Claude/ for error logs (macOS)
- Verify JSON syntax in config file
- Check extension logs in the "Output" panel
- Verify the extension supports MCP servers
- Try disabling/re-enabling the extension
If you're still having issues:
- Check the server health (server mode):
  curl http://localhost:8080/health
- Enable debug logging:
  mcp-pdf-reader -mode=server -loglevel=debug -pdfdir=./docs
- Create a minimal test case:
  mkdir test-mcp
  cd test-mcp
  echo "Test content" > test.pdf # Not a real PDF, but tests basic functionality
  mcp-pdf-reader -mode=server -pdfdir=.
- Open an issue on GitHub with:
  - Your operating system
  - Go version (go version)
  - Editor/tool being used
  - Complete error messages
  - Configuration file contents
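
The health check from the first step can also be scripted, which is handy when collecting details for an issue report. This Python sketch only tests for an HTTP 200 response, because the body of `/health` is not specified in this README; `health_url` and `is_healthy` are hypothetical helper names, and the default host and port are the documented server-mode defaults.

```python
import urllib.error
import urllib.request


def health_url(host="localhost", port=8080):
    """Build the URL of the health endpoint for a server-mode instance."""
    return f"http://{host}:{port}/health"


def is_healthy(url, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False


if __name__ == "__main__":
    url = health_url()
    print(url, "healthy" if is_healthy(url) else "unreachable")
```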
# Install dependencies
make deps
# Run tests
make test
# Run tests with coverage
make test-coverage
# Build for development
make build
# Run development server
make run
# Run in server mode
make run-server

# Format code
make fmt
# Run linter (requires golangci-lint)
make lint
# Cross-compile for all platforms
make build-all

mcp-pdf-reader/
├── cmd/mcp-pdf-reader/ # Main application entry point
├── internal/
│   ├── config/ # Configuration management
│   ├── mcp/ # MCP server implementation
│   └── pdf/ # PDF processing logic
├── Makefile # Build and development commands
├── go.mod # Go module definition
└── README.md # This file
GET /health
Returns server health status and version information.
GET /sse # Server-Sent Events endpoint
POST /message # MCP message endpoint

We love contributions! This is an open source project and we welcome contributions from everyone. Whether you're fixing bugs, adding features, improving documentation, or helping with tests - every contribution matters.
- Fork the repository on GitHub
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add comprehensive tests
- Run the test suite: make test (ensure all tests pass)
- Format your code: make fmt
- Update documentation if needed
- Submit a pull request with a clear description
- Bug Reports: Found a bug? Open an issue with reproduction steps
- Feature Requests: Have an idea? We'd love to hear it!
- Documentation: Help improve our docs and examples
- Testing: Add tests or improve existing ones
- Code: Fix bugs or implement new features
- Translation: Help make this accessible to more people
- Write clear, documented code
- Add tests for new functionality
- Follow Go best practices and idioms
- Keep pull requests focused and atomic
- Be respectful and constructive in discussions
- Memory Efficient: Streaming PDF processing with configurable limits
- Fast Search: Optimized file system traversal and indexing
- Concurrent Safe: Handle multiple requests simultaneously
- Resource Limits: Configurable file size limits and timeouts
- Input Validation: Comprehensive validation of all inputs
- Path Sanitization: Prevents directory traversal attacks
- File Size Limits: Configurable limits to prevent resource exhaustion
- Secure Defaults: Safe configuration out of the box
- Automated Security Scanning: Continuous security analysis with gosec
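
To make the path sanitization point concrete, the sketch below shows the general technique of resolving a requested path and rejecting anything that escapes the configured PDF directory. It is written in Python for illustration and is not the project's Go implementation; `resolve_inside` is a hypothetical helper.

```python
from pathlib import Path


def resolve_inside(base_dir, requested):
    """Resolve requested relative to base_dir, refusing paths that escape it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute path replaces base entirely; resolve() then
    # collapses ".." segments and follows symlinks before the check below.
    candidate = (base / requested).resolve()
    if candidate != base and base not in candidate.parents:
        raise PermissionError(f"path escapes {base}: {requested}")
    return candidate


if __name__ == "__main__":
    print(resolve_inside("/tmp", "reports/q1.pdf"))
    try:
        resolve_inside("/tmp", "../etc/passwd")
    except PermissionError as error:
        print("rejected:", error)
```

Checking the resolved path rather than the raw string is what defeats `..` segments and symlink tricks.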
This project uses gosec for automated security scanning of Go code. Security scans are automatically run on every pull request and release.
# Install gosec
go install github.com/securego/gosec/v2/cmd/gosec@latest
# Run security scan
make gosec
# Or run directly with gosec
gosec -conf .gosec.json ./...

Security scanning is configured via .gosec.json with:
- Customized rules for Go security best practices
- Exclusions for test files and false positives
- Integration with GitHub Security tab via SARIF reports
This project is licensed under the MIT License - see the LICENSE file for details.
This project is proudly open source and maintained by contributors from around the world. We believe in the power of community-driven development to create better tools for everyone.
- Discussions: Share ideas and get help in GitHub Discussions
- Issues: Report bugs or request features in GitHub Issues
- Contributors: Check out our amazing contributors
- Open: Transparent development and decision-making
- Inclusive: Welcoming to all contributors regardless of experience level
- Quality: Maintaining high standards through testing and code review
- Documentation: Keeping documentation up-to-date and comprehensive
Rude Company LLC is building innovative AI-powered development tools and open source solutions. We create intelligent systems that enhance developer productivity and enable seamless human-AI collaboration.
A3T is brought to you by Rude Company LLC and focuses on AI development tools and automation.
- Website: https://rude.la
- A3T Project GitHub: https://github.com/a3tai
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Support: For support, please use GitHub Issues
Built with ❤️ by Rude Company LLC.