Node.js server implementing Model Context Protocol (MCP) for PDF text extraction operations. Built for Claude Desktop integration with secure directory access controls.
Based on the patterns from @modelcontextprotocol/server-filesystem.
- Single Tool: extract_pdf_text, which extracts plain text from PDF files
- Directory Access: Same security model as filesystem MCP server
- Text Limiting: max_chars parameter to control output size and token usage
- Secure: Path validation and sandboxed directory access
- Fast: Lightweight Node.js implementation using unpdf
Add the following to your MCP server configuration file:
{
"mcpServers": {
"pdf-reader": {
"command": "npx",
"args": ["-y", "@johangorter/mcp-pdf-server", "/Users/username/Desktop"]
}
}
}
For more information about Desktop Extensions, see the official MCPB documentation.
- Download the .mcpb file from the Releases page
- Navigate to Settings > Extensions in Claude Desktop
- Click "Install Extension" and select the .mcpb file
The server provides one main tool:
Extract text content from PDF files with optional character limiting.
Parameters:
- path (string, required): Path to PDF file within allowed directories
- max_chars (number, optional): Maximum characters to return (default: unlimited)
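To illustrate how an optional max_chars limit can behave, here is a minimal sketch in TypeScript. The function name limitText and its validation rules are assumptions made for illustration; they are not taken from this server's source.

```typescript
// Hypothetical helper: applies an optional character limit to extracted text.
// When maxChars is omitted, the full text is returned (the "unlimited" default).
function limitText(text: string, maxChars?: number): string {
  if (maxChars === undefined) {
    return text;
  }
  if (!Number.isInteger(maxChars) || maxChars < 0) {
    throw new RangeError("max_chars must be a non-negative integer");
  }
  // Note: slice counts UTF-16 code units, so a limit could split a surrogate pair.
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}
```

Truncating after extraction keeps the limit independent of the PDF library in use, at the cost of still parsing the whole document.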
Example:
// Claude will call this tool when you ask:
// "Extract the first 1000 characters from report.pdf"
{
"tool": "extract_pdf_text",
"arguments": {
"path": "reports/quarterly-report.pdf",
"max_chars": 1000
}
}
Specify allowed directories as command-line arguments:
npx @johangorter/mcp-pdf-server /path/to/documents /path/to/pdfs
The server supports dynamic directory updates via the MCP Roots protocol, enabling runtime directory changes without a restart.
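The allowed-directory check can be pictured with a small sketch. This is an assumption about how such a check might look, not the code in path-validation.ts: the function name isWithinAllowedDirectories is hypothetical, it uses POSIX path rules only, and it does not resolve symlinks, which a real implementation would also need to handle.

```typescript
import * as path from "node:path";

// Hypothetical check: is the requested path inside one of the allowed directories?
// Uses path.posix so the behavior is the same on every platform in this sketch.
function isWithinAllowedDirectories(requested: string, allowed: string[]): boolean {
  const target = path.posix.resolve(requested);
  return allowed.some((dir) => {
    const root = path.posix.resolve(dir);
    const rel = path.posix.relative(root, target);
    // A relative path that climbs out of the root ("..", "../x") is outside it.
    const escapes = rel === ".." || rel.startsWith("../");
    return !escapes && !path.posix.isAbsolute(rel);
  });
}
```

Comparing the relative path rather than using a plain string prefix avoids accepting sibling directories such as /pdfs-secret when only /pdfs is allowed.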
Standard MCP error codes:
- -32602: Invalid params (file not found, invalid path)
- -32603: Internal error (PDF parsing failed, file corrupted)
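As a rough illustration of how failures could map onto these two codes, consider the sketch below. The failure kinds, the function names, and the response shape are hypothetical; only the numeric codes and their meanings come from the list above and the JSON-RPC 2.0 convention.

```typescript
// The two JSON-RPC error codes listed above.
const ErrorCode = { InvalidParams: -32602, InternalError: -32603 } as const;

// Hypothetical failure categories, named here for illustration only.
type FailureKind = "file_not_found" | "invalid_path" | "parse_failed" | "file_corrupted";

// Caller-side problems map to Invalid params; processing problems to Internal error.
function errorCodeFor(kind: FailureKind): number {
  switch (kind) {
    case "file_not_found":
    case "invalid_path":
      return ErrorCode.InvalidParams;
    case "parse_failed":
    case "file_corrupted":
      return ErrorCode.InternalError;
  }
}

// Builds a JSON-RPC 2.0 style error response object.
function toJsonRpcError(id: number | string, kind: FailureKind, message: string) {
  return { jsonrpc: "2.0" as const, id, error: { code: errorCodeFor(kind), message } };
}
```

A client can then branch on error.code to tell a bad request apart from a server-side parsing failure.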
{
"@modelcontextprotocol/sdk": "^1.17.0",
"pdf-parse": "^1.1.1",
"zod-to-json-schema": "^3.23.5"
}
- Node.js 22+
- TypeScript
# Clone the repository
git clone https://github.com/johan-gorter/mcp-pdf-server.git
cd mcp-pdf-server
# Install dependencies
npm install
# Build the project
npm run build
This project uses GitHub Actions for automated testing and deployment:
- CI Pipeline: Runs on every push and pull request
  - Tests on Ubuntu and Windows
  - Runs linting, formatting checks, and tests
- Release Pipeline: Triggers on package.json version changes
  - Creates GitHub releases with MCPB bundle
  - Builds and publishes Docker images
# Build TypeScript to JavaScript
npm run build
# Build in watch mode (rebuilds on file changes)
npm run watch
# Run the server locally for development
npm run dev
# Start the compiled server
npm run start
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage
# Lint the code
npm run lint
# Fix linting issues
npm run lint:fix
# Format code with Prettier
npm run format
# Check code formatting
npm run format:check
# Clean build artifacts
npm run clean
The project uses Jest for testing with TypeScript support:
# Run all tests
npm test
# Run tests in watch mode during development
npm run test:watch
# Generate test coverage report
npm run test:coverage
Test files are located in src/__tests__/ and follow the pattern *.test.ts.
This project supports creating MCPB (MCP Bundle) files for easy distribution and installation:
# Build the MCPB bundle
npm run build:mcpb
The MCPB bundle includes:
- Compiled TypeScript server (dist/)
- Runtime dependencies (node_modules/)
- Bundle manifest (manifest.json)
- Documentation and license files
MCPB bundles can be installed in Claude Desktop and other MCP-compatible clients with a single click.
Build and run with Docker:
# Build Docker image
docker build -t mcp-pdf-server .
# Run with Docker
docker run -i --rm \
--mount type=bind,src=/path/to/pdfs,dst=/pdfs \
mcp-pdf-server /pdfs
Or use the pre-built Docker image:
# Pull from GitHub Container Registry
docker pull ghcr.io/johan-gorter/mcp-pdf-server:latest
# Run the container
docker run -i --rm \
--mount type=bind,src=/path/to/pdfs,dst=/pdfs \
ghcr.io/johan-gorter/mcp-pdf-server:latest /pdfs
The project enforces code quality through:
- TypeScript: Strong typing and compile-time error checking
- ESLint: Code linting with TypeScript-specific rules
- Prettier: Consistent code formatting
- Jest: Comprehensive unit testing
mcp-pdf-server/
├── src/
│ ├── __tests__/ # Test files
│ │ ├── lib.test.ts
│ │ ├── path-utils.test.ts
│ │ └── roots-utils.test.ts
│ ├── index.ts # Main server entry point
│ ├── lib.ts # Core functionality
│ ├── path-utils.ts # Path handling utilities
│ ├── path-validation.ts # Security validation
│ └── roots-utils.ts # MCP roots support
├── dist/ # Compiled JavaScript (generated)
├── coverage/ # Test coverage reports (generated)
├── package.json # Dependencies and scripts
├── tsconfig.json # TypeScript configuration
├── jest.config.cjs # Jest testing configuration
├── .eslintrc.cjs # ESLint configuration
├── .prettierrc # Prettier configuration
├── Dockerfile # Docker build configuration
└── README.md
- Text Only: No image, table, or metadata extraction
- No OCR: Scanned PDFs without embedded text won't work
- Memory: Large PDFs (>100MB) may cause memory issues
- No Concurrent Processing: Processes one PDF at a time
MIT License