ocrbase

Turn PDFs into structured data at scale. Powered by frontier open-weight OCR models with a type-safe TypeScript SDK.

Features

Best-in-class OCR - PaddleOCR-VL-0.9B for accurate text extraction
Structured extraction - Define schemas, get JSON back
Built for scale - Queue-based processing for thousands of documents
Type-safe SDK - Full TypeScript support with React hooks
Real-time updates - WebSocket notifications for job progress
Self-hostable - Run on your own infrastructure

Quick Start

npm install ocrbase

import { createClient } from "ocrbase";

const { parse, extract } = createClient({
  baseUrl: "https://your-instance.com",
  apiKey: "ak_xxx",
});

// Parse document to markdown
// (`document` and `invoice` are placeholders for your own input files)
const parseJob = await parse({ file: document });
console.log(parseJob.markdownResult);

// Extract structured data
const extractJob = await extract({
  file: invoice,
  hints: "invoice number, date, total, line items",
});
console.log(extractJob.jsonResult);

See SDK documentation for React hooks and advanced usage.
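
Because `jsonResult` comes from a model, it can help to check its shape at runtime before using it. The following is a minimal sketch rather than part of the SDK: the `Invoice` and `LineItem` interfaces, their field names, and the `isInvoice` guard are all hypothetical, chosen to match the hints in the example above. The actual keys returned by your instance may differ.

```typescript
// Hypothetical shape for the hints "invoice number, date, total, line items".
// These field names are assumptions, not something the ocrbase API guarantees.
interface LineItem {
  description: string;
  amount: number;
}

interface Invoice {
  invoiceNumber: string;
  date: string;
  total: number;
  lineItems: LineItem[];
}

// True for plain objects (not null, not arrays).
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Checks a single line item field by field.
function isLineItem(value: unknown): value is LineItem {
  if (!isRecord(value)) return false;
  return (
    typeof value.description === "string" && typeof value.amount === "number"
  );
}

// Type guard that narrows an unknown value to Invoice.
function isInvoice(value: unknown): value is Invoice {
  if (!isRecord(value)) return false;
  const items = value.lineItems;
  return (
    typeof value.invoiceNumber === "string" &&
    typeof value.date === "string" &&
    typeof value.total === "number" &&
    Array.isArray(items) &&
    items.every(isLineItem)
  );
}
```

With a guard like this, a call such as `isInvoice(job.jsonResult)` narrows the result to `Invoice` inside the `if` branch, so the rest of the code can read `total` or `lineItems` without casts.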

Self-Hosting

See Self-Hosting Guide for deployment instructions.

Requirements: Docker, Bun

Architecture

License

MIT - See LICENSE for details.

Contact

For API access, on-premise deployment, or questions: adammajcher20@gmail.com

