Named Entity Recognition (NER) for invoice processing using LayoutLMv3 with LoRA fine-tuning. Extracts invoice numbers and other key information from invoice images.
- **Hybrid Extraction Pipeline** - Combines fast heuristic pattern matching with a deep-learning fallback (LayoutLMv3 & Gemini 2.5 Flash)
- **LayoutLMv3 with LoRA** - Efficient fine-tuning for multimodal document understanding
- **Dual Interface** - REST API for programmatic access + Gradio UI for interactive use
- **Production Ready** - Comprehensive test suite (107 tests), Docker support, health checks
- **Multi-Format Support** - Accepts TXT and JSON OCR data formats
- **ONNX Support** - Optimized inference with ONNX Runtime (FP32/FP16/INT8)
- **Benchmarking** - Compare models (LayoutLMv3, Gemini, ONNX) with W&B integration
- **Device Flexible** - Runs on CPU, CUDA (NVIDIA), or MPS (Apple Silicon)
- **Interactive Docs** - Auto-generated Swagger/ReDoc API documentation
```
invoice-ner/
├── app.py                  # Main FastAPI application
├── docker-compose.yml      # Docker Compose configuration
├── Dockerfile              # Docker image definition
├── pyproject.toml          # Python project configuration & dependencies
├── setup.sh                # Development environment setup script
├── .env.example            # Environment variables template
├── uv.lock                 # Lock file for reproducible installs
│
├── data/                   # Dataset and labeling tools
│   ├── app.py              # Streamlit labeling application
│   ├── scripts/            # Data processing utilities
│   │   ├── create_dataframe.py  # Creates DataFrame from labeled data
│   │   └── validate_labels.py   # Validates label quality
│   ├── SROIE2019/          # Invoice dataset (train/test images & OCR)
│   ├── labels.json         # Training data labels
│   └── test_labels.json    # Test data labels
│
├── models/                 # Model files and checkpoints
│   ├── artifacts/          # Exported models (ONNX, etc.)
│   └── layoutlmv3-lora-invoice-number/  # Fine-tuned LoRA adapter
│       ├── adapter_config.json
│       ├── adapter_model.safetensors
│       └── ...
│
├── triton_model_repo/      # Triton Inference Server model repository
│   └── ...
│
├── notebooks/              # Jupyter notebooks for experimentation
│   ├── 01_heuristics.ipynb    # Heuristic-based extraction
│   ├── 02_labeling.ipynb      # Data labeling analysis
│   ├── 03_inference.ipynb     # Model inference testing
│   ├── 04_postprocess.ipynb   # Post-processing experiments
│   └── 05_evaluations.ipynb   # Evaluation metrics and analysis
│
├── benchmarks/             # Benchmarking suite
│   ├── models/             # Model wrappers (Gemini, ONNX, etc.)
│   ├── benchmark_results/  # Benchmark run results
│   ├── benchmark.py        # Main benchmark script
│   └── README.md           # Benchmarking documentation
│
├── scripts/                # Utility scripts
│   ├── preprocess.py       # Data preprocessing utilities
│   ├── export_to_onnx.py   # ONNX export script
│   ├── setup_triton_repo.py # Triton repo setup script
│   └── train.py            # Model training script
│
├── src/                    # Core application modules
│   ├── __init__.py
│   ├── api.py              # FastAPI endpoints
│   ├── gradio_ui.py        # Gradio interface
│   ├── inference.py        # Model inference logic
│   ├── heuristics.py       # Pattern-based extraction
│   ├── postprocessing.py   # Result postprocessing
│   ├── validation.py       # Input validation
│   └── utils.py            # Utility functions
│
├── docs/                   # Additional documentation
│   ├── API_USAGE.md        # Complete API documentation and examples
│   ├── DEV_SETUP.md        # Developer setup guide
│   └── TESTING.md          # Testing guide and validation
│
├── tests/                  # Test suite
│   ├── conftest.py         # Shared test fixtures
│   ├── test_app.py         # Application tests
│   ├── test_scripts.py     # Script tests
│   ├── test_api.py         # API endpoint tests
│   └── README.md           # Testing documentation
│
├── LICENSE                 # MIT License
└── README.md               # This file
```
- `src/` - Core application modules (API endpoints, inference, UI, validation, utilities)
- `data/` - Contains the SROIE2019 dataset and Streamlit labeling tool for annotating invoice images
- `models/` - Stores fine-tuned LoRA adapters and exported ONNX models for deployment
- `notebooks/` - Jupyter notebooks for experimentation, analysis, and prototyping
- `scripts/` - Utility scripts for data preprocessing, model export, and deployment preparation
- `tests/` - Comprehensive test suite with 107 tests for production validation
- `docs/` - Documentation for API usage, development setup, testing, and deployment
```bash
# 1. Copy environment file (optional)
cp .env.example .env
# Edit .env to customize settings (port, log level, etc.)

# 2. Build and start
docker-compose up -d --build

# 3. Check logs
docker-compose logs -f

# 4. Open browser
open http://localhost:7860

# 5. Stop when done
docker-compose down
```

```bash
# 1. Set up virtual environment with uv
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# 2. Copy environment file
cp .env.example .env

# 3. Install dependencies
uv pip install -e .

# 4. Run the app (automatically loads .env)
python app.py

# 5. Open browser
open http://localhost:7860
```

- Docker (>= 20.10) and Docker Compose (>= 2.0) - for containerized deployment
- Python (>= 3.10) - for local development
- uv - fast Python package installer (installation guide)
- 8GB RAM minimum (16GB recommended)
- Model files in `models/layoutlmv3-lora-invoice-number/`

Ensure these exist before running:

```
models/
└── layoutlmv3-lora-invoice-number/
    ├── adapter_config.json
    ├── adapter_model.safetensors
    └── ... (other config files)
```
```bash
# Check health endpoint
curl http://localhost:7860/health

# Expected response:
# {"status": "healthy", "model_loaded": true, "device": "cpu"}
```

```bash
# Extract invoice number from an invoice
curl -X POST http://localhost:7860/predict \
  -F "image=@path/to/invoice.jpg" \
  -F "ocr_file=@path/to/ocr_data.json"

# Response:
# {
#   "invoice_number": "INV-2023-001234",
#   "extraction_method": "heuristic",
#   "total_words": 127,
#   "model_device": "cpu"
# }
```

For detailed API documentation with code examples in Python, JavaScript, and more, see docs/API_USAGE.md.
The easiest way to configure the application:
1. Copy the example file:

   ```bash
   cp .env.example .env
   ```

2. Edit `.env` to customize settings:

   ```bash
   # Example: Enable debug logging
   LOG_LEVEL=DEBUG

   # Example: Change port
   PORT=8080

   # Example: Use Apple MPS
   DEVICE=mps
   ```

3. Restart to apply the changes:

   ```bash
   docker-compose up -d
   ```
The application supports both local ONNX Runtime (default) and remote Triton Inference Server.
1. Local ONNX (default) - no extra configuration needed.
2. Triton Inference Server - requires the setup below.
First, create the model repository structure:

```bash
python scripts/setup_triton_repo.py --model_path models/layoutlmv3-lora-invoice-number
```

Then start the server:

```bash
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $(pwd)/triton_model_repo:/models \
  nvcr.io/nvidia/tritonserver:23.10-py3 \
  tritonserver --model-repository=/models
```

Configure `.env` and run `python app.py` to use the API:

```bash
INFERENCE_BACKEND=triton
TRITON_URL=localhost:8000
TRITON_MODEL_NAME=layoutlmv3-lora-invoice-number
```

Key variables (see `.env.example` for all options):
- `LOG_LEVEL`: Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`). Default: `INFO`
- `DEVICE`: Device to run on (`cpu`, `cuda`, or `mps`). Default: `cpu`
- `PORT`: Port to expose. Default: `7860`
- `MODEL_PATH`: Path to model directory. Default: `models/layoutlmv3-lora-invoice-number`
- `DOCKER_CPU_LIMIT`: CPU cores limit. Default: `4`
- `DOCKER_MEMORY_LIMIT`: Memory limit. Default: `8G`
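Application code typically reads these variables with the documented defaults; a minimal sketch of the pattern, assuming plain `os.getenv` (how the app actually loads them, e.g. via a dotenv helper, is not reproduced here):

```python
import os

def load_settings() -> dict:
    """Read the key variables, falling back to the documented defaults."""
    return {
        "log_level": os.getenv("LOG_LEVEL", "INFO"),
        "device": os.getenv("DEVICE", "cpu"),
        "port": int(os.getenv("PORT", "7860")),
        "model_path": os.getenv("MODEL_PATH", "models/layoutlmv3-lora-invoice-number"),
    }
```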
Override .env values from the command line:
```bash
# Override port
PORT=9000 python app.py

# Override multiple variables
LOG_LEVEL=DEBUG DEVICE=cpu PORT=8080 python app.py

# Docker Compose
PORT=9000 docker-compose up
```

```bash
# Build and start
docker-compose up -d --build

# View logs
docker-compose logs -f

# Stop
docker-compose down

# Rebuild from scratch
docker-compose down
docker-compose build --no-cache
docker-compose up -d
```

Adjust resource limits in docker-compose.yml or .env:
```yaml
deploy:
  resources:
    limits:
      cpus: '4'
      memory: 8G
    reservations:
      cpus: '2'
      memory: 4G
```

Or in `.env`:

```bash
DOCKER_CPU_LIMIT=4
DOCKER_MEMORY_LIMIT=8G
```

Change the exposed port in docker-compose.yml:
```yaml
ports:
  - "8080:7860"  # Map host port 8080 to container port 7860
```

Or in `.env`:

```bash
PORT=8080
```

The application provides both a Gradio web interface and a REST API:
- URL: http://localhost:7860/
- Features: Drag-and-drop upload, visual preview, no coding required
- Best for: Manual testing, demos, non-technical users
- Interactive docs: http://localhost:7860/docs (Swagger UI)
- Alternative docs: http://localhost:7860/redoc (ReDoc)
- Health check: http://localhost:7860/health
Detailed API Guide: See docs/API_USAGE.md for:
- Complete endpoint documentation
- Request/response formats
- Code examples in Python, JavaScript, cURL
- Error handling and best practices
For development setup, data labeling, and model training, see docs/DEV_SETUP.md. For detailed testing documentation, see docs/TESTING.md.
The repository includes a comprehensive benchmarking suite to evaluate and compare different models:
- Supported Models: LayoutLMv3, Hybrid (Heuristics + Model), ONNX, and Google Gemini 2.5 Flash.
- Metrics: Accuracy, Latency (P50/P95/P99), Fallback Rate, and Human Review Rate.
- Tracking: Integrated with Weights & Biases for experiment tracking.
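The latency percentiles above can be computed from raw per-request timings; a minimal sketch using only the standard library (the function name is ours, not the benchmark suite's):

```python
from statistics import quantiles

def latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    """Compute P50/P95/P99 from per-request latencies in milliseconds."""
    # quantiles(..., n=100) returns the 99 percentile cut points P1..P99.
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```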
See benchmarks/README.md for detailed usage instructions.