DoclingNet

A complete .NET port of Docling for converting documents (images, PDFs) to structured Markdown using state-of-the-art AI/ML models.

Features

Layout Detection - Automatic document layout analysis using Heron ONNX model
OCR Text Extraction - Powered by EasyOCR with CRAFT detection and CRNN recognition
Table Structure Recognition - Advanced table analysis with TableFormer (TorchSharp)
Markdown Export - Clean, structured Markdown output
Unified SDK - Simple, single-entry-point API
CLI Tool - Command-line interface for batch processing
Performance - Optimized pipeline with image reuse across stages and pre-allocation

Quick Start

Prerequisites

.NET 9.0 SDK
Operating System: Windows, macOS, or Linux

Installation

git clone https://github.com/mapo80/doclingnet.git
cd doclingnet
git submodule update --init --recursive
dotnet build

Basic Usage (SDK)

using DoclingNetSdk;

// 1. Create configuration (auto-detects model paths)
var config = DoclingConfiguration.CreateDefault();

// 2. Initialize converter
using var converter = new DoclingConverter(config);

// 3. Convert image to markdown
var result = await converter.ConvertImageAsync("document.png");

// 4. Use the results
Console.WriteLine(result.Markdown);
File.WriteAllText("output.md", result.Markdown);

// Access statistics
Console.WriteLine($"Layout elements: {result.LayoutElementCount}");
Console.WriteLine($"OCR elements: {result.OcrElementCount}");
Console.WriteLine($"Tables: {result.TableCount}");

CLI Usage

# Convert a single document
dotnet run --project src/Docling.Cli/Docling.Cli.csproj document.png

# With custom output and options
dotnet run --project src/Docling.Cli/Docling.Cli.csproj document.png \
  --output result.md \
  --tableformer Accurate \
  --language en \
  --verbose

Architecture

DoclingNet implements a multi-stage AI/ML pipeline:

┌─────────────────┐
│  Input Image    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  1. Layout      │  Heron ONNX Model
│     Detection   │  Identifies regions (title, paragraph, table, etc.)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  2. Full-Page   │  EasyOCR (CRAFT + CRNN)
│     OCR         │  Extracts all text with bounding boxes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  3. Table       │  TableFormer (TorchSharp)
│     Structure   │  Analyzes table structure (cells, spans)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  4. Document    │  DoclingDocument builder
│     Building    │  Assembles structured document
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  5. Markdown    │  MarkdownDocSerializer
│     Export      │  Exports to Markdown format
└─────────────────┘
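The stage ordering above can be pictured as a sequential pipeline in which optional stages (OCR, tables) are skipped when disabled in configuration, and each stage is timed individually. This is a hypothetical sketch: `PageState`, `Stage`, and `StagedPipeline` are invented for illustration and are not DoclingNet types; in the real pipeline the stage bodies call Heron, EasyOcrNet, TableFormer, the DoclingDocument builder, and MarkdownDocSerializer.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical page state passed from stage to stage (not a DoclingNet type).
public sealed record PageState(string ImagePath, IReadOnlyList<string> Log);

// A named stage that can be switched off, mirroring EnableOcr / EnableTableRecognition.
public sealed record Stage(string Name, bool Enabled, Func<PageState, PageState> Run);

public static class StagedPipeline
{
    // Runs the enabled stages in order and records wall-clock time per stage.
    public static (PageState Result, Dictionary<string, TimeSpan> Timings) Execute(
        PageState input, IEnumerable<Stage> stages)
    {
        var timings = new Dictionary<string, TimeSpan>();
        var state = input;
        foreach (var stage in stages)
        {
            if (!stage.Enabled)
            {
                continue; // disabled stages are skipped entirely and not timed
            }
            var sw = Stopwatch.StartNew();
            state = stage.Run(state);
            sw.Stop();
            timings[stage.Name] = sw.Elapsed;
        }
        return (state, timings);
    }

    // Placeholder stage body: appends the stage name so callers can see which stages ran.
    public static Func<PageState, PageState> Mark(string name) =>
        s => s with { Log = new List<string>(s.Log) { name } };
}
```

Per-stage timings collected this way are the kind of data summarized in the Detailed Timing Breakdown table under Performance.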

Key Components

DoclingConverter - Main SDK entry point
LayoutSdk - Document layout detection (Heron ONNX)
EasyOcrNet - OCR text extraction (CRAFT detection + CRNN recognition)
TableFormerTorchSharpSdk - Table structure recognition (Transformer-based)
Docling.Core - Document models (DoclingDocument, DocItem, etc.)
Docling.Export - Markdown serialization

Configuration

DoclingConfiguration

public sealed class DoclingConfiguration
{
    // Directory for model cache and temporary files
    // Default: ./artifacts
    public string ArtifactsPath { get; set; } = "./artifacts";

    // OCR language code (e.g., "en", "it", "fr", "de")
    // Default: "en"
    public string OcrLanguage { get; set; } = "en";

    // Enable/disable table structure recognition
    // Default: true
    public bool EnableTableRecognition { get; set; } = true;

    // Enable/disable OCR text extraction
    // Default: true
    public bool EnableOcr { get; set; } = true;

    // TableFormer model variant: Fast, Base, or Accurate
    // Default: Accurate (best quality; about 4.8% more time per page than Fast in the benchmarks)
    public TableFormerVariant TableFormerVariant { get; set; } = TableFormerVariant.Accurate;

    // Factory method with default settings
    public static DoclingConfiguration CreateDefault();
}

TableFormer Variants

Variant	Speed	Quality	Use Case
Fast	~300ms/table	Good	High-volume processing
Base	~500ms/table	Better	Balanced use cases
Accurate	~1s/table	Best	Quality-critical applications

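Based on the benchmarks, Accurate is the recommended default: it gives noticeably better table recognition (correct headers, accurate cell detection). Although each table takes longer to process than in Fast mode, table inference is only one part of the pipeline, so the end-to-end cost is about 4.8% more time per page.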

Advanced Usage

Custom Configuration

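using DoclingNetSdk;
using Microsoft.Extensions.Logging; // AddConsole() requires the Microsoft.Extensions.Logging.Console package

var config = new DoclingConfiguration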
{
    ArtifactsPath = "/path/to/models",
    OcrLanguage = "it",  // Italian
    EnableTableRecognition = true,
    EnableOcr = true,
    TableFormerVariant = TableFormerVariant.Accurate
};

// Optional: Pass custom logger
using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.AddConsole();
    builder.SetMinimumLevel(LogLevel.Debug);
});
var logger = loggerFactory.CreateLogger<DoclingConverter>();

using var converter = new DoclingConverter(config, logger);
var result = await converter.ConvertImageAsync("document.png");

Batch Processing

var imagePaths = Directory.GetFiles("documents", "*.png");
var results = await converter.ConvertImagesAsync(imagePaths);

foreach (var (path, result) in results)
{
    var mdPath = Path.ChangeExtension(path, ".md");
    await File.WriteAllTextAsync(mdPath, result.Markdown);
    Console.WriteLine($"Processed: {path} -> {mdPath}");
}

Working with Structured Documents

var result = await converter.ConvertImageAsync("document.png");

// Access structured document
var document = result.Document;

// Iterate through all items
foreach (var item in document.Items)
{
    Console.WriteLine($"{item.Kind}: {item.Label}");

    if (item is TableItem table)
    {
        Console.WriteLine($"  Rows: {table.RowCount}, Cols: {table.ColumnCount}");
        foreach (var cell in table.Cells)
        {
            Console.WriteLine($"  Cell [{cell.RowIndex},{cell.ColumnIndex}]: {cell.Text}");
        }
    }
    else if (item is ParagraphItem paragraph)
    {
        Console.WriteLine($"  Text: {paragraph.Text}");
    }
}

Cancellation Support

using var cts = new CancellationTokenSource();
cts.CancelAfter(TimeSpan.FromMinutes(5));

try
{
    var result = await converter.ConvertImageAsync("document.png", cts.Token);
}
catch (OperationCanceledException)
{
    Console.WriteLine("Conversion cancelled");
}

CLI Reference

Command-Line Options

docling-cli <image-path> [options]

Option	Description	Default
`--output <path>`	Output markdown file path	Same as input with .md extension
`--artifacts <path>`	Artifacts directory for models	`./artifacts`
`--language <code>`	OCR language code (en, it, fr, de, etc.)	`en`
`--no-ocr`	Disable OCR text extraction	OCR enabled
`--no-tables`	Disable table recognition	Tables enabled
`--tableformer <variant>`	TableFormer variant: Fast, Base, Accurate	`Accurate`
`--verbose`	Enable verbose logging	Info level
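To make the correspondence between these flags and `DoclingConfiguration` concrete, here is a hypothetical parser. `CliOptions`, `Variant`, and `CliParser.ParseArgs` are illustrative names, not the code in `Program.cs`; the sketch only mirrors the documented defaults (`./artifacts`, `en`, OCR and tables enabled, `Accurate`) and derives the default output path by replacing the input's extension with `.md`.

```csharp
using System;
using System.IO;

// Illustrative copy of the TableFormer variants named in this README.
public enum Variant { Fast, Base, Accurate }

// Hypothetical options record holding the documented defaults.
public sealed record CliOptions(
    string InputPath,
    string OutputPath,
    string ArtifactsPath = "./artifacts",
    string Language = "en",
    bool EnableOcr = true,
    bool EnableTables = true,
    Variant TableFormer = Variant.Accurate,
    bool Verbose = false);

public static class CliParser
{
    public static CliOptions ParseArgs(string[] args)
    {
        if (args.Length == 0)
            throw new ArgumentException("usage: docling-cli <image-path> [options]");

        var input = args[0];
        // Default output: same path as the input, with a .md extension.
        var opts = new CliOptions(input, Path.ChangeExtension(input, ".md"));

        for (var i = 1; i < args.Length; i++)
        {
            // Options that take a value consume the next argument.
            string Next() => i + 1 < args.Length
                ? args[++i]
                : throw new ArgumentException($"missing value for {args[i]}");

            opts = args[i] switch
            {
                "--output" => opts with { OutputPath = Next() },
                "--artifacts" => opts with { ArtifactsPath = Next() },
                "--language" => opts with { Language = Next() },
                "--no-ocr" => opts with { EnableOcr = false },
                "--no-tables" => opts with { EnableTables = false },
                "--tableformer" => opts with { TableFormer = Enum.Parse<Variant>(Next(), ignoreCase: true) },
                "--verbose" => opts with { Verbose = true },
                _ => throw new ArgumentException($"unknown option: {args[i]}")
            };
        }
        return opts;
    }
}
```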

CLI Examples

# Simple conversion
dotnet run --project src/Docling.Cli/Docling.Cli.csproj document.png

# Custom output location
dotnet run --project src/Docling.Cli/Docling.Cli.csproj document.png --output results/doc.md

# Italian OCR with verbose logging
dotnet run --project src/Docling.Cli/Docling.Cli.csproj document.png \
  --language it \
  --verbose

# Fast mode for high-volume processing
dotnet run --project src/Docling.Cli/Docling.Cli.csproj document.png \
  --tableformer Fast

# Layout-only (no OCR or tables)
dotnet run --project src/Docling.Cli/Docling.Cli.csproj document.png \
  --no-ocr \
  --no-tables

Models

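The OCR and TableFormer models are downloaded automatically on first use and cached in the artifacts directory. The layout model is loaded from the Heron submodule instead (see Layout Detection Model).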

Layout Detection Model

Model: Heron ONNX
Size: ~150MB
Location: Auto-detected from submodule src/submodules/ds4sd-docling-layout-heron-onnx/models/
Purpose: Identifies document regions (title, paragraph, table, figure, etc.)

OCR Models

Detection Model: CRAFT (Character Region Awareness For Text)
- Size: ~79MB
- Downloaded from GitHub releases
Recognition Model: CRNN (Convolutional Recurrent Neural Network)
- Size: ~14MB per language
- Downloaded from GitHub releases
Purpose: Text extraction from document regions

TableFormer Models

Fast Variant: ~30MB
Base Variant: ~60MB
Accurate Variant: ~120MB
Location: Downloaded from Hugging Face on first use
Purpose: Table structure recognition (rows, columns, cell spans)
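The download-once, reuse-afterwards behavior follows a common caching pattern. The sketch below illustrates the general idea and is not DoclingNet's downloader: `ModelCache.EnsureAsync` and its `download` delegate are hypothetical, and a real implementation would supply an `HttpClient` call to GitHub releases or Hugging Face. Writing to a temporary file and renaming it only after the write completes keeps an interrupted download from leaving a truncated model under its final name.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ModelCache
{
    // Returns the cached file path, invoking `download` only if the file is not present yet.
    public static async Task<string> EnsureAsync(
        string artifactsDir,
        string relativePath,
        Func<Stream, CancellationToken, Task> download,
        CancellationToken ct = default)
    {
        var target = Path.Combine(artifactsDir, relativePath);
        if (File.Exists(target))
        {
            return target; // cache hit: no network access
        }

        Directory.CreateDirectory(Path.GetDirectoryName(target)!);
        var temp = target + ".partial";
        try
        {
            await using (var stream = File.Create(temp))
            {
                await download(stream, ct);
            }
            // Rename only after the write has finished, so a partial file never has the final name.
            File.Move(temp, target, overwrite: true);
        }
        catch
        {
            if (File.Exists(temp)) File.Delete(temp);
            throw;
        }
        return target;
    }
}
```

As written, two concurrent calls for the same file would race on the `.partial` path; a lock or a unique temporary name per call would be needed in that case.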

Performance

Benchmark Results (DocLayNet Dataset)

Benchmark on 20 diverse document pages from DocLayNet:

Metric	Value
Average time per page	9.75s ± 3.25s
Median time	10.00s
Fastest page	4.00s
Slowest page	17.00s
Configuration	TableFormer Accurate, OCR enabled
Hardware	AMD EPYC 7763 @ 2.45GHz (single core)
Total measurements	100 (20 files × 5 runs, warmup excluded)

Performance Distribution:

Fast (<7s): 17.3% of pages
Medium (7-12s): 56.1% of pages
Slow (≥12s): 26.5% of pages
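Summary figures like the ones above (mean ± standard deviation, median, extremes, and the three speed buckets) can be computed from raw per-page timings in a few lines. This is a sketch, not the project's benchmark tool: `BenchmarkStats.Summarize` is a hypothetical helper that uses the sample standard deviation (dividing by n - 1) and the 7s and 12s bucket boundaries listed above. Which deviation formula the published numbers used is not stated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record TimingSummary(
    double Mean, double StdDev, double Median, double Min, double Max,
    double FastShare, double MediumShare, double SlowShare);

public static class BenchmarkStats
{
    // Summarizes per-page times in seconds; warmup runs should be excluded beforehand.
    public static TimingSummary Summarize(IReadOnlyList<double> seconds)
    {
        if (seconds.Count < 2)
            throw new ArgumentException("need at least two measurements");

        var sorted = seconds.OrderBy(x => x).ToArray();
        var n = sorted.Length;
        var mean = sorted.Average();
        // Sample standard deviation (divides by n - 1).
        var variance = sorted.Sum(x => (x - mean) * (x - mean)) / (n - 1);
        var median = n % 2 == 1
            ? sorted[n / 2]
            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

        // Fraction of measurements satisfying a bucket predicate.
        double Share(Func<double, bool> inBucket) => sorted.Count(inBucket) / (double)n;

        return new TimingSummary(
            mean, Math.Sqrt(variance), median, sorted[0], sorted[^1],
            Share(x => x < 7), Share(x => x >= 7 && x < 12), Share(x => x >= 12));
    }
}
```

For example, timings of 4, 8, 10, 10, and 17 seconds give a mean of 9.8s, a median of 10s, and bucket shares of 20% fast, 60% medium, and 20% slow.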

Detailed Timing Breakdown

Typical processing components per page:

Operation	Time	Notes
Layout Detection	~400-500ms	ONNX CPU runtime (Heron model)
Full-Page OCR	~2-4s	Depends on text density (EasyOCR)
Table Recognition (Accurate)	~1-2s per table	TableFormer with structure analysis
Document Building	<100ms	Assembly and indexing
Markdown Export	<50ms	Serialization
Total	~9.75s avg	Measured on DocLayNet dataset

TableFormer Mode Comparison

Mode	Avg Time	Quality	Use Case
Fast	~8-9s	Good	High-volume batch processing
Accurate	~9-10s	Excellent	Default - best quality/speed ratio

Performance Notes

Page-to-page variation (standard deviation 3.25s) likely reflects content: OCR time depends on text density, and table recognition time grows with the number of tables
First run takes longer due to model downloads (~250MB total)
90% of time is spent in ML inference (ONNX Runtime, TorchSharp)
GPU acceleration can significantly reduce inference time
7% outliers removed from statistics (system interference, anomalies)

Optimization Tips

Use Fast variant for high-volume processing where quality is less critical
Disable OCR (EnableOcr = false) if you only need layout structure
Disable tables (EnableTableRecognition = false) if document contains no tables
Process multiple documents in parallel using Task.WhenAll
Consider GPU acceleration for production deployments
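A sketch of the parallel-processing tip with a cap on concurrency, so that several pages of model inference running at once do not exhaust memory (compare Out of Memory under Troubleshooting). It takes a generic `convert` delegate rather than a `DoclingConverter`, because this README does not state whether one converter instance can safely serve concurrent calls; creating one converter per worker is the conservative assumption. `ParallelConversion.RunAsync` and `maxConcurrency` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelConversion
{
    // Runs `convert` over all inputs with at most `maxConcurrency` calls in flight.
    public static async Task<Dictionary<string, TResult>> RunAsync<TResult>(
        IEnumerable<string> paths,
        Func<string, CancellationToken, Task<TResult>> convert,
        int maxConcurrency,
        CancellationToken ct = default)
    {
        if (maxConcurrency < 1)
            throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        using var gate = new SemaphoreSlim(maxConcurrency);

        async Task<(string Path, TResult Result)> RunOne(string path)
        {
            await gate.WaitAsync(ct);
            try
            {
                return (path, await convert(path, ct));
            }
            finally
            {
                gate.Release();
            }
        }

        // Task.WhenAll starts every task up front; the semaphore throttles the actual work.
        var results = await Task.WhenAll(paths.Select(RunOne));
        // Duplicate input paths would make ToDictionary throw.
        return results.ToDictionary(r => r.Path, r => r.Result);
    }
}
```

With a real converter, the delegate could be `(path, token) => converter.ConvertImageAsync(path, token)`, using the signature documented in the API Reference.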

Markdown Quality

DoclingNet produces clean Markdown output designed to match Python Docling's behavior.

Quality Validation (DocLayNet Dataset)

Tested on 20 diverse document pages from the DocLayNet dataset:

Metric	Result
Placeholder artifacts	0 ✅
Average word count	711.9 words/page
Table markers detected	24.6/page (average)
Clean output rate	100%

Correct Handling of Document Elements

DoclingNet properly handles all document element types:

✅ Elements with text content: Extracted and included in markdown

Paragraphs, titles, section headers
List items, code blocks
Table cells with text

✅ Elements without text content: Skipped (matching Python Docling)

Page headers/footers without text
Empty picture regions
Decorative elements

✅ No placeholder artifacts:

No [Page-header] or [Page-footer] placeholders
No [Picture] or [Text] markers
Clean, readable markdown output
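The skip rule above can be sketched as a filter applied during serialization: items without text produce no output at all, rather than a `[Picture]` or `[Page-header]` placeholder. The `Element` record, the label strings, and the heading/list prefixes are hypothetical stand-ins for `DocItem` subtypes; the real logic lives in `MarkdownDocSerializer` and may differ in detail.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a document item: a label plus optional text.
public sealed record Element(string Label, string? Text);

public static class MarkdownSketch
{
    public static string Serialize(IEnumerable<Element> elements)
    {
        var blocks = new List<string>();
        foreach (var e in elements)
        {
            // Elements without text are dropped: no "[Picture]"-style placeholders.
            if (string.IsNullOrWhiteSpace(e.Text))
                continue;

            var text = e.Text.Trim();
            blocks.Add(e.Label switch
            {
                "title" => "# " + text,
                "section_header" => "## " + text,
                "list_item" => "- " + text,
                _ => text // paragraphs and other text items are emitted as-is
            });
        }
        // A blank line between blocks keeps them as separate Markdown paragraphs.
        return string.Join("\n\n", blocks);
    }
}
```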

Example Output Quality

Input: Complex document page with tables, headers, and mixed content

DoclingNet Output:

LIVING ROOM

2019 IKEA CATALOGUE

01  PH151988 LANDSKRONA 3-seat sofa 5729

02  PH152639 EKTORP armchair 5405

Key Features:

Clean text extraction
No unnecessary placeholders
Proper whitespace handling
Table structure preserved (when tables present)

For detailed quality analysis, see benchmark_results/doclingnet_quality_analysis.md.

Project Structure

DoclingNet/
├── src/
│   ├── DoclingNetSdk/              # Main SDK entry point
│   │   ├── DoclingConverter.cs     # Primary conversion class
│   │   ├── DoclingConfiguration.cs # Configuration
│   │   └── DoclingConversionResult.cs
│   │
│   ├── Docling.Core/               # Core document models
│   │   ├── DoclingDocument.cs      # Document representation
│   │   ├── DocItem.cs              # Base item class
│   │   ├── ParagraphItem.cs        # Text paragraphs
│   │   ├── TableItem.cs            # Tables
│   │   └── ...
│   │
│   ├── Docling.Export/             # Export functionality
│   │   └── Serialization/
│   │       └── MarkdownDocSerializer.cs
│   │
│   ├── Docling.Backends/           # Input backends
│   │   └── ImageBackend.cs         # Image processing
│   │
│   ├── Docling.Cli/                # Command-line tool
│   │   └── Program.cs
│   │
│   └── submodules/                 # AI/ML libraries
│       ├── ds4sd-docling-layout-heron-onnx/
│       ├── easyocrnet/
│       └── ds4sd-docling-tableformer-onnx/
│
├── dataset/                        # Test datasets
│   ├── groundtruth/                # Ground truth samples
│   └── golden/                     # Golden outputs
│
└── docs/                           # Documentation

API Reference

DoclingConverter

Main converter class for document processing.

Constructor:

public DoclingConverter(
    DoclingConfiguration config,
    ILogger? logger = null)

Methods:

// Convert single image
Task<DoclingConversionResult> ConvertImageAsync(
    string imagePath,
    CancellationToken cancellationToken = default)

// Convert multiple images
Task<Dictionary<string, DoclingConversionResult>> ConvertImagesAsync(
    IEnumerable<string> imagePaths,
    CancellationToken cancellationToken = default)

DoclingConversionResult

Result object containing converted document and statistics.

Properties:

public DoclingDocument Document { get; }       // Structured document
public string Markdown { get; }                // Markdown export
public int LayoutElementCount { get; }         // Layout elements detected
public int OcrElementCount { get; }            // OCR elements processed
public int TableCount { get; }                 // Tables found
public int TotalItems { get; }                 // Total document items

DoclingDocument

Structured document representation.

Properties:

public IReadOnlyList<DocItem> Items { get; }   // All document items
public IReadOnlyList<PageReference> Pages { get; }  // Page references
public string Id { get; }                      // Document ID
public string SourceId { get; }                // Source file ID

DocItem Types

ParagraphItem - Text paragraphs
- string Text - Paragraph text content
- string Label - Item label (e.g., "paragraph", "title")
TableItem - Tables with structure
- int RowCount - Number of rows
- int ColumnCount - Number of columns
- IReadOnlyList<TableCellItem> Cells - Table cells
TableCellItem - Individual table cells
- string Text - Cell text content
- int RowIndex - Row position (0-based)
- int ColumnIndex - Column position (0-based)
- int RowSpan - Row span (default: 1)
- int ColumnSpan - Column span (default: 1)
PictureItem - Images and figures
CaptionItem - Image/table captions

Troubleshooting

Model Not Found

Error: "Layout model not found at path: ..."

Solution: Ensure submodules are initialized:

git submodule update --init --recursive

The layout model should exist at:

src/submodules/ds4sd-docling-layout-heron-onnx/models/heron-converted.onnx

Out of Memory

Error: OutOfMemoryException during processing

Solutions:

Disable table recognition: EnableTableRecognition = false
Use Fast variant: TableFormerVariant = TableFormerVariant.Fast
Process pages individually instead of batch
Increase available memory

Slow Processing

Issue: Conversion takes too long

Solutions:

Use Fast variant for TableFormer
Disable OCR if only layout is needed
Consider GPU acceleration
Process multiple documents in parallel

Poor Table Quality

Issue: Table structure not recognized correctly

Solutions:

Use Accurate variant (default): TableFormerVariant = TableFormerVariant.Accurate
Ensure image quality is good (high resolution, clear text)
Check that tables have visible borders
Verify OCR is enabled (required for table text extraction)

OCR Not Detecting Text

Issue: No text extracted from regions

Solutions:

Verify EnableOcr = true in configuration
Check that the language setting matches the document's language
Ensure image resolution is sufficient (min 150 DPI recommended)
Verify text is not too small or blurry
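To check a scan against the 150 DPI guideline, divide the image's pixel width by the physical page width in inches. A US Letter page (8.5 in wide) needs at least 8.5 × 150 = 1275 px of width; an A4 page (210 mm, about 8.27 in) needs just over 1240 px. The `ScanCheck` helper below is a small illustration, not part of DoclingNet; you supply the page width, since the image file alone does not say how large the original page was.

```csharp
using System;

public static class ScanCheck
{
    public const double MillimetresPerInch = 25.4;

    // Converts a physical page width from millimetres to inches.
    public static double MillimetresToInches(double mm) => mm / MillimetresPerInch;

    // Effective resolution of a scan: pixels across divided by inches across.
    public static double EffectiveDpi(int pixelWidth, double pageWidthInches) =>
        pixelWidth / pageWidthInches;

    // Smallest pixel width that reaches `targetDpi` on a page of the given width.
    public static int MinPixelWidth(double pageWidthInches, double targetDpi = 150) =>
        (int)Math.Ceiling(pageWidthInches * targetDpi);
}
```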

Development

Building from Source

# Clone repository with submodules
git clone --recursive https://github.com/mapo80/doclingnet.git
cd doclingnet

# Build entire solution
dotnet build DoclingNet.sln

# Run tests
dotnet test

# Build specific project
dotnet build src/DoclingNetSdk/DoclingNetSdk.csproj

Running Tests

# Run all tests
dotnet test

# Run with verbose output
dotnet test --logger "console;verbosity=detailed"

# Run specific test project
dotnet test tests/DoclingNetSdk.Tests/

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Related Projects

Docling (Python) - Original Python implementation
docling-layout-heron-onnx - Layout detection model
docling-tableformer-onnx - Table structure recognition
EasyOCR - Original Python OCR library

License

MIT License - see LICENSE file for details.

Credits

DoclingNet is built on top of excellent open-source projects:

Docling - Original Python implementation by DS4SD
Heron - Document layout detection model
TableFormer - Table structure recognition transformer
EasyOCR - OCR engine (ported to .NET as EasyOcrNet)
TorchSharp - .NET bindings for PyTorch
ONNX Runtime - Cross-platform ML inference

Special thanks to the DS4SD team at IBM Research for the original Docling implementation and pre-trained models.

Citation

If you use DoclingNet in your research or project, please cite both DoclingNet and the original Docling:

@software{doclingnet2024,
  title = {DoclingNet: .NET Port of Docling Document Conversion},
  author = {Your Name},
  year = {2024},
  url = {https://github.com/yourusername/doclingnet}
}

@software{docling2024,
  title = {Docling Technical Report},
  author = {Deep Search Team},
  year = {2024},
  url = {https://github.com/DS4SD/docling}
}

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: docs/

Changelog

See docs/progress.md for detailed development progress and version history.
