Document
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…
A curated list of resources dedicated to table recognition
CDLA: A Chinese document layout analysis (CDLA) dataset
An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowe…
Markdown rendering + LaTeX extras (equations, tables, ...), with conversion features, for the scientific community
Given a scholarly PDF, extract figures, tables, captions, and section titles.
CnSTD: A Python3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis
Chinese Mathematical Formula Detection (MFD) Dataset: a dataset for detecting mathematical formulas in Chinese documents
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
FormulaNet is a new large-scale Mathematical Formula Detection dataset.
1st-place solution for the ICDAR 2021 Competition on Mathematical Formula Detection
DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis
Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
The ICDAR 2019 cTDaR competition evaluates the performance of methods for table detection (Track A) and table recognition (Track B). For the first track, document images containing one or several tables a…
Document Artificial Intelligence
End-to-end neural table-text understanding models.
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model


