[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/MCP/Docker/Zotero

Python 31,686 2,858 Updated Nov 25, 2025

microsoft / generative-ai-for-beginners

21 Lessons, Get Started Building with Generative AI

Jupyter Notebook 106,089 56,832 Updated Feb 3, 2026

666ghj / BettaFish

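微舆 (Weiyu): a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the original picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no dependency on any framework.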

Python 35,280 6,768 Updated Jan 20, 2026

michaelyin / luke-3.5.0

Tool for Lucene indexes (Lucene 3.5.0 or 3.6.0)

Java 1 Updated Sep 21, 2016

luceneplusplus / LucenePlusPlus

Lucene++ is an up-to-date C++ port of the popular Java Lucene library, a high-performance, full-featured text search engine.

C++ 783 239 Updated Jan 25, 2026

twelfth-star / universal-font-recognition

Recognize font from image using DeepFont technique.

Python 14 3 Updated May 9, 2023

yh-hust / PDF-Wukong

[arXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

127 4 Updated Jun 4, 2025

elsejj / mcp-cn-a-stock

An MCP (Model Context Protocol) service that provides A-share stock market data to large language models.

Python 402 68 Updated Dec 15, 2025

0voice / Awesome_Qt_Learning

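The latest 2025 roundup of Qt development. Provides comprehensive Qt learning resources, covering materials, literature, books, projects, and examples from fundamentals to hands-on projects, to help you get started quickly and advance step by step. Continuously updated and maintained.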

692 89 Updated Sep 9, 2025

assimp / assimp

The official Open-Asset-Importer-Library Repository. Loads 40+ 3D-file-formats into one unified and clean data structure.

C++ 12,702 3,138 Updated Feb 4, 2026

infiniflow / ragflow

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

Python 72,774 8,056 Updated Feb 4, 2026

corpus-solutions / tsa-server

portable Time Stamp Server (over HTTP)

Groovy 51 5 Updated Oct 17, 2017

PaddlePaddle / PaConvert

PaddlePaddle Code Convert Toolkit. A deep learning code conversion tool for PaddlePaddle (飞桨).

Python 122 92 Updated Feb 3, 2026

hgoldfish / qtnetworkng

QtNetwork Next Generation. A coroutine-based network framework for Qt/C++, with a simpler API than boost::asio.

C 282 59 Updated Jan 13, 2026

NanoNets / docext

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

Python 1,851 136 Updated Aug 25, 2025

InternScience / ChartVLM

Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Python 248 20 Updated Sep 26, 2024

docling-project / docling

Get your documents ready for gen AI

Python 52,114 3,567 Updated Feb 4, 2026

yujunhuics / LayoutReader

Reading order, LayoutReader

Python 19 4 Updated May 8, 2025

CaseDrive / publaynet-models

Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset

Python 29 2 Updated Apr 16, 2023

clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python 6,779 553 Updated Jul 11, 2024

bytedance / Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python 8,777 743 Updated Dec 17, 2025

TapXWorld / ChinaTextbook

All PDF textbooks for primary school, middle school, high school, and university.

Roff 64,809 14,458 Updated Oct 18, 2025

apple / ml-fastvlm

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 7,192 538 Updated May 5, 2025

datalab-to / marker

Convert PDF to markdown + JSON quickly with high accuracy

Python 31,434 2,156 Updated Jan 31, 2026

RapidAI / TableStructureRec

Collects the best table recognition models currently available as open source, improves their pre-processing and post-processing, and converts the models to ONNX.

Python 920 80 Updated Aug 3, 2025

THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]

Python 11,211 1,173 Updated Mar 14, 2025

HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

TypeScript 26,356 3,367 Updated Feb 4, 2026

kamranahmedse / developer-roadmap

Interactive roadmaps, guides and other educational content to help developers grow in their careers.

TypeScript 348,557 43,695 Updated Feb 4, 2026