Stars
Provides pre-built flash-attention package wheels for Linux and Windows platforms using GitHub Actions
Data browser based on S3. A visualization tool for S3-hosted data (json / jsonl / parquet / html / md, etc.).
A deep learning library built from scratch, with complex neural network examples built on top, for learning purposes.
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
OpenMMLab Foundational Library for Training Deep Learning Models
PDF Parsing Tool: GOT's vLLM acceleration implementation, MinerU for layout recognition, and GOT for table formula parsing.
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
A Curated List of Awesome Table Structure Recognition (TSR) Research, including models, papers, datasets, and code. Continuously updated.
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
Data annotation toolbox that supports image, audio, and video data.
Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically similar, but not exactly identical).
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
A tool for extracting plain text from Wikipedia dumps
Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.
A series of demos showing how Chromium is constructed.
A Web UI for Elasticsearch and OpenSearch: Import, browse and edit data with rich filters and query views, create reference search UIs.
Out-of-the-box Annotation Toolbox
OpenMMLab FewShot Learning Toolbox and Benchmark
A Progressive Web App for local file sharing
Copier for golang, copy value from struct to struct and more
A cloud-native Go microservices framework with cli tool for productivity.
Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.


