Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 70,367 9,800 Updated Feb 6, 2026

ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Python 56,795 17,435 Updated Feb 3, 2026

ageitgey / face_recognition

The world's simplest facial recognition api for Python and the command line

Python 56,102 13,715 Updated Aug 21, 2024

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 54,810 5,994 Updated Dec 30, 2025

ultralytics / ultralytics

Ultralytics YOLO 🚀

Python 53,049 10,156 Updated Feb 7, 2026

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 44,476 5,954 Updated Aug 16, 2024

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 38,672 4,206 Updated Jan 18, 2026

XingangPan / DragGAN

Official Code for DragGAN (SIGGRAPH 2023)

Python 35,977 3,442 Updated May 18, 2024

myshell-ai / OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 35,910 4,005 Updated Apr 19, 2025

xinntao / Real-ESRGAN

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

Python 34,257 4,260 Updated Aug 6, 2024

open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark

Python 32,382 9,849 Updated Aug 21, 2024

deezer / spleeter

Deezer source separation library including pretrained models.

Python 28,028 3,068 Updated Apr 2, 2025

deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project

Python 27,812 5,919 Updated Feb 2, 2026

matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Python 25,515 11,712 Updated Jun 7, 2024

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,434 2,728 Updated Aug 12, 2024

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python 22,981 2,503 Updated Feb 7, 2026

danielgatis / rembg

Rembg is a tool to remove image backgrounds

Python 21,796 2,224 Updated Feb 3, 2026

w-okada / voice-changer

Realtime Voice Changer

Python 19,670 2,237 Updated Aug 24, 2025

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,714 3,328 Updated Feb 7, 2026

albumentations-team / albumentations

Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125

Python 15,269 1,705 Updated Jun 25, 2025

LlamaFamily / Llama-Chinese

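Llama Chinese community: a real-time roundup of the latest Llama learning resources, building the best open-source ecosystem for Chinese Llama large models; fully open source and available for commercial use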

Python 14,746 1,304 Updated Apr 6, 2025

aleju / imgaug

Image augmentation for machine learning experiments.

Python 14,727 2,468 Updated Jul 30, 2024

davidsandberg / facenet

Face recognition using Tensorflow

Python 14,300 4,804 Updated Jul 24, 2023

PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.

Python 14,069 3,013 Updated Oct 10, 2025

OpenTalker / SadTalker

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Python 13,585 2,602 Updated Jun 26, 2024

facebookresearch / AnimatedDrawings

Code to accompany "A Method for Animating Children's Drawings of the Human Figure"

Python 12,759 1,148 Updated Sep 3, 2025

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…

Python 12,523 1,951 Updated Jan 27, 2026

instantX-research / InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python 11,908 880 Updated Jul 18, 2024

alexjc / neural-enhance

Super Resolution for images using deep learning.

Python 11,888 1,370 Updated Dec 29, 2020

Zheng Li pango99

Lists (32)

3DPose

3D object detection

ai toy

AIGame

AI image generation

GitHub proxy

GL_DX_InterOP

Live2D

Mocap

NDI

NERF

TensorRT

Text->Image

tracking

TRT Plugin

UE_Plugin

Unity

VRoid

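Face detection

Sports detection

Image change detection

Image stitching

Multi-object tracking

Slow motion

Hand detection

Data visualization

Streaming media

Depth estimation

Video matting

Video encoding and decoding

Speech

Super-resolution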

Starred repositories

slow-motion

video-frame-interpolation