Stars
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization
Tempo: Small Vision-Language Models are Smart Compressors for Long Video Understanding
"Single-image Layer Decomposition for Anime Characters" (SIGGRAPH 2026, Conditionally Accepted)
High-Quality Voice Cloning TTS for 600+ Languages
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders [Technical Report]
Sharp Monocular View Synthesis in Less Than a Second
CVPR 2025🎉 Official repository for "TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts"
MegaFlow: Zero-Shot Large Displacement Optical Flow
PixelSmile: Fine-grained facial expression editing with continuous control, reduced semantic entanglement, and strong identity preservation.
AutoGaze automatically removes redundant patches in a video, reducing #tokens in ViT/MLLM by 4x-100x.
Mount Hugging Face Buckets and repos as local filesystems. No download, no copy, no waiting.
[ICLR 2026] RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO, designed for fine-tuning.
Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
PyTorch code and models for VJEPA2 self-supervised learning from video.
Any2Full: Prompting Depth Anything for Depth Completion in One Stage
DVD: Deterministic Video Depth Estimation with Generative Priors
Helios: Real Real-Time Long Video Generation Model
CVPR 2026 - Generative Quanta Image Reconstruction
[CVPR 2026🔥] 🧑‍🎨 OmniLottie, an open-sourced multi-modal instructed vector animation generator that produces Lottie JSONs.
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
Real-time text-to-speech with Qwen3-TTS
[CVPR 2026] Official code and models for Video Encoder-only Mask Transformer (VidEoMT).
Give your agents the power of the Hugging Face ecosystem
Official implementation of "Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion"
A real-time and multilingual speech translation model
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis



