Starred repositories
🎓 Talking-face research papers, updated daily
[CVPR 2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Quiz app using a custom diffusion model to create deepfakes of satellite Earth images
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images conditioned on an image prompt (see the usage sketch after this list).
V-Express aims to generate a talking-head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.
Dive into Deep Learning (《动手学深度学习》): written for Chinese readers, runnable, and open for discussion. The Chinese and English editions are used for teaching at over 500 universities in more than 70 countries.
[CVPR 2024] U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
[CVPR 2024] Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework.
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥
[CVPR 2024] Official PyTorch implementation of FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Production-ready platform for agentic workflow development.
A generative speech model for daily dialogue.
InstantID-ROME: Improved Identity-Preserving Generation in Seconds 🔥
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
A collection of resources on controllable generation with text-to-image diffusion models.
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
[AAAI 2025] Official implementation of "Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts"
Mora: More like Sora for Generalist Video Generation
Open-Sora: Democratizing Efficient Video Production for All
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
[CSUR] A Survey on Video Diffusion Models
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
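
For the image-prompt-adapter entry above, here is a minimal, hedged usage sketch. It assumes the adapter is loaded through the Hugging Face `diffusers` IP-Adapter integration with the `h94/IP-Adapter` checkpoints; the base model ID and the local reference-image path are illustrative placeholders, not values taken from the starred repo itself.

```python
# Minimal sketch: conditioning a pretrained text-to-image diffusion model on an
# image prompt via the diffusers IP-Adapter integration. Model IDs and file
# paths below are assumptions for illustration, not from the repo above.
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model
    torch_dtype=torch.float16,
).to("cuda")

# Load the image-prompt adapter weights (assumed checkpoint layout).
pipeline.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipeline.set_ip_adapter_scale(0.6)  # balance image prompt vs. text prompt

reference = load_image("reference.png")  # hypothetical local reference image

result = pipeline(
    prompt="a portrait in the style of the reference image",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
result.save("output.png")
```

The `set_ip_adapter_scale` value trades off how strongly the reference image steers generation relative to the text prompt; lower values keep the result closer to the text, higher values closer to the image prompt.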