Hi, I am Ruihang Chu (储瑞航).

I am a Research Scientist at the Alibaba Wan Team via the “AliStar” talent program. I also hold a joint postdoctoral position at Tsinghua University, supervised by Prof. Yujiu Yang. I received my Ph.D. degree from CUHK, supervised by Prof. Jiaya Jia and Prof. Chi-Wing Fu. During my Ph.D. studies, I also had the pleasure of working with Prof. Xiaojuan Qi.

I work on Generative AI and Computer Vision. My long-term goal is to enable models to simulate and reason about the visual world. Now I focus on two key directions:

  • 🎬 Video generation: Pre- and post-training of foundation models, native multimodal architecture, controllable generation, and AR models.
  • 🧾 Multimodal language models: VLMs, coding LLMs, and multimodal captioning.

We have developed Wan, a series of world-leading video generative foundation models, now including Wan2.1, Wan2.2, Wan2.5, and Wan2.6. You are welcome to try them online, call our APIs, or download the open-source models.
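For a quick start with the open-source checkpoints, here is a minimal text-to-video sketch using the Hugging Face diffusers integration of Wan2.1 (the model ID and sampling settings below are assumptions based on common usage; please follow the model card for the recommended configuration):

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Assumed checkpoint; see the Wan-AI organization on Hugging Face for the full list.
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A cat walks on the grass, realistic style.",
    height=480,
    width=832,
    num_frames=81,       # about 5 seconds at 16 fps
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "wan_t2v.mp4", fps=16)
```

The larger checkpoints follow the same interface; only the model ID and the recommended resolution change.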

If you are interested in collaboration, feel free to drop me an email.

🔥 News

  • [2025/12]   Wan2.6 is launched. New features: Starring (role-to-video), multi-shot generation, and 15s generation length!
  • [2025/12]   Wan-Move tops Hugging Face Weekly Papers (Dec 7–13)!
  • [2025/12]   Mini-Gemini is accepted to T-PAMI!
  • [2025/11]   O-DisCo-Edit is accepted to AAAI 2026 as Oral!
  • [2025/09]   Wan2.5 is launched, China's first commercial-grade video-audio generation model with a native multimodal backbone!
  • [2025/09]   Wan models have generated 70+ million videos online and reached 40+ million open-source downloads!
  • [2025/09]   Four papers are accepted to NeurIPS 2025!
  • [2025/08]   Target-DPO is accepted to EMNLP 2025 Main Conference!
  • [2025/07]   Wan2.2 is open-sourced. New MoE architecture with cinematic-level aesthetics!
  • [2025/06]   FreeScale is accepted to ICCV 2025!
  • [2025/05]   InSerter is accepted to ACL 2025 Main Conference!
  • [2025/04]   Our survey on reasoning with foundation models is accepted to ACM Computing Surveys!
  • [2025/02]   Wan2.1 is open-sourced!

📝 Selected Publications (Full List)

Technical Report

Wan: Open and Advanced Large-Scale Video Generative Models

Core Contributor

  • Wan2.1: SOTA performance, powerful video VAE.
  • Wan2.2: MoE architecture, cinematic-level aesthetics, hybrid TI2V.
  • Wan2.5: audio-visual sync, native multimodal structure.
  • Wan2.6: role-to-video, multi-shot generation.
NeurIPS 2025

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Ruihang Chu, Yefei He, Zhekai Chen, Shiwei Zhang, Xiaogang Xu, Bin Xia, Dingdong Wang, Hongwei Yi, Xihui Liu, Hengshuang Zhao, Yu Liu, Yingya Zhang, Yujiu Yang

  • Latent Trajectory Guidance: no architecture changes to base I2V models (see the sketch after this list).
  • Point-level Control: on par with commercial models.
  • MoveBench Benchmark: diverse scenes, longer durations, rich annotations.
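To convey the idea, here is a deliberately simplified sketch (the function, shapes, and conditioning interface are hypothetical, not the paper's actual code): the latent feature under the user-selected point is propagated along the given path to form a guidance latent that an unmodified I2V model consumes as extra conditioning.

```python
import torch

def build_trajectory_guidance(first_frame_latent, trajectory):
    """Hypothetical illustration of latent trajectory guidance.

    first_frame_latent: (C, H, W) latent of the conditioning image
    trajectory: list of (x, y) latent-grid positions, one entry per frame
    """
    C, H, W = first_frame_latent.shape
    guidance = torch.zeros(len(trajectory), C, H, W)
    x0, y0 = trajectory[0]
    point_feat = first_frame_latent[:, y0, x0]  # feature of the tracked point
    for t, (x, y) in enumerate(trajectory):
        guidance[t, :, y, x] = point_feat       # place it along the user path
    return guidance                             # (T, C, H, W) guidance latent
```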
NeurIPS 2023

DiffComplete: Diffusion-based Generative 3D Shape Completion

Ruihang Chu, Enze Xie, Shentong Mo, Zhenguo Li, Matthias Nießner, Chi-Wing Fu, Jiaya Jia

  • Hierarchical Feature Aggregation: 40% reduction in L1 completion error.
  • Occupancy-aware Fusion: enables multiple partial shapes as conditions (see the sketch after this list).
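A toy sketch of the fusion step (all names and shapes are hypothetical): each scan's condition features are weighted by its observed-voxel mask, so regions one scan never saw do not dilute evidence from another.

```python
import torch

def occupancy_aware_fusion(cond_feats, occupancy, eps=1e-6):
    """Hypothetical sketch of occupancy-aware fusion over K partial scans.

    cond_feats: (K, C, D, H, W) condition features from K partial shapes
    occupancy:  (K, 1, D, H, W) observed-voxel masks in {0, 1}
    """
    weighted = (cond_feats * occupancy).sum(dim=0)  # sum features over scans
    counts = occupancy.sum(dim=0).clamp_min(eps)    # scans observing each voxel
    return weighted / counts                        # per-voxel average
```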
arXiv 2025

LongLive: Real-time Interactive Long Video Generation

Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, Song Han

  • Real-time: 20.7 FPS generation on a single H100 GPU.
  • Interactive: KV re-cache when new prompts are inserted (see the sketch after this list).
  • Long: train-long-test-long, up to 240-second generation.
  • Fine-tuning Cost: 32 H100 GPU-days on Wan2.1 1.3B.
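A conceptual sketch of the re-cache step (the model interface below is hypothetical): instead of clearing history, which breaks visual continuity, or keeping a cache built under the old prompt, which ignores the new one, the generated frame tokens are re-encoded together with the new prompt in a single prefill pass.

```python
import torch

@torch.no_grad()
def recache_on_prompt_switch(model, generated_tokens, new_prompt_tokens):
    """Hypothetical sketch of KV re-cache at a prompt switch."""
    kv_cache = model.init_kv_cache()                     # start from an empty cache
    context = torch.cat([new_prompt_tokens, generated_tokens])
    model.prefill(context, kv_cache=kv_cache)            # one pass refreshes the cache
    return kv_cache                                      # decoding resumes from here
```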
arXiv 2025

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Junsong Chen, Yuyang Zhao, Jincheng Yu, Ruihang Chu, Junyu Chen, Shuai Yang, Xianbang Wang, Yicheng Pan, Daquan Zhou, Huan Ling, Haozhe Liu, Hongwei Yi, Hao Zhang, Muyang Li, Yukang Chen, Han Cai, Sanja Fidler, Ping Luo, Song Han, Enze Xie

  • Linear DiT: more efficient than vanilla attention (a linear-attention sketch follows this list).
  • Constant-Memory KV Cache: enable minute-long video generation.
  • Training Cost: only 1% of the cost of MovieGen.
  • Inference: 16× faster in latency with competitive performance.
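For readers unfamiliar with the trick, below is a generic (non-causal) linear-attention sketch: softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V) with a positive feature map phi, which is linear rather than quadratic in sequence length. SANA-Video's block linear attention has its own formulation, so treat this only as the underlying idea.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Generic linear attention, O(N) in sequence length N.

    q, k, v: (batch, heads, seq_len, dim)
    """
    q, k = F.relu(q), F.relu(k)                    # simple positive feature map
    kv = torch.einsum("bhnd,bhne->bhde", k, v)     # sum_n phi(k_n) v_n^T
    z = k.sum(dim=2)                               # normalizer sum_n phi(k_n)
    out = torch.einsum("bhnd,bhde->bhne", q, kv)
    denom = torch.einsum("bhnd,bhd->bhn", q, z).clamp_min(eps)
    return out / denom.unsqueeze(-1)
```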
ICCV 2019

Vehicle Re-identification with Viewpoint-aware Metric Learning

Ruihang Chu, Yifan Sun, Yadong Li, Zheng Liu, Chi Zhang, Yichen Wei

  • Viewpoint-aware Metric Learning: handles view variations in image retrieval (see the sketch after this list).
  • SOTA Performance: ~10% higher top-1 accuracy.
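A toy sketch of the metric (the two-branch interface is hypothetical): each image gets two embeddings, and a pair is compared in the same-view space when the viewpoints match and in the cross-view space otherwise, so each metric can specialize for its regime.

```python
import torch
import torch.nn.functional as F

def viewpoint_aware_distance(emb_same, emb_cross, views, idx_a, idx_b):
    """Hypothetical sketch of a viewpoint-aware metric.

    emb_same, emb_cross: (N, D) two learned embeddings per image
    views: (N,) viewpoint labels; idx_a, idx_b: (P,) paired image indices
    """
    same_view = views[idx_a] == views[idx_b]
    d_same = F.pairwise_distance(emb_same[idx_a], emb_same[idx_b])
    d_cross = F.pairwise_distance(emb_cross[idx_a], emb_cross[idx_b])
    return torch.where(same_view, d_same, d_cross)  # pick the metric per pair
```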

💬 Invited Talks and Reports

📋 Academic Services

  • Journal Reviewer: T-PAMI, T-IP, RA-L.
  • Conference Program Committee/Reviewer: NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, AAAI, and IROS.