Apple
- New York
- www.jiataogu.me
- @thoma_gu
Highlights
- Pro
Stars
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
Accelerate TarFlow Sampling with GS-Jacobi Iteration
A unified media (Image, Video, Audio, Text) diffusion repository, for education and learning.
Code release for paper "Test-Time Training Done Right"
Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics
[CVPR 2025 Oral] Official Repo for Paper "AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea"
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Official implementation of Inductive Moment Matching
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[CVPR 2025] Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
LoRA training script for Lightricks LTX-Video
Official implementation of Continuous 3D Perception Model with Persistent State
Witness the aha moment of VLM with less than $3.
Code for NeurIPS 2024 paper - The GAN is dead; long live the GAN! A Modern Baseline GAN - by Huang et al.
Unofficial implementation of "Simplifying, Stabilizing & Scaling Continuous-Time Consistency Models" for MNIST
A framework that calibrates object properties through differentiable simulations of robot-object interactions.
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
A mini-library for training consistency models.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
[ICLR'25 Oral] No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
Official inference library for Mistral models



