- Zhejiang University, Shanghai AI Lab
- https://kszpxxzmc.github.io/
Stars
Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
Unified automatic quality assessment for speech, music, and sound.
[ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
[ACMMM 2025] CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation
An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
An official implementation of "SPARK: Synergistic Policy And Reward Co-Evolving Framework"
[SIGGRAPH Asia 2025] SS4D: Native 4D Generative Model via Structured Spacetime Latents
Official implementation of "ViSAGe: Video-to-Spatial Audio Generation" (ICLR 2025)
[NeurIPS 2025] PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
Official code of "Imagine360: Immersive 360 Video Generation from Perspective Anchor"
Official repo for "IDArb: Intrinsic Decomposition for arbitrary number of input views and illuminations"
[SIGGRAPH 2025] LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation
This is the official implementation of SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation.
Customize your arXiv recommendation every day.
