Python 293 31 Updated Dec 29, 2025

Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"

Python 73 1 Updated Dec 5, 2025

ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

107 4 Updated Dec 11, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 654 48 Updated Jun 5, 2025
Python 34 1 Updated Nov 4, 2025

Spatial Audio Rendering on the web.

JavaScript 890 114 Updated Jan 1, 2026

[ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Python 294 29 Updated Nov 5, 2025

[ACMMM 2025] CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation

Python 6 Updated Nov 4, 2025

An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"

Python 169 6 Updated Dec 26, 2025

An official implementation of "SPARK: Synergistic Policy And Reward Co-Evolving Framework"

Python 24 Updated Oct 23, 2025

[SIGGRAPH Asia 2025] SS4D: Native 4D Generative Model via Structured Spacetime Latents

Python 28 2 Updated Dec 17, 2025
Python 6 Updated Sep 19, 2024

Mirror of https://git.ffmpeg.org/ffmpeg.git

C 56,028 13,337 Updated Jan 6, 2026

Official implementation of "ViSAGe: Video-to-Spatial Audio Generation" (ICLR 2025)

Python 39 3 Updated Sep 10, 2025

[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.

Python 1,126 65 Updated Nov 25, 2025

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,045 240 Updated Nov 30, 2025

Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"

Python 216 19 Updated Aug 7, 2025

[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"

Python 343 9 Updated Jun 27, 2025

Project page of Sonic4D

JavaScript 7 Updated Jun 26, 2025

[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Jupyter Notebook 3,271 309 Updated Oct 27, 2024

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 12,143 1,289 Updated Oct 11, 2025

GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

Python 97 5 Updated Dec 31, 2025
Python 43 3 Updated Oct 13, 2025
JavaScript 4 Updated Dec 12, 2025

Official code of "Imagine360: Immersive 360 Video Generation from Perspective Anchor"

Python 155 6 Updated May 14, 2025

Official repo for "IDArb: Intrinsic Decomposition for arbitrary number of input views and illuminations"

Python 94 7 Updated Jul 9, 2025

[SIGGRAPH 2025] LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation

Python 306 13 Updated Jul 24, 2025

This is the official implementation of SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation.

Jupyter Notebook 114 6 Updated Nov 26, 2024

Customize your arXiv recommendation every day.

Python 139 24 Updated Sep 24, 2025
Python 73 5 Updated Oct 25, 2024