Stars
[EMNLP'25] s3 - ⚡ Efficient & Effective Search Agent Training via RL for RAG (RLVR for Search with Minimal Data)
[NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
Sparking "Thinking with Videos" via Reinforcement Learning
[NeurIPS 2025🔥] Main source code of SRPO framework.
An open-source implementation for fine-tuning Qwen-VL series by Alibaba Cloud.
Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Time-R1: Framework and resources for endowing LLMs with comprehensive temporal reasoning (understanding, prediction, creative generation) using a novel three-stage RL curriculum. Includes the Time-…
[CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding. (WACV2025)
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Official PyTorch Implementation for Advancing Bayesian Optimization via Learning Correlated Latent Space (CoBO)
[NeurIPS25] Official Implementation (PyTorch) of "DeepVideo-R1"
Official PyTorch implementation of "Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection" (CVPR 2024).
Official Implementation (PyTorch) of the "Representation Shift: Unifying Token Compression with FlashAttention", ICCV 2025
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)
[ICLR2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
CoS: Chain-of-Shot Prompting for Long Video Understanding
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
Official Implementation (PyTorch) of "Super-class guided Transformer for Zero-Shot Attribute Classification", AAAI 2025
Official Implementation (PyTorch) of the "Generative Subgraph Retrieval for Knowledge Graph-Grounded Dialog Generation", EMNLP 2024 (main)
Official Implementation (PyTorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning", AAAI 2025
Official Implementation (PyTorch) of the "LLaMo: Large Language Model-based Molecular Graph Assistant", NeurIPS 2024
Official Implementation (PyTorch) of "Constant Acceleration Flow", NeurIPS 2024
Official Implementation (PyTorch) of "DAVI: Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems", ECCV 2024 Oral paper
Official implementation of CVPR 2024 paper "Prompt Learning via Meta-Regularization".

