SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
Video Chain of Thought: code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench shows that fine-tuned video models consistently outperform strong VLMs on long-horizon spatial planning tasks.
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models
[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos
A Gradio-based demonstration for the AllenAI SAGE-MM-Qwen3-VL-4B-SFT_RL multimodal model, specialized in video reasoning tasks. Users upload MP4 videos, provide natural language prompts (e.g., "Describe this video in detail" or custom questions), and receive detailed textual analyses.
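A demo with this upload-prompt-analyze flow can be sketched with Gradio's `Interface` API. The `answer_video_question` stub and all labels below are illustrative assumptions, not the repository's actual code; a real implementation would run the SAGE-MM-Qwen3-VL-4B-SFT_RL model in place of the placeholder:

```python
def answer_video_question(video_path: str, prompt: str) -> str:
    """Placeholder inference step: a real demo would sample frames from
    the uploaded video and run the multimodal model on them."""
    return f"Analysis of {video_path} for prompt: {prompt}"

def build_demo():
    # Gradio is imported lazily so the inference stub above stays
    # usable without the UI dependency installed.
    import gradio as gr
    return gr.Interface(
        fn=answer_video_question,
        inputs=[gr.Video(label="MP4 video"), gr.Textbox(label="Prompt")],
        outputs=gr.Textbox(label="Model analysis"),
    )

if __name__ == "__main__":
    build_demo().launch()
```

The lazy import keeps the inference function separable from the UI layer, which is a common pattern when the model code is also exercised from scripts or tests.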
🎥 Generate videos with advanced multimodal reasoning to enhance understanding and interaction, pushing the boundaries of video content creation.
🎥 Explore cutting-edge research focused on reasoning with video models, featuring key papers and projects in the field of video intelligence.