RoboBrain 2.5: Depth in Sight, Time in Mind.
RoboBrain 2.5 is a next-generation embodied AI foundation model that advances general perception, spatial reasoning, and temporal modeling through extensive training on high-quality spatiotemporal supervision. Building on its predecessor, RoboBrain 2.5 introduces two major capability upgrades. First, it unlocks Precise 3D Spatial Reasoning by shifting from 2D pixel-relative grounding to depth-aware coordinate prediction and absolute metric-constraint comprehension, generating complete 3D manipulation traces as ordered keypoint sequences under physical constraints. Second, complementing this spatial precision, it establishes Dense Temporal Value Estimation, which provides dense, step-aware progress prediction and execution-state understanding across varying viewpoints, producing stable feedback signals for downstream learning. Together, these upgrades extend the framework toward more physically grounded, execution-aware embodied intelligence for complex, fine-grained manipulation.
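As a rough illustration of what "an ordered keypoint sequence under physical constraints" could look like in practice, here is a minimal Python sketch. The class names, coordinate frame, and clearance check are illustrative assumptions, not the actual RoboBrain 2.5 output schema.

```python
# Hypothetical sketch of a 3D manipulation trace as an ordered keypoint sequence.
# Class names, coordinate frame, and the constraint check are assumptions, not
# the actual RoboBrain 2.5 output format.
from dataclasses import dataclass
from typing import List

@dataclass
class Keypoint3D:
    x: float  # meters, robot base frame
    y: float
    z: float  # height above the base; depth-aware rather than a 2D pixel offset

@dataclass
class SpatialTrace:
    instruction: str             # e.g. "place the bowl on the plate"
    keypoints: List[Keypoint3D]  # ordered waypoints from start pose to end region

    def respects_clearance(self, table_z: float, clearance: float = 0.02) -> bool:
        """Toy physical-constraint check: every waypoint stays above the table."""
        return all(kp.z >= table_z + clearance for kp in self.keypoints)

trace = SpatialTrace(
    instruction="place the bowl on the plate",
    keypoints=[Keypoint3D(0.42, -0.10, 0.15),
               Keypoint3D(0.40, 0.05, 0.22),
               Keypoint3D(0.38, 0.18, 0.12)],
)
print(trace.respects_clearance(table_z=0.05))  # True
```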
Compared to version 2.0, RoboBrain 2.5 achieves a leap in spatial perception and reasoning capabilities:
RoboBrain 2.5 makes significant progress in temporal modeling by constructing a General Reward Model (GRM):
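One common way a dense, step-aware progress signal can be consumed downstream is by converting it into per-step rewards for policy learning. The sketch below uses a simple progress-delta formulation on a hypothetical progress array; it is an assumption for illustration, not necessarily how the GRM's outputs are used.

```python
# Minimal sketch: turning dense task-progress estimates into per-step rewards.
# The progress values are made up; the progress-delta reward is one standard
# choice, not a documented RoboBrain 2.5 / GRM interface.
import numpy as np

def progress_to_rewards(progress: np.ndarray) -> np.ndarray:
    """Convert progress estimates in [0, 1] into per-step reward deltas.

    The episode return then telescopes to (final progress - initial progress).
    """
    return np.diff(progress, prepend=progress[:1])

# e.g. progress predicted for 6 frames of a "close the drawer" rollout
progress = np.array([0.0, 0.1, 0.3, 0.55, 0.8, 1.0])
rewards = progress_to_rewards(progress)
print(rewards)        # [0.   0.1  0.2  0.25 0.25 0.2 ]
print(rewards.sum())  # 1.0, full credit for completing the task
```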
RoboBrain 2.5 also retains the core capabilities of version 2.0: interactive reasoning with long-horizon planning and closed-loop feedback, spatial perception for precise point and bounding-box prediction from complex instructions, temporal perception for future trajectory estimation, and scene reasoning through real-time structured memory construction and updating.
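For a sense of how grounded point and bounding-box predictions might be consumed, here is a small sketch. The JSON schema is an illustrative assumption, not RoboBrain 2.5's documented output format.

```python
# Hypothetical example of consuming a grounded point and bounding box predicted
# from an instruction such as "the handle of the mug". The JSON schema is an
# assumption for illustration only.
import json

raw = '{"point": [412, 287], "bbox": [380, 240, 465, 330]}'
pred = json.loads(raw)

px, py = pred["point"]         # pixel coordinates of the referred point
x1, y1, x2, y2 = pred["bbox"]  # top-left and bottom-right box corners
assert x1 <= px <= x2 and y1 <= py <= y2, "point should fall inside its box"
```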
Fold the Pants (AgiLex).
Clean the Table (AgiLex).
Fill the Mug (RoboCasa).
Close the Drawer (RoboCasa).
Place Bowl on the Plate (LIBERO).
Open the Drawer (LIBERO).
Stack the Blocks (Flanka).
Stack the Blocks (Human).
Insert the Square Block.
Trigger the Circuit.
Cap the Pen.
This demo visualizes the performance of RoboBrain 2.5 on TraceSpatial-Bench. Yellow masks mark the target objects, and pink 3D boxes mark the correct end regions. Despite similar 2D projections, our model yields more accurate spatial traces than strong general VLMs, which often produce floating or colliding traces due to inaccurate depth estimation. Leveraging richer geometric cues further improves performance.
The demo shows how robotic arms follow 3D spatial traces generated by RoboBrain 2.5 to successfully complete a diverse set of manipulation tasks, demonstrating its strong spatial reasoning ability and effective support for embodied task execution.
Visualizations of spatial tracing in complex, cluttered environments using RoboBrain 2.5.
The demos below show that RoboBrain 2.5 can handle challenging long-horizon spatial tracing tasks that require complex, multi-step, metric-grounded reasoning in cluttered and dynamic environments by integrating with various control policies across diverse robots.
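As a rough illustration of this integration, a predicted 3D trace can be densified into end-effector targets for a position controller. The trace values and the `controller.move_to` call below are hypothetical placeholders for whatever low-level policy a given robot uses.

```python
# Minimal sketch: turning an ordered (N, 3) keypoint trace into a sequence of
# end-effector targets. The trace values and the commented controller call are
# hypothetical, not part of any RoboBrain 2.5 API.
import numpy as np

def interpolate_trace(keypoints: np.ndarray, steps_per_segment: int = 10) -> np.ndarray:
    """Linearly densify an ordered keypoint trace into controller targets."""
    targets = []
    for start, end in zip(keypoints[:-1], keypoints[1:]):
        for t in np.linspace(0.0, 1.0, steps_per_segment, endpoint=False):
            targets.append((1 - t) * start + t * end)
    targets.append(keypoints[-1])
    return np.stack(targets)

# e.g. three predicted keypoints (meters, robot base frame) for "fill the mug"
trace = np.array([[0.45, -0.12, 0.20], [0.42, 0.00, 0.30], [0.40, 0.15, 0.18]])
for target in interpolate_trace(trace):
    pass  # controller.move_to(target)  # hypothetical low-level position command
```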
System Stability
This video demonstrates the model's referring ability based on color recognition and its stability during continuous operation.
Real-time Scene Adaptation
This video demonstrates the model's rapid scene adaptation and its capability to judge object proximity, recognize orientation, and determine distance.
Real-time Voice Interruption Adjustment
This video demonstrates the model's capabilities in object spatial relationship recognition, multi-step reasoning, rapid interactive reasoning, and real-time interruption adjustment.
Part-level Orientation-related Referring
This video demonstrates the model's capabilities in object spatial height recognition and part-level orientation-related region identification.
Functionality-oriented Referring
This video demonstrates the model's capabilities in object spatial height recognition and illuminated-area identification.
Multi-step Spatial Referring with Reasoning
This video demonstrates the model's capabilities in object spatial relationship recognition and multi-step spatial referring with reasoning.
Structured Arrangement
This video demonstrates the model's ability to understand spatial relationships and pattern reasoning between objects.
Mobile Manipulation
This video demonstrates the model's ability to control a humanoid for both tabletop object manipulation and indoor navigation.
Object Attribute Recognition
This video demonstrates the model's ability to accurately recognize and differentiate objects by their sizes and its stability in continuous operation.
Object Affordance Localization
This video demonstrates the model's capability in object affordance prediction (grasping the handle of the mug) as well as locating objects based on their colors and distances.
Spatial Relations Reasoning
This video demonstrates the model's spatial reasoning capabilities, including distance perception (nearest), position awareness (left and front), and free space localization.
Spatial Referencing and Vacancy Detection
This video demonstrates the model's object referencing capability based on spatial relations and its ability to locate vacant areas in 3D space.
We highlight the distributed training framework FlagScale, developed by the BAAI Framework R&D team, and the evaluation framework FlagEvalMM, developed by the BAAI FlagEval team; both are used for RoboBrain 2.5. Many thanks to both teams for their contributions!
If you find our model helpful, feel free to cite it:
@article{tan2026robobrain25depthsight,
title={RoboBrain 2.5: Depth in Sight, Time in Mind},
author={Tan, Huajie and Zhou, Enshen and Li, Zhiyu and Xu, Yijie and Ji, Yuheng and Chen, Xiansheng and Chi, Cheng and Wang, Pengwei and Jia, Huizhu and Ao, Yulong and Cao, Mingyu and Chen, Sixiang and Li, Zhe and Liu, Mengzhen and Wang, Zixiao and Rong, Shanyu and Lyu, Yaoxu and Zhao, Zhongxia and Co, Peterson and Li, Yibo and Han, Yi and Xie, Shaoxuan and Yao, Guocai and Wang, Songjing and Zhang, Leiduo and Yang, Xi and Jiao, Yance and Shi, Donghai and Xie, Kunchang and Nie, Shaokai and Men, Chunlei and Lin, Yonghua and Wang, Zhongyuan and Huang, Tiejun and Zhang, Shanghang},
journal={arXiv preprint arXiv:2601.14352},
year={2026}
}
@article{RoboBrain2.0TechnicalReport,
title={RoboBrain 2.0 Technical Report},
author={BAAI RoboBrain Team},
journal={arXiv preprint arXiv:2507.02029},
year={2025}
}
@article{RoboBrain1.0,
title={RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete},
author={Ji, Yuheng and Tan, Huajie and Shi, Jiayu and Hao, Xiaoshuai and Zhang, Yuan and Zhang, Hengyuan and Wang, Pengwei and Zhao, Mengdi and Mu, Yao and An, Pengju and others},
journal={arXiv preprint arXiv:2502.21257},
year={2025}
}
@article{tan2025robo,
title={Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation},
author={Tan, Huajie and Chen, Sixiang and Xu, Yijie and Wang, Zixiao and Ji, Yuheng and Chi, Cheng and Lyu, Yaoxu and Zhao, Zhongxia and Chen, Xiansheng and Co, Peterson and others},
journal={arXiv preprint arXiv:2512.23703},
year={2025}
}
@article{zhou2025robotracer,
title={RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics},
author={Zhou, Enshen and Chi, Cheng and Li, Yibo and An, Jingkun and Zhang, Jiayuan and Rong, Shanyu and Han, Yi and Ji, Yuheng and Liu, Mengzhen and Wang, Pengwei and others},
journal={arXiv preprint arXiv:2512.13660},
year={2025}
}
@article{Reason-RFT,
title={Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning},
author={Tan, Huajie and Ji, Yuheng and Hao, Xiaoshuai and Lin, Minglan and Wang, Pengwei and Wang, Zhongyuan and Zhang, Shanghang},
journal={arXiv preprint arXiv:2503.20752},
year={2025}
}
@article{tan2025roboos,
title={RoboOS-Next: A Unified Memory-Based Framework for Lifelong, Scalable, and Robust Multi-Robot Collaboration},
author={Tan, Huajie and Chi, Cheng and Chen, Xiansheng and Ji, Yuheng and Zhao, Zhongxia and Hao, Xiaoshuai and Lyu, Yaoxu and Cao, Mingyu and Zhao, Junkai and Lyu, Huaihai and others},
journal={arXiv preprint arXiv:2510.26536},
year={2025}
}