Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
[🌐 Website] [📄 Paper] [🤗 Models] [🎯 Datasets] [💬 Demo]
- [2025-08-21] 🎉 Inference Scripts Released! We have released our inference prompts and scripts for embodied pointing abilities.
- [2025-08-20] 🎉 Models and Datasets Released! We have released our pre-trained models, training datasets, and comprehensive evaluation benchmarks. Check out our HuggingFace collection for all available resources.
- [Coming Soon] 📚 Complete training code and detailed training tutorials will be released soon. Stay tuned!
Embodied-R1 is a 3B vision-language model (VLM) designed for general robotic manipulation. Through an innovative "Pointing" mechanism and Reinforced Fine-tuning (RFT), it effectively bridges the "seeing-to-doing" gap in robotics and achieves strong zero-shot generalization.
Figure 1: Embodied-R1 framework overview, comprehensive performance evaluation, and zero-shot robotic manipulation demonstrations.
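To make the pointing interface concrete: given an image and a task instruction, the model responds with 2D image coordinates that ground the instruction. Below is a minimal sketch of parsing such an answer into pixel points; the plain "(x, y)" text format is an assumption for illustration, and the exact output schema is defined by the released inference prompts.

```python
import re

def parse_points(answer: str) -> list[tuple[float, float]]:
    """Extract (x, y) coordinate pairs from a model answer.

    Assumes points appear as plain "(x, y)" tuples in the text; adapt the
    pattern to the actual output format used by the released prompts.
    """
    pairs = re.findall(r"\(\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\)", answer)
    return [(float(x), float(y)) for x, y in pairs]

# Hypothetical answer pointing at a placement location.
print(parse_points("Place the red block at (412, 288)."))  # [(412.0, 288.0)]
```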
Clone the repository:
```bash
git clone https://github.com/pickxiguapi/Embodied-R1.git
cd Embodied-R1
```
Create and activate a Conda environment:

```bash
conda create -n embodied_r1 python=3.11 -y
conda activate embodied_r1
```
Install dependencies for inference:

```bash
pip install transformers==4.51.3 accelerate
pip install "qwen-vl-utils[decord]"
```
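A quick sanity check that the inference stack resolved correctly (both import names below are the packages' public entry points):

```python
import transformers
from qwen_vl_utils import process_vision_info  # vision pre-processing helper for Qwen-VL-style models

print(transformers.__version__)  # expect 4.51.3
```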
Install dependencies for training (optional):

```bash
pip install -r requirements.txt
```
Run the example code:
```bash
cd Embodied-R1/
python inference_example.py
```

Example pointing visualizations, each showing the original image before prediction and the visualization result after prediction:

- Task instruction: put the red block on top of the yellow block
- Task instruction: put pepper in pan
- Task instruction: bring me the camel model
- Task instruction: loosening stuck bolts
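To adapt the example to your own scenes, the sketch below follows the standard Qwen2.5-VL inference pattern implied by the dependencies above (transformers + qwen-vl-utils). The model ID, image path, and instruction are illustrative placeholders, not the repository's exact prompts; see inference_example.py and the released prompt files for the canonical usage.

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

MODEL_ID = "Embodied-R1/Embodied-R1-3B"  # hypothetical ID; use the checkpoint from the HuggingFace collection

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "path/to/scene.jpg"},
        {"type": "text", "text": "Point to where the red block should be placed."},
    ],
}]

# Standard Qwen2.5-VL preprocessing: chat template + vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```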
Run the evaluation scripts:

```bash
cd eval
python hf_inference_where2place.py
python hf_inference_vabench_point.py
...
```
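For context, pointing benchmarks such as Where2Place are commonly scored by the fraction of predicted points that fall inside a ground-truth target-region mask. A minimal sketch of that metric, assuming pixel-space (x, y) points and a boolean H×W mask (the function name is ours, not the repository's):

```python
import numpy as np

def point_in_mask_accuracy(points, mask: np.ndarray) -> float:
    """Fraction of predicted (x, y) points that fall inside a boolean H x W mask."""
    if not points:
        return 0.0
    h, w = mask.shape
    hits = sum(
        1 for x, y in points
        if 0 <= int(round(x)) < w and 0 <= int(round(y)) < h and mask[int(round(y)), int(round(x))]
    )
    return hits / len(points)

# Toy example: target region is the top-left quadrant of a 100x100 image.
mask = np.zeros((100, 100), dtype=bool)
mask[:50, :50] = True
print(point_in_mask_accuracy([(10, 10), (80, 80)], mask))  # 0.5
```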
We plan to release the complete training code, datasets, and detailed guidelines soon. Stay tuned!
If you use our work in your research, please cite our paper:
```bibtex
@article{yuan2025embodiedr1,
  title={Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation},
  author={Yuan, Yifu and Cui, Haiqin and Huang, Yaoting and Chen, Yibin and Ni, Fei and Dong, Zibin and Li, Pengyi and Zheng, Yan and Hao, Jianye},
  year={2025}
}
```






