
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

[🌐 Website] [📄 Paper] [🤗 Models] [🎯 Datasets] [💬 Demo]


🔥 Updates

  • [2025-08-21] 🎉 Inference Scripts Released! We have released our inference prompts and scripts for Embodied-R1's embodied pointing capabilities.

  • [2025-08-20] 🎉 Models and Datasets Released! We have released our pre-trained models, training datasets, and comprehensive evaluation benchmarks. Check out our HuggingFace collection for all available resources.

  • [Coming Soon] 📚 Complete training code and detailed training tutorials will be released soon. Stay tuned!


📖 Overview

Embodied-R1 is a 3B vision-language model (VLM) designed for general robotic manipulation. Through an innovative "Pointing" mechanism and Reinforced Fine-tuning (RFT) training methodology, it effectively bridges the "seeing-to-doing" gap in robotics, achieving remarkable zero-shot generalization capabilities.

Figure 1: Embodied-R1 framework overview, comprehensive performance evaluation, and zero-shot robotic manipulation demonstrations.


🛠️ Setup

  1. Clone the repository:

    git clone https://github.com/pickxiguapi/Embodied-R1.git
    cd Embodied-R1
  2. Create and activate Conda environment:

    conda create -n embodied_r1 python=3.11 -y
    conda activate embodied_r1
  3. Install dependencies for inference:

    pip install transformers==4.51.3 accelerate
    pip install qwen-vl-utils[decord]
  4. Install dependencies for training (optional):

    pip install -r requirements.txt
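
After installation, you can optionally run a quick sanity check of the inference dependencies. This is a minimal sketch and is not part of the released code; it only verifies that the packages installed above import correctly and does not load any Embodied-R1 weights.

    # sanity_check.py -- verify that the inference dependencies import correctly.
    import torch
    import transformers
    from qwen_vl_utils import process_vision_info  # noqa: F401

    print("transformers:", transformers.__version__)   # expected: 4.51.3
    print("CUDA available:", torch.cuda.is_available())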

🚀 Inference

Run the example code:

    cd Embodied-R1/
    python inference_example.py
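
For reference, the sketch below outlines what a pointing-style query looks like through the standard Qwen2.5-VL interface provided by the dependencies installed above. The checkpoint path, image path, and prompt are placeholders rather than the released defaults; the exact prompt templates and model IDs are defined in inference_example.py and the HuggingFace collection.

    # Minimal pointing-inference sketch (assumes a Qwen2.5-VL-compatible checkpoint).
    # "path/to/Embodied-R1-3B", "example.jpg", and the prompt text are placeholders.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        "path/to/Embodied-R1-3B", torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained("path/to/Embodied-R1-3B")

    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": "example.jpg"},
            {"type": "text", "text": "Point to the red block."},
        ],
    }]

    # Build the chat prompt and preprocess the image exactly as Qwen2.5-VL expects.
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                       padding=True, return_tensors="pt").to(model.device)

    # Generate and decode only the newly produced tokens.
    output_ids = model.generate(**inputs, max_new_tokens=256)
    trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, output_ids)]
    print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])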

VTG Example

Task instruction: put the red block on top of the yellow block

Before prediction (original image):

Original input image

After prediction (visualization result):

Visualization result with predicted points

RRG Example

Task instruction: put pepper in pan

Before prediction (original image):

Original input image

After prediction (visualization result):

Visualization result with predicted points

REG Example

Task instruction: bring me the camel model

Before prediction (original image):

Original input image

After prediction (visualization result):

Visualization result with predicted points

OFG Example

Task instruction: loosening stuck bolts

Before prediction (original image):

Original input image

After prediction (visualization result):

Visualization result with predicted points
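
The visualizations above overlay the predicted points on the input image. A minimal overlay sketch is shown below; it assumes the model's answer has already been parsed into (x, y) pixel coordinates, while the parsing itself depends on the output format used by the released scripts.

    # Overlay predicted points on the input image (sketch; coordinates are assumed
    # to already be parsed from the model's textual answer).
    from PIL import Image, ImageDraw

    def draw_points(image_path, points, output_path, radius=6):
        """Draw each (x, y) pixel coordinate as a filled circle on the image."""
        image = Image.open(image_path).convert("RGB")
        draw = ImageDraw.Draw(image)
        for x, y in points:
            draw.ellipse([x - radius, y - radius, x + radius, y + radius],
                         fill="red", outline="white", width=2)
        image.save(output_path)

    # Hypothetical usage with two example points.
    draw_points("example.jpg", [(320, 240), (400, 180)], "example_points.jpg")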


📊 Evaluation

    cd eval
    python hf_inference_where2place.py
    python hf_inference_vabench_point.py
    ...

🧠 Training

We plan to release the complete training code, datasets, and detailed guidelines soon. Stay tuned!

📜 Citation

If you use our work in your research, please cite our paper:

@article{yuan2025embodiedr1,
  title={Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation},
  author={Yuan, Yifu and Cui, Haiqin and Huang, Yaoting and Chen, Yibin and Ni, Fei and Dong, Zibin and Li, Pengyi and Zheng, Yan and Hao, Jianye},
  year={2025}
}
