
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

[🌐 Website] [📄 Paper] [🤗 Models] [🎯 Datasets] [💬 Demo]


🔥 Updates

  • [2025-08-21] 🎉 Inference Scripts Released! We have released our inference prompts and scripts for Embodied-R1's embodied pointing capabilities.

  • [2025-08-20] 🎉 Models and Datasets Released! We have released our pre-trained models, training datasets, and comprehensive evaluation benchmarks. Check out our HuggingFace collection for all available resources.

  • [Coming Soon] 📚 Complete training code and detailed training tutorials will be released soon. Stay tuned!


📖 Overview

Embodied-R1 is a 3B vision-language model (VLM) designed for general robotic manipulation. Through an innovative "Pointing" mechanism and Reinforced Fine-tuning (RFT) training methodology, it effectively bridges the "seeing-to-doing" gap in robotics, achieving remarkable zero-shot generalization capabilities.

Figure 1: Embodied-R1 framework overview, comprehensive performance evaluation, and zero-shot robotic manipulation demonstrations.


🛠️ Setup

  1. Clone the repository:

    git clone https://github.com/pickxiguapi/Embodied-R1.git
    cd Embodied-R1
  2. Create and activate Conda environment:

    conda create -n embodied_r1 python=3.11 -y
    conda activate embodied_r1
  3. Install dependencies for inference:

    pip install transformers==4.51.3 accelerate
    pip install qwen-vl-utils[decord]
  4. Install dependencies for training (optional):

    pip install -r requirements.txt
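
After installation, you can optionally run a quick sanity check of the inference dependencies. This is a minimal sketch and is not part of the released code; it only verifies that the packages installed above import correctly and does not load any Embodied-R1 weights.

    # sanity_check.py -- verify that the inference dependencies import correctly.
    import torch
    import transformers
    from qwen_vl_utils import process_vision_info  # noqa: F401

    print("transformers:", transformers.__version__)   # expected: 4.51.3
    print("CUDA available:", torch.cuda.is_available())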

🚀 Inference

Run the example code:

    cd Embodied-R1/
    python inference_example.py
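
For reference, the sketch below outlines what a pointing-style query looks like through the standard Qwen2.5-VL interface provided by the dependencies installed above. The checkpoint path, image path, and prompt are placeholders rather than the released defaults; the exact prompt templates and model IDs are defined in inference_example.py and the HuggingFace collection.

    # Minimal pointing-inference sketch (assumes a Qwen2.5-VL-compatible checkpoint).
    # "path/to/Embodied-R1-3B", "example.jpg", and the prompt text are placeholders.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        "path/to/Embodied-R1-3B", torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained("path/to/Embodied-R1-3B")

    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": "example.jpg"},
            {"type": "text", "text": "Point to the red block."},
        ],
    }]

    # Build the chat prompt and preprocess the image exactly as Qwen2.5-VL expects.
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                       padding=True, return_tensors="pt").to(model.device)

    # Generate and decode only the newly produced tokens.
    output_ids = model.generate(**inputs, max_new_tokens=256)
    trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, output_ids)]
    print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])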

VTG Example

Task instruction: put the red block on top of the yellow block

Before prediction (original image):

Original input image

After prediction (visualization result):

Visualization result with predicted points

RRG Example

Task instruction: put pepper in pan

Before prediction (original image):

Original input image

After prediction (visualization result):

Visualization result with predicted points

REG Example

Task instruction: bring me the camel model

Before prediction (original image):

Original input image

After prediction (visualization result):

Visualization result with predicted points

OFG Example

Task instruction: loosening stuck bolts

Before prediction (original image):

Original input image

After prediction (visualization result):

Visualization result with predicted points
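
The visualizations above overlay the predicted points on the input image. A minimal overlay sketch is shown below; it assumes the model's answer has already been parsed into (x, y) pixel coordinates, while the parsing itself depends on the output format used by the released scripts.

    # Overlay predicted points on the input image (sketch; coordinates are assumed
    # to already be parsed from the model's textual answer).
    from PIL import Image, ImageDraw

    def draw_points(image_path, points, output_path, radius=6):
        """Draw each (x, y) pixel coordinate as a filled circle on the image."""
        image = Image.open(image_path).convert("RGB")
        draw = ImageDraw.Draw(image)
        for x, y in points:
            draw.ellipse([x - radius, y - radius, x + radius, y + radius],
                         fill="red", outline="white", width=2)
        image.save(output_path)

    # Hypothetical usage with two example points.
    draw_points("example.jpg", [(320, 240), (400, 180)], "example_points.jpg")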


📊 Evaluation

    cd eval
    python hf_inference_where2place.py
    python hf_inference_vabench_point.py
    ...

🧠 Training

We plan to release the complete training code, datasets, and detailed guidelines soon. Stay tuned!

📜 Citation

If you use our work in your research, please cite our paper:

@article{yuan2025embodiedr1,
  title={Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation},
  author={Yuan, Yifu and Cui, Haiqin and Huang, Yaoting and Chen, Yibin and Ni, Fei and Dong, Zibin and Li, Pengyi and Zheng, Yan and Hao, Jianye},
  year={2025}
}
