Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model
The repo is the official implementation of "Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model".
Paper: 📖 Arxiv
Model: 🤗 Affordance-R1
[Dec 21st, 2025] 🔥 ReasonAff is coming! We have released the original dataset. As stated in the appendix of the paper, we will filter the test data and provide a cleaner dataset soon, so stay tuned!
[Aug 11th, 2025] 🔥 Affordance-R1 is coming! We have released the code!
Performance of Affordance-R1:
Affordance-R1 demonstrates extraordinary affordance reasoning ability and powerful generalization ability.
Affordance-R1 framework overview. The model reasons over queries through `<think>` and `<rethink>` stages to generate affordance predictions. Policy optimization uses a composite reward system comprising (a) format rewards for reasoning structure, (b) perception rewards for spatial accuracy (Box-Num, IoU, L1), and (c) recognition rewards for semantic similarity, enabling effective GRPO-based training for affordance reasoning.
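To make the reward design above concrete, here is a minimal sketch of two of the reward terms: a format reward that checks the `<think>`/`<rethink>`/`<answer>` structure, and the IoU component of the perception reward. The exact tag layout, box format, and scoring values are assumptions for illustration, not the repo's actual implementation.

```python
import re

# Assumed response structure: <think>...</think><rethink>...</rethink><answer>...</answer>
FORMAT_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<rethink>.*?</rethink>\s*<answer>.*?</answer>$",
    re.DOTALL,
)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the expected reasoning structure, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0

def iou_reward(pred_box, gt_box) -> float:
    """IoU between two [x1, y1, x2, y2] boxes (one perception-reward term)."""
    x1 = max(pred_box[0], gt_box[0])
    y1 = max(pred_box[1], gt_box[1])
    x2 = min(pred_box[2], gt_box[2])
    y2 = min(pred_box[3], gt_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_pred = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_gt = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0
```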
Affordance-R1 can understand complex scenarios and shows good generalization.
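The GRPO-based training mentioned in the framework overview normalizes rewards within each group of responses sampled for the same query. A minimal sketch of that group-normalized advantage (the group size and epsilon are assumptions, not the repo's exact settings):

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-normalized advantages: (r - group mean) / (group std + eps).

    Each element of group_rewards is the scalar reward of one sampled
    response to the same query; responses better than the group average
    get positive advantages, worse ones negative.
    """
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```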
git clone https://github.com/hq-King/Affordance-R1.git
cd Affordance-R1
conda create -n Affordance-R1 python=3.12
conda activate Affordance-R1
pip install torch==2.6.0 torchvision==0.21.0
pip install -e .
pip install gensim
Download pretrained models: 🤗 Affordance-R1
Modify the path in inference_scripts/infer.py and then run the following:
python inference_scripts/infer.py
And you will get results like this:
Download our ReasonAff dataset here. As mentioned in the paper, we found some coarse ground-truth annotations in the original dataset; we are filtering the test split and will release a cleaner version soon. Stay tuned!
Download pretrained models: Qwen2.5-VL-7B and SAM2
Modify the path in training_scripts/aff_r1.sh and training_scripts/aff_r1.yaml and then run the following command to start training:
bash training_scripts/run_aff_r1.sh
After training, run the following command to merge the model:
python3 training_scripts/model_merger.py --local_dir [path_to_your_actor_checkpoint]
Download the dataset and modify the dataset path in the following file:
bash evaluation_scripts/eval_aff_r1.sh
We would like to thank the following repos for their great work:
- This work is built upon Seg-Zero and veRL.
- This work utilizes models from Qwen2-VL, Qwen2.5-VL and SAM2.
@article{wang2025affordance,
title={Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model},
author={Wang, Hanqing and Wang, Shaoyang and Zhong, Yiming and Yang, Zemin and Wang, Jiamin and Cui, Zhiqing and Yuan, Jiahao and Han, Yifan and Liu, Mingyu and Ma, Yuexin},
journal={arXiv preprint arXiv:2508.06206},
year={2025}
}



