Changmeng Zheng1, Dayong Liang2, Wengyu Zhang1, Xiao-Yong Wei*,1, Tat-Seng Chua3, Qing Li1
1The Hong Kong Polytechnic University 2South China University of Technology 3National University of Singapore
*Corresponding author
Blueprint Debate-on-Graph (BDoG)
🔥 [2024.10] Our paper has been nominated for the Best Paper Award!
🔥 [2024.07] The paper and code are released!
```shell
git clone https://github.com/thecharm/BDoG.git
cd BDoG
pip install -e .
```

Download the model weights and set the model path in the `BDoG/vlmeval/config.py` file.
```shell
torchrun --nproc_per_node=1 run.py --data ScienceQA_TEST \
    --stage BDebate \
    --debate 2
```
- `--data` - Supported datasets: `ScienceQA_TEST` and `MMBench_DEV_EN`.
- `--stage` - Prompt type: `BDebate` (Blueprint Debate on Graph) or `ODebate` (Debate without Graph).
- `--debate` - Number of rounds for the debate.
- `--kg_init` - (optional) Use a Gemini-generated graph as the initialization for multi-round debates.
- `--nproc_per_node=2` - (optional) Speed up the inference process if you have two GPUs.
- `--openai` - (optional) Use an OpenAI API key to perform the final result validation.
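As a hedged illustration of how the optional flags above might be combined (assuming two GPUs are available and any required API keys are configured; check `run.py` for the exact argument forms each flag expects):

```shell
# Hypothetical invocation: a 2-round Blueprint Debate on MMBench_DEV_EN,
# using a Gemini graph to initialize the debate and OpenAI-based
# validation of the final answers (flags as documented above).
torchrun --nproc_per_node=2 run.py --data MMBench_DEV_EN \
    --stage BDebate \
    --debate 2 \
    --kg_init \
    --openai
```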
The results are saved in the `BDoG/results/instructblip_13b` folder.
During this process, the datasets will be automatically downloaded to the `/root/LMUData/` directory. If you need to change the data storage path, set `--lmudata`.
- VLMEvalKit: An open-source evaluation toolkit of large vision-language models (LVLMs).
- LLaVA: Wonderful MLLM based on Large Language and Vision Assistant.
- LAVIS: The amazing open-source multimodal learning codebase.
If this repo is useful to you, please cite it using the following BibTeX:
```bibtex
@inproceedings{zheng2024picture,
  title={A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning},
  author={Zheng, Changmeng and Liang, Dayong and Zhang, Wengyu and Wei, Xiao-Yong and Chua, Tat-Seng and Li, Qing},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={419--428},
  year={2024}
}
```
