Changmeng Zheng1, Dayong Liang2, Wengyu Zhang1, Xiao-Yong Wei*,1, Tat-Seng Chua3, Qing Li1
1The Hong Kong Polytechnic University 2South China University of Technology 3National University of Singapore
*Corresponding author
Blueprint Debate-on-Graph (BDoG)
🔥 [2024.10] Our paper has been nominated for the Best Paper Award!
🔥 [2024.07] The paper and code are released!
```shell
git clone https://github.com/thecharm/BDoG.git
cd BDoG
pip install -e .
```

Download the model weights and set the model path in the `BDoG/vlmeval/config.py` file.
```shell
torchrun --nproc_per_node=1 run.py --data ScienceQA_TEST \
    --stage BDebate \
    --debate 2
```
- `--data` - Supported datasets: `ScienceQA_TEST` and `MMBench_DEV_EN`.
- `--stage` - Prompt type: `BDebate` (Blueprint Debate on Graph) or `ODebate` (Debate without Graph).
- `--debate` - Number of rounds for the debate.
- `--kg_init` - (optional) Use a Gemini-generated graph as the initialization for multi-round debates.
- `--nproc_per_node=2` - (optional) Speed up the inference process if you have two GPUs.
- `--openai` - (optional) Use an OpenAI API key to perform the final result validation.
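As a hedged illustration of how the optional flags above might be combined (assuming two GPUs are available and any required API keys are configured; check `run.py` for the exact argument forms each flag expects):

```shell
# Hypothetical invocation: a 2-round Blueprint Debate on MMBench_DEV_EN,
# using a Gemini graph to initialize the debate and OpenAI-based
# validation of the final answers (flags as documented above).
torchrun --nproc_per_node=2 run.py --data MMBench_DEV_EN \
    --stage BDebate \
    --debate 2 \
    --kg_init \
    --openai
```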
The results are saved in the `BDoG/results/instructblip_13b` folder.
During this process, the datasets will be automatically downloaded to the `/root/LMUData/` directory. If you need to change the data storage path, set `--lmudata`.
- VLMEvalKit: An open-source evaluation toolkit of large vision-language models (LVLMs).
- LLaVA: Wonderful MLLM based on Large Language and Vision Assistant.
- LAVIS: The amazing open-source multimodal learning codebase.
If this repo is useful to you, please cite it using the following BibTeX:
```bibtex
@inproceedings{zheng2024picture,
  title={A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning},
  author={Zheng, Changmeng and Liang, Dayong and Zhang, Wengyu and Wei, Xiao-Yong and Chua, Tat-Seng and Li, Qing},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={419--428},
  year={2024}
}
```
