Chuang Yu<sup>1,2,5</sup>, Jinmiao Zhao<sup>1,2</sup>, Mingxuan Zhao<sup>3</sup>, Yunpeng Liu<sup>1*</sup>, Xiujun Shu<sup>4</sup>, Yuanhao Feng<sup>4</sup>, Bo Wang<sup>4</sup>, Xiangyu Yue<sup>5*</sup>

<sup>1</sup> Shenyang Institute of Automation, Chinese Academy of Sciences&nbsp;&nbsp; <sup>2</sup> University of Chinese Academy of Sciences&nbsp;&nbsp; <sup>3</sup> HKUST(GZ)&nbsp;&nbsp; <sup>4</sup> Tencent&nbsp;&nbsp; <sup>5</sup> MMLab, The Chinese University of Hong Kong
Recently, multimodal large language models (MLLMs) have been widely applied to reasoning tasks. However, they suffer from limited multi-rationale semantic modeling and insufficient logical robustness, and they are susceptible to misleading interpretations in complex scenarios. We therefore propose the Multi-rationale INtegrated Discriminative (MIND) reasoning framework, which endows MLLMs with the human-like cognitive cycle of “Understand → Rethink → Correct” and shifts the paradigm from passive imitation-based reasoning to active discriminative reasoning. Specifically, we introduce a Rationale Augmentation and Discrimination (RAD) paradigm that automatically and efficiently expands existing datasets by generating diverse rationales, providing a unified and extensible data foundation. We further design a Progressive Two-stage Correction Learning (P2CL) strategy: the first stage strengthens positive learning over multiple rationales, while the second stage enables active logic discrimination and correction. In addition, to mitigate representation entanglement in the multi-rationale semantic space, we propose a Multi-rationale Contrastive Alignment (MCA) optimization strategy that semantically aggregates correct reasoning while separating incorrect reasoning at the boundary. Extensive experiments demonstrate that the proposed MIND reasoning framework achieves state-of-the-art (SOTA) performance on multiple public datasets covering scientific, commonsense, and mathematical scenarios, providing a new perspective for advancing MLLMs toward higher levels of cognitive intelligence.
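Since the paper and code are not yet released, the exact formulation of MCA is not public. As a rough illustration only, the sketch below shows one standard way to realize "semantic aggregation of correct reasoning and boundary separation of incorrect reasoning": a supervised-contrastive-style loss over the rationale embeddings of a single question, where correct rationales form positive pairs and incorrect ones appear only in the denominator. All function names, tensor shapes, and the temperature value are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a multi-rationale contrastive objective (NOT the official MCA code).
import torch
import torch.nn.functional as F


def multi_rationale_contrastive_loss(embeddings: torch.Tensor,
                                     is_correct: torch.Tensor,
                                     temperature: float = 0.07) -> torch.Tensor:
    """Contrastive loss over the rationales of one question.

    embeddings: (N, D) rationale embeddings; is_correct: (N,) bool mask of correct rationales.
    Correct rationales are pulled together; incorrect ones only enter the denominator,
    so they are pushed away from the correct cluster.
    """
    z = F.normalize(embeddings, dim=-1)                     # unit-norm features
    sim = z @ z.t() / temperature                           # pairwise cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))         # exclude self-similarity

    # Positives: pairs of distinct correct rationales.
    pos_mask = is_correct.unsqueeze(0) & is_correct.unsqueeze(1) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Mean log-likelihood of positives, only for anchors that have at least one positive.
    pos_counts = pos_mask.sum(dim=1)
    anchors = pos_counts > 0
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return (-(pos_log_prob[anchors] / pos_counts[anchors])).mean()


# Toy usage: 3 correct and 2 incorrect rationale embeddings for a single question.
emb = torch.randn(5, 256)
labels = torch.tensor([True, True, True, False, False])
print(multi_rationale_contrastive_loss(emb, labels))
```

The choice of a temperature-scaled, normalized-embedding formulation here simply mirrors common contrastive-learning practice; the released MCA implementation may differ.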
We are finalizing the release of the paper, datasets, and code, and aim to complete this as soon as possible. Please stay tuned! ⚡⚡⚡
- Release paper. [Paper/arXiv]
- Release training and inference code.
- Release ScienceQA-RAD, V-OKVQA-RAD, and M3CoT-RAD datasets.
- Release model weights.
