[ICCV 2025] Official repository of "FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers"

JiuTian-VL/FALCON


FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
ICCV 2025

1Harbin Institute of Technology, Shenzhen
2Huawei Noah's Ark Lab
†Corresponding author

arXiv | Project page | FALCON-8B | FALCON++

If you find this work useful for your research, please kindly cite our paper and star our repo.

Updates

  • [01/2026] 🔥 The extended paper of FALCON++ is released on TechRxiv.
  • [12/2025] 🔥 Checkpoint released. Enjoy it!
  • [07/2025] 🔥 The code and project page are released. Enjoy it!
  • [06/2025] 🔥 The arXiv paper is updated.
  • [06/2025] FALCON is accepted to ICCV 2025!
  • [01/2025] arXiv paper released.

Introduction

This is the GitHub repository for FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers. In this work, we propose FALCON, a model that introduces a novel visual register technique to simultaneously address visual redundancy and fragmentation in the high-resolution visual encoding of MLLMs.

Installation

  1. Clone this repository and navigate to the folder
git clone git@github.com:JiuTian-VL/JiuTian-FALCON.git
cd JiuTian-FALCON
  2. Install the package
conda create -n falcon python=3.10 -y
conda activate falcon
pip install --upgrade pip
pip install -e .
  3. Install additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
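
After the steps above, a quick sanity check can confirm that the environment looks usable. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, the module name `jiutian` is taken from the import used in Quick Start, `flash_attn` is the import name of the flash-attn package, and treating Python 3.10 as a minimum (rather than an exact requirement) is an assumption.

```python
import importlib.util
import sys


def check_environment(modules=("jiutian", "flash_attn"), min_python=(3, 10)):
    """Report whether the interpreter and the listed top-level modules are available.

    Hypothetical helper for illustration; the defaults are assumptions
    based on the install commands, not an official requirement list.
    """
    report = {"python_ok": sys.version_info[:2] >= tuple(min_python)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Note that `flash_attn` is only installed by the training step, so a missing entry for it is expected in an inference-only environment.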

Quick Start

The JiutianHDInfer class in jiutian/eval/model_infer.py encapsulates model inference.

The example below shows how to use it: construct the class with your checkpoint paths, then call its inference method with an image and a question to obtain the model's response.

from jiutian.eval.model_infer import JiutianHDInfer

# Load the model once; replace the placeholder paths with your own.
model_infer = JiutianHDInfer(
    model_path='/path/to/ckpt',
    model_base=None,  # or '/path/to/base_ckpt'
    conv_mode='llama_3_1',
)

# Ask a single question about a single image.
image_file = '/path/to/image'
question = 'question'
model_infer.inference(image_file, question)
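
To run several image/question pairs with one loaded model, the call above can be wrapped in a small loop. This is a sketch under stated assumptions: `run_batch` is a hypothetical helper that is not part of the repository, and it assumes `inference(image_file, question)` returns the model's answer (if it only prints the answer, the collected values will be `None`).

```python
from typing import Iterable, List, Tuple


def run_batch(infer, items: Iterable[Tuple[str, str]]) -> List[dict]:
    """Call infer.inference(image_file, question) for each pair and collect results.

    `infer` is any object exposing an `inference(image_file, question)` method,
    such as the JiutianHDInfer instance from the example above.
    """
    results = []
    for image_file, question in items:
        record = {"image": image_file, "question": question,
                  "answer": None, "error": None}
        try:
            # Assumption: the method returns the generated answer.
            record["answer"] = infer.inference(image_file, question)
        except Exception as exc:
            # Record the failure and continue with the remaining samples.
            record["error"] = str(exc)
        results.append(record)
    return results


# Usage (placeholder paths, with `model_infer` built as in the example above):
# outputs = run_batch(model_infer, [
#     ('/path/to/image_a', 'What is shown in this image?'),
#     ('/path/to/image_b', 'Read the text in the image.'),
# ])
```

Reusing one instance avoids reloading the checkpoint for every sample, and recording errors per sample keeps one unreadable image from aborting the whole run.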

Evaluations

See docs/Evaluation.md for details.

Citation

If you find this work useful for your research, please kindly cite our paper:

@inproceedings{zhang2025falcon,
  title={Falcon: Resolving visual redundancy and fragmentation in high-resolution multimodal large language models via visual registers},
  author={Zhang, Renshan and Shao, Rui and Chen, Gongwei and Zhang, Miao and Zhou, Kaiwen and Guan, Weili and Nie, Liqiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={23530--23540},
  year={2025}
}
