Official implementation of GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts.
- Clone this repository:

```bash
git clone git@github.com:Zengwh02/GlimpRouter.git
cd GlimpRouter
```

- Set up your Python environment:

```bash
conda create -n glimp_router python=3.12 -y
conda activate glimp_router
pip install -r requirements.txt
```

- (Optional) Download datasets using the helper script:

```bash
bash setup.sh
```

This project uses a simple `config.json` to pass experiment settings to `src/main.py`. You can generate a config with the provided script and then run the main entrypoint:
```bash
cd src
bash run.sh
```

The script writes `config.json` and launches `main.py` in the background, logging to `./logs/`.
If you serve models locally with vLLM, fill in the placeholders in `server/serve.sh` and run:

```bash
cd server
bash serve.sh
```

Notes:

- Replace `YOUR_API_KEY`, `YOUR_MODEL_NAME_OR_PATH`, `YOUR_BASE_URL`, and `YOUR_PORT` with your own values.
- Use the matching chat template from `server/template/` or provide your own.
- `src/main.py`: entrypoint for GlimpRouter experiments; loads a dataset, performs routing, and writes results.
- `src/glimprouter.py`: core implementation of GlimpRouter's step-wise routing and entropy-based decision logic.
- `src/run.sh`: example runner that generates `config.json` and starts `main.py`.
- `server/serve.sh`: helper script for spinning up a vLLM server.
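The entropy-based decision can be illustrated with a toy sketch: if the small model's first-token distribution is peaked (low entropy), it is likely confident and can keep the query; if the distribution is diffuse, the query escalates to the large model. The function names below and the use of `score_threshold` as an entropy cutoff in nats are illustrative assumptions, not the repository's actual API:

```python
import math

def first_token_entropy(probs):
    """Shannon entropy (in nats) of a first-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def route(probs, score_threshold=1.0):
    """Toy routing rule (assumption, not the repo's implementation):
    a peaked first-token distribution suggests the small model is
    confident, so keep it; a diffuse one escalates to the large model."""
    return "large" if first_token_entropy(probs) > score_threshold else "small"
```

For example, a uniform distribution over four candidate tokens has entropy ln 4 ≈ 1.39, above the default cutoff of 1.0, so the query would escalate; a distribution with 0.97 mass on one token stays with the small model.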
The current code supports the following datasets:
- AIME24/AIME25 (math)
- MATH-500 (math)
- GPQA (general reasoning)
- LiveCodeBench v5/v6 (code generation)
To use datasets stored locally (e.g., GPQA or LiveCodeBench), edit the placeholders in `src/main.py`:

```python
data_files="YOUR_DIRECTORY_OF_GPQA_DATASET"
data_files="YOUR_DIRECTORY_OF_LCB_DATASET"
```

For public datasets loaded via `datasets.load_dataset`, the dataset identifiers are set directly in the code. If you want to change them, update the dataset names in `src/main.py`.
`config.json` is read automatically by `src/main.py`. CLI arguments override config values. Example:

```json
{
    "dataset_name": "lcbv5",
    "repeat_num": 6,
    "score_method": "first_token_entropy",
    "token_budget": 8192,
    "output_dir": "./results",
    "model_size": "32b",
    "small_model_size": "4b",
    "score_threshold": 1.0
}
```
```
.
├── data/               # Local dataset files (optional)
├── server/
│   ├── serve.sh        # vLLM server helper script
│   └── template/       # Chat templates for different models
├── src/
│   ├── main.py         # Experiment entrypoint
│   ├── glimprouter.py  # GlimpRouter core logic
│   └── run.sh          # Example runner
├── requirements.txt
└── setup.sh
```
```bibtex
@misc{zeng2026glimprouterefficientcollaborativeinference,
      title={GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts},
      author={Wenhao Zeng and Xuteng Zhang and Yuling Shi and Chao Hu and Yuting Chen and Beijun Shen and Xiaodong Gu},
      year={2026},
      eprint={2601.05110},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.05110},
}
```
