# GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts
Official implementation of *GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts*.

![GlimpRouter overview](./assets/overview.png)

GlimpRouter is a lightweight, training-free, step-wise routing framework. For each reasoning step, it "glimpses" the entropy of the step's first token and uses that score to decide whether the small model or the large model should generate the step.
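The scoring idea can be sketched in a few lines. This is an illustrative computation, not the code in `src/glimprouter.py`; the function names are ours, and the threshold mirrors the `score_threshold` config value.

```python
import math

def first_token_entropy(logprobs):
    """Shannon entropy (in nats) of a first-token distribution,
    given the log-probabilities of the candidate tokens."""
    return -sum(math.exp(lp) * lp for lp in logprobs)

def choose_model(logprobs, score_threshold=1.0):
    """Route one reasoning step: low first-token entropy (the model is
    confident) stays on the small model; high entropy escalates to the
    large model."""
    return "large" if first_token_entropy(logprobs) > score_threshold else "small"

# A near-deterministic first token is kept on the small model; an
# uncertain one (here, uniform over four candidates) is escalated.
confident = [math.log(0.97), math.log(0.01), math.log(0.01), math.log(0.01)]
uncertain = [math.log(0.25)] * 4
print(choose_model(confident), choose_model(uncertain))  # → small large
```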

## Code

### Installation

1. Clone this repository:

   ```bash
   git clone git@github.com:Zengwh02/GlimpRouter.git
   cd GlimpRouter
   ```

2. Set up your Python environment:

   ```bash
   conda create -n glimp_router python=3.12 -y
   conda activate glimp_router
   pip install -r requirements.txt
   ```

3. (Optional) Download datasets using the helper script:

   ```bash
   bash setup.sh
   ```

### Quick Start

This project uses a simple `config.json` to pass experiment settings to `src/main.py`. You can generate a config with the provided script and then run the main entrypoint:

```bash
cd src
bash run.sh
```

The script writes `config.json` and launches `main.py` in the background, logging to `./logs/`.

### Running the vLLM Server

If you serve models locally with vLLM, fill in the placeholders in `server/serve.sh` and run:

```bash
cd server
bash serve.sh
```

Notes:

- Replace `YOUR_API_KEY`, `YOUR_MODEL_NAME_OR_PATH`, `YOUR_BASE_URL`, and `YOUR_PORT` with your own values.
- Use the matching chat template from `server/template/`, or provide your own.
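Once the server is up, it exposes an OpenAI-compatible API, so a quick smoke test might look like the following (the endpoint path is vLLM's standard `/v1/chat/completions`; the placeholders match those in `server/serve.sh`):

```shell
# Send one chat request to the locally served model.
# Replace the placeholders with the values from server/serve.sh.
curl http://YOUR_BASE_URL:YOUR_PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
        "model": "YOUR_MODEL_NAME_OR_PATH",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```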

### Key Scripts

- `src/main.py`: entrypoint for GlimpRouter experiments; loads a dataset, performs routing, and writes results.
- `src/glimprouter.py`: core implementation of GlimpRouter's step-wise routing and entropy-based decision logic.
- `src/run.sh`: example runner that generates `config.json` and starts `main.py`.
- `server/serve.sh`: helper script for spinning up a vLLM server.

### Datasets

The current code supports the following datasets:

- AIME24/AIME25 (math)
- MATH-500 (math)
- GPQA (general reasoning)
- LiveCodeBench v5/v6 (code generation)

To use datasets stored locally (e.g., GPQA or LiveCodeBench), edit the placeholders in `src/main.py`:

```python
data_files = "YOUR_DIRECTORY_OF_GPQA_DATASET"
data_files = "YOUR_DIRECTORY_OF_LCB_DATASET"
```

For public datasets loaded via `datasets.load_dataset`, the dataset identifiers are set directly in the code; to change them, update the dataset names in `src/main.py`.
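For reference, a minimal sketch of both loading paths with the `datasets` library (the `"json"` builder, the split name, and the public-dataset placeholder are illustrative assumptions; check `src/main.py` for the exact loaders and identifiers used):

```python
from datasets import load_dataset

# Local files (e.g., GPQA or LiveCodeBench): point data_files at your copy.
# The "json" builder is an assumption; use whichever format your files are in.
gpqa = load_dataset("json", data_files="YOUR_DIRECTORY_OF_GPQA_DATASET")

# Public datasets are fetched by identifier, as set directly in src/main.py.
math500 = load_dataset("YOUR_MATH500_DATASET_ID", split="test")
```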

### Configuration

`config.json` is read automatically by `src/main.py`; CLI arguments override config values. Example:

```json
{
  "dataset_name": "lcbv5",
  "repeat_num": 6,
  "score_method": "first_token_entropy",
  "token_budget": 8192,
  "output_dir": "./results",
  "model_size": "32b",
  "small_model_size": "4b",
  "score_threshold": 1.0
}
```
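Since CLI arguments take precedence over `config.json`, the override behavior can be sketched as follows (illustrative only; the flag names follow the config keys above, and the actual parsing in `src/main.py` may differ):

```python
import argparse

# Values as they might be read from config.json (a subset, for brevity).
defaults = {
    "dataset_name": "lcbv5",
    "score_method": "first_token_entropy",
    "score_threshold": 1.0,
    "token_budget": 8192,
}

# Expose one optional flag per config key; unset flags stay None.
parser = argparse.ArgumentParser()
for key, value in defaults.items():
    parser.add_argument(f"--{key}", type=type(value), default=None)

# Simulate running: python main.py --score_threshold 1.5
args = parser.parse_args(["--score_threshold", "1.5"])

# CLI values (when given) override the config-file values.
config = {**defaults, **{k: v for k, v in vars(args).items() if v is not None}}
print(config["score_threshold"])  # → 1.5
```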

### Project Structure

```text
.
├── data/                         # Local dataset files (optional)
├── server/
│   ├── serve.sh                  # vLLM server helper script
│   └── template/                 # Chat templates for different models
├── src/
│   ├── main.py                   # Experiment entrypoint
│   ├── glimprouter.py            # GlimpRouter core logic
│   └── run.sh                    # Example runner
├── requirements.txt
└── setup.sh
```

## BibTeX

```bibtex
@misc{zeng2026glimprouterefficientcollaborativeinference,
      title={GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts},
      author={Wenhao Zeng and Xuteng Zhang and Yuling Shi and Chao Hu and Yuting Chen and Beijun Shen and Xiaodong Gu},
      year={2026},
      eprint={2601.05110},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.05110},
}
```
