
[CVPR2026]The Blind Spot of Adaptation: Quantifying and Mitigating Forgetting in Fine-tuned Driving Models

Runhao Mao*   Hanshi Wang*   Yixiang Yang   Qianli Ma   Jingmeng Zhou   Zhipeng Zhang

AutoLab, School of Artificial Intelligence, Shanghai Jiao Tong University

* Equal contribution
Corresponding author

amao769909148@gmail.com · zhipeng.zhang.cv@outlook.com

Paper PDF Code Dataset

[Figure: Fidelity Driving Bench overview]    [Figure: DEA framework]

The first systematic benchmark and mitigation framework for catastrophic forgetting in VLM-centric autonomous driving.

📰 News

  • [2026.04.07] 🎉 🎉 Code, paper and dataset are released.
  • [2026.02.21] 🎉 🎉 Our paper has been accepted by CVPR 2026!

📖 Overview

Vision-Language Models bring strong world knowledge and long-tail generalization to autonomous driving, but standard fine-tuning can silently destroy these capabilities. FidelityAD studies this blind spot systematically by introducing a dedicated forgetting benchmark and a mitigation framework tailored for driving VLMs.

We build Fidelity Driving Bench, a large-scale benchmark for quantifying forgetting in autonomous driving, and propose Drive Expert Adapter (DEA), which shifts adaptation from destructive weight updates to prompt-level and expert-level routing. Extensive experiments show that DEA improves downstream driving performance while better preserving pretrained knowledge.

Our main contributions are summarized as follows:

  1. We provide the first systematic investigation of catastrophic forgetting in VLM-centric autonomous driving.
  2. We introduce Fidelity Driving Bench, a large-scale benchmark built from 180K scenes and 900K QA pairs across 15 data sources.
  3. We propose DEA, a new framework with a Prompt Adapter and a Task-Adaptive Expert Module for scene-aware knowledge routing.
  4. We demonstrate that DEA mitigates forgetting while maintaining strong performance on driving-specific tasks.

🧠 Method

Prompt Adapter

DEA learns prompt-level task priors and retrieves the most relevant prompt tokens according to the input question, helping the model adapt without overwriting core parameters.
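The repository does not spell out the retrieval rule, but prompt-pool methods of this kind typically score learned prompt keys against the question embedding and keep the top matches. A minimal illustrative sketch (names, dimensions, and cosine-similarity scoring are all assumptions, not the paper's exact formulation):

```python
import numpy as np

def retrieve_prompt_tokens(question_emb, prompt_pool, k=2):
    """Pick the k prompt entries whose keys best match the question.

    question_emb: (d,) question embedding; prompt_pool: (n, d) learned keys.
    Returns indices of the top-k prompts by cosine similarity.
    """
    q = question_emb / np.linalg.norm(question_emb)
    pool = prompt_pool / np.linalg.norm(prompt_pool, axis=1, keepdims=True)
    sims = pool @ q                 # cosine similarity to each prompt key
    return np.argsort(-sims)[:k]   # indices of the k best matches

# Toy pool: 4 prompt keys in a 3-d embedding space.
pool = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [0.7, 0.7, 0.0]])
question = np.array([0.9, 0.1, 0.0])
print(retrieve_prompt_tokens(question, pool))  # -> [0 3]
```

The selected prompt tokens would then be prepended to the input, so adaptation happens in the prompt space rather than by overwriting backbone weights.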

Task-Adaptive Expert Module

DEA further introduces a scene-aware expert routing mechanism that dynamically selects suitable driving experts according to prompt semantics and scene-specific cues.
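As a rough mental model (the gating network and expert design here are assumed, not taken from the paper), scene-aware routing can be pictured as a softmax gate over per-expert outputs:

```python
import numpy as np

def route_experts(gate_logits, expert_outputs):
    """Softmax-gate a set of expert outputs.

    gate_logits: (n_experts,) scores from a scene-aware gating network;
    expert_outputs: (n_experts, d) output of each driving expert.
    Returns the gate weights and their convex combination.
    """
    w = np.exp(gate_logits - gate_logits.max())
    w /= w.sum()                   # softmax over experts
    return w, w @ expert_outputs   # weighted fusion of expert outputs

logits = np.array([2.0, 0.5, -1.0])   # e.g. the "urban" expert scores highest
outputs = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])
weights, fused = route_experts(logits, outputs)
```

In DEA the gate is conditioned on prompt semantics and scene cues, so different driving scenarios activate different experts instead of forcing one set of weights to cover everything.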

📊 Main Results

Fidelity Driving Bench shows that many existing driving VLMs suffer substantial forgetting after adaptation. On the Qwen2.5VL-3B backbone, DEA achieves stronger task performance with better knowledge retention than full fine-tuning.

Method                    KRR      SD     T-QA   NoPR
Base (Qwen2.5VL-3B)       -        56.6   28.7   36.8
ImpromptuVLA-3B           68.4%    59.1   33.0   25.2
DEA (Base + TAEM + PA)    79.0%    58.8   41.0   29.0

🧪 Benchmark

Data Scale

  • 180K training scenes
  • 900K language QA pairs
  • 15 source datasets
  • 1,000 manually verified long-tail test images

Evaluation Tasks

  • Scene Description
  • Traffic-QA
  • Noteworthy Objects' Perception

Metrics

  • LLM-as-Judge (GPT Score)
  • Noteworthy Objects' Perception Recall (NoPR)
  • Knowledge Retention Rate (KRR)
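The README does not define KRR precisely; assuming it is the fine-tuned model's score relative to the base model, averaged over the retained-knowledge benchmarks and capped at 100% each, a sketch of the computation could look like:

```python
def knowledge_retention_rate(base_scores, tuned_scores):
    """Assumed KRR: per-benchmark ratio of post-fine-tuning score to base
    score, capped at 100%, averaged over benchmarks."""
    ratios = [min(t / b, 1.0) for b, t in zip(base_scores, tuned_scores)]
    return 100.0 * sum(ratios) / len(ratios)

base  = [70.0, 60.0, 80.0]   # base model on general VLM benchmarks
tuned = [56.0, 48.0, 80.0]   # same model after driving fine-tuning
print(f"KRR = {knowledge_retention_rate(base, tuned):.1f}%")  # KRR = 86.7%
```

Refer to the paper for the exact benchmark set and formula used to report KRR.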

🌐 Start DEA Training

The training pipeline can be launched with the provided shell script. A typical workflow is:

  1. Clone the repository.
  2. Create and activate a conda environment.
  3. Install the training dependencies.
  4. Update the paths in train/train_DEA.sh.
  5. Run the training script.
git clone https://github.com/AutoLab-SAI-SJTU/FidelityDrivingBench.git
cd FidelityDrivingBench

conda create -n fidelityad python=3.10
conda activate fidelityad

pip install -r requirements.txt

cd train
# Please update the dataset path, checkpoint path, and output path in train_DEA.sh first.
sh train_DEA.sh

▶️ Start Evaluation

The evaluation service is designed as a local API server. You can start it with the following steps:

  1. Install the evaluation dependencies.
  2. Enter the eval directory.
  3. Update the paths in eval/app.py.
  4. Launch the FastAPI service with uvicorn.
  5. Submit a .jsonl file for scoring.
pip install -r requirements_eval.txt

cd eval
uvicorn app:app --host 0.0.0.0 --port 10086 --reload

Evaluation APIs

  • gpt_score: returns the GPT-based score.
  • gpt_eval: returns the NoPR score.
  • gpt_acc: returns the Traffic-QA accuracy.

Example Request

Replace the input file path and server address with your own environment before running:

curl -F "file=@/path/to/test_input.jsonl" \
     -F "output_name=result.jsonl" \
     http://<server-ip>:10086/gpt_score
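The format of the returned result file is not documented here; assuming one JSON object per line with a numeric "score" field (both are assumptions), a quick aggregation script might look like:

```python
import io
import json

def mean_score(jsonl_text, key="score"):
    """Average the per-sample scores in a result .jsonl string.

    The field name `key` is an assumption about the server's output format.
    """
    scores = [json.loads(line)[key]
              for line in io.StringIO(jsonl_text) if line.strip()]
    return sum(scores) / len(scores)

# Two toy result lines; averages their scores.
result = '{"id": 1, "score": 0.8}\n{"id": 2, "score": 0.6}\n'
print(mean_score(result))
```

Check the actual result.jsonl written by the service and adjust the field name before relying on this.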

📋 Checklist

  • Release paper
  • Release dataset
  • Release code
  • Release trained models

📜 Citation

If you find this work useful, please consider citing:

@inproceedings{mao2026blindspot,
  title={The Blind Spot of Adaptation: Quantifying and Mitigating Forgetting in Fine-tuned Driving Models},
  author={Mao, Runhao and Wang, Hanshi and Yang, Yixiang and Ma, Qianli and Zhou, Jingmeng and Zhang, Zhipeng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
