
SSViT

Official code for "Vision Transformer with Sparse Scan Prior".

arXiv: https://arxiv.org/abs/2405.13335

Abstract

In recent years, Transformers have achieved remarkable progress in computer vision tasks. However, their global modeling often comes with substantial computational overhead, in stark contrast to the human eye's efficient information processing. Inspired by the human eye's sparse scanning mechanism, we propose a Sparse Scan Self-Attention mechanism (S3A). This mechanism predefines a series of Anchors of Interest for each token and employs local attention to efficiently model the spatial information around these anchors, avoiding redundant global modeling and excessive focus on local information. This approach mirrors the human eye's functionality and significantly reduces the computational load of vision models. Building on S3A, we introduce the Sparse Scan Vision Transformer (SSViT). Extensive experiments demonstrate the outstanding performance of SSViT across a variety of tasks. Specifically, on ImageNet classification, without additional supervision or training data, SSViT achieves top-1 accuracies of 84.4%/85.7% with 4.4G/18.2G FLOPs. SSViT also excels in downstream tasks such as object detection, instance segmentation, and semantic segmentation. Its robustness is further validated across diverse datasets.
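
For intuition, below is a minimal, hypothetical PyTorch sketch of the sparse-scan idea: each query attends only to a few evenly spaced anchor positions plus a small local window around each anchor, instead of attending to all tokens. The 1D token layout, the anchor rule, and all parameter names here are illustrative assumptions, not the paper's actual 2D implementation; refer to the code in this repository for the real S3A.

```python
import torch

def sparse_scan_attention(x, num_anchors=4, window=3, num_heads=4):
    """Minimal 1D sketch of sparse-scan style attention.

    Illustrative assumption, not the paper's implementation: SSViT operates
    on 2D feature maps and predefines its own anchor layout. Here, each
    query attends only to `num_anchors` evenly spaced anchor positions,
    each widened by a `window`-sized neighborhood, instead of all N tokens.
    """
    B, N, C = x.shape
    H, d = num_heads, C // num_heads
    q = x.reshape(B, N, H, d).transpose(1, 2)             # (B, H, N, d)
    k = x.reshape(B, N, H, d).transpose(1, 2)
    v = x.reshape(B, N, H, d).transpose(1, 2)

    # Anchors of Interest for token i: evenly spaced offsets from i
    # (hypothetical anchor rule), plus a small halo around each anchor.
    stride = max(N // num_anchors, 1)
    offsets = torch.arange(num_anchors) * stride          # (A,)
    halo = torch.arange(window) - window // 2             # (w,)
    idx = (torch.arange(N)[:, None, None]
           + offsets[None, :, None]
           + halo[None, None, :]) % N                     # (N, A, w)
    idx = idx.reshape(N, -1)                              # A*w keys per query

    k_sel = k[:, :, idx]                                  # (B, H, N, A*w, d)
    v_sel = v[:, :, idx]
    attn = (q.unsqueeze(3) @ k_sel.transpose(-1, -2)) * d ** -0.5
    attn = attn.softmax(dim=-1)                           # (B, H, N, 1, A*w)
    out = (attn @ v_sel).squeeze(3)                       # (B, H, N, d)
    return out.transpose(1, 2).reshape(B, N, C)

x = torch.randn(2, 196, 64)             # e.g. a 14x14 token grid, flattened
print(sparse_scan_attention(x).shape)   # torch.Size([2, 196, 64])
```

In this sketch, with `num_anchors=4` and `window=3`, each token compares against 12 keys instead of 196, which is where the savings over global attention come from.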

Results

| Model | Params | FLOPs | Top-1 Acc | Log | Ckpt |
| --- | --- | --- | --- | --- | --- |
| SSViT-T | 15M | 2.4G | 83.0% | SSViT-T (epoch 297) | SSViT-T |
| SSViT-S | 27M | 4.4G | 84.4% | SSViT-S (epoch 244) | SSViT-S |
| SSViT-B | 57M | 9.6G | 85.3% | SSViT-B (epoch 295) | SSViT-B |
| SSViT-L | 100M | 18.2G | 85.7% | SSViT-L (epoch 214) | SSViT-L |
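
The checkpoints linked above can be inspected with plain PyTorch. The sketch below is a hedged example: the file name `ssvit_s.pth` and the `"model"` key are assumptions based on common training-script conventions, so adapt them to the files you actually download.

```python
import torch

# Hypothetical file name; use the checkpoint downloaded from the table above.
ckpt = torch.load("ssvit_s.pth", map_location="cpu")
# Many training scripts nest weights under a "model" key (an assumption here).
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} tensors in checkpoint")
# model.load_state_dict(state_dict)  # with a matching SSViT model instance
```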

Citation

If you use SSViT in your research, please cite it with the following BibTeX entry (and consider giving the repository a star):

@misc{fan2024vision,
      title={Vision Transformer with Sparse Scan Prior}, 
      author={Qihang Fan and Huaibo Huang and Mingrui Chen and Ran He},
      year={2024},
      eprint={2405.13335},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
