WRD397/ViT_REPLICATION

About

This project is a structured replication of the Vision Transformer (ViT) architecture introduced by Google Research in the paper:
"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
Dosovitskiy et al., 2020 (arXiv:2010.11929)


Goal

The goal is to:

  • Reproduce the model architecture and results on CIFAR-10 and TinyImageNet
  • Validate core claims (e.g., how ViT performance compares with CNN baselines)
  • Extend the study with visualization and interpretability techniques

This is being developed as part of a personal research initiative to demonstrate competency in modern AI architectures.
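
To make the "16x16 words" idea concrete for the two target datasets, here is a minimal sketch of the patch arithmetic. It assumes square images, non-overlapping patches, and one prepended class token, as in the paper. The function names and the smaller patch sizes (4 and 8) are illustrative assumptions, not settings taken from this repository.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


def sequence_length(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Transformer input length: patch tokens plus an optional class token."""
    return num_patches(image_size, patch_size) + (1 if class_token else 0)


def patch_dim(patch_size: int, channels: int = 3) -> int:
    """Length of one flattened patch before the linear projection."""
    return patch_size * patch_size * channels


if __name__ == "__main__":
    # CIFAR-10 images are 32x32 and Tiny ImageNet images are 64x64 (RGB).
    # Patch sizes 4 and 8 are hypothetical choices for small images.
    for name, size in [("CIFAR-10", 32), ("TinyImageNet", 64)]:
        for p in (4, 8, 16):
            print(name, f"patch={p}", f"patches={num_patches(size, p)}",
                  f"seq_len={sequence_length(size, p)}", f"patch_dim={patch_dim(p)}")
```

With 16x16 patches a CIFAR-10 image yields only 4 patch tokens, which is why replications on small images often consider smaller patches or upsampled inputs. Which option this project uses is not stated here.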
