Vision Transformer

This is a PyTorch implementation of the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". Thanks to Brian Pulfer for his Medium article on the topic, which was an extremely helpful resource.

More comments will be added in the future to make the code easier to understand.

This code uses the MNIST dataset.
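The core idea of the paper is to treat an image as a sequence of patches, much like words in a sentence. The sketch below shows how a 28x28 MNIST-sized image could be split into flattened, non-overlapping patches. It uses plain Python lists so that it runs without any dependencies. The `patchify` helper and the patch size of 7 are assumptions made for illustration and are not taken from `model.py`, which may do this step differently.

```python
# Hypothetical helper for illustration; not taken from this repository's model.py.
def patchify(image, patch_size):
    """Split a square 2D image (a list of rows) into flattened, non-overlapping patches."""
    size = len(image)
    if any(len(row) != size for row in image):
        raise ValueError("image must be square")
    if size % patch_size != 0:
        raise ValueError("image size must be divisible by patch_size")

    patches = []
    # Walk the patch grid row by row, left to right.
    for top in range(0, size, patch_size):
        for left in range(0, size, patch_size):
            # Flatten one patch_size x patch_size block into a single list.
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A dummy 28x28 "image" whose pixel value encodes its position.
image = [[r * 28 + c for c in range(28)] for r in range(28)]
patches = patchify(image, patch_size=7)  # patch size 7 is an assumed value
print(len(patches), len(patches[0]))  # 16 49
```

With these assumed values, each image becomes a sequence of 16 patches of 49 pixels each. In a Vision Transformer, each flattened patch is then projected to an embedding vector before entering the transformer encoder.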

Citation:

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020).
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
arXiv preprint arXiv:2010.11929.

