This is an implementation of the Vision Transformer (ViT) from the paper "An Image is Worth 16x16 Words" (Dosovitskiy et al., 2020). Thanks to Brian Pulfer and his Medium article on the topic, which was an extremely helpful and valuable resource.
More comments will be added to the code in the future to make it easier to follow.
The code uses the MNIST dataset of handwritten digits.
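The paper's core idea is to treat an image as a sequence of patches. Each patch is flattened into a vector, and in ViT those vectors are then linearly projected into tokens for a standard Transformer encoder. The following is a minimal pure-Python sketch of the patch-splitting step only, not the code in this repository. The function name `patchify` and the choice of 4x4 patches (a 7x7 grid of 49 patches for a 28x28 MNIST digit) are assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]  # H x W grayscale image, e.g. one 28x28 MNIST digit


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch_size x patch_size patches.

    Each patch is flattened row by row into a vector, and patches are returned
    in raster order (left to right, then top to bottom).
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Collect the pixels of one patch, row by row.
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


if __name__ == "__main__":
    # Dummy 28x28 "image" whose pixel value encodes its (row, col) position.
    img = [[float(r * 28 + c) for c in range(28)] for r in range(28)]
    tokens = patchify(img, patch_size=4)
    print(len(tokens), len(tokens[0]))  # 49 patches, each with 16 values
```

With 16x16 patches, as in the paper's title, a 28x28 MNIST image would not divide evenly, which is why a smaller patch size is assumed here.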
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv:2010.11929.