
vision-tranformer-coding

A coding exercise: implementing a Vision Transformer.

This is a work in progress, so there may be some mistakes.

main.py runs the model.

Data_loading.py loads only a subset of the CIFAR10 dataset.
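
As a rough illustration of loading only a subset, the sketch below draws a fixed number of random samples from an image array. It is not the code in Data_loading.py: the helper name take_subset, the seed, and the zero-filled stand-in arrays (shaped like CIFAR10 images, 32x32 RGB with 10 classes) are all assumptions made for this example.

```python
import numpy as np


def take_subset(images, labels, subset_size, seed=0):
    """Return a random subset of (images, labels), sampled without replacement.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=subset_size, replace=False)
    return images[idx], labels[idx]


# Stand-in data with CIFAR10-like shapes (assumption: channels-last layout).
images = np.zeros((1000, 32, 32, 3), dtype=np.uint8)
labels = np.arange(1000) % 10

sub_images, sub_labels = take_subset(images, labels, subset_size=100)
print(sub_images.shape, sub_labels.shape)
```

Fixing the seed keeps the subset reproducible between runs, which makes it easier to compare results while the model is still changing.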

input_processing.py contains two functions:

image_to_patches divides each image into patches.
Output shape: (B, N, D), where B is the batch size, N is the number of patches, and D = patch_size^2 * channel_num is the embedding vector size.
preprocessing adds the cls token and the positional embedding to the patch sequence.
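
The two steps above can be sketched as follows. This is a minimal NumPy sketch, not the repository's code: the function signatures, the channels-first input layout (B, C, H, W), prepending the cls token at position 0, and the random arrays standing in for learnable parameters are all assumptions.

```python
import numpy as np


def image_to_patches(images, patch_size):
    """Split (B, C, H, W) images into flattened patches of shape (B, N, P*P*C)."""
    b, c, h, w = images.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size
    # Cut height and width into a (gh, gw) grid of P x P tiles.
    x = images.reshape(b, c, gh, patch_size, gw, patch_size)
    # Reorder to (B, gh, gw, P, P, C) so each patch is contiguous.
    x = x.transpose(0, 2, 4, 3, 5, 1)
    return x.reshape(b, gh * gw, patch_size * patch_size * c)


def preprocessing(patches, cls_token, pos_embedding):
    """Prepend the cls token (assumption) and add the positional embedding."""
    b, _, d = patches.shape
    cls = np.broadcast_to(cls_token, (b, 1, d))
    tokens = np.concatenate([cls, patches], axis=1)  # (B, N + 1, D)
    return tokens + pos_embedding


rng = np.random.default_rng(0)
images = rng.standard_normal((2, 3, 32, 32))       # CIFAR10-sized batch
patches = image_to_patches(images, patch_size=4)   # N = 64, D = 4*4*3 = 48

# Random stand-ins for parameters that would normally be learned.
cls_token = rng.standard_normal((1, 1, 48))
pos_embedding = rng.standard_normal((1, 65, 48))
tokens = preprocessing(patches, cls_token, pos_embedding)
print(patches.shape, tokens.shape)
```

With 32x32 images and patch_size = 4, each image yields N = 64 patches of size D = 48, and the sequence grows to 65 tokens once the cls token is included.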

multihead_attention.py runs the multi-head attention.
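
For reference, scaled dot-product multi-head self-attention can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions, not the contents of multihead_attention.py: the function name, the bias-free projection matrices w_q, w_k, w_v, w_o, and the random initialization are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multihead_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x of shape (B, N, D); returns (output, weights)."""
    b, n, d = x.shape
    assert d % num_heads == 0, "D must be divisible by num_heads"
    dh = d // num_heads

    def split_heads(t):
        # (B, N, D) -> (B, H, N, D / H)
        return t.reshape(b, n, num_heads, dh).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh)  # (B, H, N, N)
    weights = softmax(scores, axis=-1)
    out = weights @ v                                   # (B, H, N, D / H)
    out = out.transpose(0, 2, 1, 3).reshape(b, n, d)    # merge heads
    return out @ w_o, weights


rng = np.random.default_rng(0)
d, num_heads = 48, 4
x = rng.standard_normal((2, 65, d))
w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
out, weights = multihead_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, weights.shape)
```

The output keeps the input shape (B, N, D), so the block can be dropped into a residual connection without reshaping.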

Encoder.py runs the encoder structure. Several encoder layers can be stacked simply by adding more encoders.
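
Stacking encoders can be sketched as a loop over independent blocks. The NumPy sketch below assumes a pre-norm layout (layer norm before attention and before the MLP, each wrapped in a residual connection), omits biases and the learnable layer-norm scale and shift, and uses hypothetical names throughout; the actual structure in Encoder.py may differ.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize over the last axis (no learnable scale or shift here)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def self_attention(x, p, num_heads):
    """Multi-head self-attention over x of shape (B, N, D)."""
    b, n, d = x.shape
    dh = d // num_heads

    def split_heads(t):
        return t.reshape(b, n, num_heads, dh).transpose(0, 2, 1, 3)

    q, k, v = (split_heads(x @ p[name]) for name in ("w_q", "w_k", "w_v"))
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = (weights @ v).transpose(0, 2, 1, 3).reshape(b, n, d)
    return out @ p["w_o"]


def encoder_block(x, p, num_heads):
    """One pre-norm encoder block: attention and MLP, each with a residual."""
    x = x + self_attention(layer_norm(x), p, num_heads)
    x = x + gelu(layer_norm(x) @ p["w_1"]) @ p["w_2"]
    return x


def init_block(rng, d, hidden):
    """Random stand-ins for one block's parameters (hypothetical helper)."""
    shapes = {"w_q": (d, d), "w_k": (d, d), "w_v": (d, d), "w_o": (d, d),
              "w_1": (d, hidden), "w_2": (hidden, d)}
    return {name: rng.standard_normal(shape) * 0.02 for name, shape in shapes.items()}


rng = np.random.default_rng(0)
d, hidden, num_heads, depth = 48, 96, 4, 3
blocks = [init_block(rng, d, hidden) for _ in range(depth)]  # one dict per encoder

x = rng.standard_normal((2, 65, d))
out = x
for p in blocks:  # adding encoders only means extending this list
    out = encoder_block(out, p, num_heads)
print(out.shape)
```

Because every block maps (B, N, D) to (B, N, D), changing depth is the only edit needed to run more or fewer encoders.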
