- ViT-large: 8 × Titan Xp (12 GB)
- ViT-huge: 8 × RTX 3090 Ti (24 GB)
- python 3.7
- pytorch 1.7.1+cu101
- torchvision 0.8.2
- timm 0.3.2
1. Get the SnakeCLEF 2022 dataset and organize it as follows:

   ```
   root/
   ├─ SnakeCLEF2022-ISOxSpeciesMapping.csv
   ├─ train/
   │  ├─ SnakeCLEF2022-TrainMetadata.csv
   │  ├─ SnakeCLEF2022-small_size/
   │  ├─ SnakeCLEF2022-medium_size/
   │  └─ SnakeCLEF2022-large_size/
   └─ test/
      ├─ SnakeCLEF2022-TestMetadata.csv
      └─ SnakeCLEF2022-large_size/
   ```
2. Get the MAE pretrained models following `README_MAE.md`.
3. Calculate the number of samples per class:

   ```shell
   python preprocess_sample_per_class.py
   ```

   Output: `./preprocessing/sample_per_class.json`
4. Preprocess the metadata:

   ```shell
   python preprocess_endemic_metadata.py
   ```

   Output: `./preprocessing/endemic_label.json`

   ```shell
   python preprocess_code_metadata.py
   ```

   Outputs: `./preprocessing/code_label_train.json` and `./preprocessing/code_label_test.json`
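The preprocessing scripts above are not reproduced here; the sketch below shows the kind of lookups they plausibly build from the training metadata — per-class sample counts (step 3) and a location-code-to-class-ids map (step 4). The column names `class_id` and `code`, and the function `preprocess_metadata`, are assumptions for illustration; the real scripts read the metadata CSV and dump the results as JSON.

```python
from collections import Counter, defaultdict

def preprocess_metadata(rows, label_col="class_id", code_col="code"):
    """Derive two lookups from metadata rows (a list of dicts):
    per-class sample counts, and location code -> sorted class ids."""
    sample_per_class = dict(Counter(int(r[label_col]) for r in rows))
    code_label = defaultdict(set)
    for r in rows:
        code_label[r[code_col]].add(int(r[label_col]))
    return sample_per_class, {c: sorted(s) for c, s in code_label.items()}

# Synthetic rows standing in for SnakeCLEF2022-TrainMetadata.csv.
rows = [
    {"class_id": "0", "code": "US"},
    {"class_id": "0", "code": "US"},
    {"class_id": "1", "code": "BR"},
]
counts, code_label = preprocess_metadata(rows)
```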
Finetune ViT-large:

```shell
python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py \
    --accum_iter 4 \
    --batch_size 2 \
    --input_size 432 \
    --model vit_large_patch16 \
    --epochs 50 \
    --blr 1e-3 \
    --layer_decay 0.75 \
    --weight_decay 0.05 --drop_path 0.2 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --root root/to/your/data \
    --data snakeclef2022 \
    --nb_classes 1572 \
    --log_dir ./log_dir/vit_large_patch16_432_50e \
    --output_dir ./output_dir/vit_large_patch16_432_50e \
    --finetune ./pretrained_model/mae_pretrain_vit_large.pth \
    --use_prior --loss LogitAdjustment
```
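`--use_prior --loss LogitAdjustment` select a logit-adjusted cross-entropy in the spirit of Menon et al.'s logit adjustment for long-tailed recognition, with the class prior presumably derived from `sample_per_class.json` (step 3). Below is a minimal sketch of that loss, not the repo's actual implementation; `logit_adjusted_ce` and the toy tensors are illustrative. Note the effective training batch size is 8 GPUs × 2 per GPU × 4 accumulation steps = 64.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_ce(logits, targets, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior).

    Head classes receive a larger positive shift, so the model must
    beat the prior to lower the loss on tail classes; at inference
    the unshifted logits are used.
    """
    prior = class_counts.float() / class_counts.float().sum()
    return F.cross_entropy(logits + tau * prior.log(), targets)

# Toy long-tailed example: class 0 is 100x more frequent than class 2.
counts = torch.tensor([1000, 100, 10])
logits = torch.randn(4, 3)
targets = torch.tensor([0, 1, 2, 2])
loss = logit_adjusted_ce(logits, targets, counts)
```

With a uniform prior the shift is a constant added to every logit, so the loss reduces to plain cross-entropy.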
Evaluate ViT-large:

```shell
python main_finetune.py \
    --accum_iter 4 \
    --batch_size 64 \
    --input_size 432 \
    --model vit_large_patch16 \
    --epochs 50 \
    --blr 1e-3 \
    --layer_decay 0.75 \
    --weight_decay 0.05 --drop_path 0.2 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --root root/to/your/data \
    --data snakeclef2022 \
    --nb_classes 1572 \
    --log_dir ./log_dir/vit_large_patch16_432_50e \
    --output_dir ./output_dir/vit_large_patch16_432_50e \
    --resume ./output_dir/vit_large_patch16_432_50e/checkpoint-xx.pth \
    --use_prior --loss LogitAdjustment \
    --eval --test \
    --tencrop --crop_pct 0.875
```
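`--tencrop` enables ten-crop test-time augmentation: each image is cropped at the four corners and the center, each crop also horizontally flipped, and the per-crop predictions are averaged (`--crop_pct 0.875` sets the crop-to-resize ratio). The sketch below shows the averaging step, assuming crops shaped like the stacked output of torchvision's `TenCrop`; `tencrop_logits` and the dummy model are illustrative, not the repo's code.

```python
import torch

def tencrop_logits(model, crops):
    """Average model logits over the crop dimension.

    crops: (B, ncrops, C, H, W), e.g. ten crops per image from
    torchvision.transforms.TenCrop stacked along dim 1.
    """
    b, n, c, h, w = crops.shape
    logits = model(crops.view(b * n, c, h, w))  # fold crops into the batch
    return logits.view(b, n, -1).mean(dim=1)    # (B, num_classes)

# Dummy stand-in model: global-average-pool pixels, then a linear head.
head = torch.nn.Linear(3, 5)
model = lambda x: head(x.mean(dim=(2, 3)))
out = tencrop_logits(model, torch.randn(2, 10, 3, 32, 32))
```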
Finetune ViT-huge:

```shell
python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py \
    --accum_iter 4 \
    --batch_size 2 \
    --input_size 392 \
    --model vit_huge_patch14 \
    --epochs 45 \
    --blr 1e-3 \
    --layer_decay 0.8 \
    --weight_decay 0.05 --drop_path 0.2 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --root root/to/your/data \
    --data snakeclef2022 \
    --nb_classes 1572 \
    --log_dir ./log_dir/vit_huge_patch14_392_40e \
    --output_dir ./output_dir/vit_huge_patch14_392_40e \
    --finetune ./pretrained_model/mae_pretrain_vit_huge.pth \
    --use_prior --loss LogitAdjustment
```
Evaluate ViT-huge:

```shell
python main_finetune.py \
    --accum_iter 4 \
    --batch_size 64 \
    --input_size 392 \
    --model vit_huge_patch14 \
    --epochs 45 \
    --blr 1e-3 \
    --layer_decay 0.8 \
    --weight_decay 0.05 --drop_path 0.2 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --root root/to/your/data \
    --data snakeclef2022 \
    --nb_classes 1572 \
    --log_dir ./log_dir/vit_huge_patch14_392_40e \
    --output_dir ./output_dir/vit_huge_patch14_392_40e \
    --resume ./output_dir/vit_huge_patch14_392_40e/checkpoint-xx.pth \
    --use_prior --loss LogitAdjustment \
    --eval --test \
    --tencrop --crop_pct 0.875
```
Ensemble the predictions of the trained models:

```shell
python ensemble.py
```
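The exact combination rule used by `ensemble.py` is not shown here; a common choice for model ensembling is averaging softmax probabilities and taking the argmax. A minimal sketch under that assumption (`ensemble_probs` and the toy tensors are illustrative, not the repo's code):

```python
import torch

def ensemble_probs(logit_sets):
    """Average softmax probabilities over several models' logits.

    logit_sets: list of (B, num_classes) logit tensors, one per model.
    """
    return torch.stack([torch.softmax(l, dim=1) for l in logit_sets]).mean(dim=0)

# Two toy single-sample predictions over 3 classes.
a = torch.tensor([[2.0, 0.0, 0.0]])
b = torch.tensor([[0.0, 1.0, 0.0]])
avg = ensemble_probs([a, b])
pred = avg.argmax(dim=1)
```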
| model | resolution | public | private | checkpoint |
|---|---|---|---|---|
| ViT-large | 384 | 0.87996 | 0.81997 | [Google] |
| ViT-large | 432 | 0.89173 | 0.83063 | [Google] |
| ViT-huge | 392 | 0.89449 | 0.84057 | [Google] |
| Ensemble | -- | 0.89822 | 0.84565 | -- |