Training recipes

We provide the specific commands and hyper-parameters for training ViTs, ResNets and ConvNexts in this recipe.

Training of ViT

1) Training with Setting I

This is a prevalent setting for training ResNets. To train ViT-small, you can use the following command.

python -m torch.distributed.launch --nproc_per_node=8 ./train.py \
    --data-dir ${IMAGENET_DIR} \
    --model deit_small_patch16_224 \
    --sched cosine -j 10 \
    --epochs ${EPOCH} --weight-decay 0.02 \
    --opt Adan \
    --lr 1.5e-2  --opt-betas 0.98 0.92 0.99 \
    --opt-eps 1e-8 --max-grad-norm 0.0 \
    --warmup-lr 1e-8 --min-lr 1.0e-08 \
    -b 256 --amp \
    --aug-repeats 0 \
    --warmup-epochs 60 \
    --aa rand-m7-mstd0.5-inc1 \
    --smoothing 0.1 \
    --remode pixel \
    --reprob 0.0 \
    --bce \
    --drop 0.0 --drop-path 0.05 \
    --mixup 0.2 --cutmix 1.0 \
    --output ${OUT_DIR} \
    --experiment ${EXP_DIR}

After training, this command should give the following results. Note that this setting does not appear to improve on the ViT-Base results obtained with Setting II (see below).

	150 Epoch	300 Epoch
ViT small	80.1	81.1
download	config/log/model	config/log/model
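The launch line determines how many images contribute to each optimizer step. A minimal sketch of that arithmetic follows, assuming that -b is the per-process batch size (the timm convention, which train.py appears to follow) and that no gradient accumulation is used; the helper name global_batch_size is hypothetical.

```python
def global_batch_size(nproc_per_node: int, per_process_batch: int, nodes: int = 1) -> int:
    """Images per optimizer step across all processes.

    Assumes -b is a per-process batch size and that gradients are not
    accumulated over several steps (both are assumptions, not stated in the recipe).
    """
    return nodes * nproc_per_node * per_process_batch


# Values taken from the command: --nproc_per_node=8 and -b 256
print(global_batch_size(8, 256))  # 2048
```

If you launch with fewer processes, the global batch size shrinks accordingly; the recipe does not state how the learning rate should be adjusted in that case.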

2) Training with Setting II

This is the official setting used in DeiT. Note that, without distillation, DeiTs and ViTs are the same models. To train ViT-Small or ViT-Base, you can use the following command, substituting the per-model values listed below.

python -m torch.distributed.launch --nproc_per_node=8 ./train.py \
    --data-dir ${IMAGENET_DIR} \
    --model ${MODEL_NAME} \
    --sched cosine -j 10 \
    --epochs ${EPOCH} --weight-decay .02 \
    --opt Adan \
    --lr 1.5e-2  --opt-betas 0.98 0.92 0.99 \
    --opt-eps 1e-8 --max-grad-norm 5.0 \
    --warmup-lr 1e-8 --min-lr 1e-5 \
    -b 256 --amp \
    --aug-repeats ${REP} \
    --warmup-epochs 60 \
    --aa ${AUG}  \
    --smoothing 0.1 \
    --remode pixel \
    --reprob 0.25 \
    --drop 0.0 --drop-path ${Dp} \
    --mixup 0.8 --cutmix 1.0 \
    --output ${OUT_DIR} \
    --experiment ${EXP_DIR}

There are some differences between the hyper-parameters for ViT-Small and ViT-Base, as listed in the table below. --bce means using the Binary Cross Entropy loss.

	MODEL_NAME	REP	AUG	BCE	Bias-Decay	Drop-path
ViT-Small	deit_small_patch16_224	0	rand-m7-mstd0.5-inc1	True	False	0.1
ViT-Base	deit_base_patch16_224	3	rand-m9-mstd0.5-inc1	False	True	0.2
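The table can be turned into the model-specific part of the command line. The sketch below is illustrative only: the dictionary SETTING_II and the helper setting_ii_args are hypothetical names, and it assumes that the BCE and Bias-Decay columns correspond to the boolean switches --bce and --bias-decay that appear in the other commands of this recipe.

```python
from typing import Dict, List

# Per-model values copied from the table above.
SETTING_II: Dict[str, dict] = {
    "ViT-Small": {
        "MODEL_NAME": "deit_small_patch16_224", "REP": 0,
        "AUG": "rand-m7-mstd0.5-inc1", "BCE": True, "BIAS_DECAY": False, "Dp": 0.1,
    },
    "ViT-Base": {
        "MODEL_NAME": "deit_base_patch16_224", "REP": 3,
        "AUG": "rand-m9-mstd0.5-inc1", "BCE": False, "BIAS_DECAY": True, "Dp": 0.2,
    },
}


def setting_ii_args(model: str) -> List[str]:
    """Build the model-specific arguments for the Setting II command."""
    cfg = SETTING_II[model]
    args = [
        "--model", cfg["MODEL_NAME"],
        "--aug-repeats", str(cfg["REP"]),
        "--aa", cfg["AUG"],
        "--drop-path", str(cfg["Dp"]),
    ]
    # Assumption: a True entry in the table means "pass the switch".
    if cfg["BCE"]:
        args.append("--bce")
    if cfg["BIAS_DECAY"]:
        args.append("--bias-decay")
    return args


print(" ".join(setting_ii_args("ViT-Base")))
```

The configuration files remain the authoritative source for each run; this helper only restates the table.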

After training, you should expect the following results. Note that ViT-Base (300 epochs) is trained with the faster version of Adan (foreach=True). For more details and settings, please refer to the corresponding configuration files.

	150 Epoch	300 Epoch
ViT-Small	79.6	80.9
download	config/log/model	config/log/model
ViT-Base	81.7	82.3/82.6
download	config/log/model	config/log/model
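Both ViT settings combine --sched cosine with --warmup-epochs, --warmup-lr and --min-lr. To show how these flags interact, here is a simplified per-epoch sketch: a linear ramp from the warmup learning rate to the peak, followed by cosine decay to the minimum. The function lr_at_epoch is hypothetical, and the scheduler actually used by train.py may differ in details such as per-iteration updates, cooldown epochs, or whether warmup counts towards the cosine period.

```python
import math


def lr_at_epoch(epoch: int, epochs: int, lr: float,
                warmup_epochs: int, warmup_lr: float, min_lr: float) -> float:
    """Simplified learning rate at a given epoch (linear warmup, then cosine decay)."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr up to the peak lr.
        return warmup_lr + (lr - warmup_lr) * epoch / warmup_epochs
    # Cosine decay from lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Setting II values: --lr 1.5e-2 --warmup-epochs 60 --warmup-lr 1e-8 --min-lr 1e-5
for epoch in (0, 30, 60, 180, 300):
    print(epoch, lr_at_epoch(epoch, 300, 1.5e-2, 60, 1e-8, 1e-5))
```

With 60 warmup epochs, a 150-epoch run spends 40% of its schedule ramping up, compared with 20% for a 300-epoch run.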

ResNet-50

This is a default setting used to train ResNets. To train ResNet-50, you can use the following command.

python -m torch.distributed.launch --nproc_per_node=8 ./train.py \
    --data-dir ${IMAGENET_DIR} \
    --model resnet50 \
    --sched cosine -j 8 \
    --epochs ${EPOCH} --weight-decay .02 \
    --opt Adan \
    --lr ${LR}  --opt-betas 0.98 0.92 0.99 \
    --opt-eps 1e-8 --max-grad-norm 5.0 \
    --warmup-lr 1e-9 --min-lr 1.0e-05 --bias-decay \
    -b 256 --amp \
    --aug-repeats 0 \
    --warmup-epochs 60 \
    --aa rand-m7-mstd0.5-inc1 \
    --smoothing 0.0 \
    --remode pixel \
    --crop-pct 0.95 \
    --reprob 0.0 \
    --bce \
    --drop 0.0 --drop-path 0.05 \
    --mixup 0.1 --cutmix 1.0 \
    --output ${OUT_DIR} \
    --experiment ${EXP_DIR}

When training for different numbers of epochs, we use slightly different learning rates: LR = 3e-2 for EPOCH = 100 and LR = 1.5e-2 for EPOCH = 200 and 300. After training, you can get the following results:

	100 Epoch	200 Epoch	300 Epoch
ResNet-50	78.1	79.7	80.2
download	config/log/model	config/log/model	config/log/model
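The ResNet-50 command passes --bce, which selects the Binary Cross Entropy loss in place of the usual softmax cross entropy. As a rough illustration of what that means, the sketch below scores each class as an independent yes/no problem, so soft targets produced by mixup or cutmix can be used directly. The function bce_with_logits is a hypothetical pure-Python stand-in, not the loss class used by train.py, and the averaging over classes is an assumption.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross entropy over classes for one sample.

    Uses the numerically stable form of
    -[t * log(sigmoid(z)) + (1 - t) * log(1 - sigmoid(z))].
    """
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# A three-class toy example with a soft target, as mixup might produce.
logits = [2.0, -1.0, -3.0]
soft_target = [0.9, 0.1, 0.0]
print(bce_with_logits(logits, soft_target))
```

Unlike softmax cross entropy, the per-class terms do not compete with each other, so the targets do not need to sum to one.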

ResNet-101

To train ResNet-101, you may use the following command.

python -m torch.distributed.launch --nproc_per_node=8 train.py \
    --data-dir ${IMAGENET_DIR} \
    --model resnet101 \
    --sched cosine -j 8 \
    --epochs 300 --weight-decay .02 \
    --lr 1.5e-2  --warmup-lr 1e-9 --min-lr 1.0e-05 \
    -b 256 --amp --opt adan --opt-betas 0.98 0.92 0.99 --opt-eps 1e-8 \
    --max-grad-norm 5 \
    --bias-decay \
    --aug-repeats 0 \
    --warmup-epochs 90 \
    --aa rand-m7-mstd0.5-inc1 \
    --smoothing 0.0 \
    --remode pixel \
    --bce-loss \
    --crop-pct 0.95 \
    --reprob 0.0 \
    --drop 0.0 --drop-path 0.2 \
    --mixup 0.1 --cutmix 1.0 \
    --output ${OUT_DIR} \
    --experiment ${EXP_DIR}

We use slightly different learning rates: LR = 1e-2 for EPOCH = 100 and LR = 1.5e-2 for EPOCH = 200 and 300; the command above shows the 300-epoch values. For more detailed training settings, please refer to the following configuration files. Note that the results for 100 and 300 epochs are obtained with the faster version of Adan (foreach=True).

	100 Epoch	200 Epoch	300 Epoch
ResNet-101	80.0	81.6	81.9
download	config/log/model	config/log/model	config/log/model
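The learning rates stated for ResNet-50 and ResNet-101 depend on the epoch budget, which is easy to get wrong when switching between runs. One way to keep them straight is a small lookup; the names LR_BY_EPOCHS and lr_for are hypothetical, and only the epoch budgets listed in this recipe are covered.

```python
# Learning rates as stated in this recipe, keyed by model and epoch budget.
LR_BY_EPOCHS = {
    "resnet50": {100: 3e-2, 200: 1.5e-2, 300: 1.5e-2},
    "resnet101": {100: 1e-2, 200: 1.5e-2, 300: 1.5e-2},
}


def lr_for(model: str, epochs: int) -> float:
    """Return the recipe's learning rate, or fail loudly for untested budgets."""
    try:
        return LR_BY_EPOCHS[model][epochs]
    except KeyError:
        raise ValueError(f"no learning rate listed for {model} at {epochs} epochs") from None


print(lr_for("resnet50", 100), lr_for("resnet101", 100))
```

Raising an error for unlisted combinations avoids silently extrapolating a learning rate that the recipe never reports.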

ConvNext

This is the default setting for training ConvNext-tiny. To train it, you can use the following command.

python -m torch.distributed.launch --nproc_per_node=8 ./train.py \
    --data-dir ${IMAGENET_DIR} \
    --model convnext_tiny_hnf \
    --sched cosine -j 8 \
    --epochs ${EPOCH} --weight-decay .02 \
    --opt Adan \
    --lr 1.6e-2  --opt-betas 0.98 0.92 0.90 \
    --opt-eps 1e-8 --max-grad-norm 0.0 \
    --warmup-lr 1e-9 --min-lr 1.0e-05 --bias-decay \
    -b 256 --amp \
    --aug-repeats 0 \
    --warmup-epochs 150 \
    --aa rand-m7-mstd0.5-inc1 \
    --smoothing 0.1 \
    --remode pixel \
    --reprob 0.25 \
    --drop 0.0 --drop-path 0.1 \
    --mixup 0.8 --cutmix 1.0 \
    --model-ema \
    --train-interpolation random \
    --output ${OUT_DIR} \
    --experiment ${EXP_DIR}

For this training, the performance is not sensitive to some hyper-parameters, such as warmup-epochs and lr. However, whether model-ema is used plays a key role.

You can use the following config to train ConvNext-tiny for 150 epochs, in which we do not utilize model-ema.

The results should be:

	150 Epoch	300 Epoch
ConvNext-tiny	81.7	82.4
download	config/log/model	config/log/model
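Since --model-ema matters for the ConvNext results, it may help to recall what a model EMA does: alongside the trained weights, it keeps an exponential moving average of them and evaluates with the averaged copy. The sketch below shows only the update rule on plain lists; ema_update is a hypothetical helper, and the decay of 0.5 is a toy value chosen for readability, not the value used in training.

```python
from typing import List


def ema_update(ema_weights: List[float], weights: List[float], decay: float) -> List[float]:
    """Element-wise decay * ema + (1 - decay) * weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Toy run: the averaged copy moves towards constant weights [1.0, 2.0].
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.5)
print(ema)  # [0.875, 1.75]
```

In practice the decay is much closer to 1, so the averaged weights change slowly and smooth out step-to-step noise.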
