
[NeurIPS 2022] AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition


This is a PyTorch implementation of the paper AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition.

Shoufa Chen¹*, Chongjian Ge¹*, Zhan Tong², Jiangliu Wang²,³, Yibing Song², Jue Wang², Ping Luo¹
¹The University of Hong Kong, ²Tencent AI Lab, ³The Chinese University of Hong Kong
* denotes equal contribution
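
In brief, AdaptFormer adds a lightweight bottleneck branch (down-projection, ReLU, up-projection, scaled by a factor s) in parallel with the frozen MLP block of each transformer layer, and only this small branch is trained. Below is a minimal PyTorch sketch of that idea; the dimensions, initialization, and names are illustrative, not the repository's exact code.

```python
import torch.nn as nn

class AdaptMLP(nn.Module):
    """Minimal sketch of AdaptFormer's parallel bottleneck adapter.

    Illustrative only: a trainable down -> ReLU -> up branch, scaled by s,
    runs in parallel with the layer's frozen MLP (pre-norm ViT block).
    """

    def __init__(self, dim: int = 768, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # trainable down-projection
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)    # trainable up-projection
        self.scale = scale                      # scaling factor s
        nn.init.zeros_(self.up.weight)          # branch starts near zero, so
        nn.init.zeros_(self.up.bias)            # training begins at the frozen model

    def forward(self, x, mlp, norm):
        # x: (batch, tokens, dim); mlp and norm are the block's frozen
        # MLP and LayerNorm. The adapter branch sees the raw block input.
        adapted = self.scale * self.up(self.act(self.down(x)))
        return x + mlp(norm(x)) + adapted
```

Only the down/up projections are updated during fine-tuning, which is what makes the approach parameter-efficient.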

Catalog

  • Video code
  • Image code

Usage

Install

  • Tested on Tesla V100 (32 GB): CUDA 10.1 + PyTorch 1.6.0 + torchvision 0.7.0 (a quick version check is sketched after this list)
  • timm 0.4.8
  • einops
  • easydict
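
As a quick sanity check that the pinned versions above are in place (a small illustrative script, not part of the repository):

```python
# Verify the pinned environment from the list above.
import torch, torchvision, timm, einops, easydict

print(torch.__version__)          # expected: 1.6.0
print(torchvision.__version__)    # expected: 0.7.0
print(timm.__version__)           # expected: 0.4.8
print(torch.cuda.is_available())  # should be True on a V100 machine
```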

Data Preparation

See DATASET.md.

Training

Start

# video
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch \
    --nproc_per_node=8 --nnodes=8 \
    --node_rank=$1 --master_addr=$2 --master_port=22234 \
    --use_env main_video.py \
    --finetune /path/to/pre_trained/checkpoints \
    --output_dir /path/to/output \
    --batch_size 16 --epochs 90 --blr 0.1 --weight_decay 0.0 --dist_eval \
    --data_path /path/to/SSV2 --data_set SSV2 \
    --ffn_adapt

Run this command on each of the 8 nodes, where --master_addr is set to the IP address of node 0 and --node_rank is 0, 1, ..., 7 (one rank per node).
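
Passing --ffn_adapt enables the adapter branch; the intent of AdaptFormer is that the pre-trained backbone stays frozen and only the small number of added parameters (plus the task head) are updated. A hedged sketch of that freezing step, with illustrative parameter-name patterns:

```python
import torch.nn as nn

def freeze_for_adaptformer(model: nn.Module) -> None:
    """Freeze the backbone; leave only adapter and head parameters trainable.

    The substrings "adaptmlp" and "head" are illustrative name patterns and
    may not match this repository's exact parameter names.
    """
    for name, param in model.named_parameters():
        param.requires_grad = ("adaptmlp" in name) or ("head" in name)

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable} / {total} parameters ({trainable / total:.2%})")
```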

# image
python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_image.py \
    --batch_size 128 --cls_token \
    --finetune /path/to/pre_trained/mae_pretrain_vit_b.pth \
    --dist_eval --data_path /path/to/data \
    --output_dir /path/to/output  \
    --drop_path 0.0  --blr 0.1 \
    --dataset cifar100 --ffn_adapt

To obtain the pre-trained checkpoint, see PRETRAIN.md.
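
Because the adapter weights are new, the pre-trained checkpoint is normally loaded non-strictly, so that missing adapter keys are simply left at their fresh initialization. A minimal sketch of that pattern (the "model" key is an assumption about MAE-style checkpoint layout, not a guarantee about this repository):

```python
import torch
import torch.nn as nn

def load_pretrained(model: nn.Module, path: str) -> None:
    """Load a pre-trained backbone into a model that also contains adapters."""
    checkpoint = torch.load(path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)
    # strict=False: adapter weights are absent from the checkpoint and
    # therefore remain at their fresh initialization.
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print("missing keys (newly initialized, e.g. adapters):", missing)
    print("unexpected keys (ignored):", unexpected)
```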

Acknowledgement

This project is built upon MAE, VideoMAE, timm, and MAM. Thanks for their awesome work.

Citation

@article{chen2022adaptformer,
      title={AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition},
      author={Chen, Shoufa and Ge, Chongjian and Tong, Zhan and Wang, Jiangliu and Song, Yibing and Wang, Jue and Luo, Ping},
      journal={arXiv preprint arXiv:2205.13535},
      year={2022}
}

License

This project is under the MIT license. See LICENSE for details.
