
[NeurIPS 2022] AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition


This is a PyTorch implementation of the paper AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition.

Shoufa Chen¹*, Chongjian Ge¹*, Zhan Tong², Jiangliu Wang²,³, Yibing Song², Jue Wang², Ping Luo¹
¹The University of Hong Kong, ²Tencent AI Lab, ³The Chinese University of Hong Kong
* denotes equal contribution
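
In brief, AdaptFormer adds a lightweight bottleneck branch (down-projection, ReLU, up-projection, scaled by a factor s) in parallel with the frozen MLP block of each transformer layer, and only this small branch is trained. Below is a minimal PyTorch sketch of that idea; the dimensions, initialization, and names are illustrative, not the repository's exact code.

```python
import torch.nn as nn

class AdaptMLP(nn.Module):
    """Minimal sketch of AdaptFormer's parallel bottleneck adapter.

    Illustrative only: a trainable down -> ReLU -> up branch, scaled by s,
    runs in parallel with the layer's frozen MLP (pre-norm ViT block).
    """

    def __init__(self, dim: int = 768, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # trainable down-projection
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)    # trainable up-projection
        self.scale = scale                      # scaling factor s
        nn.init.zeros_(self.up.weight)          # branch starts near zero, so
        nn.init.zeros_(self.up.bias)            # training begins at the frozen model

    def forward(self, x, mlp, norm):
        # x: (batch, tokens, dim); mlp and norm are the block's frozen
        # MLP and LayerNorm. The adapter branch sees the raw block input.
        adapted = self.scale * self.up(self.act(self.down(x)))
        return x + mlp(norm(x)) + adapted
```

Only the down/up projections are updated during fine-tuning, which is what makes the approach parameter-efficient.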

Catalog

  • Video code
  • Image code

Usage

Install

  • Tested on Tesla V100 (32 GB): CUDA 10.1 + PyTorch 1.6.0 + torchvision 0.7.0 (a quick version check is sketched after this list)
  • timm 0.4.8
  • einops
  • easydict
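
As a quick sanity check that the pinned versions above are in place (a small illustrative script, not part of the repository):

```python
# Verify the pinned environment from the list above.
import torch, torchvision, timm, einops, easydict

print(torch.__version__)          # expected: 1.6.0
print(torchvision.__version__)    # expected: 0.7.0
print(timm.__version__)           # expected: 0.4.8
print(torch.cuda.is_available())  # should be True on a V100 machine
```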

Data Preparation

See DATASET.md.

Training

Start

# video
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch \
    --nproc_per_node=8 --nnodes=8 \
    --node_rank=$1 --master_addr=$2 --master_port=22234 \
    --use_env main_video.py \
    --finetune /path/to/pre_trained/checkpoints \
    --output_dir /path/to/output \
    --batch_size 16 --epochs 90 --blr 0.1 --weight_decay 0.0 --dist_eval \
    --data_path /path/to/SSV2 --data_set SSV2 \
    --ffn_adapt

Run this command on each of the 8 nodes, where --master_addr is set to the IP address of node 0 and --node_rank is 0, 1, ..., 7 (one rank per node).
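
Passing --ffn_adapt enables the adapter branch; the intent of AdaptFormer is that the pre-trained backbone stays frozen and only the small number of added parameters (plus the task head) are updated. A hedged sketch of that freezing step, with illustrative parameter-name patterns:

```python
import torch.nn as nn

def freeze_for_adaptformer(model: nn.Module) -> None:
    """Freeze the backbone; leave only adapter and head parameters trainable.

    The substrings "adaptmlp" and "head" are illustrative name patterns and
    may not match this repository's exact parameter names.
    """
    for name, param in model.named_parameters():
        param.requires_grad = ("adaptmlp" in name) or ("head" in name)

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable} / {total} parameters ({trainable / total:.2%})")
```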

# image
python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_image.py \
    --batch_size 128 --cls_token \
    --finetune /path/to/pre_trained/mae_pretrain_vit_b.pth \
    --dist_eval --data_path /path/to/data \
    --output_dir /path/to/output  \
    --drop_path 0.0  --blr 0.1 \
    --dataset cifar100 --ffn_adapt

To obtain the pre-trained checkpoint, see PRETRAIN.md.
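
Because the adapter weights are new, the pre-trained checkpoint is normally loaded non-strictly, so that missing adapter keys are simply left at their fresh initialization. A minimal sketch of that pattern (the "model" key is an assumption about MAE-style checkpoint layout, not a guarantee about this repository):

```python
import torch
import torch.nn as nn

def load_pretrained(model: nn.Module, path: str) -> None:
    """Load a pre-trained backbone into a model that also contains adapters."""
    checkpoint = torch.load(path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)
    # strict=False: adapter weights are absent from the checkpoint and
    # therefore remain at their fresh initialization.
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print("missing keys (newly initialized, e.g. adapters):", missing)
    print("unexpected keys (ignored):", unexpected)
```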

Acknowledgement

This project is built upon MAE, VideoMAE, timm, and MAM. Thanks for their awesome work.

Citation

@article{chen2022adaptformer,
      title={AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition},
      author={Chen, Shoufa and Ge, Chongjian and Tong, Zhan and Wang, Jiangliu and Song, Yibing and Wang, Jue and Luo, Ping},
      journal={arXiv preprint arXiv:2205.13535},
      year={2022}
}

License

This project is under the MIT license. See LICENSE for details.
