Partway through converting models to multi-weight support (`model_arch.pretrained_tag`), reorganizing modules for future builds, and adding lots of new weights and models along the way...

This is considered a development release; please stick to 0.6.x if you need stability. Some model names and tags will shift a bit, and some old names have already been deprecated without remapping support added yet. For code, the 0.6.x branch is considered 'stable': https://github.com/rwightman/pytorch-image-models/tree/0.6.x
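For orientation, a minimal sketch of the new naming in use via `timm.create_model`, with an `arch.tag` name that appears in the notes below:

```python
import timm

# Multi-weight naming: '<model_arch>.<pretrained_tag>' picks one of possibly
# several pretrained weights for the same architecture
model = timm.create_model('convnext_nano.in12k_ft_in1k', pretrained=True)

# Omitting the tag falls back to the default pretrained weights for the arch
default = timm.create_model('convnext_nano', pretrained=True)
```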
### Dec 23, 2022 🎄☃
- Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
- NOTE: patch size resizing is currently static at model creation; on-the-fly dynamic resizing / train-time patch size sampling is a WIP (see the sketch below)
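  A minimal sketch of the static behavior, assuming a FlexiViT tag such as `flexivit_base.1200ep_in1k` (the exact tag name is an assumption and may shift as the multi-weight work settles):

  ```python
  import timm

  # Tag name is an assumption; patch size is resized once, when the model
  # is built, not per-batch
  model = timm.create_model('flexivit_base.1200ep_in1k', pretrained=True)
  print(model.patch_embed.patch_size)  # static after creation
  ```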
- Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
- More model pretrained tags and adjustments; some model names changed (deprecation translations are in progress; consider the main branch a DEV branch right now, use 0.6.x for stable use)
- More ImageNet-12k (subset of 22k) pretrain models popping up:
  - `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
  - `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
  - `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
  - `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
### Dec 8, 2022
- Add 'EVA l' to `vision_transformer.py`: MAE-style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
| model | top1 | param_count | gmac | macts | hub |
|:------|-----:|------------:|-----:|------:|:----|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | link |
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | link |
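A short inference sketch against one of the EVA weights above; `example.jpg` is a placeholder path, and the eval transform is derived from the weights' own pretrained config:

```python
import timm
import torch
from timm.data import resolve_data_config, create_transform
from PIL import Image

model = timm.create_model('eva_large_patch14_336.in22k_ft_in1k', pretrained=True)
model.eval()

# Build the eval transform from the weights' pretrained data config
transform = create_transform(**resolve_data_config({}, model=model))

img = Image.open('example.jpg').convert('RGB')  # placeholder image path
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))
top5_prob, top5_idx = logits.softmax(dim=-1).topk(5)
```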
### Dec 6, 2022
- Add 'EVA g', BEiT-style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain, to `beit.py`.
| model | top1 | param_count | gmac | macts | hub |
|:------|-----:|------------:|-----:|------:|:----|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | link |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | link |
### Dec 5, 2022
- Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
- vision_transformer, maxvit, convnext are the first three model implementations with support
- model names are changing with this (previous `_21k`, etc. model functions will merge); still sorting out deprecation handling (see the discovery sketch below)
- bugs are likely, but I need feedback so please try it out
- if stability is needed, please use 0.6.x pypi releases or clone from the 0.6.x branch
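  Since names and tags are still shifting, one way to see what's currently available is to query programmatically; a small sketch (the wildcard pattern is just an example):

  ```python
  import timm

  # With multi-weight support, pretrained weights list in 'arch.tag' form
  print(timm.list_models('convnext_*', pretrained=True)[:5])
  ```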
- Support for PyTorch 2.0 compile added in train/validate/inference/benchmark scripts; use the `--torchcompile` argument (sketch below)
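  Outside the bundled scripts, the same compile path can be applied directly; a sketch of roughly what `--torchcompile` enables (requires PyTorch 2.0+):

  ```python
  import timm
  import torch

  model = timm.create_model('convnext_nano.in12k_ft_in1k', pretrained=True)
  model = torch.compile(model)  # roughly what --torchcompile does in the scripts
  ```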
- Inference script allows more control over output: select top-k for class index + probability output as JSON, CSV, or parquet
- Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and the original OpenAI CLIP models
| model | top1 | param_count | gmac | macts | hub |
|:------|-----:|------------:|-----:|------:|:----|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | link |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | link |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | link |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | link |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | link |
- Port of MaxViT TensorFlow weights from the official impl at https://github.com/google-research/maxvit
- There were larger-than-expected drops for the upscaled 384/512 in21k fine-tune weights; a detail is possibly missing, but the 21k FT weights did seem sensitive to small preprocessing differences (see the preprocessing sketch after the table below)
| model | top1 | param_count | gmac | macts | hub |
|:------|-----:|------------:|-----:|------:|:----|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | link |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | link |
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | link |
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | link |
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | link |
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | link |
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | link |
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | link |
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | link |
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | link |
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | link |
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | link |
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | link |
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | link |
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | link |
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | link |
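Given the preprocessing sensitivity noted above, it can help to inspect the exact eval preprocessing each ported weight expects; a minimal sketch using the `timm.data` helpers:

```python
import timm
from timm.data import resolve_data_config

# Compare the eval preprocessing expected by two of the ported weights;
# pretrained weights aren't needed just to read the config
for name in ('maxvit_base_tf_224.in1k', 'maxvit_base_tf_384.in21k_ft_in1k'):
    model = timm.create_model(name)
    cfg = resolve_data_config({}, model=model)
    print(name, cfg['input_size'], cfg['interpolation'], cfg['crop_pct'])
```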
### Oct 15, 2022
- Train and validation script enhancements
- Non-GPU (ie CPU) device support
- SLURM compatibility for train script
- HF datasets support (via ReaderHfds)
- TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
- `in_chans != 3` support for scripts / loader (see the sketch after this list)
- Adan optimizer
- Can enable per-step LR scheduling via args
- Dataset 'parsers' renamed to 'readers', more descriptive of purpose
- AMP args changed: APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
- main branch switched to 0.7.x version; 0.6.x forked for stable releases with weight-only additions
- master -> main branch rename
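As referenced in the `in_chans` bullet above, a minimal sketch; when `in_chans != 3` and `pretrained=True`, timm adapts the pretrained stem conv weights to the new channel count:

```python
import timm
import torch

# Single-channel (e.g. grayscale) input; first-conv weights are adapted
model = timm.create_model('resnet50', pretrained=True, in_chans=1)
out = model(torch.randn(1, 1, 224, 224))
```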