Replies: 5 comments 6 replies
-
@chjej202 yeah, I've been having some doubts about this model and may end up removing it. I didn't train the weights; they're from the official paper authors. But it seems the model won't train in AMP (it's unstable), so it's not really that practical for a model that's supposed to be 'efficient'... if I can't find a way to improve stability in AMP, I may get rid of it.
-
FYI, it should at least start to converge if you use float32 and hparams similar to ViT models: AdamW or NAdamW, gradient clipping, weight decay 0.05, etc.
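A minimal sketch of those suggested settings in plain PyTorch (AdamW, weight decay 0.05, gradient clipping, float32 throughout). The model here is a stand-in, not the actual EfficientViT; the learning rate and clip norm are illustrative assumptions, not values from the thread:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for EfficientViT (the real one comes from timm).
model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 10))

# Settings suggested above: AdamW, weight decay 0.05.
# lr=1e-3 is an assumed placeholder value.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)

def train_step(x, y, max_grad_norm=1.0):
    """One float32 step (no autocast) with gradient clipping."""
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Gradient clipping, as suggested for stability.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```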
-
This seems to help: dc18cda ... let me know if you have any luck.
-
Here are the training configurations from the official repo: https://github.com/mit-han-lab/efficientvit/tree/master/configs/cls/imagenet |
-
@andravin for the exact recipe reproduction, I'm not sure. They use 'mesa' for their L models; otherwise I think most aspects can be matched here. I did some basic training tests on small data, and training was not stable unless you used float32 or disabled autocast for the attention calculation when using AMP...
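The "disable autocast for the attention calculation" workaround can be sketched like this. Note this is a hypothetical wrapper around `nn.MultiheadAttention`, not the actual timm change (the real fix is in commit dc18cda); it just illustrates forcing one sub-module to float32 under AMP:

```python
import torch

class StableAttention(torch.nn.Module):
    """Hypothetical module: run attention in float32 even under AMP autocast."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # Disable autocast locally so the attention math stays in float32,
        # mirroring the stability workaround described above.
        with torch.autocast(device_type=x.device.type, enabled=False):
            x32 = x.float()
            out, _ = self.attn(x32, x32, x32)
        # Cast back so the rest of the network sees the autocast dtype.
        return out.to(x.dtype)
```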
-
I found that EfficientViT was uploaded yesterday. What is the training recipe for EfficientViT (MIT)?
I tried several hyperparameters but failed to train EfficientViT (MIT).