DropPath implementation #2119
-
Hey @rwightman, continuing our discussion in #2118: the paper says this is done "independently for each sample". As for the second question, (as far as I understand) equation 5 says this scaling is only applied at inference/testing time, not during training (equation 2 is used for the training forward pass), and even in equation 5 it's a multiplication by p_l, not a division. Can you please help me understand this part?

pytorch-image-models/timm/layers/drop.py Line 166 in 492947d
-
@IsmaelElsharkawi better to divide at train time so the next layer gets consistent activation stats than muck around at test time :)
And note, you can see a note in a TF impl of this that accompanied the original EfficientNet code; they called it drop connect (which conflicted with another paper's name)
https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/utils.py#L276-L291
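For context, here is a minimal sketch of the train-time-scaling approach being described (assumed shape handling; see `timm/layers/drop.py` for the actual implementation). One Bernoulli draw is made per sample, and the kept paths are divided by `keep_prob` during training so the expected output matches the input, which is why no rescaling is needed at inference:

```python
import torch

def drop_path(x: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
    """Stochastic depth: randomly zero entire residual paths, per sample."""
    if drop_prob == 0.0 or not training:
        # Inference (or no drop): identity, no multiplication by keep_prob needed
        # because the scaling was already folded in at train time.
        return x
    keep_prob = 1.0 - drop_prob
    # One Bernoulli sample per batch element, broadcast over all other dims.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(shape).bernoulli_(keep_prob)
    # Dividing by keep_prob here keeps E[output] == x, so downstream layers
    # see consistent activation statistics between train and eval.
    return x * mask / keep_prob
```

With `drop_prob=0.5`, surviving samples are scaled by 2 and dropped samples are zeroed, so the batch mean stays roughly unchanged in expectation.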