We adopt the same dropping strategy ("we randomly drop image only (zero image) for 5% of samples, drop text only (empty string) for 5% of samples, drop both of them for 5% of samples for dual-cross-attention.") in all training phases.
"We did not drop image conditional latent in VDG" means the concatenated frame latent with noise will not be dropped.
Originally posted by @Doubiiu in #8 (comment)
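For reference, here is a minimal sketch of the dropping scheme quoted above. It assumes that one uniform draw per sample selects one of four mutually exclusive cases: image only (5%), text only (5%), both (5%), or none (85%). A dropped image is replaced with zeros and a dropped text with an empty string. The concatenated frame latent is always left intact. `Sample`, `choose_drop`, `apply_drop`, and `drop_batch` are hypothetical names, and plain Python lists stand in for tensors. This is an illustration under those assumptions, not the repository's actual training code.

```python
import random
from dataclasses import dataclass
from typing import List

# Per-case probabilities from the quoted comment (mutually exclusive cases).
P_IMAGE_ONLY = 0.05
P_TEXT_ONLY = 0.05
P_BOTH = 0.05


@dataclass
class Sample:
    text: str                 # prompt fed to the text cross-attention
    image_emb: List[float]    # image input to the image cross-attention (hypothetical representation)
    cond_latent: List[float]  # concatenated frame latent (hypothetical representation)


def choose_drop(u: float) -> str:
    """Map a uniform draw u in [0, 1) to one of the four dropping cases."""
    if u < P_IMAGE_ONLY:
        return "image"
    if u < P_IMAGE_ONLY + P_TEXT_ONLY:
        return "text"
    if u < P_IMAGE_ONLY + P_TEXT_ONLY + P_BOTH:
        return "both"
    return "none"


def apply_drop(sample: Sample, case: str) -> Sample:
    """Return a copy of the sample with the selected conditions dropped.

    The concatenated frame latent is copied unchanged in every case,
    following "We did not drop image conditional latent in VDG".
    """
    drop_image = case in ("image", "both")
    drop_text = case in ("text", "both")
    return Sample(
        text="" if drop_text else sample.text,                      # empty string
        image_emb=[0.0] * len(sample.image_emb) if drop_image       # zero image
        else list(sample.image_emb),
        cond_latent=list(sample.cond_latent),                       # never dropped
    )


def drop_batch(batch: List[Sample], rng: random.Random) -> List[Sample]:
    """Draw one case per sample independently and apply it."""
    return [apply_drop(s, choose_drop(rng.random())) for s in batch]
```

With this layout, roughly 15% of samples lose at least one cross-attention condition, which supplies the unconditional and partially conditional branches used by classifier-free guidance. The concatenated latent, however, is present in every training sample.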
Have you ever tried randomly dropping the image conditional latent (the concatenated latent) during training? I'm curious why you think this strategy is unnecessary.