NaN value in loss #8

wusize · 2024-06-28T08:34:24Z

Hi, thanks for your great work! I am trying to reproduce VQGAN on ImageNet by running this script (stage 1). However, training always collapses between 3k and 6k iterations, with NaN values in the losses. Is there any trick to avoid NaN during training?

hyc9 · 2024-06-28T11:49:25Z

I have encountered this before and found that reducing the number of warm-up steps solves it.

wusize · 2024-06-28T12:57:51Z

Thanks for the feedback! I have an additional question: why is the number of discriminator warm-up steps set to 500000 (--dis_warmup_steps 500000)? With that value, the discriminator loss is increased linearly across the whole training process.
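To make the question concrete, here is a minimal sketch of what a linear warm-up of the discriminator loss weight could look like. It assumes the weight ramps from 0 to a maximum over the warm-up steps; the function name `discriminator_weight` and the `max_weight` parameter are hypothetical and not taken from the repository, whose actual schedule may differ.

```python
def discriminator_weight(step: int, warmup_steps: int, max_weight: float = 1.0) -> float:
    """Hypothetical linear warm-up: ramp the weight from 0 to max_weight.

    This illustrates the behaviour described above; it is not the
    repository's implementation.
    """
    if warmup_steps <= 0:
        # No warm-up requested: use the full weight from the first step.
        return max_weight
    # Fraction of the warm-up completed, clamped to the range [0, 1].
    progress = min(1.0, max(step, 0) / warmup_steps)
    return max_weight * progress


if __name__ == "__main__":
    # With 500000 warm-up steps, the weight is still only half its
    # maximum at step 250000.
    for step in (0, 5_000, 250_000, 500_000):
        print(step, discriminator_weight(step, 500_000))
```

Under this assumption, if training also runs for about 500000 steps, the discriminator only reaches full weight at the very end. A smaller `warmup_steps` value reaches full weight sooner.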

wdrink · 2024-07-06T03:37:07Z

Could you share more details, e.g., what type of data you used? Thanks.
