loss nan in VAE training #144

shwj114514 · 2024-09-19T10:51:28Z

Thank you for your excellent work and the well-designed open-source code.

When I use your training code to train from scratch, I frequently encounter a situation where the loss becomes NaN after a certain number of training steps. Is this behavior expected?

This issue occurs when training on both mono and stereo 44,100 Hz audio files. I have to restart training several times before I get a run where the loss stays stable.

I am using the stable audio 2.0 config.

apply74 · 2024-09-23T01:48:02Z

I also encountered this problem. When I increased the model parameters, the training was unstable. Is it going to be solved?

shwj114514 · 2024-09-23T02:43:27Z

> I also encountered this problem. When I increased the model parameters, the training was unstable. Is it going to be solved?

I solved this problem by reducing the learning rates of both the generator and discriminator to 1/10 of their original values, and the training became stable.
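As a minimal sketch of the fix described above, the helper below multiplies every `lr` entry in a nested config dictionary by 0.1. The function name `scale_learning_rates` and the config fragment, including its key names and learning-rate values, are hypothetical and only illustrate the idea; check them against your actual training config before reusing this.

```python
import copy


def scale_learning_rates(config, factor=0.1):
    """Return a deep copy of `config` with every numeric "lr" value multiplied by `factor`."""
    scaled = copy.deepcopy(config)

    def _walk(node):
        # Recurse through nested dicts and lists, rescaling any "lr" entry found.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "lr" and isinstance(value, (int, float)):
                    node[key] = value * factor
                else:
                    _walk(value)
        elif isinstance(node, list):
            for item in node:
                _walk(item)

    _walk(scaled)
    return scaled


# Hypothetical config fragment: key names and values are illustrative only.
example_config = {
    "training": {
        "optimizer_configs": {
            "autoencoder": {"optimizer": {"type": "AdamW", "config": {"lr": 1e-4}}},
            "discriminator": {"optimizer": {"type": "AdamW", "config": {"lr": 2e-4}}},
        }
    }
}

# Both the generator (autoencoder) and discriminator rates drop to 1/10.
scaled_config = scale_learning_rates(example_config)
```

The original dictionary is left untouched, so the two settings can be compared side by side.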

apply74 · 2024-09-23T02:45:54Z

I also tried reducing the learning rate. Although the training is stable, the reconstruction result will be very poor.

fletcherist · 2024-09-26T08:41:59Z

the same thing

apply74 · 2024-09-26T08:44:52Z

> the same thing

I solved the problem by increasing the batch_size from 1 to 5.

fletcherist · 2024-09-26T09:00:07Z

> > the same thing
>
> I solved the problem by increasing the batch_size from 1 to 5.

@apply74
Oh really? Let me try it, though I don't think this batch size will fit on my GPU.
I'll report back here after I try. Thanks for your help, I really appreciate it.

fletcherist · 2024-09-27T13:31:05Z

> reducing the learning rates of both the generator and discriminator to 1/10 of their original values

this works

nateraw · 2024-09-28T23:56:22Z

You have to tune the learning rates. Higher batch size helps keep things stable.

Another tip: if you can't fit a large enough batch size, you can reduce the sample size, which should free up enough memory to bump the batch size back up.

Hope this helps ❤️
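The trade-off between sample size and batch size can be sketched with some rough arithmetic. This assumes that activation memory scales approximately linearly with `batch_size * sample_size` and ignores fixed costs such as model weights and optimizer state, so treat the result as a starting point rather than a guarantee. The function name and the numbers are hypothetical.

```python
def batch_size_for_budget(samples_per_step, sample_size):
    """Largest batch size that keeps batch_size * sample_size within `samples_per_step`.

    `samples_per_step` is the total number of audio samples per training step
    that is known to fit in GPU memory (an empirical, per-machine figure).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return max(1, samples_per_step // sample_size)


# Hypothetical starting point: batch size 1 with 65536-sample clips fits in memory.
budget = 1 * 65536

# Halving the clip length roughly doubles the batch size that fits.
for sample_size in (65536, 32768, 16384):
    print(sample_size, batch_size_for_budget(budget, sample_size))
```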

stg1205 · 2024-10-25T09:05:02Z

Also, I noticed that running VAD (voice activity detection) to remove silent parts helps.
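A minimal sketch of silence removal, assuming a simple energy threshold as a stand-in for a real VAD model: it drops fixed-length frames whose RMS level falls below a cutoff. The function name, frame length, and the -60 dB threshold are illustrative assumptions, and the input is taken to be a mono signal given as a list of floats in [-1, 1]. For stereo, the same frame indices would need to be dropped from both channels.

```python
import math


def drop_silent_frames(samples, frame_length=2048, threshold_db=-60.0):
    """Keep only frames whose RMS level (dB relative to full scale) exceeds `threshold_db`.

    A trailing partial frame shorter than `frame_length` is discarded.
    """
    kept = []
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = samples[start:start + frame_length]
        rms = math.sqrt(sum(x * x for x in frame) / frame_length)
        # Clamp the RMS so that digital silence does not produce log10(0).
        level_db = 20.0 * math.log10(max(rms, 1e-10))
        if level_db > threshold_db:
            kept.extend(frame)
    return kept


# Illustrative clip: silence, a 440 Hz tone at 44.1 kHz, then silence again.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(2048)]
clip = [0.0] * 2048 + tone + [0.0] * 2048
trimmed = drop_silent_frames(clip)
```

Here only the middle frame, which contains the tone, survives the filter.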

github-staff deleted a comment from Kami-prog Sep 25, 2024
