
any way to generate music longer than 47 seconds? #154

Open
lszhou0126 opened this issue Oct 9, 2024 · 9 comments

@lszhou0126

As the title asks: is there any way to generate music longer than 47 seconds? Can continuation methods be used?

@kenkalang

Just change `sample_size` in the model_config to a value corresponding to more than 47 s and you should be able to do it.
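
For concreteness, a minimal sketch of that change (assuming the stable-audio-tools `model_config.json` layout with top-level `sample_rate` and `sample_size` keys, and an autoencoder downsampling ratio of 2048 for the released model; verify both against your checkpoint):

```python
import json

CONFIG_PATH = "model_config.json"   # example path
TARGET_SECONDS = 95                 # desired length instead of the default ~47 s
DOWNSAMPLE_RATIO = 2048             # assumed autoencoder hop; check your config

with open(CONFIG_PATH) as f:
    model_config = json.load(f)

sample_rate = model_config["sample_rate"]            # 44100 in the released model
print(model_config["sample_size"] / sample_rate)     # ~47.55 s by default

# Round the new sample_size to a multiple of the autoencoder ratio so the
# latent sequence length stays an integer.
new_size = (TARGET_SECONDS * sample_rate) // DOWNSAMPLE_RATIO * DOWNSAMPLE_RATIO
model_config["sample_size"] = new_size

with open(CONFIG_PATH, "w") as f:
    json.dump(model_config, f, indent=4)
```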

@lszhou0126
Author

> Just change `sample_size` in the model_config to a value corresponding to more than 47 s and you should be able to do it.

@kenkalang Thank you for your response. The length has indeed increased, but the audio quality has deteriorated, and the `seconds_start` and `seconds_total` conditioning is not functioning as expected. The pre-trained model was trained with a 47-second window, so a direct modification like this might not be appropriate on its own.
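
For reference, this is how the timing conditioning is usually passed at inference time, following the stable-audio-open model card example with `generate_diffusion_cond` (the checkpoint name, sampler settings, prompt, and the 95-second target are assumptions). Note that a `seconds_total` beyond the 47 s training window falls outside what the timing embedding saw during training, which may be why it seems to misbehave:

```python
import torch
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Following the stable-audio-open model card; swap in your own checkpoint.
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
model = model.to(device)

sample_rate = model_config["sample_rate"]
seconds_total = 95  # only meaningful if sample_size is raised to match

conditioning = [{
    "prompt": "ambient piano, slow build",   # example prompt
    "seconds_start": 0,
    "seconds_total": seconds_total,
}]

audio = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=seconds_total * sample_rate,
    sampler_type="dpmpp-3m-sde",
    device=device,
)
```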

@kenkalang

Yeah, you should also fine-tune the pre-trained model on your dataset if you want it to have better quality.

@lszhou0126
Author

lszhou0126 commented Oct 17, 2024

> Yeah, you should also fine-tune the pre-trained model on your dataset if you want it to have better quality.

@kenkalang If fine-tuning is performed, does the autoencoder part of the network also require fine-tuning, in addition to the DiT?

@NZqian

NZqian commented Oct 17, 2024

Since the autoencoder is fully convolutional, I don't think it needs any fine-tuning.

@kenkalang

> Since the autoencoder is fully convolutional, I don't think it needs any fine-tuning.

Yeah, just tune the DiT; that's where most of the impact is.
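
If you end up writing your own training loop rather than using the repo's training script, the idea is simply to keep the autoencoder frozen and update only the DiT. A minimal PyTorch sketch (the `pretransform` / `model` attribute names are assumptions based on stable-audio-tools' diffusion wrapper; adjust them for your checkpoint):

```python
import torch

def freeze_autoencoder(wrapper: torch.nn.Module) -> None:
    """Freeze the convolutional autoencoder and leave the DiT trainable.

    Assumes the diffusion wrapper exposes the autoencoder as `.pretransform`
    and the transformer as `.model`; these names are assumptions.
    """
    for p in wrapper.pretransform.parameters():
        p.requires_grad = False
    for p in wrapper.model.parameters():
        p.requires_grad = True

# Build the optimizer only over the parameters that remain trainable, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in wrapper.parameters() if p.requires_grad), lr=1e-5
# )
```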

@zaptrem

zaptrem commented Oct 19, 2024

Look up the MultiDiffusion paper; it should work fine with this model to generate arbitrary-length music.

@lszhou0126
Author

> Look up the MultiDiffusion paper; it should work fine with this model to generate arbitrary-length music.

@zaptrem Are you referring to "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" when you mention MultiDiffusion? Are there any applications of it to music?

@zaptrem

zaptrem commented Oct 21, 2024

>> Look up the MultiDiffusion paper; it should work fine with this model to generate arbitrary-length music.
>
> @zaptrem Are you referring to "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" when you mention MultiDiffusion? Are there any applications of it to music?

Pages 49 and 61 of the Movie Gen paper; they use it for their audio accompaniment model: https://ai.meta.com/static-resource/movie-gen-research-paper
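
For anyone landing here, a toy sketch of the MultiDiffusion idea applied to a 1-D latent sequence: denoise overlapping windows independently at each step and average the predictions where they overlap (`denoise_window` is a hypothetical single-window denoiser, not part of stable-audio-tools):

```python
import torch

def multidiffusion_step(latent, denoise_window, window=1024, hop=512):
    """One MultiDiffusion-style update over an arbitrarily long latent.

    `latent` is (batch, channels, length); `denoise_window` is a hypothetical
    function that runs one denoising step on a window-sized chunk. Overlapping
    windows are denoised independently and averaged where they overlap, which
    is the core MultiDiffusion trick. Assumes length >= window.
    """
    out = torch.zeros_like(latent)
    weight = torch.zeros_like(latent)
    length = latent.shape[-1]
    starts = list(range(0, length - window + 1, hop))
    if starts[-1] + window < length:      # make sure the tail is covered
        starts.append(length - window)
    for s in starts:
        chunk = latent[..., s:s + window]
        out[..., s:s + window] += denoise_window(chunk)
        weight[..., s:s + window] += 1.0
    return out / weight
```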
