
any way to generate music longer than 47 seconds? #154

Open
lszhou0126 opened this issue Oct 9, 2024 · 9 comments

@lszhou0126

As the title asks: is there any way to generate music longer than 47 seconds? Can continuation methods be used?

@kenkalang

Just change `sample_size` in the model_config to a value corresponding to more than 47 s and you should be able to do it.
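
For concreteness, a minimal sketch of that change (assuming the stable-audio-tools `model_config.json` layout with top-level `sample_rate` and `sample_size` keys, and an autoencoder downsampling ratio of 2048 for the released model; verify both against your checkpoint):

```python
import json

CONFIG_PATH = "model_config.json"   # example path
TARGET_SECONDS = 95                 # desired length instead of the default ~47 s
DOWNSAMPLE_RATIO = 2048             # assumed autoencoder hop; check your config

with open(CONFIG_PATH) as f:
    model_config = json.load(f)

sample_rate = model_config["sample_rate"]            # 44100 in the released model
print(model_config["sample_size"] / sample_rate)     # ~47.55 s by default

# Round the new sample_size to a multiple of the autoencoder ratio so the
# latent sequence length stays an integer.
new_size = (TARGET_SECONDS * sample_rate) // DOWNSAMPLE_RATIO * DOWNSAMPLE_RATIO
model_config["sample_size"] = new_size

with open(CONFIG_PATH, "w") as f:
    json.dump(model_config, f, indent=4)
```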

@lszhou0126
Author

> Just change `sample_size` in the model_config to a value corresponding to more than 47 s and you should be able to do it.

@kenkalang Thank you for your response. The length has indeed increased, but the audio quality has deteriorated, and the `seconds_start` and `seconds_total` conditioning is not functioning as expected. The pre-trained model was trained with a 47-second window, so a direct modification like this might not be appropriate on its own.
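
For reference, this is how the timing conditioning is usually passed at inference time, following the stable-audio-open model card example with `generate_diffusion_cond` (the checkpoint name, sampler settings, prompt, and the 95-second target are assumptions). Note that a `seconds_total` beyond the 47 s training window falls outside what the timing embedding saw during training, which may be why it seems to misbehave:

```python
import torch
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Following the stable-audio-open model card; swap in your own checkpoint.
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
model = model.to(device)

sample_rate = model_config["sample_rate"]
seconds_total = 95  # only meaningful if sample_size is raised to match

conditioning = [{
    "prompt": "ambient piano, slow build",   # example prompt
    "seconds_start": 0,
    "seconds_total": seconds_total,
}]

audio = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=seconds_total * sample_rate,
    sampler_type="dpmpp-3m-sde",
    device=device,
)
```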

@kenkalang

Yeah, you should also fine-tune the pre-trained model on your dataset if you want it to have better quality.

@lszhou0126
Author

lszhou0126 commented Oct 17, 2024

> Yeah, you should also fine-tune the pre-trained model on your dataset if you want it to have better quality.

@kenkalang If fine-tuning is performed, does the autoencoder part of the network also require fine-tuning, in addition to the DiT?

@NZqian

NZqian commented Oct 17, 2024

Since the autoencoder is fully convolutional, I don't think it needs any fine-tuning.

@kenkalang

> Since the autoencoder is fully convolutional, I don't think it needs any fine-tuning.

Yeah, just tune the DiT; that's where most of the impact is.
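
If you end up writing your own training loop rather than using the repo's training script, the idea is simply to keep the autoencoder frozen and update only the DiT. A minimal PyTorch sketch (the `pretransform` / `model` attribute names are assumptions based on stable-audio-tools' diffusion wrapper; adjust them for your checkpoint):

```python
import torch

def freeze_autoencoder(wrapper: torch.nn.Module) -> None:
    """Freeze the convolutional autoencoder and leave the DiT trainable.

    Assumes the diffusion wrapper exposes the autoencoder as `.pretransform`
    and the transformer as `.model`; these names are assumptions.
    """
    for p in wrapper.pretransform.parameters():
        p.requires_grad = False
    for p in wrapper.model.parameters():
        p.requires_grad = True

# Build the optimizer only over the parameters that remain trainable, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in wrapper.parameters() if p.requires_grad), lr=1e-5
# )
```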

@zaptrem

zaptrem commented Oct 19, 2024

Look up the MultiDiffusion paper; it should work fine with this model to generate arbitrary-length music.

@lszhou0126
Author

> Look up the MultiDiffusion paper; it should work fine with this model to generate arbitrary-length music.

@zaptrem Are you referring to "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" when you mention MultiDiffusion? Are there any applications of it to music?

@zaptrem

zaptrem commented Oct 21, 2024

>> Look up the MultiDiffusion paper; it should work fine with this model to generate arbitrary-length music.
>
> @zaptrem Are you referring to "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" when you mention MultiDiffusion? Are there any applications of it to music?

Pages 49 and 61 of the Movie Gen paper; they use it for their audio accompaniment model: https://ai.meta.com/static-resource/movie-gen-research-paper
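
For anyone landing here, a toy sketch of the MultiDiffusion idea applied to a 1-D latent sequence: denoise overlapping windows independently at each step and average the predictions where they overlap (`denoise_window` is a hypothetical single-window denoiser, not part of stable-audio-tools):

```python
import torch

def multidiffusion_step(latent, denoise_window, window=1024, hop=512):
    """One MultiDiffusion-style update over an arbitrarily long latent.

    `latent` is (batch, channels, length); `denoise_window` is a hypothetical
    function that runs one denoising step on a window-sized chunk. Overlapping
    windows are denoised independently and averaged where they overlap, which
    is the core MultiDiffusion trick. Assumes length >= window.
    """
    out = torch.zeros_like(latent)
    weight = torch.zeros_like(latent)
    length = latent.shape[-1]
    starts = list(range(0, length - window + 1, hop))
    if starts[-1] + window < length:      # make sure the tail is covered
        starts.append(length - window)
    for s in starts:
        chunk = latent[..., s:s + window]
        out[..., s:s + window] += denoise_window(chunk)
        weight[..., s:s + window] += 1.0
    return out / weight
```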
