Training Loss Values. #50

Open
mmderakhshani opened this issue Nov 11, 2024 · 10 comments

Comments

@mmderakhshani

Hi, thank you for releasing this GitHub repository. I am trying to reproduce the stage 1 training on ImageNet. Could you please share the W&B log or let me know the initial and final loss values for that stage? I am getting the following loss curve, and it seems the model is not converging. Thanks.

[Screenshot 2024-11-11 at 22 33 42: training loss curve]
@Sierkinhane
Collaborator

Hi, thanks for your interest. Your loss curve looks a bit odd. When I trained the first stage, the loss quickly dropped to around 8 and then converged to around 7 (over 500k steps).
[image: stage-1 training loss curve]
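
(For context on the loss being discussed, here is a minimal sketch of a MaskGIT-style masked-token cross-entropy, assuming that is the stage-1 objective for the image tokens; the function and tensor names below are hypothetical, not from the repo.)

```python
# Minimal sketch of a masked-token cross-entropy loss over a discrete codebook.
# Assumption: stage 1 predicts masked image tokens; names are illustrative only.
import torch
import torch.nn.functional as F

def masked_token_loss(logits, targets, mask):
    """
    logits:  (B, L, V) predicted distribution over the V codebook entries
    targets: (B, L)    ground-truth token ids
    mask:    (B, L)    True where the token was masked (loss is computed there)
    """
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),  # (B*L, V)
        targets.view(-1),                  # (B*L,)
        reduction="none",
    )
    mask = mask.view(-1).float()
    return (per_token * mask).sum() / mask.sum().clamp(min=1)
```

Under this assumption, a near-uniform prediction over an ~8192-entry codebook gives ln(8192) ≈ 9.0, which is consistent with a curve that drops quickly to around 8 and settles near 7.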

@Sierkinhane
Collaborator

What's the resolution? Are you training at 256x256?

@mmderakhshani
Author

I am training at 256x256 resolution. Can you tell me at which training step in stage 1 you start getting meaningful class-conditional samples?

One more thing: can you also tell me what your total batch size and number of training steps are in stage 1?

@Sierkinhane
Collaborator

Here are the generated samples at 10k iterations. The total batch size across t2i, mmu, and language modeling is 1152, and I guess the t2i batch size is around 700.
[image: class-conditional samples at 10k iterations]
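
(A tiny illustration of the batch arithmetic above; the exact split between mmu and language modeling is not stated in the thread, so the numbers below are assumptions.)

```python
# Hypothetical split of the 1152 total batch across the three objectives.
total_batch = 1152
t2i_batch = 700                               # "around 700", per the comment above
mmu_batch = (total_batch - t2i_batch) // 2    # assumed even split of the remainder
lm_batch = total_batch - t2i_batch - mmu_batch
print(t2i_batch, mmu_batch, lm_batch)         # 700 226 226
```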

@mmderakhshani
Author

mmderakhshani commented Nov 12, 2024 via email

@Sierkinhane
Collaborator

Sierkinhane commented Nov 12, 2024

The mmu loss is also used in this stage. Here are samples generated without CFG (classifier-free guidance).
[image: samples generated without CFG]
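
(To clarify what "without CFG" means, here is a minimal, generic sketch of classifier-free guidance at sampling time; the function signature is illustrative and not the repo's API.)

```python
# Generic classifier-free guidance: blend conditional and unconditional logits.
def guided_logits(model, tokens, cond, null_cond, cfg_scale=2.0):
    logits_cond = model(tokens, cond)         # conditioned on the class / prompt
    logits_uncond = model(tokens, null_cond)  # conditioned on a null / empty label
    # cfg_scale = 0 reduces to the plain conditional prediction, i.e. "without CFG".
    return (1.0 + cfg_scale) * logits_cond - cfg_scale * logits_uncond
```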

@mmderakhshani
Author

Could you please clarify whether these samples were generated after the training was completed? Additionally, could you let me know how you adjusted the learning rate? Did you use a method similar to YOLO-style initialization, or did you conduct several experiments to determine the best approach? I want to increase the batch size and train a larger model, so I would appreciate some hints about the initialization of the learning rate.

@Sierkinhane
Collaborator

"These samples were generated at 499,000 iterations. For the learning rate and adjustments, we just follow Muse[1] and you can refer to the configs, and we didn't put much effort into these configurations.

[1] Muse: Text-to-Image Generation via Masked Generative Transformers
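
(A minimal sketch of a warmup-plus-cosine learning-rate schedule, a common choice in Muse-style training setups; the base LR, warmup length, and minimum LR below are placeholders, and the actual values should be taken from the repo's config files.)

```python
import math

def lr_at_step(step, base_lr=1e-4, warmup_steps=5_000,
               total_steps=500_000, min_lr=1e-5):
    """Linear warmup followed by cosine decay; all values are placeholders."""
    if step < warmup_steps:
        return base_lr * step / max(warmup_steps, 1)
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```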

@Sierkinhane
Collaborator

BTW, feel free to star our project :)

@mmderakhshani
Author

Happy to star! :D
