Best FID score? #8
Comments
@yangsenwxy Hi, could you please provide some visual samples, and also the training config, e.g. how many epochs and whether CLIP/BLIP/ResNet are frozen?
Btw, in our experiment we just use the checkpoint at 50 epochs, because the loss is not correlated with the FID score.
Can I have your WeChat? Or shall I tell you by email? ys810137152@gmail.com
@yangsenwxy Yeah, I am happy to chat. But I prefer to discuss in this thread, because this is a new implementation and we haven't trained a model using this repo. If we can figure out the problem, it may help a lot~
You may follow the config given in https://github.com/Flash-321/ARLDM/blob/main/config.yaml (which is the config for the reported performance). The DDIM scheduler with 50 steps gives a 2-3 higher FID. We suggest using ddim-6-250 or pndm-7.5-50 (the latter is faster and gives a 1-2 higher FID).
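For later readers, here is a minimal sketch of what those sampler settings look like with the Hugging Face diffusers schedulers, assuming ddim-6-250 means DDIM with guidance scale 6.0 and 250 steps and pndm-7.5-50 means PNDM with guidance scale 7.5 and 50 steps (the repo drives this through config.yaml, so this is only an illustration, not its actual sampling code):

```python
from diffusers import StableDiffusionPipeline, DDIMScheduler, PNDMScheduler

# Plain SD v1.5 pipeline, used only to illustrate the scheduler/guidance/steps choice.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# ddim-6-250: slower, reported in this thread to give the better FID
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
img_ddim = pipe("a story frame", guidance_scale=6.0, num_inference_steps=250).images[0]

# pndm-7.5-50: much faster, FID about 1-2 points higher
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)
img_pndm = pipe("a story frame", guidance_scale=7.5, num_inference_steps=50).images[0]
```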
Link: https://pan.baidu.com/s/1s4WU9xdS7Qn_XO-O2kvwaA
here
Copy! So I think it is a training issue, since the generated visual stories are unacceptable.
@yangsenwxy Have you ever tried training the model for 50 epochs with an init_lr of 1e-5?
I will try it
@yangsenwxy Thanks! I have tried using a large lr, but it produces bad performance when training for many epochs. You can also try training the model for 5 epochs with an lr of 1e-4 and see if it works. I tried this setting in my early experiments and it can also generate reasonable results. Feel free to @ me in this thread if there is an update or a further issue!
OK, let me try. Do you mean the smaller the learning rate, the smaller the FID?
@yangsenwxy Not exactly, but a large learning rate does not work well when you train the model for a really long time.
@Flash-321 I used your exact settings for the last 50 epochs and got a FID score of 25.
@yangsenwxy Yeah, and do you also use the default settings during sampling?
Also, may I see your learning rate curve, to find out whether the scheduler works correctly?
OK, wait for me for half an hour.
Sure!
@yangsenwxy Really sorry about a bug in our code. I just checked our dataset implementation and found that during the migration to the PyTorch Lightning code, we forgot to add the normalization for the training data. My sincere apologies for this issue. The FID score should be further improved after fixing this bug.
And this bug only happens in the FlintstonesSV dataset; if you have done experiments on other datasets, the performance was not affected.
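For anyone fixing this locally, a minimal sketch of the kind of normalization that was missing, assuming the usual Stable Diffusion convention that the VAE expects inputs in [-1, 1] (the resolution and transform order here are illustrative, not the repo's exact dataset code):

```python
from torchvision import transforms

# Illustrative training transform: the key point is the final Normalize,
# which maps ToTensor's [0, 1] output into the [-1, 1] range the VAE expects.
train_transform = transforms.Compose([
    transforms.Resize(512),
    transforms.CenterCrop(512),
    transforms.ToTensor(),                                   # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),  # [0, 1] -> [-1, 1]
])
```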
I discovered this bug a long time ago, and it was already corrected during my original training.
Yes
Now, the FID is 25.069521856381982.
Got it! Could you please post 1-2 visual samples in this thread? There are also several differences between the two implementations. For example, in this version we use Stable Diffusion v1.5 instead of v1.4, but I think it doesn't matter. We also use gradient clipping in our training process, and I guess this may be a factor: `trainer = Trainer(gradient_clip_val=1.0)`. You can also check our original internal implementation (https://github.com/Flash-321/ARLDM/tree/a24e2e94332eb86fcc071abb83aaf341006aa622) to find any other differences; we can discuss them in this thread.
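A minimal sketch of how that clipping is turned on in PyTorch Lightning, for readers reproducing this setting (the other Trainer arguments are illustrative placeholders, not the repo's full config):

```python
import pytorch_lightning as pl

# gradient_clip_val clips the global gradient norm to 1.0 each step;
# Lightning clips by norm unless gradient_clip_algorithm is set to "value".
trainer = pl.Trainer(
    max_epochs=50,          # illustrative; see config.yaml for the real values
    gradient_clip_val=1.0,
)
```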
Here are some visual samples.
@yangsenwxy Copy, your experiments help a lot, thanks for that! It looks like a training issue: the val loss should be around 0.11, and the visual samples are neither reasonable nor coherent.
OK
@Flash-321 I have added gradient clipping (clip_norm) as you suggested, but the FID is still about 24: 24.87433417204761.
@yangsenwxy Hi, thanks for your feedback. It seems the conditioning part is broken, since the frames are not coherent with each other. I will check the implementation soon and also ask my mentor to release the ckpt and training log for reference.
Hello. I am also having a difficult time reproducing your best-performing results on Pororo. Can you share some config settings?
@KyonP Hi, our config is the same as https://github.com/xichenpan/ARLDM/blob/main/config.yaml
@xichenpan I am looking forward to your pre-trained weights. BTW, is there any chance you will improve the inference speed of this code? It is very time-consuming.
@KyonP Great! I just asked my mentor, and he told me the release request has been approved by Alibaba; we will provide the pre-trained weights this week!
Hey, I found the same issue. @xichenpan Did you try to save the images first and then evaluate the FID? Or do you only use the FID calculation in the main.py code? Because I found the Inception network in the code seems to be missing .eval(), which would (batch-)normalize the features and give a lower FID.
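For reference, a minimal sketch of the point being made: the Inception feature extractor used for FID should be switched to eval mode so that BatchNorm uses its running statistics instead of per-batch ones (the model construction below is illustrative, not the exact code in main.py):

```python
import torch
from torchvision.models import inception_v3

# Build a pretrained InceptionV3 and use its 2048-d pooled features for FID.
inception = inception_v3(weights="DEFAULT")
inception.fc = torch.nn.Identity()

inception.eval()  # the missing call: keeps BatchNorm/Dropout in inference mode

with torch.no_grad():
    fake_batch = torch.rand(4, 3, 299, 299)   # dummy images, just for shape
    feats = inception(fake_batch)             # (4, 2048) feature matrix
```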
@xichenpan Hi, I am having a hard time finding the released weights, and I am trying to do some follow-up work on your paper. If you haven't uploaded them yet, could you share them via Google Drive or maybe another platform?
@xichenpan Are your checkpoints ready? If possible, would you be willing to share them? Could you also explain how to train ARLDM on a single CUDA device, and how to avoid CUDA out-of-memory errors by freezing CLIP/BLIP/ResNet or with other techniques?
Can somebody who has trained the model share their checkpoints? I get CUDA out of memory when training the model.
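A minimal sketch of the freezing idea asked about above: disabling gradients for the large pretrained encoders removes their gradient and optimizer-state memory (the attribute names are hypothetical; check the repo's model definition for the real ones):

```python
import torch

def freeze(module: torch.nn.Module) -> None:
    """Put a submodule in inference-only mode: no gradients, no optimizer state."""
    module.eval()
    for p in module.parameters():
        p.requires_grad_(False)

# Hypothetical usage inside the model's __init__ (attribute names are made up):
# freeze(self.clip_text_encoder)
# freeze(self.blip_image_encoder)
```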
Hi, thank you so much for your work!
How should I choose the checkpoint with the best FID score? I used your code to reproduce the results, but none of my runs reached the performance of the original paper; my model's FID on FlintstonesSV is 30.