Flux.1.dev.fp8 CKPT training: avr_loss keeps showing 'nan' #76
Comments
I haven't seen that happen myself, but I'd recommend updating to torch 2.4.1; it's the version kohya recommends, and it has solved a lot of memory and speed issues for people who have updated.
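For what it's worth, a quick way to confirm which torch build the training process actually picked up (an embedded/portable Python can lag behind a system install) is a check along these lines; it only uses standard PyTorch calls, nothing project-specific:

```python
# Print the torch build the current interpreter is using.
# After upgrading, this should report 2.4.1 rather than the
# 2.3.1+cu121 shown in the ComfyUI startup log later in this thread.
import torch

print(torch.__version__)          # e.g. "2.4.1+cu121"
print(torch.version.cuda)         # CUDA toolkit the wheel was built against
print(torch.cuda.is_available())  # True if the GPU is visible to this torch
```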
OK, I will try to update. Thank you so much.
Hi, I upgraded PyTorch to 2.4.1, but the loss still stays "nan".
Which optimizer are you using? Or maybe attach your workflow here.
Most likely it's a learning-rate problem. Also, how did you get it running this slowly? If the batch size were too large, it should have hit OOM long before reaching this speed.
The speed is probably down to this being a laptop rather than a desktop GPU. The nan is related to your batch size, alpha, and lr.
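If it helps, here is a minimal sketch (not taken from this thread's workflow; `model`, `batch`, and `compute_loss` are placeholders) of a NaN guard around a generic PyTorch training step, so the run stops at the first bad step instead of logging avr_loss=nan for the rest of training:

```python
import torch

def training_step(model, batch, optimizer, compute_loss):
    """One guarded optimizer step; aborts as soon as the loss turns NaN/Inf."""
    optimizer.zero_grad()
    loss = compute_loss(model, batch)

    # Typical culprits for a NaN loss here: learning rate too high,
    # alpha too large relative to the LoRA rank, or low-precision overflow.
    if torch.isnan(loss) or torch.isinf(loss):
        raise RuntimeError(f"loss became {loss.item()} -- lower the lr or alpha and retry")

    loss.backward()
    # Gradient clipping keeps one overflowing step from corrupting the weights.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```

`torch.autograd.set_detect_anomaly(True)` can also point to the first operation that produces the NaN, at the cost of much slower steps.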
I was trying to train a LoRA using the flux.1.dev.fp8 CKPT, and the log keeps reporting that avr_loss is nan. I don't know where my settings are wrong.
The system & version info:
[START] Security scan
[DONE] Security scan
ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2024-09-28 16:52:30.478163
** Platform: Windows
** Python version: 3.11.8 (tags/v3.11.8:db85d51, Feb 6 2024, 22:03:32) [MSC v.1937 64 bit (AMD64)]
** Python executable: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\python_embeded\python.exe
** ComfyUI Path: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\ComfyUI
** Log path: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\comfyui.log
Prestartup times for custom nodes:
0.0 seconds: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\ComfyUI\custom_nodes\rgthree-comfy
0.0 seconds: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\ComfyUI\custom_nodes\ComfyUI-Easy-Use
4.2 seconds: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\ComfyUI\custom_nodes\ComfyUI-Manager
Total VRAM 6144 MB, total RAM 32461 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 Laptop GPU : cudaMallocAsync
Using pytorch cross attention