torch.cuda.OutOfMemoryError #91
Comments
CPU: Intel 8700K
Yes, I am also getting this error now. It's been a while since I trained a LoRA with Kijai's trainer, but the last two LoRAs I trained with it ran perfectly well. Now the workflow runs out of memory before even one step of the first Flux Train Loop node can complete. This is my error:
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\execution.py", line 323, in execute
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\execution.py", line 198, in get_output_data
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\execution.py", line 169, in _map_node_over_list
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\execution.py", line 158, in process_inputs
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\nodes.py", line 798, in train
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\train_network.py", line 1173, in training_loop
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\flux_train_network_comfy.py", line 427, in get_noise_pred_and_target
File "H:\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
File "H:\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
File "H:\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\accelerate\utils\operations.py", line 820, in forward
File "H:\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\accelerate\utils\operations.py", line 808, in __call__
File "H:\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\amp\autocast_mode.py", line 44, in decorate_autocast
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\library\flux_models.py", line 1255, in forward
File "H:\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
File "H:\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\library\flux_models.py", line 835, in forward
File "H:\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_compile.py", line 32, in inner
File "H:\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_dynamo\eval_frame.py", line 632, in _fn
File "H:\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils\checkpoint.py", line 496, in checkpoint
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\library\flux_models.py", line 826, in _forward
File "H:\ComfyUI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\library\flux_models.py", line 445, in attention
2024-11-04 12:05:53,697 - root - INFO - Total VRAM 12282 MB, total RAM 65463 MB
2024-11-04 12:06:15,400 - root - INFO - To see the GUI go to: http://127.0.0.1:8188
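For anyone comparing setups, it may help to post the free and total VRAM alongside the trace. The following is only a rough sketch, not part of the trainer: `vram_report` is a hypothetical helper name, and it assumes it is run with the same Python that ComfyUI uses (the `python_embeded` interpreter in the paths above). It returns nothing useful if torch or a CUDA device is not available.

```python
def vram_report():
    """Return (free_mb, total_mb) for the current CUDA device, or None.

    Hypothetical helper for illustration; not part of ComfyUI-FluxTrainer.
    """
    try:
        import torch  # imported lazily so the script still runs without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info() returns (free_bytes, total_bytes) for the current device
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    mb = 1024 * 1024
    return free_bytes // mb, total_bytes // mb


if __name__ == "__main__":
    report = vram_report()
    if report is None:
        print("torch or a CUDA device is not available in this interpreter")
    else:
        free_mb, total_mb = report
        print(f"Free VRAM: {free_mb} MB of {total_mb} MB")
```

Running it right before queueing the workflow shows how much memory other programs already hold on the card.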
Update: I upgraded my PyTorch version from 2.5.0+cu124 to 2.5.1+cu124. Now my workflow no longer throws an immediate out-of-memory error, but it sits on the Flux Train Loop node for a minute or so before finally throwing the same OOM error. (On a second attempt, I never actually received an OOM error. It just sat on the node doing nothing.) Is there any way to downgrade my version of PyTorch back to 2.4.0? I tried following the steps on the pytorch.org website, but my PyTorch version never actually changed. 🤷🏼
I was able to downgrade to PyTorch version 2.4.1+cu124 and the workflow is now running as it used to. Is there any way we can get this to run on the newer versions of PyTorch? The newest ComfyUI installs come with 2.5.1 built in.
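For others on the portable build, one likely reason a downgrade "never actually changes" is that pip was run with a system Python instead of the embedded one. Below is a minimal sketch of building the install command against the embedded interpreter. The function name `build_downgrade_command` is made up for illustration, the portable root is taken from the paths in the trace above, and the wheel index URL is the standard PyTorch cu124 index. This is an assumption about a typical setup, not a confirmed record of the steps used here.

```python
from pathlib import PureWindowsPath


def build_downgrade_command(portable_root, torch_version="2.4.1", cuda_tag="cu124"):
    """Build a pip command that targets ComfyUI's embedded Python.

    Hypothetical helper: it only builds the argument list and runs nothing.
    """
    # The portable build ships its own interpreter under python_embeded
    python_exe = PureWindowsPath(portable_root) / "python_embeded" / "python.exe"
    return [
        str(python_exe), "-m", "pip", "install",
        f"torch=={torch_version}",
        "--index-url", f"https://download.pytorch.org/whl/{cuda_tag}",
    ]


if __name__ == "__main__":
    cmd = build_downgrade_command(r"H:\ComfyUI\ComfyUI_windows_portable")
    # Print the command so it can be pasted into a terminal;
    # subprocess.run(cmd, check=True) would execute it directly.
    print(" ".join(cmd))
```

torchvision and torchaudio usually need releases that match the torch version, so they may have to be pinned in the same command.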
Seems to be some kind of memory leak when using attn_mask. I confirmed it doesn't work with 2.5.0, updating to torch nightly 2.6.0 (currently testing with Edit: downgraded to 2.5.1 now and it's also fine for me.
This is interesting. I'm using 2.5.1 on a separate install and it's not working for me. Downgrading to 2.4.0 works fine, though.
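To keep track of what has been reported so far, here is a small sketch that maps an installed version string to the results in this thread. The table only restates the comments above (2.4.0 and 2.4.1 working, 2.5.0 failing, 2.5.1 mixed) and is not a verified compatibility matrix. `parse_torch_version` and `thread_report` are hypothetical names.

```python
import re

# Results as reported in this thread only; not an official compatibility list.
REPORTED = {
    (2, 4, 0): "reported working",
    (2, 4, 1): "reported working",
    (2, 5, 0): "reported OOM",
    (2, 5, 1): "mixed reports",
}


def parse_torch_version(version):
    """Turn a string like '2.5.1+cu124' into a tuple like (2, 5, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def thread_report(version):
    """Look up what this thread says about a given torch version."""
    return REPORTED.get(parse_torch_version(version), "no clear report in this thread")


if __name__ == "__main__":
    for v in ("2.4.1+cu124", "2.5.0+cu124", "2.5.1+cu124"):
        print(v, "->", thread_report(v))
```

Passing `torch.__version__` from the ComfyUI interpreter into `thread_report` gives the same lookup for a live install.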
Why do I keep encountering the error "FluxTrainLoop Allocation on device" even after trying multiple versions, including 2.4, 2.5, and the 2.6 series?
Is anybody else facing this error?
InitFluxLoRATraining
'C:\Users\User\Desktop\Train\New folder' is not a directory
ComfyUI Error Report
Error Details
Stack Trace
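The "is not a directory" message is separate from the OOM problem and usually means the folder path given to the node does not exist or points at a file. A quick way to check a path before queueing, as a sketch only: `check_dir` is a hypothetical helper, and which node input the path came from is not shown in the report above.

```python
from pathlib import Path


def check_dir(path_str):
    """Classify a path as 'ok', 'missing', or 'not a directory'."""
    path = Path(path_str)
    if not path.exists():
        return "missing"
    if not path.is_dir():
        return "not a directory"
    return "ok"


if __name__ == "__main__":
    # Path copied from the error message above
    print(check_dir(r"C:\Users\User\Desktop\Train\New folder"))
```

If it prints "missing", creating the folder or correcting the path in the node should clear this particular error.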