InitFluxLoRATraining: "Cannot copy out of meta tensor; no data!" #80

Open
dkhold opened this issue Oct 5, 2024 · 4 comments

Comments

@dkhold

dkhold commented Oct 5, 2024

Hello,

Thanks for providing these nodes! I've used the reference workflow.json from the repository.

I successfully preprocessed the images (steps 1-4) but when actually starting the training, I get the error below.
I made sure to select the original ae.safetensors and flux-dev.safetensors. I also tried with flux-dev-fp8, same error.
I tried with and without highvram, with and without split_mode, with and without base_fp8.

For some reason, moving the text_encoders to the GPU is failing as can be seen below.

Problematic lines: here or here.

  • I do not believe this is because of lack of VRAM. Using nvtop I see I stay below 30% VRAM usage.
  • I downloaded the safetensors from HF to make sure they were the correct ones.

This is PyTorch 2.3.1, CUDA 12.1, cuDNN 8.

Thanks for your help!

Error occurred when executing InitFluxLoRATraining:

Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

File "/stable-diffusion/execution.py", line 316, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "/stable-diffusion/execution.py", line 191, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "/stable-diffusion/execution.py", line 168, in _map_node_over_list
process_inputs(input_dict, i)
File "/stable-diffusion/execution.py", line 157, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "/data/config/comfy/custom_nodes/ComfyUI-FluxTrainer/nodes.py", line 523, in init_training
training_loop = network_trainer.init_train(args)
File "/data/config/comfy/custom_nodes/ComfyUI-FluxTrainer/train_network.py", line 406, in init_train
self.cache_text_encoder_outputs_if_needed(args, accelerator, unet, vae, text_encoders, train_dataset_group, weight_dtype)
File "/data/config/comfy/custom_nodes/ComfyUI-FluxTrainer/flux_train_network_comfy.py", line 220, in cache_text_encoder_outputs_if_needed
text_encoders[0].to(accelerator.device, dtype=weight_dtype) # always not fp8
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2905, in to
return super().to(*args, **kwargs)
(snip torch internal calls)
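
For readers who want to see the error in isolation, here is a minimal sketch that triggers the same exception outside ComfyUI. It assumes only that PyTorch is installed. The helper name `meta_parameter_names` is hypothetical and not part of ComfyUI-FluxTrainer; it simply lists which weights of a module still have no data.

```python
import torch
from torch import nn


def meta_parameter_names(module: nn.Module) -> list[str]:
    """Return the names of parameters and buffers that still live on the meta device."""
    names = [name for name, param in module.named_parameters() if param.is_meta]
    names += [name for name, buf in module.named_buffers() if buf.is_meta]
    return names


# A module created on the meta device has shapes and dtypes but no weight data.
# This stands in for a model whose checkpoint was never loaded into it.
empty_model = nn.Linear(4, 2, device="meta")

try:
    empty_model.to("cpu")  # there is nothing to copy, so PyTorch refuses
    move_error = None
except NotImplementedError as exc:
    move_error = str(exc)

# An ordinary module has real weights and moves between devices without issue.
loaded_model = nn.Linear(4, 2)
loaded_model.to("cpu")

print("meta weights in empty_model:", meta_parameter_names(empty_model))
print("meta weights in loaded_model:", meta_parameter_names(loaded_model))
print("error when moving empty_model:", move_error)
```

Running the same helper on the text encoder or VAE object just before the failing `.to(...)` call would show which weights were never filled in. Note that `to_empty()`, which the message suggests, only allocates uninitialized memory, so it would silence the error without loading any real weights.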


@kijai
Owner

kijai commented Oct 6, 2024

A meta tensor error like that means it's trying to move an empty model, so if it happens while moving the text encoder, then there's something wrong with the text encoder files.
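
One way to follow up on the suggestion that a file is at fault is to read the file's header without loading any weights. The sketch below assumes the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header that records each tensor's `data_offsets`). The function name `inspect_safetensors` is hypothetical and not part of ComfyUI-FluxTrainer. It can only catch empty or truncated files; a file that passes may still be the wrong model for a given input.

```python
import json
import struct
from pathlib import Path


def inspect_safetensors(path):
    """Summarize a .safetensors file by reading only its JSON header."""
    path = Path(path)
    with path.open("rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            raise ValueError(f"{path} is too short to be a safetensors file")
        (header_len,) = struct.unpack("<Q", prefix)  # little-endian uint64
        header = json.loads(f.read(header_len))

    # "__metadata__" is an optional free-form entry, not a tensor.
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}

    # The data section must extend at least to the largest end offset.
    data_bytes = max((t["data_offsets"][1] for t in tensors.values()), default=0)
    expected_size = 8 + header_len + data_bytes
    actual_size = path.stat().st_size

    return {
        "tensor_count": len(tensors),
        "expected_size": expected_size,
        "actual_size": actual_size,
        "complete": actual_size >= expected_size,
    }
```

Calling `inspect_safetensors(...)` on each text encoder and VAE file selected in the workflow and checking that `tensor_count` is non-zero and `complete` is `True` would rule out an empty or partially downloaded file.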

@ZGpork

ZGpork commented Oct 12, 2024

Have the same problem, I set this all up just for nothing now ;-(

edit: ah, it just does not seem to work with the de-distilled model, which I wanted to try

@annedaphne

> Have the same problem, I set this all up just for nothing now ;-(
>
> edit: ah, it just does not seem to work with the de-distilled model, which I wanted to try

same problem

@EnragedAntelope

I am not using a de-distilled version. I have Python 3.11.9 and torch 2.4.1+cu124.
I am receiving the same error. Any fix?

import network module: .networks.lora_flux
ERROR    !!! Exception during processing !!! Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.    execution.py:392
ERROR    Traceback (most recent call last):    execution.py:393
  File "D:\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "D:\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "D:\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\nodes.py", line 523, in init_training
    training_loop = network_trainer.init_train(args)
  File "D:\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\train_network.py", line 390, in init_train
    vae.to(accelerator.device, dtype=vae_dtype)
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1167, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

end_vram - start_vram: 8234228418 - 1732385986 = 6501842432
#107 [InitFluxLoRATraining]: 10.85s - vram 6501842432b
INFO     Prompt executed in 10.89 seconds    main.py:138
