InitFluxLoRATraining: "Cannot copy out of meta tensor; no data!" #80

dkhold · 2024-10-05T23:48:24Z

Hello,

Thanks for providing these nodes! I've used the reference workflow.json from the repository.

I successfully preprocessed the images (steps 1-4), but when I actually start training I get the error below.
I made sure to select the original ae.safetensors and flux-dev.safetensors. I also tried flux-dev-fp8, with the same error.
I tried with and without highvram, with and without split_mode, and with and without base_fp8.

For some reason, moving text_encoders to the GPU fails, as the traceback below shows.

Problematic lines: here or here.

I don't believe this is caused by a lack of VRAM: nvtop shows VRAM usage staying below 30%.
I downloaded the safetensors files from HF to make sure they were the correct ones.
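Re-downloading does not by itself rule out a truncated or corrupted file. A minimal sketch for checking a local file against the SHA256 that Hugging Face typically lists for LFS-stored files; the helper names, path and expected hash in the usage line are placeholders, not values from this thread:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a (possibly multi-GB) file in chunks so it never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps reading until EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare against a published SHA256 (case-insensitive, whitespace-tolerant)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage:
# matches_expected("models/vae/ae.safetensors", "<sha256 from the HF file page>")
```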

This is PyTorch 2.3.1, CUDA 12.1, cuDNN 8.

Thanks for your help!

Error occurred when executing InitFluxLoRATraining:

Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

File "/stable-diffusion/execution.py", line 316, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "/stable-diffusion/execution.py", line 191, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "/stable-diffusion/execution.py", line 168, in _map_node_over_list
process_inputs(input_dict, i)
File "/stable-diffusion/execution.py", line 157, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "/data/config/comfy/custom_nodes/ComfyUI-FluxTrainer/nodes.py", line 523, in init_training
training_loop = network_trainer.init_train(args)
File "/data/config/comfy/custom_nodes/ComfyUI-FluxTrainer/train_network.py", line 406, in init_train
self.cache_text_encoder_outputs_if_needed(args, accelerator, unet, vae, text_encoders, train_dataset_group, weight_dtype)
File "/data/config/comfy/custom_nodes/ComfyUI-FluxTrainer/flux_train_network_comfy.py", line 220, in cache_text_encoder_outputs_if_needed
text_encoders[0].to(accelerator.device, dtype=weight_dtype) # always not fp8
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2905, in to
return super().to(*args, **kwargs)
(snip torch internal calls)
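The message comes from PyTorch itself: `Module.to()` refuses to copy a tensor that lives on the `meta` device, because such a tensor has a shape and dtype but no data. A small reproduction, assuming a PyTorch 2.x install; the `nn.Linear` layer is only a stand-in for the text encoder or VAE:

```python
import torch
import torch.nn as nn

# A layer created on the "meta" device has shapes and dtypes but no storage.
layer = nn.Linear(4, 4, device="meta")
print(layer.weight.is_meta)  # True

error_message = None
try:
    # Same call pattern as text_encoders[0].to(...) / vae.to(...) in the tracebacks
    layer.to("cpu")
except NotImplementedError as err:
    error_message = str(err)
print(error_message)

# to_empty() avoids the exception, but only allocates *uninitialized* memory on the
# target device: the pretrained weights are still missing afterwards.
empty = nn.Linear(4, 4, device="meta").to_empty(device="cpu")
print(empty.weight.is_meta, empty.weight.device)  # False cpu
```

So following the error's advice to call `to_empty()` would silence the exception but leave the model with arbitrary weights; it is not a fix when the weights simply never got loaded.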


kijai · 2024-10-06T00:14:56Z

A meta tensor error like that means it's trying to move an empty model. If it happens when moving the text encoder, then something is wrong with the text encoder files.
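To confirm that a model is (partly) empty, one can list the parameters and buffers still on the meta device before calling `.to()`. A diagnostic sketch; `meta_tensor_names` and `move_or_explain` are hypothetical helpers, not part of ComfyUI-FluxTrainer:

```python
import torch
import torch.nn as nn


def meta_tensor_names(module: nn.Module) -> list[str]:
    """Names of parameters and buffers that were never filled with real data."""
    names = [name for name, p in module.named_parameters() if p.is_meta]
    names += [name for name, b in module.named_buffers() if b.is_meta]
    return names


def move_or_explain(module: nn.Module, device, dtype=None) -> nn.Module:
    """Hypothetical guard: name the empty tensors instead of raising the generic error."""
    missing = meta_tensor_names(module)
    if missing:
        preview = ", ".join(missing[:5])
        raise RuntimeError(
            f"{len(missing)} tensor(s) are still on the meta device (e.g. {preview}); "
            "the weights were probably not loaded from the checkpoint."
        )
    return module.to(device) if dtype is None else module.to(device, dtype=dtype)


# A model where only the second layer was left empty:
model = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2, device="meta"))
print(meta_tensor_names(model))  # ['1.weight', '1.bias']
```

In the trainer, calling `meta_tensor_names(text_encoders[0])` or `meta_tensor_names(vae)` just before the failing `.to()` line would show which weights were never loaded.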

ZGpork · 2024-10-12T14:26:16Z

Have the same problem, I set this all up just for nothing now ;-(

Edit: ah, it just doesn't seem to work with the de-distilled model, which is what I wanted to try.
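If one checkpoint (such as a de-distilled model) fails while the stock one works, one possible cause, not confirmed in this thread, is that its tensor names or shapes differ from what the loader expects, so some weights are never filled in. A .safetensors file starts with an 8-byte little-endian header length followed by a JSON header listing every tensor, so names and shapes can be compared without loading any weights. A stdlib-only sketch; the function names and paths are placeholders:

```python
import json
import struct


def safetensors_header(path):
    """Read only the JSON header of a .safetensors file (no tensor data is loaded)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian unsigned 64-bit
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def compare_checkpoints(path_a, path_b):
    """Tensor names only in A, only in B, and shared names whose shapes differ."""
    a, b = safetensors_header(path_a), safetensors_header(path_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    shape_diff = sorted(k for k in a.keys() & b.keys() if a[k]["shape"] != b[k]["shape"])
    return only_a, only_b, shape_diff


# Hypothetical usage:
# compare_checkpoints("path/to/stock.safetensors", "path/to/other.safetensors")
```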

annedaphne · 2024-10-16T13:30:20Z

> Have the same problem, I set this all up just for nothing now ;-(
>
> edit: ah, it just does not seem to work with the de-distilled model which I wanted to try

Same problem.

EnragedAntelope · 2024-10-17T01:08:39Z

I am not using a de-distilled model. I'm on Python 3.11.9 with torch 2.4.1+cu124 and get the same error, although in my case it fails while moving the VAE. Any fix?

import network module: .networks.lora_flux
ERROR    !!! Exception during processing !!! Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
ERROR    Traceback (most recent call last):
  File "D:\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "D:\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "D:\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\nodes.py", line 523, in init_training
    training_loop = network_trainer.init_train(args)
  File "D:\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\train_network.py", line 390, in init_train
    vae.to(accelerator.device, dtype=vae_dtype)
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1167, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

end_vram - start_vram: 8234228418 - 1732385986 = 6501842432
#107 [InitFluxLoRATraining]: 10.85s - vram 6501842432b
INFO     Prompt executed in 10.89 seconds
