t2v CogVideoX1.5-5B OOM #540

Open · 1 of 2 tasks
LettleCreator opened this issue Nov 22, 2024 · 7 comments

@LettleCreator

System Info / 系統信息

CUDA 12.4
diffusers 0.32.0.dev0 (latest, installed with pip install -e .)
A100 40GB VRAM

Running CogVideoX1.5-5B-I2V for I2V generates videos normally.
Running CogVideoX1.5-5B for T2V always results in OOM.

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

python inference/cli_demo.py --prompt="Two kittens lick each other's fur" --generate_type="t2v"

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.67 GiB. GPU 0 has a total capacity of 39.38 GiB of which 4.50 GiB is free. Process 168833 has 34.88 GiB memory in use. Of the allocated memory 31.49 GiB is allocated by PyTorch, and 2.89 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
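
As an aside, the PYTORCH_CUDA_ALLOC_CONF setting suggested in the error message only takes effect if it is in the environment before torch initializes CUDA, so it has to be exported in the shell or set at the very top of the script. A minimal sketch:

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # import torch only after the variable is set

Note that this only mitigates allocator fragmentation; it will not rescue a run that genuinely needs more memory than the card has.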

Expected behavior / 期待表现

I ran the official inference/cli_demo.py directly, with no modifications.
pipe.enable_sequential_cpu_offload() is enabled in it.
It still OOMs.
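
For context, enable_sequential_cpu_offload() only streams the model weights through the GPU one submodule at a time; it does not bound activation peaks during denoising or VAE decode. A minimal sketch of the VAE-side savers that can be stacked on top of it, using the standard diffusers pipeline API:

import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B", torch_dtype=torch.bfloat16
)
pipe.enable_sequential_cpu_offload()  # stream weights layer by layer
pipe.vae.enable_tiling()              # decode the VAE in spatial tiles
pipe.vae.enable_slicing()             # decode the batch one slice at a time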

@zRzRzRzRzRzRzR
Member

Update to the latest diffusers main branch.

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR self-assigned this Nov 22, 2024
@LettleCreator
Author

> Update to the latest diffusers main branch.

Already updated to the latest diffusers main; the OOM still occurs.

@LettleCreator
Author

(cogvideo) root@autodl-container-cd46119efa-b92bcf86:/autodl-tmp/CogVideo# cd diffusers/
(cogvideo) root@autodl-container-cd46119efa-b92bcf86:/autodl-tmp/CogVideo/diffusers# git checkout main
Already on 'main'
Your branch is up to date with 'origin/main'.
(cogvideo) root@autodl-container-cd46119efa-b92bcf86:/autodl-tmp/CogVideo/diffusers# git pull
Already up to date.
(cogvideo) root@autodl-container-cd46119efa-b92bcf86:/autodl-tmp/CogVideo/diffusers# pip install -e .

Successfully built diffusers
Installing collected packages: diffusers
Attempting uninstall: diffusers
Found existing installation: diffusers 0.32.0.dev0
Uninstalling diffusers-0.32.0.dev0:
Successfully uninstalled diffusers-0.32.0.dev0
Successfully installed diffusers-0.32.0.dev0

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
python inference/cli_demo.py --prompt="Two kittens lick each other's fur"
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:32<00:00, 8.05s/it]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:33<00:00, 6.77s/it]
0%| | 0/50 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/root/autodl-tmp/CogVideo/inference/cli_demo.py", line 179, in <module>
    generate_video(
  File "/root/autodl-tmp/CogVideo/inference/cli_demo.py", line 128, in generate_video
    video_generate = pipe(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 710, in __call__
    noise_pred = self.transformer(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/transformers/cogvideox_transformer_3d.py", line 503, in forward
    hidden_states, encoder_hidden_states = block(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/transformers/cogvideox_transformer_3d.py", line 132, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/attention_processor.py", line 530, in forward
    return self.processor(
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/attention_processor.py", line 2295, in __call__
    key[:, :, text_seq_length:] = apply_rotary_emb(key[:, :, text_seq_length:], image_rotary_emb)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/embeddings.py", line 816, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.67 GiB. GPU 0 has a total capacity of 39.38 GiB of which 4.50 GiB is free. Process 225552 has 34.88 GiB memory in use. Of the allocated memory 31.49 GiB is allocated by PyTorch, and 2.89 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
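
The failing line is the rotary-embedding application: apply_rotary_emb upcasts the bf16 key tensor to float32, so the temporary it builds is four bytes per element rather than two. A rough, purely illustrative back-of-envelope (the head count and token count below are assumptions, not values read from this run):

# Rough size of one float32 temporary of shape [batch, heads, seq, head_dim].
def tensor_gib(*shape, bytes_per_elem=4):
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / 2**30

# batch 2 (classifier-free guidance), 48 heads, ~240k video tokens, dim 64
print(f"{tensor_gib(2, 48, 240_000, 64):.2f} GiB")  # ~5.5 GiB

A token count that large only arises at a very large default resolution, which is consistent with the fix further down this thread that overrides the transformer's sample size.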

@eryueweiyu

diffusers 0.32.0.dev0
torch 2.5.1+cu124
torchaudio 2.5.1+cu124
torchvision 0.20.1+cu124
transformers 4.46.3
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.67 GiB. GPU 0 has a total capacity of 11.99 GiB of which 0 bytes is free. Of the allocated memory 26.15 GiB is allocated by PyTorch, and 153.84 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

@umarkhalidAI

Was anyone able to solve this issue?

@jt-zhang

Hope there could be a solution. Thank you.

@ntrvideo

Thank you for your hard work, zRzRzRzRzRzRzR.

CogVideoX1.5 t2v worked for me, so I'm sharing the method. It generated a 1360x768 video in my environment.

import torch
from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video

model_id = "THUDM/CogVideoX1.5-5B"  # v1.5

prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."

# sample_height/sample_width are latent-space sizes: pixel dimensions divided
# by the VAE's spatial downscale factor of 8 (1360x768 -> 170x96).
width = 1360 // 8
height = 768 // 8

# Override the transformer's sample size so the pipeline's default output
# resolution (derived from the transformer config) matches the target.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    sample_height=height,
    sample_width=width,
    torch_dtype=torch.bfloat16,
)

pipe = CogVideoXPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=81,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)

I haven't tried inference/cli_demo.py yet, but I think it will be easier to modify CogVideoX1.5-5B/transformer/config.json and use cli_demo.py.
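
If anyone wants to try that route, a minimal sketch of the config edit (the local path below is hypothetical; point it at wherever the snapshot was downloaded):

# Patch sample_height/sample_width in a local model snapshot so cli_demo.py
# picks up the smaller default resolution. Values are latent units (px // 8).
import json

config_path = "CogVideoX1.5-5B/transformer/config.json"
with open(config_path) as f:
    config = json.load(f)

config["sample_height"] = 768 // 8   # 96 latent rows for 768 px
config["sample_width"] = 1360 // 8   # 170 latent cols for 1360 px

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)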
