t2v CogVideoX1.5-5B OOM #540

Open · 1 of 2 tasks
LettleCreator opened this issue Nov 22, 2024 · 7 comments

@LettleCreator

System Info / 系統信息

CUDA 12.4
diffusers 0.32.0.dev0 (latest, installed with pip install -e .)
A100 40GB VRAM

Running CogVideoX1.5-5B-I2V for I2V generates videos normally.
Running CogVideoX1.5-5B for T2V always results in OOM.

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

python inference/cli_demo.py --prompt="Two kittens lick each other's fur" --generate_type="t2v"

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.67 GiB. GPU 0 has a total capacity of 39.38 GiB of which 4.50 GiB is free. Process 168833 has 34.88 GiB memory in use. Of the allocated memory 31.49 GiB is allocated by PyTorch, and 2.89 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
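
As an aside, the PYTORCH_CUDA_ALLOC_CONF setting suggested in the error message only takes effect if it is in the environment before torch initializes CUDA, so it has to be exported in the shell or set at the very top of the script. A minimal sketch:

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # import torch only after the variable is set

Note that this only mitigates allocator fragmentation; it will not rescue a run that genuinely needs more memory than the card has.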

Expected behavior / 期待表现

I ran the official inference/cli_demo.py directly, with no modifications.
pipe.enable_sequential_cpu_offload() is enabled in it.
It still OOMs.
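
For context, enable_sequential_cpu_offload() only streams the model weights through the GPU one submodule at a time; it does not bound activation peaks during denoising or VAE decode. A minimal sketch of the VAE-side savers that can be stacked on top of it, using the standard diffusers pipeline API:

import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B", torch_dtype=torch.bfloat16
)
pipe.enable_sequential_cpu_offload()  # stream weights layer by layer
pipe.vae.enable_tiling()              # decode the VAE in spatial tiles
pipe.vae.enable_slicing()             # decode the batch one slice at a time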

@zRzRzRzRzRzRzR
Member

Update to the latest diffusers main branch.

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR self-assigned this Nov 22, 2024
@LettleCreator
Author

> Update to the latest diffusers main branch.

Already updated to the latest diffusers main; the OOM still occurs.

@LettleCreator
Author

(cogvideo) root@autodl-container-cd46119efa-b92bcf86:/autodl-tmp/CogVideo# cd diffusers/
(cogvideo) root@autodl-container-cd46119efa-b92bcf86:/autodl-tmp/CogVideo/diffusers# git checkout main
Already on 'main'
Your branch is up to date with 'origin/main'.
(cogvideo) root@autodl-container-cd46119efa-b92bcf86:/autodl-tmp/CogVideo/diffusers# git pull
Already up to date.
(cogvideo) root@autodl-container-cd46119efa-b92bcf86:/autodl-tmp/CogVideo/diffusers# pip install -e .

Successfully built diffusers
Installing collected packages: diffusers
Attempting uninstall: diffusers
Found existing installation: diffusers 0.32.0.dev0
Uninstalling diffusers-0.32.0.dev0:
Successfully uninstalled diffusers-0.32.0.dev0
Successfully installed diffusers-0.32.0.dev0

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
python inference/cli_demo.py --prompt="Two kittens lick each other's fur"
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:32<00:00, 8.05s/it]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:33<00:00, 6.77s/it]
0%| | 0/50 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/root/autodl-tmp/CogVideo/inference/cli_demo.py", line 179, in <module>
    generate_video(
  File "/root/autodl-tmp/CogVideo/inference/cli_demo.py", line 128, in generate_video
    video_generate = pipe(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 710, in __call__
    noise_pred = self.transformer(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/transformers/cogvideox_transformer_3d.py", line 503, in forward
    hidden_states, encoder_hidden_states = block(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/transformers/cogvideox_transformer_3d.py", line 132, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/attention_processor.py", line 530, in forward
    return self.processor(
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/attention_processor.py", line 2295, in __call__
    key[:, :, text_seq_length:] = apply_rotary_emb(key[:, :, text_seq_length:], image_rotary_emb)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/embeddings.py", line 816, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.67 GiB. GPU 0 has a total capacity of 39.38 GiB of which 4.50 GiB is free. Process 225552 has 34.88 GiB memory in use. Of the allocated memory 31.49 GiB is allocated by PyTorch, and 2.89 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
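
The failing line is the rotary-embedding application: apply_rotary_emb upcasts the bf16 key tensor to float32, so the temporary it builds is four bytes per element rather than two. A rough, purely illustrative back-of-envelope (the head count and token count below are assumptions, not values read from this run):

# Rough size of one float32 temporary of shape [batch, heads, seq, head_dim].
def tensor_gib(*shape, bytes_per_elem=4):
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / 2**30

# batch 2 (classifier-free guidance), 48 heads, ~240k video tokens, dim 64
print(f"{tensor_gib(2, 48, 240_000, 64):.2f} GiB")  # ~5.5 GiB

A token count that large only arises at a very large default resolution, which is consistent with the fix further down this thread that overrides the transformer's sample size.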

@eryueweiyu

diffusers 0.32.0.dev0
torch 2.5.1+cu124
torchaudio 2.5.1+cu124
torchvision 0.20.1+cu124
transformers 4.46.3
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.67 GiB. GPU 0 has a total capacity of 11.99 GiB of which 0 bytes is free. Of the allocated memory 26.15 GiB is allocated by PyTorch, and 153.84 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

@umarkhalidAI

Was anyone able to solve this issue?

@jt-zhang

Hope there could be a solution. Thank you.

@ntrvideo

Thank you for your hard work, zRzRzRzRzRzRzR.

CogVideoX1.5 t2v worked for me, so I'm sharing the method. It generated a 1360x768 video in my environment.

import torch
from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video

model_id = "THUDM/CogVideoX1.5-5B"  # v1.5

prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."

# sample_height/sample_width are latent-space sizes: pixel dimensions divided
# by the VAE's spatial downscale factor of 8 (1360x768 -> 170x96).
width = 1360 // 8
height = 768 // 8

# Override the transformer's sample size so the pipeline's default output
# resolution (derived from the transformer config) matches the target.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    sample_height=height,
    sample_width=width,
    torch_dtype=torch.bfloat16,
)

pipe = CogVideoXPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=81,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)

I haven't tried inference/cli_demo.py yet, but I think it will be easier to modify CogVideoX1.5-5B/transformer/config.json and use cli_demo.py.
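
If anyone wants to try that route, a minimal sketch of the config edit (the local path below is hypothetical; point it at wherever the snapshot was downloaded):

# Patch sample_height/sample_width in a local model snapshot so cli_demo.py
# picks up the smaller default resolution. Values are latent units (px // 8).
import json

config_path = "CogVideoX1.5-5B/transformer/config.json"
with open(config_path) as f:
    config = json.load(f)

config["sample_height"] = 768 // 8   # 96 latent rows for 768 px
config["sample_width"] = 1360 // 8   # 170 latent cols for 1360 px

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)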
