
CogX fails on macOS requesting a 10 TB buffer. #9972

Open
Vargol opened this issue Nov 20, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@Vargol

Vargol commented Nov 20, 2024

Describe the bug

Tried to run the THUDM/CogVideoX1.5-5B model using Diffusers from git (20th Nov, approx 8:30am GMT)
The script failed with

    hidden_states = F.scaled_dot_product_attention(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Invalid buffer size: 10973.48 GB

While these are big models, I suspect the CUDA users out there are not using 10 TB of RAM :-)

Reproduction

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

torch.mps.set_per_process_memory_fraction(0.0)

prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B",
    torch_dtype=torch.bfloat16
).to("mps")


#pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=81,
    guidance_scale=6,
    generator=torch.Generator(device="mps").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)

Logs

The full output was

$ python cogx.py 
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 4/4 [00:37<00:00,  9.25s/it]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 5/5 [00:39<00:00,  7.91s/it]
  0%|                                                                                            | 0/50 [00:18<?, ?it/s]
Traceback (most recent call last):
  File "/Volumes/SSD2TB/AI/cog/cogx.py", line 19, in <module>
    video = pipe(
            ^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 710, in __call__
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 503, in forward
    hidden_states, encoder_hidden_states = block(
                                           ^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 132, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
                                                     ^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/diffusers/models/attention_processor.py", line 530, in forward
    return self.processor(
           ^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/diffusers/models/attention_processor.py", line 2297, in __call__
    hidden_states = F.scaled_dot_product_attention(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Invalid buffer size: 10973.48 GB


### System Info

- 🤗 Diffusers version: 0.32.0.dev0
- Platform: macOS-15.1.1-arm64-arm-64bit
- Running on Google Colab?: No
- Python version: 3.11.10
- PyTorch version (GPU?): 2.6.0.dev20241115 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.26.2
- Transformers version: 4.46.2
- Accelerate version: 1.1.1
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: Apple M3
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No


### Who can help?

@pcuenca
@Vargol Vargol added the bug Something isn't working label Nov 20, 2024
@sayakpaul
Member

I am not sure this is a diffusers-specific problem, though. My instinct is that if you generated random tensors matching CogVideoX's shapes and ran them through F.scaled_dot_product_attention(), it would fail the same way.

@Vargol
Author

Vargol commented Nov 20, 2024

Loading pipeline components...: 100%|██████████████████████| 5/5 [00:38<00:00,  7.71s/it]
  0%|                                                                                            | 0/50 [00:00<?, ?it/s]
QUERY: torch.Size([2, 48, 247726, 64])
KEY: torch.Size([2, 48, 247726, 64])
VALUE: torch.Size([2, 48, 247726, 64])
ATTENTION_MASK: None
  0%|                                                                                            | 0/50 [00:12<?, ?it/s]
Traceback (most recent call last):
  File "/Volumes/SSD2TB/AI/cog/cogx.py", line 19, in <module>
    video = pipe(

I have no idea if values of that shape going into F.scaled... make sense.
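A quick back-of-the-envelope check (a sketch, assuming the mps backend materialises the full seq x seq attention score matrix per head in bf16, i.e. 2 bytes per element; flash-style kernels avoid this allocation) reproduces the number in the error message exactly:

```python
# Attention-score allocation implied by the Q/K/V shapes printed above,
# assuming the backend materialises one (seq x seq) bf16 matrix per head.
batch, heads, seq, head_dim = 2, 48, 247726, 64

score_bytes = batch * heads * seq * seq * 2  # 2 bytes per bf16 element
print(f"{score_bytes / 2**30:.2f} GB")  # -> 10973.48 GB, matching the RuntimeError
```

So the 10 TB figure is simply the quadratic cost of a ~248k-token sequence, which suggests the sequence length itself is wrong.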

@sayakpaul
Member

Your initial error logs suggest that it gets stuck at F.scaled_dot_product_attention().

@a-r-r-o-w
Member

a-r-r-o-w commented Nov 20, 2024

That is definitely way too big. Could you explicitly specify height=768 and width=1360? If you don't, the sample_height and sample_width from the transformer config (300x300, which is also required to compute the RoPE dimensions correctly) are used to calculate the defaults, which won't work as expected here and gives you a 2400x2400 resolution.
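The effect on sequence length can be sketched with some rough arithmetic (the breakdown below is an assumption: 8x VAE spatial downsampling plus 2x2 transformer patches for a 16x spatial reduction, 11 temporal latent tokens for 81 frames, and 226 text tokens; it is hypothetical, but it reproduces both sequence lengths seen in the logs):

```python
# Hypothetical token-count model for CogVideoX1.5: 16x spatial reduction
# (8x VAE + 2x2 patches), 11 temporal latent tokens for 81 frames,
# plus 226 text-embedding tokens.
def seq_len(height, width, temporal_tokens=11, text_tokens=226):
    return (height // 16) * (width // 16) * temporal_tokens + text_tokens

print(seq_len(2400, 2400))  # default derived from sample_height/width -> 247726
print(seq_len(768, 1360))   # explicit height/width -> 45106
```

Passing height=768 and width=1360 to pipe(...) therefore cuts the sequence length, and the quadratic attention cost, by roughly 30x.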

@a-r-r-o-w
Member

We have something planned that should reduce memory requirements on Mac and other devices, coming very soon, and it should also be easy to use API-wise. It would be awesome if you could help us test it (I can ping you when the PR is out).

cc @DN6 as Mac devices are good potential candidate for testing out our SplitInferenceModule hooks

@Vargol
Author

Vargol commented Nov 20, 2024

That's an improvement, but it still wants a 364 GB buffer:

python cogx.py 
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████|
Loading pipeline components...: 100%|███████████████████████████████| 5/5 [00:35<00:00,  7.18s/it]
  0%|                                                                                            | 0/50 [00:00<?, ?it/s]
QUERY: torch.Size([2, 48, 45106, 64])
KEY: torch.Size([2, 48, 45106, 64])
VALUE: torch.Size([2, 48, 45106, 64])
ATTENTION_MASK: None
  0%|                                                                                            | 0/50 [00:02<?, ?it/s]
Traceback (most recent call last):

...

  File "/Volumes/SSD2TB/AI/cog/lib/python3.11/site-packages/diffusers/models/attention_processor.py", line 2302, in __call__
    hidden_states = F.scaled_dot_product_attention(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Invalid buffer size: 363.81 GB

@a-r-r-o-w
Member

This is actually a well-known problem on Mac devices: mps lacks efficient kernel implementations for many operations.

Until the PR I mentioned above is out, I'm not sure there is an easy way to make this run on Macs. For now, you could try the 1.0 versions at 720 x 480 x 49 frames, which should further lower the buffer allocation.

I hope I'm not bothering you with too many technical details, but you can significantly reduce memory usage with a wrapper class that chunks the inference across the batch_size and num_heads dimensions. This can serve as a useful example: https://github.com/huggingface/diffusers/blame/f6f7afa1d7c6f45f8568c5603b1e6300d4583f04/src/diffusers/pipelines/free_noise_utils.py#L37. I will try to get the easy-to-use API in ASAP so that end-users can ignore the technical details and it "just works".
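The head-chunking idea can be sketched as follows (a minimal illustration only, not the diffusers SplitInferenceModule itself): attention is independent per head, so slicing Q/K/V along the head dimension and running scaled_dot_product_attention per slice produces the same result while only chunk_size score matrices exist at once.

```python
import torch
import torch.nn.functional as F

def sdpa_chunked_heads(q, k, v, chunk_size=8):
    # q, k, v: (batch, heads, seq, head_dim). Heads attend independently,
    # so processing them in slices changes peak memory, not the result.
    outs = [
        F.scaled_dot_product_attention(
            q[:, i : i + chunk_size],
            k[:, i : i + chunk_size],
            v[:, i : i + chunk_size],
        )
        for i in range(0, q.shape[1], chunk_size)
    ]
    return torch.cat(outs, dim=1)
```

Note that at 2400x2400 even a single head's 247726 x 247726 bf16 score matrix is ~114 GB, so head chunking alone is not enough at the default resolution; it has to be combined with explicit height/width (and, as suggested above, chunking across the batch dimension too).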
