CogX fails on macOS requesting a 10 TB buffer. #9972
I am not sure if this is a …

I have no idea if values of that shape going into …

Your initial error logs suggest that it gets stuck at …
That is definitely way too big. Could you explicitly specify height=768 and width=1360? If you don't, the sample_height and sample_width from the transformer config are used to calculate the defaults (they are also needed to figure out the RoPE dimensions of 300x300 correctly), which won't work as expected and gives you a 2400x2400 resolution.
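For reference, a minimal sketch of passing the resolution explicitly (the prompt is a placeholder, not from the original script; `pipe` is the already-loaded CogVideoX pipeline):

```python
# Explicit height/width override the defaults derived from the transformer config.
video = pipe(
    prompt="a cat walking on grass",  # placeholder prompt
    height=768,
    width=1360,
).frames[0]
```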
We have something planned that should hopefully reduce memory requirements on Mac and other devices, coming very soon, and it should also be easy to use API-wise. It would be really awesome if you'd like to help us test it (I can ping you when the PR is out). cc @DN6, as Mac devices are good potential candidates for testing out our SplitInferenceModule hooks.
That's an improvement, but it still wants a 364 GB buffer.
This is actually a well-known problem on Mac devices: mps lacks efficient kernel implementations for many different operations. Until the PR I mentioned above is out, I'm unsure whether there is any way to easily make this run on Macs. For now, you could run the 1.0 versions at 720 x 480 x 49, which should further lower the buffer size allocation.

I hope I'm not bothering you with too many technical details, but you can significantly reduce the memory usage if you use a wrapper class to chunk the inference across the batch_size and num_heads dimensions. This can serve as a useful example of that: https://github.com/huggingface/diffusers/blame/f6f7afa1d7c6f45f8568c5603b1e6300d4583f04/src/diffusers/pipelines/free_noise_utils.py#L37. I will try to get the easy-to-use API in ASAP so that end-users can ignore the technical details and it "just works".
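For illustration, here is a minimal sketch of that wrapper idea, modelled loosely on the linked SplitInferenceModule (the class and argument names below are simplified assumptions, not the exact library API):

```python
import torch
import torch.nn as nn


class SplitInferenceModule(nn.Module):
    # Runs the wrapped module on small slices of the input along one
    # dimension (e.g. batch_size or num_heads) and stitches the results
    # back together, trading some speed for a much smaller peak allocation.
    def __init__(self, module: nn.Module, split_size: int = 1, split_dim: int = 0):
        super().__init__()
        self.module = module
        self.split_size = split_size
        self.split_dim = split_dim

    def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
        chunks = torch.split(x, self.split_size, dim=self.split_dim)
        outputs = [self.module(chunk, *args, **kwargs) for chunk in chunks]
        return torch.cat(outputs, dim=self.split_dim)


# Hypothetical usage: chunk an attention block across the batch dimension.
# block.attn = SplitInferenceModule(block.attn, split_size=1, split_dim=0)
```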
Describe the bug
Tried to run the THUDM/CogVideoX1.5-5B model using Diffusers from git (20th Nov, approx 8:30am GMT)
The script failed with:
While these are big models, I suspect that 10 TB of RAM is not being used by the CUDA users out there :-)
Reproduction
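The original script wasn't preserved here; a minimal sketch of the kind of invocation described (prompt, dtype, and output settings are assumptions):

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B", torch_dtype=torch.bfloat16
)
pipe.to("mps")  # Apple Silicon backend

# No explicit height/width: the defaults derived from the transformer
# config are what led to the oversized buffer allocation discussed above.
video = pipe(prompt="a cat walking on grass").frames[0]
export_to_video(video, "output.mp4", fps=8)
```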
Logs