Parallel Inference with xDiT unsuccessful #129
We will check this issue ASAP.
(HunyuanVideo) root@dd22:~/project/HunyuanVideo# torchrun --nproc_per_node=8 sample_video.py --video-size 1280 720 --video-length 129 --infer-steps 50 --prompt "A cat walks on the grass, realistic style." --flow-reverse --seed 42 --ulysses-degree 8 --ring-degree 1 --save-path ./results
torchrun --nproc_per_node=8 sample_video.py is for the latest version.
I'm facing the same issue. I'm using a g6.12xlarge instance on AWS, which has 4 L4 GPUs (24GB VRAM each). The command I run is:
I also tried 2x2 and 1x4 (ulysses x ring degrees), but I'm getting an out-of-memory error.
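For context on the 2x2 and 1x4 configurations: in xDiT-style sequence parallelism, the Ulysses degree and Ring degree multiply to give the total number of processes, so on 4 GPUs the valid splits are 4x1, 2x2, and 1x4. A minimal illustrative check (not code from this repo):

```python
# Illustrative only: xDiT-style sequence parallelism requires
# ulysses_degree * ring_degree == world size (torchrun's --nproc_per_node).
import torch.distributed as dist

def check_parallel_degrees(ulysses_degree: int, ring_degree: int) -> None:
    world_size = dist.get_world_size()
    assert ulysses_degree * ring_degree == world_size, (
        f"ulysses ({ulysses_degree}) x ring ({ring_degree}) "
        f"must equal world size ({world_size})"
    )
```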
I suppose you cannot run it successfully with 1 GPU? Currently, the VRAM usage should be the same as the single-GPU version.
I get the same error with 8 x L20 GPUs, but I can run successfully with a single L20.
@feifeibear thanks for your reply! This is my command script:
and here is my log file.
@feifeibear thanks for your reply. I changed to a g6e.12xlarge (4x 48GB GPUs), and while I'm able to run single-GPU inference at 544x960, I'm unable to run parallel inference.
I directly ran git clone https://github.com/tencent/HunyuanVideo and read through the code. What I'm curious about is the parallel inference implementation.
################## this line is where the model is loaded
... then looking at the concrete implementation of HunyuanVideoSampler.from_pretrained(): def from_pretrained(cls, pretrained_model_path, args, device=None, **kwargs):
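To illustrate the point made elsewhere in this thread (that per-GPU VRAM usage should match the single-GPU version): in this style of parallel inference, each rank loads a full copy of the weights and only the activations are split across GPUs. A minimal sketch of that pattern, with a hypothetical build_transformer standing in for whatever from_pretrained() does internally:

```python
# Sketch only, not HunyuanVideo's actual code: every rank materializes the
# complete set of weights on its own GPU, so per-GPU VRAM matches a
# single-GPU run; parallelism shards the sequence dimension of activations.
import torch
import torch.distributed as dist

def load_model_per_rank(pretrained_model_path: str) -> torch.nn.Module:
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    model = build_transformer(pretrained_model_path)  # hypothetical loader
    return model.to(f"cuda:{rank}")  # full, unsharded replica on every rank
```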
@jash101 Did you run single-GPU inference with the --use-cpu-offload flag? I'm not able to run single-GPU inference when CPU offload is disabled.
@xibosun yes, I used the command in the README for single-GPU inference:
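For background on what --use-cpu-offload buys you on a single GPU: sequential CPU offload keeps weights in host RAM and moves each block onto the GPU only for its own forward pass. A rough sketch of the general technique (not necessarily how the flag is implemented in this repo):

```python
# Rough sketch of sequential CPU offload via forward hooks; the actual
# --use-cpu-offload implementation may differ.
import torch

def attach_offload_hooks(model: torch.nn.Module, device: str = "cuda") -> None:
    def pre_hook(module, args):
        module.to(device)  # bring this block's weights in just before use

    def post_hook(module, args, output):
        module.to("cpu")  # evict back to host RAM afterwards
        torch.cuda.empty_cache()
        return output

    for block in model.children():  # offload at per-block granularity
        block.register_forward_pre_hook(pre_hook)
        block.register_forward_hook(post_hook)
```

This trades PCIe transfer time for GPU memory, which matches the observation in this thread that offloaded single-GPU runs succeed but take significantly longer.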
The OOM issue arises from the absence of CPU-offloading support in multi-GPU inference, so it's natural for multi-GPU inference to consume more GPU memory than a single-GPU setup with offloading enabled. Nevertheless, we are actively exploring alternative strategies, such as FSDP, to mitigate memory demands during multi-GPU inference.
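For readers unfamiliar with FSDP: unlike the current scheme, where each rank holds a full weight replica, FSDP shards parameters across ranks and can optionally park the shards in CPU memory, which is why it would cut per-GPU usage. An illustrative wrap using PyTorch's public API, not this project's code (build_transformer is again a hypothetical constructor):

```python
# Illustrative only: FSDP shards parameters across ranks (optionally
# offloading shards to CPU), avoiding a full weight replica on every GPU.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank())

model = build_transformer()  # hypothetical constructor for the video model
model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
```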
Thanks for pointing this out; yes, that makes sense. I tested without CPU offload on a single GPU and it gives an OOM error.
Hello, I have a problem. I can't successfully run parallel inference in an environment with 8 L40S GPUs (48GB VRAM each); the run fails with an out-of-memory error on rank 0. However, single-card operation runs successfully, although it takes significantly longer.