
A method to reduce VRAM usage (for reference) #45

Open
peizhiluo007 opened this issue Jun 15, 2024 · 13 comments · May be fixed by #46

Comments

@peizhiluo007 commented Jun 15, 2024

In the following function (ComfyUI-IDM-VTON\src\nodes\pipeline_loader.py):
def load_pipeline(self, weight_dtype):

Make two changes:

Change 1> Comment out every .to(DEVICE) call, every single one.

Change 2> At the end of the function.
Before:
pipe.unet_encoder = unet_encoder
pipe = pipe.to(DEVICE)
pipe.weight_dtype = weight_dtype
After:
pipe.weight_dtype = weight_dtype
pipe.unet_encoder = unet_encoder
pipe.enable_sequential_cpu_offload()
pipe.unet_encoder.to(DEVICE)
#pipe.to(DEVICE)

Tested on a 12 GB GPU with no pressure at all. VRAM usage is a little over 6 GB, so it can probably run on 8 GB as well.
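For anyone reproducing the measurement, a small sketch of reading peak usage with plain torch calls (where you place it is up to you; nvidia-smi will report somewhat more, since it also counts the CUDA context and PyTorch's cached-but-unallocated memory):

import torch

torch.cuda.reset_peak_memory_stats()
# ... run the try-on pass here ...
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM allocated: {peak_gib:.2f} GiB")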

@peizhiluo007 (Author) commented Jun 15, 2024

# Full patched version. WEIGHTS_PATH, DEVICE, TryonPipeline, and
# UNet2DConditionModel_ref come from the plugin's own modules; the other
# classes are the existing diffusers/transformers imports in this file.
def load_pipeline(self, weight_dtype):
    # Map the node's dtype string onto the actual torch dtype.
    if weight_dtype == "float32":
        weight_dtype = torch.float32
    elif weight_dtype == "float16":
        weight_dtype = torch.float16
    elif weight_dtype == "bfloat16":
        weight_dtype = torch.bfloat16
    noise_scheduler = DDPMScheduler.from_pretrained(
        WEIGHTS_PATH, 
        subfolder="scheduler"
    )
    # Change 1: every .to(DEVICE) below is commented out, so all models are
    # first loaded on the CPU.
    vae = AutoencoderKL.from_pretrained(
        WEIGHTS_PATH,
        subfolder="vae",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()#.to(DEVICE)
    unet = UNet2DConditionModel.from_pretrained(
        WEIGHTS_PATH,
        subfolder="unet",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()#.to(DEVICE)
    image_encoder = CLIPVisionModelWithProjection.from_pretrained(
        WEIGHTS_PATH,
        subfolder="image_encoder",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()#.to(DEVICE)
    unet_encoder = UNet2DConditionModel_ref.from_pretrained(
        WEIGHTS_PATH,
        subfolder="unet_encoder",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()#.to(DEVICE)
    text_encoder_one = CLIPTextModel.from_pretrained(
        WEIGHTS_PATH,
        subfolder="text_encoder",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()#.to(DEVICE)
    text_encoder_two = CLIPTextModelWithProjection.from_pretrained(
        WEIGHTS_PATH,
        subfolder="text_encoder_2",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()#.to(DEVICE)
    tokenizer_one = AutoTokenizer.from_pretrained(
        WEIGHTS_PATH,
        subfolder="tokenizer",
        revision=None,
        use_fast=False,
    )
    tokenizer_two = AutoTokenizer.from_pretrained(
        WEIGHTS_PATH,
        subfolder="tokenizer_2",
        revision=None,
        use_fast=False,
    )
    pipe = TryonPipeline.from_pretrained(
        WEIGHTS_PATH,
        unet=unet,
        vae=vae,
        feature_extractor=CLIPImageProcessor(),
        text_encoder=text_encoder_one,
        text_encoder_2=text_encoder_two,
        tokenizer=tokenizer_one,
        tokenizer_2=tokenizer_two,
        scheduler=noise_scheduler,
        image_encoder=image_encoder,
        torch_dtype=weight_dtype,
    )
    pipe.weight_dtype = weight_dtype
    pipe.unet_encoder = unet_encoder
    # Change 2: instead of pipe.to(DEVICE), stream the registered pipeline
    # components to the GPU one submodule at a time as they are used.
    pipe.enable_sequential_cpu_offload()
    # unet_encoder is attached as a plain attribute, not a registered
    # component, so the offload hooks skip it; move it to the GPU manually.
    pipe.unet_encoder.to(DEVICE)
    #pipe.to(DEVICE)
    return (pipe, )
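For context on the tradeoff behind Change 2, a minimal sketch against a plain diffusers pipeline (the SDXL model id is only a stand-in, not this plugin's weights). Sequential offload is the lightest on VRAM but slowest per step; model-level offload is a faster middle ground:

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
# Lowest VRAM, slowest: each submodule is copied to the GPU for its forward
# pass and moved back to the CPU afterwards.
pipe.enable_sequential_cpu_offload()
# Faster middle ground (alternative): whole models are swapped in and out
# instead of individual submodules.
# pipe.enable_model_cpu_offload()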

@TemryL (Owner) commented Jun 15, 2024

Wow that's awesome! Thanks! Could you open a PR with these changes?

@lldacing

Will this make inference slower?

TemryL linked a pull request Jun 18, 2024 that will close this issue
@qtmssa commented Jun 20, 2024

Does this work?

@qtmssa commented Jun 20, 2024

My 4090 (24 GB VRAM) still OOMs :-(
Can anybody help? :-)

@deepfree2023

Works without problems.

@peizhiluo007 (Author) commented Jun 22, 2024

Wow that's awesome! Thanks! Could you open a PR with these changes?

OK, I have submitted it for review.
I also think it would be better to add a lowvram option.

@dachangqing

Why do I still get an out-of-memory error after changing the code exactly as you described?

@925-Studio

Thanks for sharing this tip, it works fine.

@zmwv823 commented Jul 2, 2024

After upgrading diffusers to 0.29.2 I found this plugin broke, so I tried patching it myself, and then came across this trade-time-for-hardware trick of yours and gave it a test for fun (4 GB of VRAM is too small, so I hadn't bothered running it before).
It does work. I generate in stages: first the mask image, then the try-on pass.
During the try-on pass VRAM usage stays around 6 GB, spiking to about 6.5 GB during the final decode.
It would be best to move the pipe into the inference step as well, to make it easy to control whether the model is unloaded to free VRAM; with the pipe split out like this, del pipe no longer works.
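To illustrate that last point, a minimal sketch of owning the pipe inside the inference step so that dropping the last reference really frees the VRAM (the function layout and the names run_tryon/build_pipe are hypothetical, not the plugin's actual node code). As an aside, recent diffusers releases also expose enable_tiling() on AutoencoderKL, which can shave the decode spike:

import gc
import torch

def run_tryon(build_pipe, *args, **kwargs):
    # Hypothetical layout: build_pipe() constructs the pipeline locally,
    # so this function holds the only reference to it.
    pipe = build_pipe()
    try:
        return pipe(*args, **kwargs)   # run the try-on pass
    finally:
        del pipe                       # drop the last reference,
        gc.collect()                   # collect it,
        torch.cuda.empty_cache()       # and hand cached VRAM back to the driver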

@TemryL (Owner) commented Jul 2, 2024

Wow awesome, thank you so much for this finding! Could you create a PR for this?

@Jeff-goal

After upgrading, this method throws an error and vton can no longer be imported. How should I change it?

@zhucenichenghao

It works, bro.
