Disable pipe.vae.enable_tiling leads to RuntimeError: Calculated padded input size per channel

System Info / 系統信息
Torch: 2.1.0
CUDA: 12.2
diffusers: 0.32.0.dev0

Reproduction / 复现过程
Thanks for your contributions and efforts!
I am using a single H100 to run inference. When I turn off all of the diffusers optimizations, or only disable pipe.vae.enable_tiling(), there is an error:
Traceback (most recent call last):
  File "/home/tiger/code/run.py", line 16, in <module>
    video = pipe(
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py", line 776, in __call__
    latents, image_latents = self.prepare_latents(
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py", line 381, in prepare_latents
    image_latents = [retrieve_latents(self.vae.encode(img.unsqueeze(0)), generator) for img in image]
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py", line 381, in <listcomp>
    image_latents = [retrieve_latents(self.vae.encode(img.unsqueeze(0)), generator) for img in image]
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1232, in encode
    h = self._encode(x)
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1204, in _encode
    x_intermediate, conv_cache = self.encoder(x_intermediate, conv_cache=conv_cache)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 807, in forward
    hidden_states, new_conv_cache[conv_cache_key] = down_block(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 439, in forward
    hidden_states, new_conv_cache[conv_cache_key] = resnet(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 304, in forward
    hidden_states, new_conv_cache["conv1"] = self.conv1(hidden_states, conv_cache=conv_cache.get("conv1"))
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 144, in forward
    output = self.conv(inputs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 62, in forward
    output_chunks.append(super().forward(input_chunk))
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py", line 610, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/conv.py", line 605, in _conv_forward
    return F.conv3d(
RuntimeError: Calculated padded input size per channel: (1 x 2402 x 2402). Kernel size: (3 x 3 x 3). Kernel size can't be greater than actual input size
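As an aside, the final error can be reproduced in isolation: F.conv3d refuses to run when the padded input is smaller than the kernel along any dimension, which here is the temporal dimension (a single image frame vs. a 3x3x3 kernel). Below is a minimal, self-contained sketch of that check, independent of diffusers; the tensor shapes are illustrative, not the pipeline's actual ones.

import torch
import torch.nn.functional as F

# Illustration only (not the pipeline code): conv3d raises the same RuntimeError
# when the padded temporal size (1 frame) is smaller than the 3x3x3 kernel.
# Shapes are made up; only the temporal-size-vs-kernel relationship mirrors the
# traceback above.
x = torch.randn(1, 16, 1, 8, 8)        # (batch, channels, frames=1, height, width)
weight = torch.randn(16, 16, 3, 3, 3)  # 3x3x3 kernel, 16 -> 16 channels

try:
    # Spatial padding only, no temporal padding: padded input is (1 x 10 x 10).
    F.conv3d(x, weight, padding=(0, 1, 1))
except RuntimeError as err:
    print(err)  # "Calculated padded input size per channel: (1 x 10 x 10). Kernel size: (3 x 3 x 3). ..."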
If I turn off cpu_offload() or pipe.vae.enable_slicing(), the code runs successfully, taking about 2h35m to generate a video. The full code is here:
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

prompt = "A little girl is riding a bicycle at high speed. Focused, detailed, realistic."
image = load_image(image="image.webp")  # 1024x1024

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B-I2V",
    torch_dtype=torch.bfloat16,
).to("cuda")
# pipe.enable_sequential_cpu_offload()
# pipe.vae.enable_tiling()
# pipe.vae.enable_slicing()

video = pipe(
    prompt=prompt,
    image=image,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)
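For comparison, here is a sketch of the configuration that, per the description above, does complete (in roughly 2h35m). Treating all three optimizations as enabled together is my assumption; the report only confirms that disabling pipe.vae.enable_tiling() triggers the error. enable_sequential_cpu_offload() manages device placement itself, so the explicit .to("cuda") is omitted.

import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Sketch of the working configuration described above (assumption: offload,
# tiling, and slicing enabled together). Everything else matches the repro script.
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B-I2V",
    torch_dtype=torch.bfloat16,
)
pipe.enable_sequential_cpu_offload()  # offloads submodules to CPU between uses
pipe.vae.enable_tiling()              # the call whose removal triggers the error
pipe.vae.enable_slicing()

image = load_image(image="image.webp")  # 1024x1024, as in the original script
video = pipe(
    prompt="A little girl is riding a bicycle at high speed. Focused, detailed, realistic.",
    image=image,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]
export_to_video(video, "output.mp4", fps=8)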
Expected behavior / 期待表现
Hopefully we can disable pipe.vae.enable_tiling() and still run successfully.