Official Implementation of the NeurIPS 2024 paper "Fast and Memory-Efficient Video Diffusion Using Streamlined Inference"
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
Zheng Zhan*, Yushu Wu*, Yifan Gong, Zichong Meng, Zhenglun Kong, Changdi Yang, Geng Yuan, Pu Zhao, Wei Niu, and Yanzhi Wang
Northeastern University, Harvard University, University of Georgia
38th Conference on Neural Information Processing Systems (NeurIPS 2024)
This repo contains a simulation of the Feature Slicer (Sec. 4.1) and Operator Grouping (Sec. 4.2), which together effectively reduce the memory footprint of spatial-temporal models during inference.
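To give a rough intuition for why slicing features can lower peak memory, here is a minimal, framework-free sketch. It is not the code in this repo: `chunk_bounds`, `apply_in_chunks`, and `scale_frames` are hypothetical names, and the sketch assumes an operator that treats each frame independently, so processing the frames in chunks yields the same output while only one chunk's intermediates exist at a time.

```python
from typing import Callable, List, Tuple

Frame = List[float]


def chunk_bounds(length: int, num_chunks: int) -> List[Tuple[int, int]]:
    """Split range(length) into contiguous, near-equal (start, stop) slices."""
    if length < 0 or num_chunks <= 0:
        raise ValueError("length must be non-negative and num_chunks positive")
    # Never create more chunks than there are elements (but at least one).
    num_chunks = min(num_chunks, length) or 1
    base, extra = divmod(length, num_chunks)
    bounds, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional element.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def apply_in_chunks(
    op: Callable[[List[Frame]], List[Frame]],
    frames: List[Frame],
    num_chunks: int,
) -> List[Frame]:
    """Run `op` on one slice of frames at a time and concatenate the results."""
    out: List[Frame] = []
    for start, stop in chunk_bounds(len(frames), num_chunks):
        out.extend(op(frames[start:stop]))
    return out


def scale_frames(chunk: List[Frame]) -> List[Frame]:
    # Hypothetical stand-in for a layer that acts on each frame independently.
    return [[2.0 * value for value in frame] for frame in chunk]


# 14 toy "frames" of two values each, processed in 7 chunks of 2 frames.
frames = [[float(t), float(t) + 0.5] for t in range(14)]
full = scale_frames(frames)
chunked = apply_in_chunks(scale_frames, frames, num_chunks=7)
print(chunked == full)
```

The equivalence only holds for operators that do not mix information across the sliced axis; layers that do (for example, attention over time) need the slicing to run along a different axis.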
NVIDIA A100-SXM4-80GB
NVIDIA A100-PCIE-40GB
NVIDIA A6000
- Stable Video Diffusion
- AnimateDiff
git clone https://github.com/wuyushuwys/FMEDiffusion
cd FMEDiffusion
# if you use conda
conda create -n fme python=3.10 -y
conda activate fme
pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu121
pip install pynvml # for memory-footprint benchmark
pip install .
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_gif
# import our module wrapper
from fme import FMEWrapper
# load pipeline
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid", torch_dtype=torch.float16, variant="fp16"
)
pipe.to('cuda')
# initialize wrapper
helper = FMEWrapper(num_temporal_chunk=7, num_spatial_chunk=7, num_frames=pipe.unet.config.num_frames)
# wrap pipeline
helper.wrap(pipe)
# Inference as normal
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))
generator = torch.manual_seed(42)
# no decode_chunk_size required!
frames = pipe(image, generator=generator).frames[0]
export_to_gif(frames, "generated_fme.gif", fps=7)
- For SVD (num_frames=14, resolution=576x1024), the paper reports an original peak memory of 39.49 GB, which our method reduces to 23.42 GB. When running the example above, you may instead observe a peak of around 24.49 GB with our method and up to 40.39 GB without it. Measured values may therefore differ slightly from those reported in the paper.
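As a quick sanity check on these figures, the relative saving can be worked out directly. The snippet below is plain arithmetic on the four numbers quoted above; `reduction` is a hypothetical helper name, not part of this repo.

```python
def reduction(before_gb: float, after_gb: float) -> tuple:
    """Return (absolute saving in GB, relative saving in percent)."""
    saved = before_gb - after_gb
    return saved, 100.0 * saved / before_gb


# Figures reported in the paper.
paper_saved, paper_pct = reduction(39.49, 23.42)
# Figures you may observe when running the example.
example_saved, example_pct = reduction(40.39, 24.49)

print(f"paper:   {paper_saved:.2f} GB saved ({paper_pct:.1f}%)")      # 16.07 GB saved (40.7%)
print(f"example: {example_saved:.2f} GB saved ({example_pct:.1f}%)")  # 15.90 GB saved (39.4%)
```

Both cases come out at roughly a 40% reduction in peak memory, so the small absolute differences do not change the overall picture.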