Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

PKU-YuanGroup/WF-VAE

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

💡 I also have other projects that may interest you ✨.

Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin and Yunyang Ge and Xinhua Cheng and Zongjian Li and Bin Zhu and Shaodong Wang and Xianyi He and Yang Ye and Shenghai Yuan and Liuhan Chen and Tanghui Jia and Junwu Zhang and Zhenyu Tang and Yatian Pang and Bin She and Cen Yan and Zhiheng Hu and Xiaoyi Dong and Lin Chen and Zhang Pan and Xing Zhou and Shaoling Dong and Yonghong Tian and Li Yuan

📰 News

  • [2024.11.27] 🔥🔥🔥 We have published our report, which provides comprehensive training details and additional experiments.
  • [2024.11.25] 🔥🔥🔥 We have released our 16-channel WF-VAE-L model along with the training code. It is available for download from Hugging Face.

😮 Highlights

WF-VAE uses a multi-level wavelet transform to construct an efficient energy pathway that lets low-frequency information from video data flow into the latent representation. This method achieves competitive reconstruction performance while markedly reducing computational costs.
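To make the idea of an energy pathway concrete, here is a minimal sketch, assuming an orthonormal Haar wavelet applied along a single axis. This is a simplification for illustration: this README does not specify the repository's exact filters or how the transform is applied across time, height, and width. Each level splits the current low band into a half-length approximation and a detail band, and only the approximation is decomposed further. For a smooth signal, nearly all of the energy ends up in the final low band, which is the kind of low-frequency content the highlight describes routing into the latent. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal 1D Haar transform.

    Returns (low, high): scaled pairwise sums (approximation) and
    scaled pairwise differences (detail). Requires an even-length input.
    """
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high


def haar_multilevel(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Repeatedly decompose the low band; return (final_low, details finest-first)."""
    low = list(x)
    details: List[List[float]] = []
    for _ in range(levels):
        low, high = haar_step(low)
        details.append(high)
    return low, details


def energy(v: List[float]) -> float:
    """Sum of squares of the samples."""
    return sum(t * t for t in v)


if __name__ == "__main__":
    # A smooth signal (slow sine plus an offset), so energy should concentrate at low frequency.
    signal = [math.sin(2 * math.pi * n / 64) + 2.0 for n in range(64)]
    low, details = haar_multilevel(signal, levels=3)
    total = energy(signal)
    print(f"low band: {len(low)} samples, {energy(low) / total:.3f} of total energy")
    for lvl, d in enumerate(details, start=1):
        print(f"detail level {lvl}: {len(d)} samples, {energy(d) / total:.4f} of total energy")
```

Because the transform is orthonormal, the band energies sum exactly to the input energy, so the printed fractions directly measure where the signal's energy goes.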

💡 Simpler Architecture, Faster Encoding

  • This simpler architecture substantially increases encoding speed and reduces training costs for large-scale video generation models and data-processing workflows.

🔥 Competitive Reconstruction Performance with SOTA VAEs

  • Our experiments show that our model achieves reconstruction performance competitive with state-of-the-art (SOTA) VAEs.

🚀 Main Results

Reconstruction

Side-by-side reconstruction comparison: WF-VAE vs. CogVideoX

Efficiency

We measure efficiency on 33-frame videos in float32 precision on an H100 GPU. No model uses a block-wise inference strategy. WF-VAE achieves reconstruction performance comparable to state-of-the-art VAEs while significantly reducing encoding costs.
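GPU timings are easy to distort: CUDA launches kernels asynchronously, and first calls pay one-time setup costs. Below is a minimal timing harness, offered as a sketch of how such a measurement can be made rather than the script behind the reported numbers: warm up, synchronize around each timed call, and report the median. The `benchmark` helper is hypothetical. In the commented usage, `vae`, its `encode` method, the (batch, channels, frames, height, width) layout, and the 256x256 resolution are placeholders, not this repository's confirmed API or test settings.

```python
import statistics
import time
from typing import Callable, List, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 3,
    repeats: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock time in seconds of one call to fn.

    sync is an optional barrier (for example torch.cuda.synchronize) called
    before and after each timed call, so the clock does not stop while
    asynchronously launched GPU work is still running.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    # Warm-up calls absorb one-time costs such as kernel selection and allocation.
    for _ in range(warmup):
        fn()
    times: List[float] = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Hypothetical usage with a loaded encoder (all names and sizes are placeholders):
#   import torch
#   x = torch.randn(1, 3, 33, 256, 256, device="cuda", dtype=torch.float32)
#   with torch.no_grad():
#       sec = benchmark(lambda: vae.encode(x), sync=torch.cuda.synchronize)

if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs without a GPU or a model.
    sec = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"median: {sec * 1e3:.2f} ms")
```

The median is reported instead of the mean so that a single slow outlier, such as a garbage-collection pause, does not skew the result.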

๐Ÿ› ๏ธ Requirements and Installation

git clone https://github.com/PKU-YuanGroup/WF-VAE
cd WF-VAE
conda create -n wfvae python=3.10 -y
conda activate wfvae
pip install -r requirements.txt

🤖 Reconstructing Video or Image

To reconstruct a video or an image, execute the following commands:

Video Reconstruction

CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_video.py \
    --model_name WFVAE \
    --from_pretrained "Your VAE" \
    --video_path "Video Path" \
    --rec_path rec.mp4 \
    --device cuda \
    --sample_rate 1 \
    --num_frames 65 \
    --height 512 \
    --width 512 \
    --fps 30 \
    --enable_tiling
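The frame count 65 used above (and the 33 frames in the efficiency test) has the form 4k + 1. Here is a minimal sketch of why that can matter. It assumes the released model compresses video 4x in time with a causal first frame and 8x in height and width, a common setup for video VAEs in this line of work but not stated in this README, so check it against the checkpoint's config. It also assumes `--sample_rate` acts as a frame stride. The 16 latent channels follow the 16-channel WF-VAE-L release mentioned in the news. Both helper functions are hypothetical and not part of the repository.

```python
from typing import List, Tuple

# ASSUMPTIONS (not confirmed by this README): 4x temporal compression with a
# causal first frame, 8x spatial compression, and 16 latent channels.
TEMPORAL_FACTOR = 4
SPATIAL_FACTOR = 8
LATENT_CHANNELS = 16


def sampled_frame_indices(num_frames: int, sample_rate: int, start: int = 0) -> List[int]:
    """Source-frame indices read if --sample_rate is a stride (an assumption)."""
    return [start + i * sample_rate for i in range(num_frames)]


def latent_shape(num_frames: int, height: int, width: int) -> Tuple[int, int, int, int]:
    """(channels, latent_frames, latent_height, latent_width) under the assumptions above."""
    if num_frames < 1 or (num_frames - 1) % TEMPORAL_FACTOR != 0:
        raise ValueError(f"num_frames must be {TEMPORAL_FACTOR}k + 1, got {num_frames}")
    if height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError(f"height and width must be multiples of {SPATIAL_FACTOR}")
    # Causal assumption: frame 0 gets its own latent frame, and each later
    # group of TEMPORAL_FACTOR frames maps to one more latent frame.
    latent_frames = (num_frames - 1) // TEMPORAL_FACTOR + 1
    return (LATENT_CHANNELS, latent_frames, height // SPATIAL_FACTOR, width // SPATIAL_FACTOR)


if __name__ == "__main__":
    print(latent_shape(65, 512, 512))    # (16, 17, 64, 64)
    print(sampled_frame_indices(5, 2))   # [0, 2, 4, 6, 8]
```

Under these assumptions, a 65-frame 512x512 clip becomes 17 latent frames of 64x64, and a frame count such as 64 is rejected because it is not of the form 4k + 1.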

Image Reconstruction

CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_image.py \
    --model_name WFVAE \
    --from_pretrained "Your VAE" \
    --image_path assets/gt_5544.jpg \
    --rec_path rec.jpg \
    --device cuda \
    --short_size 512 
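The `--short_size 512` flag suggests the input image is scaled so that its shorter side becomes 512 pixels. Here is a minimal sketch of that computation, assuming the aspect ratio is preserved. As a further assumption, it computes a centered crop down to multiples of 8 so the spatial dimensions divide evenly for an 8x-downsampling encoder. Neither detail is confirmed by this README, and both helper names are hypothetical.

```python
from typing import Tuple


def short_side_size(width: int, height: int, short_size: int) -> Tuple[int, int]:
    """Scale (width, height) so the shorter side equals short_size, keeping aspect ratio."""
    if min(width, height) <= 0 or short_size <= 0:
        raise ValueError("sizes must be positive")
    short = min(width, height)
    # Multiply before dividing so the shorter side comes out exactly short_size.
    new_w = round(width * short_size / short)
    new_h = round(height * short_size / short)
    return new_w, new_h


def center_crop_box(width: int, height: int, multiple: int = 8) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of the largest centered crop with sides divisible by `multiple`."""
    crop_w = width - width % multiple
    crop_h = height - height % multiple
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


if __name__ == "__main__":
    w, h = short_side_size(1920, 1080, 512)
    print((w, h))                  # (910, 512)
    print(center_crop_box(w, h))   # (3, 0, 907, 512)
```

The box tuple follows the (left, upper, right, lower) convention of Pillow's `Image.crop`, so it could be passed to that method after resizing.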

For further guidance, refer to the example scripts: examples/rec_single_video.sh and examples/rec_single_image.sh.

๐Ÿ—๏ธ Training & Validating

Instructions for training and validation are in TRAIN_AND_VALIDATE.md.

๐Ÿ‘ Acknowledgement

โœ๏ธ Citation

@misc{li2024wfvaeenhancingvideovae,
      title={WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model}, 
      author={Zongjian Li and Bin Lin and Yang Ye and Liuhan Chen and Xinhua Cheng and Shenghai Yuan and Li Yuan},
      year={2024},
      eprint={2411.17459},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17459}, 
}

🔒 License

This project is released under the Apache 2.0 license as found in the LICENSE file.
