💡 I also have other projects that may interest you ✨.
Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin, Yunyang Ge, Xinhua Cheng, Zongjian Li, Bin Zhu, Shaodong Wang, Xianyi He, Yang Ye, Shenghai Yuan, Liuhan Chen, Tanghui Jia, Junwu Zhang, Zhenyu Tang, Yatian Pang, Bin She, Cen Yan, Zhiheng Hu, Xiaoyi Dong, Lin Chen, Zhang Pan, Xing Zhou, Shaoling Dong, Yonghong Tian, and Li Yuan
- [2024.11.27] 🔥🔥🔥 We have published our report, which provides comprehensive training details and additional experiments.
- [2024.11.25] 🔥🔥🔥 We have released our 16-channel WF-VAE-L model along with the training code. You can download it from Hugging Face.
WF-VAE uses a multi-level wavelet transform to construct an efficient energy pathway that lets low-frequency information from video data flow into the latent representation. This achieves competitive reconstruction performance while markedly reducing computational cost.
- This architecture substantially improves speed and reduces training costs in large-scale video generation models and data processing workflows.
- Our experiments show that WF-VAE performs competitively with state-of-the-art (SOTA) VAEs.
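To illustrate the idea behind the low-frequency energy pathway, here is a minimal sketch (not the WF-VAE implementation): a multi-level 2-D Haar transform that repeatedly keeps only the low-frequency (LL) sub-band, so coarse content is carried forward at progressively lower resolution.

```python
# Illustrative only: multi-level Haar decomposition of one video frame,
# keeping the low-frequency sub-band at each level.
import numpy as np

def haar_dwt2(x):
    """One level of a 2-D Haar transform on an (H, W) array (H, W even).
    Returns the low-frequency sub-band LL and the detail sub-bands."""
    lo = (x[0::2, :] + x[1::2, :]) / 2.0   # row-wise average
    hi = (x[0::2, :] - x[1::2, :]) / 2.0   # row-wise difference
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2.0  # column-wise average of averages
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2.0
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2.0
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

def multilevel_lowfreq(frame, levels=2):
    """Repeatedly keep only the LL sub-band, halving resolution each level."""
    ll = frame
    for _ in range(levels):
        ll, _ = haar_dwt2(ll)
    return ll

frame = np.random.rand(512, 512).astype(np.float32)
low = multilevel_lowfreq(frame, levels=2)
print(low.shape)  # (128, 128)
```

Because each level only averages pixels, the LL sub-band preserves the frame's mean while shrinking it 4x per level; the actual model feeds such low-frequency sub-bands into the encoder's main pathway.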
*Side-by-side reconstruction comparison: WF-VAE vs. CogVideoX.*
We conducted efficiency tests on 33-frame videos at float32 precision on an H100 GPU; all models ran without block-wise inference strategies. Our model matched the performance of state-of-the-art VAEs while significantly reducing encoding cost.
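A generic sketch of such a timing harness (a hypothetical example with a dummy stand-in encoder, not our benchmark code): warm-up passes come first, and for GPU backends a synchronization callback must run so asynchronous kernels finish before the clock is read.

```python
import time
import numpy as np

def time_encode(encode_fn, video, warmup=2, iters=5, sync_fn=None):
    """Average wall-clock time of encode_fn(video) over several iterations.
    For GPU backends pass sync_fn (e.g. torch.cuda.synchronize) so that
    queued asynchronous kernels complete before timestamps are taken."""
    for _ in range(warmup):        # warm-up: caches, lazy init, autotuning
        encode_fn(video)
    if sync_fn:
        sync_fn()
    start = time.perf_counter()
    for _ in range(iters):
        encode_fn(video)
    if sync_fn:
        sync_fn()
    return (time.perf_counter() - start) / iters

# Dummy stand-in "encoder": 4x spatial average-pooling as a placeholder.
def dummy_encode(v):
    t, h, w = v.shape
    return v.reshape(t, h // 4, 4, w // 4, 4).mean(axis=(2, 4))

video = np.random.rand(33, 512, 512).astype(np.float32)  # 33 frames, float32
print(f"avg encode time: {time_encode(dummy_encode, video):.4f} s")
```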
```bash
git clone https://github.com/PKU-YuanGroup/WF-VAE
cd WF-VAE
conda create -n wfvae python=3.10 -y
conda activate wfvae
pip install -r requirements.txt
```
To reconstruct a video or an image, execute the following commands:
```bash
CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_video.py \
    --model_name WFVAE \
    --from_pretrained "Your VAE" \
    --video_path "Video Path" \
    --rec_path rec.mp4 \
    --device cuda \
    --sample_rate 1 \
    --num_frames 65 \
    --height 512 \
    --width 512 \
    --fps 30 \
    --enable_tiling
```
```bash
CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_image.py \
    --model_name WFVAE \
    --from_pretrained "Your VAE" \
    --image_path assets/gt_5544.jpg \
    --rec_path rec.jpg \
    --device cuda \
    --short_size 512
```
For further guidance, refer to the example scripts `examples/rec_single_video.sh` and `examples/rec_single_image.sh`.
Training and validation instructions are in TRAIN_AND_VALIDATE.md.
- Open-Sora Plan - https://github.com/PKU-YuanGroup/Open-Sora-Plan
- Allegro - https://github.com/rhymes-ai/Allegro
- CogVideoX - https://github.com/THUDM/CogVideo
- Stable Diffusion - https://github.com/CompVis/stable-diffusion
```bibtex
@misc{li2024wfvaeenhancingvideovae,
      title={WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model},
      author={Zongjian Li and Bin Lin and Yang Ye and Liuhan Chen and Xinhua Cheng and Shenghai Yuan and Li Yuan},
      year={2024},
      eprint={2411.17459},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17459},
}
```
This project is released under the Apache 2.0 license as found in the LICENSE file.