💡 I also have other projects that may interest you ✨.
Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin, Yunyang Ge, Xinhua Cheng, Zongjian Li, Bin Zhu, Shaodong Wang, Xianyi He, Yang Ye, Shenghai Yuan, Liuhan Chen, Tanghui Jia, Junwu Zhang, Zhenyu Tang, Yatian Pang, Bin She, Cen Yan, Zhiheng Hu, Xiaoyi Dong, Lin Chen, Zhang Pan, Xing Zhou, Shaoling Dong, Yonghong Tian, and Li Yuan
- [2024.11.27] 🔥🔥🔥 We have published our report, which provides comprehensive training details and additional experiments.
- [2024.11.25] 🔥🔥🔥 We have released our 16-channel WF-VAE-L model along with the training code. You can download it from Hugging Face.
WF-VAE uses a multi-level wavelet transform to construct an efficient energy pathway that lets low-frequency information from video data flow into the latent representation. This achieves competitive reconstruction performance while markedly reducing computational cost.
- This architecture substantially improves speed and reduces training costs in large-scale video generation models and data processing workflows.
- Our experiments show that WF-VAE performs competitively with state-of-the-art (SOTA) VAEs.
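To illustrate the idea behind the low-frequency energy pathway, here is a minimal sketch (not the WF-VAE implementation): a multi-level 2-D Haar transform that repeatedly keeps only the low-frequency (LL) sub-band, so coarse content is carried forward at progressively lower resolution.

```python
# Illustrative only: multi-level Haar decomposition of one video frame,
# keeping the low-frequency sub-band at each level.
import numpy as np

def haar_dwt2(x):
    """One level of a 2-D Haar transform on an (H, W) array (H, W even).
    Returns the low-frequency sub-band LL and the detail sub-bands."""
    lo = (x[0::2, :] + x[1::2, :]) / 2.0   # row-wise average
    hi = (x[0::2, :] - x[1::2, :]) / 2.0   # row-wise difference
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2.0  # column-wise average of averages
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2.0
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2.0
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

def multilevel_lowfreq(frame, levels=2):
    """Repeatedly keep only the LL sub-band, halving resolution each level."""
    ll = frame
    for _ in range(levels):
        ll, _ = haar_dwt2(ll)
    return ll

frame = np.random.rand(512, 512).astype(np.float32)
low = multilevel_lowfreq(frame, levels=2)
print(low.shape)  # (128, 128)
```

Because each level only averages pixels, the LL sub-band preserves the frame's mean while shrinking it 4x per level; the actual model feeds such low-frequency sub-bands into the encoder's main pathway.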
*Side-by-side reconstruction comparison: WF-VAE vs. CogVideoX.*
We conducted efficiency tests on 33-frame videos at float32 precision on an H100 GPU; all models ran without block-wise inference strategies. Our model matched the performance of state-of-the-art VAEs while significantly reducing encoding cost.
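A generic sketch of such a timing harness (a hypothetical example with a dummy stand-in encoder, not our benchmark code): warm-up passes come first, and for GPU backends a synchronization callback must run so asynchronous kernels finish before the clock is read.

```python
import time
import numpy as np

def time_encode(encode_fn, video, warmup=2, iters=5, sync_fn=None):
    """Average wall-clock time of encode_fn(video) over several iterations.
    For GPU backends pass sync_fn (e.g. torch.cuda.synchronize) so that
    queued asynchronous kernels complete before timestamps are taken."""
    for _ in range(warmup):        # warm-up: caches, lazy init, autotuning
        encode_fn(video)
    if sync_fn:
        sync_fn()
    start = time.perf_counter()
    for _ in range(iters):
        encode_fn(video)
    if sync_fn:
        sync_fn()
    return (time.perf_counter() - start) / iters

# Dummy stand-in "encoder": 4x spatial average-pooling as a placeholder.
def dummy_encode(v):
    t, h, w = v.shape
    return v.reshape(t, h // 4, 4, w // 4, 4).mean(axis=(2, 4))

video = np.random.rand(33, 512, 512).astype(np.float32)  # 33 frames, float32
print(f"avg encode time: {time_encode(dummy_encode, video):.4f} s")
```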
```bash
git clone https://github.com/PKU-YuanGroup/WF-VAE
cd WF-VAE
conda create -n wfvae python=3.10 -y
conda activate wfvae
pip install -r requirements.txt
```
To reconstruct a video or an image, execute the following commands:
```bash
CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_video.py \
    --model_name WFVAE \
    --from_pretrained "Your VAE" \
    --video_path "Video Path" \
    --rec_path rec.mp4 \
    --device cuda \
    --sample_rate 1 \
    --num_frames 65 \
    --height 512 \
    --width 512 \
    --fps 30 \
    --enable_tiling
```
```bash
CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_image.py \
    --model_name WFVAE \
    --from_pretrained "Your VAE" \
    --image_path assets/gt_5544.jpg \
    --rec_path rec.jpg \
    --device cuda \
    --short_size 512
```
For further guidance, refer to the example scripts `examples/rec_single_video.sh` and `examples/rec_single_image.sh`.
Training and validation instructions are in TRAIN_AND_VALIDATE.md.
- Open-Sora Plan - https://github.com/PKU-YuanGroup/Open-Sora-Plan
- Allegro - https://github.com/rhymes-ai/Allegro
- CogVideoX - https://github.com/THUDM/CogVideo
- Stable Diffusion - https://github.com/CompVis/stable-diffusion
```bibtex
@misc{li2024wfvaeenhancingvideovae,
      title={WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model},
      author={Zongjian Li and Bin Lin and Yang Ye and Liuhan Chen and Xinhua Cheng and Shenghai Yuan and Li Yuan},
      year={2024},
      eprint={2411.17459},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17459},
}
```
This project is released under the Apache 2.0 license as found in the LICENSE file.