TL;DR: DiffCollage is a scalable probabilistic model that synthesizes large content (long images, looped motions, 360° images) in parallel, using diffusion models trained only on pieces of the content.
Diffusion models and notation in this codebase follow EDM:

- $\sigma = t$
- Diffusion forward process: $x_\sigma = x_0 + \sigma \epsilon$
- Diffusion training objective: $\|x_0 - x_\theta(x_0 + \sigma \epsilon, \sigma)\|^2$ (data prediction model) or $\|\epsilon - \epsilon_\theta(x_0 + \sigma \epsilon, \sigma)\|^2$ (noise prediction model)
- Conversion between data prediction and noise prediction models: $\epsilon_\theta(x_\sigma, \sigma) = \frac{x_\sigma - x_\theta(x_\sigma, \sigma)}{\sigma}$
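The conversion between the two parameterizations can be checked numerically. A minimal sketch with a toy data-prediction model (`x0_model` below is a stand-in for a trained network, not part of the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

def x0_model(x_sigma, sigma):
    # Toy data-prediction model: a stand-in for a trained network.
    # It simply shrinks the input toward zero, as a denoiser would.
    return x_sigma / (1.0 + sigma**2)

def eps_from_x0(x_sigma, sigma):
    # Derived noise-prediction model:
    # eps_theta(x_sigma, sigma) = (x_sigma - x_theta(x_sigma, sigma)) / sigma
    return (x_sigma - x0_model(x_sigma, sigma)) / sigma

sigma = 2.0
x0 = rng.standard_normal((3, 64, 64))
x_sigma = x0 + sigma * rng.standard_normal(x0.shape)  # forward process

eps_pred = eps_from_x0(x_sigma, sigma)
# Inverting the formula recovers the data prediction exactly.
x0_back = x_sigma - sigma * eps_pred
assert np.allclose(x0_back, x0_model(x_sigma, sigma))
```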
Please be aware of the following points when using this software:
- **Model conversion**: if your model was trained with a parameterization other than EDM, you may need to convert it to the EDM parameterization via a change of variables.
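As an illustration of such a change of variables, a model trained with a VP/DDPM-style parameterization ($x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$) can be wrapped as an EDM noise-prediction model by setting $\sigma = \sqrt{1-\bar\alpha_t}/\sqrt{\bar\alpha_t}$ and scaling the input as $x_t = \sqrt{\bar\alpha_t}\,x_\sigma$. A minimal sketch with a hypothetical VP network and a cosine schedule chosen so that $\sigma(t)$ inverts in closed form (both are placeholders, not repo code):

```python
import numpy as np

def vp_eps_model(x_t, t):
    # Hypothetical VP-parameterized noise-prediction network (placeholder).
    return np.tanh(x_t) * (1.0 - t)

def alpha_bar(t):
    # Hypothetical VP noise schedule: alpha_bar decays from 1 toward 0.
    return np.cos(0.5 * np.pi * t) ** 2

def t_from_sigma(sigma):
    # Invert sigma(t) = sqrt(1 - alpha_bar(t)) / sqrt(alpha_bar(t)).
    # For this schedule, sigma = tan(pi t / 2), so t = (2/pi) arctan(sigma).
    return (2.0 / np.pi) * np.arctan(sigma)

def edm_eps_model(x_sigma, sigma):
    # Change of variables: scale the EDM input down to the VP input and
    # query the VP model at the matching time step.
    t = t_from_sigma(sigma)
    scale = np.sqrt(alpha_bar(t))
    return vp_eps_model(scale * x_sigma, t)
```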
- **Sampling hyperparameters**: we find that stochastic sampling algorithms perform much better than deterministic ones when the number of sampling steps is large.
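The `s_churn` argument in the example below controls how much stochasticity the EDM-style sampler injects per step: the noise level is temporarily raised from $\sigma$ to $\hat\sigma = (1+\gamma)\sigma$ by adding fresh Gaussian noise. A standalone sketch of that churn step (an illustration of the idea, not the repo's implementation):

```python
import numpy as np

def churn_step(x, sigma, gamma, rng):
    # Raise the noise level from sigma to sigma_hat = (1 + gamma) * sigma
    # by injecting fresh noise with the matching variance gap.
    sigma_hat = (1.0 + gamma) * sigma
    noise_std = np.sqrt(sigma_hat**2 - sigma**2)
    x_hat = x + noise_std * rng.standard_normal(x.shape)
    return x_hat, sigma_hat

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 64, 64))
x_hat, sigma_hat = churn_step(x, sigma=1.0, gamma=0.5, rng=rng)
```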
```python
import diff_collage as dc

def test(eps_fn, s_churn=10.0):
    n_step = 40              # number of sampling steps
    overlap_size = 32        # overlap between neighboring square images
    num_img = 11             # number of square images
    batch_size = 5
    ts_order = 5             # order of the sampling timestamp schedule
    img_shape = (3, 64, 64)  # shape of each square image

    # sampling with the conditional-independence assumption
    worker = dc.condind_long.CondIndLong(img_shape, eps_fn, num_img, overlap_size=overlap_size)
    sample = dc.sampling(
        x=worker.generate_xT(batch_size),
        noise_fn=worker.noise,
        rev_ts=worker.rev_ts(n_step, ts_order),
        x0_pred_fn=worker.x0_fn,
        s_churn=s_churn,
        is_traj=False,  # whether to return the sampling trajectory
    )

    # sampling with the average-noise method
    worker = dc.AvgLong(img_shape, eps_fn, num_img, overlap_size=overlap_size)
    sample = dc.sampling(
        x=worker.generate_xT(batch_size),
        noise_fn=worker.noise,
        rev_ts=worker.rev_ts(n_step, ts_order),
        x0_pred_fn=worker.x0_fn,
        s_churn=s_churn,
        is_traj=False,  # whether to return the sampling trajectory
    )
```
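Both workers above assemble one long image from overlapping square pieces: with `num_img` tiles of width 64 and `overlap_size` 32, the stitched image is $64 + (\text{num\_img} - 1)\,(64 - 32)$ pixels wide. A standalone numpy sketch of averaging per-tile values over the overlaps (an illustration of the stitching idea, not the repo's implementation):

```python
import numpy as np

def stitch_average(tiles, overlap):
    # tiles: list of (H, W) arrays laid out left to right; overlapping
    # columns are averaged so neighboring tiles agree in the final image.
    h, w = tiles[0].shape
    stride = w - overlap
    total_w = w + (len(tiles) - 1) * stride
    acc = np.zeros((h, total_w))
    cnt = np.zeros(total_w)
    for i, tile in enumerate(tiles):
        acc[:, i * stride : i * stride + w] += tile
        cnt[i * stride : i * stride + w] += 1.0
    return acc / cnt  # divide by the per-column tile count

# 11 tiles of 64x64 with overlap 32 -> a 64 x 384 long image
tiles = [np.full((64, 64), float(i)) for i in range(11)]
long_img = stitch_average(tiles, overlap=32)
assert long_img.shape == (64, 384)
```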
The repo also provides demo code for looped motion generation, built on pretrained Human Motion Diffusion Model (MDM) checkpoints:
```bash
sudo apt update
sudo apt install ffmpeg
conda env create -f environment.yml
conda activate mdm
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh

# download the pretrained MDM checkpoint
cd save
gdown "https://drive.google.com/u/0/uc?id=1PE0PK8e5a5j-7-Xhs5YET5U5pGh0c821&export=download&confirm=t"
unzip humanml_trans_enc_512.zip
cd ..

# sanity check
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --text_prompt "the person walked forward and is picking up his toolbox."
```
```bibtex
@inproceedings{zhange2023diffcollage,
  title={DiffCollage: Parallel Generation of Large Content with Diffusion Models},
  author={Qinsheng Zhang and Jiaming Song and Xun Huang and Yongxin Chen and Ming-Yu Liu},
  booktitle={CVPR},
  year={2023}
}
```