How to use diffusers to train Cogvideox1.5-t2v? #563

NickPan7779 · 2024-11-29T09:05:41Z

System Info

python3.10
cuda12.1
diffusers:0.32.0.dev0
ubuntu:20.04
GPU:8xA800

Information

The official example scripts
My own modified scripts

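Reproduction

The code is the main branch as of November 28. Fine-tuning CogVideoX1.5 t2v through diffusers with finetune_single_rank.sh fails with the error below. I have already set height to 768, width to 1360, max_num_frames to 81, and fps to 16. Could you please take a look? Thanks!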

Traceback (most recent call last):
  File "/home/p00513699/CogVideo-1128-gpu/finetune/train_cogvideox_lora.py", line 1548, in <module>
    main(args)
  File "/home/p00513699/CogVideo-1128-gpu/finetune/train_cogvideox_lora.py", line 1366, in main
    model_output = transformer(
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/p00513699/CogVideo-1128-gpu/sat/diffusers/src/diffusers/models/transformers/cogvideox_transformer_3d.py", line 476, in forward
    hidden_states = self.patch_embed(encoder_hidden_states, hidden_states)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/p00513699/CogVideo-1128-gpu/sat/diffusers/src/diffusers/models/embeddings.py", line 431, in forward
    image_embeds = image_embeds.reshape(
RuntimeError: shape '[1, 10, 2, 48, 2, 85, 2, 16]' is invalid for input of size 5483520
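
A quick check of the numbers in this error message may help narrow things down. The sketch below is not part of the original report. It assumes the target shape is laid out as [batch, frames // p_t, p_t, height // p, p, width // p, p, channels], which is an assumption based on the shape's values (48 * 2 = 768 / 8, 85 * 2 = 1360 / 8) and not confirmed here. It only uses the two figures printed in the traceback.

```python
from math import prod

# Values copied from the RuntimeError message above.
target_shape = [1, 10, 2, 48, 2, 85, 2, 16]
input_size = 5483520

# Number of elements the reshape target can hold.
target_size = prod(target_shape)

# Assumed layout: [batch, frames // p_t, p_t, h // p, p, w // p, p, channels].
batch = target_shape[0]
p_t = target_shape[2]
per_frame = prod(target_shape[3:])  # elements in one latent frame

# Latent frame count implied by the actual tensor vs. the reshape target.
actual_frames = input_size // (batch * per_frame)
expected_frames = target_shape[1] * p_t

print(f"target holds {target_size} elements, input has {input_size}")
print(f"latent frames: actual={actual_frames}, reshape expects={expected_frames}")
print(f"actual frames divisible by p_t={p_t}: {actual_frames % p_t == 0}")
```

Under that assumed layout, the input tensor has 21 latent frames while the reshape target only accounts for 20, because 21 is not divisible by the temporal patch size of 2. If the VAE compresses time by a factor of 4 (also an assumption), 81 input frames would give (81 - 1) / 4 + 1 = 21 latent frames, which matches the input size.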

finetune_single_rank.sh

#!/bin/bash

export MODEL_PATH="CogVideoX1.5-5B"
export CACHE_PATH="$HOME/.cache"
export DATASET_PATH="Disney-VideoGeneration-Dataset"
export OUTPUT_PATH="cogvideox1.5-lora-single-node"
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

# If you are not using 8 GPUs, set num_processes in `accelerate_config_machine_single.yaml` to your GPU count
accelerate launch --config_file accelerate_config_machine_single.yaml --multi_gpu \
  train_cogvideox_lora.py \
  --gradient_checkpointing \
  --pretrained_model_name_or_path $MODEL_PATH \
  --cache_dir $CACHE_PATH \
  --enable_tiling \
  --enable_slicing \
  --instance_data_root $DATASET_PATH \
  --caption_column prompt.txt \
  --video_column videos.txt \
  --validation_prompt "DISNEY A black and white animated scene unfolds with an anthropomorphic goat surrounded by musical notes and symbols, suggesting a playful environment. Mickey Mouse appears, leaning forward in curiosity as the goat remains still. The goat then engages with Mickey, who bends down to converse or react. The dynamics shift as Mickey grabs the goat, potentially in surprise or playfulness, amidst a minimalistic background. The scene captures the evolving relationship between the two characters in a whimsical, animated setting, emphasizing their interactions and emotions:::A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance" \
  --validation_prompt_separator ::: \
  --num_validation_videos 1 \
  --validation_epochs 100 \
  --seed 42 \
  --rank 128 \
  --lora_alpha 64 \
  --mixed_precision bf16 \
  --output_dir $OUTPUT_PATH \
  --height 768 \
  --width 1360 \
  --fps 16 \
  --max_num_frames 81 \
  --skip_frames_start 0 \
  --skip_frames_end 0 \
  --train_batch_size 1 \
  --num_train_epochs 30 \
  --checkpointing_steps 1000 \
  --gradient_accumulation_steps 1 \
  --learning_rate 1e-3 \
  --lr_scheduler cosine_with_restarts \
  --lr_warmup_steps 200 \
  --lr_num_cycles 1 \
  --optimizer AdamW \
  --adam_beta1 0.9 \
  --adam_beta2 0.95 \
  --max_grad_norm 1.0 \
  --allow_tf32 \
  --report_to wandb

Expected behavior

Fine-tune CogVideoX1.5 t2v with diffusers.
