Help! Merging the model fails after LoRA fine-tuning Qwen2VL #2495

Open
gxlover0625 opened this issue Nov 25, 2024 · 1 comment

Comments

@gxlover0625

Describe the bug

Step 1: LoRA fine-tuning (works)

I fine-tuned Qwen2VL with LoRA. Training ran normally with no errors. The fine-tuning command was:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 NPROC_PER_NODE=8 swift sft \
  --model_type qwen2-vl-7b-instruct \
  --model_id_or_path /home/llm/qwen/Qwen2-VL-7B-Instruct \
  --max_length 1024 \
  --sft_type lora \
  --lora_rank 8 \
  --lora_alpha 16 \
  --lora_dropout 0.0 \
  --dataset /home/Fixed-Train-Dataset/alignment-v3.jsonl \
  --learning_rate 0.0001 \
  --save_only_model true \
  --dataset_test_ratio 0.05 \
  --batch_size 8 \
  --eval_batch_size 8 \
  --num_train_epochs 3 \
  --gradient_accumulation_steps 2 \
  --lr_scheduler_type cosine \
  --warmup_ratio 0.1 \
  --eval_steps 50 \
  --save_steps 50 \
  --logging_steps 10 \
  --preprocess_num_proc 4 \
  --logging_dir /home/sft_middle_results/1125/alignment-v3 \
  --output_dir /home/sft_middle_results/1125/alignment-v3 \
  --save_strategy steps \
  --evaluation_strategy steps \
  --add_output_dir_suffix false

The training log is shown below:
image
image

Step 2: LoRA merge (works)

I merged the LoRA weights with the following command. The merge also completed normally:

swift merge-lora \
  --ckpt_dir /home/sft_middle_results/1125/alignment-v3/checkpoint-891
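
For reference, a sketch of what a LoRA merge computes, assuming the standard LoRA scaling (not rsLoRA or another variant; the issue does not say which is in use). The values r = 8 and α = 16 come from `--lora_rank` and `--lora_alpha` in the training command above; W_0, A, and B are illustrative symbols, not names taken from swift:

```latex
% W_0: frozen base weight of one adapted layer, shape d_out x d_in
% B (d_out x r) and A (r x d_in): the trained low-rank LoRA factors
\[
  W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A
  \qquad\text{with } r = 8,\ \alpha = 16
  \ \Rightarrow\ \frac{\alpha}{r} = 2 .
\]
```

Under this assumption, the merge only rewrites weight matrices. The tokenizer and processor files in the merged folder are written by a separate save step.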

image

Step 3: Inference with the merged model (bug)

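from transformers import AutoProcessor

# model_path is the merged checkpoint directory; min_pixels and max_pixels
# are defined elsewhere (not shown in the issue).
processor = AutoProcessor.from_pretrained(
    model_path,
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)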

The code fails with: Exception: data did not match any variant of untagged enum ModelWrapper at line 757371 column 3
image
Can the team fix this? Any help is appreciated.
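
One way to narrow this down: the error message has the shape of a parse failure on `tokenizer.json` raised by the `tokenizers` library, and a commonly reported cause is a `tokenizer.json` written by a newer `tokenizers` release than the one loading it. This is an assumption, not something confirmed in this thread. The sketch below uses only the standard library to compare `tokenizer.json` in the original and merged folders. The function names are hypothetical helpers, not part of swift or transformers.

```python
import hashlib
import json
from pathlib import Path


def describe_tokenizer_json(model_dir):
    """Summarize tokenizer.json in model_dir without loading any ML library."""
    path = Path(model_dir) / "tokenizer.json"
    if not path.is_file():
        return {"exists": False}
    raw = path.read_bytes()
    data = json.loads(raw.decode("utf-8"))
    model = data.get("model", {})
    merges = model.get("merges", [])
    return {
        "exists": True,
        "sha256": hashlib.sha256(raw).hexdigest(),
        "newline_count": raw.count(b"\n"),
        "model_type": model.get("type"),
        # Assumption: newer `tokenizers` releases may write each BPE merge as
        # a two-item list, while older releases expect one "a b" string.
        "merge_entry_type": type(merges[0]).__name__ if merges else None,
    }


def compare_tokenizer_json(original_dir, merged_dir):
    """Report both summaries and whether the two files are byte-identical."""
    original = describe_tokenizer_json(original_dir)
    merged = describe_tokenizer_json(merged_dir)
    identical = (
        original["exists"]
        and merged["exists"]
        and original["sha256"] == merged["sha256"]
    )
    return {"original": original, "merged": merged, "identical": identical}
```

Call `compare_tokenizer_json("/home/llm/qwen/Qwen2-VL-7B-Instruct", merged_dir)`, where `merged_dir` is a placeholder for the folder produced by step 2. If `merge_entry_type` is `str` for the original and `list` for the merged folder, a `tokenizers` version mismatch between the merge environment and the inference environment is a plausible suspect.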

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here
Linux, 8× H20 GPUs, torch 2.5.1+cuda124, CUDA 11.8
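
Since the failure happens while loading the processor, the installed versions of `transformers` and `tokenizers` are likely relevant as well. A minimal sketch for collecting them with the standard library follows. The package list is an assumption about what is installed in this environment (`ms-swift` is the PyPI distribution name for swift), and `installed_versions` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(
    packages=("torch", "transformers", "tokenizers", "peft", "ms-swift"),
):
    """Return {distribution name: version string or 'not installed'}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver}")
```

Running this in both the environment used for `swift merge-lora` and the one used for inference shows whether the two differ.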

Additional context
Add any other context about the problem here
The original Qwen2VL folder:
image
The folder after merging LoRA:
image
The training loss decreased normally:
image

@Jintao-Huang
Collaborator

same issue here: #2494
