Describe the bug
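What the bug is and how to reproduce it, preferably with screenshots
LoRA fine-tuning script
The LoRA fine-tuning script is shown below. If I use --resume_from_checkpoint to load the fine-tuned model and continue training, the run fails with an error. GPU: A800 with 40 GB of memory.
If I fine-tune the model without the --resume_from_checkpoint argument, training works.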
Your hardware and system info
Write your system info here, such as CUDA version, OS, GPU model, and torch version
torch 2.1.2+cu121
ms-swift 2.6.0.post2
transformers 4.42.0
NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.1
Additional context
Add any other context about the problem here
On a machine with an A800-80G GPU, fine-tuning with the --resume_from_checkpoint argument works. It just uses somewhat more GPU memory than a run without --resume_from_checkpoint.
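To put a number on "somewhat more GPU memory", it may help to record memory use in both runs. The following is only a sketch, not part of ms-swift: the helper names parse_gpu_memory and read_gpu_memory are hypothetical, and it assumes nvidia-smi is on the PATH and supports the --query-gpu CSV output shown.

```python
import subprocess

# nvidia-smi query that prints one "used, total" line per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse lines like '30512, 40960' into (used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def read_gpu_memory():
    """Run nvidia-smi once and return per-GPU (used_mib, total_mib) tuples."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling read_gpu_memory() periodically during a run with --resume_from_checkpoint and a run without it, then comparing the highest used value against the 40 GB limit, would show how much extra memory the resumed run needs.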
xyz515
changed the title
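Continuing training of a LoRA fine-tuned model reports out of GPU memory
Continuing training of a LoRA fine-tuned model with --resume_from_checkpoint reports out of GPU memory; training without --resume_from_checkpoint works normally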
Nov 26, 2024