
[QA] Can InternEvo load pretrained llama2 weights? #268

Open
JunZhan2000 opened this issue Jul 2, 2024 · 4 comments
Labels
question Further information is requested

Comments


JunZhan2000 commented Jul 2, 2024

Describe the question

Can InternEvo load pretrained llama2 weights and then continue pre-training? Should the weights be in the HF format or the original format?

@JunZhan2000 JunZhan2000 added the question Further information is requested label Jul 2, 2024
@JunZhan2000 (Author)

```
Traceback (most recent call last):
  File "/root/dataDisk/internlm/openpoet/train.py", line 335, in <module>
    main(args)
  File "/root/dataDisk/internlm/openpoet/train.py", line 149, in main
    ckpt_manager.try_resume_training(train_state, current_time)
  File "/root/dataDisk/internlm/internlm/checkpoint/checkpoint_manager.py", line 551, in try_resume_training
    load_content_str = load_func(self, self.load_ckpt_info, train_state)
  File "/root/dataDisk/internlm/internlm/checkpoint/checkpoint_manager.py", line 208, in try_load_internlm_ckpt_func
    func(folder=load_ckpt_folder, model=ckpt_mm.model)
  File "/root/dataDisk/internlm/internlm/checkpoint/load_funcs.py", line 179, in load_hf_llama_pretrained_weights
    missing_keys, unexpected_keys = model.load_state_dict(current_states, strict=False)
  File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Llama2:
	size mismatch for layers.0.feed_forward.w2.weight: copying a param with shape torch.Size([11008, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 11008]).
```

@sunpengsdu (Contributor)

@zigzagcai could you take a look?

@zigzagcai (Collaborator)

zigzagcai commented Jul 10, 2024

Yes, this is supported.

@zigzagcai (Collaborator)

zigzagcai commented Jul 12, 2024

> RuntimeError: Error(s) in loading state_dict for Llama2: size mismatch for layers.0.feed_forward.w2.weight: copying a param with shape torch.Size([11008, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 11008]).

I can reproduce this error; it has been fixed in this PR: #276

The root cause is that in an early version of InternEvo's LLaMA implementation, the ffn w2 and w3 layers were swapped relative to Meta's released LLaMA. After the model was later aligned with Meta's LLaMA, the load func was not updated accordingly, which caused the mismatch.
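The swap can be illustrated with a minimal, hypothetical sketch. The HF names `gate_proj`/`down_proj`/`up_proj` are the standard HF LLaMA feed-forward names, and `feed_forward.w1/w2/w3` follow the traceback and Meta's LLaMA naming; the mappings below are illustrative, not the actual code in `load_funcs.py`:

```python
# LLaMA-2 7B feed-forward shapes (dim = 4096, hidden = 11008), as
# they appear in an HF checkpoint (weight shape is (out, in)).
HF_SHAPES = {
    "model.layers.0.mlp.gate_proj.weight": (11008, 4096),
    "model.layers.0.mlp.down_proj.weight": (4096, 11008),
    "model.layers.0.mlp.up_proj.weight": (11008, 4096),
}

# Shapes the current (Meta-aligned) InternEvo model expects.
MODEL_SHAPES = {
    "layers.0.feed_forward.w1.weight": (11008, 4096),
    "layers.0.feed_forward.w2.weight": (4096, 11008),
    "layers.0.feed_forward.w3.weight": (11008, 4096),
}

# Correct mapping after aligning with Meta's LLaMA.
CORRECT_MAP = {
    "mlp.gate_proj": "feed_forward.w1",
    "mlp.down_proj": "feed_forward.w2",
    "mlp.up_proj": "feed_forward.w3",
}

# The stale mapping described in the comment: w2 and w3 swapped.
STALE_MAP = {
    "mlp.gate_proj": "feed_forward.w1",
    "mlp.down_proj": "feed_forward.w3",
    "mlp.up_proj": "feed_forward.w2",
}

def remap(hf_state, name_map):
    """Rename HF checkpoint keys to InternEvo keys via name_map."""
    out = {}
    for key, shape in hf_state.items():
        new_key = key.replace("model.", "", 1)
        for src, dst in name_map.items():
            new_key = new_key.replace(src, dst)
        out[new_key] = shape
    return out

def mismatches(states, expected):
    """Keys whose checkpoint shape differs from what the model expects."""
    return [k for k, s in states.items() if expected.get(k) != s]

print(mismatches(remap(HF_SHAPES, CORRECT_MAP), MODEL_SHAPES))  # []
print(mismatches(remap(HF_SHAPES, STALE_MAP), MODEL_SHAPES))
# The stale mapping routes an (11008, 4096) tensor into w2, which
# expects (4096, 11008) -- exactly the size mismatch in the traceback.
```

With the stale mapping, `load_state_dict` receives a transposed-looking shape for `w2` and raises the `RuntimeError` shown above; the fix in the PR is to update the load function's key mapping, not to transpose weights.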
