
Muon with Llama? #9

Open
yangsp5 opened this issue Dec 11, 2024 · 1 comment


yangsp5 commented Dec 11, 2024

How can I use Muon with a Llama model? I ran it with Llama on 64 A100s:

from transformers import LlamaForCausalLM
from optimizer.Muon import Muon  # path as in the traceback below

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b")

# All trainable parameters, collected into a single flat list
grouped_parameters = [
    p for p in model.parameters() if p.requires_grad
]

optimizer = Muon(grouped_parameters)

But it fails with:

[rank3]:   File "/xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/optimizer/Muon.py", line 104, in <listcomp>
[rank3]:     params = [p for p in group['params'] if self.state[p]['use_muon']]
[rank3]: KeyError: 'use_muon'

When I print the params, it seems that the params in self.state do not match group['params'].
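
For context, a likely cause: Muon implementations typically key per-parameter state by the original Parameter objects and expect the parameters to be pre-split into a Muon group (2-D hidden weight matrices) and an AdamW group (everything else), rather than passed as one flat list. Below is a minimal sketch of that split, assuming a hypothetical constructor taking separate muon_params/adamw_params lists; check this repo's Muon.py for the exact signature.

# Hypothetical sketch, not this repo's confirmed API: route 2-D hidden
# weights to Muon and everything else (embeddings, lm_head, norms,
# biases) to the AdamW fallback.
from transformers import LlamaForCausalLM
from optimizer.Muon import Muon  # path as in the traceback above

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b")

muon_params, adamw_params = [], []
for name, p in model.named_parameters():
    if not p.requires_grad:
        continue
    # Muon is defined for 2-D weight matrices; embeddings and the
    # output head are usually excluded even though they are 2-D.
    if p.ndim == 2 and "embed_tokens" not in name and "lm_head" not in name:
        muon_params.append(p)
    else:
        adamw_params.append(p)

optimizer = Muon(muon_params=muon_params, adamw_params=adamw_params)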


kcz358 commented Dec 16, 2024

I encountered this error as well. I solved it by switching from DeepSpeed to FSDP. The command is something like this:

accelerate launch \
    --use_fsdp \
    --mixed_precision bf16 \
    --fsdp_sharding_strategy HYBRID_SHARD \
    --fsdp_auto_wrap_policy TRANSFORMER_BASED_WRAP \
    --fsdp_backward_prefetch BACKWARD_PRE \
    --fsdp_forward_prefetch false \
    --fsdp_cpu_ram_efficient_loading true \
    --fsdp_offload_params false \
    --fsdp_state_dict_type SHARDED_STATE_DICT \
    --fsdp_sync_module_states true \
    --fsdp_transformer_layer_cls_to_wrap "<Your layer cls>" \
    --fsdp_use_orig_params true \
    <your training script and args>
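
A plausible reason this helps: DeepSpeed's ZeRO stages flatten and replace the model's parameters, so the Parameter objects in group['params'] are no longer the ones Muon stored its use_muon flag for, while FSDP with --fsdp_use_orig_params true keeps the original parameter objects intact. A minimal raw-PyTorch sketch of the same idea, reusing the hypothetical grouping from the earlier sketch:

import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")  # run under torchrun/accelerate

# use_orig_params=True preserves the original Parameter identities, so
# Muon's identity-keyed lookups (self.state[p]['use_muon']) still resolve.
model = FSDP(model, use_orig_params=True)
optimizer = Muon(muon_params=muon_params, adamw_params=adamw_params)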
