
Muon with Llama? #9

Open
yangsp5 opened this issue Dec 11, 2024 · 1 comment


yangsp5 commented Dec 11, 2024

How can I use Muon with a Llama model? I ran it with Llama on 64 A100s:

from transformers import LlamaForCausalLM
from optimizer.Muon import Muon  # path as in the traceback below

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b")

# All trainable parameters, collected into a single flat list
grouped_parameters = [
    p for p in model.parameters() if p.requires_grad
]

optimizer = Muon(grouped_parameters)

But it fails with:

[rank3]:   File "/xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/optimizer/Muon.py", line 104, in <listcomp>
[rank3]:     params = [p for p in group['params'] if self.state[p]['use_muon']]
[rank3]: KeyError: 'use_muon'

When I print the params, it seems that the params in self.state do not match group['params'].
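
For context, a likely cause: Muon implementations typically key per-parameter state by the original Parameter objects and expect the parameters to be pre-split into a Muon group (2-D hidden weight matrices) and an AdamW group (everything else), rather than passed as one flat list. Below is a minimal sketch of that split, assuming a hypothetical constructor taking separate muon_params/adamw_params lists; check this repo's Muon.py for the exact signature.

# Hypothetical sketch, not this repo's confirmed API: route 2-D hidden
# weights to Muon and everything else (embeddings, lm_head, norms,
# biases) to the AdamW fallback.
from transformers import LlamaForCausalLM
from optimizer.Muon import Muon  # path as in the traceback above

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b")

muon_params, adamw_params = [], []
for name, p in model.named_parameters():
    if not p.requires_grad:
        continue
    # Muon is defined for 2-D weight matrices; embeddings and the
    # output head are usually excluded even though they are 2-D.
    if p.ndim == 2 and "embed_tokens" not in name and "lm_head" not in name:
        muon_params.append(p)
    else:
        adamw_params.append(p)

optimizer = Muon(muon_params=muon_params, adamw_params=adamw_params)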


kcz358 commented Dec 16, 2024

I encountered this error as well. I solved it by switching from DeepSpeed to FSDP. The command is something like this:

accelerate launch \
    --use_fsdp \
    --mixed_precision bf16 \
    --fsdp_sharding_strategy HYBRID_SHARD \
    --fsdp_auto_wrap_policy TRANSFORMER_BASED_WRAP \
    --fsdp_backward_prefetch BACKWARD_PRE \
    --fsdp_forward_prefetch false \
    --fsdp_cpu_ram_efficient_loading true \
    --fsdp_offload_params false \
    --fsdp_state_dict_type SHARDED_STATE_DICT \
    --fsdp_sync_module_states true \
    --fsdp_transformer_layer_cls_to_wrap "<Your layer cls>" \
    --fsdp_use_orig_params true \
    <your training script and args>
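
A plausible reason this helps: DeepSpeed's ZeRO stages flatten and replace the model's parameters, so the Parameter objects in group['params'] are no longer the ones Muon stored its use_muon flag for, while FSDP with --fsdp_use_orig_params true keeps the original parameter objects intact. A minimal raw-PyTorch sketch of the same idea, reusing the hypothetical grouping from the earlier sketch:

import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")  # run under torchrun/accelerate

# use_orig_params=True preserves the original Parameter identities, so
# Muon's identity-keyed lookups (self.state[p]['use_muon']) still resolve.
model = FSDP(model, use_orig_params=True)
optimizer = Muon(muon_params=muon_params, adamw_params=adamw_params)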
