Your question
There is a check in megatron/core/transformer/transformer_config.py (line 401):
```python
if self.moe_expert_capacity_factor is not None:
    if self.moe_token_dispatcher_type not in ["alltoall", "alltoall_seq"]:
        raise ValueError(
            'moe_expert_capacity_factor only works with alltoall token dispatcher'
        )
```
The code in router.py that applies capacity_factor and padding doesn't seem to change the output tensor's dimension size, and I don't see any capacity_factor-specific handling in token_dispatcher.py. So why can moe_expert_capacity_factor only be used with the 'alltoall' or 'alltoall_seq' token dispatchers?
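For context, here is a minimal sketch of what capacity-factor-based token dropping typically looks like in MoE routing. The function name, tensor shapes, and capacity formula are illustrative assumptions for this sketch, not Megatron-LM's actual implementation:

```python
import torch

def apply_expert_capacity(routing_map: torch.Tensor,
                          capacity_factor: float,
                          num_experts: int) -> torch.Tensor:
    """Drop tokens that exceed each expert's capacity (illustrative sketch).

    routing_map: bool tensor of shape [num_tokens, num_experts], True where
    a token is routed to an expert.
    """
    num_tokens = routing_map.size(0)
    # A common capacity formulation: the expected even share of tokens
    # per expert, scaled by the capacity factor.
    capacity = int(capacity_factor * num_tokens / num_experts)
    # Rank each token within its expert's queue: 1-based position among
    # the tokens routed to that expert, in token order.
    position_in_expert = torch.cumsum(routing_map.int(), dim=0) * routing_map.int()
    # Keep only tokens whose position fits within capacity; the rest
    # have their routing entry zeroed out (i.e., they are dropped).
    return routing_map & (position_in_expert <= capacity)
```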
Thanks in advance for your reply.