Enabling sequence_parallel slows down training with fp16 #283

Answered by ksivaman
cavdard asked this question in Q&A

The sequence-parallel arg does have to be passed into TE, but that alone is not enough: the underlying toolkit, i.e. NeMo, must also be aware that sequence parallelism (SP) is in use so that it can split the input along the sequence dimension. This is done by setting the corresponding sequence-parallel arg in NeMo as well.
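
Below is a minimal sketch of the two places the flag needs to be set, assuming a tensor-parallel group spanning all ranks and a TE version whose `Linear` layer accepts `sequence_parallel`, `tp_group`, `tp_size`, `parallel_mode`, and `params_dtype`. The NeMo config key in the trailing comment is the one used by NeMo's Megatron-based models, but verify it against your NeMo version; all sizes and the group setup are illustrative.

```python
# Sketch: both TE and NeMo must know about sequence parallelism.
# Run under torchrun with one GPU per rank; group setup is illustrative.
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
tp_group = dist.new_group()  # tensor-parallel group (all ranks in this sketch)

# TE side: tell the layer its input is already split along the sequence
# dimension, so it uses all-gather / reduce-scatter instead of all-reduce.
layer = te.Linear(
    in_features=4096,
    out_features=4096,
    parallel_mode="column",
    tp_group=tp_group,
    tp_size=dist.get_world_size(tp_group),
    sequence_parallel=True,
    params_dtype=torch.float16,  # matching the fp16 setup in the question
)

# NeMo side: the matching switch is a model config flag, e.g. as a Hydra
# override on the pretraining script (exact key may vary by NeMo version):
#   model.sequence_parallel=True
```

If only the TE flag is set and NeMo keeps feeding the full, unsplit sequence to every rank, each rank processes redundant tokens on top of the extra SP communication, which would be consistent with the slowdown reported in the title.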

Answer selected by ksivaman
This discussion was converted from issue #182 on June 16, 2023 16:37.