
[BUG] The cached_loss_mask may be modified unexpectedly in GPTDataset? #1269

Open
shmily326 opened this issue Nov 1, 2024 · 0 comments
shmily326 commented Nov 1, 2024

I encountered a problem during pretraining: the lm loss dropped to 0.0 after a few hundred iterations and stayed there, yet there was no nan/inf/skipped iteration according to the training log.

if (
    not self.masks_and_position_ids_are_cacheable
    or not self.masks_and_position_ids_are_cached
):
    attention_mask, loss_mask, position_ids = _get_ltor_masks_and_position_ids(
        tokens,
        self.config.tokenizer.eod,
        self.config.reset_position_ids,
        self.config.reset_attention_mask,
        self.config.eod_mask_loss,
        self.config.create_attention_mask,
    )
    if self.masks_and_position_ids_are_cacheable:
        self.cached_attention_mask = attention_mask
        self.cached_loss_mask = loss_mask
        self.cached_position_ids = position_ids
        self.masks_and_position_ids_are_cached = True
else:
    attention_mask = self.cached_attention_mask
    loss_mask = self.cached_loss_mask
    position_ids = self.cached_position_ids

# For padded sequences, mask the loss
loss_mask[labels == self._pad_token_id] = 0.0

I'm wondering whether the in-place assignment on line 206 may modify self.cached_loss_mask unexpectedly: we want to cache the loss mask, but loss_mask is just a reference to the cached tensor, not a copy, so zeros accumulate in self.cached_loss_mask across samples and eventually produce an all-zero loss_mask. A standalone sketch of the suspected aliasing is below.
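This is not code from the repo, just a minimal, self-contained sketch of the suspected failure mode; the TinyDataset class and PAD_TOKEN_ID are hypothetical stand-ins for GPTDataset and self._pad_token_id:

import torch

PAD_TOKEN_ID = 0  # hypothetical pad id standing in for self._pad_token_id

class TinyDataset:
    """Minimal stand-in for GPTDataset's loss-mask caching path."""

    def __init__(self):
        self.cached_loss_mask = None  # plays the role of self.cached_loss_mask

    def __getitem__(self, labels):
        if self.cached_loss_mask is None:
            self.cached_loss_mask = torch.ones(labels.numel())
        loss_mask = self.cached_loss_mask        # a reference, not a copy
        # loss_mask = self.cached_loss_mask.clone()  # the proposed fix
        loss_mask[labels == PAD_TOKEN_ID] = 0.0  # in-place write hits the cache too
        return loss_mask

ds = TinyDataset()
print(ds[torch.tensor([5, 6, PAD_TOKEN_ID, PAD_TOKEN_ID])])  # tensor([1., 1., 0., 0.])
print(ds[torch.tensor([PAD_TOKEN_ID, 7, 8, 9])])             # tensor([0., 1., 0., 0.]) -- stale zeros from the previous sample
print(ds.cached_loss_mask)                                   # the cache itself is now polluted

With the commented-out .clone() enabled instead, each sample masks a fresh copy and the cached tensor stays all ones.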

shmily326 changed the title from "[QUESTION] The cached_loss_mask maybe modified unexpectedly in GPTDataset? Whether a .clone() is needed?" to "[QUESTION] The cached_loss_mask maybe modified unexpectedly in GPTDataset?" on Nov 1, 2024
shmily326 reopened this on Nov 3, 2024
shmily326 changed the title from "[QUESTION] The cached_loss_mask maybe modified unexpectedly in GPTDataset?" to "[BUG] The cached_loss_mask maybe modified unexpectedly in GPTDataset?" on Nov 4, 2024