
memoize dataset length for eval sample packing #1974

Merged
merged 18 commits into main from 1966fix on Oct 17, 2024
Conversation

bursteratom
Collaborator

Description

Fix for issue #1966, where eval_sample_packing=True caused evaluation to hang on multi-GPU runs.

Motivation and Context

In issue #1966, evaluation on a sample-packed dataset across multiple GPUs hung on the RANK 0 GPU when __len__(self) was called on MultipackBatchSampler. This is due to a change in how transformers handles callbacks during the evaluation loop. This PR modifies MultipackBatchSampler.__len__(self) in src/axolotl/utils/samplers/multipack.py so that it gathers the length of the eval_dataloader across all ranks. A minimal sketch of the idea follows.
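The sketch below illustrates the approach under stated assumptions; it is not the exact code merged in this PR. It memoizes the batch count and reduces it across ranks with torch.distributed so every GPU reports the same eval length. The helper name gather_len_batches, the _len_across_ranks attribute, and the fallback to the parent sampler's length are all illustrative.

```python
# Illustrative sketch only -- names and structure are assumptions,
# not the exact code merged in this PR.
import torch
import torch.distributed as dist
from torch.utils.data import BatchSampler


def gather_len_batches(num: int) -> int:
    """Reduce a per-rank batch count so every rank agrees on one value."""
    if not (dist.is_available() and dist.is_initialized()):
        return num
    device = (
        torch.device("cuda", torch.cuda.current_device())
        if torch.cuda.is_available()
        else torch.device("cpu")
    )
    tensor = torch.tensor([num], device=device)
    # Take the minimum so no rank waits on a batch another rank never yields.
    dist.all_reduce(tensor, op=dist.ReduceOp.MIN)
    return int(tensor.item())


class MultipackBatchSampler(BatchSampler):
    """Batch sampler whose __len__ is memoized and rank-consistent."""

    def __init__(self, sampler, batch_size, drop_last):
        super().__init__(sampler, batch_size, drop_last)
        self._len_across_ranks = None  # memoized, rank-synchronized length

    def __len__(self):
        if self._len_across_ranks is None:
            # Stand-in for the expensive packed-length estimate. Computing
            # it once and caching the result avoids re-entering collective
            # ops when transformers' eval loop calls __len__ at different
            # points on different ranks.
            local_len = super().__len__()
            self._len_across_ranks = gather_len_batches(local_len)
        return self._len_across_ranks
```

Memoizing matters here because the collective reduction is itself a synchronization point: if __len__ recomputed it on every call, a rank that calls it an extra time would block waiting for peers that never re-enter the collective.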

How has this been tested?

Tested on examples/llama-3/qlora-1b.yml

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

@bursteratom changed the title from "Fix for issue#1966" to "Fix for issue#1966: eval sample packing" on Oct 17, 2024
@winglian changed the title from "Fix for issue#1966: eval sample packing" to "memoize dataset length for eval sample packing" on Oct 17, 2024
@winglian winglian merged commit f62e237 into main Oct 17, 2024
11 checks passed
@winglian winglian deleted the 1966fix branch October 17, 2024 19:16