
memoize dataset length for eval sample packing #1974

Merged
merged 18 commits into main from 1966fix on Oct 17, 2024
Conversation

bursteratom
Collaborator

Description

Fix for issue #1966, where eval_sample_packing=True caused evaluation to hang on multi-GPU runs.

Motivation and Context

In issue #1966, evaluation on a sample-packed dataset across multiple GPUs hung on the RANK 0 GPU when __len__(self) was called on MultipackBatchSampler. This is due to a change in how transformers handles callbacks during the evaluation loop. This PR modifies MultipackBatchSampler.__len__(self) in src/axolotl/utils/samplers/multipack.py so that it gathers the length of the eval_dataloader across all ranks. A minimal sketch of the idea follows.
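The sketch below illustrates the approach under stated assumptions; it is not the exact code merged in this PR. It memoizes the batch count and reduces it across ranks with torch.distributed so every GPU reports the same eval length. The helper name gather_len_batches, the _len_across_ranks attribute, and the fallback to the parent sampler's length are all illustrative.

```python
# Illustrative sketch only -- names and structure are assumptions,
# not the exact code merged in this PR.
import torch
import torch.distributed as dist
from torch.utils.data import BatchSampler


def gather_len_batches(num: int) -> int:
    """Reduce a per-rank batch count so every rank agrees on one value."""
    if not (dist.is_available() and dist.is_initialized()):
        return num
    device = (
        torch.device("cuda", torch.cuda.current_device())
        if torch.cuda.is_available()
        else torch.device("cpu")
    )
    tensor = torch.tensor([num], device=device)
    # Take the minimum so no rank waits on a batch another rank never yields.
    dist.all_reduce(tensor, op=dist.ReduceOp.MIN)
    return int(tensor.item())


class MultipackBatchSampler(BatchSampler):
    """Batch sampler whose __len__ is memoized and rank-consistent."""

    def __init__(self, sampler, batch_size, drop_last):
        super().__init__(sampler, batch_size, drop_last)
        self._len_across_ranks = None  # memoized, rank-synchronized length

    def __len__(self):
        if self._len_across_ranks is None:
            # Stand-in for the expensive packed-length estimate. Computing
            # it once and caching the result avoids re-entering collective
            # ops when transformers' eval loop calls __len__ at different
            # points on different ranks.
            local_len = super().__len__()
            self._len_across_ranks = gather_len_batches(local_len)
        return self._len_across_ranks
```

Memoizing matters here because the collective reduction is itself a synchronization point: if __len__ recomputed it on every call, a rank that calls it an extra time would block waiting for peers that never re-enter the collective.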

How has this been tested?

Tested on examples/llama-3/qlora-1b.yml

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

@bursteratom changed the title from "Fix for issue#1966" to "Fix for issue#1966: eval sample packing" on Oct 17, 2024
@winglian changed the title from "Fix for issue#1966: eval sample packing" to "memoize dataset length for eval sample packing" on Oct 17, 2024
@winglian winglian merged commit f62e237 into main Oct 17, 2024
11 checks passed
@winglian winglian deleted the 1966fix branch October 17, 2024 19:16