Change KTO tokenization to use DPO's #2187
base: main
Conversation
Some of the changes carry over from #2153, so maybe it's best to merge that one first?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@kawine the KTO test(s) are failing...
Sorry, I forgot to rewrite the tests. Will fix in an hour.
@kashif The tests are passing now.
Very cool! Checking!
Can I add back the
Sure!
@kashif does this look okay? I merged in the latest changes from main.
Yes, I believe so. The only issue is that the DPO helpers being used somehow smell bad... we perhaps need to make this more modular or simplify it... let me ask around.
@kashif It seems there are already tokenization helper functions in utils, so I just moved the remaining methods that both DPO and KTO tokenization depend on to utils. This removes the dependency between the trainers and makes the code simpler as well -- hopefully that's fine?
Thank you for this PR! I completely agree with your observations. I'm currently working on further refactoring the tokenization phase for DPO (#2209; feel free to contribute, by the way). I suggest putting this PR on hold for now, as the solution might become simpler once we've identified a more straightforward approach for DPO.
#2209 is now merged. We would like to do the same refactoring for KTO, if you're still interested in contributing, let us know :) |
What does this PR do?
The tokenization used for KTO has diverged significantly from DPO's (e.g., it doesn't support images in the input and uses different length-truncation techniques). This PR uses helper functions from DPOTrainer to perform the same kind of tokenization in KTOTrainer, so subsequent improvements to the DPO tokenization will carry over to KTO automatically.
This also seems to work better in practice, at least on the sample KTO dataset.
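To illustrate the shared-helper idea, here is a minimal, hypothetical sketch of what a common tokenization utility might look like: both a DPO-style and a KTO-style trainer would call the same function, so any fix (e.g., to prompt truncation) propagates to both. The names `process_row`, `max_prompt_length`, and `max_length` are illustrative, not trl's actual API, and a toy whitespace "tokenizer" stands in for a real one so the snippet runs standalone.

```python
def process_row(prompt, completion, tokenize, max_prompt_length, max_length):
    """Tokenize a prompt/completion pair with prompt-side truncation.

    Hypothetical shared helper: both DPO- and KTO-style tokenization
    could call this, so truncation fixes apply to both trainers.
    """
    # Keep the end of the prompt if it exceeds max_prompt_length.
    prompt_ids = tokenize(prompt)[-max_prompt_length:]
    completion_ids = tokenize(completion)
    # Truncate the concatenated sequence to max_length.
    input_ids = (prompt_ids + completion_ids)[:max_length]
    # Loss mask: 0 over prompt tokens, 1 over completion tokens.
    loss_mask = ([0] * len(prompt_ids) + [1] * len(completion_ids))[:max_length]
    return {"input_ids": input_ids, "loss_mask": loss_mask}

# Toy whitespace "tokenizer" so the sketch runs without a real model.
def toy_tokenize(text):
    return list(range(len(text.split())))

row = process_row(
    "What is KTO ?", "A preference method", toy_tokenize,
    max_prompt_length=3, max_length=6,
)
print(row["loss_mask"])  # prompt tokens masked out, completion tokens kept
```

The point of the sketch is the dependency direction: the helper lives in a neutral module (e.g., utils), so neither trainer imports from the other.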
cc @kashif