
Change KTO tokenization to use DPO's #2187

Open · kawine wants to merge 33 commits into main

Conversation

@kawine (Contributor) commented Oct 6, 2024

What does this PR do?

The tokenization used for KTO has diverged significantly from DPO's (e.g., it doesn't support images in the input, uses different length-truncation techniques, etc.). This PR uses helper functions from DPOTrainer to do the same kind of tokenization in KTOTrainer. Subsequent improvements to the DPO tokenization will now carry over automatically to KTO as well.
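
To illustrate the idea, here is a minimal sketch of a shared tokenization routine; the names below are simplified stand-ins rather than the exact TRL helpers. The point is that both trainers run prompt/completion pairs through one routine, so behaviors like DPO's left-truncation of long prompts apply to KTO automatically:

```python
# Illustrative sketch only -- names are simplified stand-ins for the shared
# DPO tokenization helpers, not the exact TRL API.
def tokenize_row(example, tokenizer, max_prompt_length, max_completion_length):
    """Tokenize one prompt/completion pair the same way for DPO and KTO."""
    prompt_ids = tokenizer(example["prompt"], add_special_tokens=False)["input_ids"]
    completion_ids = tokenizer(example["completion"], add_special_tokens=False)["input_ids"]

    # Truncate the prompt from the left so the most recent context survives,
    # mirroring DPO's length-truncation strategy.
    prompt_ids = prompt_ids[-max_prompt_length:]
    completion_ids = completion_ids[:max_completion_length]

    return {"prompt_input_ids": prompt_ids, "completion_input_ids": completion_ids}
```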

This seems to work better in practice as well, at least on the sample KTO dataset.

[Screenshot: results on the sample KTO dataset, 2024-10-06]

cc @kashif

@kawine (Contributor, Author) commented Oct 6, 2024

Some of the changes carry over from #2153, so maybe it's best to merge that one first?

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kashif added the 🏋 KTO label on Oct 6, 2024
Review thread on trl/trainer/kto_trainer.py (outdated, resolved)
@kashif (Collaborator) commented Oct 6, 2024

@kawine the KTO tests are failing...

@kawine (Contributor, Author) commented Oct 6, 2024

Sorry, I forgot to rewrite the tests. Will fix in an hour.

@kawine (Contributor, Author) commented Oct 6, 2024

@kashif The tests are passing now.

@kashif (Collaborator) commented Oct 6, 2024

Very cool! Checking!

@kashif (Collaborator) commented Oct 6, 2024

Can I add back the maybe_unpair_preference_dataset logic in the example script?

@kawine (Contributor, Author) commented Oct 6, 2024

Sure!
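
For context, maybe_unpair_preference_dataset converts a paired preference dataset (prompt/chosen/rejected) into the unpaired prompt/completion/label format that KTOTrainer expects, and returns the dataset unchanged if it is already unpaired. A minimal sketch of how the example script might call it (the dataset name is illustrative, not necessarily the one the script uses):

```python
from datasets import load_dataset
from trl import maybe_unpair_preference_dataset

# The dataset name here is illustrative.
dataset = load_dataset("trl-lib/kto-mix-14k")

# If the dataset is paired (chosen/rejected), unpair it into the
# prompt/completion/label format KTO expects; otherwise it is returned as-is.
dataset = maybe_unpair_preference_dataset(dataset)
```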

Two review threads on examples/scripts/kto.py (outdated, resolved)
@kawine (Contributor, Author) commented Oct 9, 2024

@kashif does this look okay? I merged the latest changes in from main.

@kashif (Collaborator) commented Oct 9, 2024

Yes, I believe so. The only issue is that the way the DPO helpers are being used is a bit of a code smell... we should perhaps make it more modular or simplify it. Let me ask around.

@kawine (Contributor, Author) commented Oct 10, 2024

@kashif It seems there are already tokenization helper functions in utils, so I just moved the remaining methods that both DPO and KTO tokenization depend on to utils. This removes the dependency between the trainers and makes the code simpler as well -- hopefully that's fine?
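
A rough sketch of what that refactor looks like; the module layout and names below are illustrative, not the exact TRL code. The shared helper lives in utils, and both trainers import it, so KTOTrainer no longer reaches into DPOTrainer:

```python
# Illustrative layout only -- not the exact TRL module structure.

# trl/trainer/utils.py: shared home for the tokenization helpers
def tokenize_prompt_completion(example, tokenizer):
    """Tokenize a prompt/completion pair once, for use by both trainers."""
    return {
        "prompt_input_ids": tokenizer(example["prompt"], add_special_tokens=False)["input_ids"],
        "completion_input_ids": tokenizer(example["completion"], add_special_tokens=False)["input_ids"],
    }

# trl/trainer/dpo_trainer.py and trl/trainer/kto_trainer.py then both do:
#     from .utils import tokenize_prompt_completion
# so neither trainer depends on the other.
```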

@qgallouedec (Member) commented

Thank you for this PR! I completely agree with your observations. I'm currently working on further refactoring the tokenization phase for DPO (#2209—feel free to contribute, by the way). I suggest putting this PR on hold for now, as the solution might become simpler once we've identified a more straightforward approach for DPO.

@qgallouedec (Member) commented

#2209 is now merged. We would like to do the same refactoring for KTO; if you're still interested in contributing, let us know :)
