Change KTO tokenization to use DPO's #2187
base: main
Conversation
Some of the changes carry over from #2153, so maybe it's best to merge that one first?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@kawine the KTO test(s) are failing...
Sorry, I forgot to rewrite the tests. Will fix in an hour.
@kashif The tests are passing now.
Very cool! Checking!
Can I add back the
Sure!
@kashif does this look okay? I merged in the latest changes from main.
Yes, I believe so. The only issue is that the DPO helpers being used somehow smell bad... we perhaps need to make this more modular or simplify it... let me ask around.
@kashif It seems there are already tokenization helper functions in utils, so I just moved the remaining methods that both DPO and KTO tokenization depend on to utils. This removes the dependency between the trainers and makes the code simpler as well -- hopefully that's fine?
Thank you for this PR! I completely agree with your observations. I'm currently working on further refactoring the tokenization phase for DPO (#2209; feel free to contribute, by the way). I suggest putting this PR on hold for now, as the solution might become simpler once we've identified a more straightforward approach for DPO.
#2209 is now merged. We would like to do the same refactoring for KTO, if you're still interested in contributing, let us know :) |
What does this PR do?
The tokenization used for KTO has diverged significantly from DPO's (e.g., it doesn't support images in the input and uses different length-truncation techniques). This PR uses helper functions from DPOTrainer to perform the same kind of tokenization in KTOTrainer, so subsequent improvements to the DPO tokenization will carry over to KTO automatically.
This also seems to work better in practice, at least on the sample KTO dataset.
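To illustrate the shared-helper idea, here is a minimal, hypothetical sketch of what a common tokenization utility might look like: both a DPO-style and a KTO-style trainer would call the same function, so any fix (e.g., to prompt truncation) propagates to both. The names `process_row`, `max_prompt_length`, and `max_length` are illustrative, not trl's actual API, and a toy whitespace "tokenizer" stands in for a real one so the snippet runs standalone.

```python
def process_row(prompt, completion, tokenize, max_prompt_length, max_length):
    """Tokenize a prompt/completion pair with prompt-side truncation.

    Hypothetical shared helper: both DPO- and KTO-style tokenization
    could call this, so truncation fixes apply to both trainers.
    """
    # Keep the end of the prompt if it exceeds max_prompt_length.
    prompt_ids = tokenize(prompt)[-max_prompt_length:]
    completion_ids = tokenize(completion)
    # Truncate the concatenated sequence to max_length.
    input_ids = (prompt_ids + completion_ids)[:max_length]
    # Loss mask: 0 over prompt tokens, 1 over completion tokens.
    loss_mask = ([0] * len(prompt_ids) + [1] * len(completion_ids))[:max_length]
    return {"input_ids": input_ids, "loss_mask": loss_mask}

# Toy whitespace "tokenizer" so the sketch runs without a real model.
def toy_tokenize(text):
    return list(range(len(text.split())))

row = process_row(
    "What is KTO ?", "A preference method", toy_tokenize,
    max_prompt_length=3, max_length=6,
)
print(row["loss_mask"])  # prompt tokens masked out, completion tokens kept
```

The point of the sketch is the dependency direction: the helper lives in a neutral module (e.g., utils), so neither trainer imports from the other.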
cc @kashif