
fix: special_tokens in the encode method to support the special tokens in the vocab #13

Merged
merged 2 commits into from
Jun 7, 2024

Conversation

Hk669
Owner

@Hk669 Hk669 commented Jun 7, 2024

Why are these changes needed?

this ensures that special tokens are encoded separately in `BPETokenizer.encode(texts, special_tokens="all")`

  • added new tokens to test encoding with the special_tokens
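A minimal sketch of the kind of special-token handling this fix describes: the text is split so that each special token becomes its own chunk, specials are mapped directly to their vocab ids, and ordinary chunks fall through to the regular BPE encoder. The function name `encode_with_specials` and the `encode_ordinary` callable are illustrative assumptions, not the library's actual internals.

```python
import re

def encode_with_specials(text, special_tokens, encode_ordinary):
    """Encode `text`, emitting special-token ids directly.

    special_tokens: dict mapping special token string -> token id
    encode_ordinary: callable encoding plain text to a list of ids
    (both names are assumptions for this sketch)
    """
    if not special_tokens:
        return encode_ordinary(text)
    # Capture group keeps the special tokens in the split output.
    pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
    ids = []
    for chunk in re.split(pattern, text):
        if chunk in special_tokens:
            # Special tokens bypass BPE merges and map straight to their id.
            ids.append(special_tokens[chunk])
        elif chunk:
            ids.extend(encode_ordinary(chunk))
    return ids

# Toy usage with a stand-in "ordinary" encoder (raw byte values):
specials = {"<|endoftext|>": 100257}
print(encode_with_specials("hi<|endoftext|>", specials,
                           lambda s: [ord(c) for c in s]))
```

Without a split like this, the special token's characters would be fed through the ordinary BPE path and never resolve to the reserved vocab id, which is the bug this PR addresses.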

Related issue number

closes #12

Checks

  • I've included any doc changes needed for https://pypi.org/project/bpetokenizer/.
  • I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • I've made sure all auto checks have passed.

@Hk669 Hk669 merged commit c66cab7 into main Jun 7, 2024
12 checks passed
@Hk669 Hk669 deleted the fix12 branch June 7, 2024 08:05
Successfully merging this pull request may close these issues.

special_tokens in the encode method doesn't work for the BPETokenizer