
fix: special_tokens in the encode method to support the special tokens in the vocab #13

Merged
merged 2 commits into from
Jun 7, 2024

Conversation

Hk669
Owner

@Hk669 Hk669 commented Jun 7, 2024

Why are these changes needed?

this ensures that special tokens are encoded separately in `BPETokenizer.encode(texts, special_tokens="all")`

  • added new tokens to test encoding with the special_tokens
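A minimal sketch of the kind of special-token handling this fix describes: the text is split so that each special token becomes its own chunk, specials are mapped directly to their vocab ids, and ordinary chunks fall through to the regular BPE encoder. The function name `encode_with_specials` and the `encode_ordinary` callable are illustrative assumptions, not the library's actual internals.

```python
import re

def encode_with_specials(text, special_tokens, encode_ordinary):
    """Encode `text`, emitting special-token ids directly.

    special_tokens: dict mapping special token string -> token id
    encode_ordinary: callable encoding plain text to a list of ids
    (both names are assumptions for this sketch)
    """
    if not special_tokens:
        return encode_ordinary(text)
    # Capture group keeps the special tokens in the split output.
    pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
    ids = []
    for chunk in re.split(pattern, text):
        if chunk in special_tokens:
            # Special tokens bypass BPE merges and map straight to their id.
            ids.append(special_tokens[chunk])
        elif chunk:
            ids.extend(encode_ordinary(chunk))
    return ids

# Toy usage with a stand-in "ordinary" encoder (raw byte values):
specials = {"<|endoftext|>": 100257}
print(encode_with_specials("hi<|endoftext|>", specials,
                           lambda s: [ord(c) for c in s]))
```

Without a split like this, the special token's characters would be fed through the ordinary BPE path and never resolve to the reserved vocab id, which is the bug this PR addresses.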

Related issue number

closes #12

Checks

  • I've included any doc changes needed for https://pypi.org/project/bpetokenizer/.
  • I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • I've made sure all auto checks have passed.

@Hk669 Hk669 merged commit c66cab7 into main Jun 7, 2024
12 checks passed
@Hk669 Hk669 deleted the fix12 branch June 7, 2024 08:05
Successfully merging this pull request may close these issues.

special_tokens in the encode method doesn't work for the BPETokenizer