
wrongbad/btok


btok

BPE tokenizer with full binary and unicode support

50x faster than the Hugging Face ByteLevelBPETokenizer

Lean C++ backend with a slim Python wrapper

Decoding is strictly a concatenation of token strings, which makes it portable to embedded runtimes.
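As a rough illustration of why concatenation-only decoding is easy to port, here is a minimal sketch. The vocabulary and the `decode` function are hypothetical and are not btok's API; they only show that a lookup table and a byte join are enough to decode.

```python
# Hypothetical vocabulary: token id -> raw bytes of that token.
# These entries are made up for illustration and are not btok's real vocabulary.
VOCAB = {
    0: b"he",
    1: b"llo",
    2: b" ",
    3: b"w",
    4: b"orld",
    5: b"\xf0\x9f",  # first half of a 4-byte UTF-8 sequence
    6: b"\x98\x80",  # second half of the same sequence
}


def decode(ids, vocab):
    """Decode by concatenating the byte string of each token, in order."""
    return b"".join(vocab[i] for i in ids)


plain = decode([0, 1, 2, 3, 4], VOCAB)
emoji = decode([5, 6], VOCAB)

print(plain.decode("utf-8"))
print(emoji.decode("utf-8"))
```

Because tokens are raw byte strings, a multi-byte character may be split across tokens and still decode correctly once the pieces are joined. An embedded runtime would only need the table and a memory copy per token.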

C++ headers are installed with the Python package, and can be found with python -m btok.includes. You can use that command in makefiles to compile against the C++ backend directly.
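For build scripts written in Python rather than make, the same idea might be sketched as follows. The source does not say what python -m btok.includes prints, so this sketch assumes it prints either a bare include directory or ready-made -I flags, using POSIX-style paths. The helper names, the default compiler name `c++`, and the file names are all hypothetical.

```python
import shlex
import subprocess
import sys


def btok_include_flags(raw_output):
    """Turn the command's output into -I compiler flags.

    Assumption: the output is a bare directory path or existing -I flags.
    """
    flags = []
    for item in shlex.split(raw_output):
        flags.append(item if item.startswith("-I") else "-I" + item)
    return flags


def query_btok_includes():
    """Run `python -m btok.includes` and return its stdout (requires btok installed)."""
    result = subprocess.run(
        [sys.executable, "-m", "btok.includes"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def compile_command(source, output, raw_output, compiler="c++"):
    """Build an argument list that compiles `source` against the btok headers."""
    return [compiler, *btok_include_flags(raw_output), source, "-o", output]
```

With btok installed, `compile_command("main.cpp", "main", query_btok_includes())` would give an argument list suitable for `subprocess.run`.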

About

blazing fast byte-level BPE tokenizer
