SentencePiece -- Unsupervised text tokenizer for Neural Network-based text generation

  • https://github.com/google/sentencepiece
  • SentencePiece is an unsupervised text tokenizer and detokenizer, intended mainly for neural network-based text generation systems in which the vocabulary size is fixed before the neural model is trained.
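
To make "the vocabulary size is predetermined" concrete, here is a minimal sketch of a subword tokenizer that is trained on raw text until it reaches a target vocabulary size, and whose output can be detokenized back to the original string. It is a toy byte-pair-style merger written for illustration only, not SentencePiece's implementation or API; the names `train_toy_bpe`, `apply_merge`, `encode`, and `decode` are hypothetical, and the `▁` space marker is assumed not to occur in the input text.

```python
from collections import Counter

SPACE = "\u2581"  # visible marker standing in for a space, so decoding is lossless


def apply_merge(seq, pair):
    """Replace each adjacent occurrence of `pair` in `seq` with the joined symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_toy_bpe(corpus, vocab_size):
    """Learn merges from raw sentences until the vocabulary reaches `vocab_size`."""
    # No pre-tokenization: each sentence is a character stream with marked spaces.
    sequences = [list(line.replace(" ", SPACE)) for line in corpus]
    vocab = {sym for seq in sequences for sym in seq}
    merges = []
    while len(vocab) < vocab_size:
        pairs = Counter()
        for seq in sequences:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair; ties are broken deterministically by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab.add(best[0] + best[1])
        sequences = [apply_merge(seq, best) for seq in sequences]
    return merges, vocab


def encode(text, merges):
    """Split `text` into subword pieces by replaying the learned merges in order."""
    seq = list(text.replace(" ", SPACE))
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


def decode(pieces):
    """Detokenize: join the pieces and turn the space marker back into spaces."""
    return "".join(pieces).replace(SPACE, " ")


if __name__ == "__main__":
    corpus = ["low lower lowest", "new newer newest"]
    merges, vocab = train_toy_bpe(corpus, vocab_size=20)
    pieces = encode("lower newest", merges)
    print(pieces)
    print(decode(pieces))
```

The vocabulary budget is the only knob passed to training, and the round trip through `encode` and `decode` reproduces the input because spaces are kept as an ordinary symbol rather than being discarded.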