tokenmonster/go at main · alasdairforsythe/tokenmonster

README.md

The complete documentation is available on pkg.go.dev.

Basic Usage

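import (
	"fmt"

	tokenmonster "github.com/alasdairforsythe/tokenmonster/go"
)

func example(vocabfilename string, text []byte) {
	// Load the vocabulary from a file on disk.
	vocab, err := tokenmonster.Load(vocabfilename)
	if err != nil {
		panic(err)
	}

	// Tokenize returns the tokens and the number of bytes that had no token.
	tokens, missing, err := vocab.Tokenize(text)
	if err != nil {
		panic(err)
	}

	// Decode the tokens back into bytes in the vocabulary's charset encoding.
	decoder := vocab.NewDecoder()
	decoded_text := decoder.Decode(tokens)

	// Use the results so that the example compiles.
	fmt.Println(missing, string(decoded_text))
}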

missing is the number of bytes for which there were no tokens.

text must be a slice of bytes. If the vocabulary uses UTF-16 encoding, the bytes must already be UTF-16 encoded.
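As a minimal sketch of preparing UTF-16 input, the helper below converts a Go string into UTF-16 bytes using only the standard library. The name encodeUTF16LE is hypothetical and not part of tokenmonster, and little-endian byte order is an assumption here; check the byte order your vocabulary expects before relying on it.

```go
package main

import (
	"encoding/binary"
	"unicode/utf16"
)

// encodeUTF16LE is a hypothetical helper (not part of tokenmonster).
// It converts a Go string (UTF-8) into UTF-16 bytes.
// Assumption: little-endian byte order, with no byte order mark.
func encodeUTF16LE(s string) []byte {
	// utf16.Encode yields one or two 16-bit code units per rune.
	units := utf16.Encode([]rune(s))
	out := make([]byte, 2*len(units))
	for i, u := range units {
		binary.LittleEndian.PutUint16(out[2*i:], u)
	}
	return out
}
```

The resulting slice could then be passed as text to vocab.Tokenize for a UTF-16 vocabulary.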

decoded_text is also a slice of bytes, in the vocabulary's charset encoding. If that encoding is UTF-8, you can convert it to a string with string().

When using vocab.Tokenize(text), note that if the vocabulary uses any normalization other than NFD, the normalization may be applied in place to the underlying text data. Pass a copy if you don't want the underlying data to be modified. This applies only to the Go package; the Python library always uses a copy.
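One way to protect the caller's data is to copy the slice before handing it to a function that may modify its argument. The sketch below illustrates the pattern with the standard library only: lowercaseInPlace is a hypothetical stand-in for an in-place normalization, and callWithCopy is a hypothetical helper; neither is part of tokenmonster.

```go
package main

// lowercaseInPlace is a hypothetical stand-in for an in-place normalization.
// It lowercases ASCII letters directly in the slice it is given.
func lowercaseInPlace(b []byte) {
	for i, c := range b {
		if c >= 'A' && c <= 'Z' {
			b[i] = c + ('a' - 'A')
		}
	}
}

// callWithCopy is a hypothetical helper. It runs fn on a private copy of
// text, so the caller's slice is left untouched, and returns that copy.
func callWithCopy(text []byte, fn func([]byte)) []byte {
	// append onto a nil slice allocates new backing storage.
	cp := append([]byte(nil), text...)
	fn(cp)
	return cp
}
```

With tokenmonster the same idea would presumably reduce to copying text first and passing the copy to vocab.Tokenize, at the cost of one extra allocation per call.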
