Merge pull request #30 from mkdynamic/patch-1
* Fix memoization thread safety

* Apply standard formatting

Ran `rake standard:fix_unsafely`

* Give cache mutex a more specific name

* Remove note about thread safety

---------

Co-authored-by: I Park <[email protected]>
IAPark authored Apr 4, 2024
2 parents 1143cbf + 338663a commit f2930bf
Showing 2 changed files with 11 additions and 6 deletions.
9 changes: 5 additions & 4 deletions README.md
@@ -1,8 +1,9 @@
[![Gem Version](https://badge.fury.io/rb/tiktoken_ruby.svg)](https://badge.fury.io/rb/tiktoken_ruby)

# tiktoken_ruby

[Tiktoken](https://github.com/openai/tiktoken) is a BPE tokenizer from OpenAI used with their GPT models.
This is a wrapper around it aimed primarily at enabling accurate counts of GPT model tokens used.

## Request for maintainers

@@ -20,18 +21,19 @@ If bundler is not being used to manage dependencies, install the gem by executin
$ gem install tiktoken_ruby

## Usage

Usage should be very similar to the Python library. Here's a simple example:

Encode and decode text

```ruby
require 'tiktoken_ruby'

# note: retrieving an encoding is not currently thread safe until https://github.com/IAPark/tiktoken_ruby/pull/30 is merged
enc = Tiktoken.get_encoding("cl100k_base")
enc.decode(enc.encode("hello world")) #=> "hello world"
```

Encoders can also be retrieved by model name

```ruby
require 'tiktoken_ruby'

@@ -59,7 +61,6 @@ bundle exec rake compile
bundle exec rake spec
```


## License

The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
8 changes: 6 additions & 2 deletions lib/tiktoken_ruby/encoding.rb
@@ -1,6 +1,8 @@
 # frozen_string_literal: true

 class Tiktoken::Encoding
+  CACHE_MUTEX = Mutex.new
+
   attr_reader :name

   # This returns a new Tiktoken::Encoding instance for the requested encoding
@@ -15,8 +17,10 @@ def self.for_name(encoding)
   # @param encoding [Symbol] The name of the encoding to load
   # @return [Tiktoken::Encoding] The encoding instance
   def self.for_name_cached(encoding)
-    @encodings ||= {}
-    @encodings[encoding.to_sym] ||= Tiktoken::Encoding.for_name(encoding)
+    CACHE_MUTEX.synchronize do
+      @encodings ||= {}
+      @encodings[encoding.to_sym] ||= Tiktoken::Encoding.for_name(encoding)
+    end
   end

   # Encodes the text as a list of integer tokens. This encoding will encode special non text tokens
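
The change to `for_name_cached` can be illustrated with a small standalone sketch. It does not use the gem: `EncodingCache`, `load_encoding`, and `load_count` are hypothetical stand-ins invented for this example, and only the `CACHE_MUTEX.synchronize` pattern around the `||=` memoization mirrors the diff. The point it tries to show is that `||=` is a check-then-assign sequence, so without a lock two threads could both see an empty slot and both run the loader.

```ruby
# Hypothetical stand-in for Tiktoken::Encoding. Only the caching pattern
# (a mutex around the ||= memoization) is taken from the diff.
class EncodingCache
  CACHE_MUTEX = Mutex.new

  class << self
    attr_reader :load_count
  end
  @load_count = 0

  # Stand-in for the expensive loader (Tiktoken::Encoding.for_name in the diff).
  # It is only ever called while CACHE_MUTEX is held, so the counter is safe.
  def self.load_encoding(name)
    sleep 0.01 # simulate slow work so concurrent callers overlap
    @load_count += 1
    "encoding:#{name}"
  end

  # Mutex-guarded memoization: the check and the assignment happen as one
  # critical section, so each key is loaded at most once.
  def self.for_name_cached(name)
    CACHE_MUTEX.synchronize do
      @encodings ||= {}
      @encodings[name.to_sym] ||= load_encoding(name)
    end
  end
end

# Request the same encoding from several threads at once.
threads = 8.times.map do
  Thread.new { EncodingCache.for_name_cached("cl100k_base") }
end
results = threads.map(&:value)
```

With the lock in place, every thread receives the same cached object and the loader runs once. One trade-off of this design is that a single mutex serializes all lookups, including loads of different encodings; that keeps the code simple and is likely fine when loads are rare.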