Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid passing around reference of cached tokens #15

Closed
wants to merge 0 commits into from

Conversation

PatrickElfert
Copy link

The tokenize function was previously returning a reference to the cached tokens. This caused issues when mutating the array and then accessing the cache, resulting in different outputs for the same input text. To resolve this, I've added logic to create a shallow copy of the tokens before returning them, ensuring the cached tokens are not inadvertently mutated.

@azu
Copy link
Owner

azu commented Dec 15, 2024

This approach does not prevent the mutate itself of each token.

const tokens = await tokenize(...);
tokens[0].word_id = "..."; // affect to cache

I think we should avoid mutating KuromojiToken. (The design was not intended to mutate it.)
Therefore, I feel that changing Promise<KuromojiToken[]> in the return value to Promise<Readonly<Readonly<KuromojiToken>[]>> is a compromise. (We also need to write it to documentation)

Freezing the return Object.freeze(value) is expensive, and storing a copy of the token in the cache can also be expensive/incomplete.
Therefore, I think it is better to solve this problem with just a type.

Related:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants