What does free actually do? #72

Hi, thanks for the JavaScript port of tiktoken; it's been extremely helpful.

I'm trying to figure out what `free` actually does and, more importantly, when it should be used. I assume it performs some teardown and releases any memory reserved by the encoder instance, but I can't find any documentation or source code for it.

My use case is in an AWS Lambda environment, so I was wondering: is it fine to reuse the same encoder instance multiple times, rather than instantiate a new one on every request and then `free` it after? This would reduce the overhead of setting up a new encoder on every request. Is that advisable, or is it better to just instantiate and `free` every time an encoder is used?
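For concreteness, here is a minimal sketch of the two options being weighed, assuming the `@dqbd/tiktoken` WASM package and a Node.js Lambda handler (the handler names are illustrative, not from this repo's docs):

```js
const { get_encoding } = require("@dqbd/tiktoken");

// Option 1: one encoder at module scope, reused across warm invocations.
const sharedEncoder = get_encoding("cl100k_base");

exports.reuseHandler = async (event) => {
  return { tokens: sharedEncoder.encode(event.text).length };
};

// Option 2: instantiate per invocation and free immediately afterwards.
exports.perRequestHandler = async (event) => {
  const encoder = get_encoding("cl100k_base");
  try {
    return { tokens: encoder.encode(event.text).length };
  } finally {
    encoder.free(); // release the WASM-side allocation
  }
};
```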
Comments
@disbelief I agree we need more guidance, as I'm not entirely sure how it works either; I've only come to an answer based on my own usage. Maybe @dqbd can shed some light here and correct me. My recommendation: if you are re-using the encoder instance, you should still be calling `free()` once you are done with it. We can see that the README example frees the encoder after use, and we are reminded to call it once the encoder is no longer used (Line 42 in 072dd12).
The tiktokenizer playground frees the encoder after every request, which is also interesting to note. I found a somewhat related comment, but it is a bit confusing because it implies you shouldn't be creating new instances repeatedly.
I found that re-creating instances repeatedly is mainly an issue because it slows down the rate of encodings. So, TL;DR: based on my usage, here's what I've done (sketched below), and I haven't had issues when counting tokens in my app, without much slowdown:
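A minimal sketch of that strategy, assuming `@dqbd/tiktoken`; the cap of 25 encodings per instance is the figure quoted later in this thread:

```js
const { get_encoding } = require("@dqbd/tiktoken");

const MAX_ENCODINGS_PER_INSTANCE = 25; // figure quoted later in the thread
let encoder = null;
let useCount = 0;

function countTokens(text) {
  // Lazily (re-)create the encoder when needed.
  if (!encoder) {
    encoder = get_encoding("cl100k_base");
    useCount = 0;
  }
  const count = encoder.encode(text).length;
  useCount += 1;
  // After the cap is hit, free the WASM-side memory and start fresh.
  if (useCount >= MAX_ENCODINGS_PER_INSTANCE) {
    encoder.free();
    encoder = null;
  }
  return count;
}
```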
This resetting behavior has the added benefit of capping resource usage: a series of encoding requests without any buffer can be resource-intensive, and the intermittent reset forces a re-initialization period before too many are processed all at once. The introduced bottleneck is not much of a blocker, as it's within reason of OpenAI rate limits (presuming you are counting tokens before making generation requests), but you should adjust as necessary for your scale or use case.
Thanks @danny-avila, your experience here is much appreciated. I hope @dqbd can shed some more light, but in the meantime I'll add some extra instrumentation to keep an eye on things.
Hello @disbelief and @danny-avila! Sorry for the delay; hopefully I can shed some more light on `free`. In the case of the WASM-backed encoder, `free` releases the memory allocated on the WebAssembly side, which the JavaScript garbage collector does not reclaim on its own. However, the issue and workaround with batching 25 requests does seem to point to a memory leak of sorts; will investigate further. Thanks for flagging, @danny-avila!
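As a tiny illustration of that lifecycle (a sketch, assuming the WASM build):

```js
const { get_encoding } = require("@dqbd/tiktoken");

const enc = get_encoding("cl100k_base");
const ids = enc.encode("hello world"); // Uint32Array of token ids
enc.free(); // hands the WASM-side memory back; `enc` must not be used after this
```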
@dqbd thanks for the info. To further clarify: it should be okay to use a single instance of the encoder for many encode calls, and only free it when it's no longer needed?
Hey @b0o! Yep, that should be the case. Any issues arising from that would be considered a bug.
Hi @dqbd, thanks for your response. So are you saying it's fine to keep one encoder around for the lifetime of the process? In the playground case here, are you only freeing the encoder when a new one is selected? Thanks again, I would just like some added clarity.
Hi @danny-avila, thanks for your comment above. Have you figured out when and how to use free() properly? After going through your comment, I am just wondering whether calling free() on every request is necessary in my case. I'm using this library in my express.js API to compute the number of tokens in some input text, which helps me trim the text efficiently. For every API request, I have to run the encoder over the incoming text.
So, as you quoted above, "Create an encoder instance and re-use it up to 25 encodings": do I need to do this?
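For reference, a hedged sketch of what the per-request pattern being asked about might look like in an Express handler, assuming `@dqbd/tiktoken`; the route and field names are made up for illustration:

```js
const express = require("express");
const { get_encoding } = require("@dqbd/tiktoken");

const app = express();
app.use(express.json());

// Hypothetical route: counts tokens in the request body's `text` field,
// creating and freeing an encoder on every request.
app.post("/token-count", (req, res) => {
  const encoder = get_encoding("cl100k_base");
  try {
    res.json({ count: encoder.encode(req.body.text).length });
  } finally {
    encoder.free(); // always release the WASM-side memory, even on error
  }
});

app.listen(3000);
```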
Your server can crash just from requiring too many resources, and encoding can do exactly that. Intermittently calling free() is what kept memory in check in my tests. Here's a snippet of my test script where I figured this out (the snippet assumes iterations, apiKey, clientOptions, text, maxMemory, and printProgressBar are defined elsewhere in the script):

```js
for (let i = 0; i < iterations; i++) {
  try {
    console.log(`Iteration ${i}`);
    const client = new OpenAIClient(apiKey, clientOptions);
    client.getTokenCount(text); // uses this tiktoken library
    // const encoder = client.constructor.getTokenizer('cl100k_base');
    // console.log(`Iteration ${i}: call encode()...`);
    // encoder.encode(text, 'all');
    // encoder.free();
    const memoryUsageDuringLoop = process.memoryUsage().heapUsed;
    const percentageUsed = (memoryUsageDuringLoop / maxMemory) * 100;
    printProgressBar(percentageUsed);
    if (i === iterations - 1) {
      console.log(' done');
      // encoder.free();
    }
  } catch (e) {
    console.log(`caught error! in Iteration ${i}`);
    console.log(e);
  }
}
```
Whether you need to do this will depend on your own tests. You can find my full test here: https://github.com/danny-avila/LibreChat/blob/b3aac97710ab9680046eb8089c5fcd4456bd2988/api/app/clients/specs/OpenAIClient.tokens.js
Thank you for the response, @danny-avila. That's an interesting test you have written. I will run a similar test in my case to check that things work as expected.