Is LMEngine thread safe? #6
Replies: 1 comment 1 reply
No, it is not thread safe. It is designed at present to be run serially. So, you define your models, giving each a refname such as "text", "tool", etc. When you call RunInference, it will try to load the specified refname if it's not already loaded. So, if you wish to do text gen, call the "text" refname; next, if you wish to do function calling, call "tool", and so on.

You must remember, an incredible number of resources are allocated just to do inference: all the weights loaded into VRAM, plus all the internal allocations needed in RAM. For example, all the tokens are allocated in RAM. If you have 4k tokens going in, that is 4k * sizeof(int32). Then there are all the buffers that get allocated. It's a lot. It is really pushing consumer hardware to its limits. Yet it somehow all seems to work, lol.

Note, 2-3 years ago this would not even have been possible. Having multiple models loaded would require way more resources than typical consumer hardware offers. Not to say it's not possible, just that my focus with these projects is to make them work as fast and as smoothly as possible on typical consumer hardware. To see just how much is allocated, uncomment the LME_Print(..) statement in the OnInfo callback and let it display the info during startup.
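Since the engine must be driven serially, one way callers can stay safe is to funnel every inference call through a single lock. Here is a minimal Python sketch of that pattern; `StubEngine` and `run_serialized` are hypothetical stand-ins for illustration only, not part of the LMEngine API:

```python
import threading

# Hypothetical stand-in for the engine. The real RunInference is NOT
# thread safe, so every caller must go through one shared lock.
class StubEngine:
    def __init__(self):
        self.loaded = None          # refname currently resident

    def run_inference(self, refname, prompt):
        if self.loaded != refname:  # lazy-load on refname switch,
            self.loaded = refname   # as described in the reply above
        return f"[{refname}] {prompt}"

# 4k int32 token ids alone cost 4096 * 4 bytes = 16 KiB of RAM,
# before any of the larger weight and scratch buffers.
TOKEN_BYTES = 4096 * 4

_engine = StubEngine()
_engine_lock = threading.Lock()     # single lock -> serial access

def run_serialized(refname, prompt):
    # Only one thread at a time may touch the engine.
    with _engine_lock:
        return _engine.run_inference(refname, prompt)

# Multiple threads may *request* inference, but the lock serializes
# the actual work, matching the engine's single-threaded design.
results = []
threads = [
    threading.Thread(target=lambda r=r, p=p: results.append(run_serialized(r, p)))
    for r, p in [("text", "hello"), ("tool", "get_weather")]
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This doesn't make the engine concurrent, of course; it just prevents two threads from corrupting the engine's shared state by overlapping calls.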
Hi!
Once a model is loaded and assuming you keep it in memory without loading other models, is it possible to invoke the "LME_RunInference" procedure from different threads to parallelize multiple tasks at the same time?
In other words, given a loaded model, is the "LME_RunInference" concurrent and thread safe? Or should the different inferences be performed serially?
Thanks in advance for your reply.