
Faster Embeddings #284

Open · srv1n opened this issue May 3, 2024 · 1 comment
Labels: 🪄 enhancement (additions to the software) · examples (touches the examples) · 🚀 performance (there is something that should be faster)

Comments

@srv1n commented May 3, 2024

Great crate!

I was able to speed up embeddings by making the following changes (sketched after the details below):

  1. expose n_ubatch
  2. set n_ubatch and n_batch to 2048
  3. initialize the LlamaBatch with n_tokens = 2048
  4. update line 65 to check against n_batch instead of n_ctx (details below)

Line 65 currently reads:

if (batch.n_tokens() as usize + tokens.len()) > n_ctx {

This should compare against n_batch, not n_ctx; compare the original llama.cpp embedding example, https://github.com/ggerganov/llama.cpp/blob/master/examples/embedding/embedding.cpp (line 164):

if (batch.n_tokens + n_toks > n_batch) {
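For concreteness, here is a minimal sketch of all four changes together. The builder names and module paths are assumptions based on llama-cpp-2's public API, and with_n_ubatch is hypothetical, since exposing n_ubatch is exactly what this issue requests:

```rust
use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::llama_batch::LlamaBatch;

const N_BATCH: usize = 2048;

// (1) + (2): set both batch sizes to 2048.
// `with_n_ubatch` is the hypothetical setter this issue asks to expose.
let ctx_params = LlamaContextParams::default()
    .with_n_batch(N_BATCH as u32)
    .with_n_ubatch(N_BATCH as u32);

// (3): size the batch for N_BATCH tokens (second argument is n_seq_max).
let mut batch = LlamaBatch::new(N_BATCH, 1);

// (4): flush when the *batch* is full, not when the context is.
if (batch.n_tokens() as usize + tokens.len()) > N_BATCH {
    // decode the pending batch and clear it before adding more tokens
}
```

The decode-and-clear step inside the flush branch is unchanged from the existing example; only the threshold moves from n_ctx to n_batch.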

@MarcusDunn (Contributor) commented:

Thanks for the issue, would love a PR to this effect.

@MarcusDunn added the 🪄 enhancement (additions to the software), 🚀 performance (there is something that should be faster), and examples (touches the examples) labels on May 8, 2024
@MarcusDunn removed the lib (things that effect the library itself) label on May 8, 2024
No branches or pull requests · 2 participants