Adds support for stella_en_v5 embedding model -400M variant #2608
base: main
Conversation
@iskng let's try and figure out if we can have one single file. I guess that way it'll be easier for end users and maintainers. It would be great if you could mark this as a draft in the meantime. Thanks
+1 to this, if it's easy to add support for the 400m model in the existing one that would make it simpler to maintain over time (though there is already a lot of duplication among models, so if it's a significant effort to merge the two, I'm happy with the separate file).
Should have mentioned that I only really tested this for inference on Metal and CPU, so I'm not sure the CUDA implementation is right; I had to disable use_efficient_memory because trying to get xformers working on a Mac was rough. Also curious whether it's just my implementation, but it's about 3 times slower than sentence-transformers for the same model.
@iskng Sorry about the delay, I've been on the move and didn't get a chance to work on this. I'm not sure about the correct way of proceeding with this PR (and my changes to it) so I just created a PR to your feature branch here. This incorporates
It might not be strictly your implementation; Candle is still very young and a bit rough around the edges when it comes to head-on performance comparisons with PyTorch-based implementations. Having said that, I've attempted to simplify some of your current implementation (basically removing anything which is not directly composed out of core candle ops). Also, I've updated the
Your implementation of `Stella 400M` and the previous `Stella 1.5B` are now supported in a single file and entry point.
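The single-entry-point approach above can be sketched roughly as an enum selecting per-variant hyperparameters. This is an illustrative sketch, not the actual candle-transformers API: the names (`ModelVariant`, `Config`) are hypothetical, and the hidden sizes are assumptions based on the respective model cards.

```rust
// Hypothetical sketch: one config entry point serving both Stella variants.
// Names and values are illustrative, not the real candle-transformers API.

#[derive(Clone, Copy, Debug, PartialEq)]
enum ModelVariant {
    Small, // stella_en_400M_v5
    Large, // stella_en_1.5B_v5
}

struct Config {
    variant: ModelVariant,
    hidden_size: usize,
    num_attention_heads: usize,
}

impl Config {
    // Per-variant hyperparameters; the numbers are assumed from the model
    // cards and should be checked against the published configs.
    fn new(variant: ModelVariant) -> Self {
        match variant {
            ModelVariant::Small => Self { variant, hidden_size: 1024, num_attention_heads: 16 },
            ModelVariant::Large => Self { variant, hidden_size: 1536, num_attention_heads: 12 },
        }
    }
}

fn main() {
    let cfg = Config::new(ModelVariant::Small);
    println!("{:?}: hidden_size={}, heads={}", cfg.variant, cfg.hidden_size, cfg.num_attention_heads);
}
```

Keeping the variant-specific numbers behind one constructor is what lets both models share a single file and forward pass.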
It's about 15% faster. Learned a lot from this, thanks! Removed the now redundant 400m file.
@iskng great, if you are satisfied
Looks pretty good, you may want to run rustfmt though. Anything else we should be waiting for before merging this one? |
I believe this is generally complete. @iskng, waiting on you to finalize this PR. As @LaurentMazare highlighted, please run a `cargo fmt` pass.
Stella_en_400m_v5 is #6 on MTEB as of 9th Nov 2024.
Model Card
This PR adds support for the model along with some examples.
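Since this is an embedding model, its output is typically consumed by comparing vectors. As a minimal illustration of what the examples do with the model's output, here is a plain cosine-similarity helper; the vectors are made-up stand-ins for real stella_en_400M_v5 embeddings.

```rust
// Minimal sketch: comparing two embedding vectors by cosine similarity.
// The input vectors below are arbitrary placeholders, not real model output.

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    let query = vec![0.1_f32, 0.3, 0.5];
    let doc = vec![0.2_f32, 0.25, 0.55];
    println!("similarity = {:.3}", cosine_similarity(&query, &doc));
}
```

In retrieval use, the query embedding is scored against each document embedding this way and results are ranked by similarity.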
License: the model is MIT-licensed.
The authors' example from the model card has been added and its output reproduced.