
Adds support for stella_en_v5 embedding model -400M variant #2608

Status: Open. iskng wants to merge 15 commits into `main`.
Conversation


iskng commented Nov 9, 2024

`stella_en_400M_v5` is #6 on the MTEB leaderboard as of 9 Nov 2024.

Model Card

This PR adds support for the model along with some examples.

License: the model is MIT-licensed.

The authors' example from the model card has been added and its results reproduced.
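
For reference, the model-card example ends by comparing query and document embeddings. A minimal candle sketch of that similarity step (shapes and the 1024 embed dim are illustrative, not the PR's code):

```rust
use candle_core::{Device, Result, Tensor};

/// L2-normalize rows; cosine similarity then reduces to a plain matmul.
fn cosine_sim(a: &Tensor, b: &Tensor) -> Result<Tensor> {
    let a = a.broadcast_div(&a.sqr()?.sum_keepdim(1)?.sqrt()?)?;
    let b = b.broadcast_div(&b.sqr()?.sum_keepdim(1)?.sqrt()?)?;
    a.matmul(&b.t()?)
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Stand-ins for the query/doc embeddings the model would produce.
    let queries = Tensor::randn(0f32, 1f32, (2, 1024), &dev)?;
    let docs = Tensor::randn(0f32, 1f32, (4, 1024), &dev)?;
    let sims = cosine_sim(&queries, &docs)?; // (2, 4) similarity matrix
    println!("{sims}");
    Ok(())
}
```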

AnubhabB (Contributor) commented

@iskng let's try to figure out whether we can have a single `stella_en_v5` module instead of `stella_en_v5` and `stella_en_v5_400m`. Allow me some time to go through this and discuss possible ways of merging the two.

I guess that way, it'll be easier for end users and maintainers.

Could you mark this as a draft PR for the time being, until we sort this out?

Thanks

LaurentMazare (Collaborator) commented

> @iskng let's try to figure out whether we can have a single `stella_en_v5` module instead of `stella_en_v5` and `stella_en_v5_400m`. Allow me some time to go through this and discuss possible ways of merging the two.

+1 to this. If it's easy to add support for the 400M model in the existing one, that would make it simpler to maintain over time (though there is already a lot of duplication among models, so if merging the two is a significant effort, I'm happy with the separate file).

iskng marked this pull request as draft on November 10, 2024.
iskng (Author) commented Nov 12, 2024

I should have mentioned that I only tested this for inference on Metal and CPU, so I'm not sure the CUDA implementation is right. I also had to disable `use_efficient_memory`, because trying to get xformers working on a Mac was rough.
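
(For context, a minimal sketch of the plain scaled-dot-product path one falls back to when a memory-efficient kernel is disabled; this is a generic illustration, not the PR's actual attention code:

```rust
use candle_core::{Device, Result, Tensor};
use candle_nn::ops::softmax_last_dim;

/// Plain scaled-dot-product attention; shapes are (batch, heads, seq, head_dim).
fn naive_attention(q: &Tensor, k: &Tensor, v: &Tensor) -> Result<Tensor> {
    let head_dim = q.dim(3)? as f64;
    let scale = 1.0 / head_dim.sqrt();
    // (b, h, s, d) x (b, h, d, s) -> (b, h, s, s)
    let scores = (q.matmul(&k.t()?)? * scale)?;
    let probs = softmax_last_dim(&scores)?;
    // (b, h, s, s) x (b, h, s, d) -> (b, h, s, d)
    probs.matmul(v)
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let q = Tensor::randn(0f32, 1f32, (1, 8, 16, 64), &dev)?;
    let k = Tensor::randn(0f32, 1f32, (1, 8, 16, 64), &dev)?;
    let v = Tensor::randn(0f32, 1f32, (1, 8, 16, 64), &dev)?;
    let out = naive_attention(&q, &k, &v)?;
    println!("{:?}", out.dims()); // [1, 8, 16, 64]
    Ok(())
}
```

The trade-off is the materialized (seq, seq) score matrix, which is exactly what the memory-efficient path avoids.)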

I'm also curious whether it's just my implementation, but it's about 3x slower than sentence-transformers for the same model. I'd love to learn how to make this faster; if you know of any resources, I'm just starting to dig around candle. Thanks!

AnubhabB (Contributor) commented Nov 18, 2024

@iskng Sorry about the delay, I've been on the move and didn't get a chance to work on this.

I'm not sure about the correct way of proceeding with this PR (and my changes to it), so I just created a PR to your feature branch here.

This incorporates Stella 1.5B and Stella 400M under a single entry point and flow. It's basically a single model now, with minor divergence in implementation where required (see the sketch below). Take a look; I've been able to reproduce the authors' (and your) results.
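
To make "single model with minor divergence" concrete, here is a hypothetical sketch of the pattern (names and sizes are illustrative, not the PR's actual API):

```rust
/// A small variant enum in the config; the forward pass matches on it
/// only where the two models actually diverge.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ModelVariant {
    Large, // 1.5B
    Small, // 400M
}

#[derive(Debug)]
pub struct Config {
    pub variant: ModelVariant,
    pub hidden_size: usize,
    pub num_attention_heads: usize,
    // ... remaining fields shared by both variants
}

impl Config {
    // One constructor per variant keeps the divergence in the config,
    // not scattered through the forward pass. Sizes are illustrative.
    pub fn v1_5b() -> Self {
        Self { variant: ModelVariant::Large, hidden_size: 1536, num_attention_heads: 12 }
    }
    pub fn v400m() -> Self {
        Self { variant: ModelVariant::Small, hidden_size: 1024, num_attention_heads: 16 }
    }
}

fn main() {
    let cfg = Config::v400m();
    assert_eq!(cfg.variant, ModelVariant::Small);
    println!("{cfg:?}");
}
```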

> I'm also curious whether it's just my implementation, but it's about 3x slower than sentence-transformers for the same model. I'd love to learn how to make this faster; if you know of any resources, I'm just starting to dig around candle. Thanks!

It might not be strictly your implementation; Candle is still very young and a bit rough around the edges when it comes to head-on performance comparisons with PyTorch-based implementations. That said, I've attempted to simplify some of your current implementation (basically removing anything that is not directly composed out of `Config`, and removing a bunch of redundant `transpose`, `reshape`, etc. ops). Let me know if you see any performance improvements; I'll be curious.
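
To make the "redundant transpose/reshape" point concrete, here's a toy illustration (not code from this PR) of the kind of no-op layout round-trips that are worth deleting:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let x = Tensor::randn(0f32, 1f32, (2, 4, 8, 16), &dev)?;

    // Layout round-trips like this creep in when porting code; they cost
    // extra kernels/copies without changing the result.
    let slow = x
        .transpose(1, 2)?         // (2, 8, 4, 16)
        .contiguous()?            // materialize the permuted layout
        .transpose(1, 2)?         // back to (2, 4, 8, 16), non-contiguous view
        .contiguous()?            // copy again
        .reshape((2, 4, 8, 16))?; // no-op reshape

    // The simplified version is just the original tensor.
    let diff = (&slow - &x)?.abs()?.sum_all()?.to_scalar::<f32>()?;
    println!("total abs diff: {diff}"); // 0
    Ok(())
}
```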

Also, I've updated the `stella_en_v5` example to support both variants (`--which 400m` and `--which 1.5b`) and reported results for both. It should be safe to remove the old 400M files/directory as well.
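
For reference, selecting a variant would then look something like this (the example name is assumed from candle's usual example layout):

```
cargo run --example stella-en-v5 --release -- --which 400m
```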

AnubhabB and others added 3 commits November 19, 2024 02:18
Your implementation of `Stella 400M` and the previous `Stella 1.5B` now supported in a single file and entry point
iskng (Author) commented Nov 23, 2024

It's about 15% faster. I learned a lot from this, thanks!

Removed the now-redundant 400M files.

AnubhabB (Contributor) commented

@iskng great. If you're satisfied, un-draft this PR and we should be ready to merge into Candle.

LaurentMazare (Collaborator) commented

Looks pretty good; you may want to run `rustfmt` though. Anything else we should be waiting for before merging this one?

AnubhabB (Contributor) commented

I believe this is generally complete.

@iskng we're waiting on you to finalize this PR. As @LaurentMazare highlighted, please run `cargo fmt --all` and push when you un-mark this PR as WIP, and we should be good to go.

iskng marked this pull request as ready for review on November 26, 2024.