fix: align readme with current mteb #1493

Open · wants to merge 1 commit into main
README.md: 10 changes (8 additions, 2 deletions)
@@ -50,6 +50,8 @@ model_name = "average_word_embeddings_komninos"
# model_name = "sentence-transformers/all-MiniLM-L6-v2"

model = SentenceTransformer(model_name)
# or directly from mteb:
model = mteb.get_model(model_name)
Contributor:
Should we just recommend this one always?

and maybe rephrase to:

Suggested change
- model = mteb.get_model(model_name)
+ model = mteb.get_model(model_name)  # if the model is not implemented in MTEB it will be equivalent to SentenceTransformer(model_name)

Collaborator (Author):
Yes, but I'm not sure removing it is the best approach. I can change it to whatever you think is better.
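
For context, a minimal sketch of the fallback behavior the suggested comment describes, assuming `mteb.get_model` loads unregistered names as a plain `SentenceTransformer` (the exact resolution logic lives in mteb and is not shown here):

```python
import mteb
from sentence_transformers import SentenceTransformer

model_name = "average_word_embeddings_komninos"

# If mteb ships an implementation for this name, get_model returns it;
# otherwise, per the suggestion above, the result is equivalent to the
# direct construction commented out below.
model = mteb.get_model(model_name)
# model = SentenceTransformer(model_name)
```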

tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder=f"results/{model_name}")
@@ -220,9 +222,13 @@ Note that the public leaderboard uses the test splits for all datasets except MS
Models should implement the following interface: an `encode` function that takes a list of sentences as input and returns a list of embeddings (embeddings can be `np.array`, `torch.tensor`, etc.). For inspiration, you can look at the [mteb/mtebscripts repo](https://github.com/embeddings-benchmark/mtebscripts) used for running diverse models via SLURM scripts for the paper.

```python
import mteb
from mteb.encoder_interface import PromptType
from mteb.models.wrapper import Wrapper
import numpy as np


- class CustomModel:
+ class CustomModel(Wrapper):
Contributor:
Why does it need to inherit from `Wrapper`? I would instead inherit from the `Encoder` protocol.

Collaborator (Author) (@Samoed, Nov 24, 2024):
The MTEB class will wrap models that do not inherit from the `Wrapper` class:

if not isinstance(model, Wrapper):
    model = SentenceTransformerWrapper(model)

Contributor (@KennethEnevoldsen, Nov 24, 2024):
Hmm, shouldn't it just wrap SentenceTransformers?

Collaborator (Author):
It can be like this. You changed it in mieb:

if isinstance(model, (SentenceTransformer, CrossEncoder)):
    model = SentenceTransformerWrapper(model)

I can change it there too.
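
# Editor's sketch (not part of this PR): the two dispatch strategies under
# discussion, side by side. The SentenceTransformerWrapper import path is an
# assumption for illustration; only one of these checks exists in a given branch.
from sentence_transformers import CrossEncoder, SentenceTransformer

from mteb.models.wrapper import Wrapper
from mteb.models.sentence_transformer_wrapper import SentenceTransformerWrapper  # assumed path


def wrap_by_inheritance(model):
    # behavior quoted above: wrap anything that is not already a Wrapper
    if not isinstance(model, Wrapper):
        model = SentenceTransformerWrapper(model)
    return model


def wrap_by_type(model):
    # mieb variant: wrap only known sentence-transformers classes,
    # leaving arbitrary custom encoders untouched
    if isinstance(model, (SentenceTransformer, CrossEncoder)):
        model = SentenceTransformerWrapper(model)
    return model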

Contributor:
Hmm, then it will also be merged into v2.0.0, in which case we should probably just update the readme there (I plan to merge it during December).

Collaborator (Author):
So for now I think this PR can be merged, and I will update the readme in the 2.0 branch.

def encode(
self,
sentences: list[str],
@@ -244,7 +250,7 @@ class CustomModel:
pass

model = CustomModel()
- tasks = mteb.get_task("Banking77Classification")
+ tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = MTEB(tasks=tasks)
evaluation.run(model)
```
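
Since the hunk above elides the middle of the `encode` signature, here is a self-contained sketch of the interface the README describes, relying on the automatic wrapping discussed in the thread rather than inheriting `Wrapper`. The keyword arguments are an assumption based on the `PromptType` import shown in the diff; the actual protocol may differ:

```python
import mteb
import numpy as np
from mteb.encoder_interface import PromptType


class RandomEncoder:
    """Toy model: returns one fixed-dimensional random vector per sentence."""

    def encode(
        self,
        sentences: list[str],
        task_name: str | None = None,
        prompt_type: PromptType | None = None,
        **kwargs,
    ) -> np.ndarray:
        rng = np.random.default_rng(seed=0)
        return rng.standard_normal((len(sentences), 32)).astype(np.float32)


model = RandomEncoder()
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results/random-encoder")
```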