
Add support for FinMTEB benchmark #1379

Open
wants to merge 7 commits into base: v2.0.0
Conversation

@alt-glitch alt-glitch commented Nov 4, 2024

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Adding datasets checklist

Reason for dataset addition:

Fixes #1267

  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command (see the Python sketch after this checklist).

    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
      • Ran only on FiQAClassification as of now.
    • intfloat/multilingual-e5-small
      • Ran only on FINAL as of now.
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).

  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().

  • I have filled out the metadata object in the dataset file (find documentation on it here).

  • Run tests locally to make sure nothing is broken using make test.

  • Run the formatter to format the code using make lint.
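
For reference, a minimal Python sketch of the run described in the first checklist item, assuming the standard MTEB / SentenceTransformer API; the task name used is just the one mentioned above, and the output folder is an arbitrary choice:

from sentence_transformers import SentenceTransformer

from mteb import MTEB

models = [
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "intfloat/multilingual-e5-small",
]

for model_name in models:
    # Load the embedding model and evaluate it on a single FinMTEB task.
    model = SentenceTransformer(model_name)
    evaluation = MTEB(tasks=["FiQAClassification"])  # any task from this PR works here
    evaluation.run(model, output_folder=f"results/{model_name}")

The CLI form from the checklist would be equivalent, e.g. mteb -m sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 -t FiQAClassification.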

@alt-glitch (Author) commented Nov 4, 2024

Hey @Muennighoff @KennethEnevoldsen @isaac-chung!

Here's a WIP PR to close #1267.

I had a few questions/notes:

  1. Should I run and get the results for all the tasks?
  2. Should the relevant PRs to embeddings-benchmark/results and embeddings-benchmark/leaderboard be made after merging this PR?
  3. FiQA2018 is already in MTEB, so I have left that out from FinMTEB. Otherwise, there were no conflicting tasks.
  4. Some tasks don't have a reference URL.
  5. The Summarization tasks are still pending. I have yet to look into the changes highlighted by @yixuantt in Add FinMTEB #1267 for summarization.

I'll add the summarization changes and make the PRs to results and leaderboard once this is done.
Is there anything else I'm missing out on?

@isaac-chung (Collaborator)

Hi @alt-glitch, thanks for working on this!

  1. Yes, I'd suggest running the whole thing on a small model mentioned in the paper like all-MiniLM-L12-v2, and only using the quickest settings as a sanity check, e.g. n_experiments=1 for classification (see the sketch below).
  2. Afterwards, yes for the leaderboard. I'll leave the results repo part to @KennethEnevoldsen.
  3. Sounds good.
  4. I think it's ok to use the paper's URL or its GitHub URL as reference. Otherwise, there are individual references for each dataset mentioned in the paper.
  5. Re: summarization tasks, we can add the column names as class attributes to AbsTaskSummarization, the way we did in MIEB's AbsTaskImageClassification.

Let me know if anything is unclear.
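
A rough sketch of that quick sanity-check setup, assuming classification tasks expose an overridable n_experiments attribute and that mteb.get_tasks is available (both are assumptions about the library version in use):

from sentence_transformers import SentenceTransformer

import mteb

# Grab one (or more) of the new FinMTEB tasks and shrink the classification settings.
tasks = mteb.get_tasks(tasks=["FiQAClassification"])
for task in tasks:
    if hasattr(task, "n_experiments"):
        task.n_experiments = 1  # fewer classifier fits, so the sanity run finishes faster

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
mteb.MTEB(tasks=tasks).run(model, output_folder="results/sanity-check")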

@KennethEnevoldsen (Contributor)

Re. 2: PRs to embeddings-benchmark/results can be made after this PR. I don't believe a PR to embeddings-benchmark/leaderboard will be required once the new leaderboard is up and running, as long as the benchmark is added to benchmarks.py.

@alt-glitch (Author)

Thanks for the comments!

Some more info:

  1. The main_score of PairClassification tasks needed to be fixed to max_ap.
  2. Summarization tasks use STSEvaluator
  3. Added reference_summaries_column and generated_summaries_column for specifying column names (see the sketch below).
  4. Added Summarization tasks.
  5. Added some more Clustering tasks that I had missed.
  6. I'm currently taking a look at the results from the sanity run I did through the benchmark. All tasks ran fine.

I'm interested in helping with getting the results too!

cc @KennethEnevoldsen @isaac-chung
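
A minimal sketch of what points 2-3 could look like in a task file, assuming AbsTaskSummarization (as updated in this PR) reads these two class attributes; the class name and column values are placeholders, not taken from the actual PR:

from mteb.abstasks.AbsTaskSummarization import AbsTaskSummarization


class ExampleFinSummarization(AbsTaskSummarization):
    # Point the task at the dataset's own columns instead of renaming them
    # inside dataset_transform(); both values here are placeholders.
    reference_summaries_column = "text"
    generated_summaries_column = "summary"

    # metadata = TaskMetadata(...)  # filled out as for any other task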

mteb/abstasks/AbsTaskSTS.py (outdated review thread, resolved)
mteb/tasks/Classification/eng/ESGClassification.py (outdated review thread, resolved)
from mteb.abstasks.TaskMetadata import TaskMetadata


class FOMCClassification(AbsTaskClassification):
Contributor

General comment: metadata is required to be filled out.

Author

Ah understood. Working on it.

mteb/tasks/Classification/zho/FinNSPClassification.py (outdated review thread, resolved)
@KennethEnevoldsen (Contributor)

Not sure what is meant by:

Summarization tasks use STSEvaluator

@alt-glitch (Author)

Not sure what is meant by:

Summarization tasks use STSEvaluator

The summarization tasks here don't have human_summaries or relevance scores, so the Spearman correlation is calculated between the summary and the text. Hence the STSEvaluator is used.

See: yixuantt/FinMTEB#2


  1. Updated AbsTaskSTS, AbsTaskSummarization, AbsTaskPairClassification and the respective tasks to use configurable column names instead of dataset_transform.
  2. Added missing reference to the tasks.
  3. I'm going to work on filling out the metadata for the tasks @KennethEnevoldsen. I'll update you once I'm done.

@alt-glitch (Author) commented Nov 10, 2024

Update: It's taking me a couple more days to fill out all the metadata fields for this benchmark as this seems to be mostly a manual process — reading the paper referenced for each dataset to understand and derive the date of dataset creation, annotation creators, and sample creation process since there are 64 datasets :)

If there's something I'm missing, do let me know!

Thanks!
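
To illustrate the kind of metadata being filled out, here is a hedged sketch of a TaskMetadata block. The field set follows the fields mentioned in this thread (date, annotations_creators, sample_creation) plus the usual ones; every concrete value is an illustrative placeholder, and exact field names and accepted values may differ between mteb versions, so treat this as a shape rather than a spec:

from mteb.abstasks.AbsTaskClassification import AbsTaskClassification
from mteb.abstasks.TaskMetadata import TaskMetadata


class ExampleFinClassification(AbsTaskClassification):
    metadata = TaskMetadata(
        name="ExampleFinClassification",
        description="Placeholder description of the dataset and its labels.",
        reference="https://example.com/dataset-paper",  # paper or GitHub URL, per the review discussion
        dataset={
            "path": "org/dataset-name",  # placeholder Hugging Face dataset id
            "revision": "0000000000000000000000000000000000000000",
        },
        type="Classification",
        category="s2s",
        eval_splits=["test"],
        eval_langs=["eng-Latn"],
        main_score="accuracy",
        date=("2020-01-01", "2021-12-31"),  # creation window, derived from the source paper
        domains=["Written"],
        task_subtypes=[],
        license="not specified",
        annotations_creators="derived",
        dialect=[],
        sample_creation="found",
        bibtex_citation="",
    )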

@KennethEnevoldsen (Contributor)

Update: It's taking me a couple more days to fill out all the metadata fields for this benchmark as this seems to be mostly a manual process — reading the paper referenced for each dataset to understand and derive the date of dataset creation, annotation creators, and sample creation process since there are 64 datasets :)

Thanks for taking the time on this. I believe metadata is the only thing missing and then it can be reviewed and merged.

@KennethEnevoldsen changed the base branch from main to v2.0.0 on November 11, 2024 at 09:27
@KennethEnevoldsen (Contributor)

Moving this to v2.0.0 to avoid merge conflicts in the future. I can resolve the current merge conflicts once the metadata is added.
