Discrepancy in STS17 Results for S-LLaMA-1.3B #151
Comments
Hi @newfish-lab, the STS17 task in the current version of MTEB is a crosslingual task. However, the leaderboard only considers the English subset. As you can see in your results file, there are multiple language subsets (denoted by the `hf_subset` field). To select a specific subset of a task, please refer to the MTEB library documentation. Let me know if you have any more questions.
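As a rough illustration (not part of the MTEB API), the English-English entries can be pulled out of a results file like the one in this issue with plain Python; the field names below follow the pasted JSON, and the two sample entries reuse values from it:

```python
def english_subset_scores(results: dict) -> list[dict]:
    """Return only the score entries whose hf_subset is 'en-en'."""
    return [
        entry
        for entry in results["scores"]["test"]
        if entry["hf_subset"] == "en-en"
    ]

# Tiny example using two entries taken from the results file above:
results = {
    "scores": {
        "test": [
            {"hf_subset": "en-en", "main_score": 0.8169906637042349},
            {"hf_subset": "en-ar", "main_score": 0.12083310897735632},
        ]
    }
}
print(english_subset_scores(results)[0]["main_score"])  # 0.8169906637042349
```

The en-en `main_score` is the number the leaderboard reports; the crosslingual subsets drag the task-level average down, which explains the apparent discrepancy.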
Thank you so much! I also wanted to know how you trained the
Moreover, I have encountered another issue: the evaluation code seems to have problems when testing with other tasks: Error: TypeError: LLM2Vec.encode() got an unexpected keyword argument 'task_name'
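This TypeError typically means the installed mteb version passes extra keyword arguments such as `task_name` into `encode()`, while the model's `encode()` signature does not accept them. One generic workaround (a sketch, not an LLM2Vec API; the `Dummy` model below is hypothetical) is a thin wrapper that discards keyword arguments the wrapped `encode()` does not declare:

```python
import inspect


class EncodeKwargsAdapter:
    """Wrap a model whose encode() rejects extra keyword arguments.

    The evaluation harness may pass kwargs such as task_name; this
    wrapper silently drops any kwarg the wrapped encode() does not
    declare in its signature. Illustration only, not an LLM2Vec API.
    """

    def __init__(self, model):
        self._model = model

    def encode(self, sentences, **kwargs):
        accepted = inspect.signature(self._model.encode).parameters
        filtered = {k: v for k, v in kwargs.items() if k in accepted}
        return self._model.encode(sentences, **filtered)


# Hypothetical model whose encode() only knows batch_size:
class Dummy:
    def encode(self, sentences, batch_size=32):
        return [len(s) for s in sentences]


wrapped = EncodeKwargsAdapter(Dummy())
print(wrapped.encode(["hello"], task_name="STS17", batch_size=8))  # [5]
```

Pinning mteb to the version the repo was tested with is the simpler fix; the adapter is only a stopgap if upgrading or downgrading is not an option.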
Hello,
I followed the instructions in the README to evaluate the S-LLaMA-1.3B model on the STS17 task. However, the results I obtained are significantly different from those reported in the paper.
Command Used:
python experiments/mteb_eval.py --model_name McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp-unsup-simcse \
    --task_name STS17 \
    --task_to_instructions_fp test_configs/mteb/task_to_instructions.json \
    --output_dir results
Results:
{
"dataset_revision": "faeb762787bd10488a50c8b5be4a3b82e411949c",
"evaluation_time": 57.001840114593506,
"kg_co2_emissions": null,
"mteb_version": "1.15.7",
"scores": {
"test": [
{
"cosine_pearson": 0.8068578432316841,
"cosine_spearman": 0.8169906637042349,
"euclidean_pearson": 0.7955408874118524,
"euclidean_spearman": 0.7998558030435642,
"hf_subset": "en-en",
"languages": [
"eng-Latn"
],
"main_score": 0.8169906637042349,
"manhattan_pearson": 0.8035899663153354,
"manhattan_spearman": 0.8065731327735807,
"pearson": 0.8068578432316841,
"spearman": 0.8169906637042349
},
{
"cosine_pearson": 0.15780754110461276,
"cosine_spearman": 0.12083310897735632,
"euclidean_pearson": 0.030643892007343198,
"euclidean_spearman": 0.03322321405535968,
"hf_subset": "en-ar",
"languages": [
"eng-Latn",
"ara-Arab"
],
"main_score": 0.12083310897735632,
"manhattan_pearson": -0.023743020301998822,
"manhattan_spearman": -0.019246085009935885,
"pearson": 0.15780754110461276,
"spearman": 0.12083310897735632
},
{
"cosine_pearson": 0.6054351354779678,
"cosine_spearman": 0.6162430244748419,
"euclidean_pearson": 0.40173825736167984,
"euclidean_spearman": 0.3933921991621478,
"hf_subset": "en-de",
"languages": [
"eng-Latn",
"deu-Latn"
],
"main_score": 0.6162430244748419,
"manhattan_pearson": 0.457962615505441,
"manhattan_spearman": 0.4838933699573373,
"pearson": 0.6054351354779678,
"spearman": 0.6162430244748419
},
{
"cosine_pearson": 0.7618940413983072,
"cosine_spearman": 0.7911176554969499,
"euclidean_pearson": 0.7687817432571042,
"euclidean_spearman": 0.7859527624136942,
"hf_subset": "es-es",
"languages": [
"spa-Latn"
],
"main_score": 0.7911176554969499,
"manhattan_pearson": 0.7794036752505948,
"manhattan_spearman": 0.7938492963891066,
"pearson": 0.7618940413983072,
"spearman": 0.7911176554969499
},
{
"cosine_pearson": 0.48420847537315587,
"cosine_spearman": 0.49317078071750625,
"euclidean_pearson": 0.3563428734227356,
"euclidean_spearman": 0.3584458165353449,
"hf_subset": "nl-en",
"languages": [
"nld-Latn",
"eng-Latn"
],
"main_score": 0.49317078071750625,
"manhattan_pearson": 0.4198271485260075,
"manhattan_spearman": 0.3836819578854696,
"pearson": 0.48420847537315587,
"spearman": 0.49317078071750625
},
{
"cosine_pearson": 0.5288823574377371,
"cosine_spearman": 0.562709079229829,
"euclidean_pearson": 0.2993909341777525,
"euclidean_spearman": 0.2992292640046535,
"hf_subset": "es-en",
"languages": [
"spa-Latn",
"eng-Latn"
],
"main_score": 0.562709079229829,
"manhattan_pearson": 0.424622955248226,
"manhattan_spearman": 0.4414351300043983,
"pearson": 0.5288823574377371,
"spearman": 0.562709079229829
},
{
"cosine_pearson": 0.47180324561704823,
"cosine_spearman": 0.5279642783201307,
"euclidean_pearson": 0.5100329065437332,
"euclidean_spearman": 0.5196472282696352,
"hf_subset": "ko-ko",
"languages": [
"kor-Hang"
],
"main_score": 0.5279642783201307,
"manhattan_pearson": 0.5156613233195979,
"manhattan_spearman": 0.5229021826790656,
"pearson": 0.47180324561704823,
"spearman": 0.5279642783201307
},
{
"cosine_pearson": 0.45567578510194695,
"cosine_spearman": 0.4759593706055199,
"euclidean_pearson": 0.5070517972856654,
"euclidean_spearman": 0.48261827460057777,
"hf_subset": "ar-ar",
"languages": [
"ara-Arab"
],
"main_score": 0.4759593706055199,
"manhattan_pearson": 0.5177089933394046,
"manhattan_spearman": 0.4914281145552357,
"pearson": 0.45567578510194695,
"spearman": 0.4759593706055199
},
{
"cosine_pearson": 0.15581738888547184,
"cosine_spearman": 0.1495641486110853,
"euclidean_pearson": 0.1349394931764104,
"euclidean_spearman": 0.13081927359268367,
"hf_subset": "en-tr",
"languages": [
"eng-Latn",
"tur-Latn"
],
"main_score": 0.1495641486110853,
"manhattan_pearson": 0.12970038975673026,
"manhattan_spearman": 0.14147949612647964,
"pearson": 0.15581738888547184,
"spearman": 0.1495641486110853
},
{
"cosine_pearson": 0.5667402604578623,
"cosine_spearman": 0.5826079418593232,
"euclidean_pearson": 0.38450094531647017,
"euclidean_spearman": 0.40102938983888436,
"hf_subset": "fr-en",
"languages": [
"fra-Latn",
"eng-Latn"
],
"main_score": 0.5826079418593232,
"manhattan_pearson": 0.4720522029904462,
"manhattan_spearman": 0.47395940988793417,
"pearson": 0.5667402604578623,
"spearman": 0.5826079418593232
},
{
"cosine_pearson": 0.5427131797486067,
"cosine_spearman": 0.5631847685301092,
"euclidean_pearson": 0.34861333517971227,
"euclidean_spearman": 0.32753161608389836,
"hf_subset": "it-en",
"languages": [
"ita-Latn",
"eng-Latn"
],
"main_score": 0.5631847685301092,
"manhattan_pearson": 0.40648821394761325,
"manhattan_spearman": 0.3918795987336719,
"pearson": 0.5427131797486067,
"spearman": 0.5631847685301092
}
]
},
"task_name": "STS17"
}
Could you please help me understand why there is such a discrepancy? Is there any additional setup or configuration that I might have missed?