Get data that we can use to compute our evaluation metrics #6

Closed · Tracked by #4
PGijsbers opened this issue Jul 5, 2024 · 4 comments
@PGijsbers (Member) commented Jul 5, 2024

We need data that we can use to evaluate our models according to some evaluation metric (#5) during initial development.

This will most likely be some form of (query, relevant results) pairs. These should probably be fairly exhaustive, so for this we might also consider only working with a subset of all datasets. This has the added benefit of (hopefully) making our evaluations faster, too.
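To make the idea concrete, here is a minimal sketch of what such (query, relevant results) pairs could look like and how they could feed a metric like the one in #5. The queries, dataset IDs, and field names below are placeholders, not a fixed format:

```python
# Hypothetical labeled pairs: each query maps to the dataset IDs annotators
# marked as relevant. IDs and queries are placeholders.
labeled_pairs = {
    "image classification benchmarks": [101, 102, 205],
    "credit risk prediction": [17, 42],
}

def precision_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved dataset IDs that were labeled relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant_ids)
    return sum(1 for d in top_k if d in relevant) / len(top_k)
```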

Another idea is to use LLMs to judge the relevancy of query results. But this risks ignoring recall: an LLM judge only sees what was retrieved, so it cannot tell us that important documents were never retrieved at all.

@PGijsbers PGijsbers changed the title Data that we can use to compute those metrics Get data that we can use to compute our evaluation metrics Jul 5, 2024
@PGijsbers PGijsbers added the evaluation label Jul 5, 2024
@PGijsbers PGijsbers added this to the Search milestone Jul 5, 2024
@PGijsbers (Member, Author)

@LiinXemmon will make a script/app that helps us add labels to (query, dataset) pairs. For each query, we need to be able to quickly label which datasets are relevant. For example, the user could be prompted with a query and then cycle through datasets and their descriptions, indicating for each whether it is relevant (a rough sketch of this approach is below). Alternatively, you could take the other approach, where a single dataset is presented and you cycle through each query to label the pairs. It's also fine if you think of an even better way to do this.
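A minimal sketch of the query-first labeling loop, just to illustrate the idea; the file names, record fields, and output format are assumptions, not the actual tool:

```python
import json

def label_query(query, datasets):
    """Cycle through candidate datasets for one query and record y/n labels."""
    labels = {}
    print(f"\nQuery: {query}")
    for ds in datasets:
        print(f"\n[{ds['id']}] {ds['name']}\n{ds['description'][:300]}")
        answer = input("Relevant? [y/n/q to stop] ").strip().lower()
        if answer == "q":
            break
        labels[ds["id"]] = (answer == "y")
    return labels

if __name__ == "__main__":
    # Hypothetical inputs: a JSON list of dataset records and a list of queries.
    with open("datasets.json") as f:
        datasets = json.load(f)
    queries = ["image classification benchmarks", "credit risk prediction"]
    results = {q: label_query(q, datasets) for q in queries}
    with open("labels.json", "w") as f:
        json.dump(results, f, indent=2)
```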

@PGijsbers (Member, Author)

People have been labeling the data with the tool (see the tools directory). We are just waiting for the files to be shared.

@PGijsbers (Member, Author)

The merged and processed data is here: https://github.com/openml-labs/ai_search/blob/main/data/evaluation/query_key_dict.json

We should also add the individual files, should we decide to do something with them later.
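A minimal sketch for loading the merged file; this assumes it maps each query string to a list of relevant dataset keys/IDs, which may not match the actual schema exactly:

```python
import json

# Assumed schema: {query: [relevant dataset keys/IDs, ...]}
with open("data/evaluation/query_key_dict.json") as f:
    query_key_dict = json.load(f)

# Print a few entries to inspect the labels.
for query, relevant_keys in list(query_key_dict.items())[:5]:
    print(query, "->", relevant_keys)
```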

@PGijsbers (Member, Author)

Added the individual files here: https://github.com/openml-labs/ai_search/tree/main/tools/data

except for Subhaditya's file, since that one seemed broken. @SubhadityaMukherjee can you upload your file to that directory? Thanks. I'll close this issue.
