Get data that we can use to compute our evaluation metrics #6

Closed · Tracked by #4
PGijsbers opened this issue Jul 5, 2024 · 4 comments
@PGijsbers (Member) commented Jul 5, 2024

We need data that we can use to evaluate our models according to some evaluation metric (#5) during initial development.

This will most likely be some form of (query, relevant results) pairs. These should probably be fairly exhaustive, so for this we might also consider only working with a subset of all datasets. This has the added benefit of (hopefully) making our evaluations faster, too.
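To make the idea concrete, here is a minimal sketch of what such (query, relevant results) pairs could look like and how they could feed a metric like the one in #5. The queries, dataset IDs, and field names below are placeholders, not a fixed format:

```python
# Hypothetical labeled pairs: each query maps to the dataset IDs annotators
# marked as relevant. IDs and queries are placeholders.
labeled_pairs = {
    "image classification benchmarks": [101, 102, 205],
    "credit risk prediction": [17, 42],
}

def precision_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved dataset IDs that were labeled relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant_ids)
    return sum(1 for d in top_k if d in relevant) / len(top_k)
```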

Another idea is to use LLMs to judge the relevancy of query results. But this risks ignoring recall: an LLM judge only sees what was retrieved, so it cannot tell us that important documents were never retrieved at all.

@PGijsbers PGijsbers changed the title Data that we can use to compute those metrics Get data that we can use to compute our evaluation metrics Jul 5, 2024
@PGijsbers PGijsbers added the evaluation label Jul 5, 2024
@PGijsbers PGijsbers added this to the Search milestone Jul 5, 2024
@PGijsbers (Member, Author)

@LiinXemmon will make a script/app that helps us add labels to (query, dataset) pairs. For each query, we need to be able to quickly label which datasets are relevant. For example, the user could be prompted with a query and then cycle through datasets and their descriptions, indicating for each whether it is relevant (a rough sketch of this approach is below). Alternatively, you could take the other approach, where a single dataset is presented and you cycle through each query to label the pairs. It's also fine if you think of an even better way to do this.
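A minimal sketch of the query-first labeling loop, just to illustrate the idea; the file names, record fields, and output format are assumptions, not the actual tool:

```python
import json

def label_query(query, datasets):
    """Cycle through candidate datasets for one query and record y/n labels."""
    labels = {}
    print(f"\nQuery: {query}")
    for ds in datasets:
        print(f"\n[{ds['id']}] {ds['name']}\n{ds['description'][:300]}")
        answer = input("Relevant? [y/n/q to stop] ").strip().lower()
        if answer == "q":
            break
        labels[ds["id"]] = (answer == "y")
    return labels

if __name__ == "__main__":
    # Hypothetical inputs: a JSON list of dataset records and a list of queries.
    with open("datasets.json") as f:
        datasets = json.load(f)
    queries = ["image classification benchmarks", "credit risk prediction"]
    results = {q: label_query(q, datasets) for q in queries}
    with open("labels.json", "w") as f:
        json.dump(results, f, indent=2)
```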

@PGijsbers (Member, Author)

People have been labeling the data with the tool (see the tools directory). We are just waiting for the files to be shared.

@PGijsbers (Member, Author)

The merged and processed data is here: https://github.com/openml-labs/ai_search/blob/main/data/evaluation/query_key_dict.json

We should also add the individual files, should we decide to do something with them later.
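A minimal sketch for loading the merged file; this assumes it maps each query string to a list of relevant dataset keys/IDs, which may not match the actual schema exactly:

```python
import json

# Assumed schema: {query: [relevant dataset keys/IDs, ...]}
with open("data/evaluation/query_key_dict.json") as f:
    query_key_dict = json.load(f)

# Print a few entries to inspect the labels.
for query, relevant_keys in list(query_key_dict.items())[:5]:
    print(query, "->", relevant_keys)
```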

@PGijsbers (Member, Author)

Added the individual files here: https://github.com/openml-labs/ai_search/tree/main/tools/data

except for Subhaditya's file, since that one seemed broken. @SubhadityaMukherjee can you upload your file to that directory? Thanks. I'll close this issue.
