We need data that we can use to evaluate our models according to some evaluation metric (#5) during initial development.
This will most likely be some form of (query, relevant results) pairs. Since the labels should probably be fairly exhaustive, we might consider working with only a subset of all datasets. This has the added benefit of (hopefully) making our evaluations faster, too.
Another idea is to use LLMs to judge the relevance of query results. But this has the danger of ignoring recall -- not realizing that important documents were never retrieved in the first place.
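To make the evaluation side concrete, here is a minimal sketch of how labelled (query, relevant dataset IDs) pairs could drive precision@k / recall@k. All names (`precision_recall_at_k`, `search_fn`, the example queries and IDs) are illustrative assumptions, not existing project code.

```python
def precision_recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Compute precision@k and recall@k for a single query."""
    top_k = retrieved_ids[:k]
    hits = len(set(top_k) & set(relevant_ids))
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def evaluate(search_fn, labelled_pairs, k=10):
    """Average precision@k / recall@k over all labelled queries.

    search_fn(query) -> ranked list of dataset IDs (the model under test).
    labelled_pairs: dict mapping query -> list of relevant dataset IDs.
    """
    precisions, recalls = [], []
    for query, relevant_ids in labelled_pairs.items():
        retrieved_ids = search_fn(query)
        p, r = precision_recall_at_k(retrieved_ids, relevant_ids, k)
        precisions.append(p)
        recalls.append(r)
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)


# Hypothetical hand-labelled pairs for illustration only:
labelled_pairs = {
    "covid time series": [31, 42, 77],
    "image classification": [5, 12],
}
```

The recall term is exactly what an LLM-as-judge setup on retrieved results alone cannot measure, since it never sees the relevant datasets that were missed.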
@LiinXemmon will make a script/app that will help us add labels to (query, dataset) pairs. For each query we need to be able to quickly label which datasets are relevant. The user could, for example, be prompted with a query and then cycle through datasets and their descriptions, indicating for each whether it is relevant (see the sketch below). Alternatively, you could take the opposite approach, where a single dataset is presented and you cycle through each query to label the pairs. It's also fine if you think of an even better way to do this.
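A rough sketch of that first flow (prompt with a query, cycle through datasets, record y/n labels) might look like the following. File names, columns, and the CSV format are assumptions for illustration, not a spec for the actual tool.

```python
import csv

def label_query(query, datasets):
    """Prompt for a relevance label (y/n/skip) for each dataset under one query."""
    labels = []
    for ds in datasets:
        print(f"\nQuery: {query}")
        print(f"Dataset {ds['id']}: {ds['name']}\n{ds['description']}")
        answer = input("Relevant? [y/n/s(kip)] ").strip().lower()
        if answer in ("y", "n"):
            labels.append({"query": query,
                           "dataset_id": ds["id"],
                           "relevant": int(answer == "y")})
    return labels

if __name__ == "__main__":
    # Assumed input: datasets.csv with columns id,name,description
    with open("datasets.csv", newline="") as f:
        datasets = list(csv.DictReader(f))
    query = input("Query to label: ")
    rows = label_query(query, datasets)
    # Append labels so multiple sessions accumulate into one file
    with open("labels.csv", "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["query", "dataset_id", "relevant"])
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)
    print(f"Saved {len(rows)} labels to labels.csv")
```

The dataset-first variant would just swap the loops: fix one dataset and iterate over queries instead.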
Except for subhaditya's file, since that one seemed broken. @SubhadityaMukherjee, can you upload your file to that directory? Thanks. I'll close this issue.