guide to adding a new measure? #27
Can I make a measure given a simple function like a lambda?

Comments
I'm happy to add this feature if it's a common enough situation. What's the case you have in mind? Is it an experimental measure you want to build yourself, or interfacing with some other library (perhaps sklearn's precision/recall/F1 implementation or something similar)?

I don't think a lambda definition would be useful for most of the current providers. For efficiency, it's very beneficial to be able to perform operations in batch; e.g., trec_eval builds its own in-memory structure for efficiently looking up query/doc pairs.

A potential interface would be something like:

    ir_measures.define(lambda runs, qrels: xxx)
    # where:
    # - runs: a list of all document scores, perhaps as a dataframe or dictionary?
    # - qrels: similar to runs
    # - xxx would provide Metric values for every query

    ir_measures.define_byquery(lambda run, qrels: xxx)
    # where:
    # - run: document scores for a single query, perhaps as a dataframe or dictionary?
    # - qrels: similar to run (for the matching query)
    # - xxx would return a single Metric value; assumes mean aggregation of the scores? (Or could this be an optional argument?)

I think the latter one would probably be useful in more situations.
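As a concrete illustration of the per-query variant, here is a minimal sketch, assuming the run and qrels arrive as per-query pandas DataFrames with `doc_id`/`score` and `doc_id`/`relevance` columns; the `(qrels, run)` argument order and column names follow the prototype shown later in this thread and are assumptions, not a confirmed API.

```python
import ir_measures

def reciprocal_rank(qrels, run):
    # qrels/run: per-query pandas DataFrames (assumed columns: doc_id/relevance and doc_id/score)
    relevant = set(qrels.loc[qrels['relevance'] > 0, 'doc_id'])
    ranked = run.sort_values('score', ascending=False)['doc_id']
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Would return a measure object usable alongside the built-in ones (assumed behaviour).
MyRR = ir_measures.define_byquery(reciprocal_rank)
```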
This was primarily for things like ROUGE in #28. My thinking was that in PyTerrier we would sometimes want to define an additional kind of measure to report in a pt.Experiment() (e.g. fairness ;-).
I imagined that the above methods would return a measure, e.g., MyAwesomeFairnessMeasure = ir_measures.define_byquery(lambda run, qrels: xxx), which could then be used alongside other measures. But what are the situations where you'd want to define a measure but not write an optimized & shareable version of it?
Same reason as https://pyterrier.readthedocs.io/en/latest/terrier-retrieval.html#custom-weighting-models
Another example: I have a column in the results dataframe that I would like to summarise and report as part of the measurements.
Makes sense. Would the define and define_byquery proposal above meet those needs?
@cmacdonald I have a prototype of "runtime-defined" measures. Does it look reasonable? You mention using lambdas above, and while this is supported, I struggle to find anything very meaningful to do in just an inline function (though I'm far from a pandas ninja). I'm open to alternative names for this feature as well. A similar feature is currently WIP in ir-datasets. There are a few things these don't (yet?) support.
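For anyone trying the prototype, here is a hedged end-to-end sketch. read_trec_qrels, read_trec_run, calc_aggregate, and nDCG are existing ir_measures APIs, while the exact contract of the runtime-defined measure (DataFrame arguments, column names, argument order) is assumed from the prototype discussion, and the file paths are placeholders.

```python
import ir_measures
from ir_measures import nDCG

# Hypothetical runtime-defined measure: the fraction of retrieved documents
# that have a judgment. Column names and the (qrels, run) DataFrame arguments
# are assumptions based on the prototype.
FracJudged = ir_measures.define_byquery(
    lambda qrels, run: run['doc_id'].isin(qrels['doc_id']).mean()
)

# Mixing a runtime-defined measure with a built-in one is what the prototype aims to allow.
qrels = list(ir_measures.read_trec_qrels('qrels.txt'))  # placeholder path
run = list(ir_measures.read_trec_run('run.txt'))        # placeholder path
print(ir_measures.calc_aggregate([nDCG@10, FracJudged], qrels, run))
```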
My use case is averaging row values (e.g. doc length) while conducting an experiment. I'll try it out in a Colab. A rank cutoff would be useful, e.g. avg doc len @ 5, avg doc len @ 10.
Ah, sure, so if you don't need to do any merging with the qrels, things can be easier. E.g.:

    AvgDoclen = pt.define_byquery("AvgDoclen", lambda qrels, run: run['doc_id'].apply(index.get_doclen).mean())

Since the rank cutoff is probably common and well-defined, this could be something easy to switch on as an additional (optional) argument:

    AvgDoclen = pt.define_byquery("AvgDoclen", lambda qrels, run: run['doc_id'].apply(index.get_doclen).mean(), rank_cutoff=True)
    AvgDoclen@5  # would automatically filter down the result list to the top 5
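To connect this back to the pt.Experiment() use case mentioned earlier, here is a hedged sketch of how such a measure might be reported. pt.Experiment, pt.get_dataset, and pt.BatchRetrieve are real PyTerrier APIs, but the AvgDoclen object is assumed to come from the hypothetical snippet above, and support for runtime-defined measures in eval_metrics is an assumption.

```python
import pyterrier as pt

if not pt.started():
    pt.init()

dataset = pt.get_dataset('vaswani')  # small test collection bundled with PyTerrier
bm25 = pt.BatchRetrieve(dataset.get_index(), wmodel='BM25')

# Hypothetical: report the AvgDoclen measure sketched above alongside a standard metric.
# Recent PyTerrier versions accept ir_measures measure objects in eval_metrics; whether
# a runtime-defined measure like AvgDoclen works here is assumed.
pt.Experiment(
    [bm25],
    dataset.get_topics(),
    dataset.get_qrels(),
    eval_metrics=['map', AvgDoclen@5, AvgDoclen@10],
)
```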
* runtime-defined measures (see #27)
* reworking runtime-defined measures based on feedback from @cmacdonald
  - Support "cutoff" parameter (default on)
  - Name optional (defaults to repr of impl)
  - Runtime-defined measures don't get registered
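Based on the changes listed above, usage of the reworked feature might look like the following minimal sketch; the `name` keyword and the default-on cutoff behaviour are assumptions drawn from the summary, so check the ir_measures documentation for the exact signature.

```python
import ir_measures

# Stand-in for an index that can report document lengths (placeholder for illustration).
doclens = {'d1': 120, 'd2': 87, 'd3': 240}

# Hedged sketch of the reworked API; the 'name' keyword argument is an assumption.
AvgDoclen = ir_measures.define_byquery(
    lambda qrels, run: run['doc_id'].map(doclens).mean(),
    name='AvgDoclen',
)

# Cutoff support is on by default, so this restricts each run to its top 10 documents.
AvgDoclen@10
```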