guide to adding a new measure? #27

Open
cmacdonald opened this issue Nov 26, 2021 · 9 comments

@cmacdonald
Collaborator

Can I make a measure given a simple function like a lambda?

@seanmacavaney
Collaborator

seanmacavaney commented Nov 26, 2021

I'm happy to add this feature if it's a common enough situation. What's the case you have in mind? Is it an experimental measure you want to build yourself? Or interfacing with some other library (perhaps sklearn's precision/recall/F1 implementation or something similar)?

I don't think a lambda definition would be useful for most of the current providers. For efficiency purposes, it's super beneficial to be able to perform operations in batch. E.g., trec_eval builds its own structure in memory for efficiently looking up query/doc pairs.

A potential interface would be something like:

ir_measures.define(lambda runs, qrels: xxx)
# where:
#   - runs: a list of all document scores, perhaps as a dataframe or dictionary?
#   - qrels: similar to runs
#   - xxx would provide Metric values for every query

ir_measures.define_byquery(lambda run, qrels: xxx)
# where:
#  - run: document scores for a single query, perhaps as a dataframe or dictionary?
#  - qrels: similar to run (for matching query)
#  - xxx would return a single Metric value; assumes mean aggregation of the scores? (Or this could be an optional argument?)

I think the latter one would probably be useful in more situations.
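
For concreteness, a minimal sketch of what a byquery implementation function could look like under this proposal -- P@10 as a stand-in, assuming run and qrels arrive as pandas dataframes with the usual query_id/doc_id/score and query_id/doc_id/relevance columns:

import pandas as pd

def p10(run: pd.DataFrame, qrels: pd.DataFrame) -> float:
    # top 10 documents by score for this single query
    top10 = run.sort_values('score', ascending=False).head(10)
    # doc_ids judged relevant for this query
    relevant = set(qrels.loc[qrels['relevance'] > 0, 'doc_id'])
    return top10['doc_id'].isin(relevant).sum() / 10

# P10 = ir_measures.define_byquery(p10)  # the proposed (not-yet-existing) API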

@cmacdonald
Collaborator Author

This was primarily for things like ROUGE in #28.
In your API, presumably a measure name would have to be defined also?

My thinking was that in PyTerrier sometimes we would want to define an additional kind of measure to report in a pt.Experiment() (e.g. fairness ;-)

@seanmacavaney
Collaborator

I imagined that the above methods would return a measure. E.g.,

MyAwesomeFairnessMeasure = ir_measures.define_byquery(lambda run, qrels: xxx)

which then could be used alongside other measures.
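
E.g., a sketch of how that might look once implemented -- using a stand-in per-query computation (score spread) where a real fairness score would go, and the qrels-first argument order that appears later in this thread; calc_aggregate is the existing entry point:

import pandas as pd
import ir_measures
from ir_measures import nDCG

qrels = pd.DataFrame([{'query_id': 'q1', 'doc_id': 'd1', 'relevance': 1}])
run = pd.DataFrame([
    {'query_id': 'q1', 'doc_id': 'd1', 'score': 1.5},
    {'query_id': 'q1', 'doc_id': 'd2', 'score': 0.3},
])

# stand-in per-query computation where a real fairness score would go
ScoreSpread = ir_measures.define_byquery(
    lambda qrels, run: run['score'].max() - run['score'].min())

# the runtime-defined measure mixes with built-in measures
print(ir_measures.calc_aggregate([nDCG@10, ScoreSpread], qrels, run))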

But what are the situations where you'd want to define a measure but not write an optimized & shareable version of it?

@cmacdonald
Collaborator Author

> But what are the situations where you'd want to define a measure but not write an optimized & shareable version of it?

Same reason as https://pyterrier.readthedocs.io/en/latest/terrier-retrieval.html#custom-weighting-models
To allow trying something out...

@cmacdonald
Collaborator Author

> But what are the situations where you'd want to define a measure but not write an optimized & shareable version of it?

Another example - I have a column in the results dataframe that I would like to summarise and report as part of the measurements.

@seanmacavaney
Collaborator

Makes sense. Would the define and define_byquery proposal above meet those needs?

@seanmacavaney
Collaborator

@cmacdonald I have a prototype of "runtime-defined" measures in the runtime branch. See an example usage of them in the test here.

Does it look reasonable? You mention using lambdas above, and while this is supported, I struggle to find anything very meaningful to do in just an inline function (though I'm far from a pandas ninja).

I'm open to alternative names for this feature as well. A similar feature, which I'm calling local datasets, is currently WIP in ir-datasets -- but unlike this feature, those are persisted to disk. Runtime-defined measures only last as long as the Python interpreter is running, and I don't really see a way around that.

What these don't (yet?) support:

  • Parameters, e.g., MyMeasure(rel=2)@5 wouldn't be supported. This would probably need to be a third argument passed to the implementation function/lambda.
  • Alternative aggregators -- only mean is supported for now. (This could be an additional argument to define and define_byquery?)
  • Alternative input formats -- only pandas dataframes are currently supported.
  • When using define_byquery, every query that appears in the run must produce exactly one score. define is more flexible and can return multiple metrics, perform whatever filtering it likes, etc., but it's a little trickier to use.
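
As a rough sketch of the byquery form under these constraints (column names assumed to follow ir_measures' usual query_id/doc_id/score/relevance conventions; Judged10 is just an illustration):

import pandas as pd
import ir_measures

# returns one float for each query in the run; these get mean-aggregated
def judged10(qrels: pd.DataFrame, run: pd.DataFrame) -> float:
    top10 = run.sort_values('score', ascending=False).head(10)
    # fraction of the top 10 that have any judgment in the qrels
    return top10['doc_id'].isin(qrels['doc_id']).mean()

Judged10 = ir_measures.define_byquery(judged10)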

@cmacdonald
Collaborator Author

My use case involves averaging row values (e.g. doc length) while conducting an experiment. I'll try it out in a Colab. A rank cutoff would be useful, e.g. avg doc len@5, avg doc len@10.

@seanmacavaney
Collaborator

Ah, sure, so if you don't need to do any merging with the qrels, things can be easier. E.g.,

AvgDoclen = ir_measures.define_byquery("AvgDoclen", lambda qrels, run: run['doc_id'].apply(index.get_doclen).mean())

Since the rank cutoff is probably common and well-defined, this could be something easy to switch on as an additional (optional) argument:

AvgDoclen = ir_measures.define_byquery("AvgDoclen", lambda qrels, run: run['doc_id'].apply(index.get_doclen).mean(), rank_cutoff=True)
AvgDoclen@5 # would automatically filter down the result list to the top 5
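
An experiment could then request several depths at once, e.g. (a sketch, with qrels/run being the usual inputs):

results = ir_measures.calc_aggregate([AvgDoclen@5, AvgDoclen@10, AvgDoclen], qrels, run)
# -> one aggregate value per (measure, cutoff) combination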

seanmacavaney added a commit that referenced this issue Mar 4, 2022

* runtime-defined measures (see #27)

seanmacavaney added a commit that referenced this issue Mar 4, 2022

* reworking runtime-defined measures based on feedback from @cmacdonald
 - Support "cutoff" parameter (default on)
 - Name optional (defaults to repr of impl)
 - Runtime-defined measures don't get registered