That would be a bit of an undertaking, given that the current architecture is built around the many-to-many nature of search result lists (`(query_id, doc_id) -> score` mappings) and qrels (`(query_id, doc_id) -> relevance` mappings). ROUGE and similar measures instead operate over `query_id -> text` mappings and `query_id -> [possible text answers]` mappings.
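Concretely, the difference in shape is something like this (illustrative Python with made-up query/doc ids):

```python
# Current qrel-oriented inputs: many-to-many over (query_id, doc_id) pairs.
run = {("q1", "d1"): 0.9, ("q1", "d2"): 0.4}   # (query_id, doc_id) -> score
qrels = {("q1", "d1"): 1, ("q1", "d2"): 0}     # (query_id, doc_id) -> relevance

# Text-oriented inputs for ROUGE and similar: keyed on query_id alone.
predictions = {"q1": "generated answer text"}                      # query_id -> text
references = {"q1": ["a gold answer", "another accepted answer"]}  # query_id -> [texts]
```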
But I could see how it could work. The structure already allows for various input data formats, so this new type of mapping would just be another one. If you requested a qrel-oriented measure but provided text mappings instead (or vice versa), it would just throw an error.
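That check could be as simple as the following sketch, assuming a hypothetical `kind` tag on both measures and provided inputs (neither exists today):

```python
from dataclasses import dataclass

@dataclass
class Measure:
    name: str
    kind: str  # "qrel" or "text" -- hypothetical attribute, not an existing API

def validate_inputs(measure: Measure, data_kind: str) -> None:
    # Fail fast on mismatched combinations instead of erroring mid-computation.
    if measure.kind != data_kind:
        raise TypeError(
            f"{measure.name} expects {measure.kind}-oriented mappings, "
            f"but {data_kind}-oriented mappings were provided"
        )
```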
I'd need to familiarise myself with the landscape of these measures too. IIRC there's a ton of fragmentation there as well.
It's also worth considering limiting the scope of this tool to qrel-oriented measures only.
I think for longer QA pipelines that involve retrieval and other NLP techniques (e.g. conversational QA), there could be something interesting in supporting this as part of pt.Experiment().
If QA were a stage of the pipeline, how could we measure ROUGE or similar metrics at the end of a PyTerrier pipeline?
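Something along these lines, perhaps? This is only a sketch of what it might look like if pt.Experiment grew support for text-oriented measures; the "ROUGE-L" metric string, the gold_answers mapping, and the bm25/qa_reader stages are all hypothetical placeholders, not existing PyTerrier APIs:

```python
import pyterrier as pt

# bm25 and qa_reader stand in for a retrieval stage and an answer-generating
# QA stage; topics is the usual topics dataframe. All are placeholders.
qa_pipeline = bm25 >> qa_reader

pt.Experiment(
    [qa_pipeline],
    topics,
    gold_answers,              # query_id -> [reference answers], in place of qrels
    eval_metrics=["ROUGE-L"],  # hypothetical text-oriented measure
)
```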