Hi authors,
Thanks for the great work. I am a little confused about eval.py. In the paper, accuracy is reported as the evaluation metric for arc_challenge, but the code actually uses match as the metric. Are these two the same? Also, when evaluating accuracy, why is there an output key in the data?
Thanks.
Hi @wtc9806, arc_challenge is a multiple-choice question-answering dataset. The current evaluation matches the predicted option against the gold label, and averaging those matches over the dataset is exactly the accuracy of the predictions. So the term accuracy used in the paper and the metric functions in the evaluation code are consistent, and both follow Self-RAG.
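To make the equivalence concrete, here is a minimal sketch (not the repository's actual eval.py) of how a match-style metric on a multiple-choice dataset reduces to accuracy. The field names "output" and "answerKey" are assumptions for illustration only.

```python
# Minimal sketch: a per-example "match" score averaged over the dataset
# is the same quantity as accuracy. Field names are illustrative.

def match(prediction: str, gold: str) -> int:
    """Return 1 if the gold option appears in the prediction, else 0."""
    return int(gold.strip().lower() in prediction.strip().lower())

def accuracy(examples: list[dict]) -> float:
    """Average the per-example match scores over the dataset."""
    scores = [match(ex["output"], ex["answerKey"]) for ex in examples]
    return sum(scores) / len(scores) if scores else 0.0

# Example: two ARC-Challenge-style items, one answered correctly.
data = [
    {"output": "The answer is B", "answerKey": "B"},
    {"output": "A", "answerKey": "C"},
]
print(accuracy(data))  # 0.5
```

Whether match checks exact equality or substring containment differs between implementations; the averaging step is what makes the reported number an accuracy either way.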