
CRAG_Metric #11

Open
wtc9806 opened this issue Apr 8, 2024 · 2 comments

Comments

@wtc9806

wtc9806 commented Apr 8, 2024

Hi authors,

Thanks for the great work. I am a little confused about eval.py. In the paper, accuracy is reported as the evaluation metric for arc_challenge, but the code actually uses match as the metric. Are these two the same? Also, when testing accuracy, why is there an output key in the data?

Thanks.
(screenshot attached: 微信图片_20240408083503)

@HuskyInSalt
Owner

Hi @wtc9806, arc_challenge is a dataset of multiple-choice questions. The current evaluation matches the predicted option against the gold label, so the average match score is exactly the accuracy of the predictions. Both the term accuracy used in the paper and the metric functions in the evaluation code are consistent with Self-RAG.
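
For illustration, here is a minimal sketch of how a match metric reduces to accuracy on multiple-choice data. The function names and example data are assumptions for this sketch, not the repository's actual eval.py:

```python
# Illustrative sketch only -- not the repository's actual eval.py.
# For a multiple-choice dataset such as arc_challenge, a "match" metric
# checks whether the gold answer appears in the model's prediction;
# averaging that 0/1 score over the dataset gives the accuracy.

def match(prediction: str, gold: str) -> int:
    """Return 1 if the gold answer string occurs in the prediction, else 0."""
    return int(gold.strip().lower() in prediction.strip().lower())

def accuracy(predictions: list[str], golds: list[str]) -> float:
    """Mean match score over the dataset, i.e. accuracy for multiple choice."""
    scores = [match(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores) if scores else 0.0

if __name__ == "__main__":
    preds = ["The answer is gravity.", "photosynthesis", "igneous rock"]
    golds = ["gravity", "photosynthesis", "sedimentary rock"]
    print(accuracy(preds, golds))  # 0.666...
```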

@wtc9806
Author

wtc9806 commented Apr 9, 2024

Got it! Thank you!
