This repository contains the data and code for "Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction".
The dataset folder contains the human-constructed dataset (CR WSC-H) and the machine-constructed dataset (CR WSC-M), stored as wsc_273_annotated_final.csv and generated_modify_tq.csv, respectively.
In CR WSC-H (wsc_273_annotated_final.csv), the first seven columns contain the basic question information and answers from WSC. Column Q holds the concept, and column R holds the modified text used when the original answers cannot be replaced with the concept directly. Columns H-M contain the GPT-3 results and the analysis of each question.
In CR WSC-M (generated_modify_tq.csv), the text column contains the original question, the entity column contains the LLM output, and the use column indicates whether the entities generated by the LLM are adversarial enough.
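As a minimal sketch of working with the CR WSC-M file, the snippet below loads a table with the three columns described above and filters for adversarial examples. The column names (`text`, `entity`, `use`) and the convention that `use` equals 1 for usable examples are assumptions based on the description; an inline sample stands in for the real CSV so the snippet is self-contained.

```python
import io
import pandas as pd

# Inline sample mimicking the described structure of generated_modify_tq.csv;
# the question text and entity here are illustrative, not from the dataset.
sample_csv = io.StringIO(
    "text,entity,use\n"
    '"The trophy does not fit into the suitcase because it is too large.",award,1\n'
    '"The trophy does not fit into the suitcase because it is too large.",cup,0\n'
)

# In the repository, this would instead be:
# df = pd.read_csv("dataset/generated_modify_tq.csv")
df = pd.read_csv(sample_csv)

# Keep only rows whose generated entities were judged adversarial enough
# (assuming use == 1 marks such rows).
adversarial = df[df["use"] == 1]
print(len(adversarial))
```

Replacing the inline sample with the actual file path should give the subset of machine-generated questions used for evaluation.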
The code folder contains the code to construct the dataset and the code to evaluate the different methods.
In wsc_get_more.ipynb, we collect additional questions to construct the dataset.
In Model_wsc_H.ipynb and Model_wsc_M.ipynb, we evaluate the performance of different methods on CR WSC-H and CR WSC-M, respectively.
Please cite the following paper if you find our dataset and code helpful!
@misc{han2024conceptreversedwinogradschemachallenge,
title={Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction},
author={Kaiqiao Han and Tianqing Fang and Zhaowei Wang and Yangqiu Song and Mark Steedman},
year={2024},
eprint={2410.12040},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.12040},
}