Adversarial WSC dataset and Abstraction-of-thought reasoning

HKUST-KnowComp/Adv-WSC

Overview

This repository contains the data and code for the paper "Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction".

Dataset

The dataset folder contains two files: wsc_273_annotated_final.csv, the dataset constructed by humans (CR WSC-H), and generated_modify_tq.csv, the dataset constructed by machines (CR WSC-M).

In CR WSC-H (wsc_273_annotated_final.csv), the first seven columns hold the basic question information and answers from WSC. Columns H-M hold the GPT-3 results and the analysis of each question. Column Q is the concept, and column R is the modified text for cases where the original answers cannot be replaced with the concept.
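As a rough aid for reading the file programmatically, the sketch below converts the letters mentioned above into zero-based positions. It assumes the letters are spreadsheet-style column positions (A, B, C, ...), which the description suggests but does not state outright. The helper name column_index and the variable names are hypothetical and not part of this repository.

```python
def column_index(letter):
    """Convert a spreadsheet-style column letter (A, B, ..., Z, AA, ...) to a zero-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


# Assumed layout, following the description above (not verified against the file):
base_cols = list(range(7))                                         # first seven fields (A-G)
gpt3_cols = list(range(column_index("H"), column_index("M") + 1))  # GPT-3 results and analysis
concept_col = column_index("Q")                                    # concept
modified_col = column_index("R")                                   # modified text

print(base_cols, gpt3_cols, concept_col, modified_col)
```

These indices can then be used with any CSV reader that exposes fields by position. Check them against the header of the actual file before relying on them.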

In CR WSC-M (generated_modify_tq.csv), the text field is the original question, the entity field is the entity generated by the LLM, and the use field indicates whether the generated entity is adversarial enough.
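A minimal sketch of keeping only the CR WSC-M examples marked as adversarial enough is shown below. It assumes that text, entity, and use are the literal column headers. The encoding of the use field is not documented here, so the accepted values ("1", "true", "yes") are a guess to adjust after inspecting the file. The function names and the in-memory sample rows are made up for illustration.

```python
import csv
import io


def load_rows(source):
    """Read a CSV file object into a list of dicts keyed by header name."""
    return list(csv.DictReader(source))


def select_adversarial(rows, use_column="use", accepted=("1", "true", "yes")):
    """Keep rows whose use value is one of the accepted markers.

    The accepted markers are an assumption; adjust them to match the real file.
    """
    return [
        row for row in rows
        if (row.get(use_column) or "").strip().lower() in accepted
    ]


# Tiny in-memory stand-in for generated_modify_tq.csv (made-up rows).
sample = io.StringIO(
    "text,entity,use\n"
    "Example question A,example entity A,1\n"
    "Example question B,example entity B,0\n"
)
rows = load_rows(sample)
kept = select_adversarial(rows)
print(len(rows), len(kept))
```

To run this on the real data, replace the in-memory sample with open("dataset/generated_modify_tq.csv", newline="", encoding="utf-8"), assuming that path matches the repository layout.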

Code

The code folder contains the code used to construct the dataset and the code used to evaluate different methods.

In wsc_get_more.ipynb, we collect additional questions for constructing the dataset.

In Model_wsc_H.ipynb and Model_wsc_M.ipynb, we evaluate the performance of different methods on CR WSC-H and CR WSC-M, respectively.

Citation

Please cite the following paper if you find our dataset and code helpful:

@misc{han2024conceptreversedwinogradschemachallenge,
      title={Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction}, 
      author={Kaiqiao Han and Tianqing Fang and Zhaowei Wang and Yangqiu Song and Mark Steedman},
      year={2024},
      eprint={2410.12040},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.12040}, 
}
