- Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models - arXiv 2020. (Updated version on Cambridge.org)
- Shortcut learning in deep neural networks - Nature Machine Intelligence 2020.
- Measure and Improve Robustness in NLP Models: A Survey - NAACL 2022.
- Shortcut Learning of Large Language Models in Natural Language Understanding - Communications of the ACM (CACM) 2023.

Paper | Add/Edit Level | Creation Method | Original Dataset | Naturalness |
---|---|---|---|---|
Consistency Ribeiro et al. 2019 | question | auto | SQuAD | Yes |
Contrast Sets Gardner et al. 2020 | word | experts | DROP, Quoref, .. | No |
SAM Schlegel et al. 2021 | word | auto | SQuAD, HotpotQA, DROP, NewsQA | No |
Break, Perturb, Build Geva et al. 2022 | question | auto | DROP, HotpotQA, IIRC | Yes |
Unanswerable Questions | | | | |
Not-answerable Questions Nakanishi et al. 2018 | context | auto | SQuAD | Yes |
SQuADRUn Rajpurkar et al. 2018 | question | crowdworkers | SQuAD | Yes |
Disconnected Reasoning Trivedi et al. 2020 | context | auto | HotpotQA | Yes |
MuSiQue Trivedi et al. 2022 | context | auto | MuSiQue-Ans | Yes |

Paper | Form | Purpose | Task | Github | Dataset | Note |
---|---|---|---|---|---|---|
Inoue et al. 2020 | Triple | Evaluation & Training | Derivation generation | URL | R4C | based on HotpotQA |
Ho et al. 2020 | Triple | Evaluation & Training | Evidence generation | URL | 2WikiMultiHopQA | |
Wolfson et al. 2020 | QDMR | Training | - | URL | Break it down | based on ten datasets (e.g., HotpotQA & DROP) |
Tang et al. 2021 | Sub-question | Evaluation | QA about sub-questions | URL | 1000 samples | based on HotpotQA |
Geva et al. 2021 | Sub-question | Evaluation & Training | QA about sub-questions | URL | StrategyQA | implicit questions |
Ho et al. 2022 | Sub-question | Evaluation & Training | QA about sub-questions | URL | HieraDate | comparison questions about date information only |
Trivedi et al. 2022 | Sub-question | Evaluation & Training | QA about sub-questions | URL | MuSiQue | |
Dalvi et al. 2021 | Entailment Tree | Evaluation & Training | Tree generation | URL | EntailmentBank | based on ARC and WorldTree V2 |
Ribeiro et al. 2023 | Graph | Evaluation & Training | Graph generation | URL | STREET | based on ARC, SCONE, GSM8K, AQUA-RAT, and AR-LSAT |