Why Machine Reading Comprehension Models Learn Shortcuts?

Repo for 'Why Machine Reading Comprehension Models Learn Shortcuts?', Findings of ACL 2021

Arxiv Preprint: https://arxiv.org/abs/2106.01024

Data

The synthetic datasets proposed in our paper are in the dataset.zip file.

Size

	Train (Pairs)	Dev (Pairs)
Question Word Matching	6306	766
Simple Matching	7562	952

Format

The format of each dataset is as follows (same as the format of SQuAD). The version of the question (challenging or shortcut) is denoted in the key qtype.

{
  "version": "simple_matching_dev_challenging",  		// dataset version
  "data": [
    {
      "paragraphs": [
        {
          "cid": "squad1.1-dev-1_h", 				// context id
          "context": "As this was the 50th Super Bowl...",
          "qas": [
            {
              "id": "56be8e613aeaaa14008c90d2_h", // question id
              "question": "What day was the game played on?",
              "answers": [
                {
                  "text": "February 7, 2016",
                  "answer_start": 505 		// the start position of the answer in the context
                }
              ],
              "qtype": "challenging" 			// question type (challenging/shortcut)
            }
          ]
        }
      ]
    }
  ]
}

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
README.md		README.md
dataset.zip		dataset.zip

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Why Machine Reading Comprehension Models Learn Shortcuts?

Data

Size

Format

About

luciusssss/why-learn-shortcut

Folders and files

Latest commit

History

Repository files navigation

Why Machine Reading Comprehension Models Learn Shortcuts?

Data

Size

Format

About

Resources

Stars

Watchers

Forks