We propose a question-answering (QA) benchmark for spatial reasoning on natural language text, which contains more realistic spatial phenomena not covered by prior work and is challenging for state-of-the-art language models (LMs). We propose a distant supervision method to improve on this task. Specifically, we design grammar and reasoning rules to automatically generate spatial descriptions of visual scenes and corresponding QA pairs. Experiments show that further pretraining LMs on these automatically generated data significantly improves LMs' capability for spatial understanding, which in turn helps to better solve two external datasets, bAbI and BoolQ. We hope that this work can foster investigations into more sophisticated models for spatial reasoning over text.
SpartQA has two versions: Human and Auto. The Human version is created by human annotators and is small; the Auto version is generated automatically with hand-crafted rules and context-free grammars.
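To give a feel for how such grammar-based generation works, here is a toy sketch in Python; the grammar, vocabulary, and sentence patterns below are invented for illustration and are much simpler than the actual rules used for SpartQA-Auto:

```python
import random

# Toy context-free grammar for spatial descriptions (illustrative only;
# the real SpartQA-Auto grammar and reasoning rules are far richer).
GRAMMAR = {
    "STORY": [["SENT", "SENT"]],
    "SENT": [["There is", "OBJ", "REL", "OBJ", "."]],
    "OBJ": [["a", "SIZE", "COLOR", "SHAPE"]],
    "SIZE": [["small"], ["big"], ["medium"]],
    "COLOR": [["blue"], ["yellow"], ["black"]],
    "SHAPE": [["square"], ["circle"], ["triangle"]],
    "REL": [["to the left of"], ["above"], ["below"]],
}

def expand(symbol):
    """Expand a symbol by recursively sampling one of its productions."""
    if symbol not in GRAMMAR:                 # terminal token
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return " ".join(expand(s) for s in production)

print(expand("STORY"))
# e.g. "There is a big blue circle above a small yellow square . There is ..."
```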
We generate the train, dev, and test sets on the same image sets as the NLVR dataset, based on the given numbers: 24k for train and 4k for each of the other sets. On average, each story contains 9 sentences (min: 3, max: 22) and 118 tokens (min: 66, max: 274), and each question (across all question types) has 23 tokens (min: 6, max: 57). The four question types are find blocks (FB), find relation (FR), yes/no (YN), and choose object (CO).
Sets (SpartQA-Human) | FB | FR | YN | CO | All |
---|---|---|---|---|---|
Test | 104 | 105 | 194 | 107 | 510 |
Train | 154 | 149 | 162 | 151 | 616 |
And for the Auto version:
Sets (SpartQA-Auto) | FB | FR | YN | CO | All |
---|---|---|---|---|---|
Seen Test | 3872 | 3712 | 3896 | 3594 | 15074 |
Unseen Test | 3872 | 3721 | 3896 | 3598 | 15087 |
Dev | 3842 | 3742 | 3860 | 3579 | 15023 |
Train | 23654 | 23302 | 23968 | 22794 | 93673 |
All qtypes can be cast into a sequence classification task, and the three transformer-based LMs tested in this paper, BERT, ALBERT, and XLNet, can all handle this type of task by classifying the representation of [CLS], a special token prepended to each target sequence. Depending on the qtype, the input sequence and the inference procedure differ.
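As a minimal sketch of this setup with Hugging Face transformers (the checkpoint, label set, and input texts below are illustrative, not the exact configuration used by the baselines in this repository):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# The story and question are packed into a single sequence; the [CLS]
# representation is classified into answer labels (3 labels here as an example).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

story = "There is a big yellow circle above a small blue square."
question = "Is the blue square below the yellow circle?"

inputs = tokenizer(story, question, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits           # shape: (1, num_labels)
prediction = logits.argmax(dim=-1).item()     # index of the predicted answer label
```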
To run each baseline, first install the required packages: torch and transformers (v4.0.1). Download the SpartQA dataset from here: https://drive.google.com/file/d/1xW8abrXcX_BOkbzjrAr6UoF5KglPHQLh/view?usp=sharing Then create an empty dataset folder and upload the dataset files into it.
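For example, the setup might look like this (assuming pip and a folder named dataset; adjust to your environment):

pip install torch transformers==4.0.1

mkdir dataset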
After all of this, add the related arguments to the running command. All arguments are listed below:
"--result", "Name of the result's saving file", type= str, default='test'
"--result_folder", "Name of the folder of the results file", type= str, default='SpaRT/Results'
"--model", "Name of the model's saving file", type= str, default='test'
"--model_folder", "Name of the folder of the models file", type=str, default = "SpaRT//Models"
"--dataset", "name of the dataset like spartqa", type = str, default = 'spartqa'
"--no_save", "If save the model or not", action='store_true', default = False
"--load", "For loading model", type=str
"--cuda", "The index of cuda", type=int, default=None
"--qtype", "Name of Question type. (FB, FR, CO, YN)", type=str, default = 'FB'
"--train24k", "Train on 24k data", action='store_true', default = True
"--train100k", "Train on 100k data", action='store_true', default = False
"--train500", "Train on 500 data", action='store_true', default = False
"--unseentest", "Test on unseen data", action='store_true', default = False
"--human", "Train and Test on human data", action='store_true', default = False
"--humantest", "Test on human data", action='store_true', default = False
"--dev_exists", "If development set is used", action='store_true', default = False
"--no_train", "Number of train samples", action='store_true', default = False
"--baseline", "Name of the baselines. Options are 'bert', 'xlnet', 'albert'", type=str, default = 'bert'
"--pretrain", "Name of the pretrained model. Options are 'bertqa', 'bertbc' (for bert boolean clasification). It is the same for other baselines.", type=str, default = 'bertbc'
"--con", "Testing consistency or contrast", type=str, default = 'not'
"--optim", "Type of optimizer. options 'sgd', 'adamw'.", type=str, default = 'sgd'
"--loss", "Type of loss function. options 'cross'.", type=str, default = 'cross'
"--train", "Number of train samples", type = int
"--train_log", "save the log of train if true", default = False, action='store_true'
"--start", "The start number of train samples", type = int, default = 0
"--dev", "Number of dev samples", type = int
"--dev_exist", "If development set is used", action='store_true'
"--test", "Number of test samples", type = int
"--unseen", "Number of unseen test samples", type = int
"--epochs", "Number of epochs for training", type = int, default=0
"--lr", "learning rate", type = float, default=4e-6
"--dropout", "If you want to set dropout=0", action='store_true', default = False
"--unfreeze", "unfreeze the first layeres of the model except this numbers", type=int, default = 0
"--other_var", dest='other_var', action='store', help="Other variable: classification (DK, noDK), random, fine-tune on unseen. for changing model load MLM from pre-trained model and replace other parts with new on", type=str
"--detail", "a description about the model", type = str
An example of a command is:
python main.py --qtype YN --pretrain bertqa --baseline bert --unseentest --epochs 10
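A second example, this time training and testing on the human-annotated data (this flag combination is our reading of the argument list above, not a command taken verbatim from the repository):

python main.py --qtype FR --pretrain bertbc --baseline albert --human --epochs 10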
Also, to change where results are stored, change the value of "result_adress", and to change where models are saved, change all the addresses in the torch.save parts of main.py.
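For instance, a save call in main.py is expected to look roughly like the following, so only the leading folder needs to change (variable names here are illustrative, not necessarily those in main.py):

```python
import os
import torch

# Illustrative stand-in for the baseline model; in main.py the trained
# BERT/ALBERT/XLNet model is saved instead.
model = torch.nn.Linear(4, 2)

model_folder = "SpaRT/Models"                 # change this to relocate saved models
os.makedirs(model_folder, exist_ok=True)
torch.save(model.state_dict(), os.path.join(model_folder, "test.pt"))
```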
Download SpartQA_Auto
Download SpartQA_Human
To cite the paper, use the BibTeX below:
@inproceedings{mirzaee-etal-2021-spartqa,
title = "{SPARTQA}: A Textual Question Answering Benchmark for Spatial Reasoning",
author = "Mirzaee, Roshanak and
Rajaby Faghihi, Hossein and
Ning, Qiang and
Kordjamshidi, Parisa",
booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
month = jun,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.naacl-main.364",
doi = "10.18653/v1/2021.naacl-main.364",
pages = "4582--4598",
}