This repository contains the implementation for our paper (link):
Robust Knowledge Graph Completion with Stacked Convolutions and a Student Re-Ranking Network
Justin Lovelace, Denis Newman-Griffis, Shikhar Vashishth, Jill Fain Lehman, and Carolyn Penstein Rosé
Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing (ACL-IJCNLP) 2021
Our work was performed with Python 3.8. The dependencies can be installed from `requirements.txt`.
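For example, a typical setup looks like the following (the environment name `kgc-env` is illustrative, not part of the repository):

```shell
# Create and activate a Python 3.8 virtual environment, then install
# the pinned dependencies from the repository root.
python3.8 -m venv kgc-env
source kgc-env/bin/activate
pip install -r requirements.txt
```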
- We conduct our work upon the existing FB15K-237 and CN100K datasets. We additionally developed the FB15K-237-Sparse and SNOMED-CT Core datasets for our work.
- Running `./scripts/prepare_datasets.sh` will unzip the dataset files and process them for use by our models.
- Because the SNOMED-CT Core dataset was derived from the UMLS, we cannot directly release the dataset files. See here for full instructions for how to recreate the dataset.
- The BERT embeddings can be downloaded from here. The `bert_emb.pt` files should be stored in the corresponding dataset directories, e.g. `data/CN100K/bert_emb.pt`.
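As a quick sanity check (a sketch; the dataset directory names under `data/` other than `data/CN100K` are assumptions about your checkout), you can verify that each dataset directory contains its embedding file:

```shell
# Report which dataset directories under data/ contain a bert_emb.pt file.
for d in data/*/; do
  if [ -f "${d}bert_emb.pt" ]; then
    echo "found:   ${d}bert_emb.pt"
  else
    echo "missing: ${d}bert_emb.pt" >&2
  fi
done
```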
We provide scripts to train our proposed ranking model, denoted as BERT-ResNet in our paper, for all four datasets.
- FB15K-237: `./scripts/train_resnet_fb15k237.sh`
- FB15K-237-Sparse: `./scripts/train_resnet_fb15k237_sparse.sh`
- CN100K: `./scripts/train_resnet_cn100k.sh`
- SNOMED-CT Core: `./scripts/train_resnet_snomed.sh`
The re-ranking models can only be trained after the ranking model for the corresponding dataset has finished training. First, download the BERT checkpoints used for our training from here. They should be unzipped and stored in `reranking/bert_ckpts`. A re-ranking model can then be trained with the provided scripts, similarly to above.
- FB15K-237: `./scripts/train_reranking_fb15k237.sh`
- FB15K-237-Sparse: `./scripts/train_reranking_fb15k237_sparse.sh`
- CN100K: `./scripts/train_reranking_cn100k.sh`
- SNOMED-CT Core: `./scripts/train_reranking_snomed.sh`
Pretrained ranking models can be downloaded from here. After unzipping them in the `robust-kg-completion/pretrained_models` directory, they can be evaluated by running `./scripts/eval_pretrained_ranking_model.sh {DATASET}`, where `{DATASET}` is one of `SNOMED_CT_CORE`, `FB15K_237`, `FB15K_237_SPARSE`, or `CN100K`.
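For instance (a convenience loop, not a script shipped with the repository), all four pretrained ranking models can be evaluated in sequence:

```shell
# Evaluate each pretrained ranking model in turn.
for ds in SNOMED_CT_CORE FB15K_237 FB15K_237_SPARSE CN100K; do
  ./scripts/eval_pretrained_ranking_model.sh "$ds"
done
```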
Pretrained re-ranking models can be downloaded from here. After unzipping them in the `robust-kg-completion/reranking/pretrained_reranking_models` directory, they can be evaluated by running the following commands.
- FB15K-237: `./scripts/eval_pretrained_reranking_model.sh FB15K_237 0.75`
- FB15K-237-Sparse: `./scripts/eval_pretrained_reranking_model.sh FB15K_237_SPARSE 0.75`
- CN100K: `./scripts/eval_pretrained_reranking_model.sh CN100K 1.0`
- SNOMED-CT Core: `./scripts/eval_pretrained_reranking_model.sh SNOMED_CT_CORE 0.5`
@inproceedings{lovelace-etal-2021-robust,
    title = {Robust Knowledge Graph Completion with Stacked Convolutions and a Student Re-Ranking Network},
    author = {Justin Lovelace and Denis Newman-Griffis and Shikhar Vashishth and Jill Fain Lehman and Carolyn Penstein Rosé},
    booktitle = {Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP)},
    month = {August},
    year = {2021},
    eprint = {2106.06555},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}