This is the code for generating the results in the paper "Towards Understanding the Geometry of Knowledge Graph Embeddings", presented at the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, July 15-20, 2018.
The analysis requires pre-trained KG embeddings along with the KG triples data.
The KG triples data should be a pickle (Python 2.7) file named "<dataset>.<method>.bin". It should contain the following keys:

- 'train_subs': list of KG triples used for training, in (head_entity_index, tail_entity_index, relation_index) format.
- 'valid_subs': list of KG triples used for validation, in (head_entity_index, tail_entity_index, relation_index) format.
- 'test_subs': list of KG triples used for testing, in (head_entity_index, tail_entity_index, relation_index) format.
- 'relations': list of KG relations.
- 'entities': list of KG entities.
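As an illustration, a triples file with this structure could be assembled as below. The dataset name "toy", method name "transe", and all entities, relations, and triples are hypothetical placeholders, not values from the paper:

```python
import pickle

# Hypothetical toy KG: entity and relation indices refer to positions
# in the 'entities' and 'relations' lists.
data = {
    'entities': ['paris', 'france', 'berlin', 'germany'],
    'relations': ['capital_of'],
    # Triples are (head_entity_index, tail_entity_index, relation_index).
    'train_subs': [(0, 1, 0)],   # (paris, france, capital_of)
    'valid_subs': [(2, 3, 0)],   # (berlin, germany, capital_of)
    'test_subs': [(2, 3, 0)],
}

# File name follows the "<dataset>.<method>.bin" convention.
with open('toy.transe.bin', 'wb') as f:
    pickle.dump(data, f)
```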
The KG embeddings should be stored as a pickle (Python 2.7) file named "<dataset>.<method>.n<no-of-negatives>.d<dimension>.p". It should contain the following keys:

- 'rNames': list of KG relations.
- 'eNames': list of KG entities.
- 'E': numpy array of size (numEntities x dimension) containing entity embeddings.
- 'R': numpy array of size (numRelations x dimension) containing relation embeddings.
- 'model': model name.
- 'fpos test': ranks of head and tail entities obtained during link prediction (required for performance analysis). It should be a dictionary with relation indices as keys, e.g. {rel_id1: {'head': [head_rank_1, head_rank_2, ...], 'tail': [tail_rank_1, tail_rank_2, ...]}}.
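A minimal sketch of an embeddings file with this structure is shown below. The random arrays stand in for trained embeddings, and the file name, entity names, and rank values are hypothetical placeholders:

```python
import pickle
import numpy as np

num_entities, num_relations, dim = 4, 1, 50

emb = {
    'eNames': ['paris', 'france', 'berlin', 'germany'],  # hypothetical entities
    'rNames': ['capital_of'],
    'E': np.random.randn(num_entities, dim),   # (numEntities x dimension)
    'R': np.random.randn(num_relations, dim),  # (numRelations x dimension)
    'model': 'TransE',
    # Link-prediction ranks per relation index, needed for Section 5.4.
    'fpos test': {0: {'head': [1, 3], 'tail': [2, 1]}},
}

# File name follows "<dataset>.<method>.n<no-of-negatives>.d<dimension>.p".
with open('toy.transe.n1.d50.p', 'wb') as f:
    pickle.dump(emb, f)
```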
For running type analysis (Section 5.1 in the paper), run the following commands:

- python typeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel>
- python typeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result (for generating the plots)
For running negative analysis (Section 5.2 in the paper), run the following commands:

- python negativeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel>
- python negativeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result (for generating the plots)
For running dimension analysis (Section 5.3 in the paper), run the following commands:

- python dimensionAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel>
- python dimensionAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result (for generating the plots)
For running performance analysis (Section 5.4 in the paper), run the following commands:

- python perfAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> -p <performance-file>
- python perfAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result -p <performance-file> (for generating the plots)

Here, <performance-file> is a pickled file containing the performance of different models. It is a nested dictionary: perf['<method>'][<dimension>][<numNegatives>] should contain {'MRR': <MRR-value>, 'MR': <MR-value>, 'Hits@10': <Hits@10-value>} for <method> with vector size <dimension> and <numNegatives> negative samples.
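For example, such a performance file could be built as sketched below. The method name, dimension, negative-sample count, and all metric values are illustrative placeholders, not results from the paper:

```python
import pickle

# Nested dictionary: perf['<method>'][<dimension>][<numNegatives>]
# maps to the metrics dict for that configuration.
perf = {
    'TransE': {
        50: {
            1: {'MRR': 0.35, 'MR': 250.0, 'Hits@10': 0.47},  # illustrative values
        },
    },
}

with open('perf.p', 'wb') as f:
    pickle.dump(perf, f)
```

This file would then be passed to perfAnalysis.py via the -p flag.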
If you find our work or this codebase useful, please cite us:
@inproceedings{chandrahas-etal-2018-towards,
title = "Towards Understanding the Geometry of Knowledge Graph Embeddings",
author = "{Chandrahas} and
Sharma, Aditya and
Talukdar, Partha",
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2018",
address = "Melbourne, Australia",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P18-1012",
pages = "122--131",
}