This is the code for generating the results in the paper "Towards Understanding the Geometry of Knowledge Graph Embeddings", presented at the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, July 15-20, 2018.
The analysis requires pre-trained KG embeddings along with the KG triples data.
The KG triples data should be a pickle (Python 2.7) file named "<dataset>.<method>.bin". It should contain the following keys:

- 'train_subs': list of KG triples used for training, in (head_entity_index, tail_entity_index, relation_index) format.
- 'valid_subs': list of KG triples used for validation, in (head_entity_index, tail_entity_index, relation_index) format.
- 'test_subs': list of KG triples used for testing, in (head_entity_index, tail_entity_index, relation_index) format.
- 'relations': list of KG relations.
- 'entities': list of KG entities.
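As an illustration, a triples file with this structure could be assembled as below. The dataset name "toy", method name "transe", and all entities, relations, and triples are hypothetical placeholders, not values from the paper:

```python
import pickle

# Hypothetical toy KG: entity and relation indices refer to positions
# in the 'entities' and 'relations' lists.
data = {
    'entities': ['paris', 'france', 'berlin', 'germany'],
    'relations': ['capital_of'],
    # Triples are (head_entity_index, tail_entity_index, relation_index).
    'train_subs': [(0, 1, 0)],   # (paris, france, capital_of)
    'valid_subs': [(2, 3, 0)],   # (berlin, germany, capital_of)
    'test_subs': [(2, 3, 0)],
}

# File name follows the "<dataset>.<method>.bin" convention.
with open('toy.transe.bin', 'wb') as f:
    pickle.dump(data, f)
```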
The KG embeddings should be stored as a pickle (Python 2.7) file named "<dataset>.<method>.n<no-of-negatives>.d<dimension>.p". It should contain the following keys:

- 'rNames': list of KG relations.
- 'eNames': list of KG entities.
- 'E': numpy array of size (numEntities x dimension) containing entity embeddings.
- 'R': numpy array of size (numRelations x dimension) containing relation embeddings.
- 'model': model name.
- 'fpos test': ranks of head and tail entities obtained during link prediction (required for performance analysis). It should be a dictionary with relation indices as keys, e.g. {rel_id1: {'head': [head_rank_1, head_rank_2, ...], 'tail': [tail_rank_1, tail_rank_2, ...]}}.
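A minimal sketch of an embeddings file with this structure is shown below. The random arrays stand in for trained embeddings, and the file name, entity names, and rank values are hypothetical placeholders:

```python
import pickle
import numpy as np

num_entities, num_relations, dim = 4, 1, 50

emb = {
    'eNames': ['paris', 'france', 'berlin', 'germany'],  # hypothetical entities
    'rNames': ['capital_of'],
    'E': np.random.randn(num_entities, dim),   # (numEntities x dimension)
    'R': np.random.randn(num_relations, dim),  # (numRelations x dimension)
    'model': 'TransE',
    # Link-prediction ranks per relation index, needed for Section 5.4.
    'fpos test': {0: {'head': [1, 3], 'tail': [2, 1]}},
}

# File name follows "<dataset>.<method>.n<no-of-negatives>.d<dimension>.p".
with open('toy.transe.n1.d50.p', 'wb') as f:
    pickle.dump(emb, f)
```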
For running type analysis (Section 5.1 in the paper), run the following commands:

- python typeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel>
- python typeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result (for generating the plots)
For running negative analysis (Section 5.2 in the paper), run the following commands:

- python negativeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel>
- python negativeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result (for generating the plots)
For running dimension analysis (Section 5.3 in the paper), run the following commands:

- python dimensionAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel>
- python dimensionAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result (for generating the plots)
For running performance analysis (Section 5.4 in the paper), run the following commands:

- python perfAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> -p <performance-file>
- python perfAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result -p <performance-file> (for generating the plots)

Here, <performance-file> is a pickled file containing the performance of different models. It is a nested dictionary: perf['<method>'][<dimension>][<numNegatives>] should contain {'MRR': <MRR-value>, 'MR': <MR-value>, 'Hits@10': <Hits@10-value>} for <method> with vector size <dimension> and <numNegatives> negative samples.
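For example, such a performance file could be built as sketched below. The method name, dimension, negative-sample count, and all metric values are illustrative placeholders, not results from the paper:

```python
import pickle

# Nested dictionary: perf['<method>'][<dimension>][<numNegatives>]
# maps to the metrics dict for that configuration.
perf = {
    'TransE': {
        50: {
            1: {'MRR': 0.35, 'MR': 250.0, 'Hits@10': 0.47},  # illustrative values
        },
    },
}

with open('perf.p', 'wb') as f:
    pickle.dump(perf, f)
```

This file would then be passed to perfAnalysis.py via the -p flag.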
If you find our work or this codebase useful, please cite us:
@inproceedings{chandrahas-etal-2018-towards,
title = "Towards Understanding the Geometry of Knowledge Graph Embeddings",
author = "{Chandrahas} and
Sharma, Aditya and
Talukdar, Partha",
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2018",
address = "Melbourne, Australia",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P18-1012",
pages = "122--131",
}