This is the official implementation of the paper "Causal Discovery in Knowledge Graphs by Exploiting Asymmetric Properties of Non-Gaussian Distributions". The method proposed in the paper is implemented on top of several existing open-source libraries. Please follow the setup instructions below to install the requirements and get the code running.
For running the experiments, we created a new Conda environment with Python 3.8.5.
You can create a new environment in conda using
conda create -n <env-name> python=3.8
Once created, you can enter the new environment using
conda activate <env-name>
You can then install all the required dependencies by running this command:
pip install -r requirements.txt
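If you want to confirm that the install step picked up everything, a small check along the following lines may help. This is only a sketch: the helper name missing_requirements is hypothetical (it is not part of this repo), and it assumes that requirements.txt contains plain entries such as name==version rather than URLs or editable installs.

```python
import re
from importlib import metadata


def missing_requirements(path="requirements.txt"):
    """Return the package names listed in `path` that are not installed.

    Assumes simple entries such as `name==1.2.3`; pip options (lines
    starting with '-') and comments are skipped.
    """
    missing = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            # Drop trailing comments and surrounding whitespace.
            line = line.split("#", 1)[0].strip()
            if not line or line.startswith("-"):
                continue
            # Keep only the distribution name, before any version specifier.
            name = re.split(r"[\s=<>!~\[;@]", line, maxsplit=1)[0]
            try:
                metadata.version(name)
            except metadata.PackageNotFoundError:
                missing.append(name)
    return missing
```

Calling missing_requirements() from the project folder inside the activated environment should return an empty list when every listed package is installed.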
If you want to train your own TuckER embeddings with custom hyperparameters, follow these steps; otherwise, skip ahead to running the hybrid algorithm.
- Head over to pykg2vec and follow the instructions to install the pykg2vec package for training custom embeddings. Then cd into the examples folder of the cloned pykg2vec repository.
- To train the pykg2vec embeddings with the same hyperparameters, run the following command in the examples folder:
python train.py -exp True -mn TuckER -ds freebase15k_237 -hpf custom_hp.yaml
This creates the embeddings and stores them in the /datasets/dataset-name/embeddings folder as .tsv files. You might have to include the full path to the custom_hp.yaml file included in this repo.
- Depending on the version of pykg2vec, you may have to add additional details to the .yaml file located in the site-packages of the conda environment (the installation location for pykg2vec).
- Once the custom embeddings are trained, you can follow the next steps to execute the algorithm for causal discovery.
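To sanity-check the trained embeddings before moving on, you can read one of the .tsv files back in. The sketch below assumes that each row holds one embedding vector as tab-separated floats with no header row; the function name load_tsv_embeddings and the file name used in the usage note are hypothetical, so adjust them to whatever your pykg2vec version actually writes.

```python
import csv


def load_tsv_embeddings(path):
    """Read a tab-separated embedding file into a list of float vectors.

    Assumes one vector per row and no header row.
    """
    vectors = []
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            # Ignore empty fields (e.g. from a trailing tab) and blank lines.
            values = [float(v) for v in row if v != ""]
            if values:
                vectors.append(values)
    # Every vector should have the same dimensionality.
    dims = {len(v) for v in vectors}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return vectors
```

For example, vectors = load_tsv_embeddings("ent_embedding.tsv") followed by print(len(vectors), len(vectors[0])) would show the number of embeddings and their dimension, assuming a file with that name exists in the current folder.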
- Before running the project, check that the required embeddings (if you trained custom ones) are located in the same folder as the script.
- You can run the hybrid algorithm by running the following command in the project folder where hybrid.py is located:
python hybrid.py -dataset fb15k-237 -algorithm DirectLiNGAM -plot True
- The output of the above command is a text file, results_hybrid.txt, which contains the execution time, the mean p-value, and the causal order. The command also plots the Directed Acyclic Graph (DAG) of the causal order output by the algorithm.
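
For intuition about why non-Gaussian data makes a causal order identifiable at all, here is a toy two-variable illustration. It is not the hybrid algorithm and not DirectLiNGAM's actual independence measure: the function names are hypothetical, and the dependence score (correlation between the squared regressor and the squared residual) is a deliberately crude stand-in chosen so the example needs only the standard library. The idea it shows is that in a linear model with non-Gaussian noise, the regression residual is independent of the regressor only in the true causal direction.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _corr(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = _mean(a), _mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residual_dependence(cause, effect):
    """Regress `effect` on `cause` by least squares and return
    |corr(cause^2, residual^2)| as a crude dependence score."""
    mc, me = _mean(cause), _mean(effect)
    centered = [c - mc for c in cause]
    slope = sum(c * (e - me) for c, e in zip(centered, effect)) / sum(
        c * c for c in centered
    )
    residuals = [(e - me) - slope * c for c, e in zip(centered, effect)]
    return abs(_corr([c * c for c in centered], [r * r for r in residuals]))


def infer_direction(first, second):
    """Pick the direction whose residual looks less dependent on the regressor."""
    if residual_dependence(first, second) < residual_dependence(second, first):
        return "first -> second"
    return "second -> first"


# Synthetic data: x causes y, and both x and the noise are uniform (non-Gaussian).
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(5000)]
y = [xi + rng.uniform(-1, 1) for xi in x]

print(infer_direction(x, y))
```

If x and the noise were Gaussian instead, the residual would be independent of the regressor in both directions and the score would not separate them, which is the asymmetry that methods in the LiNGAM family rely on.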