Unifying Knowledge Base Completion with PU Learning to Mitigate the Observation Bias

Source code related to the AAAI22 paper:

Unifying Knowledge Base Completion with PU Learning to Mitigate the Observation Bias. Jonas Schouterden, Jessa Bekker, Jesse Davis, Hendrik Blockeel.

Abstract

The following is the abstract of our paper:

Methods for Knowledge Base Completion (KBC) reason about a knowledge base (KB) in order to derive new facts that should be included in the KB. This is challenging for two reasons. First, KBs only contain positive examples. This complicates model evaluation, which needs both positive and negative examples. Second, those facts that were selected to be included in the knowledge base are most likely not an i.i.d. sample of the true facts, due to the way knowledge bases are constructed. In this paper, we focus on rule-based approaches, which traditionally address the first challenge by making assumptions that enable identifying negative examples, which in turn makes it possible to compute a rule’s confidence or precision. However, they largely ignore the second challenge, which means that their estimates of a rule’s confidence can be biased. This paper approaches rule-based KBC through the lens of PU learning, which can cope with both challenges. We make three contributions. (1) We provide a unifying view that formalizes the relationship between multiple existing confidence measures based on (i) what assumption they make about and (ii) how their accuracy depends on the selection mechanism. (2) We introduce two new confidence measures that can mitigate known biases by using propensity scores that quantify how likely a fact is to be included in the KB. (3) We show through theoretical and empirical analysis that taking the bias into account improves the confidence estimates, even when the propensity scores are not known exactly.
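
To make the role of propensity scores more concrete, here is a minimal sketch, not the code in this repository, that contrasts a naive confidence estimate with an inverse-propensity-weighted one. All names (`cwa_confidence`, `ipw_confidence`, the toy facts and the propensity values) are hypothetical, and the weighting shown is the generic inverse propensity estimator from PU learning; the measures defined in the paper may differ in detail.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def cwa_confidence(predictions: Set[Fact], observed: Set[Fact]) -> float:
    """Fraction of predicted facts that are present in the observed KB."""
    if not predictions:
        return 0.0
    return len(predictions & observed) / len(predictions)


def ipw_confidence(
    predictions: Set[Fact], observed: Set[Fact], propensity: Dict[Fact, float]
) -> float:
    """Count each observed prediction with weight 1 / propensity."""
    if not predictions:
        return 0.0
    weighted = sum(1.0 / propensity[fact] for fact in predictions & observed)
    return weighted / len(predictions)


# Toy setting (assumption): a rule predicts four facts that are all true,
# but each true fact had only a 50% chance of being recorded in the KB.
predictions = {(f"person_{i}", "livesIn", "city") for i in range(4)}
observed = {("person_0", "livesIn", "city"), ("person_1", "livesIn", "city")}
propensity = {fact: 0.5 for fact in predictions}

print(cwa_confidence(predictions, observed))              # 0.5
print(ipw_confidence(predictions, observed, propensity))  # 1.0
```

In this toy setting the naive estimate underrates the rule because half of its correct predictions were never recorded, while the weighted estimate compensates for the missing facts.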

Contents of this repository

artificial_bias_experiments: Python source code root module for running the experiments & generating images about those experiments.
dask_utils: Python code for using dask when running the experiments.
data/yago3_10: The yago3-10 dataset. This data directory is also used as root for everything generated when running the experiments.
external/AMIE3: External dependency: the AMIE-jar. See also the AMIE3 repository.
images: Root directory for all images.
kbc_pul: Python source code root module containing the core of this repository: everything related to rules, knowledge bases, confidence metrics and selection mechanisms.
notebooks: Jupyter notebooks illustrating how to use parts of this repository.
notes: Markdown files describing this repository.
paper: PDF of the AAAI paper and its appendices.
paper_latex_tables: LaTeX sources of the tables used in the paper.
amie_dir.json: Settings file used by our AMIE Python wrapper pointing to the AMIE jar.
LICENSE
README

Installation

Requirements

Create a fresh Python3 environment (3. or higher) and install the following packages:

jupyter: for the notebooks.
pandas: for representing the KB.
problog: used for its parsing functionality, i.e. parsing Prolog clauses from their string representation.
pylo2: see below
matplotlib: plotting
seaborn: plotting.
tqdm: pretty status bars.
unidecode: used when cleaning data.
tabulate: for pretty table printouts
dask.delayed and dask.distributed: for running experiments using dask
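
After installing, a quick check that everything is importable can save time. The following is a small sketch; the top-level import names in `REQUIRED_MODULES` are assumptions derived from the package names above (in particular, `pylo` for Pylo2 is inferred from its `src/pylo` layout), and `check_modules` is a hypothetical helper, not part of this repository.

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Assumed top-level import names for the packages listed above.
REQUIRED_MODULES = (
    "jupyter", "pandas", "problog", "pylo", "matplotlib",
    "seaborn", "tqdm", "unidecode", "tabulate", "dask",
)


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in check_modules(REQUIRED_MODULES).items():
        print(f"{name:12s} {'ok' if available else 'MISSING'}")
```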

Installing Pylo2:

We use data structures from Pylo2 to represent rules as Prolog clauses. More specifically, Pylo2 data structures from src/pylo/language/lp are often used. To install Pylo2 in your Python environment, first clone it:

 git clone git@github.com:sebdumancic/pylo2.git
 cd pylo2

Note that Pylo has a lot of functionality we don't need. As we don't use Pylo's bindings to Prolog engines, they don't need to be built. To install Pylo2 without these bindings, modify its setup.py by adding, right before the line:

print(f"Building:\n\tGNU:{build_gnu}\n\tXSB:{build_xsb}\n\tSWIPL:{build_swi}")

the following lines:

build_gnu = None
build_xsb = None
build_swi = None

Then, install Pylo in the current environment using

python setup.py install
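
The manual edit to setup.py described above could also be scripted. The sketch below is one possible way to do it, assuming that setup.py still contains a line starting with `print(f"Building:`; the helper `disable_prolog_bindings` is hypothetical and not part of Pylo2 or this repository. It is demonstrated on an in-memory string so that running it does not touch any file.

```python
MARKER = 'print(f"Building:'
OVERRIDES = ["build_gnu = None", "build_xsb = None", "build_swi = None"]


def disable_prolog_bindings(setup_source: str) -> str:
    """Insert the three override lines right before the 'Building' print line."""
    lines = setup_source.splitlines()
    for index, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith(MARKER):
            # Reuse the indentation of the marker line for the inserted lines.
            indent = line[: len(line) - len(stripped)]
            inserted = [indent + override for override in OVERRIDES]
            return "\n".join(lines[:index] + inserted + lines[index:]) + "\n"
    raise ValueError("marker line not found; setup.py may have changed")


# Demonstration on a stand-in for the real setup.py contents.
sample = 'import os\nprint(f"Building:{os.name}")\n'
patched = disable_prolog_bindings(sample)
print(patched)
```

To apply it for real, read setup.py from the cloned pylo2 directory, pass its text through the function once, and write the result back before running the install command. Applying it a second time would insert the lines again.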

Notebooks

Different notebooks are provided:

Running the experiments

For a description of how to run the experiments, see here.

Generating the tables in the paper

For instructions on how to generate the tables in the paper from the results, see here.

Generating the images in the paper

Instructions on how to generate the images in the paper can be found here.

Preparation of the "ideal" Yago3_10 KB

In the paper, the experiments are run on a cleaned version of the yago3-10 dataset. The cleaning was done to remove Unicode characters that might be incompatible with older Prolog engines, using ./notebooks/yago3_10/data_exploration_and_preparation/yago3_10_data_cleaning.ipynb

The original data was obtained using AmpliGraph, but can also be found under ./data/yago3_10/original.

The cleaned version can be found under ./data/yago3_10/cleaned_csv.
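
As a rough illustration of this kind of cleaning, the sketch below maps entity names to plain ASCII. It is not the code from the notebook: the repository lists unidecode for data cleaning, whereas this stand-in uses only the standard library's `unicodedata`, and the helpers `to_ascii` and `clean_triple` as well as the example triple are hypothetical.

```python
import unicodedata
from typing import Tuple


def to_ascii(text: str) -> str:
    """Split accented characters into base letter + mark, then drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def clean_triple(triple: Tuple[str, str, str]) -> Tuple[str, str, str]:
    """Apply the ASCII mapping to subject, relation and object."""
    subject, relation, obj = triple
    return (to_ascii(subject), to_ascii(relation), to_ascii(obj))


print(clean_triple(("Zürich", "isLocatedIn", "Switzerland")))
# ('Zurich', 'isLocatedIn', 'Switzerland')
```

Unlike unidecode, this stand-in silently drops characters that have no ASCII decomposition (for example "ß"), so it should be read as an illustration of the idea rather than a replacement for the notebook.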
