To reproduce the results in the paper, follow the process explained here from the root directory of the project (the KGQAn directory).
1. Source Code and Benchmarks
- The code is available at the following GitHub repository: https://github.com/CoDS-GCS/KGQAn.git
- It can be downloaded using the following command: `git clone https://github.com/CoDS-GCS/KGQAn.git`
- Benchmarks can be found in the directory `src/evaluation/data/` (a quick sketch for verifying they are present follows this list):
  - QALD-9: `qald-9-test-multilingual.json`
  - LCQuAD-1.0: `lcquad-qaldformat-test2.json`
  - YAGO: `qald9_yago100.json`
  - DBLP: `qald9_dblp100.json`
  - MAG: `qald9_ms100.json`
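As a quick sanity check before running anything, the sketch below clones the repository and verifies that the five benchmark files listed above are present. It assumes only the repository URL and paths stated in this document:

```shell
# Clone the repository and check that all five benchmark files exist
git clone https://github.com/CoDS-GCS/KGQAn.git
cd KGQAn
for f in qald-9-test-multilingual.json lcquad-qaldformat-test2.json \
         qald9_yago100.json qald9_dblp100.json qald9_ms100.json; do
    [ -f "src/evaluation/data/$f" ] && echo "found: $f" || echo "missing: $f" >&2
done
```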
Description of software/library dependencies
- KGQAn can run in two settings: local mode and Docker mode. For ease of reproducing the results, we follow the Docker setup mode.
- Docker setup steps: https://github.com/CoDS-GCS/KGQAn/blob/docker-setup/docker_run.md
- If you want to run in local mode, you can follow the steps specified here: https://github.com/CoDS-GCS/KGQAn#running-kgqan
- The system was mainly tested on Ubuntu (a quick environment check follows this list).
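Since the Docker path is the recommended one, it is worth confirming up front that Docker and the Compose plugin are installed and the daemon is reachable. A minimal check, assuming a standard Docker installation:

```shell
# Verify Docker and the Compose plugin are installed
docker --version
docker compose version

# Verify the Docker daemon is reachable from this user
docker info > /dev/null && echo "Docker daemon is running"
```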
Description of hardware needed
- For KGQAn to run, around 32 GB of RAM is needed to start the server; after startup, no special hardware is needed. A quick memory check follows.
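A simple way to confirm the machine has enough memory before starting the server; this is plain Linux tooling, nothing KGQAn-specific:

```shell
# Show total and available memory; server start-up needs roughly 32 GB
free -h
```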
Detailed fully automated scripts: choose between steps 1 and 2 (step 1 runs in Docker mode, step 2 runs in local mode); step 3 produces the figures from the output values.
1. To get the results in Docker mode, use `docker_run.sh`, the Docker script for creating the container, running the evaluation scripts of the five benchmarks, and producing their results:

```shell
./docker_run.sh
```

- It consists of the following steps:
  - Navigating to the KGQAn directory
  - Downloading the required files
  - Calling `docker compose` to run the evaluation and return all results in the `output` directory
- The outputs of this script are `evaluation_results.csv` and `evaluation_response_time.csv`.
- `evaluation_results.csv` contains the results of the five benchmarks, in addition to the results of the filtration and linking experiments, in the following order:
  - QALD
  - LCQuAD-1.0
  - YAGO
  - DBLP
  - MAG
  - QALD with filtration disabled
  - LCQuAD with filtration disabled
  - Entity Linking
  - Relation Linking
- If you encounter an error while running this script due to the `docker compose` command, you can install Docker Compose first by following its official installation instructions.
- If you have a problem running the `docker compose` line because of permission errors, you can follow the solution from [here](https://docs.docker.com/engine/install/linux-postinstall/).
- If you have a problem running the script because of a `Version in "./docker-compose-evaluation.yml"` error, you can replace the `version` parameter in `docker-compose-server.yml` and `docker-compose-evaluation.yml` with a value supported by your Docker Compose installation; a command-line sketch follows this list.
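A minimal sketch of that replacement, assuming both compose files sit in the repository root and that `'3'` is a version string your Docker Compose release accepts (adjust the value, or drop the line entirely, to match your installation):

```shell
# Rewrite the top-level 'version' value in both compose files,
# keeping a .bak backup of each original
sed -i.bak "s/^version:.*/version: '3'/" \
    docker-compose-server.yml docker-compose-evaluation.yml
```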
2. If you do not want to run in Docker mode, you can run in local mode using the following steps (a sketch for checking that the server is up follows this list):
- To get the results in local mode, use `local_run.sh`.
- A prerequisite to running in this mode is to start the semantic affinity server using these commands:

```shell
conda activate kgqan
python word_embedding/server.py 127.0.0.1 9600
```

- Then run the evaluation script:

```shell
./local_run.sh
```

- It runs each benchmark's evaluation Python script to produce the JSON output.
- It evaluates the produced JSON against the ground-truth values.
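Before launching `./local_run.sh`, it can save time to confirm that the semantic affinity server is actually accepting connections; a minimal check that assumes only the host and port given above:

```shell
# Block until the word embedding server accepts TCP connections on 127.0.0.1:9600
until nc -z 127.0.0.1 9600; do
    echo "waiting for the semantic affinity server..."
    sleep 5
done
echo "semantic affinity server is up"
```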
3. `figures_generation.sh`: first create a virtual environment, install the required dependencies, and run the script:

```shell
conda create --name figures python==3.7
conda activate figures
pip install matplotlib pandas traitlets
./figures_generation.sh
```

- The generation script is responsible for generating the four figures as PDF files in this directory. The generated figures are:
  1. Question understanding quality: the values here were obtained manually by checking the logs of KGQAn and existing systems.
  2. Linking quality: to produce the figure, the scores of entity linking and relation linking are obtained from `evaluation_results.csv`. The values are organized in the following order: [Entity Precision, Entity Recall, Entity F1, Relation Precision, Relation Recall, Relation F1].
  3. Filtering quality: to produce the figure, the scores of QALD-no-filtration, LCQuAD-no-filtration, QALD, and LCQuAD are obtained from `evaluation_results.csv`. The values are organized in the following order: [QALD Precision, QALD Recall, QALD F1, LCQuAD Precision, LCQuAD Recall, LCQuAD F1].
  4. Response time per benchmark: to produce the figure, the response times are obtained from `evaluation_response_time.csv`. The values are organized in the following order: [QALD, LCQuAD, YAGO, DBLP, MAG].
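Once `figures_generation.sh` finishes, a quick way to confirm the four figure PDFs were written to this directory, as described above:

```shell
# List the generated figure PDFs
ls -l *.pdf
```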
Documentation
- Each script contains its documentation for all steps performed.
A single master script:
- We divided our operation into two scripts, as the results of the first are needed by the second script to produce the figures.
- The question understanding failure values were obtained by manually checking the logs of KGQAn and existing systems for the question understanding step.
- The linking values were obtained by extracting the outputs of the linking step from the final SPARQL queries of all systems.
- The response time values in the output file `evaluation_response_time.csv` are used in `Figures/response_time.py`.
- The F1 score values in the output file `evaluation_results.csv` are used in `Figures/filtering.py`.
An estimation of the deployment time and effort
- These estimates were obtained by running on a machine with 180 GB of RAM and 16 cores.
- Downloading the required files + creating the Docker image + enabling the servers: ~30 minutes (depends on the internet connection speed).
- Running the five benchmarks and evaluating their results + the filtering and linking experiments: ~8 hours.
- Producing the figures: ~5 minutes.
The final outputs produced after all experiments finish running are saved in two main files:
- `src/evaluation/output/evaluation_results.csv`: This file contains the results of 9 experiments: 1) QALD, 2) LCQuAD, 3) YAGO, 4) DBLP, 5) MAG, 6) QALD with filtration disabled, 7) LCQuAD with filtration disabled, 8) Entity Linking, 9) Relation Linking
- `src/evaluation/output/evaluation_response_time.csv`: Response time for running the main experiment
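To eyeball these outputs without opening a spreadsheet, the CSVs can be pretty-printed as aligned columns in the shell (the `column` utility ships with util-linux on most distributions):

```shell
# Pretty-print the evaluation results and response times
column -s, -t < src/evaluation/output/evaluation_results.csv
column -s, -t < src/evaluation/output/evaluation_response_time.csv
```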
Here is the mapping between the experiments and the tables and figures of the paper:

| Table / Figure | Experiment |
|---|---|
| Table 3 | `evaluation_results.csv`: Experiments 1-5 |
| Figure 7 | `evaluation_response_time.csv` |
| Figure 9 | `evaluation_results.csv`: Experiments 8, 9 |
| Figure 10 | `evaluation_results.csv`: Experiments 1, 2, 6, 7 |
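If you only want the rows behind one table or figure, a sketch like the one below can slice the results file. It assumes the nine experiments appear one per row in the order listed above with no header line; adjust the `NR` range if your file has a header:

```shell
# Print experiments 1-5, i.e. the rows behind Table 3
awk 'NR >= 1 && NR <= 5' src/evaluation/output/evaluation_results.csv
```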