Personalized PageRank using Semantic Similarity Measures
This is the code used to run our experiments for the paper "PPR-SSM: Personalized PageRank and Semantic Similarity Measures for Entity Linking".
The code has three steps:

- Generating the candidates file
- Running the PPR algorithm
- Analyzing the results
The code for each gold standard is organized in its own directory (hpo_src, chebi_src, and go_src). The main scripts of each gold standard are the ones starting with "parse"; the others contain helper functions to generate and process data.
You can build a Docker image using the Dockerfile provided in this repository, or pull it from Docker Hub:

```
docker pull andrelamurias/pprssm
```
We used the following corpora:
- HPO GSC+ (https://github.com/lasigeBioTM/IHP/raw/master/GSC%2B.rar)
- ChEBI patents corpus (provided with this repo)
- CRAFT (https://github.com/UCDenver-ccp/CRAFT/releases/tag/3.0 - put the brat files inside CRAFT/GO_BP and CRAFT/GO_CC)
And the following ontologies:
- HPO
- ChEBI
- Gene Ontology
For each ontology, an OBO file and a .db file processed by DiShIn are necessary. These can be obtained with the get_data.sh script.
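As a reference for what the .db file provides, here is a minimal sketch of computing a similarity score with DiShIn's ssm module directly. The usage follows DiShIn's documentation, but verify the function names against your installed version; the ChEBI identifiers are illustrative, and the resnik_dishin option passed to the PPR step below presumably refers to this measure.

```python
# Minimal sketch: querying a DiShIn semantic base (.db) for term similarity.
# Assumes DiShIn's ssm module is importable; the identifier format depends
# on how get_data.sh built the semantic base.
import ssm

ssm.semantic_base("chebi.db")   # load the pre-processed semantic base
ssm.intrinsic = True            # use intrinsic information content

e1 = ssm.get_id("CHEBI_15377")  # illustrative ChEBI terms (water, ethanol)
e2 = ssm.get_id("CHEBI_16236")
print(ssm.ssm_resnik(e1, e2))   # Resnik semantic similarity
```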
First, run dishin_app.py with Flask:

```
export FLASK_APP=dishin_app.py
export DISHIN_DB=chebi.db
flask run &
```
Then run the "parse" script of the corpus to generate the candidates file. It takes the following arguments:

- min distance
- min similarity
- corpus dir (or ontology name for Gene Ontology entities in the CRAFT corpus: "GO_BP" for GO Biological Process entities, "GO_CC" for GO Cellular Component entities)
Example:

```
python chebi_src/parse_chebi_corpus.py 1 0.5 ChebiPatents/
```
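To make the role of the two thresholds concrete, here is a hypothetical sketch of how they could gate candidate generation. The helper names are ours, and the exact semantics of min distance and min similarity are defined in the parse scripts, so treat this only as an illustration: it assumes min distance bounds a [0, 1] string-match score (1.0 keeping only exact matches) and min similarity bounds the semantic similarity between candidates.

```python
# Hypothetical sketch: filtering candidate ontology terms for one mention.
# Check the "parse" scripts for the actual semantics of both thresholds.
from difflib import SequenceMatcher

def match_score(mention, label):
    # string-match score in [0, 1]; 1.0 means an exact (case-insensitive) match
    return SequenceMatcher(None, mention.lower(), label.lower()).ratio()

def candidates(mention, labels, min_distance, min_similarity, sim):
    # keep labels close enough to the mention string
    close = [l for l in labels if match_score(mention, l) >= min_distance]
    # among those, prefer candidates semantically coherent with another candidate
    coherent = [a for a in close
                if any(sim(a, b) >= min_similarity for b in close if b != a)]
    return coherent or close
```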
Run the PPRforNED script:

```
javac ppr_for_ned_chebi.java
java ppr_for_ned_chebi resnik_dishin
```
For GO entities in the CRAFT corpus, select the desired subontology in the ppr_for_ned_go.java script.
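For readers unfamiliar with the underlying algorithm, below is a minimal sketch of personalized PageRank by power iteration. It is illustrative only, not the PPRforNED implementation; in PPR-SSM the edge weights between candidate entities would come from the semantic similarity measure.

```python
import numpy as np

def personalized_pagerank(W, restart, alpha=0.85, tol=1e-9, max_iter=100):
    """Power iteration for personalized PageRank.

    W[i, j] is the nonnegative weight of the edge i -> j between candidates;
    restart is the restart distribution (e.g. uniform over one mention's
    candidate entities) and sums to 1.
    """
    n = W.shape[0]
    P = np.zeros_like(W, dtype=float)
    for i in range(n):
        s = W[i].sum()
        P[i] = W[i] / s if s > 0 else restart  # dangling nodes jump to restart
    r = np.array(restart, dtype=float)
    for _ in range(max_iter):
        r_new = alpha * (r @ P) + (1 - alpha) * restart
        converged = np.abs(r_new - r).sum() < tol
        r = r_new
        if converged:
            break
    return r  # rank each mention's candidates by their score in r

# Toy graph: three candidates, edges weighted by some similarity measure
W = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.3],
              [0.1, 0.3, 0.0]])
print(personalized_pagerank(W, np.full(3, 1 / 3)))
```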
Process the results to obtain more detailed metrics than those reported by PPRforNED:

```
python src/process_results.py chebi
```
Example output:

```
one candidate 431
correct 909
wrong 105
total 1014
accuracy: 0.8964497041420119
accuracy (multiple candidates): 0.8198970840480274
```
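The two accuracy lines are consistent with counting every single-candidate mention as correctly linked, so the second figure isolates performance on the ambiguous mentions. A small sketch reproducing both numbers from the counts above (the function is ours, not part of process_results.py):

```python
def accuracies(correct, total, one_candidate):
    # overall accuracy over all mentions
    overall = correct / total
    # accuracy restricted to mentions with more than one candidate,
    # assuming every single-candidate mention is linked correctly
    multiple = (correct - one_candidate) / (total - one_candidate)
    return overall, multiple

print(accuracies(correct=909, total=1014, one_candidate=431))
# (0.8964497041420119, 0.8198970840480274)
```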