Model for biomedical Named Entity Linking improved by Relation Extraction
- Ruas, P., Lamurias, A. & Couto, F.M. Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature. J Cheminform 12, 57 (2020). https://doi.org/10.1186/s13321-020-00461-4
Build the Docker image from the Dockerfile:
docker build . --tag reel_image
Then run a docker container:
docker run -v $(pwd):/reel/ --name reel -it reel_image bash
To download all the ontology and corpora files:
chmod +x get_data.sh
./get_data.sh
If you have the ouput of a NER tool first store it in a json file with the same format as 'sample_input.json':
{
"doc_1": ["hypertension", "diabetes mellitus", "diazepam", "GABA"],
"doc_2": ["myocarditis", "heart failure", "acetaminophen"],
"doc_3": ["hepatitis", "caffeine", "adrenaline"]
}
Then apply the REEL model to link the inputed entities to ChEBI concepts:
python run.py --run_label sample_run --input_file sample_input.json -target_kb chebi -model ppr_ic --link_mode corpus_kb_link
The output will be in the file 'sample_run_results.json':
{
"doc_1":{"diazepam":"CHEBI:49575", "gaba":"CHEBI:35621"},
"doc_2":{"acetaminophen":"CHEBI:22160"},
"doc_3":{"adrenaline":"CHEBI:33568", "caffeine":"CHEBI:27732"}
}
There are 3 target knowledge bases available: 'chebi', 'medic' and 'ctd-chem'.
To see more info about the input arguments:
python run.py -h
If instead you want to apply the baseline model run on the same input file:
python run.py --input_file sample_entities.json --target_kb chebi -model baseline
To evaluate the REEL model on the dataset CRAFT-ChEBI run:
python run.py --dataset craft_chebi -model ppr_ic --link_mode corpus_link -target_kb chebi
The results are outputted to a file located in 'results/craft_chebi/ppr_ic/corpus_link' and printed in the terminal:
Total unique entities: 1679
Entities w/o solution (FN): 299
Wrong disambiguations (FP): 119
Correct disambiguations (TP): 1261
Precision: 0.913768115942029
Recall: 0.8083333333333333
Micro F1-score: 0.8578231292517008
Available datasets:
- 'craft_chebi' (target_kb = chebi)
- 'bc5cdr_medic_all' (target_kb = medic)
- 'bc5cdr_medic_train' (target_kb = medic)
- 'bc5cdr_medic_dev' (target_kb = medic)
- 'bc5cdr_medic_test' (target_kb = medic)
- 'bc5cdr_chemicals_all' (target_kb = ctd_chemicals)
- 'bc5cdr_chemicals_train' (target_kb = ctd_chemicals)
- 'bc5cdr_chemicals_dev' (target_kb = ctd_chemicals)
- 'bc5cdr_chemicals_test' (target_kb = ctd_chemicals)