Things to Be Updated for A New Release of ARAX

Chunyu Ma edited this page Dec 20, 2023 · 4 revisions

Author: Chunyu Ma

Date: 2023/12/20

Description

ARAX is a new reasoning system for translational biomedicine. It offers a web browser user interface and an application programming interface (API). The system consists mainly of six modules: ARAX_expander, ARAX_overlay, ARAX_filter_kg, ARAX_infer, ARAX_resultify, and ARAX_ranker. For more detailed descriptions of these six modules, you can refer to this publication. To query ARAX programmatically, you will need to use domain-specific language (DSL) commands. A summary of DSL commands can be found here. The ARAX system is still under active development, with periodic releases of new versions that incorporate updates from RTX-KG2 and other knowledge providers (KPs). Because the ARAX system is complicated, updating it involves GitHub code, databases, and multiple servers. This page serves as a checklist of the items (e.g., code, databases, etc.) that need to be updated for a new ARAX release.

RTX-KG2 Update

ARAX relies heavily on RTX-KG2, so ARAX updates are aligned with RTX-KG2 updates. This means that each time RTX-KG2 is updated, the databases and code of ARAX need to be updated as well. For all to-do tasks of a new release, please refer to this checklist. RTX-KG2 is mainly maintained by Dr. Stephen Ramsey's Lab. So, if you have any questions about RTX-KG2, please reach out to Dr. Ramsey, Amy Glen, Sundareswar Pullela, or Lillana Acevedo.

Code and Database Update

1. xDTD database

The xDTD database is a pre-computed probability and path database for drug repurposing. It is built from the KGML-xDTD model trained on a specific version of RTX-KG2. This database stores the predicted treatment probability and the RTX-KG2-based explainable paths for each pair of reliable chemical (with RXCUI IDs from the UMLS system) and disease in RTX-KG2, as well as the TSV files of RTX-KG2. The construction of the xDTD database is managed by Dr. David Koslicki's Lab and was previously done by Chunyu Ma. The whole build process takes ~3 weeks (1 week for model training, 2 weeks for pre-computation). Some notes on the build process:

  • The pipeline for training the model and building the database can be found in this repo. To build a database for a specific version of RTX-KG2, the config.yaml file needs to be edited. By default, steps 24 and 25 (which build the database) are commented out, because snakemake cannot automatically run all steps at once. So, after steps 1-23 are done, please remove the comment character # (here and here) in the Run_Pipeline.smk file and run the pipeline again for the remaining steps.
  • Because model training needs GPUs, the e5-cse-cbdmk02 server (which has 4 A100 GPUs, each with 80 GB of memory) can be used.
  • The code and models for all previous versions of xDTD databases can be found under /scratch/xDTD_all_versions on the e5-cse-cbdmk02 server.
  • Once the database has been successfully built via the pipeline above, rename it from its default name ExplainableDTD.db to ExplainableDTD_v1.0_KG###.db (### is the version of RTX-KG2).
  • The ExplainableDTD_v1.0_KG###.db can be uploaded to ~/KG### (### is the version of RTX-KG2) on the server arax-databases.rtx.ai.
  • The ARAX code that uses the xDTD database is primarily in ARAX_infer.py, ExplianableDTD_db.py, and infer_utilities.py.
  • After all the steps above are done, please run pytest under https://github.com/RTXteam/RTX/tree/master/code/ARAX/test to make sure all xDTD-associated tests pass (you can use the command pytest -v -k xdtd --runslow --runexternal).
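
The renaming step in the notes above is easy to get wrong by hand, so here is a minimal Python sketch of it. It is not part of the xDTD pipeline or of ARAX: the helper names (versioned_db_name, remote_upload_dir, rename_database) are hypothetical, and it assumes that ### is substituted with the plain RTX-KG2 version string (the value 2.8.4 used below is only an illustration).

```python
from pathlib import Path

# Default output name of the build pipeline, as stated in the notes above.
DEFAULT_DB_NAME = "ExplainableDTD.db"


def _check_version(kg2_version: str) -> str:
    """Reject empty versions and anything that could alter the path."""
    if not kg2_version or "/" in kg2_version or kg2_version.strip() != kg2_version:
        raise ValueError(f"unexpected RTX-KG2 version string: {kg2_version!r}")
    return kg2_version


def versioned_db_name(kg2_version: str) -> str:
    """Build the release file name ExplainableDTD_v1.0_KG###.db.

    Assumption: '###' is replaced by the version string as-is.
    """
    return f"ExplainableDTD_v1.0_KG{_check_version(kg2_version)}.db"


def remote_upload_dir(kg2_version: str) -> str:
    """Build the target directory ~/KG### on the databases server."""
    return f"~/KG{_check_version(kg2_version)}"


def rename_database(build_dir: Path, kg2_version: str) -> Path:
    """Rename the freshly built database to its versioned name.

    Refuses to overwrite an existing file so that an older build
    is never replaced silently.
    """
    src = build_dir / DEFAULT_DB_NAME
    dst = build_dir / versioned_db_name(kg2_version)
    if not src.is_file():
        raise FileNotFoundError(f"no built database at {src}")
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    src.rename(dst)
    return dst
```

For example, rename_database(Path("."), "2.8.4") would rename ./ExplainableDTD.db to ./ExplainableDTD_v1.0_KG2.8.4.db, and remote_upload_dir("2.8.4") gives the matching upload directory. The upload itself and the pytest run are left as manual steps, since they depend on server access.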