
Things to Be Updated for a New Release of ARAX

Amy Glen edited this page Dec 4, 2024 · 4 revisions

Author: Chunyu Ma

Date: 2023/12/20

Description

ARAX is a new reasoning system for translational biomedicine. It offers both a web browser user interface and an application programming interface (API). The system consists mainly of six modules: ARAX_expander, ARAX_overlay, ARAX_filter_kg, ARAX_infer, ARAX_resultify, and ARAX_ranker. For more detailed descriptions of these six modules, you can refer to this publication. To query ARAX programmatically, you will need to use domain-specific language (DSL) commands. A summary of DSL commands can be found here. ARAX is still under active development, with periodic releases of new versions that incorporate updates from RTX-KG2 and other knowledge providers (KPs). Because ARAX is complex, updating it involves GitHub code, databases, and multiple servers. This page serves as a checklist of the items (e.g., code and databases) that need to be updated for a new ARAX release.

RTX-KG2 Update

ARAX relies heavily on RTX-KG2, so ARAX releases are aligned with RTX-KG2 updates: each time RTX-KG2 is updated, the ARAX databases and code need to be updated as well. For all to-do tasks of a new release, please refer to this checklist. RTX-KG2 is mainly maintained by Dr. Stephen Ramsey's lab, so if you have any questions about RTX-KG2, please reach out to Dr. Ramsey, Amy Glen, or Sundareswar Pullela.

Code and Database Update

1. xDTD database

The xDTD database is a precomputed database of treatment probabilities and explanation paths for drug repurposing. It is built from the KGML-xDTD model trained on a specific version of RTX-KG2. For each pair of a reliable chemical (i.e., one with an RXCUI ID from the UMLS) and a disease in RTX-KG2, the database stores the predicted treatment probability and the RTX-KG2-based explainable paths; it also stores the TSV files of RTX-KG2. The construction of the xDTD database is managed by Dr. David Koslicki's lab and was previously done by Chunyu Ma. The whole build process takes about 3 weeks (1 week for model training and 2 weeks for precomputation). Some notes on this build process:

  • The pipeline for training the model and building the database can be found in this repo. To build a database for a specific version of RTX-KG2, edit the config.yaml file. By default, steps 24 and 25 (which build the database) are commented out, because snakemake cannot run all of the steps automatically at once. So, after steps 1-23 are done, uncomment steps 24 and 25 by removing the # characters (here and here) in the Run_Pipeline.smk file.
  • Since model training requires GPUs, the e5-cse-cbdmk02 server (which has four A100 GPUs, each with 80 GB of memory) can be used.
  • The code and models for all previous versions of xDTD databases can be found under /scratch/xDTD_all_versions on the e5-cse-cbdmk02 server.
  • Once the database has been successfully built via the pipeline above, rename it from its default name, ExplainableDTD.db, to ExplainableDTD_v1.0_KG###.db (where ### is the RTX-KG2 version).
  • Upload ExplainableDTD_v1.0_KG###.db to ~/KG### (where ### is the RTX-KG2 version) on the arax-databases.rtx.ai server. After the upload, update the corresponding path in the RTX/code/config_dbs.json file.
  • The ARAX code that uses the xDTD database is primarily in ARAX_infer.py, ExplianableDTD_db.py, and infer_utilities.py. If you encounter any errors, please check these three scripts.
  • After all of the steps above are done, run the pytest suite under https://github.com/RTXteam/RTX/tree/master/code/ARAX/test to make sure all xDTD-associated tests pass (for example, with the command pytest -v -k xdtd --runslow --runexternal).
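The renaming and upload-path conventions in the bullets above can be sketched in Python as follows. This is a minimal sketch, assuming the pipeline leaves ExplainableDTD.db in a single build directory; the helper names (xdtd_release_name, rename_xdtd_db, remote_destination) are hypothetical, and the scp-style destination omits the SSH user name, which this page does not specify. It does not edit config_dbs.json, whose exact layout is not described here.

```python
from pathlib import Path

# Database server named on this page; the SSH user name is deliberately omitted.
DB_HOST = "arax-databases.rtx.ai"


def xdtd_release_name(kg2_version: str) -> str:
    """Versioned xDTD file name, following the ExplainableDTD_v1.0_KG### convention."""
    return f"ExplainableDTD_v1.0_KG{kg2_version}.db"


def rename_xdtd_db(build_dir: Path, kg2_version: str) -> Path:
    """Rename the pipeline's default ExplainableDTD.db inside build_dir to the versioned name."""
    src = build_dir / "ExplainableDTD.db"
    if not src.is_file():
        raise FileNotFoundError(f"expected pipeline output not found: {src}")
    dst = build_dir / xdtd_release_name(kg2_version)
    if dst.exists():
        # Never clobber a previously renamed database.
        raise FileExistsError(f"refusing to overwrite existing file: {dst}")
    return src.rename(dst)  # Path.rename returns the new path (Python 3.8+)


def remote_destination(kg2_version: str, filename: str) -> str:
    """scp-style target of the form <host>:~/KG<version>/<filename>."""
    return f"{DB_HOST}:~/KG{kg2_version}/{filename}"
```

For example, remote_destination("2.8.4", xdtd_release_name("2.8.4")) yields arax-databases.rtx.ai:~/KG2.8.4/ExplainableDTD_v1.0_KG2.8.4.db, where "2.8.4" is only an illustrative version string. Refusing to overwrite an existing file keeps a re-run from silently replacing a database that may already have been uploaded.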

2. NGD database

For instructions on building this database, please refer to this description. Access to the ngdbuild.rtx.ai AWS server may be required. Once the build is complete, rename the database to curie_to_pmids_v1.0_KG###.sqlite (where ### is the RTX-KG2 version) and upload it to ~/KG### on the arax-databases.rtx.ai server.
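Before uploading, it can help to confirm that the renamed file actually opens as an SQLite database. The following is an optional sketch, not part of the documented procedure, using Python's standard sqlite3 module; the function name sqlite_sanity_check is hypothetical, and the code makes no assumptions about the NGD schema: it only runs SQLite's built-in integrity check and lists whatever tables exist.

```python
import sqlite3
from pathlib import Path


def sqlite_sanity_check(db_path: Path) -> list:
    """Open db_path read-only, run PRAGMA integrity_check, and return the table names."""
    if not db_path.is_file():
        raise FileNotFoundError(db_path)
    # Read-only URI so the check can never modify the database being released.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # A healthy database reports a single row containing "ok".
        (result,) = conn.execute("PRAGMA integrity_check").fetchone()
        if result != "ok":
            raise ValueError(f"integrity check failed for {db_path}: {result}")
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

A file that is not an SQLite database makes the PRAGMA raise sqlite3.DatabaseError, so a corrupted or truncated copy is caught before it reaches arax-databases.rtx.ai. The same check applies to the other .sqlite files listed below.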

3. Other databases

A few more databases need to be updated for a new ARAX release:

  • node_synonymizer_v1.0_KG###.sqlite: the node synonymizer database, maintained by Amy and Eric.
  • kg2c_v1.0_KG###.sqlite: a database storing KG2c node and edge information, maintained by Amy and Sundar.
  • autocomplete_v1.0_KG###.sqlite: the autocomplete database, maintained by Amy and Sundar.
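The five database files named on this page (xDTD, NGD, and the three above) share the <name>_v1.0_KG### naming pattern, so a quick pre-release check can confirm that every expected file is present. The sketch below is hypothetical tooling, not part of ARAX: it only encodes the five file names listed on this page and checks a local directory (for example, a copy of ~/KG### from arax-databases.rtx.ai).

```python
from pathlib import Path

# File-name templates taken from this page; "{v}" is the RTX-KG2 version string.
RELEASE_DB_TEMPLATES = (
    "ExplainableDTD_v1.0_KG{v}.db",
    "curie_to_pmids_v1.0_KG{v}.sqlite",
    "node_synonymizer_v1.0_KG{v}.sqlite",
    "kg2c_v1.0_KG{v}.sqlite",
    "autocomplete_v1.0_KG{v}.sqlite",
)


def expected_db_files(kg2_version: str) -> list:
    """Every database file name expected for the given RTX-KG2 version."""
    return [t.format(v=kg2_version) for t in RELEASE_DB_TEMPLATES]


def missing_db_files(release_dir: Path, kg2_version: str) -> list:
    """Expected database files that are not present as regular files in release_dir."""
    return [
        name
        for name in expected_db_files(kg2_version)
        if not (release_dir / name).is_file()
    ]
```

For example, missing_db_files(Path("/path/to/KG2.8.4"), "2.8.4") returns an empty list once all five databases are in place; the path and version here are placeholders.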

Servers for ARAX

  • arax.ncats.io server: the main AWS server for hosting the ARAX system. Currently, only Amy, Steve, and Eric have permission to access it.
  • arax-databases.rtx.ai server: this server is mainly used for storing the ARAX databases. If you need access, ask a team member who already has access to add your SSH key.
  • buildkg2.rtx.ai server: this server is mainly used for building RTX-KG2. Normally, only the RTX-KG2 team needs to access it.
  • araxconfig.rtx.ai server: this server is used to store the secret keys for ARAX. We normally don't need to access it.