
Dev info



Coding guidelines

Please see the coding guidelines for SOPs and general practices.

Setting up for local dev work on ARAX

NOTE: Use Python 3.9! (Other versions may result in errors.) A Python environment management tool may be your friend (e.g., pyenv, virtualenv).

  1. Clone the RTX repo and navigate into it (cd RTX)
  2. Run pip install -r requirements.txt
  3. Give your public RSA key to another ARAX dev for authentication
    • If you don't have an RSA key already, you'll need to generate one
    • The dev will need to put your public key on araxconfig.rtx.ai (under the araxconfig user) and arax-databases.rtx.ai (under the rtxconfig user).
    • A simple test to see if it has worked is to run ssh [email protected]
  4. Navigate to RTX/code/ARAX/ARAXQuery and run python ARAX_database_manager.py
    • This downloads all necessary databases and can take over an hour depending on your internet connection
    • Note: The databases take up about 200GB combined, so make sure you have that much space free on your machine!
  5. Navigate to RTX/code/ARAX/ARAXQuery/Expand and run python kp_info_cacher.py
    • This grabs the meta knowledge graphs from KPs so that Expand can figure out which KPs to query
    • Note that currently you need to run this on a daily(?) basis (on ARAX servers, this task is run by a background process, but that doesn't happen when you're running ARAX code locally, outside of the Flask application)
  6. Navigate to RTX/code/ARAX/test and run pytest -v
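
Once setup is complete, a quick way to sanity-check things beyond pytest is to issue a small query from Python. This is only a rough sketch: the import path, the ARAXQuery.query() call, and the DSL actions shown are assumptions modeled on the pytest suite, so check RTX/code/ARAX/ARAXQuery/ARAX_query.py and the tests for current usage.

import sys

sys.path.append("RTX/code/ARAX/ARAXQuery")  # adjust to wherever you cloned the repo
from ARAX_query import ARAXQuery  # assumption: main local query entry point

araxq = ARAXQuery()
# Assumption: query() accepts a dict of ARAX DSL actions, as the pytest suite does
response = araxq.query({"operations": {"actions": [
    "add_qnode(ids=CHEMBL.COMPOUND:CHEMBL112, key=n00)",
    "add_qnode(categories=biolink:Protein, key=n01)",
    "add_qedge(subject=n00, object=n01, key=e00)",
    "expand()",
    "resultify()",
    "return(message=true, store=false)",
]}})
print(getattr(response, "status", response))  # expect 'OK' if everything is wired up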

Setting up local UI

NOTE: This section is slightly outdated and doesn't seem to work as-is; it needs updating.

If you are running ARAX_query and friends on your local machine and generating nice JSON, and you want to visualize these JSON beasts through the UI, here's how you can do that (at least it worked for me on my Windows box):

Step 1) Install one more module needed for CORS support and make sure connexion[swagger-ui] is installed:

pip3 install flask_cors
pip3 install connexion[swagger-ui]

Step 2) Add a custom endpoint destination:

cd code/UI/interactive
cp config.js.example config.js
edit config.js to contain:
   config.base    = 'http://localhost:5001/';
   config.baseAPI = config.base + "api/arax/v1.2";

Step 3) Start the Flask server (blocks this shell and runs until ^C)

cd code/UI/OpenAPI/python-flask-server
python3 -m openapi_server

Step 4) Point your web browser to the UI files on your local filesystem, something like:

file://G:/Repositories/GitHub/RTX/code/UI/interactive/index.html

or

file://G:/Repositories/GitHub/RTX/code/UI/interactive/index.html?r=1

(the r number is the response id that you want to view in the UI)

By changing the r number, you should be able to view the messages you are creating and storing via ARAX_Query (make sure you don't have return(store=false) in your DSL, otherwise there's no r number). In theory, launching queries from the GUI should work too, but I haven't properly tested it.

For ARAX Developers: General software development guidelines

Logging and response objects

Care should be taken that the code never just dies, because then there is no feedback about the problem in the API/UI. Use the ARAXResponse logging/error mechanism to log informative messages throughout your code (see the section below for more details):

  • DEBUG: Only something an ARAX team member would want to see
  • INFO: Something an API user might like to see to examine the steps behind the scenes. Good for innocuous assumptions.
  • WARNING: Something that an API user should sit up and notice. Good for assumptions with impact.
  • ERROR: A failure that prevents fulfilling the request. Note that logging an error may not halt processing. Several can accumulate. If you need processing to terminate, either return or raise an Exception depending on where this error occurs.

General response paradigm

An ARAXResponse object is passed into each ARAX module's apply() method; among many things, this object serves as ARAX's log. You may either use this same response object throughout your module by passing it to different methods/classes as needed OR you may instantiate new response objects and then merge them with the response object that is ultimately returned from the module's apply() method.

  • Major methods (not little helper ones that can't fail) and calls to different ARAX classes should always (see the sketch below):
    • Either instantiate a new ARAXResponse object or take one as an input parameter
    • Log with response.debug, response.info, response.warning, and response.error
    • Place returned data objects in the response.data envelope (dict)
    • Return that response object
  • Callers of major methods should call with result = object.method()
    • Then immediately merge the new result into the active response (if they are separate response objects)
    • Then immediately check result.status to make sure it is 'OK', and if not, return the response or take some other action to handle the failure
  • The class may store the response object as an object variable and share it among the methods that way (this may be convenient)
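
Here is a minimal sketch of this paradigm. The class name and logging calls follow the description above, but treat the exact signatures (e.g., merge(), or an error_code argument to error()) as assumptions and consult ARAX_response.py and an existing ARAX module for the authoritative pattern.

from ARAX_response import ARAXResponse  # assumption: this module provides the response/log class


def do_major_step(response, some_input):
    # A 'major method': takes the shared response object, logs to it, and returns it
    response.debug("Starting the major step")
    if some_input is None:
        # Logging an error does not by itself halt processing
        response.error("No input was provided; cannot fulfill this request")
        return response
    response.info("Assuming default parameters for this step")
    response.data["my_results"] = {"value": 42}  # returned data goes in the data envelope
    return response


# Caller side: call, merge (if using separate response objects), then check status immediately
response = ARAXResponse()
result = do_major_step(ARAXResponse(), some_input=None)
response.merge(result)  # assumption: merge() folds one response's log into another
if result.status != 'OK':
    # return the response, raise, or otherwise handle the failure here
    pass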

Best practice dev workflow

We generally manage all work (bug fixes, features, and enhancements) via GitHub issues. The general workflow for working on a GitHub issue is as follows:

  1. Create a branch for your issue (typically off of the master branch, but possibly another branch depending on your particular issue)
  2. Implement the necessary code changes for your issue in your branch
    • Ensure your commit messages are under 70 characters and always reference the issue in your commit (e.g., with '#1000', if your issue number was 1000)
    • It is generally ok to push commits to your branch that leave the system in a broken state, unless the branch is shared with other devs who do not expect the system to be broken (but you should never push breaking changes to master!)
  3. If you are working on this issue for an extended period of time you will likely want to periodically merge master (or whatever your parent branch was) into your branch (see section on Branches and Merging)
  4. It is generally a good idea to add one or more pytests (see the Testing section) that test out your fix/changes, but please ensure the test completes speedily (within ~10 seconds) or mark it with @pytest.mark.slow!
  5. Once you believe you are done implementing changes, merge master into your branch and run the ARAX Pytest suite
  6. If any tests are failing, you need to figure out why and address those
  7. Once all tests are passing, you can make a Pull Request to merge your branch into master (or whatever your parent branch was)
    • Be sure to reference the issue from your PR (same way as in commit messages)
    • Once you become more experienced you may omit creating a PR and instead directly merge your branch into master
  8. Next, add the 'verify in next deployment' tag to your issue
  9. Once your PR is merged, please delete your branch (assuming you aren't using it for any other issues)
  10. After master has been rolled out to one of our ARAX endpoints (either test, beta, or production - see the Different Instances section), verify that your changes are working as expected on that endpoint
  11. After that, post a message in the GitHub issue letting whoever submitted the issue know that the changes are complete (and which endpoint(s) they have been rolled out to)
  12. If the person who submitted the issue is satisfied, the issue can be closed

Other dev guidelines/rules

  • In your code, do not assume a particular location for the "current working directory". In general, try to use os.path.abspath to find the location of __file__ for your module and then construct a relative path to find other ARAX/RTX files/modules (see the sketch after this list).
  • Always run the ARAX Pytest suite before pushing to master; do not push your changes to master if any pytests are failing!
  • Strive to adhere to PEP8 style in your Python code.
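
For example, a minimal sketch of the recommended path handling (the target file here is just illustrative):

import os

# Locate this module's own directory rather than relying on the current working directory
this_dir = os.path.dirname(os.path.abspath(__file__))

# Then build paths to other RTX/ARAX files relative to this module
config_dbs_path = os.path.join(this_dir, "..", "..", "config_dbs.json")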

Testing

ARAX Pytest suite

The ARAX Pytest suite lives at: RTX/code/ARAX/test/. The README in that directory provides details on how to use the test suite, but some examples are provided below as well.

To run all tests, cd to that folder and run

pytest -v .

To run the tests in a specific file

pytest -v <file.py>

To run a specific test:

pytest -v <file.py> -k <a test like test_example_3>

To run the slow tests:

pytest -v --runslow

To run the 'external' tests:

pytest -v --runexternal

To run all tests:

pytest -v --runslow --runexternal
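
For reference, a test intended only for --runslow runs can be marked like this (a minimal sketch; the slow marker is the one mentioned in the workflow section above, and an analogous external marker presumably backs --runexternal):

import pytest

@pytest.mark.slow
def test_my_slow_feature():
    # Skipped by default; only runs when pytest is invoked with --runslow
    assert True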

Testing /asyncquery with a callback receiver

The /asyncquery endpoint is a bit hard to test because you need to have a callback receiver that is Internet accessible or accessible to ARAX. There is a crude callback receiver available on ARAX itself.

How to use such a system is documented here: https://github.com/RTXteam/RTX/issues/1756
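
As a rough sketch (the endpoint URL, API version, and callback address below are placeholders; see the linked issue for the actual setup), an /asyncquery request supplies a callback URL in the query JSON:

import requests

endpoint = "https://arax.ncats.io/api/arax/v1.2/asyncquery"  # placeholder version/path
query = {
    "callback": "https://example.org/my_callback_receiver",  # must be reachable by ARAX
    "message": {"query_graph": {"nodes": {}, "edges": {}}},  # your actual query graph goes here
}
response = requests.post(endpoint, json=query)
print(response.status_code)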

Locally testing KG2 API code

Normally when running the pytest suite on your dev machine, any queries Expand makes of KG2 are sent to the KG2 API. This means that the KG2 API-specific code (which is mixed into the ARAX Expand code) is not actually run on your machine. If you're doing development work on KG2 API-specific pieces of code (essentially code within an if mode == "RTXKG2": block), you want your own machine to act as the KG2 API during testing instead of calling our arax.ncats.io KG2 endpoint. Follow these steps to do so:

  1. Locally flip the force_local variable to True in ARAX_expander.py (on this line)
  2. Then run the pytest suite in the usual way

Different instances

  1. "our" prod: arax.ncats.io
  2. "our" test: arax.ncats.io/test
  3. "our" beta: arax.ncats.io/beta
  4. ITRB production: arax.transltr.io
  5. ITRB test: arax.test.transltr.io
  6. ITRB CI/staging: arax.ci.transltr.io

See also this google doc with all endpoints and the branches they run.

The Jenkins dashboard for ITRB builds is here: https://deploy.transltr.io/.

Config files

ARAX has one config file that does not live in the RTX repo; it is called config_secrets.json. The 'master copy' of this file lives on [email protected] at /home/araxconfig/config_secrets.json. ARAX developers' public RSA keys need to be listed in authorized_keys on this instance; this allows config_secrets.json to be automatically downloaded to their machine when queries are run (it auto-refreshes every 24 hours).

If desired, you may override config_secrets.json by creating a (local) copy of it at RTX/code/config_secrets_local.json, which you can tweak to contain whatever usernames/passwords you need. If a config_secrets_local.json file is present, it will always be used instead of the regular config_secrets.json.

NOTE: You should never push config_secrets.json or share its contents in a public space! (i.e., beyond our team)

The ARAX database config file lives in the RTX repo at RTX/code/config_dbs.json. This file specifies which versions of our various databases should be used. The ARAXDatabaseManager automatically takes care of downloading/removing databases from developers' machines as needed, according to what is specified in config_dbs.json.
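
For instance, a minimal sketch for checking which database versions config_dbs.json currently specifies (the repo path below is illustrative):

import json
from pathlib import Path

config_dbs_path = Path.home() / "RTX" / "code" / "config_dbs.json"  # adjust to your clone location
config_dbs = json.loads(config_dbs_path.read_text())
print(list(config_dbs.keys()))  # top-level entries describing the databases and versions in use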

Branches and merging

Which branches to commit to/merge into

  1. production and itrb-test should not be committed to, save for ITRB-specific changes
  2. master is to be merged into production and/or itrb-test, not the other way around

How to merge branches

To merge master into mybranch (replace with your own branch name), do the following:

git checkout master
git pull origin master
git checkout mybranch
git pull origin mybranch
git merge --no-ff origin/master
[if any merge conflicts: fix them and commit]
git push origin mybranch

To merge mybranch into master, do the following:

WARNING: Be very careful when merging anything into master! Be sure your changes are fully tested and always first merge master into your branch and test before doing this.

git checkout mybranch
git pull origin mybranch
git checkout master
git pull origin master
git merge --no-ff origin/mybranch
[if any merge conflicts: fix them and commit]
git push origin master

Merging a specific commit (and its history) into another branch

See this gist

Dealing with Pull Requests

  1. Install gh via these directions.
  2. Check out the PR locally: gh pr checkout <PR number>
  3. Edit, check, commit, etc.
  4. If everything looks good:
    1. git branch to see what <branch name> you are on
    2. git checkout master to switch to the master branch
    3. git pull origin master to make sure master is up to date
    4. git checkout <branch name> to switch back to the PR branch
    5. git merge --no-ff origin/master to merge master into the PR branch
    6. Fix any merge conflicts
    7. git checkout master to switch back to master
    8. git merge --no-ff origin/<branch name> to merge the PR branch into master

To switch back to master: git checkout master

Changing passwords

Neo4j pw in browser

:server change-password

Neo4j pw on command line

  1. sudo service neo4j stop
  2. sudo rm -rf /var/lib/neo4j/data/dbms
  3. sudo -u neo4j neo4j-admin set-initial-password PASSWORD
  4. sudo service neo4j start

MySQL Feedback database

$ sudo mysql
> GRANT ALL ON RTXFeedback.* TO 'rt'@'localhost' IDENTIFIED BY 'PASSWORD';

If rejected use:

$ sudo mysql
> SET PASSWORD FOR 'rt'@'localhost' = 'PASSWORD';

Update Generic Terms Blocklist

The blocklist is stored in general_concepts.json. There are two ways of filtering out generic concepts:

  • Specifying curies: In the curie section, add the curies you want to filter out (in lower case) to the list. It is not necessary to specify equivalent curies, as they get filtered out too.
  • Specifying terms: In the synonyms section, add the terms you want to filter out to the list. A term can be either a plain string (for example, congenital) or a valid Python regular expression (such as pharmacolog.*); matching is case-insensitive. A node gets filtered out if its name or any of its synonyms (specified in the node attributes) matches an item in this list. A sketch of this matching behavior follows.
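
Below is a hedged sketch of that matching behavior; the variable names and the fullmatch-style comparison are illustrative only, since the real filtering code and file structure live in the ARAX codebase.

import re

# Illustrative stand-ins for entries from the synonyms section of general_concepts.json
blocklist_synonyms = ["congenital", "pharmacolog.*"]  # plain strings or Python regexes

def is_generic_concept(node_name, node_synonyms):
    # True if the node's name or any synonym matches a blocklisted term (case-insensitive)
    texts = [node_name] + list(node_synonyms)
    for pattern in blocklist_synonyms:
        regex = re.compile(pattern, re.IGNORECASE)
        if any(regex.fullmatch(text) for text in texts):
            return True
    return False

print(is_generic_concept("Pharmacological action", ["mechanism of action"]))  # True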

Old or infrequently used info

How to build the NodeSynonymizer

Note: The synonymizer should be automatically downloaded into your dev environment upon running the pytest suite (or ARAX_database_manager.py). But if you need to build one yourself for some reason, here is how to do so.

How to build from scratch:

git pull 

If your kg2_node_info.tsv, kg2_equivalencies.tsv, and kg2_synonyms.json files are not already up to date (or you haven't created them yet), you should first do:

cd $RTX/code/ARAX/NodeSynonymizer
python3 dump_kg2_node_data.py

(This pulls down a lot of data over the network and takes 10+ minutes depending on network speed)

Then build the NodeSynonymizer database: (WARNING: The build process needs 25GB of free RAM to work!)

cd $RTX/code/ARAX/NodeSynonymizer
python3 sri_node_normalizer.py --build
python3 node_synonymizer.py --build --kg_name=both
python3 node_synonymizer.py --lookup=rickets --kg_name=KG2

NOTE: If during a branch switch/merge/commit you get a complaint about kg2_node_info.tsv, kg2_equivalencies.tsv, or kg2_synonyms.json being untracked files that would be overwritten, it is safe to delete them. After building the new NodeSynonymizer database, you will not need those files around any more.