Dev info
- Coding guidelines
- Setting up for local dev work on ARAX
- Setting up local UI
- For ARAX Developers: General software development guidelines
- Testing
- Different instances
- Config files
- Branches and merging
- Dealing with Pull Requests
- Changing passwords
- Old or infrequently used info
Please see the coding guidelines for SOPs and general practices.
NOTE: Use Python 3.9! (Other versions may result in errors.) A Python environment management tool (e.g., pyenv, virtualenv) may be your friend.
- Clone the RTX repo and navigate into it (`cd RTX`)
- Run `pip install -r requirements.txt`
- Give your public RSA key to another ARAX dev for authentication
  - If you don't have an RSA key already, you'll need to generate one
  - The dev will need to put your public key on `araxconfig.rtx.ai` (under the `araxconfig` user) and `arax-databases.rtx.ai` (under the `rtxconfig` user).
  - A simple test to see if it has worked is to run `ssh araxconfig@araxconfig.rtx.ai`
- Navigate to `RTX/code/ARAX/ARAXQuery` and run `python ARAX_database_manager.py`
  - This downloads all necessary databases and can take over an hour depending on your internet connection
  - Note: The databases take up about 200GB combined, so make sure you have that much space free on your machine!
- Navigate to `RTX/code/ARAX/ARAXQuery/Expand` and run `python kp_info_cacher.py`
  - This grabs the meta knowledge graphs from KPs so that Expand can figure out which KPs to query
  - Note that currently you need to run this on a daily(?) basis (on ARAX servers, this task is run by a background process, but that doesn't happen when you're running ARAX code locally, outside of the Flask application)
- Navigate to `RTX/code/ARAX/test` and run `pytest -v`
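Before starting the long database download, it may help to confirm the two prerequisites noted above: the Python version and the free disk space. The following is a minimal sketch, not part of the ARAX codebase; the function name `check_prerequisites` and the exact thresholds are assumptions drawn from the notes in this section.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 9)   # the version recommended in the note above
REQUIRED_FREE_GB = 200     # approximate combined database size noted above


def check_prerequisites(path=".", version=None, free_bytes=None):
    """Return a list of human-readable problems (empty if none were found).

    `version` and `free_bytes` can be supplied explicitly; by default they
    are read from the running interpreter and the disk holding `path`.
    """
    if version is None:
        version = sys.version_info[:2]
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free

    problems = []
    if tuple(version[:2]) != REQUIRED_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found; 3.9 is recommended"
        )
    free_gb = free_bytes / 10**9
    if free_gb < REQUIRED_FREE_GB:
        problems.append(
            f"only {free_gb:.0f} GB free; about {REQUIRED_FREE_GB} GB is needed"
        )
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Run it from the drive where the RTX repo lives so that the disk check looks at the right volume.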
NOTE: This section is slightly outdated and does not seem to work as is; it needs updating.
If you are running ARAX_query and friends on your local machine and are generating nice JSON, but you want to be able to visualize these JSON beasts through the UI, here's how you can do that (at least it worked for me on my Windows box):
Step 1) Install one more module needed for CORS support and make sure connexion[swagger-ui] is installed
pip3 install flask_cors
pip3 install connexion[swagger-ui]
Step 2) Add a custom endpoint destination:
cd code/UI/interactive
cp config.js.example config.js
edit config.js to contain:
config.base = 'http://localhost:5001/';
config.baseAPI = config.base + "api/arax/v1.2";
Step 3) Start the Flask server (blocks this shell and runs until ^C)
cd code/UI/OpenAPI/python-flask-server
python3 -m openapi_server
Step 4) Point your web browser to the UI files on your local filesystem, something like:
file://G:/Repositories/GitHub/RTX/code/UI/interactive/index.html
or
file://G:/Repositories/GitHub/RTX/code/UI/interactive/index.html?r=1
(the r number is the response id that you want to view in the UI)
By changing the r number, you should be able to view the messages you are creating and storing via ARAX_query (make sure you don't have return(store=false) in your DSL, otherwise there's no r number). In theory, launching queries from the GUI should work too, but I haven't properly tested it.
Care should be taken that the code never just dies, because then there is no feedback about the problem in the API/UI. Use the `ARAXResponse.error` mechanism to log informative messages throughout your code (see below section for more details):
- `DEBUG`: Only something an ARAX team member would want to see
- `INFO`: Something an API user might like to see to examine the steps behind the scenes. Good for innocuous assumptions.
- `WARNING`: Something that an API user should sit up and notice. Good for assumptions with impact.
- `ERROR`: A failure that prevents fulfilling the request. Note that logging an error may not halt processing. Several can accumulate. If you need processing to terminate, either `return` or `raise` an `Exception` depending on where this error occurs.
An `ARAXResponse` object is passed into each ARAX module's `apply()` method; among many things, this object serves as ARAX's log. You may either use this same response object throughout your module by passing it to different methods/classes as needed OR you may instantiate new response objects and then merge them with the response object that is ultimately returned from the module's `apply()` method.
- Major methods (not little helper ones that can't fail) and calls to different ARAX classes should always:
  - Either instantiate a new `ARAXResponse` object or take one as an input parameter
  - Log with `response.debug`, `response.info`, `response.warning`, and `response.error`
  - Place returned data objects in the `response.data` envelope (`dict`)
  - Return that response object
- Callers of major methods should call with `result = object.method()`
  - Then immediately merge the new result into the active response (if they are separate response objects)
  - Then immediately check `result.status` to make sure it is `'OK'`, and if not, return response or take some other action for method call failure
- The class may store the `Response` object as an object variable and share it among the methods that way (this may be convenient)
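The call, merge, and check pattern described above can be sketched as follows. Note that `SketchResponse` is a hypothetical stand-in written only for this illustration; it is not the real `ARAXResponse` class, and its `merge` method, its status strings other than `'OK'`, and the example functions `count_nodes` and `apply` are all assumptions.

```python
class SketchResponse:
    """Hypothetical stand-in for ARAXResponse (NOT the real class)."""

    def __init__(self):
        self.status = "OK"
        self.messages = []  # (level, text) pairs; this list acts as the log
        self.data = {}      # envelope for returned data objects

    def _log(self, level, text):
        self.messages.append((level, text))

    def debug(self, text):
        self._log("DEBUG", text)

    def info(self, text):
        self._log("INFO", text)

    def warning(self, text):
        self._log("WARNING", text)

    def error(self, text):
        # Logging an error records the failure but does not halt processing
        self._log("ERROR", text)
        self.status = "ERROR"

    def merge(self, other):
        # Fold another response's log and failure status into this one
        self.messages.extend(other.messages)
        if other.status != "OK":
            self.status = other.status


def count_nodes(node_ids):
    """A 'major method': makes a response, logs, fills data, returns it."""
    response = SketchResponse()
    if not node_ids:
        response.error("No node IDs were provided")
        return response  # returning is what actually stops processing
    response.debug(f"Counting {len(node_ids)} node IDs")
    response.data["node_count"] = len(set(node_ids))
    return response


def apply(response, node_ids):
    """A caller: call, merge immediately, then check the status."""
    result = count_nodes(node_ids)
    response.merge(result)
    if result.status != "OK":
        return response
    response.info(f"Found {result.data['node_count']} distinct nodes")
    return response
```

In this sketch a failed call still returns a response object, so the caller always has a log to hand back to the API/UI instead of an unexplained crash.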
We generally manage all work (bug fixes, features, and enhancements) via GitHub issues. The general workflow for working on a GitHub issue is as follows:
- Create a branch for your issue (typically off of the `master` branch, but possibly another branch depending on your particular issue)
- Implement the necessary code changes for your issue in your branch
  - Ensure your commit messages are under 70 characters and always reference the issue in your commit (e.g., with '#1000', if your issue number was 1000)
  - It is generally ok to push commits to your branch that leave the system in a broken state, unless the branch is shared with other devs who do not expect the system to be broken (but you should never push breaking changes to `master`!)
  - If you are working on this issue for an extended period of time you will likely want to periodically merge `master` (or whatever your parent branch was) into your branch (see section on Branches and Merging)
  - It is generally a good idea to add one or more pytests (see the Testing section) that test out your fix/changes, but please ensure the test completes speedily (within ~10 seconds) or mark it with `@pytest.mark.slow`!
- Once you believe you are done implementing changes, merge `master` into your branch and run the ARAX Pytest suite
  - If any tests are failing, you need to figure out why and address those
- Once all tests are passing, you can make a Pull Request to merge your branch into `master` (or whatever your parent branch was)
  - Be sure to reference the issue from your PR (same way as in commit messages)
  - Once you become more experienced you may omit creating a PR and instead directly merge your branch into `master`
- Next add the `verify in next deployment` tag to your issue
- Once your PR is merged, please delete your branch (assuming you aren't using it for any other issues)
- After `master` has been rolled out to one of our ARAX endpoints (either test, beta, or production - see the Different Instances section), verify that your changes are working as expected on that endpoint
- After that, post a message in the GitHub issue letting whoever submitted the issue know that the changes are complete (and which endpoint(s) they have been rolled out to)
- If the person who submitted the issue is satisfied, the issue can be closed
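The commit message rule from the workflow above (under 70 characters, with an issue reference such as '#1000') is easy to check mechanically. This is a minimal sketch, not an existing ARAX tool; the function name `check_commit_message` is hypothetical, and applying the length limit to the first line only is an assumption, since the guideline does not say whether it covers the whole message.

```python
import re

MAX_LENGTH = 70                  # "under 70 characters", per the workflow above
ISSUE_REF = re.compile(r"#\d+")  # matches references such as "#1000"


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if fine)."""
    # Assumption: the length limit applies to the first (subject) line only
    subject = message.splitlines()[0] if message.strip() else ""
    problems = []
    if len(subject) >= MAX_LENGTH:
        problems.append(
            f"subject is {len(subject)} characters; keep it under {MAX_LENGTH}"
        )
    if not ISSUE_REF.search(message):
        problems.append("no issue reference such as '#1000' found")
    return problems
```

A check like this could be called from a local git hook, though whether the team wants that enforced automatically is a separate decision.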
- In your code, do not assume a particular location for the "current working directory". In general, try to use `os.path.abspath` to find the location of `__file__` for your module and then construct a relative path to find other ARAX/RTX files/modules.
- Always run the ARAX Pytest suite before pushing to `master`; do not push your changes to `master` if any pytests are failing!
- Strive to adhere to PEP8 style in your Python code.
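The working-directory guideline above can be sketched as a small helper. The name `path_relative_to_module` is hypothetical, and the example in the trailing comment assumes the calling module sits two directories below `RTX/code` (for instance in `RTX/code/ARAX/ARAXQuery`).

```python
import os


def path_relative_to_module(module_file, *parts):
    """Build an absolute path relative to the directory holding module_file.

    Pass the calling module's `__file__` so the result does not depend on
    the current working directory of whoever launched the process.
    """
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.normpath(os.path.join(module_dir, *parts))


# Typical use inside a module (the target file is only an example):
#     dbs_config = path_relative_to_module(__file__, "..", "..", "config_dbs.json")
```

Because the path is anchored on the module's own location, the same code works whether it is run from the repo root, from the test directory, or from inside the Flask application.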
The ARAX Pytest suite lives at `RTX/code/ARAX/test/`. The README in that directory provides details on how to use the test suite, but some examples are provided below as well.
To run all tests, `cd` to that folder and run:
pytest -v .
To run the tests in a specific file:
pytest -v <file.py>
To run a specific test:
pytest -v <file.py> -k <a test like test_example_3>
To run the slow tests:
pytest -v --runslow
To run the 'external' tests:
pytest -v --runexternal
To run all tests, including the slow and external ones:
pytest -v --runslow --runexternal
The /asyncquery endpoint is a bit hard to test because you need to have a callback receiver that is Internet accessible or accessible to ARAX. There is a crude callback receiver available on ARAX itself.
How to use such a system is documented here: https://github.com/RTXteam/RTX/issues/1756
Normally when running the pytest suite on your dev machine, any queries Expand does of KG2 are sent to the KG2 API. This means that the KG2 API-specific code (which is mixed into the ARAX Expand code) is not actually run on your machine. If you're doing development work on KG2 API-specific pieces of code (which is essentially code within an `if mode == "RTXKG2":` block), for testing you want your own machine to act as the KG2 API instead of calling our arax.ncats.io KG2 endpoint. Follow these steps to do so:
- Locally flip the `force_local` variable to `True` in `ARAX_expander.py` (on this line)
- Then run the pytest suite in the usual way
- "our" prod: arax.ncats.io
- "our" test: arax.ncats.io/test
- "our" beta: arax.ncats.io/beta
- ITRB production: arax.transltr.io
- ITRB test: arax.test.transltr.io
- ITRB CI/staging: arax.ci.transltr.io
See also this google doc with all endpoints and the branches they run.
The Jenkins dashboard for ITRB builds is here: https://deploy.transltr.io/.
ARAX has one config file that does not live in the RTX repo; it is called `config_secrets.json`. The 'master copy' of this file lives on `araxconfig@araxconfig.rtx.ai` at `/home/araxconfig/config_secrets.json`. ARAX developers' public RSA keys need to be listed in `authorized_keys` on this instance; this allows `config_secrets.json` to be automatically downloaded to their machine when queries are run (it auto-refreshes every 24 hours).
If desired, you may override `config_secrets.json` by creating a (local) copy of it at `RTX/code/config_secrets_local.json`, which you can tweak to contain whatever usernames/passwords you need. If a `config_secrets_local.json` file is present, it will always be used instead of the regular `config_secrets.json`.
NOTE: You should never push `config_secrets.json` or share its contents in a public space (i.e., beyond our team)!
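The override and refresh behavior described above can be illustrated roughly as follows. This is a sketch only and does not reproduce ARAX's actual config-loading code: the function names are hypothetical, and it assumes the downloaded `config_secrets.json` sits in the same directory as the local override.

```python
import os
import time

REFRESH_SECONDS = 24 * 60 * 60  # the 24-hour auto-refresh interval noted above


def choose_secrets_file(code_dir):
    """Prefer config_secrets_local.json whenever it exists."""
    local_path = os.path.join(code_dir, "config_secrets_local.json")
    # Assumption: the regular file is stored alongside the local override
    regular_path = os.path.join(code_dir, "config_secrets.json")
    return local_path if os.path.exists(local_path) else regular_path


def needs_refresh(path, now=None):
    """Return True if the file is missing or older than the refresh interval."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > REFRESH_SECONDS
```

One consequence of the override rule is that a stale `config_secrets_local.json` silently wins over a freshly downloaded file, so it is worth deleting the local copy once it is no longer needed.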
The ARAX database config file lives in the RTX repo at `RTX/code/config_dbs.json`. This file specifies which versions of our various databases should be used. The `ARAXDatabaseManager` automatically takes care of downloading/removing databases from developers' machines as needed, according to what is specified in `config_dbs.json`.
- `production` and `itrb-test` should not be committed to, save for ITRB-specific changes
- `master` is to be merged into `production` and/or `itrb-test`, not the other way around
To merge `master` into `mybranch` (replace with your own branch name), do the following:
git checkout master
git pull origin master
git checkout mybranch
git pull origin mybranch
git merge --no-ff origin/master
[if any merge conflicts: fix them and commit]
git push origin mybranch
To merge `mybranch` into `master`, do the following:
WARNING: Be very careful when merging anything into `master`! Be sure your changes are fully tested and always first merge `master` into your branch and test before doing this.
git checkout mybranch
git pull origin mybranch
git checkout master
git pull origin master
git merge --no-ff origin/mybranch
[if any merge conflicts: fix them and commit]
git push origin master
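Both merge recipes above follow the same shape, so they can be scripted. The sketch below is an illustration rather than an existing ARAX tool; the function names are hypothetical. It deliberately stops before the final `git push`, so that merge conflicts can be fixed and tests run by hand first.

```python
import subprocess


def merge_commands(source, target):
    """Return the git commands, in order, for merging `source` into `target`."""
    return [
        ["git", "checkout", source],
        ["git", "pull", "origin", source],
        ["git", "checkout", target],
        ["git", "pull", "origin", target],
        ["git", "merge", "--no-ff", f"origin/{source}"],
    ]


def run_merge(source, target, dry_run=True):
    """Print each command; also execute it unless dry_run is True."""
    for command in merge_commands(source, target):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on failure (for example,
            # a merge conflict), so later steps never run on a bad state
            subprocess.run(command, check=True)
```

For example, `run_merge("master", "mybranch")` prints the first recipe without touching the repository, and passing `dry_run=False` actually runs it.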
See this gist
- Install `gh` via these directions.
- Check out the PR locally: `gh pr checkout <PR number>`
- Edit, check, commit, etc.
- If everything looks good:
  - `git branch` to see what `<branch name>` you are on
  - `git checkout master` to switch to the master branch
  - `git pull origin master` to make sure master is up to date
  - `git checkout <branch name>` to switch back to the PR branch
  - `git merge --no-ff origin/master` to merge master into the PR
  - Fix any merge conflicts
  - `git checkout master` to switch to master
  - `git merge --no-ff origin/<branch name>` to merge the PR to master
- To switch back to master: `git checkout master`
:server change-password
sudo service neo4j stop
sudo rm -rf /var/lib/neo4j/data/dbms
sudo -u neo4j neo4j-admin set-initial-password PASSWORD
sudo service neo4j start
$sudo mysql
>GRANT ALL ON RTXFeedback.* TO "rt"@"localhost" IDENTIFIED BY 'PASSWORD';
If rejected use:
$sudo mysql
>set password for 'rt'@'localhost'='PASSWORD';
Note: The synonymizer should be automatically downloaded into your dev environment upon running the pytest suite (or `ARAX_database_manager.py`). But if you need to build one yourself for some reason, this explains how to do so.
How to build from scratch:
git pull
If your `kg2_node_info.tsv`, `kg2_equivalencies.tsv`, and `kg2_synonyms.json` files are not already up to date (or you haven't created them yet), you should first do:
cd $RTX/code/ARAX/NodeSynonymizer
python3 dump_kg2_node_data.py
(This pulls down a lot of data over the network and takes 10+ minutes depending on network speed)
Then build the NodeSynonymizer database: (WARNING: The build process needs 25GB of free RAM to work!)
cd $RTX/code/ARAX/NodeSynonymizer
python3 sri_node_normalizer.py --build
python3 node_synonymizer.py --build --kg_name=both
python3 node_synonymizer.py --lookup=rickets --kg_name=KG2
NOTE: If during a branch switch/merge/commit you get a complaint about `kg2_node_info.tsv`, `kg2_equivalencies.tsv`, or `kg2_synonyms.json` being untracked files that would be overwritten, it is safe to delete them. After building the new NodeSynonymizer database, you will not need those files around any more.