Refactor metaproteomics aggregation script #27
@aclum - I'd appreciate any early feedback you have on the general approach; feel free to tag anyone else who should have eyes on this. I'm not sure exactly how we'll be able to set the API bearer tokens as environment variables, but that's how I've been doing development (I altered the readme to describe which environment variables we'll need).
You can set both environment variables and secrets in SPIN. For this we could have variables for the client ID and password and use those to get the bearer token from the https://api.microbiomedata.org/docs#/users/login_for_access_token_token_post endpoint. cc @eecavanna @shreddd
Variables like the API URL can also be moved to an environment variable in SPIN.
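To make the suggestion above concrete, here is a minimal sketch of reading the credentials and API URL from the environment and exchanging them for a bearer token. The variable names `NMDC_API_URL`, `NMDC_CLIENT_ID`, and `NMDC_CLIENT_PW` are hypothetical, and the form fields assume an OAuth2-style client-credentials request to `/token`; the script in this PR uses `requests`, while this sketch uses only the standard library so it runs without extra dependencies.

```python
import json
import os
import urllib.parse
import urllib.request


def build_token_request(env=os.environ):
    """Return (url, form_body) for a token request built from environment variables.

    The variable names and form fields are assumptions for illustration.
    """
    base_url = env.get("NMDC_API_URL", "https://api.microbiomedata.org").rstrip("/")
    body = {
        "grant_type": "client_credentials",
        "client_id": env["NMDC_CLIENT_ID"],      # raises KeyError if unset
        "client_secret": env["NMDC_CLIENT_PW"],  # raises KeyError if unset
    }
    return base_url + "/token", body


def get_bearer_token(env=os.environ, opener=urllib.request.urlopen):
    """POST the form body and return the access token from the JSON reply."""
    url, body = build_token_request(env)
    data = urllib.parse.urlencode(body).encode("utf-8")
    request = urllib.request.Request(url, data=data, method="POST")
    with opener(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["access_token"]
```

Passing `env` and `opener` as parameters keeps the function testable without real secrets or network access; in SPIN the defaults would pick up whatever variables and secrets are configured there.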
@aclum. Thanks for the input. I've rewritten to include a call to get a bearer token with the API username and password (set as environment variables). I've also pulled out the functions we can reuse for the classes into an abstract I'll leave this in draft until the next release since it depends on the migrated database and next schema release.
@picowatt I'll let you know when this is ready for review - I'm going to incorporate Alicia's comments and test this after the new release first.
This now generates JSON files that validate via the json:submit endpoint. Unfortunately, I haven't been able to generate a bearer token with the appropriate level of permissions to actually test the submission to the dev data portal. I'll reach out to the runtime crew for help on that front. Permissions aside, I think this is ready for review/merging now.
With my updated permissions (thanks @eecavanna), I checked that the json:submit endpoint is working as expected. I loaded a single metaP's annotations to dev mongo's @aclum - is there a server/data portal issue to make sure the functional searches/ingests are expecting MetaProteomics records in the
No, there is not a corresponding nmdc-server ticket yet; would you please make one, @kheal? @eecavanna @dwinston what is the max payload json:submit can handle? @kheal what is the max length for expected aggregation results?
I wrote the script so that json:submit only submits one workflow's aggregation results at a time to avoid payload issues, though I haven't tested the full lot yet. I can run the whole script locally to write into dev mongo overnight as a test.
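As a rough illustration of the one-workflow-at-a-time approach, the sketch below builds a separate payload per workflow and reports its serialized size, which may help when answering the payload-limit question. The function name `iter_workflow_payloads` and the shape of `records_by_workflow` are hypothetical; only the `functional_annotation_agg` key comes from the script.

```python
import json


def iter_workflow_payloads(records_by_workflow):
    """Yield (workflow_id, payload, size_bytes) for one workflow at a time.

    records_by_workflow: dict mapping a workflow id to its list of
    aggregation records (an assumed shape, for illustration only).
    """
    for workflow_id, records in records_by_workflow.items():
        payload = {"functional_annotation_agg": records}
        # Size of the JSON body as it would be sent over the wire.
        size_bytes = len(json.dumps(payload).encode("utf-8"))
        yield workflow_id, payload, size_bytes
```

Logging `size_bytes` during an overnight test run would show how close the largest workflow comes to whatever limit json:submit turns out to have.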
Associated ticket filed here: microbiomedata/nmdc-server#1468
Moving this back into draft. Testing revealed the first API call to be exceptionally slow; attempting to fix now.
This is a temporary fix
I've implemented a partial fix for this that will likely not work for future subclasses of the
I tested the full run of the
I think this looks reasonable (my comments are minor and can be ignored if they don't apply).
Also - Is there a way to test this code to make sure it works?
def find_anno(self, dos):
rv = requests.post(self.base_url + "/token", data=token_request_body)
token_response = rv.json()
may also want to check rv.status_code
or log it when raising an exception
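A minimal sketch of what that check could look like, assuming `rv` is a `requests`-style response exposing `status_code`, `text`, and `json()`. The helper name `extract_token` is hypothetical, and the `access_token` key is an assumption about the reply format.

```python
import logging

logger = logging.getLogger(__name__)


def extract_token(rv):
    """Return the access token from a requests-style response, or raise with context."""
    if rv.status_code != 200:
        # Log the status and body so a failed login is easy to diagnose.
        logger.error("Token request failed: status=%s body=%s", rv.status_code, rv.text)
        raise RuntimeError(f"Token request failed with status {rv.status_code}")
    token_response = rv.json()
    if "access_token" not in token_response:
        raise RuntimeError("Token response did not contain an access_token")
    return token_response["access_token"]
```

With `requests`, calling `rv.raise_for_status()` before `rv.json()` is a shorter alternative, at the cost of a less specific error message.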
)
json_record_full = {"functional_annotation_agg": json_records}

response = self.submit_json_records(json_record_full)
This will error out with an Exception rather than logging an error and moving to the next record if you don't succeed in the submit call. That may be the correct behavior (since it is likely a bad server connection or similar), but just wanted to make sure.
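For comparison, here is a sketch of the log-and-continue alternative, in case that behavior is preferred over failing fast. The function `submit_each` and its arguments are hypothetical; `submit` stands in for something like `self.submit_json_records` and is assumed to raise on failure.

```python
import logging

logger = logging.getLogger(__name__)


def submit_each(payloads, submit):
    """Submit every (workflow_id, payload) pair; log failures and keep going.

    Returns the list of workflow ids whose submission raised, so the caller
    can retry them or exit non-zero at the end of the run.
    """
    failed = []
    for workflow_id, payload in payloads:
        try:
            submit(payload)
        except Exception:
            # logger.exception records the traceback along with the message.
            logger.exception("Submission failed for workflow %s", workflow_id)
            failed.append(workflow_id)
    return failed
```

If failures are most likely caused by a bad server connection, raising immediately (the current behavior) avoids repeating the same error for every remaining record, so either choice is defensible.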
This PR will refactor the generate_metap_agg.py script to address #26.
Overall, the generate_metap_agg.py has been refactored to
Will not be ready for release until microbiomedata/nmdc-schema#2203 has been merged in (done).