support JQ for API response transformation #489
How does this relate to the templating work (biothings/call-apis.js#30, biothings/call-apis.js#26, biothings/call-apis.js#31)? Also, @andrewsu, is there any concern about SmartAPI x-bte annotations becoming complex / very code-like?
I started by trying it on one transformer in the use-jmes-path branch of api-response-transform. For now I'm trying JQ, because JMESPath seems to be missing some functions, such as string splitting.
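As an illustration of that gap: jq has a built-in `split` for strings, while the JMESPath built-in function list has no string-splitting function. The sketch below is plain JavaScript that computes what a jq filter like `.id | split(":") | .[1]` would return; the field name `id` and the sample CURIE are made up for illustration, not taken from any real transformer.

```javascript
// Hypothetical record; the field name `id` and its value are made up for illustration.
const record = { id: "UniProtKB:P05067" };

// Mirrors the jq filter: .id | split(":") | .[1]
// (JMESPath has no built-in string-splitting function for this step.)
function localId(curie) {
  return curie.split(":")[1];
}

console.log(localId(record.id)); // prints P05067
```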
I created some jq transformers for EBI, CTD, and OpenTarget. Are there any test queries I can use for any of these?
Next steps in this issue:
Future issues:
Replying to Rohan's comment:
I think the EBI transformer is for the EBI Proteins API? If so, the test query I have for that API is this one (from the comments in the x-bte section):
JQ draft PR biothings/api-respone-transform.js#38
Notes from 2023-03-22 group meeting: This issue is related to post-query processing, which is especially important for BTE to correctly handle APIs "in the wild" / external APIs (specifically, non-TRAPI / non-BioThings APIs). What is in the JQ PR?
Agreement: let’s move forward with getting the first JQ PR biothings/api-respone-transform.js#38 incorporated / deployed. Plan (sequential):
We didn’t have time to review use cases and ask whether JQ processing can handle them. That’s fine, since we’ve decided to incorporate it anyway.
Notes from today's group meeting:
Design decision / plan: this shouldn't eliminate any existing feature; instead, it adds choices for API post-processing situations.
If we notice something used in a lot of post-processing, we can keep it in the BTE module. We can also improve the Base Transformer class (that would be a separate issue). → Next steps: JC and CX will review the current PRs to decide what to "not use" (use JS instead), what to keep as "JQ in BTE", and what to move to the SmartAPI yaml (see post-processing situations above).
Notes from Jackson's and my meeting today: Jackson will fix the typo "jq_transfomer" in the file name and all references (and check smartapi-kg.js for this typo). We discussed which APIs are covered by the current JQ work, where there is extra JQ functionality (not covered by the JS in the main branch), and what to do next.
BioThings:
➡️ 9/18: @tokebe suggested keeping this JQ. So the next step is checking that the JQ stuff, including this "extra functionality", works. Not sure yet about moving it to the SmartAPI yaml; will discuss later.
EBI Proteins: added extra functionality in the JQ string, described here and here ➡️ check that this JQ stuff works, and move it to the SmartAPI yaml.
CTD:
➡️ check that this JQ stuff works, and move it to the SmartAPI yaml.
Biolink/monarch: didn't notice any extra functionality (note that JQ is being used) ➡️ check that this JQ stuff works, and move it to the SmartAPI yaml.
Overall:
(my report part 1, pasted from Slack DMs with Jackson) I highlighted one bug with CTD for @tokebe to look into...
(my report part 2, updated version of what's on Slack) It also seems like the BioThings APIs' extra functionality in JQ isn't working... To test this, you'll need an x-bte annotation that is supposed to use it. You can take the SuppKG yaml, replace the operations (lines 581-630) with the stuff below, and set up an override on your local BTE. SuppKG operations that should use this functionality by putting the input ID in the middle of the query string:
What I see in my local logs
It looks like BTE isn't successfully parsing the API response into records...
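One way to picture why "input ID in the middle of the query string" needs extra post-processing: if the API echoes each full query string back on the hits it produced (BioThings batch POST queries do echo the input term in a `query` field; whether that is exactly what happens here is an assumption), each record has to be mapped back to the bare input ID before BTE can pair it. The template, field names, and IDs below are all hypothetical, and this is a sketch of the idea, not the actual BTE/JQ logic.

```javascript
// Hypothetical template: the input ID sits in the middle of the query string.
const TEMPLATE_PREFIX = 'subject.id:"';
const TEMPLATE_SUFFIX = '" AND object.type:Disease';

// Build one query string per input ID, as a templated batch query might send them.
function buildQueries(ids) {
  return ids.map((id) => TEMPLATE_PREFIX + id + TEMPLATE_SUFFIX);
}

// Recover the input ID from an echoed query string by stripping the fixed
// prefix and suffix; returns null if the string does not match the template.
function extractInputId(query) {
  if (!query.startsWith(TEMPLATE_PREFIX) || !query.endsWith(TEMPLATE_SUFFIX)) {
    return null;
  }
  return query.slice(TEMPLATE_PREFIX.length, query.length - TEMPLATE_SUFFIX.length);
}

// Group output IDs under the input ID recovered from each hit's echoed query.
function groupByInput(hits) {
  const byInput = {};
  for (const hit of hits) {
    const inputId = extractInputId(hit.query);
    if (inputId === null) continue; // skip hits that don't match the template
    if (!byInput[inputId]) byInput[inputId] = [];
    byInput[inputId].push(hit.object.id);
  }
  return byInput;
}

// Made-up hits, each echoing the full query string that produced it.
const [q1, q2] = buildQueries(["ID:1", "ID:2"]);
const hits = [
  { query: q1, object: { id: "OUT:a" } },
  { query: q2, object: { id: "OUT:b" } },
  { query: q1, object: { id: "OUT:c" } },
];
console.log(groupByInput(hits));
```

If records come back keyed by the whole query string instead of the bare ID, they can't be matched to the input node, which is one plausible way for parsing into records to fail.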
I'm working on fixing the BioThings extra functionality.
Current situation, as discussed by Jackson (@tokebe) and me:
some notes
And next steps once the CTD fix is confirmed to work:
My ideas on deployment
We've floated the idea of "smartapi overrides" before. But I wonder: if an API has both "JQ in smartapi" and "JQ in BTE", does the "JQ in smartapi" take priority (so "JQ in BTE" isn't used)? That's my interpretation of the logic in the api-response-transform's
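A minimal sketch of the priority order described in the question above, under that interpretation (not confirmed in this thread): JQ supplied through the SmartAPI annotation wins over JQ bundled with BTE, which wins over a hard-coded JS transformer. The `wrap_jq` / `pair_jq` names come from the `_transformer` object in the `$edge` dump in this thread; `selectTransformer`, `BUILTIN_JQ`, and the returned shape are hypothetical.

```javascript
// Hypothetical JQ filters bundled with BTE, keyed by API name (placeholders, not real filters).
const BUILTIN_JQ = {
  "CTD API": { wrap_jq: "<bundled wrap filter>", pair_jq: "<bundled pair filter>" },
};

// Pick the transformation source for one operation, in the assumed priority order:
// JQ from the SmartAPI annotation > JQ bundled in BTE > hard-coded JS transformer.
function selectTransformer(queryOperation, apiName) {
  const fromSmartAPI = queryOperation._transformer || {};
  if (fromSmartAPI.wrap_jq || fromSmartAPI.pair_jq) {
    return { source: "smartapi", jq: fromSmartAPI };
  }
  if (BUILTIN_JQ[apiName]) {
    return { source: "bte", jq: BUILTIN_JQ[apiName] };
  }
  return { source: "js", jq: null };
}

// Same _transformer shape as the CTD $edge dump: no SmartAPI JQ, so bundled JQ is used.
const op = { _transformer: { wrap_jq: undefined, pair_jq: undefined } };
console.log(selectTransformer(op, "CTD API").source); // prints bte
```

This sketch treats the SmartAPI JQ as an all-or-nothing override; mixing (e.g. SmartAPI `wrap_jq` with BTE's `pair_jq`) would be a separate design choice.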
Deployment discussion with Jackson (@tokebe) and me today: I'll handle many of the current-status / next-step items from the previous post.
For CTD, it looks like the latest commits fix the KEGG.PATHWAY operation, yay! But...it looks like a new bug has been introduced in batch-querying post-processing. Now, when the test setup + query is run, every starting ID is linked to every output ID, which is incorrect.
There should be 18 results, according to the desired response in the linked issue. However, now I'm getting 34 results, and genes that should only be linked to 1 starting ID are instead linked to two IDs (original API response):
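The symptom above (every starting ID linked to every output ID) is what a cross product looks like: output rows get paired with the whole batch of inputs instead of the input that produced them. A minimal sketch of the difference, assuming each CTD batch row carries an `Input` field naming the queried term (an assumption; only `GeneID` and `PathwayName` appear in this thread's response mapping). All values are placeholders.

```javascript
// Hypothetical CTD batch rows; the Input field and all values are assumptions for illustration.
const rows = [
  { Input: "hsa05323", GeneID: 100 },
  { Input: "hsa05323", GeneID: 200 },
  { Input: "hsa04010", GeneID: 300 },
];
const inputs = ["hsa05323", "hsa04010"];

// Buggy pattern: pair every output with every input in the batch (cross product).
function pairAll(inputs, rows) {
  return inputs.flatMap((input) => rows.map((row) => [input, row.GeneID]));
}

// Intended pattern: pair each output only with the input that produced it.
function pairByInput(rows) {
  return rows.map((row) => [row.Input, row.GeneID]);
}

console.log(pairAll(inputs, rows).length); // prints 6
console.log(pairByInput(rows).length); // prints 3
```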
Ah, I think I see where this issue is occurring; let me put together a fix.
Additionally, from discussion with @colleenXu regarding the $edge example:
{
query_operation: {
_params: {
format: "json",
inputTermSearchType: "directAssociations",
inputTerms: "{{ queryInputs | replPrefix('KEGG') }}",
inputType: "pathway",
report: "genes_curated",
},
_requestBody: undefined,
_requestBodyType: undefined,
_supportBatch: false,
_useTemplating: true,
_inputSeparator: undefined,
_templateInputs: undefined,
_batchSize: undefined,
_method: "get",
_pathParams: [
],
_server: "http://ctdbase.org",
_path: "/tools/batchQuery.go",
_tags: [
"translator",
"ctd",
],
_transformer: {
wrap_jq: undefined,
pair_jq: undefined,
},
},
association: {
input_id: "KEGG.PATHWAY",
input_type: "Pathway",
output_id: "NCBIGene",
output_type: "Gene",
predicate: "has_participant",
qualifiers: undefined,
source: undefined,
api_name: "CTD API",
smartapi: {
id: "0212611d1c670f9107baf00b77f0889a",
meta: {
date_created: "2023-03-14T06:06:50.582062+00:00",
last_updated: "2023-05-22T07:04:00.328952+00:00",
url: "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/CTD/smartapi.yaml",
username: "colleenXu",
},
},
"x-translator": {
component: "KP",
team: [
"Service Provider",
],
infores: "infores:ctd",
},
apiIsPrimaryKnowledgeSource: true,
},
response_mapping: {
has_participant: {
NCBIGene: "data.GeneID",
input_name: "data.PathwayName",
},
},
tags: [
"translator",
"ctd",
],
reasoner_edge: {
id: "e1",
predicate: undefined,
subject: {
id: "n1",
category: [
"biolink:Pathway",
],
expandedCategories: [
"Pathway",
],
equivalentIDsUpdated: false,
curie: [
"KEGG.PATHWAY:hsa05323",
],
is_set: undefined,
expanded_curie: {
"KEGG.PATHWAY:hsa05323": [
"KEGG.PATHWAY:hsa05323",
],
},
entity_count: 1,
held_curie: [
],
held_expanded: {
},
constraints: undefined,
connected_to: {
},
equivalentIDs: {
"KEGG.PATHWAY:hsa05323": {
primaryID: "KEGG.PATHWAY:hsa05323",
equivalentIDs: [
"KEGG.PATHWAY:hsa05323",
],
label: "KEGG.PATHWAY:hsa05323",
labelAliases: [
"KEGG.PATHWAY:hsa05323",
],
primaryTypes: [
"Pathway",
],
semanticTypes: [
"Pathway",
],
attributes: {
},
},
},
},
object: {
id: "n2",
category: [
"biolink:Gene",
],
expandedCategories: [
"Gene",
],
equivalentIDsUpdated: false,
curie: undefined,
is_set: undefined,
expanded_curie: {
},
entity_count: 0,
held_curie: [
],
held_expanded: {
},
constraints: undefined,
connected_to: {
},
},
expanded_predicates: undefined,
qualifier_constraints: [
],
reverse: false,
executed: false,
logs: [
],
records: [
],
},
input: {
queryInputs: "hsa05323",
},
input_resolved_identifiers: {
"KEGG.PATHWAY:hsa05323": {
primaryID: "KEGG.PATHWAY:hsa05323",
equivalentIDs: [
"KEGG.PATHWAY:hsa05323",
],
label: "KEGG.PATHWAY:hsa05323",
labelAliases: [
"KEGG.PATHWAY:hsa05323",
],
primaryTypes: [
"Pathway",
],
semanticTypes: [
"Pathway",
],
attributes: {
},
},
},
original_input: {
"KEGG.PATHWAY:hsa05323": "KEGG.PATHWAY:hsa05323",
},
filter: undefined,
}
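JQ strings can reference BTE's internal `$edge` object, so its shape matters to annotation authors. Here is a JavaScript sketch reading a few values from a trimmed copy of the object above; the paths are copied from the dump, the jq spelling in each comment assumes `$edge` is bound as a jq variable, and which fields a real JQ string would need is an assumption.

```javascript
// Trimmed copy of the $edge dump above (only the fields read below).
const edge = {
  association: { input_id: "KEGG.PATHWAY", output_id: "NCBIGene", predicate: "has_participant" },
  reasoner_edge: { subject: { curie: ["KEGG.PATHWAY:hsa05323"] } },
  input: { queryInputs: "hsa05323" },
};

// jq: $edge.input.queryInputs
const rawInput = edge.input.queryInputs;
// jq: $edge.association.output_id
const outputPrefix = edge.association.output_id;
// jq: $edge.reasoner_edge.subject.curie[0]
const firstCurie = edge.reasoner_edge.subject.curie[0];

console.log(rawInput, outputPrefix, firstCurie);
// prints: hsa05323 NCBIGene KEGG.PATHWAY:hsa05323
```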
Just to follow up: I think this should probably be abstracted into a simpler interface for anybody writing a JQ string in their annotation. If so, that would be a blocking requirement prior to deployment. Any work on such an abstracted interface is pending further discussion between @colleenXu and myself, so I'll handle implementation if we move in that direction.
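One possible shape for such a simpler interface, purely hypothetical since the thread leaves the design open: derive a small, flat object from `$edge` so annotation authors only see a few stable names. `simplifyEdge` and every key it returns are invented for illustration; the source paths come from the `$edge` dump in this thread.

```javascript
// Hypothetical: flatten the fields a JQ author is most likely to need.
function simplifyEdge(edge) {
  const inputs = edge.input.queryInputs;
  return {
    // Normalize to an array so single and batch queries look the same
    inputIds: Array.isArray(inputs) ? inputs : [inputs],
    inputPrefix: edge.association.input_id,
    outputPrefix: edge.association.output_id,
    predicate: edge.association.predicate,
  };
}

// Fields copied from the CTD $edge dump.
const edge = {
  association: { input_id: "KEGG.PATHWAY", output_id: "NCBIGene", predicate: "has_participant" },
  input: { queryInputs: "hsa05323" },
};
console.log(simplifyEdge(edge));
```

A flat object like this could be bound as a single jq variable, keeping JQ strings stable even if BTE's internal `$edge` layout changes.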
@colleenXu CTD batch processing should now be fixed; please re-test and let me know if there are any other misbehaviors.
Hmmm...I'd like to add some details from the discussion Jackson and I had yesterday (related to the 3 previous posts by Jackson).
I have a collection of saved posts from this issue and its PRs on how to actually write JQ for BTE. One of these posts says that JQ strings can use variables from BTE's internal representation ($edge). Jackson and I discussed this:
But...I'm still unsure whether this should be a blocker to deployment. Could it be saved for a later step/phase of BTE JQ work?
From discussion with @andrewsu: For now, table jq in smartapi; don't move any jq strings to smartapi and don't advertise it as an option. Move forward with deploying the code for jq in BTE (it's fine to deploy the jq-in-smartapi code, just don't start moving jq to smartapi). We can then come back later, revisit this for usability improvements, and move forward with jq in smartapi.
I also agree with that plan: start with our own use before we advertise it. The original motivation of this issue was to reduce the need for code changes in BTE itself, mostly for us.
I retested CTD and all expected behaviors are working, yay!
I made a post here on whether BTE will need more adjustments for CTD batch-querying support. However, I don't think this is necessarily a barrier to deploying "JQ in BTE". EDIT: I've double-checked all APIs with "JQ in BTE" and see no regression in behavior; all are working as expected. I also tested with and without the smartapi-kg PR, and the basic "JQ in BTE" behavior appeared identical...
Jackson (@tokebe) and I decided to deploy only biothings/api-respone-transform.js#38 to address this issue. Details from our Slack DM convo:
Relevant changes deployed to Prod.
JMESPath is a query language for JSON, with libraries available in many languages (e.g. JavaScript and Python).
We could evaluate using it in the api-respone-transform.js sub-module, potentially removing or simplifying the existing hard-coded transformers.
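To illustrate the kind of simplification this proposal is after: instead of a hand-written transformer per API, declarative paths (like the `data.GeneID` entries in the `response_mapping` of the `$edge` example in this thread) can be evaluated generically. The sketch below is a toy dotted-path evaluator, not JMESPath itself; JMESPath adds projections, filters, and functions on top of plain paths. The sample record is made up.

```javascript
// Toy dotted-path lookup; JMESPath itself supports far more (projections, filters, functions).
function getPath(obj, path) {
  return path.split(".").reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

// A declarative mapping in the style of the response_mapping in this thread.
const mapping = { NCBIGene: "data.GeneID", input_name: "data.PathwayName" };

// Apply every path in the mapping to one response record.
function applyMapping(record, mapping) {
  const out = {};
  for (const [field, path] of Object.entries(mapping)) {
    out[field] = getPath(record, path);
  }
  return out;
}

// Hypothetical record shaped to match the mapping's paths.
const record = { data: { GeneID: 100, PathwayName: "Example pathway" } };
console.log(applyMapping(record, mapping));
```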
The overall benefits I believe this new feature can bring us: