Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support JQ for API response transformation #489

Closed
newgene opened this issue Aug 16, 2022 · 46 comments
Closed

support JQ for API response transformation #489

newgene opened this issue Aug 16, 2022 · 46 comments
Assignees
Labels
enhancement New feature or request jq / jmespath On Test Related changes are deployed to Test server

Comments

@newgene
Copy link
Member

newgene commented Aug 16, 2022

JMESPath is a query language for JSON, with libraries available in many languages (e.g. javascript and Python).

We can evaluate and consider to use it in api-respone-transform.js sub-module. Potentially remove or simplify the existing hard-coded transformers.

The overall benefits I believe this new feature can bring us:

  • Allow to standardize the API response transformation in a well-defined language, and we can then put it into the SmartAPI metadata, instead of hard-coded API-specific code within the module.
  • Simplify the complexity of the api-respone-transform.js sub-module, since the many transformation logic will be handled by JMESPath module.
  • Support more complex transformation logic if needed.
@newgene newgene added the enhancement New feature or request label Aug 16, 2022
@newgene
Copy link
Member Author

newgene commented Aug 16, 2022

jq can be another option, but jq module is more commandline-focused, while JMESPath is more suitable to be used in our modules.

@colleenXu
Copy link
Collaborator

colleenXu commented Aug 16, 2022

how does this relate to the templating work (biothings/call-apis.js#30, biothings/call-apis.js#26, biothings/call-apis.js#31)?

Also is there any concern about SmartAPI x-bte annotation being complex / very code-like @andrewsu ?

@rjawesome

This comment was marked as resolved.

@ericz1803

This comment was marked as resolved.

@ericz1803 ericz1803 assigned rjawesome and unassigned ericz1803 Aug 30, 2022
@rjawesome
Copy link
Contributor

I started work by trying it on one transformer on the use-jmes-path branch of api-response-transform. For now I'm trying JQ because it seems jmes path is missing some functions like string splitting.

@rjawesome
Copy link
Contributor

I created some jq transformers for EBI, CTD, and OpenTarget. Is there any test queries I can use for any of these?

@andrewsu
Copy link
Member

Next steps in this issue:

  • @newgene to reach out to @rjawesome to discuss jq vs JMESPath
  • finish migrating the rest of the transformers

Future issues:

  • move jq code to SmartAPI metadata
  • update documentation describing necessary formats
  • update BTE to consume jq annotation

@colleenXu
Copy link
Collaborator

Replying to Rohan's comment:

  • I think I removed CTD from BTE's list of APIs a while back due to an outdated SmartAPI yaml.
  • I removed OpenTargets from BTE's list of APIs more recently because that API had been removed and replaced with a diff query-interface.

I think the EBI transformer is for EBI Proteins API? So the test-query I have for that API is this (from the comments in the x-bte section):

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["UniProtKB:Q93099", "UniProtKB:Q9Y6P5"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:MolecularActivity"]
                }
            }
        }
    }
}

@colleenXu
Copy link
Collaborator

JQ draft PR biothings/api-respone-transform.js#38

@colleenXu colleenXu changed the title support JMESPath for API response transformation support JQ for API response transformation Mar 22, 2023
@colleenXu
Copy link
Collaborator

colleenXu commented Mar 22, 2023

Notes from 2023-03-22 group meeting:

This issue is related to post-query processing, which is especially important for BTE correctly handling APIs "in the wild" / external APIs (specifically this means non-TRAPI / non-BioThings).

What is in the JQ PR?

  1. replaced some custom js with jq processing. Some jq processing can stay in this repo as reusable functions for SmartAPI yamls
  2. some jq processing can be taken out and moved into the individual SmartAPI yamls
  3. custom js code is still the fallback for post-query processing

Agreement: let’s move forward with getting the first JQ PR biothings/api-respone-transform.js#38 incorporated / deployed.


Plan (sequential):

  • JC to check PR: actually uses JQ processing / update by merging latest main
  • CX to test existing JQ processing (code)
    • biolink/monarch
    • Ebi proteins
    • CTD
    • Semmeddb / biothings?? Ask Rohan for clarification
  • JC deploy to dev
  • Coordinate:

Didn’t have time to review use-cases and ask whether JQ processing can handle it. That’s fine, since we’ve decided to incorporate it anyways

@colleenXu
Copy link
Collaborator

colleenXu commented Sep 13, 2023

Notes from today's group meeting

Design decision / plan: Shouldn't eliminate any existing feature, instead adds choice

API-post-processing situations

  • JS in BTE: CW,JC seem fairly sure JS is more performant than JQ, so we want to keep using JS. Let's keep this for internal APIs (BioThings), very common transforms, or complex transforms
  • JQ in BTE: lots of this in the current JQ PRs, as proof-of-concept. Not sure how much we want to actually use this - maybe only for "commonly used" JQ stuff?
  • JQ in SmartAPI yaml: Use JQ in SmartAPI yaml when possible for external APIs. It's particuarly good for simple transforms (parsing single fields?) and api-specific transforms
    • allows those APIs' teams to help build/maintain post-processing "code", rather than have our team do this
    • ideally more transparent in this way

If we notice stuff used in a lot of post-processing, we can keep it in the BTE module. We can also improve the Base Transformer class (would be a separate issue)

→ next steps are JC and CX working to review the current PRs to see what to "not use" (use JS instead), what to keep as "JQ in BTE", and what to move to SmartAPI yaml (see post-processing situations above)

@colleenXu
Copy link
Collaborator

colleenXu commented Sep 13, 2023

Notes from Jackson and me's meeting today

Jackson will fix the typo "jq_transfomer" -> file name and all refs (and check smartapi-kg.js for this typo)

We discussed what APIs are covered by the current JQ work, and "where is extra JQ functionality (not covered by JS in main branch)"? And what to do next?

BioThings:

➡️ agreed to move any extra functionality to JS. Want to use JS instead for performance reasons and because many of our APIs are BioThings (internal).

9/18: @tokebe suggested keeping this JQ. So next is checking that the JQ stuff, including this "extra functionality", works. Dunno about moving to SmartAPI yaml - will discuss later

EBI Proteins

added extra functionality in JQ string, described here and here

➡️ check that this JQ stuff works, and move to SmartAPI yaml.

CTD

➡️ check that this JQ stuff works, and move to SmartAPI yaml.

Biolink/monarch

Didn't notice any extra functionality (note that JQ is being used)

➡️ check that this JQ stuff works, and move to SmartAPI yaml.

Overall

  • If we want to keep JQ-processing, does this work okay with the publication refactor?
  • There's other external APIs, not sure if they're covered by current JQ work or if we will want JQ stuff for them in the future:
    • litvar
    • quickgo
    • ols

@colleenXu
Copy link
Collaborator

(my report part 1, pasted from Slack DMs with Jackson)

I highlighted one bug with CTD for @tokebe to look into...

Done reviewing "potentially-extra-functionality to JQ" stuff. Didn't find anything more - they were just tagged "we could try seeing if this can be addressed with JQ, once it is deployed 😛 "

Also tested current JQ PR (just the api-response-transform). I couldn't figure out how to successfully test w/ breakpoints, so I'm not 100% sure the JQ code is being used. For EBI Proteins, Biolink/Monarch seemed to work as-intended, YAY.

But for CTD:

  • I think there's a bug when using this operation. Running the testExample: Pathway KEGG.PATHWAY:hsa05323 -> Gene...89 records are retrieved but they're all dropped during edge-management...so no results are returned. I think this operation previously worked...
  • worked as-intended for all other operations
  • worked as-intended for extra functionality: pipe-delimited, and batch-querying

@colleenXu
Copy link
Collaborator

(my report part 2, updated version of what's on Slack)

@tokebe

It also seems like the BioThings APIs extra functionality in JQ isn't working...

To test this, you'll need x-bte annotation that is supposed to use this. You can take the SuppKG yaml, replace the operations (line 581-630) with the stuff below, and set up an override on your local BTE.

SuppKG operations that should use this functionality by putting the input ID in the middle of the query-string

    supplement-treats-disease:
    ## 595,222 records 
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: UMLS
            semantic: SmallMolecule
        requestBodyType: object
        requestBody:
          body: >-
            {"q": {{ queryInputs | wrap('subject.umls:', ' AND predicate:TREATS
            AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)') | dump }}, "scopes": []}
        outputs:
          - id: UMLS
            semantic: Disease
        parameters:
          fields: object.umls,relation.pmid,relation.sentence,subject.name,object.name
          size: 1000
        predicate: treats
        source: "infores:suppkg" # no infores for suppkg yet
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/object"
        # testExamples:
        #   - qInput: "UMLS:C0016277"      ## Fluconazole
        #     oneOutput: "UMLS:C0009324"   ## ulcerative colitis
    disease-treated_by-supplement:
      - supportBatch: true
        useTemplating: true
        inputs:
          - id: UMLS
            semantic: Disease
        requestBodyType: object
        requestBody:
          body: >-
            {"q": {{ queryInputs | wrap('object.umls:', ' AND predicate:TREATS AND (NOT subject.semtypes:(bact OR gngm OR plnt))
            AND subject.semtypes:(aapp OR antb OR bacs OR dsp OR food OR inch OR orch OR phsu OR vita)') | dump }}, "scopes": []}
        outputs:
          - id: UMLS
            semantic: SmallMolecule
        parameters:
          fields: subject.umls,relation,subject.name,object.name
          size: 1000
        predicate: treated_by
        source: "infores:suppkg" # no infores for suppkg yet
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/subject"
        # testExamples:
        #   - qInput: "UMLS:C0009324"      ## ulcerative colitis
        #     oneOutput: "UMLS:DC0023791"   ## (r)-dithiolane-3-pentanoic acid

What I see in my local logs

It looks like BTE isn't parsing the api-response into records successfully...

  bte:call-apis:query using template builder +45s
  bte:call-apis:query {
  bte:call-apis:query   url: 'https://biothings.ncats.io/suppkg/query',
  bte:call-apis:query   params: {
  bte:call-apis:query     with_total: true,
  bte:call-apis:query     fields: 'object.umls,relation.pmid,relation.sentence,subject.name,object.name',
  bte:call-apis:query     size: 1000
  bte:call-apis:query   },
  bte:call-apis:query   data: {
  bte:call-apis:query     q: [
  bte:call-apis:query       'subject.umls:C0016277 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C4082694 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C5164749 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C5164750 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C5164751 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C5164754 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C5164752 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C5164753 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C4266171 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C0481980 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C0483573 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C0362338 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C0362339 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C0801917 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C0362340 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C0941434 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C0550679 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)',
  bte:call-apis:query       'subject.umls:C0800007 AND predicate:TREATS AND object.semtypes:(acab OR anab OR cgab OR comd OR dsyn OR mobd OR neop)'
  bte:call-apis:query     ],
  bte:call-apis:query     scopes: []
  bte:call-apis:query   },
  bte:call-apis:query   method: 'post',
  bte:call-apis:query   timeout: 50000,
  bte:call-apis:query   headers: { 'User-Agent': 'BTE/dev Node/v18.16.1 darwin' }
  bte:call-apis:query } +1ms
  bte:call-apis:query query success, transforming hits->records... +373ms
  bte:api-response-transform:index api name BioThings SuppKG API +45s
  bte:api-response-transform:index api tags: drug,disease,association,chemical,metadata,query,translator,biothings +0ms
  bte:call-apis:query Successful POST https://biothings.ncats.io/suppkg (18 IDs): SmallMolecule > treats > Disease (obtained 0 records, took 372ms) +46ms
  bte:call-apis:query query completes. +0ms
  bte:call-apis:query Total number of records returned for this query is 0 +0ms

@rjawesome
Copy link
Contributor

I'm working on fixing the Biothings extra functionality

@colleenXu
Copy link
Collaborator

colleenXu commented Oct 4, 2023

Current situation, as discussed by Jackson @tokebe and I:

  1. The CTD bug (convo starts here) is a regression in behavior, so we want to fix it. Jackson recently pushed commits that should address this so ➡️ next step is for me to test
  2. the identifier-case / "lower case letters in IDs for some namespaces" situation is separate and just being exposed by this api-response-transform module work. So I'll open a separate issue for this.
some notes

  • we're in agreement that this likely affects all transformers of the api-response-transform module
  • we're in agreement that it'll be easier to have a default (all-caps) and then have a config of IDs that don't follow that. So...I'll need to curate a list of the ID namespaces we currently use that have lower-case letters in their IDs...
  • Jackson pointed out that the rest of BTE seems to be handling identifier-case fine (and I couldn't find any previous issues on this topic). So perhaps this is mostly limited to api-response-transform functionality right now...

  1. Jackson says it's difficult to write and test JQ strings for BTE's implementation. I haven't tried yet, but I agree that it's a learning curve to read JQ strings...what can we do to make this easier for us and future contributors? EDIT: I'm going to make a separate issue on this, so convo can continue there...
    • Jackson says this likely isn't a blocker to getting this set of JQ PRs done and deployed, but it is an issue going forward
    • Jackson suggested a "testing tool" to help with writing and testing

And next steps once the CTD fix is confirmed to work:

  1. update branches in both PRs (Use JQ api-respone-transform.js#38, Processing transformer field on operations smartapi-kg.js#61 to latest main. Check that everything still works?
  2. for external apis, try moving the jq stuff to the smartapi yaml and test that all that works, as discussed previously
  3. then discuss deployment. Particuarly, we'll want to figure out the following:
    • should Biothings handling stay as “jq in bte” and not in the smartapi yamls?
    • how do we deploy the JQ PRs, when we'd like to have "jq in smartapi" for external apis?
      • related: we'll want to update CTD x-bte annotation, to take advantage of batch-querying functionality in the JQ PRs...
My ideas on deployment

We've floated the idea of "smartapi overrides" before

But I wonder, if an API has "JQ in smartapi" and "JQ in BTE", does the "JQ in smartapi" take priority ("JQ in BTE" not used)? That's my interpretation of the logic in the api-response-transform's src/index.ts file...And if so, maybe there could be another deployment method:

  • deploy with "jq in bte" to all instances
  • Once that's done, merge a registry repo PR with "jq in smartapi". This will then be used by the external apis instead of "jq in bte"
  • We'll have a PR to remove the "jq in bte" for these external apis. We can deploy it with other features anytime

@colleenXu
Copy link
Collaborator

colleenXu commented Oct 4, 2023

Deployment discussion with Jackson @tokebe and I today:

I'll do many of the current status / next steps from the previous post

  • BioThings handling: keep as "jq in bte". Having it in smartapi yamls would be annoying (changing in every yaml vs 1 spot)
  • multi-stage deployment:
    • deploy to all instances ("jq in bte" gets used)
      • we may be able to deploy "jq in smartapi" immediately, since it should just be ignored by instances that don't have the JQ PRs deployed yet. we should be able to test this before starting the deployment process
    • then...x-bte changes related to the new functionality can then be deployed
      • CTD batch-querying being one example (changing q strings and supportBatch:true)
      • BioThings extra functionality: errr....it doesn't seem urgent / necessary to change existing annotations to use this
    • then...we can delete stuff to avoid dev confusion (can deploy w/ other features anytime)
      • the "jq in bte" for external apis won't be needed if it's in the smartapi yaml
      • the old javascript transformers could be removed

@colleenXu
Copy link
Collaborator

colleenXu commented Oct 11, 2023

@tokebe

For CTD, it looks like the latest commits fix the KEGG.PATHWAY operation, yay!

But...it looks like a new bug's been introduced for batch-querying post-processing. Now when the test setup + query is run, every starting ID is linked to every output ID which is incorrect.

info

There should be 18 results, according to the desired response in the linked issue.

However, now I'm getting 34 results. And genes that should be only linked to 1 starting ID are instead showing up as linked to two IDs (original API response):

  • NCBIGene:142 (PARP1) should only be linked to D015250 (Aclarubicin; PUBCHEM.COMPOUND:451415)
  • NCBIGene:1080 (CFTR) and NCBIGene:10800 (CYSLTR1) should only be linked to C006303 (acivicin; PUBCHEM.COMPOUND:294641)

@tokebe
Copy link
Member

tokebe commented Oct 11, 2023

Ah, I think I see where this issue is occurring, let me put together a fix.

@tokebe
Copy link
Member

tokebe commented Oct 11, 2023

Additionally, from discussion with @colleenXu regarding the $edge variable, here's an example:

$edge example
{
  query_operation: {
    _params: {
      format: "json",
      inputTermSearchType: "directAssociations",
      inputTerms: "{{ queryInputs | replPrefix('KEGG') }}",
      inputType: "pathway",
      report: "genes_curated",
    },
    _requestBody: undefined,
    _requestBodyType: undefined,
    _supportBatch: false,
    _useTemplating: true,
    _inputSeparator: undefined,
    _templateInputs: undefined,
    _batchSize: undefined,
    _method: "get",
    _pathParams: [
    ],
    _server: "http://ctdbase.org",
    _path: "/tools/batchQuery.go",
    _tags: [
      "translator",
      "ctd",
    ],
    _transformer: {
      wrap_jq: undefined,
      pair_jq: undefined,
    },
  },
  association: {
    input_id: "KEGG.PATHWAY",
    input_type: "Pathway",
    output_id: "NCBIGene",
    output_type: "Gene",
    predicate: "has_participant",
    qualifiers: undefined,
    source: undefined,
    api_name: "CTD API",
    smartapi: {
      id: "0212611d1c670f9107baf00b77f0889a",
      meta: {
        date_created: "2023-03-14T06:06:50.582062+00:00",
        last_updated: "2023-05-22T07:04:00.328952+00:00",
        url: "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/master/CTD/smartapi.yaml",
        username: "colleenXu",
      },
    },
    "x-translator": {
      component: "KP",
      team: [
        "Service Provider",
      ],
      infores: "infores:ctd",
    },
    apiIsPrimaryKnowledgeSource: true,
  },
  response_mapping: {
    has_participant: {
      NCBIGene: "data.GeneID",
      input_name: "data.PathwayName",
    },
  },
  tags: [
    "translator",
    "ctd",
  ],
  reasoner_edge: {
    id: "e1",
    predicate: undefined,
    subject: {
      id: "n1",
      category: [
        "biolink:Pathway",
      ],
      expandedCategories: [
        "Pathway",
      ],
      equivalentIDsUpdated: false,
      curie: [
        "KEGG.PATHWAY:hsa05323",
      ],
      is_set: undefined,
      expanded_curie: {
        "KEGG.PATHWAY:hsa05323": [
          "KEGG.PATHWAY:hsa05323",
        ],
      },
      entity_count: 1,
      held_curie: [
      ],
      held_expanded: {
      },
      constraints: undefined,
      connected_to: {
      },
      equivalentIDs: {
        "KEGG.PATHWAY:hsa05323": {
          primaryID: "KEGG.PATHWAY:hsa05323",
          equivalentIDs: [
            "KEGG.PATHWAY:hsa05323",
          ],
          label: "KEGG.PATHWAY:hsa05323",
          labelAliases: [
            "KEGG.PATHWAY:hsa05323",
          ],
          primaryTypes: [
            "Pathway",
          ],
          semanticTypes: [
            "Pathway",
          ],
          attributes: {
          },
        },
      },
    },
    object: {
      id: "n2",
      category: [
        "biolink:Gene",
      ],
      expandedCategories: [
        "Gene",
      ],
      equivalentIDsUpdated: false,
      curie: undefined,
      is_set: undefined,
      expanded_curie: {
      },
      entity_count: 0,
      held_curie: [
      ],
      held_expanded: {
      },
      constraints: undefined,
      connected_to: {
      },
    },
    expanded_predicates: undefined,
    qualifier_constraints: [
    ],
    reverse: false,
    executed: false,
    logs: [
    ],
    records: [
    ],
  },
  input: {
    queryInputs: "hsa05323",
  },
  input_resolved_identifiers: {
    "KEGG.PATHWAY:hsa05323": {
      primaryID: "KEGG.PATHWAY:hsa05323",
      equivalentIDs: [
        "KEGG.PATHWAY:hsa05323",
      ],
      label: "KEGG.PATHWAY:hsa05323",
      labelAliases: [
        "KEGG.PATHWAY:hsa05323",
      ],
      primaryTypes: [
        "Pathway",
      ],
      semanticTypes: [
        "Pathway",
      ],
      attributes: {
      },
    },
  },
  original_input: {
    "KEGG.PATHWAY:hsa05323": "KEGG.PATHWAY:hsa05323",
  },
  filter: undefined,
}

@tokebe
Copy link
Member

tokebe commented Oct 11, 2023

Just to follow-up: I think this should probably be abstracted into a simpler interface for anybody writing a JQ string in their annotation. If so, that would be a blocking requirement prior to deployment. Any work on such an abstracted interface is pending further discussion between @colleenXu and myself, so I'll handle implementation if we move in that direction.

@tokebe
Copy link
Member

tokebe commented Oct 12, 2023

@colleenXu CTD batch processing should now be fixed, please re-test and let me know if there are any other misbehaviors.

@colleenXu
Copy link
Collaborator

@tokebe @andrewsu @newgene

Hmmm...I'd like to add some details from the discussion Jackson and I had yesterday (related to the 3 previous posts by Jackson).

I have a collection of saved posts from this issue and PRs, on how to actually write JQ stuff for BTE.

One of these posts says that the JQ strings can use variables from BTE's internal representation of stuff ($edge).

Jackson and I discussed this:

  • it seems like "pair" JQ-strings use info from $edge, and "wrap" doesn't
    • "pair" is defined in this post as "match each entry returned by API to it's curie in an object"
    • VS "wrap" which is "process the raw data from the API before the response mapping is applied"
  • $edge is a pretty complicated object, and it's not always clear at a glance what its various fields store (unless a fully-fleshed out instance is pulled out during query-execution using breakpoints)
  • $edge is related to BTEKGOperationObject (I'm not clear on what this is, relative to kinds of Edges summarized here during the vocab refactor)

But...I'm still unsure on whether this should be a blocker to deployment. Could it be saved for a next step/phase of BTE JQ work?

  • how often is $edge info needed?
  • how urgent / necessary is it to have "abstraction" during this phase of JQ work? Maybe it's okay to implement and use this work for a while first, to gather the info needed to build the "abstraction":
    • right now, it's unclear to me what info from $edge is needed for JQ-string-writing
    • there's also what format the $edge info will be in + what format it needs to stay in (in case JQ-work is actually changing the variables in $edge, but I dunno if any of our current code-strings do this?).
  • In the future, do we expect outside devs to write JQ-strings that involve $edge info (VS BTE devs always taking the lead on this)?

@tokebe
Copy link
Member

tokebe commented Oct 13, 2023

  • If we deploy, and then later restructure $edge, you have to go back and fix the jq strings on any annotation that uses it. Personally, I'm against that, but if getting JQ out the door is a higher priority than getting it out the door in a polished state, then I defer to @colleenXu @andrewsu @newgene.
  • I expect $edge to be used in almost any annotation that uses the pair function, because it contains information that is often useful for the logic, as well as giving the edge inputs where an API might not be returning them, or returning them malformed.
  • I'd expect that the intent is for outside devs to occasionally use this, as it's a part of the annotation, which we already expect outside devs to be able to work on. I don't see how a non-abstracted, very dense edge object is desirable, even if BTE-only devs are the ones using it.

@tokebe
Copy link
Member

tokebe commented Oct 13, 2023

From discussion with @andrewsu:

For now, table jq in smartapi -- don't move any jq strings to smartapi and don't advertise it as an option. Move forward with deploying code for jq in BTE (it's fine to deploy the jq in smartapi code, just don't start moving jq to smartapi). We can then come back later and revisit this for usability improvements and move forward with jq in smartapi.

@newgene
Copy link
Member Author

newgene commented Oct 13, 2023

I also agree with that plan. Starting with our own use first before we advertise it. The original motivation of this issue is to reduce the needs of bte's own code changes, mostly for us.

@colleenXu
Copy link
Collaborator

colleenXu commented Oct 17, 2023

@tokebe

I retested CTD and all expected behaviors are working, yay!

I made a post here on whether BTE will need more adjustments for CTD batch-querying support. However, I don't think this is necessarily a barrier to deploying "JQ in BTE".

EDIT: I've double-checked all APIs with "JQ in BTE", and no regression in behavior....all are working as-expected.

I also tested w/ and w/o the smartapi-kg PR, and the basic "JQ in BTE" behavior appeared identical...

@colleenXu
Copy link
Collaborator

Jackson @tokebe and I decided to deploy only biothings/api-respone-transform.js#38 to address this issue

Details from our Slack DM convo:

  • Jackson: "smartapi-kg PR likely won't need significant changes in the future (everything needing changes that I'm aware of for future jq-in-smartapi stuff is in the api-response-transform)"
  • My reasons to put off deploying the smartapi-kg PR right now:
    • If we deploy it now, then "minimal JQ in SmartAPI" is basically silently live. So if we want to deploy "better JQ in SmartAPI" in the future w/ an api-response-transform PR, we have to wait for that future PR to deploy to all instances before we can safely update the yamls (x-bte w/ JQ)
    • VS if we waited to deploy the smartapi-kg PR with the future api-response-transform PR, we can also deploy the updated yamls right away - instances that don't have the PRs yet won't be able to ingest the new fields and will just ignore it
  • Jackson's reply: "You're right. I overlooked that, we should leave out smartapi-kg for now"

@colleenXu colleenXu added the On Dev Related changes are deployed to Dev server label Oct 20, 2023
@colleenXu colleenXu added On CI Related changes are deployed to CI server and removed On Dev Related changes are deployed to Dev server labels Oct 24, 2023
@tokebe tokebe added On Test Related changes are deployed to Test server and removed On CI Related changes are deployed to CI server labels Dec 20, 2023
@tokebe
Copy link
Member

tokebe commented Feb 21, 2024

Relevant changes deployed to Prod.

@tokebe tokebe closed this as completed Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request jq / jmespath On Test Related changes are deployed to Test server
Projects
None yet
Development

No branches or pull requests

6 participants