"treats refactor" aka update BTE and x-bte annotations to latest biolink-model (Spring 2024 Translator feature) #788

Closed
colleenXu opened this issue Feb 29, 2024 · 15 comments
@colleenXu
Collaborator

colleenXu commented Feb 29, 2024

[UPDATED]

We'll update to biolink-model 4.1.6:

  • it has inverses for all new predicates (4.1.4)
  • has some SEMMEDDB mappings we use when generating x-bte annotation for BioThings SEMMEDDB (added back in 4.1.5)
  • removes commas from 1 of the new predicate inverses (4.1.6)

Currently this is due in Dev/CI by 3/8, with coordinated Translator requests to deploy to Test on 3/8. CORRECTION: in the Architecture call on 3/5, Chris Bizon said not to deploy to Test.

What's involved:

  • review the data-modeling changes from 3.5.3 (current version) -> 4.1.6 (latest version), ASAP -> me, DONE
  • if it seems smooth to update, start updating:
    • update the biolink-model version that the biolink-model module uses -> Jackson @tokebe
    • adjust x-bte annotation in branches, and set up overrides pointing to those branches -> me
  • still in the air: communication from Multiomics/Text-Mining. I have told them that any x-bte annotation adjustments need to be done in branches / using overrides.
@colleenXu colleenXu self-assigned this Feb 29, 2024
@colleenXu
Collaborator Author

colleenXu commented Mar 1, 2024

Analysis of changes from biolink-model 3.5.3 -> 4.1.6

TL;DR: Should be a smooth process to update (CC Jackson @tokebe). While we can get the biolink-model module PR ready, I'm not sure if more changes are coming - I provided Sierra with some feedback, particularly on the wording of the new predicates' inverses.

Note: I've been discussing issues with Sierra about biolink 4, which has led to changes and a moving goalpost of what biolink-model version to implement. Currently, we're on 4.1.6.

New chem/drug/treatment ↔️ disease/pheno predicates

  • promotes condition/condition promoted by: mixin, promoting condition's manifestation in the first place (opposite of preventative for condition)
    • note: inverse doesn't act as a mixin (see note below)
  • studied to treat/treated in studies by: report that a research study happened but not enough evidence of actually working
    • noting: inverse's wording was studied for treatment with in 4.1.0-4.1.3, but changed in 4.1.4
  • in clinical trials for/tested by clinical trials of (? asked about this inverse's wording): report clinical trial was performed to test potential of intervention to treat condition
  • in preclinical trials for/tested by preclinical trials of (? asked about this inverse's wording): report pre-clinical study was done to test potential of intervention to treat condition
  • beneficial in models for/models demonstrating benefits for (? asked about this inverse's wording): reports of working treatment (alleviate, prevent, delay symptoms) in a model system for the disease
  • applied to treat/treatment applications from (? asked about this inverse's wording): report observations of human use in real world (not necessarily approved or effective). Can be from self-reporting (faers, aeolus) or off-label use
  • treats or applied or studied to treat/subject of treatment application or study for treatment by: mixin, should be able to use directly. Def - represent sources not clear about "treats", like text-mined statements. Now maps to SEMMEDDB treats.
    • note: inverse doesn't act as a mixin (see note below)

Adjusted chem/drug/treatment ↔️ disease/pheno predicates

  • ameliorates condition/condition ameliorated by: replaces ameliorates/is ameliorated by. Def - can ameliorate symptoms, stable progression, cure condition. "beneficial/therapeutic for condition" (narrower than treats because it doesn't include prevention/reduce risk of future disease).
  • preventative for condition/condition prevented by: replaces prevents/prevented by, with broad-mapping to SEMMEDDB prevents. Def - prophylactic for, preventing condition from manifesting in the first place
  • predisposes to condition/condition predisposed by: replaces predisposes/has predisposing factor, with broad-mapping to SEMMEDDB predisposes. slightly more specific than promotes condition, but basically the same? Def - "increases chances of condition coming to be"
  • exacerbates condition/condition exacerbated by: replaces exacerbates/is exacerbated by. Def - worsens some or all aspects of condition, detrimental for condition. Mapped to SEMMEDDB COMPLICATES
  • treats/treated by: mixin, can still be used directly. Only use with strong supporting evidence (approved for this condition, passed phase 3, in phase 4, established treatment) or in the creative-mode prediction edges.
    • note: inverse doesn't act as a mixin (see note below)
  • contraindicated in (inverse is unchanged: has contraindication): replaces contraindicated for. Def - shouldn't be applied as an intervention in patients with the condition because of the risk of detrimental outcomes. Condition = biological entity because it's more general than disease/pheno - it can include a biological state (like pregnancy) or other meds being taken (like people taking warfarin)

Other

  • "treats-refactor" edge-attributes:
    • clinical approval status replaces FDA approval status, goes with the ClinicalApprovalStatusEnum (which has a hierarchy, replaces FDAApprovalStatusEnum)
    • add max research phase, goes with the MaxResearchPhaseEnum (which has a hierarchy)
  • affects BioThings semmeddb / suppKG x-bte annotation
    • STY:T167 (substance, sbst) now mapped to chemical entity
    • STY:T190 (Anatomical Abnormality, anab) now narrow-mapped to pheno (before it was mapped to disease)
  • splicing and molecular interaction added as possible terms for aspect qualifier (GeneOrGeneProductOrChemicalEntityAspectEnum)
  • infores catalog:
    • deprecated quickgo
    • using new human-goa (human gene-ontology annotations) rather than other infores?
  • transcript, exon, coding sequence became children of biological entity, rather than nucleic acid entity
  • not new: category including "ancestors" of the main class for the node
  • new association types:
    • gene affects chemical association (w/ qualifier use)
    • feature or disease qualifiers to entity mixin (reverse of existing entity to feature or disease qualifiers mixin)
    • phenotypic feature to entity association mixin (reverse of existing entity to phenotypic feature association mixin)
    • phenotypic feature to disease association
  • doesn't affect us:
    • superclass_of is labeled as inverse of subclass_of (subclass stays as the canonical one)
    • severity qualifier, severity value, onset, onset qualifier deprecated
    • adjusted predicate: affects likelihood of/likelihood affected by: replaces affects risk for/risk affected by? Def - object doesn't yet exist, but the actions/execution of subject impacts likelihood that object will come to be.
      • Not for statistical associations/correlations (which is the similar-sounding associated with likelihood of/likelihood associated with)
      • different from affects: the subject has an effect on the object which already exists

@colleenXu
Collaborator Author

And noting other Translator documentation:

@colleenXu
Collaborator Author

colleenXu commented Mar 5, 2024

Noting the changes to biothings semmeddb x-bte annotation specifically (also shared with Translator data-modeling group - Slack link)

Not treats-refactor:

  • STY:T167 (substance, sbst) now mapped to chemical entity: before we didn't expose info with this semantic type because there was no mapping to biolink-model category. Now we can expose some of this info
  • STY:T190 (Anatomical Abnormality, anab) now narrow-mapped to pheno (before it was mapped to disease)

Treats-refactor:

  • SEMMEDDB:PREVENTS now maps to preventative for condition/condition prevented by (old predicates were removed/replaced: prevents/prevented by)
  • SEMMEDDB:PREDISPOSES now maps to predisposes to condition/condition predisposed by (old predicates were removed/replaced: predisposes/has predisposing factor)
  • SEMMEDDB:COMPLICATES now maps to exacerbates condition/condition exacerbated by (old predicates were removed/replaced: exacerbates/is exacerbated by)
  • SEMMEDDB:TREATS now maps to treats or applied or studied to treat/subject of treatment application or study for treatment by
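
The treats-refactor remappings listed above can be summarized as a lookup table. This is only a minimal Python sketch, not the actual x-bte annotation: the table name `SEMMEDDB_TO_BIOLINK`, the helper `remap`, and the snake_case `biolink:` spellings (derived from the predicate labels above) are assumptions for illustration.

```python
# Hypothetical lookup table: SEMMEDDB predicate -> (biolink predicate, its inverse).
# CURIE spellings are assumed from the labels in this comment (spaces -> underscores).
SEMMEDDB_TO_BIOLINK = {
    "SEMMEDDB:PREVENTS": (
        "biolink:preventative_for_condition",
        "biolink:condition_prevented_by",
    ),
    "SEMMEDDB:PREDISPOSES": (
        "biolink:predisposes_to_condition",
        "biolink:condition_predisposed_by",
    ),
    "SEMMEDDB:COMPLICATES": (
        "biolink:exacerbates_condition",
        "biolink:condition_exacerbated_by",
    ),
    "SEMMEDDB:TREATS": (
        "biolink:treats_or_applied_or_studied_to_treat",
        "biolink:subject_of_treatment_application_or_study_for_treatment_by",
    ),
}


def remap(semmeddb_predicate, flipped=False):
    """Return the biolink predicate for a SEMMEDDB predicate.

    flipped=True returns the inverse, for operations written in the
    object -> subject direction.
    """
    forward, inverse = SEMMEDDB_TO_BIOLINK[semmeddb_predicate]
    return inverse if flipped else forward
```

For example, `remap("SEMMEDDB:COMPLICATES", flipped=True)` gives the inverse predicate for the reverse-direction operation.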

Potential issue:

  • for prevents/predisposes/complicates, semmeddb metatriples may not use chem/drug/treatment subjects or disease/pheno objects - which would cause a mismatch with the biolink predicates that have those domain/range restrictions...
  • previously we tried to address this with the semmeddb review process: "update semmeddb SmartAPI based on domain/range constraint curation" #669

@colleenXu
Collaborator Author

colleenXu commented Mar 7, 2024

Update

PRs to get onto dev/CI this Friday:

The current push is to "update the KPs" with the "treats-refactor"/updated biolink-model. However, there will be another push to "update the ARAs" to use these updated KPs.


Also some issues that have been brought up but don't affect the Friday deployment:

  • we'll want to treat the new mixin predicates as if they aren't mixins because their inverses aren't functional as mixins.
    • there are mixin predicates for the chem -> drug direction (promotes condition, treats, treats or applied or studied to treat). See the visualization (they're purple, and you can see what they map to).
    • But for their inverses:
      • condition promoted by isn't a mixin
      • treated by and subject of treatment application or study for treatment by are set as mixins, but other predicates aren't mapped to them
    • for example: if we flip treats or applied or studied to treat to subject of treatment application or study for treatment by for execution, the predicate isn't a mixin that'll expand to all "treats-related" predicates the same way the canonical predicate will.
  • adjusting the predicate wording

@colleenXu colleenXu changed the title "treats refactor" aka update BTE and x-bte annotations to latest biolink-model as a KP: "treats refactor" aka update BTE and x-bte annotations to latest biolink-model Mar 8, 2024
@colleenXu colleenXu changed the title as a KP: "treats refactor" aka update BTE and x-bte annotations to latest biolink-model "treats refactor" aka update BTE and x-bte annotations to latest biolink-model Mar 8, 2024
@colleenXu colleenXu changed the title "treats refactor" aka update BTE and x-bte annotations to latest biolink-model Translator feature: "treats refactor" aka update BTE and x-bte annotations to latest biolink-model Mar 13, 2024
@colleenXu colleenXu changed the title Translator feature: "treats refactor" aka update BTE and x-bte annotations to latest biolink-model "treats refactor" aka update BTE and x-bte annotations to latest biolink-model (Spring 2024 Translator feature) Mar 13, 2024
@tokebe tokebe added the On CI Related changes are deployed to CI server label Mar 14, 2024
@colleenXu
Collaborator Author

Noting that there's now a "release candidate" for a biolink-model version > 4.1.6. I don't think we need to update to this...

Here's my quick notes on what's new (comparing 4.2.0-rc.2 to 4.1.6):

  • onset qualifier/onset is back: but only for use with the HPO-annotations for disease-phenotype that have onset info
  • semmeddb: changed mappings for
    • SEMMEDDB:ASSOCIATED_WITH (biolink:related_to ➡️ biolink:associated_with)
    • SEMMEDDB:ADMINISTERED_TO (biolink:related_to ➡️ biolink:applied_to_treat)
    • STY:T123/bacs/Biologically Active Substance (SmallMolecule ➡️ ChemicalEntity)
    • STY:T131/hops/Hazardous or Poisonous Substance (SmallMolecule ➡️ ChemicalEntity)

@colleenXu
Collaborator Author

colleenXu commented Mar 22, 2024

Update

The (should-be) last part of this feature is now ready to deploy: updating creative-mode "treats" (MVP1) to use the new predicates.

It was easier than expected because at the beginning of executing a query, BTE will first use the biolink-model to find the descendants of the predicates given. Then when BTE flips the QEdge for execution (Disease ID ➡️ Chem), it flips these predicates to their inverses.

So my concerns above (some inverse predicates not having descendants) ended up not being a problem.
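
Why the order of operations matters can be shown with a toy example. This is a minimal Python sketch under stated assumptions, not BTE's code: the `CHILDREN` and `INVERSE` tables are a simplified, partial stand-in for the biolink-model hierarchy, and the function names are hypothetical.

```python
# Toy, simplified data (NOT the full biolink-model). Children are recorded only
# under the canonical predicates, mirroring the concern described earlier.
CHILDREN = {
    "treats_or_applied_or_studied_to_treat": [
        "treats",
        "applied_to_treat",
        "studied_to_treat",
    ],
    "studied_to_treat": ["in_clinical_trials_for"],
}

INVERSE = {
    "treats_or_applied_or_studied_to_treat":
        "subject_of_treatment_application_or_study_for_treatment_by",
    "treats": "treated_by",
    "applied_to_treat": "treatment_applications_from",
    "studied_to_treat": "treated_in_studies_by",
    "in_clinical_trials_for": "tested_by_clinical_trials_of",
}


def expand(predicate):
    """Return the predicate plus all of its descendants (depth-first)."""
    result = [predicate]
    for child in CHILDREN.get(predicate, []):
        result.extend(expand(child))
    return result


def expand_then_flip(predicate):
    """Expand on the canonical side first, then invert every predicate."""
    return [INVERSE[p] for p in expand(predicate)]


def flip_then_expand(predicate):
    """The problematic order: invert first, then look for descendants."""
    return expand(INVERSE[predicate])
```

With this toy data, `expand_then_flip` keeps all five "treats-related" inverses, while `flip_then_expand` returns only the single inverse mixin because nothing is registered beneath it.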


So the BTE PRs for this feature are now:

Plus there's a Text-Mining Targeted parser/API update that should be deployed concurrently w/ the BTE PRs.


And WE ARE WAITING AND WON'T MERGE the SmartAPI yaml PRs until AFTER this feature is deployed to Prod:

After these SmartAPI yaml PRs are merged, overrides to this branch's yamls can be removed from BTE.

@colleenXu
Collaborator Author

Noting:

At the moment, we use one set of operations for the MyChem chembl drug_indications (treatsChembl). We use the predicates in_clinical_trials_for / tested_by_clinical_trials_of based on Matt Brush's request to match what others in the consortium are doing. It seems like the CQS wants to query our tool and retrieve this dataset's info.

However, we could create operations with different predicates, based on different values for max_phase_for_ind.

Andrew and I have discussed this, and we decided not to do anything for now - in case CQS depends on the current setup.
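
For reference, the alternative that was set aside could look roughly like the sketch below. Nothing here is implemented: the function name is hypothetical, and the rule (phase 4 gets `treats`, everything else keeps the current predicate) is an assumption based on the earlier note that `treats` needs strong supporting evidence such as approval or phase 4.

```python
def predicate_for_max_phase(max_phase_for_ind):
    """Pick a predicate from a ChEMBL drug-indication max phase.

    Assumed rule (not a consortium decision): phase 4 -> treats;
    anything lower or unknown -> in_clinical_trials_for (the current predicate).
    """
    if max_phase_for_ind is not None and float(max_phase_for_ind) >= 4:
        return "biolink:treats"
    return "biolink:in_clinical_trials_for"
```

Splitting `treatsChembl` this way would change which edges come back for an `in_clinical_trials_for` query, which is the risk to CQS mentioned above.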

@tokebe tokebe added On Test Related changes are deployed to Test server and removed On CI -> Test labels May 9, 2024
@colleenXu
Collaborator Author

colleenXu commented Jun 12, 2024

I've proposed updating to the latest biolink-model because it fixes a problem with the new "treats" predicates. Here's my analysis of changes from biolink-model 4.1.6 ➡️ 4.2.1 (diff).

Problem (but not for us): incorrect domain/range for treats or applied or studied to treat/subject of treatment application or study for treatment by.
I don't think we do anything with the domain/range specifications. But reasoner-validator probably does - so this is a problem upstream of BTE. I've made a PR and scheduled a Slack message to Translator data-modeling on this.

Matters to us:

  • new "treats" predicates are connected to rest of hierarchy, when they weren't before. This allows ancestor predicates like related_to to include them when expanded.
    • treats or applied or studied to treat/subject of treatment application or study for treatment by
    • treated by (treats was already connected to the overarching mixin)
  • semmeddb: changed mappings - will want to update x-bte annotation.
    • STY:T123/bacs/Biologically Active Substance (SmallMolecule ➡️ ChemicalEntity)
    • STY:T131/hops/Hazardous or Poisonous Substance (SmallMolecule ➡️ ChemicalEntity)

Kinda matters to us right now:

  • onset qualifier (values specified by onset) is back: but only for use with the HPO disease-phenotype annotations that have onset info

Doesn't matter to us right now:

  • adding url as a node property for node urls that aren't curie-expansions

@colleenXu
Collaborator Author

colleenXu commented Jun 12, 2024

This is on-hold until the problem I found above with 4.2.1 is discussed/addressed sufficiently. I've scheduled a Slack message to Translator data-modeling on this.


If we updated to a later biolink-model version (>= 4.2.1), this is what would be needed...

@colleenXu colleenXu added needs discussion and removed On Test Related changes are deployed to Test server labels Jun 14, 2024
@colleenXu
Collaborator Author

colleenXu commented Jun 14, 2024

The BTE code was deployed today to Prod as part of the Octopus release (see BTE PRs listed here). I tested and it's live.

I've done the next steps of merging the SmartAPI yaml PRs and making notes on the override-removal chore. #811 (comment)


However, there's a current issue with Text-Mining Targeted and I've let them know through Translator Slack: the "treats"-related operations are all broken in test/Prod right now.
The BioThings API's test/prod instances haven't been updated...but all BTE instances are now using the newer x-bte annotation (currently the override, but will use master afterwards).

Also, the update to biolink-model >4.2.1 is still in limbo. I've gotten a response from Sierra (Translator Slack link) saying a new biolink-model release will happen early next week.

@colleenXu
Collaborator Author

colleenXu commented Aug 20, 2024

Update

Biolink 4.2.2 has been released, which fixes the incorrect domain/range issue (noted in this comment).

Changes from 4.1.6 ➡️ 4.2.1 have been previously noted and prepped for.

Analysis

Changes from biolink-model 4.2.1 ➡️ 4.2.2 (diff: look at biolink-model.yaml file):

Matters to us

  • Changed namespace prefix from PHARMGKB.CHEMICAL ➡️ PHARMGKB.DRUG
  • two semmeddb types are now not mapped to anything. Going to remove all operations using these types
    • STY:T122 / bmod / Biomedical or Dental Material (was mapped to Device before)
    • STY:T168 / food / Food (was mapped to Food before)
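
Removing the affected operations amounts to a filter over the semmeddb operations. The following is a minimal Python sketch, not the script actually used to generate the x-bte annotation: the operation dictionaries and their `subject_sty`/`object_sty` keys are a hypothetical shape chosen for illustration.

```python
# Semantic types that biolink-model 4.2.2 no longer maps to any category.
UNMAPPED_IN_4_2_2 = {
    "STY:T122",  # bmod / Biomedical or Dental Material (was Device)
    "STY:T168",  # food / Food (was Food)
}


def prune_operations(operations):
    """Drop operations whose subject or object semantic type is unmapped.

    Each operation is assumed to be a dict with "subject_sty" and
    "object_sty" keys (a hypothetical shape, not the real annotation format).
    """
    return [
        op
        for op in operations
        if op["subject_sty"] not in UNMAPPED_IN_4_2_2
        and op["object_sty"] not in UNMAPPED_IN_4_2_2
    ]
```

An operation with a Food subject or a Biomedical or Dental Material object would be dropped, while operations between still-mapped types pass through unchanged.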

Doesn't matter to us right now

  • Added ApprovalStatusEnum, used by highest FDA approval status and drug regulatory status world wide
  • Added terms to qualifier aspect enum: absorption, aggregation, interaction (not sure how to use this), release

@colleenXu
Collaborator Author

colleenXu commented Aug 20, 2024

Prep to update BTE to biolink 4.2.2

UPDATE:
The deployment of these PRs can be tracked in this issue:

Also: Updated PR for x-bte annotation: NCATS-Tangerine/translator-api-registry#151. I think this can be deployed to master (used by all instances) immediately. It shouldn't really break anything upstream (the biggest changes are the two SmallMolecule ➡️ ChemicalEntity changes and removing the Food operations).
-> The other choice is to use an override, which would only be needed for semmeddb.


Note: Previous PRs were merged but then reverted so CI only included Test-patch stuff for Fugu.

@colleenXu colleenXu added the On CI Related changes are deployed to CI server label Aug 28, 2024
@colleenXu
Collaborator Author

I've merged the PR for updated x-bte annotation, and I'm running a refresh of the registry now. 10 min after the registry finishes refreshing, all instances of BTE should be using the new x-bte annotation for semmeddb...

The code PRs were merged + deployed to CI on Friday.

@tokebe tokebe added On CI -> Test and removed On CI Related changes are deployed to CI server labels Sep 3, 2024
@tokebe tokebe added On Test Related changes are deployed to Test server and removed On CI -> Test labels Sep 17, 2024
@tokebe
Member

tokebe commented Oct 21, 2024

@colleenXu All set to close this issue?

@tokebe tokebe closed this as completed Oct 21, 2024
@tokebe tokebe reopened this Oct 21, 2024
@colleenXu colleenXu added On Test -> Prod and removed On Test Related changes are deployed to Test server labels Oct 24, 2024
@colleenXu
Collaborator Author

Yep, can close this issue!
