bug: BTE misparsing Automat responses #776

Closed
colleenXu opened this issue Jan 18, 2024 · 3 comments
Labels
bug Something isn't working On CI Related changes are deployed to CI server

colleenXu commented Jan 18, 2024

While working on #771, I noticed results where the Automat edges seemed almost identical.

After digging into this, I suspect that some of these edges are erroneously created by BTE and don't actually exist.

Example 1: imatinib-KIT from Automat Pharos

If we query BTE for edges between imatinib (PUBCHEM.COMPOUND:5291) and KIT (NCBIGene:3815), we get 5 edges from Automat Pharos (imatinib-KIT.json):

  • 3 have the same pKd of 7.9 and 4 PMIDs in their edge attributes:
    • ef352ea5b97b591391cdc453c24ef097: related_to
    • dbac89955b6e2a343ad9a93ae4918d26: physically_interacts_with
    • b646f32eafa1a151eea892b5a0cf7765: binds
  • 2 have the same qualifier set (causes decreased activity), pKd of 7.89, and 2 PMIDs:
    • 780d2bfeac965695fb923f35ff1c0bb2: related_to
    • 000974f0ffe73e3a28b3f6297d5bd3c3: affects

By contrast, when I query Automat Pharos directly for edges between imatinib (PUBCHEM.COMPOUND:5291) and KIT (NCBIGene:3815), I get only 2 edges (automat-pharos-1.json):

  • binds (pKd 7.9 + 4 PMIDs)
  • affects (with the qualifier set, pKd 7.89, 2 PMIDs)
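One way to spot this pattern in a larger response is to group edges by everything except the predicate and flag groups that contain more than one edge. The sketch below is only an illustration: it assumes edges shaped like TRAPI knowledge-graph edges (`attributes` and `qualifiers` lists), and the edge IDs and the `affinity` attribute type in the toy data are made up.

```python
from collections import defaultdict


def evidence_signature(edge):
    """Summarize everything on an edge except its predicate, in hashable form."""
    attributes = frozenset(
        (a["attribute_type_id"], repr(a["value"]))
        for a in edge.get("attributes") or []
    )
    qualifiers = frozenset(
        (q["qualifier_type_id"], q["qualifier_value"])
        for q in edge.get("qualifiers") or []
    )
    return (edge["subject"], edge["object"], attributes, qualifiers)


def suspicious_groups(edges):
    """Return groups of (edge_id, predicate) that share one evidence signature."""
    groups = defaultdict(list)
    for edge_id, edge in edges.items():
        groups[evidence_signature(edge)].append((edge_id, edge["predicate"]))
    return [sorted(group) for group in groups.values() if len(group) > 1]


def toy_edge(predicate, value, qualifiers=None):
    """Build a toy imatinib-KIT edge; the 'affinity' attribute type is hypothetical."""
    return {
        "subject": "PUBCHEM.COMPOUND:5291",
        "object": "NCBIGene:3815",
        "predicate": predicate,
        "attributes": [{"attribute_type_id": "affinity", "value": value}],
        "qualifiers": qualifiers or [],
    }


decreased = [{
    "qualifier_type_id": "biolink:object_direction_qualifier",
    "qualifier_value": "decreased",
}]
toy_edges = {
    "e1": toy_edge("biolink:binds", 7.9),
    "e2": toy_edge("biolink:physically_interacts_with", 7.9),
    "e3": toy_edge("biolink:related_to", 7.9),
    "e4": toy_edge("biolink:affects", 7.89, decreased),
}

for group in suspicious_groups(toy_edges):
    print(group)
```

Edges that land in the same group differ only in predicate, which makes them candidates for comparison against the KP's direct response.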

Example 2: imatinib-PDGFRA from Automat DrugCentral

If we query BTE for edges between imatinib (PUBCHEM.COMPOUND:5291) and PDGFRA (NCBIGene:5156; as a Gene and a Protein), we get a similar situation to Example 1: 5 edges from Automat DrugCentral (imatinib-PDGFRA.json):

  • 3 have the same Kd of 7.51 in their edge attributes:
    • f6f4696df6e39957f4cc8a51a1f4e737: related_to
    • 643a722004d529d83ecab7dc2df7031d: physically_interacts_with
    • 547c79b19227f6637110140877059ed3: binds
  • 2 have the same qualifier set (causes decreased activity) and IC50 of 7.3 in their edge attributes:
    • e42d078b7fa828c5ea7e531556289833: related_to
    • 1cec3bbf58fe9ca53a48016975785eaa: affects

By contrast, when I query Automat DrugCentral directly for edges between imatinib (PUBCHEM.COMPOUND:5291) and PDGFRA (UniProtKB:P16234, as a Protein), I get only 2 edges (automat-drugcentral-1.json):

  • binds (Kd of 7.51)
  • affects (with the qualifier set, IC50 of 7.3)

Note: this example also has 2 almost-identical Automat Pharos edges (related_to and affects, both with the same qualifier set, pIC50 of 8.7, and 3 PMIDs). The affects edge is likely the real one.


I think this is what is happening:

  • When BTE queries Automat KPs following a MetaEdge's info, the Automat KPs return edges whose predicate is a descendant of the queried predicate (not only exact matches)
  • But BTE then assigns the record's predicate from the MetaEdge's info, creating an Edge with the wrong predicate (one that doesn't actually exist in the Automat resource)
  • So a possible solution could be to drop a TRAPI KP record when its predicate/qualifier-set doesn't match that of the MetaEdge?
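The proposed filter could look roughly like the sketch below. This is not BTE's actual code or the fix that was eventually deployed; the `record` and `meta_edge` dictionaries and their keys are hypothetical stand-ins chosen for illustration, with qualifiers shaped like TRAPI qualifier objects.

```python
def normalize_qualifiers(qualifiers):
    """Turn a list of qualifier dicts into an order-independent, hashable set."""
    return frozenset(
        (q["qualifier_type_id"], q["qualifier_value"]) for q in qualifiers or []
    )


def matches_meta_edge(record, meta_edge):
    """True only when predicate and qualifier set both match the MetaEdge exactly."""
    same_predicate = record["predicate"] == meta_edge["predicate"]
    same_qualifiers = normalize_qualifiers(
        record.get("qualifiers")
    ) == normalize_qualifiers(meta_edge.get("qualifiers"))
    return same_predicate and same_qualifiers


def keep_exact_matches(records, meta_edge):
    """Drop KP records whose predicate/qualifier set differs from the MetaEdge."""
    return [r for r in records if matches_meta_edge(r, meta_edge)]


# Hypothetical case: the KP answers a related_to query with a descendant (binds).
returned = [{"predicate": "biolink:binds", "qualifiers": []}]

related_to_meta_edge = {"predicate": "biolink:related_to", "qualifiers": []}
binds_meta_edge = {"predicate": "biolink:binds", "qualifiers": []}

print(keep_exact_matches(returned, related_to_meta_edge))  # record dropped
print(keep_exact_matches(returned, binds_meta_edge))       # record kept
```

With a filter like this, the binds record survives only for the binds MetaEdge, so it would never be relabeled as related_to or physically_interacts_with.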

However, there are some oddities. BTE doesn't seem to be creating extra edges...

  • in every case of an Automat KP edge. So there may be some inconsistency or difference in meta_knowledge_graphs between Automat KPs.
    • For example, in both of the cases above, there's an Automat Hetio edge that doesn't show this "duplicating" behavior
  • for qualifier-sets. It's good that BTE isn't doing this! But I'm a bit suspicious that this could happen and just isn't captured by these examples.
    • Ex: there are edges with the qualifier-set "causes decreased activity" but no edges for subsets of that qualifier set ("causes", "causes activity", "decreased activity", etc).

And a note: while I think there are real, separate edges in Automat Pharos vs Hetio vs DrugCentral, these edges are very similar, as if they come from the same underlying sources (ChEMBL?). Dunno if we want to bring that up with the Automat team...

@colleenXu colleenXu added the bug Something isn't working label Jan 18, 2024
@colleenXu
Collaborator Author

@tokebe @andrewsu I imagine that we'd want to get this bug addressed before this upcoming release...

@tokebe tokebe self-assigned this Jan 18, 2024

tokebe commented Jan 18, 2024

Considering this high priority.

@colleenXu colleenXu added the On Dev Related changes are deployed to Dev server label Jan 19, 2024
@tokebe tokebe added On CI Related changes are deployed to CI server and removed On Dev Related changes are deployed to Dev server labels Jan 19, 2024

colleenXu commented Feb 21, 2024

I've confirmed that things work as expected after the Prod deployment. Closing issue.

Follow-up for example 1

POST to https://bte.transltr.io/v1/query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["PUBCHEM.COMPOUND:5291"],
                    "categories":["biolink:ChemicalEntity"],
                    "name": "imatinib"
                },
                "n1": {
                    "ids":["NCBIGene:3815"],
                    "categories":["biolink:Gene"],
                    "name": "KIT"
                }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

I now get only the two Automat-pharos edges that are in the original API (automat-pharos-fixed.json):

  • b646f32eafa1a151eea892b5a0cf7765: binds (pKd of 7.9 and 4 PMIDs)
  • 000974f0ffe73e3a28b3f6297d5bd3c3: affects (qualifier set (causes decreased activity), pKd of 7.89, and 2 PMIDs)
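For anyone repeating this check, a small script along these lines might help. It is a sketch under assumptions: the endpoint and query body come from this comment, but the helper names are made up, and filtering on an edge's `sources[].resource_id` equal to `infores:automat-pharos` assumes the response follows the TRAPI 1.4 edge layout and that this is the ID Automat Pharos reports.

```python
import json
import urllib.request

BTE_URL = "https://bte.transltr.io/v1/query"


def build_query(subject, obj):
    """Build a one-hop TRAPI query graph with no predicate constraint."""
    return {
        "message": {
            "query_graph": {
                "nodes": {"n0": subject, "n1": obj},
                "edges": {"e1": {"subject": "n0", "object": "n1"}},
            }
        }
    }


def post_query(query, url=BTE_URL):
    """POST the query as JSON and return the parsed response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def edges_from_source(response, resource_id):
    """List (edge_id, predicate) for edges that name resource_id in their sources."""
    edges = response["message"]["knowledge_graph"]["edges"]
    return sorted(
        (edge_id, edge["predicate"])
        for edge_id, edge in edges.items()
        if any(s.get("resource_id") == resource_id for s in edge.get("sources") or [])
    )


imatinib = {
    "ids": ["PUBCHEM.COMPOUND:5291"],
    "categories": ["biolink:ChemicalEntity"],
    "name": "imatinib",
}
kit = {"ids": ["NCBIGene:3815"], "categories": ["biolink:Gene"], "name": "KIT"}
query = build_query(imatinib, kit)

# Uncomment to run against the live service (requires network access):
# response = post_query(query)
# for edge_id, predicate in edges_from_source(response, "infores:automat-pharos"):
#     print(edge_id, predicate)
```

Leaving the predicate off the query edge mirrors the request above, so every Automat edge between the two nodes shows up in one response.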

Follow-up for example 2

POST to https://bte.transltr.io/v1/query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["PUBCHEM.COMPOUND:5291"],
                    "categories":["biolink:ChemicalEntity"],
                    "name": "imatinib"
                },
                "n1": {
                    "ids":["NCBIGene:5156"],
                    "categories":["biolink:Gene", "biolink:Protein"],
                    "name": "PDGFRA"
                }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

I now get only the two Automat-drugcentral edges that are in the original API (automat-drugcentral-fixed.json):

  • 547c79b19227f6637110140877059ed3: binds (Kd of 7.51)
  • 1cec3bbf58fe9ca53a48016975785eaa: affects (qualifier set (causes decreased activity) and IC50 of 7.3)
