For Multiomics/Text-Mining APIs, avoid merging records that only differ by their edge.sources contents #783

Closed
colleenXu opened this issue Feb 21, 2024 · 7 comments

@colleenXu
Collaborator

In both #775 and #774 (comment) (see the third bullet in "implementation notes" and the collapsed details), we noticed that BTE will merge records that have different TRAPI-edge-source content. In the x-bte annotation, we use the trapi_sources response-mapping keyword (feature from #617) to tell BTE to ingest/handle this content.

This can lead to undesired behavior, like a TRAPI edge that has two primary knowledge sources - this should cause a TRAPI validation error. See those issues for examples.
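
A minimal sketch of how one might spot such edges in a response's knowledge graph. It assumes each TRAPI edge carries a `sources` list of `resource_id` / `resource_role` entries; the function names are hypothetical and not part of BTE or any TRAPI validator.

```python
def primary_sources(edge: dict) -> list[str]:
    """Return the resource_ids of all primary knowledge sources on one edge."""
    return [
        source["resource_id"]
        for source in edge.get("sources", [])
        if source.get("resource_role") == "primary_knowledge_source"
    ]


def find_edges_with_bad_primary_source(knowledge_graph: dict) -> dict[str, list[str]]:
    """Map edge id -> primary sources, for edges that do not have exactly one."""
    problems = {}
    for edge_id, edge in knowledge_graph.get("edges", {}).items():
        primaries = primary_sources(edge)
        if len(primaries) != 1:
            problems[edge_id] = primaries
    return problems


# Trimmed-down version of a merged edge like the one reported in this issue.
merged_kg = {
    "edges": {
        "b46ae093d111737f77f8c728368c0040": {
            "subject": "PUBCHEM.COMPOUND:5291",
            "predicate": "biolink:physically_interacts_with",
            "object": "NCBIGene:3815",
            "sources": [
                {"resource_id": "infores:ttd",
                 "resource_role": "primary_knowledge_source"},
                {"resource_id": "infores:biothings-multiomics-biggim-drugresponse",
                 "resource_role": "aggregator_knowledge_source",
                 "upstream_resource_ids": ["infores:ttd", "infores:drugcentral"]},
                {"resource_id": "infores:drugcentral",
                 "resource_role": "primary_knowledge_source"},
            ],
        }
    }
}

bad_edges = find_edges_with_bad_primary_source(merged_kg)
```

Here `bad_edges` flags the merged edge because it lists both infores:ttd and infores:drugcentral as primary knowledge sources.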

A fix specific to the Monarch API was implemented in biothings/api-respone-transform.js@3534b23, but it would be nice to implement a solution for all APIs that use trapi_sources. Right now, these are only the BioThings APIs we build in collaboration with other Translator teams: Multiomics and Text-Mining.

Note: If possible, it may be nice to have a solution that could theoretically work with any API (non-BioThings, external), just in case we do post-processing to create TRAPI-source-content with other APIs in the future (similar to, but probably more involved than, what we did with the Monarch API).

@tokebe
Member

tokebe commented Feb 21, 2024

Fix implemented as part of biothings/api-respone-transform.js#63

@colleenXu
Collaborator Author

oh ooops, I didn't realize that the solution for Monarch API actually may work for all APIs?

I guess my next step is finding an example to test, to confirm that it works with another API...

@tokebe
Member

tokebe commented Feb 21, 2024

The fix I implemented in the Monarch PR affects how Record hashes are calculated across the board (i.e. all records now calculate their hash using their source/provenance chain), so if I understand this issue correctly, the fix should cover it completely.
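
To illustrate the idea (this is not the actual code from the PR, which lives in the JS/TS package), here is a rough Python sketch of hashing a record with and without its source chain. The record layout, the `api` field, and the `record_hash` function are assumptions made up for the example.

```python
import hashlib
import json


def record_hash(record: dict, include_sources: bool = True) -> str:
    """Hash the identifying fields of a record, optionally with its source chain."""
    key_fields = {
        "subject": record["subject"],
        "predicate": record["predicate"],
        "object": record["object"],
        "api": record["api"],
    }
    if include_sources:
        # Sort so the same chain listed in a different order hashes identically.
        key_fields["sources"] = sorted(
            (s["resource_id"], s["resource_role"]) for s in record["sources"]
        )
    serialized = json.dumps(key_fields, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def make_record(primary: str) -> dict:
    """Build a hypothetical imatinib -> KIT record with the given primary source."""
    return {
        "subject": "PUBCHEM.COMPOUND:5291",
        "predicate": "biolink:physically_interacts_with",
        "object": "NCBIGene:3815",
        "api": "biggim-drug-response",
        "sources": [
            {"resource_id": primary,
             "resource_role": "primary_knowledge_source"},
            {"resource_id": "infores:biothings-multiomics-biggim-drugresponse",
             "resource_role": "aggregator_knowledge_source"},
        ],
    }


ttd_record = make_record("infores:ttd")
drugcentral_record = make_record("infores:drugcentral")

# Without the source chain, the two records collide and would be merged.
collide = record_hash(ttd_record, False) == record_hash(drugcentral_record, False)
# With the source chain in the hash, they stay separate.
distinct = record_hash(ttd_record) != record_hash(drugcentral_record)
```

In this sketch `collide` and `distinct` are both true, which mirrors the before/after behavior: records that differ only in their sources no longer share a hash.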

@colleenXu added the "On CI" (Related changes are deployed to CI server) label Feb 21, 2024
@colleenXu
Collaborator Author

Jackson and I confirmed that biothings/api-respone-transform.js#63 addresses #775 in a more robust way.

So we can revert biothings/bte_trapi_query_graph_handler#178 as part of #784 (changing that issue to be about removing temporary things we no longer need).


How I tested this

First, in a local install of BTE, check out the main branch of every repo. Remove or comment out the line added in https://github.com/biothings/bte_trapi_query_graph_handler/pull/178/files. (Remember to rebuild if needed to incorporate the change.)

Then send this query to Multiomics biggim-drug-response through BTE: http://localhost:3000/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query

Query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["PUBCHEM.COMPOUND:5291"],
                    "categories":["biolink:ChemicalEntity"],
                    "name": "imatinib"
                },
                "n1": {
                    "categories":["biolink:Gene"]
                }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:physically_interacts_with"]
                }
            }
        }
    }
}
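
For anyone who wants to script this step, a small sketch using only the Python standard library. The helper names `build_query` and `post_query` and the timeout value are assumptions for illustration; the URL is the local BTE endpoint given above.

```python
import json
import urllib.request

# Local BTE endpoint for Multiomics biggim-drug-response (from the step above).
BTE_URL = "http://localhost:3000/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query"


def build_query(chemical_id: str) -> dict:
    """Build the one-hop ChemicalEntity -> Gene query used in this test."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {
                        "ids": [chemical_id],
                        "categories": ["biolink:ChemicalEntity"],
                    },
                    "n1": {"categories": ["biolink:Gene"]},
                },
                "edges": {
                    "e1": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:physically_interacts_with"],
                    }
                },
            }
        }
    }


def post_query(url: str, query: dict, timeout: float = 300.0) -> dict:
    """POST a TRAPI query as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a local BTE running, `post_query(BTE_URL, build_query("PUBCHEM.COMPOUND:5291"))["message"]["knowledge_graph"]["edges"]` should give the edges to inspect.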

The response will have an odd edge to the gene KIT (NCBIGene:3815), with two primary knowledge sources in the sources section: TTD and drugcentral.

odd edge: bug

                "b46ae093d111737f77f8c728368c0040": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "PUBCHEM.COMPOUND:5291",
                    "object": "NCBIGene:3815",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:primary_knowledge_source",
                            "attributes": [
                                {
                                    "attribute_type_id": "biolink:source_infores",
                                    "value": "infores:drugcentral"
                                }
                            ],
                            "value": "infores:drugcentral"
                        },
                        {
                            "attribute_type_id": "biolink:aggregator_knowledge_source",
                            "value": "infores:biothings-multiomics-biggim-drugresponse"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:ttd",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-multiomics-biggim-drugresponse",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:ttd",
                                "infores:drugcentral"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-multiomics-biggim-drugresponse"
                            ]
                        },
                        {
                            "resource_id": "infores:drugcentral",
                            "resource_role": "primary_knowledge_source"
                        }
                    ]
                },

Then check out the branch for biothings/api-respone-transform.js#63 (remember to build if needed to incorporate the change). Then run the same query again.

The response will change to two separate edges, one for TTD and one for drugcentral:

                "c0dd640042751eec03bf660be88d6075": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "PUBCHEM.COMPOUND:5291",
                    "object": "NCBIGene:3815",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:primary_knowledge_source",
                            "attributes": [
                                {
                                    "attribute_type_id": "biolink:source_infores",
                                    "value": "infores:ttd"
                                }
                            ],
                            "value": "infores:ttd"
                        },
                        {
                            "attribute_type_id": "biolink:aggregator_knowledge_source",
                            "value": "infores:biothings-multiomics-biggim-drugresponse"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:ttd",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-multiomics-biggim-drugresponse",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:ttd"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-multiomics-biggim-drugresponse"
                            ]
                        }
                    ]
                },

                "070cd562c2640dc1f5cd8e4f2050fadf": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "PUBCHEM.COMPOUND:5291",
                    "object": "NCBIGene:3815",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:primary_knowledge_source",
                            "attributes": [
                                {
                                    "attribute_type_id": "biolink:source_infores",
                                    "value": "infores:drugcentral"
                                }
                            ],
                            "value": "infores:drugcentral"
                        },
                        {
                            "attribute_type_id": "biolink:aggregator_knowledge_source",
                            "value": "infores:biothings-multiomics-biggim-drugresponse"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:drugcentral",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-multiomics-biggim-drugresponse",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:drugcentral"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-multiomics-biggim-drugresponse"
                            ]
                        }
                    ]
                },

@colleenXu
Collaborator Author

Also, I think this will only be an issue for Multiomics biggim-drug-response, because that KP uses multiple sources.

In contrast, the other APIs are basically single-source:

  • multiomics ehr risk
  • multiomics wellness
  • multiomics clinicaltrials
  • text-mining targeted

@tokebe added the "On Test" (Related changes are deployed to Test server) label and removed the "On CI" (Related changes are deployed to CI server) label Mar 14, 2024
@tokebe
Member

tokebe commented Apr 17, 2024

@colleenXu Can this issue be closed? Relevant BTE code is now on Prod.

@colleenXu
Collaborator Author

colleenXu commented Apr 18, 2024

Confirmed that it's fixed now: there are two edges when posting the example query to https://bte.transltr.io/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query (Prod instance).

@colleenXu removed the "On Test" (Related changes are deployed to Test server) label Apr 18, 2024