Data source: RARe-SOURCE #109

Closed
newgene opened this issue Mar 20, 2023 · 10 comments
@newgene
Member

newgene commented Mar 20, 2023

A gene-disease association knowledge source focused on rare diseases.

https://raresource.nih.gov/

Data files can be downloaded manually via the "export" button on each browsing page.

@newgene newgene added the data source Data source pending to create a new API label Mar 20, 2023
@erikyao
Contributor

erikyao commented Apr 1, 2023

Data plugin repo: https://github.com/biothings/rare_source

API published to https://biothings.ncats.io/rare_source
ITRB CI is not working so it's not shown in the API list on the index page.

@colleenXu

Next steps: create the SmartAPI yaml with x-bte annotation, register it, and add it to the API_LIST (for full ingest). We want this process done by mid-May.

Looks like example TRAPI queries will be Gene <-> Disease...

Discussion from Slack

Chunlei:
Yao deployed a new rare_source knowledge API last week at https://biothings.ncats.io/rare_source. It's organized by genes, and then the associated rare disease info is included as raresource.disease field. A few example queries:

This is a requested source by Tyler from this NIH resource: https://raresource.nih.gov/. If the returned data objects look good to us, the next step is to add a SmartAPI metadata and annotations of its gene-disease association, so it's integrated into BTE.

Andrew:
can you create an issue for the smartAPI annotation and the example TRAPI query? We can prioritize that at our next meeting. Chunlei, if you have thoughts on priority (aside from the fact that Tyler requested it), let us know...

Chunlei:
I also msg'ed Tyler for the priority, I assume it won't be too urgent, unless he has a direct use case depending on it.

Chunlei:
mid-May seems plenty of time for us.
From Tyler (looks like this was a request from Christine originally):
after checking in with Christine, she would like it if we could have the bulk of the RARESource data integrated into the system by mid-May. Would that be doable?

@colleenXu

colleenXu commented May 12, 2023

SmartAPI yaml with x-bte annotation created and registered. Link above is to hook it up with BTE

NOTE 1

Because of the entity-based structure (organized by gene), basic querying doesn't accurately grab the cooccurrence_url when querying from disease -> gene (reverse, related to biothings/biothings_explorer#316). So we don't have that field when querying in this reverse direction.

However, @newgene says there is a new way of advanced querying that can help with this...
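
To illustrate the reverse-direction problem, here is a minimal Python sketch of picking out the co-occurrence URL that belongs to one disease inside a gene-centric record. The record layout is an assumption: only `raresource.disease`, `symbol`, and `cooccurrence_url` are names that appear in this thread, while the `orphanet` key and all values are placeholders. It is not the API's or BTE's actual code.

```python
from typing import Iterator, Tuple

# Hypothetical gene-centric record. Only `raresource.disease`, `symbol` and
# `cooccurrence_url` are field names taken from this thread; the `orphanet`
# key and every value below are placeholders.
GENE_DOC = {
    "symbol": "GENE_A",
    "raresource": {
        "disease": [
            {"orphanet": ["1"], "cooccurrence_url": "https://example.org/GENE_A/d1"},
            {"orphanet": ["2"], "cooccurrence_url": "https://example.org/GENE_A/d2"},
        ]
    },
}


def pairs_for_disease(gene_doc: dict, orphanet_id: str) -> Iterator[Tuple[str, str]]:
    """Yield (gene symbol, cooccurrence_url) for nested records matching the disease."""
    for disease in gene_doc["raresource"]["disease"]:
        # Match per nested record so the URL stays tied to its own disease.
        if orphanet_id in disease.get("orphanet", []):
            yield gene_doc["symbol"], disease["cooccurrence_url"]


if __name__ == "__main__":
    print(list(pairs_for_disease(GENE_DOC, "2")))
```

A plain field lookup on the whole record would see both URLs. The gene-disease pairing only survives if the match is made on each nested disease record.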

EXAMPLES

Responses are TRAPI 1.3.

Example 1: gene -> disease
{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:100"],
                    "categories": ["biolink:Gene"],
                    "name": "ADA"
                },
                "n1": {
                    "categories": ["biolink:Disease"]
                }
            }
        }
    }
}
Example 1 response: edge from RARe-SOURCE with cooccurrence url

EDIT: 2023-05-16: URL is correct now

                "7d4a84de4a253d7d998db33e4c9bd74f": {
                    "predicate": "biolink:gene_associated_with_condition",
                    "subject": "NCBIGene:100",
                    "object": "MONDO:0011338",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:primary_knowledge_source",
                            "value": [
                                "infores:rare-source"
                            ],
                            "value_type_id": "biolink:InformationResource"
                        },
                        {
                            "attribute_type_id": "biolink:aggregator_knowledge_source",
                            "value": [
                                "infores:biothings-explorer",
                                "infores:biothings-rare-source"
                            ],
                            "value_type_id": "biolink:InformationResource"
                        },
                        {
                            "attribute_type_id": "biolink:xref",
                            "value": [
                                "https://raresource.nih.gov/literature/cooccurrence/ADA/0008198"
                            ]
                        }
                    ]
                },
Example 2: disease -> gene (REVERSE)
{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["ORPHANET:110"],
                    "categories": ["biolink:Disease"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                }
            }
        }
    }
}
Example 2 response: edge from RARe-SOURCE WITHOUT cooccurrence url
                "841a0780a978809d06d8850a854c0db0": {
                    "predicate": "biolink:condition_associated_with_gene",
                    "subject": "MONDO:0015229",
                    "object": "NCBIGene:10806",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:primary_knowledge_source",
                            "value": [
                                "infores:rare-source"
                            ],
                            "value_type_id": "biolink:InformationResource"
                        },
                        {
                            "attribute_type_id": "biolink:aggregator_knowledge_source",
                            "value": [
                                "infores:biothings-explorer",
                                "infores:biothings-rare-source"
                            ],
                            "value_type_id": "biolink:InformationResource"
                        }
                    ]
                },
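
Both example queries share the same one-hop shape, so they could be generated by a small helper. The sketch below is a convenience for building the request body only; the function name `one_hop_query` is made up here, and sending the body to a BTE instance is left out because no endpoint URL is given in this thread.

```python
import json


def one_hop_query(subject_id: str, subject_category: str, object_category: str) -> dict:
    """Build a one-hop TRAPI query graph shaped like Examples 1 and 2."""
    return {
        "message": {
            "query_graph": {
                "edges": {"e01": {"subject": "n0", "object": "n1"}},
                "nodes": {
                    # n0 is the pinned input node, n1 the unpinned output node.
                    "n0": {"ids": [subject_id], "categories": [subject_category]},
                    "n1": {"categories": [object_category]},
                },
            }
        }
    }


if __name__ == "__main__":
    # Example 1 (gene -> disease), without the optional "name" field.
    print(json.dumps(one_hop_query("NCBIGene:100", "biolink:Gene", "biolink:Disease"), indent=4))
    # Example 2 (disease -> gene, the reverse direction).
    print(json.dumps(one_hop_query("ORPHANET:110", "biolink:Disease", "biolink:Gene"), indent=4))
```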

@colleenXu

colleenXu commented May 12, 2023

NOTE 2

EDIT 2023-05-16: fixed and the example responses above have been edited.

@newgene @erikyao

There seem to be some errors in the data/parser for the co-occurrence urls.

In the BioThings API entry for the gene ADA, the first hit's co-occurrence url is incorrect. It should have the term "ADA" in the url, aka it should be https://raresource.nih.gov/literature/cooccurrence/ADA/0008198. Instead it is https://raresource.nih.gov/literature/cooccurrence/RMRP/0008198.

I noticed this since I was reviewing/pasting the examples above (this is related to the first example).

@colleenXu

colleenXu commented May 12, 2023

NOTE 3

EDIT 2023-05-22: this note has been addressed

Current limitations -> needs followup with SRI team

Disease IDs

The x-bte annotation doesn't allow BTE to retrieve all 2901 (gene-centric) records, because 55 records lack ORPHANET disease IDs (there may be more records affected but it's hard to tell because of the gene-centric records). ORPHANET was the ID-namespace with the 2nd-best coverage.

In contrast, all records seemed to have GARD Disease IDs, but the SRI Node Normalizer appears to have no support for this ID-namespace. Biolink-model also doesn't mention GARD as an ID-namespace.

Other options are:

  • OMIM - 2774
  • UMLS - 2529
  • MESH - 1450
  • ICD10-CM - 7
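
For a rough sense of scale, the counts above can be turned into coverage fractions. This sketch assumes each count is the number of the 2901 gene-centric records carrying at least one ID in that namespace, and that ORPHANET covers 2901 - 55 = 2846 records. As noted above, the real ORPHANET gap may be larger, so treat the output as an upper bound for that namespace.

```python
TOTAL_RECORDS = 2901

# Counts from this comment; the ORPHANET figure is derived (total minus the
# 55 records known to lack an ORPHANET ID), which is an assumption.
RECORDS_WITH_ID = {
    "ORPHANET": TOTAL_RECORDS - 55,
    "OMIM": 2774,
    "UMLS": 2529,
    "MESH": 1450,
    "ICD10-CM": 7,
}


def coverage(counts: dict, total: int) -> dict:
    """Return the fraction of records covered by each ID namespace."""
    return {namespace: n / total for namespace, n in counts.items()}


if __name__ == "__main__":
    ranked = sorted(coverage(RECORDS_WITH_ID, TOTAL_RECORDS).items(),
                    key=lambda item: item[1], reverse=True)
    for namespace, fraction in ranked:
        print(f"{namespace:10s} {fraction:6.1%}")
```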

infores IDs

The infores for the BioThings API and original resource (RARe-SOURCE) are not in the infores registry tsv yet. PR has been made here biolink/biolink-model#1310

@erikyao
Contributor

erikyao commented May 12, 2023

Hi @colleenXu, NOTE 2 is caused by shallow copying, will fix asap.
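
For readers unfamiliar with this class of bug, here is a minimal Python sketch of one way shallow copying could produce the symptom in NOTE 2 (ADA showing the RMRP URL). It is not the actual parser code: the `literature` and `code` keys, the function names, and the idea that both genes are built from one shared disease template are all assumptions made for illustration.

```python
import copy

BASE_URL = "https://raresource.nih.gov/literature/cooccurrence"


def make_template() -> dict:
    """Hypothetical disease template reused for every associated gene."""
    return {"code": "0008198", "literature": {}}


def build_records(genes, disease, copier) -> dict:
    """Make one record per gene from a shared disease template."""
    records = {}
    for gene in genes:
        record = copier(disease)
        # With a shallow copy the nested "literature" dict is the same object
        # for every gene, so the last gene's URL overwrites the earlier ones.
        record["literature"]["cooccurrence_url"] = f"{BASE_URL}/{gene}/{disease['code']}"
        records[gene] = record
    return records


if __name__ == "__main__":
    buggy = build_records(["ADA", "RMRP"], make_template(), copy.copy)
    fixed = build_records(["ADA", "RMRP"], make_template(), copy.deepcopy)
    print(buggy["ADA"]["literature"]["cooccurrence_url"])  # ends with /RMRP/0008198
    print(fixed["ADA"]["literature"]["cooccurrence_url"])  # ends with /ADA/0008198
```

Swapping `copy.copy` for `copy.deepcopy` (or building a fresh nested dict per gene) gives each record its own `literature` dict.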

@newgene
Member Author

newgene commented May 12, 2023

Thanks @colleenXu. For Note 3, I will pass the feedback to Tyler too, in addition to the SRI team, who might be able to reach out to the GARD team to see if they have ID xrefs available already.

@erikyao
Contributor

erikyao commented May 15, 2023

Hi @colleenXu, NOTE 2 problem fixed. Thanks for pointing it out.

@colleenXu

Updated SmartAPI yaml and registration to support UMLS Disease IDs and Gene names (when this is the output of the operation). NCATS-Tangerine/translator-api-registry@03ca460

This was done after discussion with @andrewsu on Monday, when we discovered that we may be able to use UMLS Disease IDs to cover the gap in record retrieval in Note 3.


Note 4

While testing this, I discovered that SRI Node Normalizer doesn't have much support for some NCBIGene IDs (it doesn't retrieve names or many equivalent IDs). For example, these 4 genes are found in this resource, but their names aren't retrieved by Node Norm prod. This could be because these genes aren't protein-coding or are only loci linked to the diseases (rather than a more-defined genomic entity).

This is why I added support for Gene names for Disease->Gene operations.

Note 5

However, when I tried to add support for Gene names for Gene->Disease operations, all the co-occurrence URLs would appear on every Edge, when they're supposed to show up only on the edge they correspond to (a specific gene-disease pair).

This seems to be unrelated to the addition of the UMLS Disease operations. If I:

  • comment out their refs in /query (so I only have the gene-diseaseOrphanet and its counterpart operation)
  • add symbol to the parameter.fields of gene-diseaseOrphanet
  • add input_name: symbol to the diseaseOrphanet-object x-bte-response-mapping...

then I end up with edges that carry all the co-occurrence URLs. For Example 1, the edge would look like the one below.

It should only have the first URL, not the second (which belongs to a different record/gene-disease pair).

                "7d4a84de4a253d7d998db33e4c9bd74f": {
                    "predicate": "biolink:gene_associated_with_condition",
                    "subject": "NCBIGene:100",
                    "object": "MONDO:0011338",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:xref",
                            "value": [
                                "https://raresource.nih.gov/literature/cooccurrence/ADA/0008198",
                                "https://raresource.nih.gov/literature/cooccurrence/ADA/0005748"
                            ]
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:rare-source",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-rare-source",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:rare-source"
                            ]
                        },
                        {
                            "resource_id": "infores:biothings-explorer",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-rare-source"
                            ]
                        }
                    ]
                },
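
As a way to picture the symptom in Note 5 (not its cause, which this thread does not establish), the sketch below contrasts collecting the URL field across the whole gene record with keeping each URL on its own disease record. The two URLs and MONDO:0011338 come from the edge above; the `mondo` key, the second disease ID, and the function names are placeholders, and this is not BTE's code.

```python
BASE_URL = "https://raresource.nih.gov/literature/cooccurrence"

# Hypothetical record layout: the `mondo` key is a placeholder, and the second
# disease ID is not given in this thread.
ADA_DOC = {
    "symbol": "ADA",
    "raresource": {
        "disease": [
            {"mondo": "MONDO:0011338", "cooccurrence_url": f"{BASE_URL}/ADA/0008198"},
            {"mondo": "MONDO:PLACEHOLDER", "cooccurrence_url": f"{BASE_URL}/ADA/0005748"},
        ]
    },
}


def urls_flattened(doc: dict) -> dict:
    """Symptom: every edge receives every URL found anywhere in the record."""
    diseases = doc["raresource"]["disease"]
    all_urls = [d["cooccurrence_url"] for d in diseases]
    return {d["mondo"]: list(all_urls) for d in diseases}


def urls_paired(doc: dict) -> dict:
    """Expected: each edge keeps only the URL of its own gene-disease pair."""
    return {d["mondo"]: [d["cooccurrence_url"]] for d in doc["raresource"]["disease"]}


if __name__ == "__main__":
    print(urls_flattened(ADA_DOC)["MONDO:0011338"])
    print(urls_paired(ADA_DOC)["MONDO:0011338"])
```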

@colleenXu

colleenXu commented May 22, 2023

Closing because all BTE instances have access to this API now. Will open new issue if

  • there is work regarding Notes 1 (reverses), 4 (NCBIGene IDs), and 5 (bug)
  • there is support for GARD IDs, so we want to update the x-bte annotation here
