investigate why BTE doesn't retrieve variant-disease relations from clinvar #548
It looks like the SmartAPI annotation for myvariant.info clinvar uses the OMIM ID to identify the disease. If you look at the myvariant.info link you sent, the diseases lack an OMIM ID but do have other ID types (e.g., mondo) that are not used by the SmartAPI annotation.
{
"accession":"RCV000442563",
"clinical_significance":"Likely pathogenic",
"conditions":{
"identifiers":{
"human_phenotype_ontology":"HP:0007474",
"medgen":"C0025202",
"mesh":"D008545",
"mondo":"MONDO:0005105"
},
"name":"Melanoma"
},
...
}
Meanwhile, if you test other clinvar relations, they seem to work on BTE (e.g., the example DBSNP:rs1193171808 -> OMIM:615592 given in the SmartAPI annotation).
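To make the diagnosis concrete, here is a minimal sketch of the mismatch. The helper name `usable_identifiers` and the sets of annotated fields are assumptions for illustration only; they are not part of BTE or MyVariant. The record is the RCV000442563 excerpt quoted above.

```python
# Hypothetical helper: the names here are illustrative, not BTE or MyVariant code.

# Condition block copied from the RCV000442563 excerpt quoted in this comment.
record = {
    "accession": "RCV000442563",
    "clinical_significance": "Likely pathogenic",
    "conditions": {
        "identifiers": {
            "human_phenotype_ontology": "HP:0007474",
            "medgen": "C0025202",
            "mesh": "D008545",
            "mondo": "MONDO:0005105",
        },
        "name": "Melanoma",
    },
}


def usable_identifiers(rcv, annotated_fields):
    """Return the condition identifiers whose field name is in annotated_fields."""
    identifiers = rcv.get("conditions", {}).get("identifiers", {})
    return {
        field: value
        for field, value in identifiers.items()
        if field in annotated_fields
    }


# Assumption: the annotation only knows about the omim field.
omim_only = usable_identifiers(record, {"omim"})
# Assumption: a wider annotation that also covers mondo.
with_mondo = usable_identifiers(record, {"omim", "mondo"})

print(omim_only)
print(with_mondo)
```

With only `omim` annotated, nothing in this record can be used, which would explain why the melanoma relation never comes back even though the record exists.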
Thanks @rjawesome for this careful diagnosis. Makes sense! So in addition to the OMIM mapping in the SmartAPI annotation in the
It seems mesh and mondo are not indexed by MyVariant, so I don't know whether they are queryable. For now, I have made a pull request to add HPO.
This can be partially addressed by adding more x-bte annotations (plus indexing fields if needed). However, this kind of "multiple prefixes/namespaces" issue is related to #656.
Notes on the current situation
Added and deployed orphanet / hp operations. All operations passed manual testing, including ...
However, this didn't address the original issue, because MyVariant's clinvar rcv entries for DBSNP:rs121913377 seem to use an HPO ID for melanoma that is wrong or outdated:
more examples of strange IDs (HPO, Orphanet, MedGen)
I noticed multiple kinds of clinvar rcv disease IDs that seemed to be wrong / outdated, but I haven't checked clinvar or OLS to see if the IDs are also wrong there (vs something going on in MyVariant parsing?). HPO:
Orphanet:
MedGen:
Another issue is that this set of operations (omim, orphanet, hp) only covers 48% of the dataset (1038239 / 2162597).
Possible next steps
The mondo/mesh namespaces have now been indexed (biothings/myvariant.info#175 (comment)), and I added x-bte operations to cover them (NCATS-Tangerine/translator-api-registry@d4228a7). Now:
Original query and current response
Send a POST request to the API-specific endpoint (MyVariant only), such as http://localhost:3000/v1/smartapi/09c8782d9f4027712e65b95424adba79/query.
The response will have an edge from clinvar connecting BRAF V600E to melanoma (MONDO:0005105).
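For anyone reproducing this, the request can be sketched roughly as follows. The query graph here is an assumption (a one-hop variant-to-disease TRAPI query), not the exact query attached to this issue, and the helper names `build_query` and `post_query` are hypothetical. The endpoint URL is the local one mentioned above.

```python
import json
import urllib.request

# URL copied from this comment: a local BTE instance, MyVariant-only endpoint.
ENDPOINT = "http://localhost:3000/v1/smartapi/09c8782d9f4027712e65b95424adba79/query"


def build_query(variant_curie):
    """Build a one-hop TRAPI query graph: variant -> disease (assumed shape)."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {
                        "ids": [variant_curie],
                        "categories": ["biolink:SequenceVariant"],
                    },
                    "n1": {"categories": ["biolink:Disease"]},
                },
                "edges": {"e0": {"subject": "n0", "object": "n1"}},
            }
        }
    }


def post_query(payload, url=ENDPOINT):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_query("DBSNP:rs121913377")
# Uncomment when a BTE instance is running locally:
# response = post_query(payload)
```

The network call is left commented out so the snippet can be run without a server; only `build_query` executes.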
The last thing to do before closing this issue is to investigate the odd IDs (from the previous post).
On MyVariant clinvar data's melanoma identifiers: I think the identifier set is the same between records (variants)
I was concerned with the HP ID
What I found
All the other "odd" HP IDs I saw in MyVariant's clinvar data are also alternative IDs
MyVariant is using:
I wonder if MyVariant can map these alternative IDs to their proper/main IDs, and use the proper/main IDs instead...
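A rough sketch of what that mapping could look like on the parser side. This is only an illustration of the idea, not MyVariant's parser: the function names are hypothetical, and the HP IDs in the table are made-up placeholders rather than real HPO terms. In practice the table would presumably be built from the ontology's `alt_id` annotations.

```python
# Assumption: this table would be generated from the ontology's alt_id entries.
# The HP IDs below are made-up placeholders, not real HPO terms.
ALT_TO_PRIMARY = {
    "HP:9999901": "HP:9999900",
}


def to_primary(curie, alt_to_primary=ALT_TO_PRIMARY):
    """Return the primary ID for an alternative ID, or the ID itself if unmapped."""
    return alt_to_primary.get(curie, curie)


def normalize_identifiers(identifiers, alt_to_primary=ALT_TO_PRIMARY):
    """Apply to_primary to every value of a conditions.identifiers-style dict."""
    return {
        field: to_primary(value, alt_to_primary)
        for field, value in identifiers.items()
    }


normalized = normalize_identifiers(
    {"human_phenotype_ontology": "HP:9999901", "mondo": "MONDO:0005105"}
)
print(normalized)
```

IDs without an entry in the table pass through unchanged, so namespaces that have no alternative-ID table would not be affected.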
Regarding the "odd" orphanet IDs I saw in MyVariant's clinvar data: they probably come from the original clinvar data. But I wonder if it's possible to keep more than one ID for a namespace, in cases where the clinvar data may provide multiple (see Example 1, where clinvar probably provides 2 IDs and one is correct).
Examples
Example 1: MyVariant is using
Example 2: MyVariant is using
Finally, on the "odd" medgen IDs I saw in MyVariant's clinvar data: they probably come from the original clinvar data. BTE isn't using the medgen namespace because Translator doesn't seem to support it yet (biolink-model, node norm). But I wonder:
Example 1: medgen CN517202 for "not provided"
MyVariant is using medgen
Example 2: MyVariant is using medgen C0005283 for beta Thalassemia (BTHAL)
I found the same situation as above with the melanoma medgen ID. The MyVariant record for rs1847557333 and beta Thalassemia (BTHAL) matches this RCV record, which uses the same MedGen ID. So maybe the original clinvar data file is also using this ID. But it's still confusing, because that MedGen ID's page says C0005283 is the UMLS concept ID, vs the MedGen UID: 2611.
So to summarize my ideas after the MyVariant clinvar disease ID analyses I did (above posts)... I suspect the "odd" IDs are coming from the original clinvar ingest data. But I wonder if there are parser changes that could help:
Closing this issue because the original problem has been addressed with mondo/mesh namespace coverage. As for the "odd MyVariant clinvar disease IDs" (summary in previous post):
Clinvar contains relationships between genetic variants and diseases (e.g., BRAF V600E -> melanoma), and that relationship appears to be captured in myvariant.info (e.g., http://myvariant.info/v1/variant/rs121913377). But I can't get this relationship via BTE when querying using any of these identifiers:
Note that the DBSNP query gets results based on CIViC and DisGeNET, but not clinvar. It appears that the clinvar fields are captured in the myvariant.info SmartAPI annotation, but I can't quite figure out why those results aren't being returned.
TRAPI Query template