BioThings suppKG: parser, x-bte, adding to BTE #706
Comments
Thanks to @mnarayan1, we have a SmartAPI yaml (https://github.com/NCATS-Tangerine/translator-api-registry/blob/master/suppkg/suppkg.yaml) that covers supplement treatments for disease. We used a templated requestBody to generate a BioThings query structure that we hadn't tried before: setting a field to multiple possible values using OR. I've registered the SmartAPI yaml (https://smart-api.info/registry?q=b48c34df08d16311e3bca06b135b828d), so it's now accessible through any BTE instance using the api-specific endpoints, but it isn't used by the team-specific / ara-specific endpoints yet. |
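For readers unfamiliar with the "field set to multiple values using OR" structure, here is a minimal sketch of how such a query string could be assembled. It assumes the OR is expressed in Lucene-style query-string syntax; the helper name `build_or_query`, the field name `subject.umls`, and the values `ID_A` / `ID_B` are hypothetical placeholders, not taken from suppkg.yaml.

```python
def build_or_query(field, values):
    """Return a Lucene-style query string matching `field` against any of `values`.

    Hypothetical helper for illustration; not part of BTE or the BioThings SDK.
    """
    if not values:
        raise ValueError("at least one value is required")
    # Quote each value so characters such as ':' are treated as literal text.
    quoted = ['"{}"'.format(v.replace('"', '\\"')) for v in values]
    return "{}:({})".format(field, " OR ".join(quoted))


# Placeholder field name and IDs (assumptions, not real suppKG values).
query = build_or_query("subject.umls", ["ID_A", "ID_B"])
print(query)
```

The actual templating in the SmartAPI yaml may differ; the sketch only shows the shape of an OR over one field.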
Here's a TRAPI query for "zinc supplement" -> disease
Response: suppKG1.txt |
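The query body itself did not survive in this thread, so the following is only a sketch of what a one-hop TRAPI query of this shape generally looks like. The function name, the placeholder ID `UMLS:PLACEHOLDER`, and the choice of `biolink:SmallMolecule` and `biolink:Disease` as categories are assumptions for illustration, not the values used in the original query.

```python
import json


def build_one_hop_query(subject_ids, subject_category, object_category):
    """Build a minimal one-hop TRAPI query graph as a plain dict.

    Hypothetical helper; the node and edge keys (n0, n1, e0) are arbitrary labels.
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # Subject node: pinned to the given IDs.
                    "n0": {"ids": list(subject_ids), "categories": [subject_category]},
                    # Object node: unpinned, constrained only by category.
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e0": {"subject": "n0", "object": "n1"},
                },
            }
        }
    }


# Placeholder ID and assumed categories; replace with the real supplement ID.
trapi_query = build_one_hop_query(
    ["UMLS:PLACEHOLDER"], "biolink:SmallMolecule", "biolink:Disease"
)
print(json.dumps(trapi_query, indent=2))
```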
But I still want to discuss the "UMLS:DC" IDs with @andrewsu (previous posts here and here) before moving forward. I'm using an "ulcerative colitis" -> supplement response as my reference: suppkg2.txt, TRAPI query
Analysis
The IDs may be real UMLS IDs if you remove the "D".
The UMLS ID names may match suppKG's associations
The edge for "1,200 mg" (UMLS:DC0016157) is actually about fish oils (UMLS:C0016157), and doesn't mention "1,200 mg"
The edge for "fibersol-2" (UMLS:DC0032594) is actually about polysaccharides (UMLS:C0032594), and not fibersol-2
fibersol-2 is a brand supplement with fiber and maltodextrin, derived from corn. But the edge is actually about two different kinds of polysaccharides:
Other analysis: seems okay to use UMLS ID/name but other things are going on
The edge for "arerra" (UMLS:DC0349374) actually mentions cow milk (UMLS:C0349374), but it turns out "arerra" is an obscure name for the supplement
"arerra" is a synonym for fermented milk
suppKG name + real UMLS name both don't match the paper: entity-resolution issue?
The edge for beesnest plant (UMLS:DC1141640) isn't about bee's nest-plant/wild carrot/Queen Anne's lace. It also isn't about the food carrots (Carrots - dietary; UMLS:C1141640). The paper is about Morinda officinalis aka Indian mulberry.
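The "remove the D" check used in this analysis can be sketched as a small helper for manual spot checks. The function name `candidate_umls_id` and the assumed ID shape (`UMLS:DC` followed by digits) are illustrative assumptions; a matching form does not mean the two concepts are equivalent, as the beesnest plant example shows.

```python
import re

# Assumed shape of a "DC" ID: the UMLS prefix, "DC", then digits only.
_DC_PATTERN = re.compile(r"^UMLS:DC(\d+)$")


def candidate_umls_id(curie):
    """Return the UMLS ID obtained by dropping the 'D', or None if not a DC ID."""
    match = _DC_PATTERN.match(curie)
    if match is None:
        return None
    return "UMLS:C" + match.group(1)


# IDs taken from the examples in this comment.
for dc_id in ["UMLS:DC0016157", "UMLS:DC0032594", "UMLS:DC0349374", "UMLS:DC1141640"]:
    print(dc_id, "->", candidate_umls_id(dc_id))
```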
|
Note that "moving forward" steps would be:
|
It seems like the authors' intent is clear that "DC" IDs are meant to represent concepts for which they find no synonymous UMLS ID. @colleenXu, you've found many examples where it appears that there is a very tight connection between the "DC" ID and the corresponding UMLS ID. However, I don't think we have the time or expertise to be able to evaluate that linking exhaustively. Since the consequence of moving forward as-is is underlinking (rather than inclusion of false assertions, at least beyond the expected rate from a text-mined resource), I think we should go forward with that plan. So please proceed with the next steps you outlined in the preceding comment. Thanks! |
will allow use by team-specific endpoints and ara; related to https://github.com/biothings/biothings_explorer/issues/706#issuecomment-1692883074
After discussion with Andrew (8/29?), we agreed to go forward with the DC IDs. I followed my earlier post of "next steps to deployment":
|
I have another thought on the "DC" terms, but I don't know if @erikyao already investigated this... Based on Yao's url https://github.com/zhang-informatics/SemRep_DS/blob/main/docs/SemRep_full_fielded_output.txt:
So I wonder if we'd want these "DC" terms in different fields of the BioThings SuppKG API. Right now, they're in And I was wondering if we know more about the "DC" terms, which may help us decide if they are a different namespace (and if so, what the prefix and other namespace info would be).
|
After reviewing this again, I think we should move forward with the "quickest path" solution -- keeping the Also just noting for future reference that in the source file, there are 53707 IDs that start with |
Now being addressed by a different commit biothings/bte-server@58177d3. This is now deployed on dev/CI instances. See Jackson's post here |
Closing this issue since the changes have been deployed to Prod with the Feb 2024 release. I've confirmed that I can query BioThings suppKG through BTE prod |
Opening an issue here to better track the status of this effort.
Previous discussion in NCATS-Tangerine/translator-api-registry#122, with the currently-relevant comments starting NCATS-Tangerine/translator-api-registry#122 (comment) and biothings/pending.api#55 (comment)
Currently some concerns related to the data/parser...