civic bot not adding "instance of" statement #121

Open
andrewsu opened this issue Aug 9, 2019 · 1 comment

@andrewsu

andrewsu commented Aug 9, 2019

example: https://www.wikidata.org/wiki/Q61818930

I think this SPARQL query shows all items with CIViC IDs but without "instance of" statements: https://w.wiki/6ww
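
The shortened link does not show the query text, so here is a minimal sketch of what such a query could look like, built as a Python string. It assumes that P3329 is the Wikidata property for the CIViC variant ID and that P31 is "instance of"; both identifiers and the function name are assumptions for illustration, not taken from the linked query.

```python
# Assumption: P3329 is "CIViC variant ID" and P31 is "instance of" on Wikidata.
# Check both property IDs before running this against the live endpoint.
CIVIC_ID_PROP = "P3329"
INSTANCE_OF_PROP = "P31"


def build_missing_instance_of_query(id_prop=CIVIC_ID_PROP, type_prop=INSTANCE_OF_PROP):
    """Return SPARQL selecting items that have a CIViC ID but no 'instance of'."""
    return (
        "SELECT ?item ?civicId WHERE {\n"
        f"  ?item wdt:{id_prop} ?civicId .\n"
        # Keep only items for which no 'instance of' statement exists.
        f"  FILTER NOT EXISTS {{ ?item wdt:{type_prop} ?type . }}\n"
        "}"
    )


if __name__ == "__main__":
    # Paste the printed query into the Wikidata Query Service to try it.
    print(build_missing_instance_of_query())
```

The query only finds items with no "instance of" statement at all; it does not check whether an existing statement points to sequence variant or one of its children.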

I expect every item added or edited by the CIViC variant bot to have an "instance of" statement pointing to sequence variant (Q15304597) or one of its children.

probably need to have this issue fixed before the paper goes out...

@andrawaag

andrawaag commented Aug 9, 2019

This one is already on my plate, and it is a tough one. There are basically three issues here.

  1. Not all of the Sequence Ontology (SO) is in Wikidata yet, simply because it is not available under CC0. This means that if a variant type, as represented in the Sequence Ontology, is not yet in Wikidata, it has to be added manually. My understanding of non-CC0 data is that one cannot batch-upload everything, but adding a reference to a single SO term is allowed.
    Being able to upload all of SO to Wikidata would solve this.

  2. Some variants don't have a specific variant type annotated in CIViC. The example given (https://www.wikidata.org/wiki/Q61818930) is of this type: its CIViC record gives "Variant Type: None specified." Currently, the bot ignores such records. An easy fix here would be to add, as you suggest, "instance of sequence variant (Q15304597)".

  3. Some time ago we raised the quality threshold. When we started, we added all CIViC records; currently, we only add CIViC records with a high-quality indication. This has left items in Wikidata that only mention the CIViC ID and the gene of which they are a variant.

Concluding.

  • I have a list of missing Sequence Ontology terms, and adding them is already on my TODO list. If we don't add all terms, this will be a recurring process every time curators introduce new sequence variants.
  • For those variants that lack a variant type in CIViC, we add the default, "sequence variant". This raises the question of what to do once a more specific variant type emerges. (Delete the default?)
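
The two conclusions above could be sketched roughly as follows. This is not the bot's actual code: the function name, the `so_to_qid` mapping, and the placeholder QID in the usage example are hypothetical. Only Q15304597 (sequence variant) comes from this thread.

```python
DEFAULT_VARIANT_QID = "Q15304597"  # sequence variant, as named in this thread


def choose_instance_of(civic_variant_types, so_to_qid):
    """Pick 'instance of' targets for one CIViC variant record.

    civic_variant_types: list of SO term IDs from CIViC (may be empty or None).
    so_to_qid: hypothetical dict mapping SO term IDs to Wikidata QIDs.
    Returns (qids, missing), where missing lists SO terms not yet in Wikidata.
    """
    qids, missing = [], []
    for so_id in civic_variant_types or []:
        qid = so_to_qid.get(so_id)
        if qid is None:
            missing.append(so_id)  # needs to be added to Wikidata manually
        elif qid not in qids:
            qids.append(qid)
    if not qids:
        # No usable specific type: fall back to the generic default.
        qids = [DEFAULT_VARIANT_QID]
    return qids, missing


if __name__ == "__main__":
    # Placeholder mapping; the QID below is made up for illustration.
    mapping = {"SO:0001583": "Q99999999"}
    print(choose_instance_of([], mapping))
    print(choose_instance_of(["SO:0001583"], mapping))
```

In this sketch the default drops out of the returned list as soon as a specific type can be mapped. Whether the bot should also remove a default statement it wrote to Wikidata earlier is the open question raised above.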
