Duplicated Ensembl IDs #23
No problem Andrea, thanks for pointing these out!

Looks like some of the others are cases in which a gene has two different Entrez IDs but Ensembl calls it the same gene.
I see. The latter is the same case as #19.

Regarding Fibronectin-like cases, namely proteins with an ENSG* ID, I tried to get them all with the following query:

```sparql
SELECT distinct ?item ?itemLabel
WHERE
{
  ?item wdt:P594 ?ensg .
  ?item wdt:P31|wdt:P279 wd:Q8054 .
  FILTER NOT EXISTS {?item wdt:P31|wdt:P279 wd:Q7187}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

It turned out that 2 proteins (Fibronectin 1 and Myoglobin) have both an ENSG* and an ENSMUS* identifier as Ensembl Gene ID. The user you mentioned added those statements in 2013. The solution is to remove those 4 statements. Regarding CRIP1, someone at … If you agree, I'd proceed as suggested.
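The Fibronectin 1 / Myoglobin situation (one item carrying both an ENSG* and an ENSMUS* gene ID) could be caught with a small prefix check. A minimal Python sketch, assuming the P594 values per item have already been fetched (for example, with the query above) and covering only the human (`ENSG`) and mouse (`ENSMUSG`) gene prefixes; the function names are hypothetical, and the mouse ID in the example is a placeholder, not a real Ensembl record:

```python
# Ensembl gene stable-ID prefixes; only human and mouse are covered here.
GENE_PREFIXES = {"ENSG": "human", "ENSMUSG": "mouse"}


def species_of(ensembl_id):
    """Return the species implied by an Ensembl gene ID prefix, or None if unknown."""
    # Try longer prefixes first so more specific prefixes win if the table grows.
    for prefix in sorted(GENE_PREFIXES, key=len, reverse=True):
        if ensembl_id.startswith(prefix):
            return GENE_PREFIXES[prefix]
    return None


def mixed_species_items(item_to_ids):
    """Map each item whose Ensembl gene IDs point to more than one species to those species."""
    flagged = {}
    for item, ids in item_to_ids.items():
        species = {species_of(ensembl_id) for ensembl_id in ids}
        if len(species) > 1:
            # key=str lets None (unrecognised prefix) sort alongside species names.
            flagged[item] = sorted(species, key=str)
    return flagged


# ENSG00000115414 comes from this issue; the ENSMUSG ID is a hypothetical placeholder.
example = {"Q413766": ["ENSG00000115414", "ENSMUSG00000000001"]}
print(mixed_species_items(example))  # {'Q413766': ['human', 'mouse']}
```

Items flagged this way are candidates for removing the statements that belong to the other species.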
Looks good to me, thanks
You are welcome! I've just updated those 3 items. Strangely, this query:

```sparql
SELECT distinct ?item ?itemLabel
WHERE
{
  ?item wdt:P594 ?ensg .
  ?item wdt:P31|wdt:P279 wd:Q8054 .
  FILTER NOT EXISTS {?item wdt:P31|wdt:P279 wd:Q7187}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

still returns the old results. Probably something similar to #22 is going on. I'm going to close this issue and report the stale-data problem in Phabricator. Thanks! 🤙
Hi guys! I am opening this issue to report a potential problem I found in the data.
According to this query, some Ensembl IDs are reused across items, which seems pretty strange.
For example, Q413766 is the Fibronectin 1 protein and Q14819473 is its encoding gene. Both items share `?item wdt:P594 'ENSG00000115414'`. AFAIK, ENSG* IDs should be reserved for genes. Is there something to check in the data loading process?
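Both symptoms described here (one Ensembl ID shared by several items, and an ENSG* gene ID attached to a non-gene item) could be checked during data loading. A minimal Python sketch, assuming the `(item, P594 value)` pairs and the set of items typed as genes (`wd:Q7187`, as in the queries in this thread) have already been retrieved; the function names are hypothetical:

```python
from collections import defaultdict


def shared_ids(statements):
    """Return {ensembl_id: sorted items} for Ensembl IDs used on more than one item."""
    by_id = defaultdict(set)
    for item, ensembl_id in statements:
        by_id[ensembl_id].add(item)
    return {ens: sorted(items) for ens, items in by_id.items() if len(items) > 1}


def gene_ids_on_non_genes(statements, gene_items):
    """Return (item, id) pairs where an ENSG* gene ID sits on an item not typed as a gene."""
    return sorted(
        (item, ensembl_id)
        for item, ensembl_id in statements
        if ensembl_id.startswith("ENSG") and item not in gene_items
    )


# The two statements reported in this issue.
statements = [
    ("Q14819473", "ENSG00000115414"),  # FN1 gene
    ("Q413766", "ENSG00000115414"),    # Fibronectin 1 protein
]
gene_items = {"Q14819473"}  # items that are instances/subclasses of wd:Q7187

print(shared_ids(statements))
print(gene_ids_on_non_genes(statements, gene_items))
```

Here the first check reports that ENSG00000115414 is used on both Q14819473 and Q413766, and the second singles out the protein item Q413766 as the one holding a gene ID it should not have.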
PS: guys at SuLab, please don't hate me too much for my issue submissions 😃