NCBI taxonomy as a taxonomic authority #5
Comments
To expand on this: To the best of my knowledge, right now, the scientificName field can only contain a Linnaean scientific name that matches on WoRMS, or an OTU identifier from BOLD or UNITE. However, many people working with DNA-derived occurrences obtain scientific names from NCBI taxonomy. Can we discuss why these names (or their associated taxonomy IDs) are not acceptable as scientificName values? Since the NCBI taxonomy seems to be a standard in this field of study, wouldn't we want to accommodate that? Right now, I'm handling this by going to progressively higher taxonomic ranks (genus --> family --> order, etc.) until I find a term that matches on WoRMS. But I'm not sure to what degree this maintains the integrity of the original data. Secondly, records may have associated NCBI taxonomy IDs and/or GenBank IDs. If these are not acceptable in the scientificName and associated columns, where could they be included? |
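The rank-climbing workaround described above can be sketched as follows. This is only an illustration, not an OBIS or WoRMS tool: `first_worms_match` and the `matches_worms` callback are hypothetical names, and the small set standing in for a WoRMS name match is an assumption made so the example runs offline. Returning the number of steps climbed keeps a trace of how far the reported name has drifted from the original assignment.

```python
from typing import Callable, Optional, Sequence, Tuple


def first_worms_match(
    names: Sequence[str],
    matches_worms: Callable[[str], bool],
) -> Optional[Tuple[str, int]]:
    """Walk from the most specific name to the least specific one.

    Returns (matched_name, steps_climbed), or None if nothing matches.
    `matches_worms` is a hypothetical callback; in practice it would
    wrap a real WoRMS name-matching call.
    """
    for steps, name in enumerate(names):
        if matches_worms(name):
            return name, steps
    return None


# Assumed stand-in for a WoRMS lookup so the sketch runs offline.
names_known_to_worms = {"Biota"}

# Most specific name first, then progressively higher fallbacks.
candidate_names = ["phototrophic eukaryote", "Biota"]

result = first_worms_match(candidate_names, names_known_to_worms.__contains__)
print(result)
```

Keeping the original NCBI name alongside the matched name and the step count would let a reviewer see how much resolution was lost for each record.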
I am interested in hearing more about this. The NCBI Taxonomy Browser shows Linnaean scientific names, so these should be available for use. Then again, these names should also match in WoRMS, so they would be available even if the 'source' is not NCBI? Is the issue then that, under scientificNameID, you would like to use the NCBI Taxonomy ID instead of the AphiaID? Does GBIF allow other sources of scientificNameID, and only OBIS requires an AphiaID? Example (hyperlinks on OBIS site): for scientificNameID, one would use urn:lsid:marinespecies.org:taxname:141580, or it could be NCBI:txid1421134. I imagine the challenge arises if NCBI has names that are NOT available (or correct?) on WoRMS. In that case, is it a matter of updates between the two? Regarding the first example by @dianalg, there is a discussion of a verbatimScientificName term at tdwg/dwc#181. The second point seems similar: we need to identify which fields to use for other ID codes. |
Suggestions that were recently made to me: |
Yes, OBIS only accepts a WoRMS LSID in |
Thanks @albenson-usgs. I am curious about usage, so I looked at two random marine examples on GBIF: 1) Somniosus microcephalus and 2) Leptasterias polaris. I note that (apart from one record with a dynatax.se LSID) only OBIS records have a scientificNameID, and it is a WoRMS LSID. Most records on GBIF that do not come from OBIS use taxonID (which is not an LSID) rather than scientificNameID. Would it be acceptable to fill in taxonID with the NCBI taxon code (or the BOLD BIN, which is also used often), in addition to scientificNameID? That way a record would carry the WoRMS taxon name but also information on the potential genetic identifier (BINs and NCBI IDs do not always match 1:1 with WoRMS LSIDs :) |
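To make the suggestion above concrete, here is a minimal sketch of a record that keeps the WoRMS LSID in scientificNameID and adds the NCBI code in taxonID. The helper names are hypothetical, the two identifier formats are the ones quoted earlier in this thread, and the pairing of fields reflects this comment's question rather than any agreed OBIS or GBIF rule. The scientific name is a placeholder because the thread does not say which taxon those example IDs belong to.

```python
from typing import Dict, Optional


def worms_lsid(aphia_id: int) -> str:
    """Format an AphiaID as a WoRMS LSID, as quoted in the thread."""
    return f"urn:lsid:marinespecies.org:taxname:{aphia_id}"


def ncbi_taxon_code(taxid: int) -> str:
    """Format an NCBI taxonomy ID in the 'NCBI:txid' style quoted in the thread."""
    return f"NCBI:txid{taxid}"


def taxon_fields(
    scientific_name: str,
    aphia_id: int,
    ncbi_taxid: Optional[int] = None,
) -> Dict[str, str]:
    """Build the taxon-related Darwin Core fields for one occurrence.

    Assumption: taxonID is used for the NCBI code, as asked about above.
    """
    fields = {
        "scientificName": scientific_name,
        "scientificNameID": worms_lsid(aphia_id),
    }
    if ncbi_taxid is not None:
        fields["taxonID"] = ncbi_taxon_code(ncbi_taxid)
    return fields


# IDs taken from the example earlier in the thread; the name is a placeholder.
record = taxon_fields("<name as matched on WoRMS>", 141580, 1421134)
print(record)
```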
Note, see my test queries of 2 marine species on GBIF here: |
Good question and actually I would extend it to say can we use NCBI taxon code instead of WoRMS LSID when there is no match in WoRMS? (I think that's Diana's question) This would be a question for the OBIS Steering Group. Further, it's recently come to my attention that OBIS may not be using |
If WoRMS IDs are preferred over NCBI IDs, it would be very useful to have a look-up table linking these two standards. I worry that searching by a name, like a genus, in isolation might accidentally return the WoRMS ID of an organism that has the same name but belongs to a totally different lineage from the NCBI sequence you matched. There are so many records that we can't check them all manually, so we rely on more automated searches and tools. If NCBI IDs were acceptable instead of WoRMS IDs, that would be much easier for us to use across our datasets. |
Yes @claudenozeres, I have used As @claudenozeres said and @albenson-usgs clarified, my real issue is whether an NCBI name/code could be used if there is no matching name on WoRMS. So, for example, I have the name "phototrophic eukaryote" as the assigned taxon in many rows of an eDNA dataset. This has a matching name and taxonomy ID on NCBI, but is a non-Linnaean term that does not match on WoRMS. But OBIS requires a WoRMS-approved name in the Right now, my options are 1) work my way up the taxonomic tree until I get a rank for "phototrophic eukaryote" that matches on WoRMS or 2) remove the record before submitting the data to OBIS. Following strategy 1, I'm putting "Biota" in the That said, it seems like this kind of issue will be really common for genetically-derived data. And my work around (strategy 1 above) does run the risk that @kpitz is describing. |
Regarding the look-up table mentioned by @kpitz, that would be an important tool and would help with adoption of WoRMS, so I would push for further work between the two resources, because I can see conflicts and a lack of attention. Going forward, if this problem becomes increasingly common and there is no easy or satisfactory solution, an alternative would be not to use OBIS+WoRMS but to publish on GBIF with the taxonomy of choice. I think the first route is the more valuable one if it leads to stronger connections and updates between resources, namely WoRMS, NCBI, and BOLD (links exist but are not very solid at the moment). This is similar to how OBIS vastly improved its names once it adopted WoRMS as its taxonomic backbone instead of continuing on its own. |
I recommend Dhugal Lindsay et al. 2017 for an interesting summary of issues with occurrence datasets and genetically identified taxa. https://www.tandfonline.com/doi/full/10.1080/17451000.2016.1268261. They highlight several issues to be improved for sequence data on biodiversity portals. |
Thanks everyone for your valuable feedback. While we intend to keep WoRMS as our taxonomic backbone (as the NCBI disclaimer states: "the NCBI taxonomy database is not an authoritative source for nomenclature or classification"), I'm going to discuss with WoRMS and the taxonomy task team to see if we can come up with some recommendations for using NCBI and BOLD identifiers and correct use of Note that the WoRMS API has an endpoint to get an Aphia record by NCBI ID, for example: https://www.marinespecies.org/rest/AphiaRecordByExternalID/94237?type=ncbi. I'm not sure how complete this is. Somewhat related: gbif/doc-publishing-dna-derived-data#35 |
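As a rough sketch of how the endpoint mentioned above could be used to go from an NCBI taxonomy ID to a WoRMS record, the following uses only the Python standard library. The URL pattern is the one given in the comment. The function names are hypothetical, and treating an empty response body or an HTTP error as "no match" is an assumption about the service's behaviour, not something confirmed in this thread. The shape of the returned JSON is not assumed here.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

WORMS_REST = "https://www.marinespecies.org/rest"


def aphia_by_ncbi_url(ncbi_taxid: int) -> str:
    """Build the AphiaRecordByExternalID URL for an NCBI taxonomy ID."""
    query = urllib.parse.urlencode({"type": "ncbi"})
    return f"{WORMS_REST}/AphiaRecordByExternalID/{ncbi_taxid}?{query}"


def fetch_aphia_by_ncbi(ncbi_taxid: int, timeout: float = 10.0) -> Optional[dict]:
    """Return the decoded JSON record, or None if no record comes back.

    Assumption: an empty body or an HTTP error means "no linked record".
    A production version should distinguish "not found" from server errors.
    """
    try:
        with urllib.request.urlopen(aphia_by_ncbi_url(ncbi_taxid), timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError:
        return None
    return json.loads(body) if body else None


# Building the URL needs no network access; fetching does.
url = aphia_by_ncbi_url(94237)
print(url)
```

Because the completeness of the NCBI links in WoRMS is unknown, a `None` result here should probably be read as "no link recorded" rather than "taxon absent from WoRMS".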
Thanks, @pieterprovoost, some recommendations around this issue would be great to start. Do you have any sense of when we might expect those? |
After discussing with WoRMS, we propose the following:
So for @dianalg's example I would propose this:
Hopefully this is a workable solution. Finally, please note that we are not outright rejecting records without WoRMS LSID, but they may get flagged as not being linked to the taxonomic backbone. It would be a shame if people decide not to publish to OBIS at all due to this requirement. Making data findable and accessible should be the priority, even if interoperability is not perfect. |
Is anyone going to bring this to TDWG for discussion with the broader community? I note that both of the issues Pieter links to are closed and are in GBIF-only discussion areas. I think we would all benefit from wider community input on how to move forward with this. |
I agree with @albenson-usgs: we need to inform, alert, and hear from TDWG or the broader community. We raised these matters for OBIS, but they are applicable to others (who may not be aware of them). @pieterprovoost's summary with example is very useful. |
Could NCBI taxonomy IDs be useful as taxonomic links?
Where will these be added?