required vs highly recommended fields in the guidelines for metabarcoding data #1

Open
morien opened this issue Feb 22, 2021 · 4 comments

morien commented Feb 22, 2021

I work as a bioinformatician at the Hakai Institute and I'm helping to develop best practices for submission of our eDNA experiments to OBIS. I had a look at the metabarcoding guidelines in this repo and I have a suggestion.

Our eDNA occurrence data is contextualized by the target gene, subfragment, and even the forward and reverse primer pair, in the same way that a trawl dataset is contextualized by the holes in the net, the shape of the net, the depth of the trawl, etc. I therefore expected the gene, subfragment, and forward and reverse primer fields to be required for a submission, but they're only highly recommended.

As someone who might want to use OBIS as a source for occurrence data from sequencing experiments in the future, I'd like to suggest that these fields be required for submission of metabarcoding data. Without that information, it is not possible to know whether an absence is due to the target gene, fragment, or primers used, or reflects the actual absence of a particular organism from the sampling environment.

@SSuominen1
Contributor

I agree that detailed information on which gene fragment was sequenced is very important for possible future work with this data. However, making it a required field would mean that we potentially reject datasets that could still bring value by recording presences of certain taxa.

For the best practices guide, I would definitely add these fields as necessary information!

Generally, I think estimating absences will be difficult and would require very careful consideration and very complete metadata, including sequencing depth. Of course, a dataset with all of that would be the ideal one to have (hence the highly recommended fields).

@dschigel

We had a similar discussion in GBIF with colleagues and prefer to stick to the low-threshold principle of not increasing the number of required fields, as @SSuominen1 points out, to avoid rejections. Primer information is very important for quantitative datasets and for absences in multi-species sampling, but it is only desired, not critical, for e.g. single-species sequence data and presence-only data. In the guide these fields are under Highly recommended, which is the highest importance category that does not block publishing.

morien commented Feb 25, 2021

Thanks for your replies. I understand the logic here and I'm also supportive of avoiding data rejections. The justification you provide makes sense for other types of incidental metadata (water sample depth, temperature, longitude/latitude, etc.). However, I cannot imagine a situation in which someone who generated a dataset wouldn't know what gene target or primers they used. Unlike the environmental measurements I mentioned above, the gene target and primers are information needed to actually conduct the experiment itself. You cannot conduct a sequencing experiment without knowing your target gene and primers, so if someone withholds that information they would be doing so voluntarily, to the harm of others who wish to use the data by accessing it through OBIS. To be clear, I don't want to imply that I know better than you all, but I just don't understand how it is justified to leave these particular fields as optional.

@dschigel

I agree, @morien, that knowing the target gene is essential for interpretation. As you can see in the case of fungal primers, for example, the most widely used region, ITS2, can be sequenced using a high number of primer pairs for the same target region. What I meant is that in this example, for single-species data, "ITS2" should be sufficient to enable reuse of sequence and DNA-derived biodiversity data, but for species-mix data (as primers have amplification biases), knowing the primers is indeed essential for reuse and interpretation. Hopefully "highly recommended" covers it well in an insistent but non-blocking manner.
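
The "insistent but non-blocking" distinction discussed in this thread could be sketched roughly as follows. This is only an illustration, not how OBIS or GBIF actually validate submissions: the two field tiers, the function `check_record`, and the `Report` class are all hypothetical, and the field names (`target_gene`, `target_subfragment`, `pcr_primer_forward`, `pcr_primer_reverse`) are assumed here rather than taken from the guide.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical tiers for illustration; the real lists are defined by the guide.
REQUIRED = ("occurrenceID", "scientificName")
HIGHLY_RECOMMENDED = (
    "target_gene",
    "target_subfragment",
    "pcr_primer_forward",
    "pcr_primer_reverse",
)


@dataclass
class Report:
    errors: list[str] = field(default_factory=list)    # block publishing
    warnings: list[str] = field(default_factory=list)  # insist, but do not block

    @property
    def publishable(self) -> bool:
        return not self.errors


def check_record(record: dict[str, str]) -> Report:
    """Classify missing fields as blocking errors or non-blocking warnings."""
    report = Report()
    for name in REQUIRED:
        if not record.get(name, "").strip():
            report.errors.append(f"missing required field: {name}")
    for name in HIGHLY_RECOMMENDED:
        if not record.get(name, "").strip():
            report.warnings.append(f"missing highly recommended field: {name}")
    return report


if __name__ == "__main__":
    # A presence record without any primer or gene information.
    rec = {"occurrenceID": "occ-1", "scientificName": "Gadus morhua"}
    result = check_record(rec)
    print(result.publishable)
    for message in result.warnings:
        print(message)
```

Under this sketch, moving the gene and primer fields from `HIGHLY_RECOMMENDED` to `REQUIRED` is the change requested at the top of the thread: the same missing values would then reject the record instead of only flagging it.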

3 participants