This repository has been archived by the owner on Apr 9, 2022. It is now read-only.

applying dbotu to denoised data #9

Open
elsherbini opened this issue Jan 18, 2018 · 1 comment

Comments

@elsherbini

Hi all,

I'd like to apply distribution based clustering to data that has already been denoised with dada2 and wanted to ask your advice.

The idea is to cluster together sequences that came from the same organism, in cases where that organism carries several copies of the amplified gene with slightly different sequences.

Do you think this would be as simple as running dbotu3 on the denoised table with an abundance cutoff of 0.0? And since the data is already error-corrected, would you suggest more conservative values for the genetic cutoff and distribution cutoff? Are there any caveats you could think of?

@swo
Owner

swo commented Jan 22, 2018

This sounds like a great use for dbOTU!

If you expect the different sequence variants to be similar in abundance, then an abundance cutoff of 0.0 sounds right. If you knew that, say, each organism had a single dominant gene variant that was X-fold more abundant than every other variant, then you could use X as your abundance cutoff.
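To make the role of the abundance cutoff concrete, here is a minimal sketch. It assumes the criterion is a fold-ratio check on total counts: a candidate variant may join an existing OTU only if that OTU is at least `abundance_cutoff` times as abundant as the candidate. The helper name `passes_abundance_criterion` is hypothetical and is not part of dbotu3's API.

```python
def passes_abundance_criterion(otu_total: int, candidate_total: int,
                               abundance_cutoff: float) -> bool:
    """Hypothetical helper (not dbotu3's API): True if the existing OTU is at
    least `abundance_cutoff`-fold as abundant as the candidate variant."""
    return otu_total >= abundance_cutoff * candidate_total


# A cutoff of 0.0 always passes, so similarly abundant gene copies can merge.
print(passes_abundance_criterion(100, 90, 0.0))    # True
# With a cutoff of 10, a variant must be at most 1/10 as abundant as the OTU.
print(passes_abundance_criterion(100, 90, 10.0))   # False
print(passes_abundance_criterion(1000, 90, 10.0))  # True
```

With a cutoff of 0.0 the abundance check never blocks a merge, so the genetic and distribution criteria alone decide whether two variants end up in the same OTU.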

For your case, the best way to choose the genetic cutoff is to use the typical variation between copies of the gene. If you know that variants within an organism have X% dissimilarity but variants from different organisms are Y% (> X%) dissimilar, then you can put a cutoff between X and Y.
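One way to turn "a cutoff between X and Y" into a number is sketched below. It assumes you already have lists of pairwise dissimilarities (as fractions) for gene copies within known organisms and for copies from different organisms; taking the midpoint of the gap is an arbitrary choice made for illustration, and `choose_genetic_cutoff` is a hypothetical name.

```python
from typing import Sequence


def choose_genetic_cutoff(within: Sequence[float], between: Sequence[float]) -> float:
    """Hypothetical helper: return a dissimilarity cutoff between the largest
    within-organism dissimilarity (X) and the smallest between-organism
    dissimilarity (Y), using the midpoint of the gap."""
    x = max(within)
    y = min(between)
    if x >= y:
        # The ranges overlap, so no single cutoff separates them cleanly.
        raise ValueError(f"no clean gap: max within ({x}) >= min between ({y})")
    return (x + y) / 2


# Example values only; real ones would come from reference genomes.
print(round(choose_genetic_cutoff([0.002, 0.005, 0.01], [0.03, 0.05]), 4))  # 0.02
```

If the two ranges overlap, the function raises instead of guessing, since any cutoff would then either split some organisms or merge some distinct ones.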

It's not obvious to me that the distribution criterion should be modified for your application, but there is also no gold standard for determining that cutoff. Do some sensitivity tests!
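A sensitivity test can be as simple as re-running the clustering over a range of cutoffs and watching how the OTU count changes. The harness below is a sketch under stated assumptions: `run_clustering` is a hypothetical callable that, in practice, would wrap a real dbotu3 run with the given cutoff. The toy stand-in `toy_greedy_otu_count` is NOT dbOTU; it varies a genetic cutoff with greedy Hamming-distance clustering only because a full distribution test is beyond a short example.

```python
from typing import Callable, Dict, Iterable, List


def sensitivity_sweep(run_clustering: Callable[[float], int],
                      cutoffs: Iterable[float]) -> Dict[float, int]:
    """Run a clustering once per cutoff and record the resulting OTU count."""
    return {cutoff: run_clustering(cutoff) for cutoff in cutoffs}


def toy_greedy_otu_count(seqs: List[str], cutoff: float) -> int:
    """Toy stand-in, NOT dbOTU. Assumes `seqs` are equal-length and sorted by
    decreasing abundance; each joins the first representative within `cutoff`
    Hamming dissimilarity, otherwise it starts a new OTU."""
    reps: List[str] = []
    for s in seqs:
        if not any(sum(a != b for a, b in zip(s, r)) / len(s) <= cutoff for r in reps):
            reps.append(s)
    return len(reps)


seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC"]
counts = sensitivity_sweep(lambda c: toy_greedy_otu_count(seqs, c), [0.0, 0.1, 0.2])
print(counts)  # {0.0: 4, 0.1: 3, 0.2: 2}
```

If the OTU count is stable across a range of cutoffs, the exact choice within that range matters little; a steep change suggests the result is sensitive to that parameter.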
