applying dbotu to denoised data #9

elsherbini · 2018-01-18T19:17:17Z

Hi all,

I'd like to apply distribution based clustering to data that has already been denoised with dada2 and wanted to ask your advice.

The idea is to cluster together sequences that came from the same organism, when the same organism has several copies of the amplified gene with slightly different sequences.

Do you think this would be as simple as running dbotu3 on the denoised table with an abundance cutoff of 0.0? And since the data is already error-corrected, would you suggest more conservative values for the genetic cutoff and distribution cutoff? Are there any caveats you could think of?

swo · 2018-01-22T20:47:08Z

This sounds like a great use for dbOTU!

If you expect the different sequence variants to be similar in abundance, then an abundance cutoff of 0.0 sounds right. If you knew that, say, each organism had a single dominant gene variant that was X-fold more abundant than every other variant, then you could use X as your abundance cutoff.
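As a rough illustration of the second case, here is a minimal sketch that estimates X from the variant counts of one organism whose variants are already known. The function name `dominant_fold` and the example counts are made up for illustration; nothing here is part of dbOTU itself.

```python
def dominant_fold(counts):
    """Fold ratio of the dominant variant over the next most abundant one.

    `counts` maps variant name -> total read count for ONE organism
    (hypothetical input; you would have to know the grouping already).
    The dominant variant is at least this many times more abundant
    than every other variant.
    """
    ranked = sorted(counts.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two variants to compute a fold ratio")
    if ranked[1] == 0:
        raise ValueError("second variant has zero reads; fold ratio is undefined")
    return ranked[0] / ranked[1]


# Made-up counts for the gene copies of a single organism.
example_counts = {"variant_a": 900, "variant_b": 90, "variant_c": 30}
fold = dominant_fold(example_counts)
```

If the ratio varies a lot between organisms, the smallest one you observe is probably the safer candidate for the cutoff.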

For your case, the best way to choose the genetic cutoff is to use the typical variation between copies of the gene. If you know that variants within an organism have X% dissimilarity but variants from different organisms are Y% (> X%) dissimilar, then you can put the cutoff between X and Y.
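To make that concrete, here is a minimal sketch of picking a value between X and Y from reference sequences whose source organism is known. It assumes the sequences are already aligned to equal length and measures dissimilarity as the fraction of mismatched positions; the function names and the choice of the midpoint are illustrative assumptions, not something dbOTU prescribes.

```python
from itertools import combinations


def dissimilarity(a, b):
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def suggest_genetic_cutoff(organisms):
    """Midpoint between X (largest within-organism dissimilarity)
    and Y (smallest between-organism dissimilarity).

    `organisms` maps organism name -> list of its gene-copy sequences
    (hypothetical reference data with known grouping).
    """
    within = [
        dissimilarity(a, b)
        for seqs in organisms.values()
        for a, b in combinations(seqs, 2)
    ]
    between = [
        dissimilarity(a, b)
        for (_, seqs1), (_, seqs2) in combinations(organisms.items(), 2)
        for a in seqs1
        for b in seqs2
    ]
    if not within or not between:
        raise ValueError("need within-organism and between-organism pairs")
    x, y = max(within), min(between)
    if x >= y:
        raise ValueError("no gap between within- and between-organism dissimilarity")
    return (x + y) / 2


# Made-up toy sequences, for illustration only.
reference = {
    "org_a": ["AAAAAAAAAA", "AAAAAAAAAT"],
    "org_b": ["CCCCCAAAAA", "CCCCCAAAAT"],
}
cutoff = suggest_genetic_cutoff(reference)
```

The midpoint is only one possible choice; anything strictly between X and Y satisfies the reasoning above, and the function refuses to answer when the two ranges overlap.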

It's not obvious to me that the distribution criterion should be modified for your application, but there is also no gold standard for determining that cutoff. Do some sensitivity tests!
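One way such a sensitivity test could be organized is sketched below: sweep a grid of genetic and distribution cutoffs and record how many OTUs come out of each run. The `run_clustering` argument is a hypothetical callable that you would write yourself to invoke dbOTU with the given cutoffs and return the resulting OTUs; the sketch deliberately does not assume any particular dbotu3 function or flag.

```python
from itertools import product


def sensitivity_sweep(run_clustering, genetic_cutoffs, distribution_cutoffs):
    """Run the clustering once per (genetic, distribution) cutoff pair.

    `run_clustering(genetic, distribution)` is a user-supplied, hypothetical
    wrapper that returns a list of OTUs, each OTU being a list of sequence IDs.
    Returns a dict mapping each cutoff pair to the number of OTUs produced.
    """
    results = {}
    for genetic, distribution in product(genetic_cutoffs, distribution_cutoffs):
        otus = run_clustering(genetic, distribution)
        results[(genetic, distribution)] = len(otus)
    return results


def fake_clustering(genetic, distribution):
    """Stand-in used only to show the call shape; replace with a real wrapper."""
    sequence_ids = ["s1", "s2", "s3", "s4"]
    if genetic >= 0.1:
        # Pretend a looser genetic cutoff merges everything into one OTU.
        return [sequence_ids]
    return [[seq_id] for seq_id in sequence_ids]


summary = sensitivity_sweep(fake_clustering, [0.05, 0.1], [0.0005, 0.005])
for (genetic, distribution), n_otus in sorted(summary.items()):
    print(f"genetic={genetic} distribution={distribution} -> {n_otus} OTUs")
```

If the OTU count, or the membership of the OTUs you care about, barely changes across the range of cutoffs you consider plausible, the exact value probably matters little for your data.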
