This repository has been archived by the owner on Apr 9, 2022. It is now read-only.
I'd like to apply distribution-based clustering to data that has already been denoised with dada2 and wanted to ask your advice.
The idea is to cluster together sequences that came from the same organism, when the same organism has several copies of the amplified gene with slightly different sequences.
Do you think this would be as simple as running dbotu3 on the denoised table with an abundance cutoff of 0.0? And since the data is already error-corrected, would you suggest more conservative values for the genetic cutoff and distribution cutoff? Are there any caveats you could think of?
If you expect the different sequence variants to be similar in abundance, then an abundance cutoff of 0.0 sounds right. If you knew that, say, each organism had a single dominant gene variant that was X-fold more abundant than every other variant, then you could use X as your abundance cutoff.
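If you do have per-organism counts for the variants (say, from a reference genome or a mock community), a rough way to estimate that fold difference X is to compare the dominant variant with the runner-up. This is only a sketch: the function name and the example counts are made up for illustration, and nothing here is part of dbotu3.

```python
def dominant_fold(variant_counts):
    """Fold by which the most abundant variant exceeds the second most abundant.

    variant_counts: dict mapping variant name -> read count for ONE organism.
    If the dominant variant is X-fold above every other variant, then X is
    bounded by this ratio (dominant / runner-up).
    """
    ordered = sorted(variant_counts.values(), reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two variants to compute a fold difference")
    if ordered[1] == 0:
        raise ValueError("runner-up variant has zero counts; fold is undefined")
    return ordered[0] / ordered[1]


# Hypothetical counts for three gene copies of a single organism.
example_counts = {"variant_a": 900, "variant_b": 90, "variant_c": 45}
fold = dominant_fold(example_counts)
print(f"dominant variant is {fold:.1f}-fold above the runner-up")
```

If the ratio is close to 1 across organisms, that supports leaving the abundance cutoff at 0.0.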
For your case, the best way to choose the genetic cutoff is to use the typical variation between copies of the gene. If you know that variants within an organism have X% dissimilarity but variants from different organisms are Y% (> X%) dissimilar, then you can put the cutoff between X and Y.
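As a minimal sketch of that idea, assuming you already have lists of pairwise dissimilarities for within-organism and between-organism variant pairs, you could take the midpoint of the gap. The helper name and the numbers below are hypothetical, and the midpoint is just one reasonable choice inside the gap.

```python
def pick_genetic_cutoff(within, between):
    """Pick a cutoff midway between X and Y.

    within:  dissimilarities (as fractions) between variants of the same organism
    between: dissimilarities (as fractions) between variants of different organisms
    X is the largest within-organism value, Y the smallest between-organism value.
    """
    x = max(within)
    y = min(between)
    if x >= y:
        # No gap: a single genetic cutoff cannot separate the two groups.
        raise ValueError("within- and between-organism dissimilarities overlap")
    return (x + y) / 2


# Hypothetical dissimilarities: 0.012 means 1.2% of positions differ.
within_org = [0.004, 0.008, 0.012]
between_org = [0.03, 0.05, 0.08]
cutoff = pick_genetic_cutoff(within_org, between_org)
print(f"genetic cutoff: {cutoff:.3f}")
```

If the two groups overlap, the function raises an error rather than guessing, since in that case the genetic criterion alone cannot tell the organisms apart.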
It's not obvious to me that the distribution criterion should be modified for your application, but there is also no gold standard for determining that cutoff. Do some sensitivity tests!
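One way to set up such a sensitivity test is a plain grid sweep over the two cutoffs, recording how many OTUs come out for each combination. The sketch below deliberately does not call dbotu3: `cluster_fn` is a placeholder you would replace with your own wrapper around the real run, and `toy_cluster_fn` is an invented stand-in (with made-up pair data and a simplified merge rule) so the example executes on its own.

```python
from itertools import product


def sensitivity_sweep(cluster_fn, genetic_cutoffs, pvalue_cutoffs):
    """Run cluster_fn for every cutoff combination; return {(g, p): n_otus}."""
    return {
        (g, p): cluster_fn(g, p)
        for g, p in product(genetic_cutoffs, pvalue_cutoffs)
    }


# Invented data: each tuple is (genetic dissimilarity, distribution p-value)
# for one candidate pair drawn from 6 distinct variants.
TOY_PAIRS = [(0.01, 0.5), (0.02, 0.001), (0.05, 0.8)]
TOY_N_VARIANTS = 6


def toy_cluster_fn(genetic_cutoff, pvalue_cutoff):
    """Stand-in for a real clustering run (NOT dbotu3).

    Assumed rule: a pair merges when it is similar enough genetically and
    its distributions are not significantly different (p above the cutoff).
    Each merge reduces the OTU count by one.
    """
    merges = sum(
        1 for dist, pval in TOY_PAIRS
        if dist <= genetic_cutoff and pval > pvalue_cutoff
    )
    return TOY_N_VARIANTS - merges


results = sensitivity_sweep(toy_cluster_fn, [0.015, 0.03, 0.1], [0.0005, 0.01])
for (g, p), n_otus in sorted(results.items()):
    print(f"genetic={g:<6} pvalue={p:<7} OTUs={n_otus}")
```

If the OTU count barely moves across a range of cutoffs, the choice within that range probably doesn't matter much for your data.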