Automate determining minimum and maximum coverage #15
Comments
Current Solution (working):
We do it this way because we assume that almost every k-mer that appears exactly once is an error.
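To illustrate that assumption, here is a minimal sketch of one common way to pick a minimum k-mer threshold from a k-mer count histogram: walk up from the error peak at multiplicity 1 and stop at the first valley, where the number of distinct k-mers starts rising again. This is not necessarily the code used in bio_hansel; the function name `min_kmer_threshold` and the dict-based histogram input are assumptions made for the example.

```python
def min_kmer_threshold(histogram):
    """Return the multiplicity at the first valley of a k-mer histogram.

    `histogram` maps k-mer multiplicity -> number of distinct k-mers seen
    that many times (hypothetical input format for this sketch).
    Returns None if the histogram never rises again after the error peak.
    """
    multiplicities = sorted(histogram)
    # Compare each multiplicity with the next one present in the histogram.
    for prev, cur in zip(multiplicities, multiplicities[1:]):
        # The first increase marks the end of the error-dominated tail,
        # so `prev` is the bottom of the valley.
        if histogram[cur] > histogram[prev]:
            return prev
    return None


# Toy histogram: error peak at 1, valley at 4, true coverage peak at 7.
example = {1: 9000, 2: 1200, 3: 300, 4: 150, 5: 400, 6: 900, 7: 1500, 8: 1100}
threshold = min_kmer_threshold(example)
```

With real data the histogram can be noisy, so a smoothed histogram or a minimum run of increases may be needed before trusting the valley.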
We decided not to implement automatic detection of a max k-mer coverage, as we don't have a good reason to.
Since there is a plan to remove the dependency on Jellyfish, we can no longer calculate a minimum k-mer value. Leaving this issue up in the meantime, but no more progress is being made on this.
We can leave the Jellyfish dependency for now and merge the auto min k-mer threshold system into bio_hansel, with the caveat that to use the auto k-mer threshold, the user will need to run Jellyfish and be okay with the analyses taking longer and using more computational resources. Automatically determining the min coverage depth could be useful for other applications, like setting min coverage for some de novo assemblers and setting min freq for k-mers when running Mash with reads, as well as for setting the min k-mer threshold for bio_hansel. So the code you've written could be extracted into a separate package and implemented as a generic tool with a wrapper for Galaxy if there's a good use case for it, which I think there is.
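If this were extracted into a generic tool, one small building block would be reading a k-mer histogram into memory. The sketch below assumes the histogram is plain text with one whitespace-separated `multiplicity count` pair per line, which is the layout `jellyfish histo` prints; the function name `parse_histo` is hypothetical and not part of bio_hansel or Jellyfish.

```python
def parse_histo(text):
    """Parse two-column histogram text into {multiplicity: distinct k-mers}.

    Assumes each data line holds two whitespace-separated integers;
    blank or malformed lines are skipped rather than raising.
    """
    histogram = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines or lines with unexpected columns
        try:
            multiplicity, n_kmers = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # skip non-numeric lines such as headers
        histogram[multiplicity] = n_kmers
    return histogram


sample_text = "1 9000\n2 1200\n\n3 300\n"
parsed = parse_histo(sample_text)
```

Keeping the parser separate from the threshold logic would let the same threshold code work with histograms produced by other k-mer counters.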
Instead of asking the end user for these values, we should determine genome coverage based on the reads themselves.