Automate determining coverage minimum and maximum coverage #15

Takadonet · 2017-10-23T16:28:43Z

Instead of asking the end user for these values, we should determine genome coverage from the reads themselves.

mgopez · 2017-11-08T15:29:43Z

Current Solution (working):

Estimate the expected k-mer coverage depth by finding the peak (maximum) of the k-mer coverage depth values.
Calculate the error rate using:

num_kmers_appear_once / num_unique_kmers

We do it this way because we assume that almost every k-mer that appears exactly once is an error.

Multiply the error rate by the k-mer coverage depth to get the (slightly underestimated) expected k-mer coverage depth of errors.
Use a Poisson distribution, together with the expected coverage value and the k-mer coverage depth of errors, to find the minimum depth at which we can be confident that an observation is not caused entirely by errors.

That minimum depth is min_kmer_coverage.
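
A minimal sketch of the steps above, not the actual bio_hansel code. It assumes the k-mer histogram arrives as two-column "depth count" lines (the layout `jellyfish histo` prints). All function names, the `alpha` cutoff, and the "skip the initial error slope before looking for the peak" heuristic are assumptions made for illustration.

```python
import math


def parse_histo(lines):
    """Parse 'depth count' lines into {depth: number of distinct k-mers}."""
    histo = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            histo[int(fields[0])] = int(fields[1])
    return histo


def find_peak_depth(histo):
    """Depth with the most distinct k-mers, ignoring the low-depth error slope.

    Assumption: counts fall from depth 1 down to a valley before rising to
    the true coverage peak, so we walk down to that valley first.
    """
    depths = sorted(histo)
    start = 0
    while start + 1 < len(depths) and histo[depths[start + 1]] < histo[depths[start]]:
        start += 1
    return max(depths[start:], key=lambda d: histo[d])


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term."""
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def estimate_min_kmer_coverage(histo, alpha=0.001):
    """Smallest depth that errors alone would reach with probability < alpha."""
    total = sum(histo.values())
    if total == 0:
        raise ValueError("empty k-mer histogram")
    peak = find_peak_depth(histo)
    # num_kmers_appear_once / num_unique_kmers
    error_rate = histo.get(1, 0) / total
    # Expected k-mer coverage depth of errors (slightly underestimated).
    error_depth = error_rate * peak
    depth = 1
    while poisson_sf(depth, error_depth) >= alpha:
        depth += 1
    return depth


if __name__ == "__main__":
    toy = ["1 1000", "2 200", "3 50", "28 2000", "29 3000",
           "30 4000", "31 3000", "32 2000"]
    print(estimate_min_kmer_coverage(parse_histo(toy)))
```

The `alpha` value controls how conservative the threshold is: a smaller `alpha` gives a higher min_kmer_coverage.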

mgopez · 2017-11-08T15:30:24Z

We decided not to implement automatic determination of the maximum k-mer coverage, as we don't have a good reason to.

mgopez · 2017-12-19T14:25:52Z

Since there is a plan to remove the Jellyfish dependency, we can no longer calculate a minimum k-mer value. Leaving this issue open in the meantime, but no further progress is being made on it.

peterk87 · 2017-12-19T15:12:06Z

We can leave the Jellyfish dependency in for now and merge the auto min k-mer threshold system into bio_hansel, with the caveat that to use the auto k-mer threshold, the user will need to run Jellyfish and accept that the analyses will take longer and use more computational resources.

Automatically determining the min coverage depth could be useful for other applications, such as setting the min coverage for some de novo assemblers or the min k-mer frequency when running Mash with reads, as well as for setting the min k-mer threshold for bio_hansel.

So the code you've written could be extracted into a separate package and implemented as a generic tool with a wrapper for Galaxy if there's a good use case for it, which I think there is.

Takadonet assigned mgopez Oct 23, 2017

mgopez added in progress and removed in progress labels Oct 24, 2017

mgopez added the in progress label Nov 8, 2017

mgopez added waiting to merge and removed in progress labels Dec 5, 2017

mgopez added enhancement and removed waiting to merge labels Dec 19, 2017

mgopez mentioned this issue Dec 19, 2017

#15, #18: Automating K-mer Min Module #19

Closed
