Is your feature request related to a problem? Please describe
The newest Sourmash 4.0 release is much faster (many operations moved from Python to Rust), and includes a new sourmash sketch command that allows for separately making DNA and protein sketches. This is super helpful as the parameters for protein sketches vs DNA sketches are different. Additionally, it now has native support for amino acid k-mer sizes! E.g. I'd like to do:
DNA nucleotide k-mer size=21
Protein amino acid k-mer size=10
Dayhoff amino acid k-mer size=17
Right now, this has to be done all in one command, and every alphabet gets k-merized at every k-size, which doesn't really make sense. Dayhoff with nucleotide k=21 has far too little information content to be usable. DNA has 4^21 possible k-mers at ksize=21, but since Dayhoff is an amino acid alphabet, the k-mer size is really 21/3 = 7 amino acids, so there are only 6^7 << 4^21 possible k-mers, which isn't enough information to distinguish between cell types. It's basically random at that point.
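To put rough numbers on that, here is a quick back-of-the-envelope sketch. It assumes alphabet sizes of 4 (DNA), 6 (Dayhoff classes) and 20 (protein), and that a nucleotide k-size maps to an amino acid k-size by dividing by 3. The function and variable names are made up for illustration and are not part of sourmash.

```python
import math

# Assumed alphabet sizes: 4 nucleotides, 6 Dayhoff classes, 20 amino acids.
DNA, DAYHOFF, PROTEIN = 4, 6, 20


def kmer_space(alphabet_size, ksize):
    """Number of distinct k-mers possible for an alphabet at a given k."""
    return alphabet_size ** ksize


def kmer_bits(alphabet_size, ksize):
    """Upper bound on information per k-mer, in bits."""
    return ksize * math.log2(alphabet_size)


dna_k21 = kmer_space(DNA, 21)
# Nucleotide k=21 translated to amino acids is 21 // 3 = 7 residues.
dayhoff_k7 = kmer_space(DAYHOFF, 21 // 3)
# The k-sizes requested in this issue, given directly in amino acids.
dayhoff_k17 = kmer_space(DAYHOFF, 17)
protein_k10 = kmer_space(PROTEIN, 10)

for label, alphabet, k in [
    ("DNA k=21", DNA, 21),
    ("Dayhoff k=7 (nucleotide k=21)", DAYHOFF, 7),
    ("Dayhoff k=17", DAYHOFF, 17),
    ("Protein k=10", PROTEIN, 10),
]:
    print(f"{label}: {kmer_space(alphabet, k):.3e} k-mers, "
          f"{kmer_bits(alphabet, k):.1f} bits")
```

With these assumptions, Dayhoff k=17 and protein k=10 land in the same order of magnitude as DNA k=21, while Dayhoff at an effective k=7 is about seven orders of magnitude smaller.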
https://github.com/dib-lab/sourmash/blob/5e66db91e62353de2b79f23cd198ef6f5c5544d1/doc/sourmash-sketch.md
Describe the solution you'd like
Add Sourmash 4.0: https://anaconda.org/bioconda/sourmash
(released ~1 week ago)
Describe alternatives you've considered
Could stay with current Sourmash but this is the future!!
Additional context
NA