Support 2bit files #12

bovee · 2018-08-21T20:19:27Z

This should be relatively easy (except for autodetection). Basically, copy the FASTA parsing code and add a step that parses the "sequence" from a bitstring (which also carries a mask) into a Vec of nucleotides, then add an `impl<'a> From<TwoBit<'a>> for SeqRecord<'a>`. Note that we can't reuse our existing bitkmer code because 2bit decodes 0 to 3 as TCAG (T = 00, C = 01, A = 10, G = 11 in the UCSC spec) instead of ACGT as we do.

~~Format details: http://jcomeau.freeshell.org/www/genome/2bitformat.html~~ (this seems to be a format someone just made up and doesn't match the output of faToTwoBit at all)

Format details: http://genome.ucsc.edu/FAQ/FAQformat.html#format7
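
A rough sketch of the decoding step, assuming the encoding in the UCSC description (T = 00, C = 01, A = 10, G = 11, four bases per byte with the first base in the most significant bits). The names `decode_packed_dna` and `apply_blocks` are hypothetical and not part of the existing code; the `(start, size)` pairs stand in for the N-block and mask-block tables a record carries.

```rust
/// Decode `dna_size` bases from 2bit-packed bytes (hypothetical helper).
/// Assumes the UCSC encoding: T = 00, C = 01, A = 10, G = 11, with the
/// first base of each byte stored in the two most significant bits.
/// Returns `None` if `packed` is too short to hold `dna_size` bases.
fn decode_packed_dna(packed: &[u8], dna_size: usize) -> Option<Vec<u8>> {
    const BASES: [u8; 4] = *b"TCAG";
    if packed.len() < (dna_size + 3) / 4 {
        return None;
    }
    let mut seq = Vec::with_capacity(dna_size);
    for i in 0..dna_size {
        let byte = packed[i / 4];
        // Base 0 of a byte lives in bits 7..6, base 3 in bits 1..0.
        let shift = 6 - 2 * (i % 4);
        seq.push(BASES[((byte >> shift) & 0b11) as usize]);
    }
    Some(seq)
}

/// Overlay the N blocks and soft-mask blocks on a decoded sequence
/// (hypothetical helper). Each block is a `(start, size)` pair;
/// blocks running past the end of `seq` are clipped.
fn apply_blocks(seq: &mut [u8], n_blocks: &[(usize, usize)], mask_blocks: &[(usize, usize)]) {
    for &(start, size) in n_blocks {
        let end = start.saturating_add(size).min(seq.len());
        seq[start.min(end)..end].fill(b'N');
    }
    for &(start, size) in mask_blocks {
        let end = start.saturating_add(size).min(seq.len());
        seq[start.min(end)..end].make_ascii_lowercase();
    }
}
```

A table lookup keeps the 2bit ordering separate from the ACGT ordering used elsewhere, so the two encodings can't be confused by accident.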

bovee · 2019-08-27T16:33:43Z

~~Autodetection can probably rely upon the first line(s) matching something like `>...:\d+-\d+\r?\nP?` We probably don't want this as an actual regex though.~~ The format details from UCSC specify an actual magic header signature, so autodetection can check for that instead.
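
A minimal sketch of what that check could look like, assuming the signature value 0x1A412743 given in the UCSC description, which is written in the byte order of the machine that created the file. `detect_two_bit` and `ByteOrder` are hypothetical names, not existing code.

```rust
/// 32-bit signature at the start of a 2bit file, per the UCSC description.
const TWO_BIT_SIGNATURE: u32 = 0x1A41_2743;

/// Byte order the rest of the header's 32-bit fields should be read in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ByteOrder {
    Little,
    Big,
}

/// Check the first four bytes of a buffer for the 2bit signature
/// (hypothetical helper). Returns the byte order the file was written in,
/// or `None` if the buffer is too short or does not start with the signature.
fn detect_two_bit(header: &[u8]) -> Option<ByteOrder> {
    let magic = match header {
        [a, b, c, d, ..] => [*a, *b, *c, *d],
        _ => return None,
    };
    if u32::from_le_bytes(magic) == TWO_BIT_SIGNATURE {
        Some(ByteOrder::Little)
    } else if u32::from_be_bytes(magic) == TWO_BIT_SIGNATURE {
        Some(ByteOrder::Big)
    } else {
        None
    }
}
```

Since a FASTA file starts with `>` and a FASTQ file with `@`, neither can be mistaken for the signature bytes, so this check could run before the text-format detection.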
