Releases: divonlan/genozip
15.0.68
- Deep: reduced memory consumption in --test and in genounzip of Deep files, typically 10-20% less RAM
- Deep: new option --deep=no-qual: applies Deep to SEQ and QNAME only (not QUAL). It consumes drastically less RAM and produces a file whose size falls between that of compressing the FASTQ and BAM files separately and that of full Deep.
- License: clarify the meaning of "Recognized Academic Research Institution"
- BAM: further reduction in RAM consumption when compressing and uncompressing files with many secondary or supplementary alignments.
- New diagnostic: --show-huffman
15.0.67
- Improvements in Deep.
15.0.66
- BAM: better compression of PacBio and Nanopore files generated with minimap2, pbmm2, winnowmap
- BAM: further small reduction in RAM consumption when compressing and uncompressing SAM/BAM/CRAM files with many supplementary and secondary alignments
- Bug fixes
15.0.65
- BAM: significantly reduced memory consumption when compressing and uncompressing SAM/BAM/CRAM files in which the majority of alignments are secondary alignments.
- BAM: better compression of files generated by CPU (https://github.com/cheehongsg/CPU)
- New diagnostic options: --show-sec-gencomp, --show-reading-list, --force-reread, --show-scan
- Bug fixes
15.0.64
- Bug fix: segmentation fault in the Academic version (introduced in 15.0.63)
15.0.63
- FASTQ: much faster compression of most MGI, most Element, and some Illumina FASTQs, due to better scaling across CPU cores on machines with more than 40 cores
- New option: --not-paired: used in combination with --deep to inform Genozip that the two FASTQ files provided are not paired-end.
- Bug fix: correct handling of BGZF-compressed files that have a BGZF end-of-file block in their midst (instead of at their end). Until version 15.0.46, the file was compressed only up to the BGZF EOF block, and the rest of the file was lost. From 15.0.48 to 15.0.62, Genozip reported an error in this situation. This edge case was discovered during development and has not so far been encountered in any real-world file.
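A BGZF file is a series of gzip blocks, each recording its total size in a BC extra subfield (BSIZE); the end-of-file marker defined in the SAM/BAM specification is an empty 28-byte block. The following Python sketch is not Genozip's code, and its function names are hypothetical: it walks the blocks by BSIZE and reports empty (EOF-style) blocks that are not the last block, i.e. the edge case described above.

```python
import struct
import zlib

# The standard 28-byte BGZF end-of-file marker from the SAM/BAM specification:
# a BGZF block whose uncompressed size (ISIZE) is 0.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)

def iter_bgzf_blocks(data: bytes):
    """Yield (offset, block) for each BGZF block, using the BSIZE field to step."""
    pos = 0
    while pos < len(data):
        if data[pos:pos + 4] != b"\x1f\x8b\x08\x04":  # gzip magic, deflate, FEXTRA set
            raise ValueError(f"not a BGZF block at offset {pos}")
        (xlen,) = struct.unpack_from("<H", data, pos + 10)
        bsize = None
        sub, extra_end = pos + 12, pos + 12 + xlen
        while sub < extra_end:  # scan the extra subfields for 'BC'
            si1, si2, slen = struct.unpack_from("<BBH", data, sub)
            if (si1, si2) == (66, 67):  # 'B', 'C'
                (bsize,) = struct.unpack_from("<H", data, sub + 4)
            sub += 4 + slen
        if bsize is None:
            raise ValueError(f"missing BC subfield at offset {pos}")
        yield pos, data[pos:pos + bsize + 1]  # BSIZE is the total block size minus 1
        pos += bsize + 1

def find_midfile_eof_blocks(data: bytes) -> list:
    """Return offsets of empty (EOF-like) blocks that are not the final block."""
    blocks = list(iter_bgzf_blocks(data))
    last = len(blocks) - 1
    return [off for i, (off, blk) in enumerate(blocks)
            if i != last and struct.unpack_from("<I", blk, len(blk) - 4)[0] == 0]

def make_bgzf_block(payload: bytes) -> bytes:
    """Build one BGZF block (helper for the example below)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream, no zlib header
    cdata = comp.compress(payload) + comp.flush()
    bsize = 18 + len(cdata) + 8 - 1  # 18-byte header + data + 8-byte trailer, minus 1
    header = (b"\x1f\x8b\x08\x04" + b"\x00" * 4 + b"\x00\xff"
              + struct.pack("<H", 6) + b"BC" + struct.pack("<HH", 2, bsize))
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF, len(payload))
    return header + cdata + trailer

hello = make_bgzf_block(b"hello")
print(find_midfile_eof_blocks(hello + BGZF_EOF))
print(find_midfile_eof_blocks(hello + BGZF_EOF + make_bgzf_block(b"world") + BGZF_EOF))
```

The first file ends with its EOF block and is reported as clean; the second has an EOF block at the offset of its second block, followed by more data.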
15.0.62
- I/O optimizations for faster compression
- Bug fixes
- New diagnostic options: --show-gz-uncomp, --generate-gzil
- Removed bash autocomplete for genozip as it didn't work very well. If this was installed, it can be removed by manually editing ~/.bash_completion
15.0.61
- --optimize can now take an optional argument for fine-grained control of which fields get optimized: --optimize=QUAL,rx:f (optimize only these fields, where possible) or --optimize=^QUAL,rx:f (optimize all possible fields except these)
- VCF: better compression of files generated by freebayes; better compression of Type=Float annotations
- Bug fixes
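The include/exclude semantics of the --optimize argument can be illustrated with a small Python sketch. The functions and their return values are hypothetical, not Genozip's implementation; the sketch assumes only what the examples above show (a comma-separated field list, optionally prefixed with ^), and it decides eligibility only: a field is still optimized only if an optimization exists for it.

```python
from typing import Optional

def parse_optimize_arg(arg: Optional[str]):
    """Return (mode, fields), where mode is 'all', 'only' or 'except'."""
    if not arg:  # plain --optimize: every optimizable field
        return "all", frozenset()
    if arg.startswith("^"):  # --optimize=^A,B: everything except A and B
        return "except", frozenset(f for f in arg[1:].split(",") if f)
    return "only", frozenset(f for f in arg.split(",") if f)  # --optimize=A,B

def should_optimize(field: str, arg: Optional[str]) -> bool:
    """True if `field` is selected for optimization by the --optimize argument."""
    mode, fields = parse_optimize_arg(arg)
    if mode == "all":
        return True
    if mode == "only":
        return field in fields
    return field not in fields

print(should_optimize("QUAL", "QUAL,rx:f"))    # True
print(should_optimize("ZP:B:f", "^QUAL,rx:f"))  # True
```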
15.0.60
- Major revamp of the --optimize option:
  - Uncompression verification for files compressed with the data-modifying options --optimize, --add-line-numbers and --head: previously, if Genozip modified the original data due to these options, the correctness of the uncompression was not verified by genounzip, and --test could not be used with genozip. Now, both genounzip and genozip --test verify that the file is reconstructed correctly, i.e. that it is identical to the data as it was after the modifications. Note that this still does not detect errors in the modification process itself.
  - New optimization for SAM/BAM/CRAM included in --optimize: all floating-point and floating-point array tags (e.g. rq:f, ZP:B:f) are rounded to 10 significant bits for BAM/CRAM (i.e. an accuracy of approximately 3 significant decimal digits) and to 3 significant digits for SAM.
  - New optimization for SAM/BAM/CRAM included in --optimize: tags containing base quality scores (for example, of barcodes) are binned in the same way as QUAL. The supported tags are the standard tags QT:Z, CY:Z and BZ:Z, and the 10xGenomics/STARsolo tags UY:Z, sQ:Z, 2Y:Z, GY:Z, fq:Z and QX:Z. Note that OQ:Z is not binned.
  - New optimization for VCF included in --optimize: the QUAL field and all floating-point INFO and FORMAT annotations are rounded to three significant digits.
  - New optimization for VCF included in --optimize: the Phred-scaled values GQ and SPL are capped at 60.
  - Quality scores in SAM/BAM/CRAM/FASTQ are not binned if they are already binned.
  - The command line options --optimize-*, --GP-to-PP and --GL-to-PL are discontinued, as --optimize now includes all the optimizations.
- SAM/BAM/CRAM: better compression of files generated by 10xGenomics CellRanger-DNA
- Discontinued the little-used optimization options for GFF/GVF/GTF
- Discontinued the little-used option --match-chrom-to-reference
- genocat --validate: the arguments of --validate have changed. Details: genozip.com/genocat
- Bug fixes
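The arithmetic behind the 15.0.60 SAM/BAM/CRAM optimizations can be sketched in Python. This is an illustration under stated assumptions, not Genozip's code: it assumes BAM float tags are IEEE-754 float32 (as in the BAM format) and that "10 significant bits" counts the implicit leading bit, with rounding to nearest; all function names are hypothetical. It also encodes the list of quality-score tags that the notes say are binned like QUAL.

```python
import struct

FLOAT32_SIG_BITS = 24  # 23 stored mantissa bits + the implicit leading 1

def round_float32_sig_bits(x: float, sig_bits: int = 10) -> float:
    """Round x, as a float32, to `sig_bits` significant bits (1 <= sig_bits <= 23).

    Ties round away from zero. Assumption: this is one reasonable rounding
    rule, not necessarily the one Genozip uses.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if (bits >> 23) & 0xFF == 0xFF:  # Inf or NaN: leave unchanged
        return x
    drop = FLOAT32_SIG_BITS - sig_bits  # low mantissa bits to discard
    half = 1 << (drop - 1)
    mask = 0xFFFFFFFF ^ ((1 << drop) - 1)
    # Adding `half` rounds the magnitude to nearest; a carry out of the mantissa
    # correctly increments the exponent (e.g. 1.1111111111b rounds up to 10.0b).
    bits = (bits + half) & mask
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y

def round_sam_float(x: float) -> str:
    """SAM is text, so keep 3 significant decimal digits instead."""
    return f"{x:.3g}"

# Tags binned like QUAL according to the release notes; OQ:Z is deliberately absent.
QUAL_LIKE_TAGS = frozenset({
    "QT:Z", "CY:Z", "BZ:Z",                          # standard SAM tags
    "UY:Z", "sQ:Z", "2Y:Z", "GY:Z", "fq:Z", "QX:Z",  # 10xGenomics/STARsolo tags
})

def is_binned_like_qual(tag: str) -> bool:
    """True if --optimize bins this tag the way it bins QUAL."""
    return tag in QUAL_LIKE_TAGS

print(round_float32_sig_bits(1 / 3))  # 0.33349609375
print(round_sam_float(1 / 3))         # 0.333
```

Keeping 10 significant bits bounds the relative rounding error at 2^-10, about 0.1%, which is where "approximately 3 significant decimal digits" comes from.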
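The 15.0.60 VCF optimizations reduce to per-value string transformations. In this hypothetical Python sketch (not Genozip's implementation), each comma-separated value of a field such as QUAL, a Type=Float annotation, GQ or SPL is either rounded to three significant digits or capped at 60; leaving missing values (.) untouched is an assumption.

```python
def round_vcf_float(value: str) -> str:
    """Round each comma-separated float to 3 significant digits; keep missing '.' values."""
    return ",".join(v if v == "." else f"{float(v):.3g}" for v in value.split(","))

def cap_phred(value: str, cap: int = 60) -> str:
    """Cap each comma-separated Phred-scaled integer (e.g. GQ, SPL) at `cap`."""
    return ",".join(v if v == "." else str(min(int(v), cap)) for v in value.split(","))

print(round_vcf_float("12.3456"))  # 12.3
print(cap_phred("0,35,255"))       # 0,35,60
```

Python's %g format switches to exponent notation for large or very small magnitudes (for example, 1234.5678 becomes 1.23e+03); a real implementation may format such values differently.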
15.0.59
- CRAM is now a "first-class citizen" in Genozip: all functionality available for BAM is also available for CRAM.
- SAM/BAM/CRAM: new option: genocat --cram: output a file in CRAM format
- Deep: better detection of more use cases