15.0.61
Divon Lan committed Jun 22, 2024
1 parent 0307575 commit 10b9229
Showing 86 changed files with 1,110 additions and 574 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -12,6 +12,7 @@
*.h eol=lf
*.asm eol=lf
*.S eol=lf
Makefile eol=lf

# Declare our textual formats as binary so we can check in test files with Unix or Windows style end-of-lines
*.vcf binary
4 changes: 3 additions & 1 deletion .vscode/settings.json
@@ -16,6 +16,7 @@
"files.eol": "\n",

"files.associations": {
"*.vcf": "plaintext",
"*.eb": "python",
"genozip.h": "c",
"strings.h": "c",
@@ -38,7 +39,8 @@
"utility": "c",
"endianness.h": "c",
"version.h": "c",
"libgen.h": "c"
"libgen.h": "c",
"compare": "c"
},
"cmake.sourceDirectory": "C:/Users/divon/genozip/src/onion",
"cmake.configureOnOpen": false
5 changes: 3 additions & 2 deletions LICENSE.txt
@@ -47,7 +47,8 @@ License agreement:

a. Academic License: Using Genozip Executables for academic research, educational or training
purposes provided that You are a recognized academic research institution which is not a hospital,
or a registered student at such an institution, but excluding use with Your Commercial Data.
or a registered student at such an institution, but excluding use with Your Commercial Data, and
limited to a total of 10,000 files per institution.

b. Academic License: Using Genozip Executables for another non-commercial purpose, if it has
been pre-approved by Licensor in writing. Email [email protected] to seek such an approval.
@@ -158,5 +159,5 @@ ABOVE STATED REMEDY FAILS OF ITS ESSENTIAL PURPOSE.

END OF TERMS AND CONDITIONS

Genozip license version: 15.0.60
Genozip license version: 15.0.61

5 changes: 5 additions & 0 deletions RELEASE_NOTES.txt
@@ -3,6 +3,11 @@ Note on versioning:
- Minor version changes with bug fixes and minor feature updates
- Some minor versions are skipped due to failed deployment pipelines

15.0.61 22/6/2024
- --optimize can now take an optional argument for fine-grained control of which fields get optimized: --optimize=QUAL,rx:f (optimize if possible, but only these fields) or --optimize=^QUAL,rx:f (optimize all fields possible, except for these fields)
- VCF: better compression of files generated by freebayes; better compression of Type=Float annotations
- Bug fixes

15.0.60 15/6/2024
- Major revamp of the --optimize option:
> Uncompression verification for files compressed with data-modifying options --optimize, --add-line-numbers and --head: Previously, if genozip modified the original data due to these options, the correctness of the uncompression was not verified in genounzip and using --test in genozip was not possible. Now, genounzip as well as genozip --test do verify that the file is reconstructed correctly, i.e. that it is identical to the data as it was after the modifications. Note that this still does not test for any errors in the modification process itself.
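
The 15.0.61 release note above describes two forms of the --optimize argument: a plain comma-separated list (only these fields) and a list prefixed with ^ (all fields except these). The following is a minimal C sketch of how such an argument could be parsed and queried. It is not genozip's implementation: OptimizeSpec, optimize_spec_parse, optimize_spec_applies, the size limits and the case-sensitive matching are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FIELDS 16   // hypothetical limit on the number of listed fields
#define MAX_NAME   32   // hypothetical limit on a field name, including '\0'

typedef struct {
    bool exclude;                       // true if the list started with '^'
    int  num_fields;
    char fields[MAX_FIELDS][MAX_NAME];
} OptimizeSpec;

// Parses "QUAL,rx:f" or "^QUAL,rx:f". Returns false if the list is empty or malformed.
static bool optimize_spec_parse (const char *arg, OptimizeSpec *spec)
{
    assert (arg && spec);
    memset (spec, 0, sizeof *spec);     // also zero-terminates every name slot

    if (*arg == '^') { spec->exclude = true; arg++; }

    while (*arg) {
        size_t len = strcspn (arg, ","); // length of the next field name
        if (len == 0 || len >= MAX_NAME || spec->num_fields == MAX_FIELDS) return false;

        memcpy (spec->fields[spec->num_fields++], arg, len);
        arg += len;

        if (*arg == ',') {
            arg++;
            if (!*arg) return false;    // trailing comma
        }
    }

    return spec->num_fields > 0;
}

// True if the field should be optimized under this spec.
static bool optimize_spec_applies (const OptimizeSpec *spec, const char *field)
{
    bool listed = false;
    for (int i = 0; i < spec->num_fields; i++)
        if (!strcmp (spec->fields[i], field)) { listed = true; break; }

    return spec->exclude ? !listed : listed;
}
```

With "QUAL,rx:f" the sketch reports QUAL and rx:f as selected and everything else as skipped; with "^QUAL,rx:f" the result is inverted. The bare --optimize form (no argument) is outside the scope of this sketch.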
4 changes: 2 additions & 2 deletions installers/LICENSE.html
@@ -11,7 +11,7 @@
"Genozip Executables" shall mean the executable files genozip, genounzip, genocat and genols (with or without an .exe file name suffix).<br><br>
Other words and terms in this License shall be interpreted as their usual meaning in the context of a software product.<br><br>
2. Grant of copyright license. Licensor hereby grants to You a limited non-exclusive, non-transferrable, non-sublicensable, revokable copyright license to use Genozip on Your Computers, if you meet the conditions attached to any of the License Types a through f below, for the limited purpose attached to that particular License Type, and subject to the terms and conditions of this License agreement:<br><br>
a. Academic License: Using Genozip Executables for academic research, educational or training purposes provided that You are a recognized academic research institution which is not a hospital, or a registered student at such an institution, but excluding use with Your Commercial Data.<br><br>
a. Academic License: Using Genozip Executables for academic research, educational or training purposes provided that You are a recognized academic research institution which is not a hospital, or a registered student at such an institution, but excluding use with Your Commercial Data, and limited to a total of 10,000 files per institution.<br><br>
b. Academic License: Using Genozip Executables for another non-commercial purpose, if it has been pre-approved by Licensor in writing. Email [email protected] to seek such an approval.<br><br>
c. Standard, Enterprise or Premium License: Using Genozip Executables for any legal purpose, if the license was purchased and paid for, and for the duration that it is in effect. In addition, for Premium License only: Distributing Genozip Executables to others.<br><br>
d. Decompression License: Using a subset of Genozip Executables consisting of genounzip, genocat, genols for any legal purpose. A Decompression License is free of charge.<br><br>
@@ -34,4 +34,4 @@
10. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides Genozip on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Genozip and assume any risks associated with Your exercise of permissions under this License.<br><br>
11. LIMITATION OF LIABILITY. TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, STRICT LIABILITY OR OTHER LEGAL OR EQUITABLE THEORY, SHALL LICENSOR OR DEVELOPER BE LIABLE FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER ARISING AS A RESULT OF THIS LICENSE OR OUT OF THE USE OR INABILITY TO USE GENOZIP (INCLUDING BUT NOT LIMITED TO DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, FILE CORRUPTION, DATA LOSS, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES), EVEN IF LICENSOR OR DEVELOPER HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. IN NO EVENT WILL LICENSOR'S OR DEVELOPER'S TOTAL LIABILITY TO LICENSEE FOR ALL DAMAGES (OTHER THAN AS MAY BE REQUIRED BY APPLICABLE LAW IN CASES INVOLVING PERSONAL INJURY) EXCEED THE AMOUNT OF $500 USD. THE FOREGOING LIMITATIONS WILL APPLY EVEN IF THE ABOVE STATED REMEDY FAILS OF ITS ESSENTIAL PURPOSE.<br><br>
END OF TERMS AND CONDITIONS<br><br>
Genozip license version: 15.0.60<br><br>
Genozip license version: 15.0.61<br><br>
Binary file modified installers/genozip-installer.exe
Binary file not shown.
Binary file modified installers/genozip-linux-x86_64.tar
Binary file not shown.
Binary file modified installers/genozip-osx-arm.tar
Binary file not shown.
Binary file modified installers/genozip-osx-x86.tar
Binary file not shown.
8 changes: 4 additions & 4 deletions src/Makefile
@@ -145,10 +145,10 @@ MY_SRCS = genozip.c genols.c context.c container.c strings.c stats.c arch.c tip.
zip.c piz.c reconstruct.c recon_history.c recon_peek.c seg.c zfile.c aligner.c flags.c specials.c \
reference.c contigs.c ref_lock.c refhash.c ref_make.c ref_contigs.c ref_iupacs.c ref_cache.c digest.c \
vcf_piz.c vcf_seg.c vcf_vblock.c vcf_header.c vcf_info.c vcf_samples.c vcf_hgvs.c vcf_modify.c \
vcf_format_GT.c vcf_format_PS_PID.c vcf_dbsnp.c vcf_giab.c vcf_vep.c vcf_qual.c vcf_1000G.c \
vcf_format_GT.c vcf_format_PS_PID.c vcf_dbsnp.c vcf_giab.c vcf_vep.c vcf_qual.c vcf_1000G.c vcf_me.c \
vcf_refalt.c vcf_format.c vcf_illum_gtyping.c vcf_gwas.c vcf_vagrent.c vcf_svaba.c vcf_pbsv.c \
vcf_icgc.c vcf_snpeff.c vcf_cosmic.c vcf_mastermind.c vcf_isaac.c vcf_manta.c vcf_pos.c vcf_ultima.c \
vcf_platypus.c vcf_info_AC_AF_AN.c vcf_format_GQ.c vcf_gatk.c vcf_sv.c vcf_gnomad.c vcf_me.c \
vcf_platypus.c vcf_info_AC_AF_AN.c vcf_format_GQ.c vcf_gatk.c vcf_sv.c vcf_gnomad.c vcf_freebayes.c \
sam_seg.c sam_piz.c sam_shared.c sam_header.c sam_md.c sam_nm.c sam_tlen.c sam_cigar.c sam_fields.c \
sam_sa.c bam_seg.c bam_seq.c bam_show.c sam_pacbio.c sam_ultima.c sam_xcons.c cram.c agilent.c \
sam_seq.c sam_qual.c sam_sag_zip.c sam_sag_piz.c sam_sag_load.c sam_sag_ingest.c sam_sag_scan.c \
@@ -515,8 +515,8 @@ SH_VERIFY_ALL_COMMITTED = \
if (( `cd secure; git status -s | wc -l` != 0 )); then echo "ERROR: there are some UNCOMMITTED changes:" ; echo ; git status -s ; exit 1; fi

SH_VERIFY_ALL_STAGED = \
if (( ` git status -s | cut -c2 | grep -v " " | wc -l` != 0 )); then echo "ERROR: there are some UNSTAGED changes:" ; echo ; git status -s ; exit 1; fi; \
if (( `cd secure; git status -s | cut -c2 | grep -v " " | wc -l` != 0 )); then echo "ERROR: there are some UNSTAGED changes:" ; echo ; git status -s ; exit 1; fi
if (( ` git diff | wc -l` != 0 )); then echo "ERROR: there are some UNSTAGED changes:" ; echo ; git status -s ; exit 1; fi; \
if (( `cd secure; git diff | wc -l` != 0 )); then echo "ERROR: there are some UNSTAGED changes in 'secure':" ; echo ; ( cd secure ; git status -s ) ; exit 1; fi

test:
@cat test.sh | tr -d "\r" | bash -
29 changes: 20 additions & 9 deletions src/arch.c
@@ -255,27 +255,27 @@ double arch_get_physical_mem_size (void)
return mem_size;
}

StrText arch_get_filesystem_type (void)
StrText arch_get_filesystem_type (FileP file)
{
StrText s = { "unknown" };
int save_errno = errno; // save errno, as this function is often used in ASSERT.

if (txt_file && txt_file->is_remote) {
if (file && file->is_remote) {
strcpy (s.s, "remote");
goto done;
}

if (txt_file && txt_file->redirected && !txt_file->name) {
if (file && file->redirected && !file->name) {
strcpy (s.s, "pipe");
goto done;
}

if (!txt_file || !txt_file->file || !txt_file->name)
if (!file || !file->file || !file->name)
goto done;

#ifdef __linux__
struct statfs fs;
if (statfs (txt_name, &fs)) goto done;
if (statfs (file->name, &fs)) goto done;

rom name = NULL;
#define NAME(magic, name_s) case magic: name = name_s; break
@@ -307,13 +307,13 @@ StrText arch_get_filesystem_type (void)

#elif defined __APPLE__
struct statfs fs;
if (statfs (txt_name, &fs)) goto done;
if (statfs (file->name, &fs)) goto done;

memcpy (s.s, fs.f_fstypename, MIN_(sizeof(fs.f_fstypename), sizeof(s)-1));

#elif defined _WIN32
WCHAR ws[100];
if (!GetVolumeInformationByHandleW ((HANDLE)_get_osfhandle(fileno (txt_file->file)), 0, 0, 0, 0, 0, ws, ARRAY_LEN(ws))) goto done;
if (!GetVolumeInformationByHandleW ((HANDLE)_get_osfhandle(fileno (file->file)), 0, 0, 0, 0, 0, ws, ARRAY_LEN(ws))) goto done;

if (wcstombs (s.s, ws, sizeof(s.s)-1) == (size_t)-1)
strcpy (s.s, "failed-wcstombs"); // can happen if locale is set to non-english
@@ -324,6 +324,16 @@ StrText arch_get_filesystem_type (void)
return s;
}

StrText arch_get_txt_filesystem (void)
{
return arch_get_filesystem_type (txt_file);
}

StrText arch_get_z_filesystem (void)
{
return arch_get_filesystem_type (z_file);
}

// returns value in bytes
uint64_t arch_get_max_resident_set (void)
{
@@ -436,8 +446,9 @@ StrTextSuperLong arch_get_genozip_executable (void)
memmove (loc + 7, loc + bn_len, strlen (loc+bn_len) + 1/*\0*/);
memcpy (loc, "genozip", 7);
}

else

// note: do nothing if is_genounzip - this is likely genozip --decompress
else if (!is_genounzip)
ABORT ("Cannot find substring %s in %s", bn, fn.s);
}

4 changes: 3 additions & 1 deletion src/arch.h
@@ -11,7 +11,9 @@
extern void arch_initialize (rom argv0);
extern unsigned arch_get_num_cores (void);
extern double arch_get_physical_mem_size (void);
extern StrText arch_get_filesystem_type (void);
extern StrText arch_get_filesystem_type (FileP file);
extern StrText arch_get_txt_filesystem (void);
extern StrText arch_get_z_filesystem (void);

extern rom arch_get_endianity (void);
extern void arch_set_locale (void);
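
The src/arch.c and src/arch.h changes above turn arch_get_filesystem_type from a function that reads the global txt_file into one that takes the file as a parameter, with two thin wrappers for the txt and z files. The pattern can be sketched in isolation as follows. This is an illustrative sketch only: FileSketch, StrTextSketch and the *_sketch names are stand-ins invented here, and the "local" string replaces the real statfs / GetVolumeInformationByHandleW lookups shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { char s[80]; } StrTextSketch;   // stand-in for genozip's StrText

typedef struct {                                // stand-in holding only the fields consulted below
    const char *name;
    bool is_remote, redirected;
} FileSketch;

// stand-ins for the txt_file and z_file globals
static FileSketch *txt_file_sketch = NULL, *z_file_sketch = NULL;

// The file to inspect is a parameter, so the same logic serves any file.
static StrTextSketch filesystem_type_sketch (const FileSketch *file)
{
    StrTextSketch s = { "unknown" };

    if (file && file->is_remote)
        strcpy (s.s, "remote");

    else if (file && file->redirected && !file->name)
        strcpy (s.s, "pipe");

    else if (file && file->name)
        strcpy (s.s, "local");                  // placeholder for the per-OS filesystem query

    assert (strlen (s.s) < sizeof s.s);         // result always fits, including the terminator
    return s;
}

// Thin wrappers keep call sites that have no file pointer at hand short.
static StrTextSketch txt_filesystem_sketch (void) { return filesystem_type_sketch (txt_file_sketch); }
static StrTextSketch z_filesystem_sketch   (void) { return filesystem_type_sketch (z_file_sketch);   }
```

Passing the file explicitly matters for call sites such as bgzf_read_block, where, per the comment in the diff, txt_file is not yet assigned when the function is called.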
38 changes: 25 additions & 13 deletions src/bgzf.c
@@ -51,15 +51,15 @@ typedef struct BgzfFooter {
} BgzfFooter;

static FlagsBgzf bgzf_recompression_levels[1+MAX_FLAG_BGZF] = {
{ .library = BGZF_LIBDEFLATE19, .level = 0 }, // --bgzf=0 : BGZF blocks with no compression
{ .library = BGZF_IGZIP, .level = 1 }, // --bgzf=1 : note: this is IGZIP LVL0
{ .library = BGZF_IGZIP, .level = 2 }, // --bgzf=2 : note: this is IGZIP LVL1
{ .library = BGZF_LIBDEFLATE19, .level = 1 }, // --bgzf=3
{ .library = BGZF_LIBDEFLATE19, .level = 7 }, // --bgzf=4
{ .library = BGZF_LIBDEFLATE19, .level = 9 }, // --bgzf=5
{ .library = BGZF_LIBDEFLATE19, .level = 0, .has_eof_block = true }, // --bgzf=0 : BGZF blocks with no compression
{ .library = BGZF_IGZIP, .level = 1, .has_eof_block = true }, // --bgzf=1 : note: this is IGZIP LVL0
{ .library = BGZF_IGZIP, .level = 2, .has_eof_block = true }, // --bgzf=2 : note: this is IGZIP LVL1
{ .library = BGZF_LIBDEFLATE19, .level = 1, .has_eof_block = true }, // --bgzf=3
{ .library = BGZF_LIBDEFLATE19, .level = 7, .has_eof_block = true }, // --bgzf=4
{ .library = BGZF_LIBDEFLATE19, .level = 9, .has_eof_block = true }, // --bgzf=5
};

#define bgzf_no_recompression (FlagsBgzf){ .library = BGZF_NO_LIBRARY, .level = BGZF_NO_BGZF }
#define bgzf_no_recompression (FlagsBgzf){ .library = BGZF_NO_LIBRARY, .level = BGZF_NO_BGZF, .has_eof_block = false }

// possible return values, see libdeflate_result in libdeflate.h
static rom libdeflate_error (int err)
@@ -98,10 +98,22 @@ void bgzf_initialize_discovery (FileP file)
ASSERTNOTINUSE (file->bgzf_plausible_levels);

if (file->codec == CODEC_GZ) {
if (flag.show_gz || flag.show_bgzf) {
iprintf ("%s: is GZIP but not BGZF\n", file->name); fflush (info_stream);
if (flag.show_gz) exit_ok;
if (flag.show_gz) {
// attempt to detect GZ blocks (up to 64MB)
segconf.vb_size = 65 MB;
txt_file = file;
z_file = CALLOC (sizeof (File));
txtfile_read_vblock (evb);
iprintf ("%s: is %s but not BGZF\n", txt_file->name, src_codec_name (txt_file->source_codec, flag.zip_comp_i).s); fflush (info_stream);
FREE (z_file);
exit_ok;
}

else if (flag.show_bgzf) {
iprintf ("%s: is GZIP but not BGZF\n", file->name);
fflush (info_stream);
}

else return;
}

@@ -294,7 +306,7 @@ static int32_t bgzf_read_block_raw (FILE *file, // txt_file is not yet assigned
feof (file) ? "Unexpected end of file while reading" : "Failed to read body",
basename, ftello64 (file),
(is_remote && save_errno == ESPIPE) ? "Disconnected from remote host" : strerror (save_errno),
bytes, body_size, arch_get_filesystem_type().s,
bytes, body_size, arch_get_txt_filesystem().s,
feof (file) ? "If file is expected to be truncated, you may use --truncate-partial-last-line to disregard the final partial BGZF block." : "");

return (bytes == body_size) ? BGZF_BLOCK_SUCCESS : BGZF_BLOCK_TRUNCATED;
@@ -316,7 +328,7 @@ int32_t bgzf_read_block (FileP file, // txt_file is not yet assigned when called
flag.truncate || // possibly compressing while downloading
file->disk_so_far == file->disk_size, // entire file was read
"Abrupt EOF in BGZF file %s: disk_so_far=%s disk_size=%s filesystem=%s",
file->name, str_int_commas (file->disk_so_far).s, str_int_commas (file->disk_size).s, arch_get_filesystem_type().s);
file->name, str_int_commas (file->disk_so_far).s, str_int_commas (file->disk_size).s, arch_get_filesystem_type (file).s);

return 0; // no EOF block, that's fine
}
@@ -517,7 +529,7 @@ void bgzf_reread_uncompress_vb_as_prescribed (VBlockP vb, FILE *file)
STRl (bgzf_block, BGZF_MAX_BLOCK_SIZE);
int32_t ret = bgzf_read_block_raw (file, (uint8_t*)qSTRa(bgzf_block), txt_file->basename, false, HARD_FAIL, NULL);
ASSERT (ret != BGZF_ABRUBT_EOF, "Unexpected BGZF_ABRUBT_EOF while re-reading BGZF block in %s: filesystem=%s offset=%"PRIu64" uncomp_block_size=%u",
txt_name, arch_get_filesystem_type().s, offset, isize);
txt_name, arch_get_txt_filesystem().s, offset, isize);

bgzf_uncompress_one_prescribed_block (vb, STRa(bgzf_block), uncomp_block, isize, line->offset.bb_i);
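
The src/bgzf.c hunks above add a has_eof_block field to every recompression level. The diff does not show how the field is consumed; presumably it records whether a BGZF stream ends with the end-of-file marker. For reference, the SAM specification defines that marker as a fixed 28-byte empty BGZF block. The sketch below checks whether a buffer ends with it; ends_with_bgzf_eof and the BGZF_EOF_SKETCH names are hypothetical and not taken from genozip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BGZF_EOF_SKETCH_LEN 28

// The empty BGZF block that the SAM specification defines as the end-of-file marker
static const uint8_t BGZF_EOF_SKETCH[BGZF_EOF_SKETCH_LEN] = {
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00, 0x42, 0x43,
    0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };

// True if the last 28 bytes of data are exactly the BGZF EOF block.
static bool ends_with_bgzf_eof (const uint8_t *data, size_t len)
{
    assert (data || len == 0);

    return len >= BGZF_EOF_SKETCH_LEN &&
           !memcmp (data + len - BGZF_EOF_SKETCH_LEN, BGZF_EOF_SKETCH, BGZF_EOF_SKETCH_LEN);
}
```

A file that lacks this trailer is still decodable block by block, which is consistent with the "no EOF block, that's fine" comment in bgzf_read_block.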

26 changes: 13 additions & 13 deletions src/biopsy.c
@@ -7,10 +7,7 @@
// under penalties specified in the license.

#include <errno.h>
#include "genozip.h"
#include "buffer.h"
#include "strings.h"
#include "vblock.h"
#include "seg.h"
#include "file.h"
#include "biopsy.h"

@@ -78,27 +75,30 @@ void biopsy_take (VBlockP vb)
{
if (!flag.biopsy || !Ltxt) return;

ARRAY (int32_t, vb_i, biopsy_vb_i);

for (int i=0; i < vb_i_len; i++)
if (vb_i[i] == vb->vblock_i) {
memmove (&vb_i[i], &vb_i[i+1], sizeof(uint32_t) * (vb_i_len-(i+1))); // remove from list
biopsy_vb_i.count--;

for_buf2 (int32_t, vb_i, i, biopsy_vb_i)
if (*vb_i == vb->vblock_i) {
buf_remove (biopsy_vb_i, int32_t, i, 1);
goto start_biopsy;
}

else if (-vb_i[i]-1 == vb->comp_i)
else if (-vb_i[i]-1 == vb->comp_i) // vb_i is actually a comp_i
goto start_biopsy;


// always output the txt header except if --no-header
// case: this is txt_header but it is not explicitly on the list: biopsy it anyway, except if --no-header
if (vb->vblock_i == 0 && !flag.no_header)
goto start_biopsy;

return; // we were not requested to take a biopsy from this vb

start_biopsy: {
// modify, if needed, before taking biopsy
if (segconf.zip_txt_modified && DTP(zip_modify) && vb->vblock_i != 0 &&
!flag.make_reference && Ltxt) {
ctx_clone (vb);
zip_modify (vb);
}

DO_ONCE {
SNPRINTF (biopsy_fn, "%s", file_plain_ext_by_dt (txt_file->data_type));
file_remove (biopsy_fn.s, true); // remove old file
2 changes: 1 addition & 1 deletion src/bits.c
@@ -34,7 +34,7 @@ static uint64_t __inline windows_popcount (uint64_t w)
w = (uint64_t)(w * ((uint64_t)~(uint64_t)0/255)) >> (sizeof(uint64_t) - 1) * 8;
}

#define POPCOUNT(x) windows_popcountl(x)
#define POPCOUNT(x) windows_popcount(x)
#else
#define POPCOUNT(x) (unsigned)__builtin_popcountll(x)
#endif
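
The src/bits.c hunk above shows only the last step of the fallback popcount (the multiply by ~0/255 and the shift by 56 bits). For context, a complete bit-parallel (SWAR) population count of this general shape is sketched below, together with a compiler-selected macro. This is a generic textbook version, not a copy of genozip's windows_popcount; popcount64_swar, POPCOUNT_SKETCH and popcount_self_check are names invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

// Portable bit-parallel (SWAR) population count of a 64-bit word
static inline unsigned popcount64_swar (uint64_t w)
{
    w = w - ((w >> 1) & 0x5555555555555555ULL);                           // sums of each 2 bits
    w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL); // sums of each 4 bits
    w = (w + (w >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           // sums of each 8 bits
    return (unsigned)((w * 0x0101010101010101ULL) >> 56);                 // add up the 8 byte sums
}

#if defined(__GNUC__) || defined(__clang__)
#define POPCOUNT_SKETCH(x) ((unsigned)__builtin_popcountll (x))
#else
#define POPCOUNT_SKETCH(x) popcount64_swar (x)
#endif

// Small usage example: both paths must agree on these inputs
static inline void popcount_self_check (void)
{
    assert (popcount64_swar (0) == 0);
    assert (popcount64_swar (~(uint64_t)0) == 64);
    assert (POPCOUNT_SKETCH (0x8000000000000001ULL) == 2);
}
```

The constant 0x0101010101010101 equals (uint64_t)~(uint64_t)0/255, and 56 equals (sizeof(uint64_t) - 1) * 8, which is how the final step is spelled in the hunk.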
2 changes: 1 addition & 1 deletion src/buf_list.c
@@ -729,7 +729,7 @@ void buflist_test_overflows_all_other_vb (VBlockP caller_vb, rom msg, bool force
fprintf (stderr, "\nTesting all other VBs (WARNING: NOT thread safe - might segfault; activated by certain flags (see code)):\n");
bool corruption_detected = false;
for (VBlockPoolType type=POOL_MAIN; type <= POOL_BGZF; type++) {
VBlockPool *vb_pool = vb_get_pool (type, SOFT_FAIL);
VBlockPoolP vb_pool = vb_get_pool (type, SOFT_FAIL);
if (!vb_pool) continue;

for (VBID vb_id=-1; vb_id < (int)vb_pool->num_vbs; vb_id++) {
Expand Down
2 changes: 1 addition & 1 deletion src/codec.c
@@ -276,7 +276,7 @@ Codec codec_assign_best_codec (VBlockP vb,
else {
LocalGetLineCB *callback = (ST(LOCAL) && !data_override && !ctx->no_callback) ? zip_get_local_data_callback (vb->data_type, ctx) : NULL;

zfile_compress_section_data_ex (vb, ctx, SEC_RANDOM_ACCESS/*a secion with SectionType header*/, callback ? NULL : data, callback, data->len, *selected_codec, SECTION_FLAGS_NONE,
zfile_compress_section_data_ex (vb, ctx, SEC_RANDOM_ACCESS/*a section with SectionType header*/, callback ? NULL : data, callback, data->len, *selected_codec, SECTION_FLAGS_NONE,
"codec_assign_best_codec");
tests[t].size = vb->z_data_test.len;
vb->z_data_test.len = 0;