diff --git a/README.md b/README.md index eee084f..847d4d6 100644 --- a/README.md +++ b/README.md @@ -10,10 +10,12 @@ See the [Release page](https://github.com/circulosmeos/gztool/releases) for exec Considerations ============== -* Please, note that the initial index creation still consumes as much time as a complete file decompression. +* Please, note that the initial complete index creation still consumes as much time as a complete file decompression. Once created the index will reduce this time. -Nonetheless, note that **`gztool` can monitor a growing gzip file** (for example, a log created by rsyslog directly in gzip format) and generate the index on-the-fly, thus reducing in the practice to zero the time of index creation. See the `-S` (*Supervise*) option. +Nonetheless, note that `gztool` **creates index interleaved with extraction of data**, so in the practice there's no waste of time. Note that if extraction of data or just index creation are stopped at any moment, `gztool` will reuse the remaining index on the next run over the same data, so time consumption is always minimized. + +Also **`gztool` can monitor a growing gzip file** (for example, a log created by rsyslog directly in gzip format) and generate the index on-the-fly, thus reducing in the practice to zero the time of index creation. See the `-S` (*Supervise*) option. * Index size is approximately 1% or less of compressed gzip file. The bigger the gzip usually the better the proportion. @@ -30,13 +32,15 @@ Nonetheless Mark Adler, the author of [zlib](https://github.com/madler/zlib), pr Also, some optimizations has been made: * **`gztool` can *Supervise* an still-growing gzip file** (for example, a log created by rsyslog directly in gzip format) and generate the index on-the-fly, thus reducing in the practice to zero the time of index creation. See `-S`. +* extraction of data and index creation are interleaved, so there's no waste of time for the index creation. 
+* **index files are reusable**, so they can be stopped at any time and reused and/or completed later. * an *ex novo* index file format has been created to store the index * span between index points is raised by default from 1 to 10 MiB, and can be adjusted with `-s` (*span*). * windows are compressed in file * windows are not loaded in memory unless they're needed, so the app memory footprint is fairly low. * data can be provided from/to stdin/stdout -More functionality is planned: with v0.3 *index files will be reusable*, so they can be stopped at any time and reused and/or completed later. +More functionality is planned. Compilation =========== @@ -65,61 +69,83 @@ Copy gztool.c to the directory where you compiled zlib, and do: Usage ===== - $ gztool [-b #] [-cdefhilsS] [-I ] ... - - -b #: extract data from indicated byte position number - of gzip file, using index - -c: raw-gzip-compress indicated file to STDOUT - -d: raw-gzip-decompress indicated file to STDOUT - -e: if multiple files are indicated, continue on error - -f: with `-i` force index overwriting if one exists - with `-b` force index creation if none exists + gztool (v0.3.14) + GZIP files indexer and data retriever. + Create small indexes for gzipped files and use them + for quick and random positioned data extraction. + No more waiting when the end of a 10 GiB gzip is needed! + //github.com/circulosmeos/gztool (by Roberto S. Galende) + + $ gztool [-b #] [-s #] [-v #] [-cdefFhilStT] [-I ] ... + + Note that actions `-bStT` proceed to an index file creation (if + none exists) INTERLEAVED with data extraction. As extraction and + index creation occur at the same time there's no waste of time. + Also you can interrupt actions at any moment and the remaining + index file will be reused (and completed if necessary) on the + next gztool run over the same data. + + -b #: extract data from indicated uncompressed byte position of + gzip file (creating or reusing an index file) to STDOUT. 
+ -c: utility: raw-gzip-compress indicated file to STDOUT + -d: utility: raw-gzip-decompress indicated file to STDOUT + -e: if multiple files are indicated, continue on error (if any) + -f: force index overwriting from scratch, if one exists + -F: force index creation/completion first, and then action: if + `-F` is not used, index is created interleaved with actions. -h: print this help -i: create index for indicated gzip file (For 'file.gz' - the default index file name will be 'file.gzi') + the default index file name will be 'file.gzi'). -I INDEX: index file name will be 'INDEX' - -l: list info contained in indicated index file - -s #: span in MiB between index points. By default is 10. - -S: supervise indicated file: create a growing index, - for a still-growing gzip file. (`-i` is implicit). - -Please, **note that STDOUT is used for data extraction** with `-bcd` modifiers. + -l: check and list info contained in indicated index file + -s #: span in uncompressed MiB between index points when + creating the index. By default is `10`. + -S: Supervise indicated file: create a growing index, + for a still-growing gzip file. (`-i` is implicit). + -t: tail (extract last bytes) to STDOUT on indicated gzip file + -T: tail (extract last bytes) to STDOUT on indicated still-growing + gzip file, and continue Supervising & extracting to STDOUT. + -v #: output verbosity: from `0` (none) to `3` (maniac) + Default is `1` (normal). + + Example: Extract data from 1000000000 byte (1 GB) on, + from `myfile.gz` to the file `myfile.txt`. Also gztool will + create (or reuse, or complete) an index file named `myfile.gzi`: + $ gztool -b 1000000000 myfile.gz > myfile.txt + +Please, **note that STDOUT is used for data extraction** with `-bcdtT` modifiers. When using `S` (*Supervise*), the gzipped file may not yet exist when the command is executed, but it will wait patiently for its creation. Examples of use =============== -Make an index for test.gz. 
The index will be named test.gzi: +Make an index for `test.gz`. The index will be named `test.gzi`: $ gztool -i test.gz -Make an index for test.gz with name 'test.index' +Make an index for `test.gz` with name `test.index`: $ gztool -I test.index test.gz -Retrieve data from uncompressed byte position 1000000 inside test.gz: +Retrieve data from uncompressed byte position 1000000 inside test.gz. Index file will be created **at the same time** (named `test.gzi`): $ gztool -b 1000000 test.gz -In this latter case, if index hasn't yet been created the program will complain and stop. But index creation can be `forced` if it does not exist yet: - - $ gztool -fb 1000000 test.gz - **Supervise an still-growing gzip file and generate the index for it on-the-fly**. The index file name will be `openldap.log.gzi` in this case: $ gztool -S openldap.log.gz -Creating and index for all "\*gz" files in a directory. If `-e` were not used the process would stop on first file as an index for it already exist - `-e` continues processing next file regardless of previous errors. +Creating and index for all "\*gz" files in a directory. + + $ gztool -i *gz - $ gztool -ie *gz + ACTION: Create index - Index file 'data.1.tar.gz.gzi' already exists. - Index file 'data.2.tar.gz.gzi' already exists. - Index file 'data_project.0.tar.gz.gzi' already exists. - Processing 'data_project.1.tar.gz' ... - Built index with 129 access points. - Index written to 'data_project.1.tar.gz.gzi'. + Index file 'data.gzi' already exists and will be used. + (Use `-f` to force overwriting.) + Processing 'data.gz' ... + Index already complete. Nothing to do. Processing 'data_project.2.tar.gz' ... Built index with 73 access points. @@ -129,17 +155,17 @@ Creating and index for all "\*gz" files in a directory. If `-e` were not used th Built index with 3 access points. Index written to 'project_2.gz.gzi'. 
-Extract data from project.gz byte 25600000 to STDOUT, creating index if necessary (`-f`), and use `grep` on this output: +Extract data from `project.gz` byte 25600000 to STDOUT, and use `grep` on this output. Index file name will be `project.gzi`: - $ gztool -fb 25600000 project.gz | grep -i "balance = " + $ gztool -b 25600000 project.gz | grep -i "balance = " -Please, note that STDOUT is used for data extraction with `-bcd` modifiers, so an explicit command line redirection is needed if output is to be stored in a file: +Please, note that STDOUT is used for data extraction with `-bcdtT` modifiers, so an explicit command line redirection is needed if output is to be stored in a file: - $ gztool -fb 99900000 project.gz > uncompressed.data + $ gztool -b 99900000 project.gz > uncompressed.data -Show internals of all index files in this directory: +Show internals of all index files in this directory. `-e` is used not to stop the process if a `*.gzi` file is not a valid gzip index file: - $ gztool -l *.gzi + $ gztool -v2 -el *.gzi Checking index file 'accounting.gz.gzi' ... Number of index points: 73 @@ -193,7 +219,7 @@ Other tools which try to provide random access to gzipped files Version ======= -This version is **v0.2**. +This version is **v0.3.14**. Please, read the *Disclaimer*. This is still a beta release. In case of any errors, please open an *Issue*. diff --git a/gztool.c b/gztool.c index b411fd2..bf4162c 100644 --- a/gztool.c +++ b/gztool.c @@ -13,7 +13,7 @@ // // LICENSE: // -// v0.1, v0.2 by Roberto S. Galende, 2019-06 +// v0.1, v0.2, v0.3.* by Roberto S. Galende, 2019 // //github.com/circulosmeos/gztool // A work by Roberto S. 
Galende // distributed under the same License terms covering @@ -125,6 +125,7 @@ #include // uint32_t, uint64_t, UINT32_MAX #include +#include // va_start, va_list, va_end #include #include #include @@ -164,7 +165,7 @@ struct point { uint32_t window_size; /* size of (compressed) window */ unsigned char *window; /* preceding 32K of uncompressed data, compressed */ }; -// NOTE: window_beginning is not stored on disk, is an on-memory-only value +// NOTE: window_beginning is not stored on disk, it's an on-memory-only value /* access point list */ struct access { @@ -173,8 +174,9 @@ struct access { uint64_t file_size; /* size of uncompressed file (useful for bgzip files) */ struct point *list; /* allocated list */ unsigned char *file_name; /* path to index file */ + int index_complete; /* 1: index is complete; 0: index is (still) incomplete */ }; -// NOTE: file_name is not stored on disk, is an on-memory-only value +// NOTE: file_name and index_complete are not stored on disk (on-memory-only values) /* generic struct to return a function error code and a value */ struct returned_output { @@ -184,7 +186,31 @@ struct returned_output { enum EXIT_APP_VALUES { EXIT_OK = 0, EXIT_GENERIC_ERROR = 1, EXIT_INVALID_OPTION = 2 }; -enum SUPERVISE_OPTIONS { SUPERVISE_DONT = 0, SUPERVISE_DO = 1 }; +enum INDEX_AND_EXTRACTION_OPTIONS { JUST_CREATE_INDEX, SUPERVISE_DO, SUPERVISE_DO_AND_EXTRACT_FROM_TAIL, EXTRACT_FROM_BYTE, EXTRACT_TAIL }; + +enum ACTION + { ACT_NOT_SET, ACT_EXTRACT_FROM_BYTE, ACT_COMPRESS_CHUNK, ACT_DECOMPRESS_CHUNK, + ACT_CREATE_INDEX, ACT_LIST_INFO, ACT_HELP, ACT_SUPERVISE, ACT_EXTRACT_TAIL, + ACT_EXTRACT_TAIL_AND_CONTINUE }; + +enum VERBOSITY_LEVEL { VERBOSITY_NONE = 0, VERBOSITY_NORMAL = 1, VERBOSITY_EXCESSIVE = 2, VERBOSITY_MANIAC = 3 }; + +enum VERBOSITY_LEVEL verbosity_level = VERBOSITY_NORMAL; + + +// `fprintf` substitute for printing with VERBOSITY_LEVEL +void printToStderr ( enum VERBOSITY_LEVEL verbosity, const char * format, ... 
) { + + // if verbosity of message is above general verbosity_level, ignore message + if ( verbosity <= verbosity_level ) { + va_list args; + va_start (args, format); + vfprintf ( stderr, format, args ); + va_end (args); + } + +} + /************** * Endianness * @@ -463,7 +489,7 @@ local unsigned char *decompress_chunk(unsigned char *source, uint64_t *size) free(in); free(out); if (ret != Z_OK && ret != Z_STREAM_END) { - fprintf(stderr, "Decompression of index' chunk terminated with error (%d).\n", ret); + printToStderr( VERBOSITY_NORMAL, "Decompression of index' chunk terminated with error (%d).\n", ret); } // return size of returned char array in size pointer parameter *size = output_size; @@ -534,6 +560,7 @@ local struct access *create_empty_index() } index->size = 8; index->have = 0; + index->index_complete = 0; return index; } @@ -551,7 +578,8 @@ local struct access *create_empty_index() // or uncompressed with size WINSIZE, // or store an empty window (NULL) because it resides on file. // uint32_t window_size : 0: compress passed window of size WINSIZE -// >0: store window, of size window_size, as it is, in point structure +// >0 & NULL != window: store window, of size window_size, as it is, in point structure +// >0 & NULL == window: ->window=NULL : this marks a window of size window_size that resides on file // OUTPUT: // pointer to (new) index (NULL on error) local struct access *addpoint(struct access *index, uint32_t bits, @@ -595,18 +623,19 @@ local struct access *addpoint(struct access *index, uint32_t bits, // compress window compressed_chunk = compress_chunk(next->window, &size, Z_DEFAULT_COMPRESSION); if (compressed_chunk == NULL) { - fprintf(stderr, "Error whilst compressing index chunk\nProcess aborted\n."); + printToStderr( VERBOSITY_NORMAL, "Error whilst compressing index chunk\nProcess aborted\n." 
); return NULL; } free(next->window); next->window = compressed_chunk; /* uint64_t size and uint32_t window_size, but windows are small, so this will always fit */ next->window_size = size; + printToStderr( VERBOSITY_EXCESSIVE, "\t[%ld/%ld] window_size = %d\n", index->have, index->size, next->window_size); } else { if ( window == NULL ) { // create a NULL window: it resides on file, // and can/will later loaded on memory - next->window_size = 0; + next->window_size = window_size; next->window = NULL; } else { // passed window is already compressed: store as it is with size "window_size" @@ -630,12 +659,14 @@ local struct access *addpoint(struct access *index, uint32_t bits, // INPUT: // FILE *output_file : output stream // struct access *index : pointer to index -// uint64_t index_last_written_point : last index point already written to file +// uint64_t index_last_written_point : last index point already written to file: its values +// go from 1 to index->have, so that 0 has special value "None". 
// OUTPUT: // 0 on error, 1 on success int serialize_index_to_file( FILE *output_file, struct access *index, uint64_t index_last_written_point ) { - struct point *here; + struct point *here = NULL; uint64_t temp; + uint64_t offset; int i; ///* access point entry */ @@ -664,7 +695,9 @@ int serialize_index_to_file( FILE *output_file, struct access *index, uint64_t i /* writing and empy index is allowed: writes the header (of size 4*8 = 32 bytes) */ if ( index_last_written_point == 0 ) { + /* write header */ + fseeko( output_file, 0, SEEK_SET); /* 0x0 8 bytes (to be compatible with .gzi for bgzip format: */ /* the initial uint32_t is the number of bgzip-idx registers) */ temp = 0; @@ -682,18 +715,39 @@ int serialize_index_to_file( FILE *output_file, struct access *index, uint64_t i fwrite_endian(&temp, sizeof(temp), output_file); // have temp = UINT64_MAX; fwrite_endian(&temp, sizeof(temp), output_file); // size + } + // fseek to index position of index_last_written_point + offset = 4*sizeof(temp); + for (i = 0; i < index_last_written_point; i++) { + here = &(index->list[i]); + offset += sizeof(here->out) + sizeof(here->in) + + sizeof(here->bits) + sizeof(here->window_size) + + ((here->window_size==UNCOMPRESSED_WINDOW)? 
WINSIZE: (here->window_size)); + } + fseeko( output_file, offset, SEEK_SET); + printToStderr( VERBOSITY_MANIAC, "index_last_written_point = %ld\n", index_last_written_point ); + if (NULL!=here) { + printToStderr( VERBOSITY_MANIAC, "%d->window_size = %d\n", i, here->window_size ); + } + printToStderr( VERBOSITY_MANIAC, "offset = %ld\n", offset ); + if ( index_last_written_point != index->have ) { for (i = index_last_written_point; i < index->have; i++) { here = &(index->list[i]); fwrite_endian(&(here->out), sizeof(here->out), output_file); fwrite_endian(&(here->in), sizeof(here->in), output_file); fwrite_endian(&(here->bits), sizeof(here->bits), output_file); - fwrite_endian(&(here->window_size), sizeof(here->window_size), output_file); + if ( here->window_size==UNCOMPRESSED_WINDOW ) { + temp = WINSIZE; + fwrite_endian(&(temp), sizeof(here->window_size), output_file); + } else { + fwrite_endian(&(here->window_size), sizeof(here->window_size), output_file); + } here->window_beginning = ftello(output_file); if (NULL == here->window) { - fprintf(stderr, "Index incomplete! - index writing aborted.\n"); + printToStderr( VERBOSITY_NORMAL, "Index incomplete! - index writing aborted.\n" ); return 0; } else { fwrite(here->window, here->window_size, 1, output_file); @@ -722,81 +776,455 @@ int serialize_index_to_file( FILE *output_file, struct access *index, uint64_t i } -/* Make one entire pass through the compressed stream and build an index, with - access points about every span bytes of uncompressed output -- span is - chosen to balance the speed of random access against the memory requirements - of the list, about 32K bytes per access point. Note that data after the end - of the first zlib or gzip stream in the file is ignored. build_index() - returns the number of access points on success (>= 1), Z_MEM_ERROR for out - of memory, Z_DATA_ERROR for an error in the input file, or Z_ERRNO for a - file read error. On success, *built points to the resulting index. 
*/ +/* Basic checks of existing index file: + - Checks that last index point ->in isn't greater than gzip file size +*/ // INPUT: -// FILE *in : input stream -// off_t span : span -// struct access **built: address of index pointer, equivalent to passed by reference -// enum SUPERVISE_OPTIONS supervise = SUPERVISE_DONT: usual behaviour -// = SUPERVISE_DO : supervise a growing "in" gzip stream -// unsigned char *index_filename : in case SUPERVISE_DO, index will be written on-the-fly +// struct access *index : pointer to index. Can be NULL => no check. +// unsigned char *file_name : gzip file name. Can be NULL or "" => no check. +// unsigned char *index_filename: index file name. Must be != NULL, but can be "". Only used to print warning. +// OUTPUT: +// 0 on error, 1 on success +int check_index_file( struct access *index, unsigned char *file_name, unsigned char *index_filename ) { + + if ( NULL != file_name && + strlen( file_name ) > 0 ) { + if ( NULL != index ) { + // size of input file + struct stat st; + stat( file_name, &st ); + printToStderr( VERBOSITY_EXCESSIVE, "(%ld >= %ld)\n", st.st_size, ( index->list[index->have - 1].in ) ); + if ( index->have > 1 && + st.st_size < ( index->list[index->have - 1].in ) + ) { + printToStderr( VERBOSITY_NORMAL, "WARNING: Index file '%s' corresponds to a file bigger than '%s'\n", + index_filename, file_name ); + return 0; + } + } + } + + return 1; + +} + + +// Creates index for a gzip stream (file or STDIN); +// This function is not called from action_create_index() if an index file +// already exists and it is complete. +// If an incomplete index is passed, it will be completed from the last +// available index point so the whole gzip stream is not processed again. 
+// Original (zran.c) comments: + /* Make one entire pass through the compressed stream and build an index, with + access points about every span bytes of uncompressed output -- span is + chosen to balance the speed of random access against the memory requirements + of the list, about 32K bytes per access point. Note that data after the end + of the first zlib or gzip stream in the file is ignored. build_index() + returns the number of access points on success (>= 1), Z_MEM_ERROR for out + of memory, Z_DATA_ERROR for an error in the input file, or Z_ERRNO for a + file read error. On success, *built points to the resulting index. */ +// INPUT: +// FILE *in : input stream +// unsigned char *file_name : name of the input file associated with FILE *in. +// Can be "" (no file name: stdin used), but not NULL. +// Used only if there's no usable index && input (FILE *in) +// is associated with a file (not stdin) && +// indx_n_extraction_opts == *_TAIL, for the use of the file +// size as approximation of the size of the tail to be output. +// off_t span : span +// struct access **built: address of index pointer, equivalent to passed by reference. +// Note that index may be received with some (all) points already set +// from caller, if an index file was already available - and so this +// function must use it or create new points from the last available one, +// if needed. +// enum INDEX_AND_EXTRACTION_OPTIONS indx_n_extraction_opts: +// = JUST_CREATE_INDEX: usual behaviour +// = SUPERVISE_DO : supervise a growing "in" gzip stream +// = SUPERVISE_DO_AND_EXTRACT_FROM_TAIL: like SUPERVISE_DO but +// this will also extract data to stdout, starting from +// the last available bytes (tail) on gzip when called. +// = EXTRACT_FROM_BYTE: extract from indicated offset, to stdout +// off_t offset : if indx_n_extraction_opts == EXTRACT_FROM_BYTE, this is the offset byte in +// in the uncompressed stream from which to extract to stdout. +// 0 otherwise. 
+// unsigned char *index_filename : in case of SUPERVISE_DO, index will be written on-the-fly // to this index file name. // OUTPUT: // struct returned_output: contains two values: // .error: Z_* error code or Z_OK if everything was ok -// .value: size of built index (index->size) +// .value: size of built index (index->have) local struct returned_output build_index( - FILE *in, off_t span, struct access **built, - enum SUPERVISE_OPTIONS supervise, unsigned char *index_filename ) + FILE *in, unsigned char *file_name, off_t span, struct access **built, + enum INDEX_AND_EXTRACTION_OPTIONS indx_n_extraction_opts, off_t offset, + unsigned char *index_filename ) { struct returned_output ret; - off_t totin, totout; /* our own total counters to avoid 4GB limit */ + off_t totin = 0; /* our own total counters to avoid 4GB limit */ + off_t totout = 0; /* our own total counters to avoid 4GB limit */ off_t last; /* totout value of last access point */ - struct access *index; /* access points being generated */ + off_t offset_in; + off_t avail_in_0; /* because strm.avail_in may not exhausts every cycle! */ + off_t avail_out_0; /* because strm.avail_out may not exhausts every cycle! */ + struct access *index = NULL;/* access points being generated */ + struct point *here = NULL; + uint64_t actual_index_point = 0; // only set initially to >0 if NULL != *built + uint64_t output_data_counter = 0;// counts uncompressed bytes output + unsigned char *decompressed_window; z_stream strm; FILE *index_file = NULL; size_t index_last_written_point = 0; - unsigned char input[CHUNK]; // TODO: convert to malloc + int continue_extraction = 0;/* if = 1 when to inconditionally extract data */ + int start_extraction_on_first_depletion = 0; // 0: extract - no depletion interaction. + // 1: start extraction on first depletion. 
+ unsigned char input[CHUNK]; // TODO: convert to malloc unsigned char window[WINSIZE]; // TODO: convert to malloc + unsigned char window2[WINSIZE];// TODO: convert to malloc + uint64_t window2_size; // size of data stored in window2 buffer ret.value = 0; ret.error = Z_OK; - /* initialize inflate */ - strm.zalloc = Z_NULL; - strm.zfree = Z_NULL; - strm.opaque = Z_NULL; - strm.avail_in = 0; - strm.next_in = Z_NULL; - ret.error = inflateInit2(&strm, 47); /* automatic zlib or gzip decoding */ - if (ret.error != Z_OK) - return ret; + // previous condition: if passed index is complete end processing + // if indx_n_extraction_opts allows it. + if ( NULL != (*built) && + // if index->have == 0 index is superfluous + (*built)->have > 0 && + (*built)->index_complete == 1 ) { + if ( indx_n_extraction_opts == SUPERVISE_DO || + indx_n_extraction_opts == JUST_CREATE_INDEX ) { + printToStderr( VERBOSITY_NORMAL, "Index already complete. Nothing to do.\n" ); + ret.value = (*built)->have; + return ret; + } else { + printToStderr( VERBOSITY_NORMAL, "Index already complete - using it.\n" ); + } + } else { + printToStderr( VERBOSITY_NORMAL, "Processing index ...\n" ); + } - /* open index_filename for binary writing */ - // write index to index file: - if ( strlen(index_filename) > 0 ) { - index_file = fopen( index_filename, "wb" ); + /* open index_filename for binary reading & writing */ + if ( strlen(index_filename) > 0 && + ( NULL == index || index->index_complete == 0 ) + ) { + if ( access( index_filename, F_OK ) != -1 ) { + // index_filename already exist: + // "r+": Open a file for update (both for input and output). The file must exist. + // r+, because the index may be incomplete, and so build_index() will + // append new data and complete it (->have & ->size to correct values, not 0x0..0, 0xf..f). 
+ index_file = fopen( index_filename, "r+b" ); + } else { + // index_filename does not exist: + index_file = fopen( index_filename, "w+b" ); + } } else { + // restrictions to not collide index output with data output to stdout + // MUST have been made on caller. SET_BINARY_MODE(STDOUT); // sets binary mode for stdout in Windows index_file = stdout; } if ( NULL == index_file ) { - fprintf( stderr, "Could not write index to file '%s'.\n", index_filename ); + printToStderr( VERBOSITY_NORMAL, "Could not write index to file '%s'.\n", index_filename ); goto build_index_error; } /* inflate the input, maintain a sliding window, and build an index -- this also validates the integrity of the compressed data using the check information at the end of the gzip or zlib stream */ - totin = totout = last = 0; - index = NULL; /* will be allocated by first addpoint() */ + + // if and index is already passed, use it: + if ( NULL != (*built) && + (*built)->have > 0 ) { + // NULL != *built (there is a previous index available: use it!) 
+ // if index->have == 0 index is superfluous + + index = *built; + /* initialize file and inflate state to start there */ + strm.zalloc = Z_NULL; + strm.zfree = Z_NULL; + strm.opaque = Z_NULL; + strm.avail_in = 0; + strm.next_in = Z_NULL; + ret.error = inflateInit2(&strm, -15); /* raw inflate */ + if (ret.error != Z_OK) + return ret; + + index_last_written_point = index->have; + + // + // Select an index point to start, depending on indx_n_extraction_opts + // + if ( indx_n_extraction_opts == SUPERVISE_DO || + indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL || + indx_n_extraction_opts == JUST_CREATE_INDEX || + indx_n_extraction_opts == EXTRACT_TAIL + ) { + // move to last available index point, and continue from it + actual_index_point = index->have - 1; + // this index must be completed from last point: index->list[index->have-1] + totin = index->list[ actual_index_point ].in; + totout = index->list[ actual_index_point ].out; + here = &(index->list[ actual_index_point ]); + } + + if ( indx_n_extraction_opts == EXTRACT_FROM_BYTE ) { + // move to the point needed for positioning on offset, or + // move to last available point if offset can't be reached + // with actually available index + here = index->list; + actual_index_point = 0; + while ( + ++actual_index_point && + actual_index_point < index->have && + here[1].out <= offset + ) + here++; + actual_index_point--; + totin = index->list[ actual_index_point ].in; + totout = index->list[ actual_index_point ].out; + continue_extraction = 1; + // offset value comes from caller as parameter + } + + if ( index->index_complete == 1 ) { + if ( indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL || + indx_n_extraction_opts == EXTRACT_TAIL ) { + + offset = ( index->file_size - totout ) /4*3; + indx_n_extraction_opts = EXTRACT_FROM_BYTE; + continue_extraction = 1; + + } + } + + assert( NULL != here ); + + // fseek in data for correct position + // using here index data: + if ( stdin == in ) { + // read 
input until here->in - (here->bits ? 1 : 0) + uint64_t pos = 0; + uint64_t position = here->in - (here->bits ? 1 : 0); + ret.error = 0; + while ( pos < position ) { + if ( !fread(input, 1, (pos+CHUNK < position)? CHUNK: (position - pos), in) ) { + ret.error = -1; + break; + } + pos += CHUNK; + } + } else { + ret.error = fseeko(in, here->in - (here->bits ? 1 : 0), SEEK_SET); + } + if (ret.error == -1) + goto build_index_error; + if (here->bits) { + int i; + i = getc(in); + if (i == -1) { + ret.error = ferror(in) ? Z_ERRNO : Z_DATA_ERROR; + goto build_index_error; + } + (void)inflatePrime(&strm, here->bits, i >> (8 - here->bits)); + } + + // obtain window and initialize with it zlib's Dictionary + if (here->window == NULL && here->window_beginning != 0) { + /* index' window data is not on memory, + but we have position and size on index file, so we load it now */ + FILE *index_file; + if ( index->file_name == NULL || + strlen(index->file_name) == 0 ) { + printToStderr( VERBOSITY_NORMAL, "Error while opening index file.\nAborted.\n" ); + ret.error = Z_ERRNO; + goto build_index_error; + } + if (NULL == (index_file = fopen(index->file_name, "rb")) || + 0 != fseeko(index_file, here->window_beginning, SEEK_SET) + ) { + printToStderr( VERBOSITY_NORMAL, "Error while opening index file.\nAborted.\n" ); + ret.error = Z_ERRNO; + goto build_index_error; + } + // here->window_beginning = 0; // this is not needed + if ( NULL == (here->window = malloc(here->window_size)) || + !fread(here->window, here->window_size, 1, index_file) + ) { + printToStderr( VERBOSITY_NORMAL, "Error while reading index file.\nAborted.\n" ); + ret.error = Z_ERRNO; + goto build_index_error; + } + fclose(index_file); + } + + if (here->window_size != UNCOMPRESSED_WINDOW) { + /* decompress() use uint64_t counters, but index->list->window_size is smaller */ + uint64_t window_size = here->window_size; + /* window is compressed on memory, so decompress it */ + decompressed_window = 
decompress_chunk(here->window, &window_size); + // In order to avoid deleting the on-memory here->window_size, that may + // be needed later if index must be increased and written disk (fseeko): + (void)inflateSetDictionary(&strm, decompressed_window, window_size); // (window_size must be WINSIZE) + /*free(here->window); + here->window = decompressed_window; + here->window_size = UNCOMPRESSED_WINDOW; // uncompressed WINSIZE next->window*/ + } else { + (void)inflateSetDictionary(&strm, here->window, WINSIZE); + } + + } // end if ( NULL != *built && (*built)->have > 0 ) { + + + // more decisions for extracting uncompressed data + if ( ( NULL != (*built) && stdin == in ) || + NULL == (*built) || + ( NULL != (*built) && (*built)->index_complete == 0 ) + ) { + // index available and stdin is used as gzip data input, + // or no index is available, + // or index exists but it is incomplete. + + if ( stdin == in ) { + // stdin is used as input for gzip data + + if ( indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL ) { + start_extraction_on_first_depletion = 1; + // continue_extraction = 1; on depletion + } + if ( indx_n_extraction_opts == EXTRACT_TAIL ) { + start_extraction_on_first_depletion = 1; + // continue_extraction = 0; on depletion + } + + } else { + // there's a gzip filename + + if ( indx_n_extraction_opts == EXTRACT_TAIL || + indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL ) { + // set offset_in (equivalent to offset but for ->in values) + // to the last CHUNK of gzip data ... this can be a good tail... 
+ struct stat st; + if ( strlen( file_name ) > 0 ) { + stat(file_name, &st); + if ( st.st_size > 0 ) { + continue_extraction = 1; + if ( st.st_size <= CHUNK ) { + // gzip file is really small: + // change operation mode to extract from byte 0 + offset_in = 0; + } else { + offset_in = st.st_size - CHUNK; + } + printToStderr( VERBOSITY_MANIAC, "offset_in=%ld\n", offset_in ); + } else { + start_extraction_on_first_depletion = 1; + } + } else { + start_extraction_on_first_depletion = 1; + } + /*if ( indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL ) + continue_extraction = 1; + else + continue_extraction = 0;*/ // both set later + } + + } // end if ( stdin == in ) + + } // end if ( ( NULL != *built && stdin == in ) || + // NULL == *built ) || + // ( NULL != (*built) && (*built)->index_complete == 0 ) ) + + + // decrement offset_in and offset by actual position: + if ( offset_in > 0 && + NULL != here ) { + if ( here->in > offset_in ) + offset_in = 0; + else + offset_in -= here->in; + } + if ( offset > 0 && + NULL != here ) { + if ( here->out > offset ) + offset = 0; + else + offset -= here->out; + } + + + // default zlib initialization + // when no index entry points has been found: + if ( NULL == (*built) || + NULL == here ) { + // NULL != *built (there is no previous index available: build it from scratch) + + /* initialize inflate */ + strm.zalloc = Z_NULL; + strm.zfree = Z_NULL; + strm.opaque = Z_NULL; + strm.avail_in = 0; + strm.next_in = Z_NULL; + ret.error = inflateInit2(&strm, 47); /* automatic zlib or gzip decoding (15 + automatic header detection) */ + printToStderr( VERBOSITY_MANIAC, "ret.error = %d\n", ret.error ); + if (ret.error != Z_OK) + return ret; + totin = totout = last = 0; + index = NULL; /* will be allocated by first addpoint() */ + } + strm.avail_out = 0; do { /* get some compressed data from input file */ + strm.avail_in = fread(input, 1, CHUNK, in); - if ( supervise == SUPERVISE_DO && + + avail_in_0 = strm.avail_in; + + 
printToStderr( VERBOSITY_MANIAC, "totin=%ld,totout=%ld,ftello=%ld,avail_in=%d\n", totin, totout, ftello(in), strm.avail_in ); + + if ( (indx_n_extraction_opts == SUPERVISE_DO || + indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL) && strm.avail_in == 0 ) { + + // check conditions to start output of uncompressed data + printToStderr( VERBOSITY_MANIAC, ">>> %d, %d, %d", + continue_extraction, start_extraction_on_first_depletion, indx_n_extraction_opts ); + if ( start_extraction_on_first_depletion == 1 ) { + start_extraction_on_first_depletion = 0; + + // output uncompressed data + unsigned have = WINSIZE - strm.avail_out; + output_data_counter += have; + if (fwrite(strm.next_out, 1, have, stdout) != have || ferror(stdout)) { + (void)inflateEnd(&strm); + ret.error = Z_ERRNO; + goto build_index_error; + } + fflush(stdout); + + if ( continue_extraction == 0 ) { + // the process ends here as all required data has been output + // (index remains incomplete) + ret.error = Z_OK; + if ( NULL != index ) { + ret.value = index->have; + } + goto build_index_error; + } + + // continue extracting data as usual, + offset = 0; + offset_in = 0; + // though as indx_n_extraction_opts != EXTRACT_FROM_BYTE it'll + // patiently waits if data exhausts. 
+ + } + // sleep and retry sleep( WAITING_TIME ); continue; + } + if (ferror(in)) { ret.error = Z_ERRNO; goto build_index_error; @@ -807,12 +1235,13 @@ local struct returned_output build_index( } strm.next_in = input; - /* process all of that, or until end of stream */ + /* process all of strm.next_in (size strm.avail_in), or until end of stream */ do { /* reset sliding window if necessary */ if (strm.avail_out == 0) { strm.avail_out = WINSIZE; strm.next_out = window; + avail_out_0 = strm.avail_out; } /* inflate until out of input, output, or at end of block -- @@ -824,11 +1253,78 @@ local struct returned_output build_index( totout -= strm.avail_out; if (ret.error == Z_NEED_DICT) ret.error = Z_DATA_ERROR; - if (ret.error == Z_MEM_ERROR || ret.error == Z_DATA_ERROR) + if (ret.error == Z_MEM_ERROR || ret.error == Z_DATA_ERROR) { + printToStderr( VERBOSITY_EXCESSIVE, "ERR totin=%ld, totout=%ld, ftello=%ld\n", totin, totout, ftello(in) ); goto build_index_error; + } if (ret.error == Z_STREAM_END) break; + // maintain a backup window for the case of sudden Z_STREAM_END + // and indx_n_extraction_opts == *_TAIL + if ( ( NULL == index || index->index_complete == 0 ) && + ( indx_n_extraction_opts == EXTRACT_TAIL || + indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL ) ) { + window2_size = WINSIZE - strm.avail_out; + memcpy( window2, window, window2_size ); + // TODO: change to pointer flip at the end of loop + } + + // + // if required by passed indx_n_extraction_opts option, extract to stdout: + // + // EXTRACT_FROM_BYTE: extract all: + if ( indx_n_extraction_opts == EXTRACT_FROM_BYTE ) { + unsigned have = avail_out_0 - strm.avail_out; + avail_out_0 = strm.avail_out; + printToStderr( VERBOSITY_MANIAC, ">1> %ld, %d, %d ", offset, have, strm.avail_out ); + if ( offset > have ) { + offset -= have; + } else { + if ( ( offset > 0 && offset <= have ) || + offset == 0 ) { + // print offset - have bytes + // If offset==0 (from offset byte on) this prints always all 
bytes: + output_data_counter += have - offset; + if (fwrite(window + offset, 1, have - offset, stdout) != (have - offset) || + ferror(stdout)) { + (void)inflateEnd(&strm); + ret.error = Z_ERRNO; + goto build_index_error; + } + offset = 0; + fflush(stdout); + } + } + } else { + // continue_extraction in practice marks the use of "offset_in" + if ( continue_extraction == 1 ) { + unsigned have = WINSIZE - strm.avail_out; + unsigned have_in = avail_in_0 - strm.avail_in; + avail_in_0 = strm.avail_in; + printToStderr( VERBOSITY_MANIAC, ">2> %ld, %d, %d ", offset_in, have_in, strm.avail_in ); + if ( offset_in > 0 ) + offset_in -= have_in; + if ( ( offset_in > 0 && offset_in <= have_in ) || + offset_in == 0 ) { + offset_in = 0; + // print all "have" bytes as with offset_in it is not possible + // to know how much output discard (uncompressed != compressed) + output_data_counter += have; + if (fwrite(window, 1, have, stdout) != have || + ferror(stdout)) { + (void)inflateEnd(&strm); + ret.error = Z_ERRNO; + goto build_index_error; + } + fflush(stdout); + // continue extracting data as usual + offset = 0; + // though indx_n_extraction_opts != EXTRACT_FROM_BYTE + } + } + } + /* if at end of block, consider adding an index entry (note that if data_type indicates an end-of-block, then all of the uncompressed data from that block has been delivered, and none @@ -840,25 +1336,80 @@ local struct returned_output build_index( */ if ((strm.data_type & 128) && !(strm.data_type & 64) && (totout == 0 || totout - last > span)) { - index = addpoint(index, strm.data_type & 7, totin, - totout, strm.avail_out, window, 0); - if (index == NULL) { - ret.error = Z_MEM_ERROR; - goto build_index_error; - } - last = totout; - // write added point! - // note that points written are automatically emptied of its window values - // in order to use as less memory a s possible - if ( ! 
serialize_index_to_file( index_file, index, index_last_written_point ) ) - goto build_index_error; - index_last_written_point = index->have; + // check actual_index_point to see if we've passed + // the end of the passed previous index, and so + // we must addpoint() from now on : + printToStderr( VERBOSITY_MANIAC, "actual_index_point = %ld\n", actual_index_point ); + if ( actual_index_point > 0 ) + ++actual_index_point; + if ( NULL != index && + actual_index_point > (index->have - 1) ) { + actual_index_point = 0; // this checks are not needed any more + } + if ( actual_index_point == 0 && + // addpoint() only if index doesn't yet exist or it is incomplete + ( NULL == index || index->index_complete == 0 ) + ) { + if ( NULL != index ) + printToStderr( VERBOSITY_MANIAC, "addpoint index->have = %ld, index_last_written_point = %ld\n", + index->have, index_last_written_point ); + + index = addpoint(index, strm.data_type & 7, totin, + totout, strm.avail_out, window, 0); + + if (index == NULL) { + ret.error = Z_MEM_ERROR; + goto build_index_error; + } + last = totout; + + // write added point! + // note that points written are automatically emptied of its window values + // in order to use as less memory a s possible + if ( ! 
serialize_index_to_file( index_file, index, index_last_written_point ) ) + goto build_index_error; + index_last_written_point = index->have; + } } + } while (strm.avail_in != 0); + } while (ret.error != Z_STREAM_END); + + // last opportunity to output tail data + // before deleting strm object + if ( output_data_counter == 0 && + ( indx_n_extraction_opts == EXTRACT_TAIL || + indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL ) ) { + + unsigned have = WINSIZE - strm.avail_out; + + printToStderr( VERBOSITY_EXCESSIVE, "last extraction: %d\n", have ); + + if ( have > 0 ) { + if (fwrite(strm.next_out, 1, have, stdout) != have || ferror(stdout)) { + ret.error = Z_ERRNO; + } + } else { + // use backup window + if (fwrite(window2, 1, window2_size, stdout) != window2_size || ferror(stdout)) { + ret.error = Z_ERRNO; + } + + } + + output_data_counter += have; + fflush(stdout); + + } + + // print output_data_counter info + if ( output_data_counter > 0 ) + printToStderr( VERBOSITY_NORMAL, "%ld bytes of data extracted.\n", output_data_counter ); + /* clean up and return index (release unused entries in list) */ (void)inflateEnd(&strm); index->list = realloc(index->list, sizeof(struct point) * index->have); @@ -867,17 +1418,22 @@ local struct returned_output build_index( // once all index values are filled, close index file: a last call must be done // with index_last_written_point = index->have - if ( ! serialize_index_to_file( index_file, index, index->have ) ) - goto build_index_error; - fclose(index_file); + if ( index->index_complete == 0 ) + if ( !
serialize_index_to_file( index_file, index, index->have ) ) + goto build_index_error; + if ( NULL != index_file ) + fclose(index_file); - if ( strlen(index_filename) > 0 ) - fprintf(stderr, "Index written to '%s'.\n", index_filename); - else - fprintf(stderr, "Index written to stdout.\n"); + if ( index->index_complete == 0 ) + if ( strlen(index_filename) > 0 ) + printToStderr( VERBOSITY_NORMAL, "Index written to '%s'.\n", index_filename ); + else + printToStderr( VERBOSITY_NORMAL, "Index written to stdout.\n" ); + + index->index_complete = 1; /* index is now complete */ *built = index; - ret.value = index->size; + ret.value = index->have; return ret; /* return error */ @@ -889,6 +1445,10 @@ local struct returned_output build_index( if (index_file != NULL) fclose(index_file); return ret; + + // there's no need to free(here), + // because it pointed to index's values, and they + // will be freed with free_index(). } @@ -922,7 +1482,7 @@ local struct returned_output extract(FILE *in, struct access *index, off_t offse unsigned have; z_stream strm; struct point *here; - unsigned char input[CHUNK]; // TODO: convert to malloc + unsigned char input[CHUNK]; // TODO: convert to malloc unsigned char discard[WINSIZE]; // TODO: convert to malloc unsigned char *decompressed_window; uint64_t initial_len = len; @@ -956,7 +1516,7 @@ local struct returned_output extract(FILE *in, struct access *index, off_t offse /* find where in stream to start */ here = index->list; i = index->have; - while (--i && i!=0 && here[1].out <= offset) + while (--i && i>0 && here[1].out <= offset) here++; /* initialize file and inflate state to start there */ @@ -1004,16 +1564,16 @@ local struct returned_output extract(FILE *in, struct access *index, off_t offse if (NULL == (index_file = fopen(index->file_name, "rb")) || 0 != fseeko(index_file, here->window_beginning, SEEK_SET) ) { - fprintf(stderr, "Error while opening index file. 
Extraction aborted.\n"); + printToStderr( VERBOSITY_NORMAL, "Error while opening index file. Extraction aborted.\n" ); fclose(index_file); ret.error = Z_ERRNO; goto extract_ret; } - here->window_beginning = 0; + // here->window_beginning = 0; // this is not needed if ( NULL == (here->window = malloc(here->window_size)) || !fread(here->window, here->window_size, 1, index_file) ) { - fprintf(stderr, "Error while reading index file. Extraction aborted.\n"); + printToStderr( VERBOSITY_NORMAL, "Error while reading index file. Extraction aborted.\n" ); fclose(index_file); ret.error = Z_ERRNO; goto extract_ret; @@ -1159,7 +1719,8 @@ local struct returned_output extract(FILE *in, struct access *index, off_t offse struct access *deserialize_index_from_file( FILE *input_file, int load_windows, unsigned char *file_name ) { struct point here; struct access *index = NULL; - uint32_t i, still_growing = 0; + uint32_t i; + uint32_t index_complete = 1; uint64_t index_have, index_size, file_size; char header[GZIP_INDEX_HEADER_SIZE]; struct stat st; @@ -1188,7 +1749,7 @@ struct access *deserialize_index_from_file( FILE *input_file, int load_windows, if (fread(header, 1, GZIP_INDEX_HEADER_SIZE, input_file) < GZIP_INDEX_HEADER_SIZE || *((uint64_t *)header) != 0 || strncmp(&header[GZIP_INDEX_HEADER_SIZE/2], GZIP_INDEX_IDENTIFIER_STRING, GZIP_INDEX_HEADER_SIZE/2) != 0) { - fprintf(stderr, "File is not a valid gzip index file.\n"); + printToStderr( VERBOSITY_NORMAL, "File is not a valid gzip index file.\n" ); return NULL; } @@ -1200,22 +1761,22 @@ struct access *deserialize_index_from_file( FILE *input_file, int load_windows, // index->size equals index->have when the index file is correctly closed // and index->have == UINT64_MAX when the index is still growing if (index_have == 0 && index_size == UINT64_MAX) { - fprintf(stderr, "Index file is still growing!\n"); - still_growing = 1; + printToStderr( VERBOSITY_NORMAL, "Index file is incomplete.\n" ); + index_complete = 0; } - // create 
the list of points + // read the list of points do { fread_endian(&(here.out), sizeof(here.out), input_file); fread_endian(&(here.in), sizeof(here.in), input_file); fread_endian(&(here.bits), sizeof(here.bits), input_file); fread_endian(&(here.window_size), sizeof(here.window_size), input_file); - + printToStderr( VERBOSITY_MANIAC, "READ window_size = %d\n", here.window_size ); if ( here.window_size == 0 ) { - fprintf(stderr, "Unexpected window of size 0 found in index file '%s' @%ld.\nIgnoring point %ld.\n", - file_name, ftello(input_file), index->have + 1); + printToStderr( VERBOSITY_NORMAL, "Unexpected window of size 0 found in index file '%s' @%ld.\nIgnoring point %ld.\n", + file_name, ftello(input_file), index->have + 1 ); continue; } @@ -1231,12 +1792,12 @@ struct access *deserialize_index_from_file( FILE *input_file, int load_windows, uint64_t position = here.window_size; unsigned char *input = malloc(CHUNK); if ( NULL == input ) { - fprintf(stderr, "Not enough memory to load index from stdin.\n"); + printToStderr( VERBOSITY_NORMAL, "Not enough memory to load index from stdin.\n" ); goto deserialize_index_from_file_error; } while ( pos < position ) { if ( !fread(input, 1, (pos+CHUNK < position)? 
CHUNK: (position - pos), input_file) ) { - fprintf(stderr, "Could not read index from stdin.\n"); + printToStderr( VERBOSITY_NORMAL, "Could not read index from stdin.\n" ); goto deserialize_index_from_file_error; } pos += CHUNK; @@ -1252,23 +1813,21 @@ struct access *deserialize_index_from_file( FILE *input_file, int load_windows, // a here.window_beginning = 0 (which is impossible with gzipindx format) here.window_beginning = 0; if (here.window == NULL) { - fprintf(stderr, "Not enough memory to load index from file.\n"); + printToStderr( VERBOSITY_NORMAL, "Not enough memory to load index from file.\n" ); goto deserialize_index_from_file_error; } if ( !fread(here.window, here.window_size, 1, input_file) ) { - fprintf(stderr, "Error while reading index file.\n"); + printToStderr( VERBOSITY_NORMAL, "Error while reading index file.\n" ); goto deserialize_index_from_file_error; } } - + printToStderr( VERBOSITY_MANIAC, "(%p, %d, %ld, %ld, %d), ", index, here.bits, here.in, here.out, here.window_size); // increase index structure with a new point // (here.window can be NULL if load_windows==0) - index = addpoint( index, here.bits, here.in, here.out, 0, here.window, here.window_size ); + index = addpoint( index, here.bits, here.in, here.out, 0, NULL, here.window_size ); // after increasing index, copy values which were not passed to addpoint(): index->list[index->have - 1].window_beginning = here.window_beginning; - index->list[index->have - 1].window_size = here.window_size; - // note that even if (here.window != NULL) it MUST NOT be free() here, because // the pointer has been copied in a point of the index structure. @@ -1282,7 +1841,7 @@ struct access *deserialize_index_from_file( FILE *input_file, int load_windows, index->file_size = 0; - if ( still_growing == 0 ){ + if ( index_complete == 1 ){ /* read size of uncompressed file (useful for bgzip files) */ /* this field may not exist (maybe useful for growing gzip files?) 
*/ fread_endian(&(index->file_size), sizeof(index->file_size), input_file); @@ -1290,10 +1849,15 @@ struct access *deserialize_index_from_file( FILE *input_file, int load_windows, index->file_name = malloc( strlen(file_name) + 1 ); if ( NULL == memcpy( index->file_name, file_name, strlen(file_name) + 1 ) ) { - fprintf(stderr, "Not enough memory to load index from file.\n"); + printToStderr( VERBOSITY_NORMAL, "Not enough memory to load index from file.\n" ); goto deserialize_index_from_file_error; } + if ( index_complete == 1 ) + index->index_complete = 1; /* index is now complete */ + else + index->index_complete = 0; + return index; deserialize_index_from_file_error: @@ -1462,94 +2026,170 @@ local int decompress_file(FILE *source, FILE *dest) } -// write index for a gzip file +// Creates an index for a gzip file. +// If index file already exists, it is completed if it weren't complete, +// or directly used if it is complete. // INPUT: // unsigned char *file_name : file name of gzip file for which index will be calculated, // If strlen(file_name) == 0 stdin is used as gzip file. // struct access **index : memory address of index pointer (passed by reference) // unsigned char *index_filename: file name where index will be written // If strlen(index_filename) == 0 stdout is used as output for index. -// int supervise : value passed to build_index() +// enum INDEX_AND_EXTRACTION_OPTIONS indx_n_extraction_opts: +// value passed to build_index(); +// in case of SUPERVISE_DO* (not *DONT), wait here until gzip file exists. +// off_t offset : if supervise == EXTRACT_FROM_BYTE, this is the offset byte in +// the uncompressed stream from which to extract to stdout. +// 0 otherwise. 
// off_t span_between_points : span between index points in bytes // OUTPUT: // EXIT_* error code or EXIT_OK on success local int action_create_index( unsigned char *file_name, struct access **index, - unsigned char *index_filename, int supervise, off_t span_between_points ) + unsigned char *index_filename, enum INDEX_AND_EXTRACTION_OPTIONS indx_n_extraction_opts, + off_t offset, off_t span_between_points ) { FILE *in; struct returned_output ret; + uint64_t number_of_index_points = 0; int waiting = 0; + // First of all, check that data output and index output do not collide: + if ( strlen(file_name) == 0 && + strlen(index_filename) == 0 && + ( indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL || + indx_n_extraction_opts == EXTRACT_FROM_BYTE || + indx_n_extraction_opts == EXTRACT_TAIL ) + ) { + // input is stdin, output is stdout, and no file name has been + // indicated for index output, so action is not possible: + printToStderr( VERBOSITY_NORMAL, "ERROR: Please, note that extracted data will be output to STDOUT\n" ); + printToStderr( VERBOSITY_NORMAL, " so an index file name is needed (`-I`).\nAborted.\n" ); + return EXIT_GENERIC_ERROR; + } + // open : if ( strlen(file_name) > 0 ) { wait_for_file_creation: in = fopen( file_name, "rb" ); if ( NULL == in ) { - if (supervise == SUPERVISE_DO) { + if ( indx_n_extraction_opts == SUPERVISE_DO || + indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL ) { if ( waiting == 0 ) { - fprintf( stderr, "Waiting for creation of file '%s'\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "Waiting for creation of file '%s'\n", file_name ); waiting++; } sleep( WAITING_TIME ); goto wait_for_file_creation; } - fprintf( stderr, "Could not open %s for reading.\nAborted.\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "Could not open '%s' for reading.\nAborted.\n", file_name ); return EXIT_GENERIC_ERROR; } - fprintf( stderr, "Processing '%s' ...\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "Processing '%s' 
...\n", file_name ); } else { // stdin SET_BINARY_MODE(STDIN); // sets binary mode for stdin in Windows in = stdin; - fprintf( stderr, "Processing stdin ...\n" ); + printToStderr( VERBOSITY_NORMAL, "Processing stdin ...\n" ); } // compute index: - ret = build_index( in, span_between_points, index, supervise, index_filename ); + // but if index_filename already exists, load it and use it + // (if it is complete, it'll be used directly, if not it'll be + // completed from last point - all in build_index() ). + if ( strlen( index_filename ) > 0 && + access( index_filename, F_OK ) != -1 ) { + // index_filename already exists: try to load it + FILE *index_file; + index_file = fopen( index_filename, "rb" ); + if ( NULL != index_file ) { + *index = deserialize_index_from_file( index_file, 0, index_filename ); + fclose( index_file ); + if ( NULL == *index ) { + printToStderr( VERBOSITY_NORMAL, "Could not load index from file '%s'.\nAborted.\n", index_filename ); + return EXIT_GENERIC_ERROR; + } + // index ok, continue + number_of_index_points = (*index)->have; + } else { + printToStderr( VERBOSITY_NORMAL, "Could not open '%s' for reading.\nAborted.\n", index_filename ); + return EXIT_GENERIC_ERROR; + } + } + + // checks on index read from file: + if ( NULL != (*index) && + strlen( file_name ) > 0 ) { + // (here, index_filename exists and index exists) + check_index_file( (*index), file_name, index_filename ); + // return value is not used - only warn user and continue + } + + // stdout to binary mode if needed + if ( indx_n_extraction_opts == EXTRACT_FROM_BYTE || + indx_n_extraction_opts == EXTRACT_TAIL || + indx_n_extraction_opts == SUPERVISE_DO_AND_EXTRACT_FROM_TAIL + ) { + SET_BINARY_MODE(STDOUT); // sets binary mode for stdout in Windows + } + + ret = build_index( in, file_name, span_between_points, + index, indx_n_extraction_opts, offset, index_filename ); fclose(in); + if ( ret.error < 0 ) { switch ( ret.error ) { case Z_MEM_ERROR: - fprintf( stderr, "ERROR: Out of 
memory.\n"
); + printToStderr( VERBOSITY_NORMAL, "ERROR: Out of memory.\n" ); break; case Z_DATA_ERROR: if ( strlen(file_name) > 0 ) - fprintf( stderr, "ERROR: Compressed data error in '%s'.\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "ERROR: Compressed data error in '%s'.\n", file_name ); else - fprintf( stderr, "ERROR: Compressed data error in stdin.\n" ); + printToStderr( VERBOSITY_NORMAL, "ERROR: Compressed data error in stdin.\n" ); break; case Z_ERRNO: if ( strlen(file_name) > 0 ) - fprintf( stderr, "ERROR: Read error on '%s'.\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "ERROR: Read error on '%s'.\n", file_name ); else - fprintf( stderr, "ERROR: Read error on stdin.\n" ); + printToStderr( VERBOSITY_NORMAL, "ERROR: Read error on stdin.\n" ); break; default: - fprintf( stderr, "ERROR: Error %d while building index.\n", ret.error ); + printToStderr( VERBOSITY_NORMAL, "ERROR: Error %d while building index.\n", ret.error ); } return EXIT_GENERIC_ERROR; } - fprintf(stderr, "Built index with %ld access points.\n", ret.value); + + if ( number_of_index_points != (*index)->have ) + if ( number_of_index_points > 0 ) { + printToStderr( VERBOSITY_NORMAL, "Updated index with %ld new access points.\n", ret.value - number_of_index_points); + printToStderr( VERBOSITY_NORMAL, "Now index have %ld access points.\n", ret.value); + } else + printToStderr( VERBOSITY_NORMAL, "Built index with %ld access points.\n", ret.value); return EXIT_OK; } -// extract data from a gzip file using its index file (creates it if it doesn't exist) +/* */ +/* Deprecated in favour of build_index() ! */ +/* */ +// extract data from a gzip file using its index file if it exists, +// or FIRST creates the index if it doesn't exist, and then extract the data. 
// INPUT: // unsigned char *file_name : gzip file name // unsigned char *index_filename: index file name // uint64_t extract_from_byte : uncompressed offset of original data from which to extract // int force_action : if 1 and index file doesn't exist, create it // off_t span_between_points : span between index points in bytes +// enum action type_of_extraction: one of { ACT_EXTRACT_FROM_BYTE, ACT_EXTRACT_TAIL } // OUTPUT: // EXIT_* error code or EXIT_OK on success local int action_extract_from_byte( unsigned char *file_name, unsigned char *index_filename, - uint64_t extract_from_byte, int force_action, off_t span_between_points ) + uint64_t extract_from_byte, int force_action, off_t span_between_points, enum ACTION type_of_extraction ) { FILE *in = NULL; @@ -1561,55 +2201,90 @@ local int action_extract_from_byte( // open : if ( strlen(file_name) > 0 ) { - fprintf(stderr, "Extracting data from uncompressed byte @%ld in file '%s',\nusing index '%s'...\n", - extract_from_byte, file_name, index_filename); + if ( type_of_extraction == ACT_EXTRACT_FROM_BYTE ) + printToStderr( VERBOSITY_NORMAL, "Extracting data from uncompressed byte @%ld in file '%s',\nusing index '%s'...\n", + extract_from_byte, file_name, index_filename); + if ( type_of_extraction == ACT_EXTRACT_TAIL ) + printToStderr( VERBOSITY_NORMAL, "Extracting tail data from file '%s',\nusing index '%s'...\n", + file_name, index_filename); in = fopen( file_name, "rb" ); if ( NULL == in ) { - fprintf( stderr, "Could not open '%s' for reading.\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "Could not open '%s' for reading.\n", file_name ); return EXIT_GENERIC_ERROR; } } else { // stdin - fprintf(stderr, "Extracting data from uncompressed byte @%ld on stdin,\nusing index '%s'...\n", - extract_from_byte, index_filename); + if ( type_of_extraction == ACT_EXTRACT_FROM_BYTE ) + printToStderr( VERBOSITY_NORMAL, "Extracting data from uncompressed byte @%ld on stdin,\nusing index '%s'...\n", + extract_from_byte, 
index_filename); + if ( type_of_extraction == ACT_EXTRACT_TAIL ) + printToStderr( VERBOSITY_NORMAL, "Extracting tail data from stdin,\nusing index '%s'...\n", + index_filename); SET_BINARY_MODE(STDIN); // sets binary mode for stdout in Windows in = stdin; } + // open index file (filename derived from unless indicated with `-I`) open_index_file: index_file = fopen( index_filename, "rb" ); if ( NULL == index_file ) { if ( force_action == 1 && mark_recursion == 0 ) { // before extraction, create index file - ret_value = action_create_index( file_name, &index, index_filename, SUPERVISE_DONT, span_between_points ); + ret_value = action_create_index( file_name, &index, index_filename, JUST_CREATE_INDEX, 0, span_between_points ); if ( ret_value != EXIT_OK ) goto action_extract_from_byte_error; // index file has been created, so it must now be opened mark_recursion = 1; goto open_index_file; } else { - fprintf( stderr, "Index file '%s' not found.\n", index_filename ); + printToStderr( VERBOSITY_NORMAL, "Index file '%s' not found.\n", index_filename ); ret_value = EXIT_GENERIC_ERROR; goto action_extract_from_byte_error; } } + // deserialize_index_from_file index = deserialize_index_from_file( index_file, 0, index_filename ); if ( ! 
index ) { - fprintf( stderr, "Could not read index from file '%s'\n", index_filename ); + printToStderr( VERBOSITY_NORMAL, "Could not read index from file '%s'\n", index_filename ); ret_value = EXIT_GENERIC_ERROR; goto action_extract_from_byte_error; } + + if ( type_of_extraction == ACT_EXTRACT_TAIL ) { + // on ACT_EXTRACT_TAIL, now that we have the index data loaded, + // we're trying to calculate where we can get a chunk of last data: + if ( index->have > 0 ) { + extract_from_byte = index->list[index->have -1].out; + // get size of compressed file, if not stdin + // and use it to increment extract_from_byte a little more + // by increasing the offset with + 3/4*(gzip size - last .in) + if ( strlen(file_name) > 0 ) { + struct stat st; + stat(file_name, &st); + if ( st.st_size > 0 ) { + // try to calculate a viable increment in extract_from_byte: + // aprox. +3/4 of remaining compressed size == more than that size + // in uncompressed data: + extract_from_byte += (st.st_size - index->list[index->have -1].in)/4*3; + } + } + } else { + extract_from_byte = 0; + } + printToStderr( VERBOSITY_NORMAL, "...extracting data from uncompressed byte @%ld...\n", + extract_from_byte); + } ret = extract( in, index, extract_from_byte, NULL, 0 ); if ( ret.error < 0 ) { - fprintf( stderr, "Data extraction failed: %s error\n", + printToStderr( VERBOSITY_NORMAL, "Data extraction failed: %s error\n", ret.error == Z_MEM_ERROR ? 
"out of memory" : "input corrupted" ); ret_value = EXIT_GENERIC_ERROR; } else { if ( strlen(file_name) > 0 ) - fprintf( stderr, "Extracted %ld bytes from '%s' to stdout.\n", ret.value, file_name ); + printToStderr( VERBOSITY_NORMAL, "Extracted %ld bytes from '%s' to stdout.\n", ret.value, file_name ); else - fprintf( stderr, "Extracted %ld bytes from stdin to stdout.\n", ret.value ); + printToStderr( VERBOSITY_NORMAL, "Extracted %ld bytes from stdin to stdout.\n", ret.value ); ret_value = EXIT_OK; } @@ -1635,42 +2310,63 @@ local int action_list_info( unsigned char *file_name ) { FILE *in = NULL; struct access *index = NULL; uint64_t j; - int ret_value; + int ret_value = EXIT_OK; + struct stat st; // open index file: - if ( strlen(file_name) > 0 ) { - fprintf( stderr, "Checking index file '%s' ...\n", file_name ); + if ( strlen( file_name ) > 0 ) { + if ( verbosity_level > VERBOSITY_NONE ) fprintf( stdout, "Checking index file '%s' ...\n", file_name ); in = fopen( file_name, "rb" ); if ( NULL == in ) { - fprintf( stderr, "Could not open %s for reading.\nAborted.\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "Could not open %s for reading.\nAborted.\n", file_name ); return EXIT_GENERIC_ERROR; } } else { // stdin - fprintf( stderr, "Checking index from stdin ...\n" ); + printToStderr( VERBOSITY_NORMAL, "Checking index from stdin ...\n" ); SET_BINARY_MODE(STDIN); // sets binary mode for stdout in Windows in = stdin; } - // in case in == stdin, file_name == "" but this doesn't matter as windows won't be deconmpressed + // in case in == stdin, file_name == "" but this doesn't matter as windows won't be inflated index = deserialize_index_from_file( in, 0, file_name ); + if ( strlen( file_name ) > 0 ) { + stat( file_name, &st ); + if ( verbosity_level > VERBOSITY_NONE ) + fprintf( stdout, "\tSize of index file: %ld Bytes", st.st_size ); + if ( NULL != index) { + // TODO: this MUST be done with size of COMPRESSED data file + if ( verbosity_level > VERBOSITY_NORMAL && 
+ index->file_size > 0 ) + fprintf( stdout, " (%.2f%%)", (double)st.st_size / (double)index->file_size * 100.0 ); + } + if ( verbosity_level > VERBOSITY_NONE ) + fprintf( stdout, "\n" ); + } + if ( ! index ) { - fprintf(stderr, "Could not read index from file '%s'.\n", file_name); + printToStderr( VERBOSITY_NORMAL, "Could not read index from file '%s'.\n", file_name); ret_value = EXIT_GENERIC_ERROR; goto action_list_info_error; } else { - fprintf( stderr, "\tNumber of index points: %ld\n", index->have ); - if (index->file_size != 0) - fprintf( stderr, "\tSize of uncompressed file: %ld\n", index->file_size ); - fprintf( stderr, "\tList of points:\n\t @ compressed/uncompressed byte (index data size in Bytes), ...\n\t" ); - for (j=0; j<index->have; j++) { - fprintf( stderr, "@ %ld / %ld ( %d ), ", index->list[j].in, index->list[j].out, index->list[j].window_size ); + if ( verbosity_level > VERBOSITY_NONE ) + fprintf( stdout, "\tNumber of index points: %ld\n", index->have ); + if (index->file_size != 0) { + if ( verbosity_level > VERBOSITY_NONE ) + fprintf( stdout, "\tSize of uncompressed file: %ld Bytes\n", index->file_size ); + } + if ( verbosity_level > VERBOSITY_NORMAL ) { + fprintf( stdout, "\tList of points:\n\t @ compressed/uncompressed byte (index data size in Bytes), ...\n\t" ); + for (j=0; j<index->have; j++) { + fprintf( stdout, "@ %ld / %ld ( %d ), ", index->list[j].in, index->list[j].out, index->list[j].window_size ); + } } - fprintf( stderr, "\n" ); + if (verbosity_level > VERBOSITY_NONE ) + fprintf( stdout, "\n" ); } @@ -1690,25 +2386,47 @@ local int action_list_info( unsigned char *file_name ) { // print help local void print_help() { - fprintf( stderr, " gztool (v0.2)\n GZIP files indexer and data retriever.\n"); - fprintf( stderr, " Create small indexes for gzipped files and use them\n for quick and random data extraction.\n" ); - fprintf( stderr, " No more waiting when the end of a 10 GiB gzip is needed!\n" ); - fprintf( stderr, " 
//github.com/circulosmeos/gztool
(by Roberto S. Galende)\n" ); - fprintf( stderr, "\n $ gztool [-b #] [-cdefhilsS] [-I ] ...\n\n" ); - fprintf( stderr, " -b #: extract data from indicated byte position number\n of gzip file, using index\n" ); - fprintf( stderr, " -c: raw-gzip-compress indicated file to STDOUT\n" ); - fprintf( stderr, " -d: raw-gzip-decompress indicated file to STDOUT \n" ); - fprintf( stderr, " -e: if multiple files are indicated, continue on error\n" ); - fprintf( stderr, " -f: with `-i` force index overwriting if one exists\n" ); - fprintf( stderr, " with `-b` force index creation if none exists\n" ); + fprintf( stderr, "\n" ); + fprintf( stderr, " gztool (v0.3.14)\n"); + fprintf( stderr, " GZIP files indexer and data retriever.\n" ); + fprintf( stderr, " Create small indexes for gzipped files and use them\n" ); + fprintf( stderr, " for quick and random positioned data extraction.\n" ); + fprintf( stderr, " No more waiting when the end of a 10 GiB gzip is needed!\n" ); + fprintf( stderr, " //github.com/circulosmeos/gztool (by Roberto S. Galende)\n\n" ); + fprintf( stderr, " $ gztool [-b #] [-s #] [-v #] [-cdefFhilStT] [-I ] ...\n\n" ); + fprintf( stderr, " Note that actions `-bStT` proceed to an index file creation (if\n" ); + fprintf( stderr, " none exists) INTERLEAVED with data extraction. 
As extraction and\n" ); + fprintf( stderr, " index creation occur at the same time there's no waste of time.\n" ); + fprintf( stderr, " Also you can interrupt actions at any moment and the remaining\n" ); + fprintf( stderr, " index file will be reused (and completed if necessary) on the\n" ); + fprintf( stderr, " next gztool run over the same data.\n\n" ); + fprintf( stderr, " -b #: extract data from indicated uncompressed byte position of\n" ); + fprintf( stderr, " gzip file (creating or reusing an index file) to STDOUT.\n" ); + fprintf( stderr, " -c: utility: raw-gzip-compress indicated file to STDOUT\n" ); + fprintf( stderr, " -d: utility: raw-gzip-decompress indicated file to STDOUT\n" ); + fprintf( stderr, " -e: if multiple files are indicated, continue on error (if any)\n" ); + fprintf( stderr, " -f: force index overwriting from scratch, if one exists\n" ); + fprintf( stderr, " -F: force index creation/completion first, and then action: if\n" ); + fprintf( stderr, " `-F` is not used, index is created interleaved with actions.\n" ); fprintf( stderr, " -h: print this help\n" ); fprintf( stderr, " -i: create index for indicated gzip file (For 'file.gz'\n" ); - fprintf( stderr, " the default index file name will be 'file.gzi')\n" ); + fprintf( stderr, " the default index file name will be 'file.gzi').\n" ); fprintf( stderr, " -I INDEX: index file name will be 'INDEX'\n" ); - fprintf( stderr, " -l: list info contained in indicated index file\n" ); - fprintf( stderr, " -s #: span in MiB between index points. By default is 10.\n" ); - fprintf( stderr, " -S: supervise indicated file: create a growing index,\n" ); - fprintf( stderr, " for a still-growing gzip file. (`-i` is implicit).\n" ); + fprintf( stderr, " -l: check and list info contained in indicated index file\n" ); + fprintf( stderr, " -s #: span in uncompressed MiB between index points when\n" ); + fprintf( stderr, " creating the index. 
By default is `10`.\n" ); + fprintf( stderr, " -S: Supervise indicated file: create a growing index,\n" ); + fprintf( stderr, " for a still-growing gzip file. (`-i` is implicit).\n" ); + fprintf( stderr, " -t: tail (extract last bytes) to STDOUT on indicated gzip file\n" ); + fprintf( stderr, " -T: tail (extract last bytes) to STDOUT on indicated still-growing\n" ); + fprintf( stderr, " gzip file, and continue Supervising & extracting to STDOUT.\n" ); + fprintf( stderr, " -v #: output verbosity: from `0` (none) to `3` (maniac)\n" ); + fprintf( stderr, " Default is `1` (normal).\n" ); + fprintf( stderr, "\n" ); + fprintf( stderr, " Example: Extract data from 1000000000 byte (1 GB) on,\n" ); + fprintf( stderr, " from `myfile.gz` to the file `myfile.txt`. Also gztool will\n" ); + fprintf( stderr, " create (or reuse, or complete) an index file named `myfile.gzi`:\n" ); + fprintf( stderr, " $ gztool -b 1000000000 myfile.gz > myfile.txt\n" ); fprintf( stderr, "\n" ); } @@ -1734,22 +2452,20 @@ int main(int argc, char **argv) int continue_on_error = 0; int index_filename_indicated = 0; int force_action = 0; + int force_strict_order = 0; + int count_errors = 0; enum EXIT_APP_VALUES ret_value; - enum ACTION - { ACT_NOT_SET, ACT_EXTRACT_FROM_BYTE, ACT_COMPRESS_CHUNK, ACT_DECOMPRESS_CHUNK, - ACT_CREATE_INDEX, ACT_LIST_INFO, ACT_HELP, ACT_SUPERVISE } - action; + enum ACTION action; int opt = 0; - int i, j; + int i; int actions_set = 0; - fprintf( stderr, "\n" ); action = ACT_NOT_SET; ret_value = EXIT_OK; - while ((opt = getopt(argc, argv, "b:cdefhiI:ls:S")) != -1) + while ((opt = getopt(argc, argv, "b:cdefFhiI:ls:StTv:")) != -1) switch(opt) { // help case 'h': @@ -1781,6 +2497,10 @@ int main(int argc, char **argv) case 'f': force_action = 1; break; + // First create index, the process indicated action + case 'F': + force_strict_order = 1; + break; // `-i` creates index for case 'i': action = ACT_CREATE_INDEX; @@ -1810,70 +2530,162 @@ int main(int argc, char **argv) action = 
ACT_SUPERVISE; actions_set++; break; + case 't': + action = ACT_EXTRACT_TAIL; + actions_set++; + break; + case 'T': + action = ACT_EXTRACT_TAIL_AND_CONTINUE; + actions_set++; + break; + case 'v': + verbosity_level = atoi(optarg); + if ( ( optarg[0] != '0' && verbosity_level == 0 ) || + strlen( optarg ) > 1 || + verbosity_level > VERBOSITY_MANIAC ) { + printToStderr( VERBOSITY_NORMAL, "Option `-v %s` ignored (`-v [0..3]`).\n", optarg ); + verbosity_level = VERBOSITY_NORMAL; + } + break; case '?': if ( isprint (optopt) ) { // print warning only if char option is unknown - if ( NULL == strchr("bcdefhiIlS", optopt) ) { - fprintf(stderr, "Unknown option `-%c'.\n", optopt); + if ( NULL == strchr("bcdefFhiIlSstTv", optopt) ) { + printToStderr( VERBOSITY_NORMAL, "Unknown option `-%c'.\n", optopt); print_help(); } } else - fprintf(stderr, "Unknown option character `\\x%x'.\n", optopt); - fprintf( stderr, "\n" ); + printToStderr( VERBOSITY_NORMAL, "Unknown option character `\\x%x'.\n", optopt); + printToStderr( VERBOSITY_NORMAL, "\n" ); return EXIT_INVALID_OPTION; default: - fprintf( stderr, "\n" ); + printToStderr( VERBOSITY_NORMAL, "\n" ); abort (); } // Checking parameter merging and absence if ( actions_set > 1 ) { - fprintf(stderr, "Please, do not merge parameters `-bcdilS`.\nAborted.\n\n" ); + printToStderr( VERBOSITY_NORMAL, "Please, do not merge parameters `-bcdilStT`.\nAborted.\n\n" ); return EXIT_INVALID_OPTION; } + if ( span_between_points != SPAN && action != ACT_CREATE_INDEX && action != ACT_SUPERVISE ) { - fprintf(stderr, "`-s` parameter will be ignored.\n" ); + printToStderr( VERBOSITY_NORMAL, "`-s` parameter will be ignored.\n" ); } + if ( actions_set == 0 ) { // `-I ` is equivalent to `-i -I ` if ( action == ACT_NOT_SET && index_filename_indicated == 1 ) { action = ACT_CREATE_INDEX; if ( (optind + 1) < argc ) { // too much files indicated to use `-I` - fprintf(stderr, "`-I` is incompatible with multiple input files.\nAborted.\n\n" ); + printToStderr( 
VERBOSITY_NORMAL, "`-I` is incompatible with multiple input files.\nAborted.\n\n" ); return EXIT_INVALID_OPTION; } } else { - fprintf(stderr, "Please, indicate one parameter of `-bcdilS`, or `-h` for help.\nAborted.\n\n" ); + printToStderr( VERBOSITY_NORMAL, "Please, indicate one parameter of `-bcdilStT`, or `-h` for help.\nAborted.\n\n" ); return EXIT_INVALID_OPTION; } } - if (optind == argc || argc == 1) { + if ( force_strict_order == 1 && + ( action == ACT_SUPERVISE || + action == ACT_EXTRACT_TAIL_AND_CONTINUE || + action == ACT_LIST_INFO || + action == ACT_COMPRESS_CHUNK || + action == ACT_DECOMPRESS_CHUNK ) ) { + printToStderr( VERBOSITY_NORMAL, "WARNING: There's no sense in using `-F` with `-cdlST`: ignoring `-F`.\n" ); + force_strict_order = 0; + } - // file input is stdin - switch ( action ) { + { // inform action on stderr: + unsigned char *action_string; + switch ( action ) { case ACT_EXTRACT_FROM_BYTE: - if ( index_filename_indicated == 1 ) { - ret_value = action_extract_from_byte( - "", index_filename, extract_from_byte, force_action, span_between_points ); - fprintf( stderr, "\n" ); + action_string = "Extract from byte"; break; - } else { - fprintf( stderr, "`-I INDEX` must be used when extracting from stdin.\nAborted.\n\n" ); - ret_value = EXIT_GENERIC_ERROR; + case ACT_COMPRESS_CHUNK: + action_string = "Compress chunk"; + break; + case ACT_DECOMPRESS_CHUNK: + action_string = "Decompress chunk"; + break; + case ACT_CREATE_INDEX: + action_string = "Create index"; + break; + case ACT_SUPERVISE: + action_string = "Supervise still-growing file"; + break; + case ACT_LIST_INFO: + action_string = "Check & list info in index file"; + break; + case ACT_EXTRACT_TAIL: + action_string = "Extract tail data"; break; + case ACT_EXTRACT_TAIL_AND_CONTINUE: + action_string = "Extract from tail data from a still-growing file"; + break; + } + printToStderr( VERBOSITY_NORMAL, "ACTION: %s\n\n", action_string ); + } + + + if (optind == argc || argc == 1) { + // file input 
is stdin + + // check `-f` and execute delete if index file exists + if ( ( action == ACT_CREATE_INDEX || action == ACT_SUPERVISE || + action == ACT_EXTRACT_TAIL_AND_CONTINUE || action == ACT_EXTRACT_FROM_BYTE ) && + index_filename_indicated == 1 && + access( index_filename, F_OK ) != -1 ) { + // index file already exists + + if ( force_action == 0 ) { + printToStderr( VERBOSITY_NORMAL, "Index file '%s' already exists and will be used.\n", index_filename ); + printToStderr( VERBOSITY_NORMAL, "(Use `-f` to force overwriting.)\n" ); + } else { + // force_action == 1 => delete index file + printToStderr( VERBOSITY_NORMAL, "Using `-f` force option: Deleting '%s' ...\n", index_filename ); + // delete it + if ( remove( index_filename ) != 0 ) { + printToStderr( VERBOSITY_NORMAL, "ERROR: Could not delete '%s'.\nAborted.\n", index_filename ); + ret_value = EXIT_GENERIC_ERROR; + } } + } + + // `-F` has no sense with stdin + if ( force_strict_order == 1 ) { + printToStderr( VERBOSITY_NORMAL, "WARNING: There is no sense in using `-F` with stdin input: ignoring `F`.\n" ); + force_strict_order = 0; + } + + // file input is stdin + switch ( action ) { + + case ACT_EXTRACT_FROM_BYTE: + // stdin is a gzip file + if ( index_filename_indicated == 1 ) { + ret_value = action_create_index( "", &index, index_filename, + EXTRACT_FROM_BYTE, extract_from_byte, span_between_points ); + printToStderr( VERBOSITY_NORMAL, "\n" ); + break; + } else { + printToStderr( VERBOSITY_NORMAL, "`-I INDEX` must be used when extracting from stdin.\nAborted.\n\n" ); + ret_value = EXIT_GENERIC_ERROR; + break; + } + case ACT_COMPRESS_CHUNK: // compress chunk reads stdin or indicated file, and deflates in raw to stdout // If we're here it's because stdin will be used SET_BINARY_MODE(STDOUT); // sets binary mode for stdout in Windows SET_BINARY_MODE(STDIN); // sets binary mode for stdout in Windows if ( Z_OK != compress_file( stdin, stdout, Z_DEFAULT_COMPRESSION ) ) { - fprintf( stderr, "Error while compressing 
stdin.\nAborted.\n\n" ); + printToStderr( VERBOSITY_NORMAL, "Error while compressing stdin.\nAborted.\n\n" ); ret_value = EXIT_GENERIC_ERROR; break; } @@ -1886,7 +2698,7 @@ int main(int argc, char **argv) SET_BINARY_MODE(STDOUT); // sets binary mode for stdout in Windows SET_BINARY_MODE(STDIN); // sets binary mode for stdout in Windows if ( Z_OK != decompress_file( stdin, stdout ) ) { - fprintf( stderr, "Error while decompressing stdin.\nAborted.\n\n" ); + printToStderr( VERBOSITY_NORMAL, "Error while decompressing stdin.\nAborted.\n\n" ); ret_value = EXIT_GENERIC_ERROR; break; } @@ -1894,38 +2706,69 @@ int main(int argc, char **argv) break; case ACT_CREATE_INDEX: - if ( force_action == 0 && - index_filename_indicated == 1 && + if ( index_filename_indicated == 1 && access( index_filename, F_OK ) != -1 ) { // index file already exists - fprintf( stderr, "Index file '%s' already exists.\n", index_filename ); - fprintf( stderr, "Use `-f` to force overwriting.\nAborted.\n\n" ); - ret_value = EXIT_GENERIC_ERROR; - break; + if ( force_action == 1 ) { + // force_action == 1 => delete index file + printToStderr( VERBOSITY_NORMAL, "Using `-f` force option: Deleting '%s' ...\n", index_filename ); + if ( remove( index_filename ) != 0 ) { + printToStderr( VERBOSITY_NORMAL, "ERROR: Could not delete '%s'.\nAborted.\n", index_filename ); + ret_value = EXIT_GENERIC_ERROR; + break; + } + } else { + printToStderr( VERBOSITY_NORMAL, "Index file '%s' already exists and will be used.\n", index_filename ); + printToStderr( VERBOSITY_NORMAL, "(Use `-f` to force overwriting.)\n" ); + } } // stdin is a gzip file that must be indexed if ( index_filename_indicated == 1 ) { - ret_value = action_create_index( "", &index, index_filename, SUPERVISE_DONT, span_between_points ); + ret_value = action_create_index( "", &index, index_filename, JUST_CREATE_INDEX, 0, span_between_points ); } else { - ret_value = action_create_index( "", &index, "", SUPERVISE_DONT, span_between_points ); + ret_value = 
action_create_index( "", &index, "", JUST_CREATE_INDEX, 0, span_between_points ); } - fprintf( stderr, "\n" ); + printToStderr( VERBOSITY_NORMAL, "\n" ); break; case ACT_LIST_INFO: // stdin is an index file that must be checked ret_value = action_list_info( "" ); - fprintf( stderr, "\n" ); + printToStderr( VERBOSITY_NORMAL, "\n" ); break; case ACT_SUPERVISE: // stdin is a gzip file for which an index file must be created on-the-fly if ( index_filename_indicated == 1 ) { - ret_value = action_create_index( "", &index, index_filename, SUPERVISE_DO, span_between_points ); + ret_value = action_create_index( "", &index, index_filename, SUPERVISE_DO, 0, span_between_points ); + } else { + ret_value = action_create_index( "", &index, "", SUPERVISE_DO, 0, span_between_points ); + } + printToStderr( VERBOSITY_NORMAL, "\n" ); + break; + + case ACT_EXTRACT_TAIL: + // stdin is a gzip file + if ( index_filename_indicated == 1 ) { + ret_value = action_create_index( "", &index, index_filename, + EXTRACT_TAIL, 0, span_between_points ); } else { - ret_value = action_create_index( "", &index, "", SUPERVISE_DO, span_between_points ); + // if an index filename is not indicated, index will not be output + // as stdout is already used for data extraction + printToStderr( VERBOSITY_NORMAL, "ERROR: Index filename is needed if stdin is used as gzip input.\nAborted.\n" ); + ret_value = EXIT_INVALID_OPTION; } - fprintf( stderr, "\n" ); + break; + + case ACT_EXTRACT_TAIL_AND_CONTINUE: + if ( index_filename_indicated == 1 ) { + ret_value = action_create_index( "", &index, index_filename, + SUPERVISE_DO_AND_EXTRACT_FROM_TAIL, 0, span_between_points ); + } else { + ret_value = action_create_index( "", &index, "", + SUPERVISE_DO_AND_EXTRACT_FROM_TAIL, 0, span_between_points ); + } + printToStderr( VERBOSITY_NORMAL, "\n" ); break; } @@ -1935,8 +2778,8 @@ int main(int argc, char **argv) if ( action == ACT_SUPERVISE && ( argc - optind > 1 ) ) { // supervise only accepts one input gz file - fprintf( 
stderr, "`-S` option only accepts one gzip file parameter: %d indicated.\n", argc - optind ); - fprintf( stderr, "Aborted.\n" ); + printToStderr( VERBOSITY_NORMAL, "`-S` option only accepts one gzip file parameter: %d indicated.\n", argc - optind ); + printToStderr( VERBOSITY_NORMAL, "Aborted.\n" ); return EXIT_GENERIC_ERROR; } @@ -1944,6 +2787,8 @@ int main(int argc, char **argv) file_name = argv[i]; + ret_value = EXIT_OK; + // if no index filename is set (`-I`), it is derived from each parameter if ( 0 == index_filename_indicated ) { if ( NULL != index_filename ) { @@ -1974,39 +2819,64 @@ int main(int argc, char **argv) index_file = NULL; } - if ( force_action == 0 && - ( action == ACT_CREATE_INDEX || action == ACT_SUPERVISE ) && + if ( ( action == ACT_CREATE_INDEX || action == ACT_SUPERVISE || + action == ACT_EXTRACT_TAIL_AND_CONTINUE || action == ACT_EXTRACT_FROM_BYTE ) && access( index_filename, F_OK ) != -1 ) { // index file already exists - fprintf( stderr, "Index file '%s' already exists.\n", index_filename ); + + if ( force_action == 0 ) { + printToStderr( VERBOSITY_NORMAL, "Index file '%s' already exists and will be used.\n", index_filename ); + printToStderr( VERBOSITY_NORMAL, "(Use `-f` to force overwriting.)\n" ); + } else { + // force_action == 1 + // delete index file + printToStderr( VERBOSITY_NORMAL, "Using `-f` force option: Deleting '%s' ...\n", index_filename ); + if ( remove( index_filename ) != 0 ) { + printToStderr( VERBOSITY_NORMAL, "ERROR: Could not delete '%s'.\nAborted.\n", index_filename ); + ret_value = EXIT_GENERIC_ERROR; + } + } + + } + + + // check possible errors and `-e` before proceed + if ( ret_value != EXIT_OK ) { if ( continue_on_error == 1 ) { continue; } else { - fprintf( stderr, "Use `-f` to force overwriting.\nAborted.\n\n" ); - ret_value = EXIT_GENERIC_ERROR; - break; + break; // breaks for() loop } } + + // create index first if `-F` + // (checking of conformity between `-F` and action has been done before) + if ( 
force_strict_order == 1 ) { + ret_value = action_create_index( file_name, &index, index_filename, + JUST_CREATE_INDEX, 0, span_between_points ); + } + + // "-bil" options can accept multiple files switch ( action ) { case ACT_EXTRACT_FROM_BYTE: - ret_value = action_extract_from_byte( - file_name, index_filename, extract_from_byte, force_action, span_between_points ); + ret_value = action_create_index( file_name, &index, index_filename, + EXTRACT_FROM_BYTE, extract_from_byte, span_between_points ); break; case ACT_COMPRESS_CHUNK: // compress chunk reads stdin or indicated file, and deflates in raw to stdout // If we're here it's because there's an input file_name (at least one) if ( NULL == (in = fopen( file_name, "rb" )) ) { - fprintf( stderr, "Error while opening file '%s'\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "Error while opening file '%s'\n", file_name ); ret_value = EXIT_GENERIC_ERROR; break; } SET_BINARY_MODE(STDOUT); // sets binary mode for stdout in Windows if ( Z_OK != compress_file( in, stdout, Z_DEFAULT_COMPRESSION ) ) { - fprintf( stderr, "Error while compressing '%s'\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "Error while compressing '%s'\n", file_name ); ret_value = EXIT_GENERIC_ERROR; } break; @@ -2015,19 +2885,22 @@ int main(int argc, char **argv) // compress chunk reads stdin or indicated file, and deflates in raw to stdout // If we're here it's because there's an input file_name (at least one) if ( NULL == (in = fopen( file_name, "rb" )) ) { - fprintf( stderr, "Error while opening file '%s'\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "Error while opening file '%s'\n", file_name ); ret_value = EXIT_GENERIC_ERROR; break; } SET_BINARY_MODE(STDOUT); // sets binary mode for stdout in Windows if ( Z_OK != decompress_file( in, stdout ) ) { - fprintf( stderr, "Error while decompressing '%s'\n", file_name ); + printToStderr( VERBOSITY_NORMAL, "Error while decompressing '%s'\n", file_name ); ret_value = EXIT_GENERIC_ERROR; } 
break; case ACT_CREATE_INDEX: - ret_value = action_create_index( file_name, &index, index_filename, SUPERVISE_DONT, span_between_points ); + if ( force_strict_order == 0 ) + // if force_strict_order == 1 action has already been done! + ret_value = action_create_index( file_name, &index, index_filename, + JUST_CREATE_INDEX, 0, span_between_points ); break; case ACT_LIST_INFO: @@ -2035,17 +2908,33 @@ int main(int argc, char **argv) break; case ACT_SUPERVISE: - ret_value = action_create_index( file_name, &index, index_filename, SUPERVISE_DO, span_between_points ); - fprintf( stderr, "\n" ); + ret_value = action_create_index( file_name, &index, index_filename, + SUPERVISE_DO, 0, span_between_points ); + printToStderr( VERBOSITY_NORMAL, "\n" ); + break; + + case ACT_EXTRACT_TAIL: + ret_value = action_create_index( file_name, &index, index_filename, + EXTRACT_TAIL, 0, span_between_points ); + break; + + case ACT_EXTRACT_TAIL_AND_CONTINUE: + ret_value = action_create_index( file_name, &index, index_filename, + SUPERVISE_DO_AND_EXTRACT_FROM_TAIL, 0, span_between_points ); + printToStderr( VERBOSITY_NORMAL, "\n" ); break; } - fprintf( stderr, "\n" ); + printToStderr( VERBOSITY_NORMAL, "\n" ); + printToStderr( VERBOSITY_MANIAC, "ERROR code = %d\n", ret_value ); - if ( continue_on_error = 0 && + if ( ret_value != EXIT_OK ) + count_errors++; + + if ( continue_on_error == 0 && ret_value != EXIT_OK ) { - fprintf( stderr, "Aborted.\n" ); + printToStderr( VERBOSITY_NORMAL, "Aborted.\n" ); // break the for loop break; } @@ -2054,6 +2943,14 @@ int main(int argc, char **argv) } + if ( (i -optind) >= 1 ) + printToStderr( VERBOSITY_NORMAL, "%d files processed\n", + ( i -optind + ( (count_errors>0 && continue_on_error == 0 )?1:0 ) ) ); + if ( count_errors > 0 ) + printToStderr( VERBOSITY_NORMAL, "%d files processed with errors!\n", count_errors ); + + printToStderr( VERBOSITY_NORMAL, "\n" ); + // final freeing of resources if ( NULL != in ) { free( in );