Skip to content

Latest commit

 

History

History
105 lines (96 loc) · 4.1 KB

README.md

File metadata and controls

105 lines (96 loc) · 4.1 KB

ZStandard Random Access (ZRA) allows random access inside an archive compressed using ZStandard


Note: It isn't recommended to use this library in newer projects as it's no longer being actively maintained, feel free to submit PRs regarding any issues and they'll be merged when possible.

Format

How is this done?

ZSTD has the concept of a Frame which can be decompressed independently from the rest of the file. A ZSTD archive is made of multiple concatenated frames which are decompressed one after another.

We exploit that fact to break the file into uniformly sized frames (Frame Size) and creating a seek-table which contains the offset of each frame within in the file which can be indexed by simply dividing the offset by the frame size.

Header

We store data that's required for decompression or other functionality inside an archive header, that contains the following:

  • ZSTD Skippable Frame - The entire header is inside a ZSTD Skippable Frame so that ZRA is fully compatible with any regular ZSTD decompressor
  • CRC-32 Hash - A CRC-32 hash of the entire header to ensure integrity of the file is always preserved
  • Metadata Section - A section where data which might be used by a ZRA decompressor on the other side but not a part of the archive's contents itself
  • Seek-Table - A table with 40-bit entries containing the offset of individual frames

Usage

Compression

  • In-Memory
    zra::Buffer input = GetInput(); // A `zra::Buffer` full of data to be compressed
    zra::Buffer output = zra::CompressBuffer(input);
  • Streaming
    auto size = input.size();
    zra::Compressor compressor(size);
    output.seek(compressor.GetHeaderSize());
    
    zra::Buffer buffer; // The buffer is reused to prevent constant reallocation
    while (size) {
        auto readSize = std::min(maxChunkSize, size);
        compressor.Compress(input.read(readSize), buffer);
        output.write(buffer);
        size -= readSize;
    }
    
    output.seek(0);
    output.write(compressor.GetHeader());
    
    // Note: `input` and `output` in the example hold an internal offset that is automatically 
    // modified based on operations performed on them, similar to ifstream/ofstream from C++ 
    // Standard Library

Decompression (Entire File)

  • In-Memory
      zra::Buffer input = GetInput(); // A `zra::Buffer` with the entire archive
      zra::Buffer output = zra::DecompressBuffer(input);
  • Streaming
    zra::FullDecompressor decompressor([&input](size_t offset, size_t size, void* output) {
      input.seek(offset);
      input.read(output, size);
    });
    
    auto remaining = decompressor.header.uncompressedSize;
    zra::Buffer buffer(bufferSize); // The buffer is reused to prevent constant reallocation
    while (remaining) {
        auto amount = decompressor.Decompress(buffer);
        output.write(buffer, amount);
        remaining -= amount;
    }

Decompression (Random-Access)

  • In-Memory
    zra::Buffer input = GetInput();
    zra::Buffer output = zra::DecompressRA(input, offset, size);
  • Streaming
    zra::Decompressor decompressor([&input](size_t offset, size_t size, void* output) {
      input.seek(offset);
      input.read(output, size);
    });
    zra::Buffer output = decompressor.Decompress(offset, size);
    // or, to prevent buffer reallocation
    decompressor.Decompress(offset, size, output);

Retrieving Header

  • Using readFunction
    zra::Header header([&input](size_t offset, size_t size, void* output) {
      input.seek(offset);
      input.read(output, size);
    });
  • Using a pointer to the archive
    zra::Header header(input.data());
  • From Decompressor/FullDecompressor
    zra::Decompressor decompressor(...); // or zra::FullDecompressor
    decompressor.header;

License

We use a simple 3-clause BSD license located at LICENSE for easy integration into projects while being compatible with the libraries we utilize