Compression
File contents can be stored compressed to save space in the MongoDB database. This happens transparently to the application layer.
File contents are stored in GridFS, keyed by the SHA-1 hash of the data. The GridFS file does not necessarily hold the raw data; it may hold a compressed form instead. In that case the SHA-1 hash (the file id) is still the hash of the original data, but the chunks contain the compressed output, and metadata on the GridFS file indicates how to decompress it to restore the original file.
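The relationship between the file id and the stored bytes can be sketched as follows. This is only an illustration under stated assumptions, not the actual v7files code: the function name `prepare_for_storage`, the `payload` key, the hex-encoded id, and the "keep the compressed form only if it is smaller" rule are all hypothetical choices made for the example.

```python
import hashlib
import zlib


def prepare_for_storage(data: bytes) -> dict:
    """Build a hypothetical file document for `data` (not the real v7files schema)."""
    # The id is always the SHA-1 of the ORIGINAL bytes, whatever gets stored.
    file_id = hashlib.sha1(data).hexdigest()  # hex encoding is an assumption

    deflated = zlib.compress(data)
    if len(deflated) < len(data):
        # Compression helped: store the deflated bytes and record the method.
        return {"_id": file_id, "store": "z", "payload": deflated}

    # Compression did not help: keep the raw bytes.
    return {"_id": file_id, "store": "raw", "payload": data}
```

Because the id depends only on the original bytes, the same content maps to the same id regardless of which storage method was chosen for it.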
The store field on the GridFS file is used to indicate the compression (or other storage) method:
- raw: the default, just store the original data bytes
- z: store zlib-deflated bytes
- gz: store gzip-compressed bytes
- zip: store a zip file entry (including the header)
- in: store raw bytes, but not in chunks, instead inlined as a byte array field "in" on the GridFS file. Useful for very small files.
- zin: store deflated bytes inline in the field "in".
- alt: no data is stored in the GridFS document at all, but out-of-band (somewhere else). Details are found in the "alt" field.
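A reader of such a file has to dispatch on the store value to get the original bytes back. The sketch below shows one way this could look; it is not the v7files implementation. The function name `restore_contents` and its parameters are hypothetical, it assumes that a missing store field means raw, and it assumes that "z" and "zin" data is a zlib-wrapped stream (if the bytes are raw deflate without the zlib header, `zlib.decompress(data, wbits=-15)` would be needed instead). The zip and alt methods are left out because their details are not described here.

```python
import gzip
import zlib


def restore_contents(file_doc: dict, chunk_bytes: bytes = b"") -> bytes:
    """Return the original file bytes for a hypothetical GridFS file document.

    `chunk_bytes` is the concatenation of the file's chunks (empty for
    inline storage).
    """
    store = file_doc.get("store", "raw")  # assumption: no field means raw

    if store == "raw":
        return chunk_bytes
    if store == "z":
        return zlib.decompress(chunk_bytes)  # assumes a zlib-wrapped stream
    if store == "gz":
        return gzip.decompress(chunk_bytes)
    if store == "in":
        return bytes(file_doc["in"])  # raw bytes inlined on the document
    if store == "zin":
        return zlib.decompress(bytes(file_doc["in"]))  # deflated and inlined

    # "zip" and "alt" need format details that this sketch does not cover.
    raise NotImplementedError(f"store method {store!r} is not handled here")
```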
In the case of compression, the length field indicates the compressed data length (the length that GridFS sees when looking at the chunks). This may even be zero (in the case of inline or out-of-band storage).
The length of the real file is available in the metadata collection v7files.
When the HTTP client accepts the "gzip" content encoding and the file contents are already stored in gzip format internally, v7files sends the compressed contents out directly, saving on-the-fly decompression (and potential re-compression later on). When the file is stored uncompressed, v7files does not try to compress it for transfer, since the file would presumably have been stored compressed if that reduced its size.
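The decision described above could be sketched like this. It is a simplified illustration, not the v7files servlet code: `choose_response` is a hypothetical helper, it only handles the gz and raw methods, and it treats the Accept-Encoding header as a plain comma-separated list, ignoring quality values such as `gzip;q=0`.

```python
import gzip


def choose_response(store: str, stored_bytes: bytes, accept_encoding: str):
    """Return (headers, body) for a stored file; a hypothetical helper."""
    # Simplified parsing: take the coding names and drop any ";q=..." part.
    accepted = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }

    if store == "gz":
        if "gzip" in accepted:
            # Pass the stored gzip bytes through untouched.
            return {"Content-Encoding": "gzip"}, stored_bytes
        # The client cannot take gzip: decompress on the fly.
        return {}, gzip.decompress(stored_bytes)

    if store == "raw":
        # Stored uncompressed: send as is, with no compression for transfer.
        return {}, stored_bytes

    raise ValueError(f"store method {store!r} is outside this sketch")
```

For example, `choose_response("gz", data, "gzip, deflate")` returns the stored bytes unchanged together with a `Content-Encoding: gzip` header, while the same call with `"identity"` returns the decompressed bytes and no encoding header.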