The ZIP archive of the build dir that's written to S3 for use by an AWS Batch job uses the default ZipFile compression parameter value of ZIP_STORED. This means the files in the build dir are not compressed when added to the ZIP archive, and thus the archive is purely a file container format. This trades increased network transfer for reduced CPU time. I'm not sure that's a trade worth making in most situations where nextstrain build --aws-batch is used!
On the other side, when the AWS Batch job finishes, it uses zip to update/refresh the archive contents before re-uploading it to S3. The default compression mode is also used there, but in that case it's deflate (ZIP_DEFLATED in Python) with a compression level of 6 (although zip still sometimes infers that a file isn't worth deflating and will store it uncompressed).
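The difference between the two defaults can be sketched in Python. This is only an illustration, not the AWS Batch runtime's code: the payload, the archive member name, and the `write_archive` helper are hypothetical, and the archive is built in memory rather than uploaded to S3.

```python
import io
import zipfile

# Hypothetical stand-in for a build dir file: highly repetitive, sequence-like data.
PAYLOAD = b"ACGT" * 250_000

def write_archive(**zipfile_kwargs):
    """Write PAYLOAD into an in-memory ZIP; return (archive size in bytes, ZipInfo)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", **zipfile_kwargs) as zf:
        zf.writestr("build/sequences.fasta", PAYLOAD)
        info = zf.getinfo("build/sequences.fasta")
    return len(buf.getvalue()), info

# No compression argument: ZipFile falls back to ZIP_STORED.
stored_size, stored_info = write_archive()

# Explicit deflate at level 6, mirroring what zip does by default.
deflated_size, deflated_info = write_archive(
    compression=zipfile.ZIP_DEFLATED, compresslevel=6
)

print("stored:  ", stored_size, "bytes, method", stored_info.compress_type)
print("deflated:", deflated_size, "bytes, method", deflated_info.compress_type)
```

Real build dirs will compress less dramatically than this artificial payload, so the actual savings in network transfer would need to be measured on representative builds.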
I'm pretty sure storing without compression was not intentional on my part when I implemented the AWS Batch runtime. I think it's likely I assumed the defaults for Python matched the canonical defaults for zip.
I only realized this late last week while working on #289.
If we do start compressing, we could use ZIP_LZMA to get better compression—LZMA is what xz uses—but we'd need to switch the runtime to using a zip that supports LZMA.
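A minimal sketch of what writing with ZIP_LZMA looks like on the Python side, assuming Python's `lzma` module is available. The payload, member name, and `archive_with` helper are hypothetical, and the sketch says nothing about whether the zip in the runtime could then update the archive.

```python
import io
import zipfile

# Hypothetical stand-in for build dir contents.
PAYLOAD = b"ACGT" * 250_000
MEMBER = "build/sequences.fasta"

def archive_with(compression):
    """Return the bytes of an in-memory ZIP holding PAYLOAD with the given method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        zf.writestr(MEMBER, PAYLOAD)
    return buf.getvalue()

lzma_archive = archive_with(zipfile.ZIP_LZMA)
deflate_archive = archive_with(zipfile.ZIP_DEFLATED)

# Python itself can read the LZMA archive back; other tools may not be able to.
with zipfile.ZipFile(io.BytesIO(lzma_archive)) as zf:
    lzma_info = zf.getinfo(MEMBER)
    roundtrip = zf.read(MEMBER)

print("lzma:   ", len(lzma_archive), "bytes")
print("deflate:", len(deflate_archive), "bytes")
```

How much LZMA gains over deflate depends heavily on the data, so the comparison is worth running against real build dirs before changing the runtime.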