[batch] ZIP archive of build dir is stored without compression #295

tsibley · 2023-06-21T18:00:46Z

The ZIP archive of the build dir that's written to S3 for use by an AWS Batch job uses the default ZipFile compression parameter value of ZIP_STORED. This means the files in the build dir are not compressed when added to the ZIP archive, so the archive is purely a file container format. This trades increased network transfer for reduced CPU time. I'm not sure that's a trade worth making in most situations where nextstrain build --aws-batch is used!
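To make the difference concrete, here is a minimal sketch of zipping a build dir with deflate instead of ZipFile's default ZIP_STORED, plus a size comparison. The helpers zip_build_dir and archive_size are hypothetical, not the CLI's actual code, and the sample FASTA content is made up. Real savings depend on what the build dir contains.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Optional


def zip_build_dir(build_dir: Path, archive: Path,
                  compression: int = zipfile.ZIP_DEFLATED,
                  compresslevel: Optional[int] = None) -> None:
    # Hypothetical helper: add every regular file under build_dir to the
    # archive, named relative to build_dir. Note that ZipFile's own default
    # is compression=ZIP_STORED, i.e. no compression at all.
    # compresslevel=None means "use the library default for this method".
    with zipfile.ZipFile(archive, "w", compression=compression,
                         compresslevel=compresslevel) as zf:
        for path in sorted(build_dir.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(build_dir).as_posix())


def archive_size(build_dir: Path, compression: int) -> int:
    # Write a throwaway archive (outside build_dir) and return its size in bytes.
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "build.zip"
        zip_build_dir(build_dir, archive, compression)
        return archive.stat().st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_dir = Path(tmp)
        (build_dir / "data").mkdir()
        # Repetitive text stands in for typical build inputs (made-up data).
        (build_dir / "data" / "sequences.fasta").write_bytes(b">seq\n" + b"ACGT" * 50_000 + b"\n")
        stored = archive_size(build_dir, zipfile.ZIP_STORED)
        deflated = archive_size(build_dir, zipfile.ZIP_DEFLATED)
        print(f"stored: {stored} bytes, deflated: {deflated} bytes")
```

Passing the compression method explicitly is the whole fix in this sketch; the extra CPU cost is paid once on upload, while the smaller archive helps every transfer.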

On the other side, when the AWS Batch job finishes, it uses zip to update/refresh the archive contents before re-uploading it to S3. The default compression mode is also used there, but in that case it's deflate (ZIP_DEFLATED in Python) with a compression level of 6 (although zip still sometimes infers that a file isn't worth deflating and stores it uncompressed).
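For a rough Python analogue of that zip default, the sketch below deflates each file at level 6 and falls back to storing any file that deflate would not shrink. The "store if it doesn't shrink" rule is an assumption approximating zip's behavior, not Info-ZIP's exact logic, and choose_compress_type and zip_like_info_zip are hypothetical names.

```python
import os
import tempfile
import zipfile
import zlib
from pathlib import Path


def choose_compress_type(data: bytes, level: int = 6) -> int:
    # Assumed heuristic: deflate at the given level and keep the result only
    # if it is smaller than the input. zlib.compress adds a few bytes of
    # header/trailer, so borderline files lean slightly towards storing.
    if len(zlib.compress(data, level)) < len(data):
        return zipfile.ZIP_DEFLATED
    return zipfile.ZIP_STORED


def zip_like_info_zip(build_dir: Path, archive: Path, level: int = 6) -> None:
    # Hypothetical helper: per-file choice between deflate (level 6 by
    # default) and store. Each file is read once to decide, then written.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=level) as zf:
        for path in sorted(build_dir.rglob("*")):
            if path.is_file():
                method = choose_compress_type(path.read_bytes(), level)
                zf.write(path, arcname=path.relative_to(build_dir).as_posix(),
                         compress_type=method, compresslevel=level)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        build_dir = Path(src)
        (build_dir / "metadata.tsv").write_bytes(b"strain\tdate\n" * 10_000)
        # Random bytes stand in for already-compressed data (e.g. .gz, .xz).
        (build_dir / "random.bin").write_bytes(os.urandom(4096))
        archive = Path(dst) / "build.zip"
        zip_like_info_zip(build_dir, archive)
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                print(info.filename, info.compress_type, info.file_size, info.compress_size)
```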

I'm pretty sure storing without compression was not intentional on my part when I implemented the AWS Batch runtime. I think it's likely I assumed the defaults for Python matched the canonical defaults for zip.

I only realized this late last week while working on #289.


tsibley · 2023-06-21T18:17:28Z

If we do start compressing, we could use ZIP_LZMA to get better compression—LZMA is what xz uses—but we'd need to switch the runtime to using a zip that supports LZMA.
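One way to gauge whether LZMA is worth it is to measure all three methods on representative files with Python's zipfile, which can write ZIP_LZMA entries when the interpreter has the lzma module. This is a sketch only: METHODS and compressed_sizes are hypothetical names, and it says nothing about which zip build the runtime would need to extract such archives.

```python
import io
import zipfile
from typing import Dict

METHODS = {
    "stored": zipfile.ZIP_STORED,
    "deflated": zipfile.ZIP_DEFLATED,
    "lzma": zipfile.ZIP_LZMA,  # requires Python built with the lzma module
}


def compressed_sizes(files: Dict[str, bytes]) -> Dict[str, int]:
    # Build an in-memory archive per method and record its total size in bytes.
    sizes = {}
    for name, method in METHODS.items():
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", compression=method) as zf:
            for arcname, data in files.items():
                zf.writestr(arcname, data)
        sizes[name] = len(buf.getvalue())
    return sizes


if __name__ == "__main__":
    # Made-up sample data; substitute real build-dir files for a meaningful test.
    sample = {"sequences.fasta": b">seq\n" + b"ACGT" * 50_000 + b"\n"}
    for method, size in compressed_sizes(sample).items():
        print(f"{method}: {size} bytes")
```

Which of deflate and LZMA wins, and by how much, depends on the data, so the numbers from real builds matter more than any general rule.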

nextstrain-bot added this to Nextstrain planning (archived) Jun 22, 2023

github-project-automation bot moved this to New in Nextstrain planning (archived) Jun 22, 2023
