[batch] ZIP archive of build dir is stored without compression #295

Open
tsibley opened this issue Jun 21, 2023 · 1 comment

Comments

@tsibley
Member

tsibley commented Jun 21, 2023

The ZIP archive of the build dir that's written to S3 for use by an AWS Batch job uses the default ZipFile compression parameter value of ZIP_STORED. This means the files in the build dir are not compressed when added to the ZIP archive, and thus the archive is purely a file container format. This trades increased network transfer for reduced CPU time. I'm not sure that's a trade that's worth making in most situations that nextstrain build --aws-batch is used!
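To illustrate the default described above, here is a minimal sketch using only the standard library. The payload and the archive member name are made up for the demonstration and are not taken from the Nextstrain CLI code.

```python
import io
import zipfile


def archive_size(compression: int) -> int:
    """Return the size in bytes of an in-memory ZIP holding one repetitive file."""
    payload = b"ACGT" * 25_000  # 100,000 bytes of highly compressible stand-in data
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        # Hypothetical member name, chosen only for this example.
        archive.writestr("build/sequences.fasta", payload)
    return len(buffer.getvalue())


# With no compression argument, ZipFile falls back to ZIP_STORED.
with zipfile.ZipFile(io.BytesIO(), "w") as default_archive:
    default_compression = default_archive.compression

stored_size = archive_size(zipfile.ZIP_STORED)
deflated_size = archive_size(zipfile.ZIP_DEFLATED)

print("default is ZIP_STORED:", default_compression == zipfile.ZIP_STORED)
print("stored bytes:", stored_size)
print("deflated bytes:", deflated_size)
```

How much deflate saves on a real build dir depends on its contents; already-compressed inputs will shrink far less than this synthetic payload.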

On the other side, when the AWS Batch job finishes, it uses zip to update/refresh the archive contents before re-uploading it to S3. The default compression mode is also used there, but in that case it's deflate (ZIP_DEFLATED in Python) with a compression level of 6 (although zip still sometimes infers that a file isn't worth deflating and will store it uncompressed).
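For comparison, a rough sketch of what matching zip's defaults from the Python side could look like. The `zip_directory` helper and the file names are hypothetical and do not reflect how the CLI actually builds its archive. Unlike zip, Python's `zipfile` applies the requested method to every member rather than deciding per file whether deflating is worthwhile.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_directory(source_dir: Path, archive_path: Path) -> None:
    """Write every file under source_dir into a deflated ZIP at archive_path.

    Hypothetical helper for illustration only. archive_path should live
    outside source_dir so the archive does not try to include itself.
    """
    with zipfile.ZipFile(
        archive_path,
        "w",
        compression=zipfile.ZIP_DEFLATED,
        compresslevel=6,  # same level as zip's default; needs Python 3.7+
    ) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                archive.write(path, arcname=path.relative_to(source_dir).as_posix())


with tempfile.TemporaryDirectory() as tmp:
    build_dir = Path(tmp) / "build"
    build_dir.mkdir()
    # Made-up build dir contents, repeated so there is something to compress.
    (build_dir / "Snakefile").write_text("rule all:\n    input: 'results/tree.nwk'\n" * 100)

    archive_path = Path(tmp) / "build.zip"
    zip_directory(build_dir, archive_path)

    with zipfile.ZipFile(archive_path) as archive:
        members = archive.infolist()
        compress_types = {info.filename: info.compress_type for info in members}
        compressed_ok = all(info.compress_size < info.file_size for info in members)
```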

I'm pretty sure storing without compression was not intentional on my part when I implemented the AWS Batch runtime. I think it's likely I assumed the defaults for Python matched the canonical defaults for zip.

I only realized this late last week while working on #289.

@tsibley
Member Author

tsibley commented Jun 21, 2023

If we do start compressing, we could use ZIP_LZMA to get better compression (LZMA is what xz uses), but we'd need to switch the runtime to using a zip implementation that supports LZMA.
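A small sketch of the writing side with `ZIP_LZMA`, assuming Python's `lzma` module is available. The payload and member name are invented for the example. It only shows that Python can write and read back such an archive; it says nothing about whether the tool used inside the runtime image can.

```python
import io
import zipfile

# Made-up, highly repetitive payload standing in for build dir contents.
payload = b">sample\nACGTACGTAC\n" * 5_000

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_LZMA) as archive:
    archive.writestr("sequences.fasta", payload)

# Re-open the archive from its bytes to confirm the method and the contents.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    info = archive.getinfo("sequences.fasta")
    round_tripped = archive.read("sequences.fasta")

print("method is ZIP_LZMA:", info.compress_type == zipfile.ZIP_LZMA)
print("original bytes:", info.file_size)
print("compressed bytes:", info.compress_size)
```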
