bottomless: stream gzip snapshot #585
base: main
Conversation
Force-pushed from 66203ad to 2cfbf01.
Nice work! Thanks for contributing; I left a few nitpicks. I wonder if error handling for multipart uploads isn't already taken care of -- e.g. when https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html is configured. The temporary leftover file for storing the last part is also handled on failure, since it will get deleted eventually -- and if you switch to `tempfile`, it would probably even get deleted as part of its drop routine.
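
As an aside, a minimal sketch of that `tempfile` behavior (the helper name and its contents are hypothetical, not code from this PR): the crate's `NamedTempFile` unlinks the file when the value is dropped, which covers early returns and unwinding panics alike.

```rust
use std::io::Write;

use tempfile::NamedTempFile;

// Hypothetical helper: buffer the last part in a temp file before upload.
fn write_last_part(data: &[u8]) -> std::io::Result<()> {
    // Created in the OS temp dir; deleted when `part` goes out of scope.
    let mut part = NamedTempFile::new()?;
    part.write_all(data)?;
    // ... upload `part.path()` here; if anything fails, `part` is dropped
    // and the file is removed automatically, so no leftover cleanup code.
    Ok(())
}
```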
Yes, but you are still billed for however much of the incomplete parts you retain. I also don't know if this can be done for other S3-compatible stores. As a best effort, we can wrap the upload parts and abort the multipart upload on failure. |
I decided not to abort the multipart upload on panics; that turned out to be harder than I initially thought. Only catching errors should be enough for now, and we can leave the panic case to the lifecycle configuration linked above. |
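
For illustration, a rough sketch of that best-effort shape using the Rust `aws-sdk-s3` client (not this PR's actual code; the function names, `anyhow` error type, and in-memory `parts` argument are assumptions made for brevity): each part is uploaded inside a fallible helper, and any error triggers an `AbortMultipartUpload` before being propagated, so failed uploads don't keep accruing storage.

```rust
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::types::{CompletedMultipartUpload, CompletedPart};
use aws_sdk_s3::Client;

// Hypothetical entry point: upload pre-chunked parts, aborting on error.
async fn upload_multipart(
    client: &Client,
    bucket: &str,
    key: &str,
    parts: Vec<Vec<u8>>,
) -> anyhow::Result<()> {
    let upload = client
        .create_multipart_upload()
        .bucket(bucket)
        .key(key)
        .send()
        .await?;
    let upload_id = upload
        .upload_id()
        .ok_or_else(|| anyhow::anyhow!("missing upload id"))?
        .to_string();

    match try_upload_parts(client, bucket, key, &upload_id, parts).await {
        Ok(completed) => {
            client
                .complete_multipart_upload()
                .bucket(bucket)
                .key(key)
                .upload_id(&upload_id)
                .multipart_upload(completed)
                .send()
                .await?;
            Ok(())
        }
        Err(e) => {
            // Best-effort abort; ignore its own failure and surface the
            // original error instead.
            let _ = client
                .abort_multipart_upload()
                .bucket(bucket)
                .key(key)
                .upload_id(&upload_id)
                .send()
                .await;
            Err(e)
        }
    }
}

async fn try_upload_parts(
    client: &Client,
    bucket: &str,
    key: &str,
    upload_id: &str,
    parts: Vec<Vec<u8>>,
) -> anyhow::Result<CompletedMultipartUpload> {
    let mut completed = CompletedMultipartUpload::builder();
    for (i, data) in parts.into_iter().enumerate() {
        let part_number = i as i32 + 1; // S3 part numbers start at 1
        let out = client
            .upload_part()
            .bucket(bucket)
            .key(key)
            .upload_id(upload_id)
            .part_number(part_number)
            .body(ByteStream::from(data))
            .send()
            .await?;
        completed = completed.parts(
            CompletedPart::builder()
                .part_number(part_number)
                .e_tag(out.e_tag().unwrap_or_default())
                .build(),
        );
    }
    Ok(completed.build())
}
```

Note that unwinding panics bypass the `Err` arm entirely, which is why the panic case is left to the bucket's lifecycle rule rather than handled here.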
In general this PR should be married with #574: one covers asynchronous snapshot upload between checkpoints (a must-have in order to avoid latency spikes), the other covers multipart upload.
Force-pushed from 0df0ad1 to fedc6ac.
Stream gzip snapshots using S3's multipart upload.
S3 requires `Content-Length` to be set when doing a `PutObject` operation, and since we can't know the size of the gzip stream upfront, we send it to S3 in parts using a fixed-size buffer that grows every 16 chunks up to 100 MiB, ensuring small allocations for small databases.