Remove Kinesis producer's internal TTL by default #2147
By default, the Kinesis producer implementation times out records that cannot be delivered within 30 seconds. This interferes with Maxwell's back-pressure implementation: more than 30 seconds of records may buffer before Maxwell begins to throttle itself, so buffered records time out and the whole daemon exits.
I initially tried setting the Ttl to the max value mentioned in the Kinesis producer documentation. Unfortunately, this isn't actually supported by the Kinesis producer: it appears to cause an overflow in the backend, which makes the records time out immediately. I opted to change the Ttl to 1 hour instead, which I imagine is a reasonable compromise that lets Maxwell's back pressure do its thing while still surfacing an error if records remain undeliverable over the long term.
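To make the numbers concrete, here is a minimal sketch of the values involved. `RecordTtlSketch` and `naiveDeadline` are hypothetical names for illustration only; they are not part of Maxwell or the Kinesis producer. The overflow shown is just one plausible way a maximal Ttl could expire immediately, not a confirmed description of what the backend does.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not Maxwell or Kinesis producer code.
final class RecordTtlSketch {
    // Library default described above: records expire after 30 seconds.
    static final long DEFAULT_TTL_MILLIS = Duration.ofSeconds(30).toMillis();

    // Value chosen in this change: one hour, expressed in milliseconds.
    static final long ONE_HOUR_TTL_MILLIS = Duration.ofHours(1).toMillis();

    // Assumption: a deadline computed as "now + ttl" with plain 64-bit
    // arithmetic. Java long addition wraps silently on overflow.
    static long naiveDeadline(long nowMillis, long ttlMillis) {
        return nowMillis + ttlMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // A one-hour Ttl gives a deadline safely in the future.
        System.out.println(naiveDeadline(now, ONE_HOUR_TTL_MILLIS) > now);

        // A maximal Ttl wraps to a negative deadline, which is already
        // in the past, so a record would look expired immediately.
        System.out.println(naiveDeadline(now, Long.MAX_VALUE) < now);
    }
}
```

With the real library, the one-hour value would be passed to `KinesisProducerConfiguration.setRecordTtl`, which takes the Ttl in milliseconds.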
I tested the change and can confirm that I'm no longer seeing timeout failures and that back pressure is working properly.
Reference
See https://javadoc.io/doc/com.amazonaws/amazon-kinesis-producer/0.14.0/com/amazonaws/services/kinesis/producer/KinesisProducerConfiguration.html#getRecordTtl--