Custom checkpointing #19
Hi @kokes, thanks for opening this issue. I don't think your understanding is quite right. Let's look at a bit more of the code block you referenced (kinesis-python/src/kinesis/consumer.py, lines 201 to 209 at 5305f7b):
Since we do not have any exception handling around the yield, the position is written to the checkpoint only if you successfully process the item (i.e. no exception is thrown at line 206). This leaves retry mechanisms up to you to implement. I do agree that this forces you to process items serially across all the shards, because of the transparent way they are handled, but for the use case this library was built to solve, that was the desired behavior. If you have a proposal for how you'd like the shards to be processed explicitly, I would be open to considering it.
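To illustrate the mechanism being described, here is a minimal, self-contained sketch (the names `consume`, `records`, and `checkpoints` are hypothetical, not the library's actual API). Because the statement after a `yield` only runs when the caller resumes the generator, a checkpoint placed there is reached only after the caller's loop body finished without raising:

```python
def consume(records, checkpoints):
    """Yield records; checkpoint each one only after the caller resumes."""
    for seq, record in records:
        yield record
        # Only reached when the caller asks for the next item, i.e. its
        # loop body processed `record` without raising an exception.
        checkpoints.append(seq)

checkpoints = []
records = [(1, "a"), (2, "b"), (3, "c")]

try:
    for record in consume(records, checkpoints):
        if record == "b":
            raise RuntimeError("processing failed")
except RuntimeError:
    pass

print(checkpoints)  # [1] -- "b" was yielded but never checkpointed
```

When the caller's loop raises, the generator is closed and the pending checkpoint line never executes, so the failed record is not marked as processed.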
Do I understand it correctly that the code checkpoints as soon as an element is yielded? (kinesis-python/src/kinesis/consumer.py, line 209 at 5305f7b)
The issue I have with this behaviour is that yielding an item does not imply it has been processed (whatever that means for my application). So if my code fails after I've received a record but before I have processed it, the record will be marked as processed in DynamoDB, and I will lose that piece of information.
Is my understanding correct? If so, do you think there should be a mechanism to checkpoint data manually? I don't think it would be difficult to code up; it's just tricky in terms of API design - since you make shards transparent, it isn't obvious which shard can be checkpointed at what time.
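One way the shard-transparency problem could be sidestepped is to yield each record together with a checkpoint callback that is already bound to its shard and sequence number. This is a hedged sketch of such an API (everything here is hypothetical, not part of kinesis-python; the real library would persist to DynamoDB rather than a dict):

```python
from functools import partial

class ManualConsumer:
    """Sketch of a consumer that leaves checkpointing to the caller."""

    def __init__(self):
        self.state = {}  # shard_id -> last checkpointed sequence number

    def _checkpoint(self, shard_id, seq):
        # A real implementation would write this position to DynamoDB.
        self.state[shard_id] = seq

    def items(self, records):
        # records: iterable of (shard_id, seq, data) tuples
        for shard_id, seq, data in records:
            # The callback captures the shard and sequence number, so the
            # caller never needs to know which shard a record came from.
            yield data, partial(self._checkpoint, shard_id, seq)

consumer = ManualConsumer()
records = [("shard-0", 10, "a"), ("shard-1", 5, "b")]
for data, checkpoint in consumer.items(records):
    # ... process data ...
    checkpoint()  # caller decides when the record counts as processed

print(consumer.state)  # {'shard-0': 10, 'shard-1': 5}
```

With this shape, a caller that crashes before invoking `checkpoint()` simply leaves the old position in place, so the record would be re-delivered on restart.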