Custom checkpointing #19

kokes · 2018-12-07T12:55:18Z

Do I understand it correctly that the code checkpoints as soon as an element is yielded? See line 209 in 5305f7b:

    self.state.checkpoint(state_shard_id, item['SequenceNumber'])

The issue I have with this behaviour is that yielding an item does not imply it has been processed (whatever that means for me). So if my code fails after I've received a record but before I have processed it, the record will already be marked as processed in DynamoDB, and I will lose it.

Is my understanding correct? If so, do you think there should be a mechanism to checkpoint manually? I don't think it's difficult to code up; it's just tricky in terms of API: since you're making shards transparent, it's not clear which shard can be checkpointed at what time.
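To make the API question concrete, here is a minimal sketch of what manual checkpointing could look like. Everything in it is hypothetical and not part of kinesis-python: `ManualCheckpointConsumer` and its `checkpoint()` method are illustrative names, the in-memory shard lists stand in for reading from Kinesis, and a plain dict stands in for the DynamoDB state. The key idea is that the iterator exposes which shard each record came from, so the caller can decide when each shard is safe to checkpoint.

```python
from typing import Dict, Iterator, List, Tuple


class ManualCheckpointConsumer:
    """Hypothetical consumer that never checkpoints on its own.

    The caller calls checkpoint() once a record has really been processed.
    """

    def __init__(self, shards: Dict[str, List[dict]]):
        self._shards = shards                   # stand-in for reading from Kinesis
        self.checkpoints: Dict[str, str] = {}   # stand-in for the DynamoDB state

    def __iter__(self) -> Iterator[Tuple[str, dict]]:
        # Interleave shards round-robin, but tell the caller which shard
        # each record belongs to so it can checkpoint per shard.
        iters = {sid: iter(recs) for sid, recs in self._shards.items()}
        while iters:
            for sid in list(iters):
                record = next(iters[sid], None)
                if record is None:
                    del iters[sid]  # this shard is exhausted
                else:
                    yield sid, record

    def checkpoint(self, shard_id: str, sequence_number: str) -> None:
        # Record the last processed sequence number for this shard only.
        self.checkpoints[shard_id] = sequence_number


consumer = ManualCheckpointConsumer({
    "shard-0": [{"SequenceNumber": "10"}, {"SequenceNumber": "11"}],
    "shard-1": [{"SequenceNumber": "20"}],
})
for shard_id, record in consumer:
    # ... process the record here ...
    if record["SequenceNumber"] != "11":  # pretend record "11" is still pending
        consumer.checkpoint(shard_id, record["SequenceNumber"])

print(consumer.checkpoints)  # {'shard-0': '10', 'shard-1': '20'}
```

This also shows why the API is tricky: a checkpoint is a position within one shard, so checkpointing a later record implicitly covers the earlier records of that same shard. A caller that finishes records out of order would still have to checkpoint each shard in sequence order.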


borgstrom · 2019-04-10T17:41:57Z

Hi @kokes,

Thanks for opening this issue.

I don't think that your understanding is correct.

Let's look at some more of the code block you referenced:

kinesis-python/src/kinesis/consumer.py, lines 201 to 209 in 5305f7b:

    for item in resp['Records']:
        if not self.run:
            break

        log.debug(item)
        yield item

        try:
            self.state.checkpoint(state_shard_id, item['SequenceNumber'])
            # ... (excerpt ends at line 209, inside the try block)

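Since we do not have any exception handling around the yield, if you fail to process an item (i.e. your code raises an exception), the whole consumer will fail and shut down BEFORE we checkpoint. This is because a generator only resumes after a `yield` when the caller asks for the next item, which happens only after your loop body for the current item has finished.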

Only if you successfully process the item (i.e. no exception is thrown while the consumer is paused at the `yield` on line 206) will the position be written to the checkpoint.
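The generator behaviour described above can be checked with a small, self-contained model of the excerpt. This is a simplified sketch, not the library's code: `consume` and `process` are hypothetical names, the `records` list stands in for `resp['Records']`, and a dict stands in for the DynamoDB state.

```python
def consume(records, checkpoints, shard_id="shard-0"):
    """Simplified model of the loop in consumer.py (not the real implementation)."""
    for item in records:
        yield item
        # Only reached when the caller asks for the next item,
        # i.e. after the loop body for `item` finished without raising.
        checkpoints[shard_id] = item["SequenceNumber"]


def process(item):
    # Hypothetical processing step that fails on one record.
    if item["Data"] == b"bad":
        raise ValueError("processing failed")


records = [
    {"SequenceNumber": "1", "Data": b"ok"},
    {"SequenceNumber": "2", "Data": b"bad"},
    {"SequenceNumber": "3", "Data": b"ok"},
]
checkpoints = {}
try:
    for item in consume(records, checkpoints):
        process(item)
except ValueError:
    pass  # the real consumer would stop here

print(checkpoints)  # {'shard-0': '1'}
```

Record "2" failed, so its sequence number was never checkpointed; on restart, reading would resume after record "1" and record "2" would be delivered again.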

This leaves retry mechanisms up to you to implement.
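As one illustration of handling retries on the caller's side, the sketch below retries a processing call with exponential backoff and re-raises only after the last attempt, so a record that keeps failing still stops the consumer before it is checkpointed. `with_retries` is a hypothetical helper, not something kinesis-python provides, and the attempt count and delays are arbitrary assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(func: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Call func(), retrying with exponential backoff; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up: the exception escapes and no checkpoint is written
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo with a hypothetical handler that fails twice, then succeeds.
calls = {"n": 0}


def flaky_handle():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "done"


print(with_retries(flaky_handle, attempts=3, base_delay=0.01))  # done
```

Inside the consumer loop this would be called as `with_retries(lambda: handle(item))`, where `handle` is your own processing function.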

I do agree that, because the shards are handled transparently, this forces you to process items from all the shards serially, but for the use case this library was built to solve, that was the desired behavior. If you have a proposal for how you'd like the shards to be processed explicitly, I would be open to considering it.
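One possible shape for explicit shard processing, sketched under assumptions: each shard gets its own worker thread, records stay in order within a shard, and each worker checkpoints only its own shard. `process_shards_in_parallel` is a hypothetical function, the records are assumed to be already fetched into per-shard lists, and a dict stands in for the DynamoDB state.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Callable, Dict, List


def process_shards_in_parallel(
    shards: Dict[str, List[dict]], handle: Callable[[dict], None]
) -> Dict[str, str]:
    """Hypothetical helper: one worker per shard, checkpointing only that shard."""
    checkpoints: Dict[str, str] = {}  # stand-in for the DynamoDB state table
    lock = Lock()

    def run_shard(shard_id: str, records: List[dict]) -> None:
        for record in records:  # records stay in order within a shard
            handle(record)      # an exception here stops this shard before its checkpoint
            with lock:
                checkpoints[shard_id] = record["SequenceNumber"]

    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        futures = [pool.submit(run_shard, sid, recs) for sid, recs in shards.items()]
        for future in futures:
            future.result()  # re-raise a worker's exception in the caller
    return checkpoints


shards = {
    "shard-0": [{"SequenceNumber": "1"}, {"SequenceNumber": "2"}],
    "shard-1": [{"SequenceNumber": "7"}],
}
result = process_shards_in_parallel(shards, lambda record: None)
print(dict(sorted(result.items())))  # {'shard-0': '2', 'shard-1': '7'}
```

The trade-off is that a failure in one shard no longer stops the other shards, so partial progress across shards has to be acceptable for your use case.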

whale2 pushed a commit to whale2/async-kinesis-client that referenced this issue Jan 9, 2019

Add callback for custom checkpointing (inspired by NerdWalletOSS/kinesis-python#19) Minor fixes

71ca53e

borgstrom closed this as completed Apr 10, 2019
