On topic data loss offsets are not committed properly #1558

Closed
4 tasks done
AlexeyRaga opened this issue Nov 28, 2017 · 4 comments

@AlexeyRaga

AlexeyRaga commented Nov 28, 2017

Description

When data is lost in Kafka (say, data is physically deleted from the broker), the existing consumers report negative lag until they are restarted.

I can see that librdkafka probably behaves correctly in one respect: new messages are still received from the topic even though their offsets are smaller than what the consumer remembers.
However, committing offsets doesn't seem to write the new "latest" offsets, and the lag keeps being reported as negative.
This happens at least until the job restarts.

To compare: Java consumers will immediately commit the latest offsets and the lag becomes healthy again. It may have something to do with the old discussion about Java consumers writing offsets unconditionally, while rdkafka only writes offsets on commit when it detects a change ;)
But this issue is more severe because the lags are now incorrect: it is hard to reason about what is being processed and what will happen on restart.

Checklist

Please provide the following information:

  • librdkafka version (release number or git tag):
    0d540ab, possibly latest
  • Apache Kafka version: 0.10.x
  • librdkafka client configuration: nothing special
  • Operating system: Ubuntu 16
@AlexeyRaga

I have upgraded the job to the latest released librdkafka (261371d) and the lag still stays negative:

GROUP             TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             OWNER
consumer-group-id lost-topic      0          49818603        7437            -49811166       rdkafka_/192.169.25.16
consumer-group-id lost-topic      5          49799038        7577            -49791461       rdkafka_/192.169.25.16
consumer-group-id lost-topic      6          49890640        7526            -49883114       rdkafka_/192.169.25.16
consumer-group-id lost-topic      11         49753139        7812            -49745327       rdkafka_/192.169.25.16
consumer-group-id lost-topic      12         49767513        7980            -49759533       rdkafka_/192.169.25.16
consumer-group-id lost-topic      13         49804048        7669            -49796379       rdkafka_/192.169.25.16
consumer-group-id lost-topic      18         49927977        7809            -49920168       rdkafka_/192.169.25.16
consumer-group-id lost-topic      19         49799423        7736            -49791687       rdkafka_/192.169.25.16
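
For readers checking the numbers: the LAG column in the listing above is LOG-END-OFFSET minus CURRENT-OFFSET, so a committed offset that sits past the end of the log gives a negative value. A minimal Python sketch that recomputes it from the rows above (the `lag` helper and the `rows` list are illustrative names, not part of any Kafka tool):

```python
# Rows copied from the listing above:
# (partition, current_offset, log_end_offset, reported_lag)
rows = [
    (0, 49818603, 7437, -49811166),
    (5, 49799038, 7577, -49791461),
    (6, 49890640, 7526, -49883114),
    (11, 49753139, 7812, -49745327),
    (12, 49767513, 7980, -49759533),
    (13, 49804048, 7669, -49796379),
    (18, 49927977, 7809, -49920168),
    (19, 49799423, 7736, -49791687),
]


def lag(current_offset, log_end_offset):
    """Consumer lag as reported per partition: log end minus committed offset."""
    return log_end_offset - current_offset


for partition, current, log_end, reported in rows:
    computed = lag(current, log_end)
    # A negative lag means the committed offset is ahead of the log end.
    status = "NEGATIVE" if computed < 0 else "ok"
    print(partition, computed, status, computed == reported)
```

Every row reproduces the reported value, which suggests the committed offsets are simply stale values from before the data loss.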

@edenhill

This is simply explained by librdkafka being so fast it is reading future messages.
Case closed.

@edenhill edenhill reopened this Nov 28, 2017
@edenhill

Changing librdkafka to allow committing older offsets (and identical offsets) is a simple change, but we might want to make it an opt-in configuration for some time to avoid affecting existing users that rely on the current behaviour.
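
To illustrate the behaviour under discussion, here is a toy Python model of a client-side commit guard. It is only a sketch under assumptions: the thread does not show librdkafka's actual check, so the "skip unless the offset advanced" condition, the `ToyCommitter` class, and the `allow_rewind` flag are all hypothetical and do not correspond to any real librdkafka API or configuration property.

```python
class ToyCommitter:
    """Toy model of a commit guard; NOT librdkafka code."""

    def __init__(self, committed, allow_rewind=False):
        self.committed = committed        # offset last stored for the group
        self.allow_rewind = allow_rewind  # hypothetical opt-in flag

    def commit(self, offset):
        # Assumed guard: skip any commit that does not advance the stored
        # offset. With allow_rewind set, older or identical offsets are
        # written unconditionally.
        if not self.allow_rewind and offset <= self.committed:
            return False
        self.committed = offset
        return True


log_end = 7437  # partition 0 after the data loss, from the listing earlier

guarded = ToyCommitter(committed=49818603)
guarded.commit(log_end)             # skipped: 7437 is not past 49818603
print(log_end - guarded.committed)  # lag stays negative

opt_in = ToyCommitter(committed=49818603, allow_rewind=True)
opt_in.commit(log_end)              # written even though it is older
print(log_end - opt_in.committed)   # lag returns to 0
```

In this model the guarded committer keeps the stale offset until its state is reset, which is consistent with the reported lag staying negative until the job restarts.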

@edenhill

Dup of #1372
