Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow stream to be interrupted when there is no traffic #1251

Merged
merged 2 commits into from
Jun 10, 2024

Conversation

erikvanoosten
Copy link
Collaborator

@erikvanoosten erikvanoosten commented Jun 7, 2024

When a partition is lost while there is no traffic, PartitionStreamControl is blocked waiting for data. We fix this by racing with the interruptPromise. (Thanks @josdirksen for the analysis! See #1250.)

Note: this situation can only occur with lost partitions. Timeouts (the other reason for interrupts) do not occur when there is no traffic. Currently, we have no way to test lost partitions. Therefore, there are no added tests.

This PR does not change how lost partitions are handled. That is, the stream for the partition that is lost is interrupted, the other streams are closed gracefully, the consumer aborts with an error.

When a partition is lost while there is no traffic, `PartitionStreamControl` is blocked waiting for data. We fix this by racing with the `interruptPromise`. (Thanks @josdirksen for the analysis!)

Note: this situation can only occur with lost partitions. Timeouts (the other reason for interrupts) do not occur when there is no traffic. Currently, we have no way to test lost partitions. Therefore, there are no added tests.

See #1250.
Copy link
Collaborator

@svroonland svroonland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would indeed fix the situation as described in #1250.

So now the only remaining distinction between handling revoked and lost is that for lost we terminate immediately. Like you mentioned before, I think that lost partitions will not have received any data for a long time anyway. But since we can't test that, this is probably a safe implementation.

@erikvanoosten erikvanoosten merged commit d399b84 into master Jun 10, 2024
14 checks passed
@erikvanoosten erikvanoosten deleted the fix-low-traffic-interrupt branch June 10, 2024 08:40
svroonland added a commit that referenced this pull request Nov 10, 2024
…#1371)

The issue became apparent from the tests. It unfortunately means that
#1251 did not fix the issue of blocking when partitions are lost when
there is no more data in the queue.

* Fix: race should be raceFirst, because interruptionPromise will only
ever fail (relates to #1251)
* Reset queue size on lost
* Abstract requesting data for testing purposes
@erikvanoosten
Copy link
Collaborator Author

Note: @svroonland found a bug in this code after adding a unit test. See #1371.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants