Avoid 100% CPU usage while socket is closed #2091
It could also be possible to do a
d72a192 to 80e1efa
After the Kafka service is stopped and restarted, kafka-python may use 100% CPU because it busy-retries on a socket that has already been closed. This fixes the issue by unregistering the socket if its fd is negative.
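For context, the check relies on a CPython behavior: once a socket object is closed, its `fileno()` returns -1. Below is a minimal standard-library sketch of that behavior and of dropping such sockets from a selector. The helper name `drop_closed` is hypothetical and is not part of kafka-python; it only illustrates the idea, not the actual patch.

```python
import selectors
import socket


def drop_closed(selector, ready):
    """Unregister closed sockets found in `ready`; return the still-usable entries."""
    alive = []
    for key, events in ready:
        if key.fileobj.fileno() < 0:
            # A closed socket reports fd -1, so it can never be serviced again.
            selector.unregister(key.fileobj)
        else:
            alive.append((key, events))
    return alive


sel = selectors.DefaultSelector()
a, b = socket.socketpair()
key_a = sel.register(a, selectors.EVENT_READ)
key_b = sel.register(b, selectors.EVENT_READ)
a.close()  # simulate a connection closed behind the selector's back

# Pretend both keys were reported as ready by a select() call.
remaining = drop_closed(sel, [(key_a, selectors.EVENT_READ), (key_b, selectors.EVENT_READ)])
```

After this runs, only the key for `b` should remain registered, so a poll loop built on `sel` no longer sees the closed socket.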
80e1efa to 09b5574
Commit updated: unregister the socket instead of sleeping.
Agreed, can you look at where this negative FD might be coming from? If you can consistently repro it, the eBPF tools would probably make it a lot easier to track down. Or, if you have a way to consistently repro it in a test case, I'd be willing to take a look.
@jeffwidman the scenario where this happens, to the best of my knowledge, is having the same client connected to a cluster that rapidly drops and accepts new members. Having a long-ish metadata refresh time, plus the client code, to my understanding, having a very long grace period to kill idle connections, can lead to the |
@@ -634,6 +634,9 @@ def _poll(self, timeout):
             self._sensors.select_time.record((end_select - start_select) * 1000000000)
 
         for key, events in ready:
+            if key.fileobj.fileno() < 0:
+                self._selector.unregister(key.fileobj)
I think this needs to be more robust: we want to close the conn here if it is a BrokerConnection, which would then trigger an unregister. But it could also be the _wake_r socketpair, in which case we need to reset/rebuild the wake socketpair.
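One way to read this suggestion is sketched below with a toy class rather than the real client. Everything here is an assumption made for illustration: `ToyClient`, `handle_stale_key`, and `_init_wakeup_socketpair` are hypothetical names, and the sketch assumes the connection object is stored as the selector key's `data` and exposes a `close()` method. In the real client, closing the connection may itself trigger the unregister, so the ordering would need to be checked against the actual code.

```python
import selectors
import socket


class ToyClient:
    """Hypothetical stand-in for a client that owns a selector and a wake socketpair."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._init_wakeup_socketpair()

    def _init_wakeup_socketpair(self):
        # (Re)build the wake socketpair and watch its read end.
        self._wake_r, self._wake_w = socket.socketpair()
        self._wake_r.setblocking(False)
        self._selector.register(self._wake_r, selectors.EVENT_READ)

    def handle_stale_key(self, key):
        """Clean up a key whose socket is already closed; return True if it was stale."""
        if key.fileobj.fileno() >= 0:
            return False  # socket is still open, nothing to do
        self._selector.unregister(key.fileobj)
        if key.fileobj is self._wake_r:
            # The wake socket itself went bad: drop the other end and rebuild the pair.
            self._wake_w.close()
            self._init_wakeup_socketpair()
        elif key.data is not None:
            # Assumed convention: key.data holds the connection object for this socket.
            key.data.close()
        return True
```

A poll loop could call `handle_stale_key(key)` first for each ready key and skip normal processing whenever it returns `True`.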
I am also seeing this issue. When the broker goes down, the CPU usage percentage of the producer shoots up. In
…terations for Kafka 0.8.2 and Python 3.12 (dpkp#159)
* skip failing tests for PyPy since they work locally
* Reconfigure tests for PyPy and 3.12
* Skip partitioner tests in test_partitioner.py if 3.12 and 0.8.2
* Update test_partitioner.py
* Update test_producer.py
* Timeout tests after ten minutes
* Set 0.8.2.2 to be experimental from hereon
* Formally support PyPy 3.9
* Test Kafka 0.8.2.2 using Python 3.11 in the meantime
* Override PYTHON_LATEST conditionally in python-package.yml
* Update python-package.yml
* add python annotation to kafka version test matrix
* Update python-package.yml
* try python 3.10
* Remove support for EOL'ed versions of Python
* Update setup.py
Too many MRs to review... so little time.
4c44bfb to 1511271