Proxy control plane rate limiter #5785
2388 tests run: 2272 passed, 0 failed, 116 skipped (full report)
Flaky tests (3)
Postgres 16
Code coverage (full report)
The comment gets automatically updated with the latest test results
466ce49 at 2023-11-14T18:19:34.742Z
Let me give some backstory on the pin-list semaphore idea, although I see it's not included in the current PR (the pin-list dependency is still there).
Why was it necessary
I needed a way to remove available permits at will, e.g. when the congestion control algorithm wants to decrease the available concurrency.
Solution
Write our own semaphore. Easier said than done. Inspired by tokio's implementation, which uses a […], I took the code from tokio, built it on top of pin-list, and made a few notable changes:
I see, yes, implementing a semaphore is a non-trivial task. Originally I wanted to reuse your implementation with […]. My assumption was that it should be a very rare situation when the control plane is overloaded and there is still a non-trivial amount of available permits. Then it is not a problem to […]. What do you think about it?
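To make the requirement in this thread concrete (removing available permits at will when congestion control lowers the concurrency), here is a minimal std-only sketch. It is not the pin-list semaphore discussed above and not the code in this PR: `DynamicLimiter`, `Permit`, `set_limit`, and `available` are hypothetical names, and the limiter is non-blocking (`try_acquire` only), so it sidesteps the waiter queue that makes a real async semaphore hard.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical shared state: the current limit and the number of permits handed out.
#[derive(Debug)]
struct State {
    limit: usize,
    in_flight: usize,
}

/// Non-blocking limiter whose limit can be lowered (or raised) at runtime.
#[derive(Clone, Debug)]
pub struct DynamicLimiter {
    state: Arc<Mutex<State>>,
}

/// RAII permit: dropping it returns the slot to the limiter.
pub struct Permit {
    state: Arc<Mutex<State>>,
}

impl DynamicLimiter {
    pub fn new(limit: usize) -> Self {
        let state = State { limit, in_flight: 0 };
        Self { state: Arc::new(Mutex::new(state)) }
    }

    /// Hands out a permit only while fewer than `limit` permits are in flight.
    pub fn try_acquire(&self) -> Option<Permit> {
        let mut s = self.state.lock().unwrap();
        if s.in_flight < s.limit {
            s.in_flight += 1;
            Some(Permit { state: Arc::clone(&self.state) })
        } else {
            None
        }
    }

    /// Changes the limit. Lowering it removes available permits immediately;
    /// permits already handed out are not revoked, they drain as they are dropped.
    pub fn set_limit(&self, new_limit: usize) {
        self.state.lock().unwrap().limit = new_limit;
    }

    /// Permits that could be acquired right now (zero while over the limit).
    pub fn available(&self) -> usize {
        let s = self.state.lock().unwrap();
        s.limit.saturating_sub(s.in_flight)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        // Every live permit incremented `in_flight` once, so this cannot underflow.
        self.state.lock().unwrap().in_flight -= 1;
    }
}
```

Tracking `limit` and `in_flight` separately, rather than a single "available" counter, is what lets the limit drop below the number of outstanding permits without going negative: new acquisitions are simply refused until enough permits have been returned.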
Some Prometheus metrics to monitor the rate limiting would be nice.
If I understand correctly, this is a global rate limit on the number of requests to the control plane. Does it have any "fairness" built into it? If one user sends a lot of requests, can it saturate the limiter easily, effectively causing an outage for everyone else?
In production, we currently run three console/control plane instances behind a load balancer. If one of them is overloaded for some reason and fails all requests, but others are working correctly, how does the rate limiting algorithm behave? In the future, we will also have separate control plane instances in each region.
More tests would be nice. There are unit tests for the algorithm, but I'd also like to see some Python tests that exercise the throttling in the real proxy.
I see that #5799 addresses that, with a per-endpoint lock. When both of these PRs are merged, I presume we will acquire the per-endpoint lock first, and the global limiter permit after that. That seems OK. Per-IP-address limiting would be nice too, to avoid DoSing the control plane with 'get_auth_info' requests or saturating this rate limiter, but that's a different story.
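The acquisition order presumed in this comment (per-endpoint lock first, global limiter permit second) can be sketched as below. This is an illustration under stated assumptions, not code from this PR or from #5799: `TwoLevelLimiter`, `Rejected`, `try_start`, and `finish` are hypothetical names, each endpoint is assumed to allow one in-flight request, and both levels are non-blocking and guarded by a single mutex for simplicity.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Why a request was not admitted (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
pub enum Rejected {
    EndpointBusy,
    GlobalLimit,
}

struct Inner {
    busy_endpoints: HashSet<String>,
    global_in_flight: usize,
    global_limit: usize,
}

/// Per-endpoint slot checked first, global limit checked second.
pub struct TwoLevelLimiter {
    inner: Mutex<Inner>,
}

impl TwoLevelLimiter {
    pub fn new(global_limit: usize) -> Self {
        Self {
            inner: Mutex::new(Inner {
                busy_endpoints: HashSet::new(),
                global_in_flight: 0,
                global_limit,
            }),
        }
    }

    /// Admits a request for `endpoint` if neither level is exhausted.
    pub fn try_start(&self, endpoint: &str) -> Result<(), Rejected> {
        let mut g = self.inner.lock().unwrap();
        // Step 1: per-endpoint check. A busy endpoint is rejected before it
        // can consume a global slot, so it holds at most one global permit.
        if g.busy_endpoints.contains(endpoint) {
            return Err(Rejected::EndpointBusy);
        }
        // Step 2: global check.
        if g.global_in_flight >= g.global_limit {
            return Err(Rejected::GlobalLimit);
        }
        g.busy_endpoints.insert(endpoint.to_string());
        g.global_in_flight += 1;
        Ok(())
    }

    /// Releases both levels; a no-op if `endpoint` was not admitted.
    pub fn finish(&self, endpoint: &str) {
        let mut g = self.inner.lock().unwrap();
        if g.busy_endpoints.remove(endpoint) {
            g.global_in_flight -= 1;
        }
    }
}
```

With this ordering, repeated requests for the same endpoint fail at step 1 and never reach the global counter, which is one way to keep a single endpoint from saturating the global limiter. It does not address many distinct endpoints or per-IP limiting, which the comment treats as a separate concern.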
Problem
Proxy might overload the control plane.
Summary of changes
Implement a rate limiter for the proxy<->control plane connection.
Resolves #5707
Used implementation ideas from https://github.com/conradludgate/squeeze/
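The conversation mentions a congestion control algorithm that decreases the available concurrency. As an illustration only, the sketch below shows one common policy of that kind, additive increase / multiplicative decrease (AIMD). Whether this PR uses AIMD, and with which parameters, is not stated here; `Aimd`, `Outcome`, the step of 1, and the factor of 0.5 are all assumptions.

```rust
/// Result of one control plane request, as seen by the limiter (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Success,
    Overload,
}

/// AIMD concurrency limit: grow slowly on success, shrink quickly on overload.
pub struct Aimd {
    limit: usize,
    min: usize,
    max: usize,
    increase_by: usize,   // assumed additive step
    decrease_factor: f64, // assumed multiplicative factor, in (0, 1)
}

impl Aimd {
    /// `min` must not exceed `max`; `initial` is clamped into that range.
    pub fn new(initial: usize, min: usize, max: usize) -> Self {
        Self {
            limit: initial.clamp(min, max),
            min,
            max,
            increase_by: 1,
            decrease_factor: 0.5,
        }
    }

    pub fn limit(&self) -> usize {
        self.limit
    }

    /// Updates the limit from one observed outcome and returns the new value.
    pub fn on_sample(&mut self, outcome: Outcome) -> usize {
        self.limit = match outcome {
            // Additive increase, capped at `max`.
            Outcome::Success => (self.limit + self.increase_by).min(self.max),
            // Multiplicative decrease (rounded down), floored at `min`.
            Outcome::Overload => {
                let reduced = (self.limit as f64 * self.decrease_factor) as usize;
                reduced.max(self.min)
            }
        };
        self.limit
    }
}
```

The value returned by `on_sample` would then be fed to whatever enforces the concurrency limit, which is where the ability to remove available permits at will becomes necessary.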