fix: prevent possible slow resolution of rate limiter #433
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
By submitting a PR to this repository, you agree to the terms within the Auth0 Code of Conduct. Please see the contributing guidelines for how to create and submit a high-quality PR for this repo.
Description
We have found scenarios at scale where our requests lock up for 10+ seconds and wait for something in the rate limiter while resolving a jwk endpoint. This was found when we had a consistent rate (around 2 per-second) of requests trying to validate a
kid
which didn't exist in the jwk endpoint, and so caused a lot of cache-misses, which caused the rate limiter to kick in.We believe the intention is for the rate limiter to return immediately and not pause to wait for available tokens (hence passing the fireImmediately arg to the RateLimiter constructor).
However, maybe because of a race condition at scale (unfortunately we can't reproduce this locally), it seems possible to end up trying to resolve a token from the bucket even when there are no tokens left:
https://github.com/jhurliman/node-rate-limiter/blob/main/src/RateLimiter.ts#L83
Once we are on this path the rate limiter can trigger a wait for new 'tokens' become available:
https://github.com/jhurliman/node-rate-limiter/blob/main/src/TokenBucket.ts#L96
This fix uses the simpler
tryRemoveTokens
method which synchronously returns a boolean if tokens could be taken (and so can't pause execution to wait for tokens to be available).Although we can't write a specific test for this, the new code seems simpler and passes the existing rate-limit tests, so it seems like a good change anyway.
Testing
There are existing tests to cover this.
Checklist