-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve github pull request call retries #1810
Improve github pull request call retries #1810
Conversation
Retry with fixed 1 second backoff up to 3 retries was added by #1131 to address #1019, but the issue continued to show up (#1453). Increase max attempts to 5 and use exponential backoff for a maximum total retry time of (2^n - n - 1) seconds, which is roughly 30 seconds for current max attempts n = 5. Also move the sleep to the top of the loop so that we never sleep without sending the request again on the last iteration.
retryDelay := 1 * time.Second | ||
for i := 0; i < numRetries; i++ { | ||
// to attempt up to 5 times with exponential backoff. | ||
maxAttempts := 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(first attempt technically isn't a retry so I changed the var names to use attempt)
Is there any maintainers around - this looks relatively straightforward? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a follow on to resolve similar issues to runatlantis#1019. In runatlantis#1131 retries were added to GetPullRequest. And in runatlantis#1810 a backoff was included. However, those only resolve one potential request at the very beginning of a PR creation. The other request that happens early on during auto-plan is one to ListFiles to detect the modified files. This too can sometimes result in a 404 due to async updates on the GitHub side.
* Improve github pull request call retries Retry with fixed 1 second backoff up to 3 retries was added by runatlantis#1131 to address runatlantis#1019, but the issue continued to show up (runatlantis#1453). Increase max attempts to 5 and use exponential backoff for a maximum total retry time of (2^n - n - 1) seconds, which is roughly 30 seconds for current max attempts n = 5. Also move the sleep to the top of the loop so that we never sleep without sending the request again on the last iteration. * Fix style with gofmt -s
Retry with fixed 1 second backoff up to 3 retries was added by #1131 to address #1019, but the issue continued to show up (#1453).
Increase max attempts to 5 and use exponential backoff for a maximum total retry time of (2^n - n - 1) seconds, which is roughly 30 seconds for current max attempts n = 5.
Also move the sleep to the top of the loop so that we never sleep without sending the request again on the last iteration.
I decided not to change the test case added in #1131 to avoid sleeping the full time in the test suite - I can bump the retries there to 5 to keep it in sync if that would be better.
I can also add jitter so the backoff isn't deterministic / less vulnerable to thundering herds but didn't for now since there's already existing retry logic in this exact place that is deterministic and linear and making it exponential could be done with a small change without pulling in additional dependencies.
Also would it be good to expose configuration options for this and make the default behavior match current with max wait 3 seconds? (like what is proposed in #1715, though the problem here is not waiting long enough to retry rather than throttling) I'm not entirely sure what the implications are for a longer wait here with concurrent PRs that need planning, etc (I think it's fine for my use case, not as confident it's fine for everyone's)