-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Set a maximum retries for Docker driver to avoid deadlock. #15026
base: main
Are you sure you want to change the base?
Conversation
@@ -153,7 +151,7 @@ func parseConfig(logCtx logger.Info) (*config, error) { | |||
clientConfig.URL = flagext.URLValue{URL: url} | |||
|
|||
// parse timeout | |||
if err := parseDuration(cfgTimeoutKey, logCtx, func(d time.Duration) { clientConfig.Timeout = d }); err != nil { | |||
if err := parseDurationWithDefault(cfgTimeoutKey, logCtx, func(d time.Duration) { clientConfig.Timeout = d }, 10*time.Second); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just set the default on the config struct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean here? That's also used by Promtail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BackoffConfig: backoff.Config{ | ||
MinBackoff: client.MinBackoff, | ||
MaxBackoff: client.MaxBackoff, | ||
MaxRetries: client.MaxRetries, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. The default is 10
so the driver should not deadlock it can just take almost an hour until the client gives up. For production use this should be fine.
What this PR does / why we need it:
There's a long ticket claiming that the Docker driver would deadlock when the configured Loki endpoint becomes unreachable. The root cause is that the Loki client retries forever until it can reach Loki again. This looks like a deadlock.
This issues is documented with a workaround. However, users still struggle. That's why this change proposes to make the workaround the default behavior.
Which issue(s) this PR fixes:
Fixes #2361
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR