agent: plumb contexts through #59
Conversation
Had some thoughts, left comments.
Separately: it seems like this should make it easy-ish to address #37 (basically, listen for the context ending and trigger the informant's server.exit).
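A rough sketch of that idea, assuming some exit method on the informant server (the real hook for #37 may look different):

go func() {
	// when the agent's context ends, ask the informant server to exit
	<-ctx.Done()
	server.exit()
}()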
@@ -72,7 +70,7 @@ func (r MainRunner) Run() error {
 			globalState.Stop()
 			return nil
 		case event := <-podEvents:
-			globalState.handleEvent(event)
+			globalState.handleEvent(ctx, event)
this feels icky (handleEvent will spawn threads using the context long after handleEvent finishes, and we're always using the same context for handleEvent) - but the only better solution is storing the context in globalstate itself.
Thoughts?
I don't think there's any problem with passing a context to a function that returns before the goroutines it spawns do (and indeed, having this context means that shutting down the agent's main loop will actually cause a shutdown). Eventually/soon the basecontext/shutdown stuff can/will help make some of this more manageable.
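For illustration, a sketch of the pattern being discussed; agentState and podEvent are placeholder names here, not the real types:

// handleEvent returns right away, but the goroutine it spawns keeps using ctx,
// so cancelling the main loop's context eventually stops that work too.
func (s *agentState) handleEvent(ctx context.Context, event podEvent) {
	go func() {
		ticker := time.NewTicker(time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return // main loop is shutting down; stop working on this pod
			case <-ticker.C:
				// ... periodic work for the pod this event refers to ...
			}
		}
	}()
}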
// we want shutdown to (potentially) live longer than the request which
// made it, but having a timeout is still good.
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

if err := server.unregisterFromInformant(ctx); err != nil {
unregisterFromInformant already uses a timeout on the request itself; do we need another one?
I think the fact that doInformantRequest takes both a timeout and a context is a bit of a weird API, and I kind of planned to pull that apart in a later PR, but I don't feel rushed about that.
It's plausible that we could just pass the enclosing context to the unregister call and not worry about it during shutdown.
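For comparison, a sketch of the two shapes being weighed here; parentCtx is a stand-in name for whatever context encloses the shutdown path:

// as written in the PR: shutdown gets its own context, so it can outlive the
// request that triggered it, but is still bounded by a timeout
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
if err := server.unregisterFromInformant(ctx); err != nil {
	// ... log and continue shutting down ...
}

// the alternative floated above: reuse the enclosing context, accepting that
// unregistering gets cut short if that context is cancelled first
if err := server.unregisterFromInformant(parentCtx); err != nil {
	// ... log and continue shutting down ...
}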
pkg/agent/informant.go (Outdated)
runner.spawnBackgroundWorker(ctx, shutdownName, func(context.Context) {
	// we want shutdown to (potentially) live longer than the request which
	// made it, but having a timeout is still good.
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
20 seconds is a super long timeout compared to other things here (or: compared to our usual configuration of them). I'd either make this shorter (e.g. 5s), add a config field for it, or calculate it from an existing config field.
I mean, I think the question is more "how long could an HTTP request handler take to do a thing in normal operation?", and then double(ish) it.
The cap on this is about 30s in my mind (which is probably just what the timeout was in System V init scripts between SIGTERM and SIGKILL if the process doesn't die, and which has definitely been carried further into the future).
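A sketch of the "calculate it from an existing config field" option; the RequestTimeoutSeconds field is made up for illustration, not an existing config field:

// derive the shutdown timeout from an existing request timeout instead of
// hardcoding 20s; RequestTimeoutSeconds is a hypothetical config field
timeout := 2 * time.Duration(config.Informant.RequestTimeoutSeconds) * time.Second
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()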
Co-authored-by: sharnoff <[email protected]>
Yep! That's the hope.
I think the unresolved comments are mostly bikeshedding. Worst-case it's a little bit of tech debt that we already have plans to resolve.
Two suggested changes; mostly I think it'd be good to have context.TODO() so that it's more immediately obvious that the context for the background workers is disconnected.
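Roughly what that suggestion could look like in the shutdown worker above (a sketch only; the exact placement of context.TODO() is open):

runner.spawnBackgroundWorker(ctx, shutdownName, func(context.Context) {
	// using context.TODO() instead of context.Background() as the parent makes it
	// obvious at the call site that this worker's context is deliberately
	// disconnected from the context that spawned it
	ctx, cancel := context.WithTimeout(context.TODO(), 10*time.Second)
	defer cancel()
	// ... unregister from the informant, etc. ...
})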
This appears to be causing the autoscaler-agent to immediately crash on startup. Logs:
AFAICT this is exclusively caused by the dependency bump. If this is indeed an issue with …
Fixes an issue causing autoscaler-agents to crash on startup, introduced by #59.
Upgrading to v0.7.1 fixes the issue. Opened #73 to do so.
Fixes an issue causing autoscaler-agents to crash on startup, see: #59 (comment).
In pursuit of #55 and #12 (but certainly not all of either).