agent: plumb contexts through #59

Merged
merged 6 commits into main from tychoish/agent-context-handling on Mar 1, 2023

Conversation

tychoish (Contributor)

In pursuit of #55 and #12 (but certainly not all of either).

tychoish requested a review from sharnoff on February 21, 2023
sharnoff (Member) left a comment

Had some thoughts, left comments.

Separately: It seems like this should make it easy-ish to address #37 (basically, listen for context ending and trigger informant's server.exit).

pkg/agent/informant.go (outdated, resolved)
@@ -72,7 +70,7 @@ func (r MainRunner) Run() error {
globalState.Stop()
return nil
case event := <-podEvents:
-		globalState.handleEvent(event)
+		globalState.handleEvent(ctx, event)
sharnoff (Member)

This feels icky (handleEvent will spawn threads using the context long after handleEvent finishes, and we're always using the same context for handleEvent), but the only better solution is storing the context in globalState itself.

Thoughts?

tychoish (Contributor Author)

I don't think there's any problem with passing a context to a function that returns before the goroutines it spawns finish (and indeed, having this context means that shutting down the agent's main loop will actually cause a shutdown). Eventually/soon the basecontext/shutdown stuff can/will help make some of this more manageable.

Comment on lines +216 to +221
// we want shutdown to (potentially) live longer than the request which
// made it, but having a timeout is still good.
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

if err := server.unregisterFromInformant(ctx); err != nil {
sharnoff (Member)

unregisterFromInformant already uses a timeout on the request itself; do we need another one?

tychoish (Contributor Author)

I think the fact that doInformantRequest takes a timeout and a context is a bit of a weird API, and I kind of planned to pull that apart in a later PR, but I don't feel rushed about that.

It's plausible that we could just pass the enclosing context to the unregister call and not worry about it during shutdown, but...

cmd/autoscaler-agent/main.go (outdated, resolved)
runner.spawnBackgroundWorker(ctx, shutdownName, func(context.Context) {
// we want shutdown to (potentially) live longer than the request which
// made it, but having a timeout is still good.
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
sharnoff (Member)

20 seconds is a super long timeout compared to other things here (or: compared to our usual configuration of them). I'd either make this shorter (e.g. 5s), add a config field for it, or calculate it from an existing config field.

tychoish (Contributor Author)

I mean, I think the question is more "how long could an HTTP request handler take to do a thing in normal operation?", and then double(ish) it.

The cap on this is about 30s in my mind (which is probably just what the timeout was in System V init scripts between SIGTERM and SIGKILL if the process doesn't die, and which has definitely been carried further into the future).

tychoish (Contributor Author)

> Separately: It seems like this should make it easy-ish to address #37 (basically, listen for context ending and trigger informant's server.exit).

Yep! That's the hope.

sharnoff (Member) left a comment

I think the unresolved comments are mostly bikeshedding. Worst-case it's a little bit of tech debt that we already have plans to resolve.

Two suggested changes; mostly, I think it'd be good to have context.TODO() so that it's more immediately obvious that the contexts for the background workers are disconnected.

pkg/agent/informant.go (outdated, resolved)
pkg/agent/informant.go (outdated, resolved)
tychoish merged commit bd9595e into main on Mar 1, 2023
sharnoff (Member) commented Mar 4, 2023

This appears to be causing the autoscaler-agent to immediately crash on startup. Logs:

I0304 00:36:48.696058       1 main.go:31] Got environment args: {ConfigPath:/etc/autoscaler-agent-config/config.json K8sNodeName:autoscale-sched-worker K8sPodIP:10.244.1.11}
I0304 00:36:48.696600       1 entrypoint.go:29] buildInfo.GitInfo:   bd9595e (2023-03-01 16:33:16 +0000) - agent: plumb contexts through (#59)
I0304 00:36:48.696615       1 entrypoint.go:30] buildInfo.NeonVM:    v0.4.6
I0304 00:36:48.696619       1 entrypoint.go:31] buildInfo.GoVersion: go1.19.6
I0304 00:36:48.696624       1 entrypoint.go:33] Starting pod watcher
I0304 00:36:48.796260       1 entrypoint.go:38] Pod watcher started
I0304 00:36:48.796273       1 entrypoint.go:40] Starting VM watcher
I0304 00:36:48.798530       1 entrypoint.go:45] VM watcher started
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x139c6ed]

goroutine 86 [running]:
github.com/tychoish/fun/seq.(*elemIter[...]).Next(0xc000652ff0?, {0x1940458?, 0xc0000ed080?})
	/go/pkg/mod/github.com/tychoish/[email protected]/seq/list.go:438 +0x2d
github.com/tychoish/fun/set.syncIterImpl[...].Next({{0x19339d8?, 0xc0005993b8?}, {0x1934d10?, 0xc0003c4b30?}}, {0x1940458?, 0xc0000ed080})
	/go/pkg/mod/github.com/tychoish/[email protected]/set/set.go:149 +0xdb
github.com/tychoish/fun/pubsub.(*Broker[...]).dispatchMessage(0x1940458, {0x1940458?, 0xc0000ed080?}, {0x193fc30, 0xc000456640}, {{{{0xc000652ff0, 0x24}, {0xc000564e40, 0xb}}, {0xc000653020, ...}, ...}, ...})
	/go/pkg/mod/github.com/tychoish/[email protected]/pubsub/broker.go:202 +0x2f1
github.com/tychoish/fun/pubsub.(*Broker[...]).startQueueWorkers.func2()
	/go/pkg/mod/github.com/tychoish/[email protected]/pubsub/broker.go:182 +0x145
created by github.com/tychoish/fun/pubsub.(*Broker[...]).startQueueWorkers
	/go/pkg/mod/github.com/tychoish/[email protected]/pubsub/broker.go:175 +0x250

AFAICT this is exclusively caused by the dependency bump.

If this is indeed an issue with github.com/tychoish/fun and a fix is not yet available, this PR should be reverted before merging anything else.

sharnoff added a commit that referenced this pull request Mar 5, 2023
Fixes an issue causing autoscaler-agents to crash on startup, introduced
by #59.
sharnoff (Member) commented Mar 5, 2023

Upgrading to v0.7.1 fixes the issue. Opened #73 to do so.

sharnoff added a commit that referenced this pull request Mar 5, 2023
Fixes an issue causing autoscaler-agents to crash on startup, see:
#59 (comment).
sharnoff deleted the tychoish/agent-context-handling branch on March 7, 2023