Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agent/executor: Fix DoSleeper deadlock/goroutine leak #553

Merged

Conversation

sharnoff
Copy link
Member

@sharnoff sharnoff commented Oct 9, 2023

Introduced by #371. Classic case of blocking on <-timer.C after !timer.Stop() not always being sound if you've already received from timer.C

Example stack trace:

goroutine 281014 [chan receive, 33 minutes]:
runtime.gopark(0x1?, 0x2?, 0x0?, 0x0?, 0xc00007c000?)
	/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0017b0cf0 sp=0xc0017b0cd0 pc=0x43d7d6
runtime.chanrecv(0xc001702840, 0x0, 0x1)
	/usr/local/go/src/runtime/chan.go:583 +0x49d fp=0xc0017b0d80 sp=0xc0017b0cf0 pc=0x408a7d
runtime.chanrecv1(0xc001034340?, 0xc0017b0ddc?)
	/usr/local/go/src/runtime/chan.go:442 +0x18 fp=0xc0017b0da8 sp=0xc0017b0d80 pc=0x408578
github.com/neondatabase/autoscaling/pkg/agent/executor.(*ExecutorCore).DoSleeper(0xc001034340, {0x1b971b0, 0xc00149a1e0}, 0x0?)
	/workspace/pkg/agent/executor/exec_sleeper.go:23 +0xd8 fp=0xc0017b0f18 sp=0xc0017b0da8 pc=0x15630b8
github.com/neondatabase/autoscaling/pkg/agent/executor.(*ExecutorCore).DoSleeper-fm({0x1b971b0?, 0xc00149a1e0?}, 0xc002404fa0?)
	<autogenerated>:1 +0x3c fp=0xc0017b0f48 sp=0xc0017b0f18 pc=0x158f1bc
github.com/neondatabase/autoscaling/pkg/agent.(*Runner).spawnBackgroundWorker.func1()
	/workspace/pkg/agent/runner.go:388 +0xe3 fp=0xc0017b0fe0 sp=0xc0017b0f48 pc=0x1577e23
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0017b0fe8 sp=0xc0017b0fe0 pc=0x471a41
created by github.com/neondatabase/autoscaling/pkg/agent.(*Runner).spawnBackgroundWorker
	/workspace/pkg/agent/runner.go:354 +0x1ec

Classic case of blocking on <-timer.C after !timer.Stop() not always
being sound if you've already received from timer.C
@sharnoff
Copy link
Member Author

sharnoff commented Oct 9, 2023

e2e failed because of docker pull rate limit, but the change is pretty low-risk. Merging anyways

@sharnoff sharnoff merged commit b7aea0e into main Oct 9, 2023
6 of 7 checks passed
@sharnoff sharnoff deleted the sharnoff/agent-executor-fix-DoSleeper-goroutine-leak branch October 9, 2023 04:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant