client/server: Don't block the main connection loop for transport IO #73
This pulls in a new version of github.com/containerd/ttrpc from a fork to fix the deadlock issue in containerd/ttrpc#72. Will revert back to the upstream ttrpc vendor once the fix is merged (containerd/ttrpc#73). Signed-off-by: Kevin Parsons <[email protected]>
This pulls in a new version of github.com/containerd/ttrpc from a fork to fix the deadlock issue in containerd/ttrpc#72. Will revert back to the upstream ttrpc vendor once the fix is merged (containerd/ttrpc#73). This fix also included some vendoring cleanup from running "vndr". Signed-off-by: Kevin Parsons <[email protected]>
LGTM
// the main loop will return and close done, which will cause us to exit as well.
case <-done:
	return
case response := <-responses:
May be slightly clearer to defer close(responses) and just have this be for response := range responses.
Ah, maybe that's not practical since responses might still be referenced in the call goroutine.
Yes, generally you don't want to close from the read side.
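To illustrate the point about not closing from the read side, here is a minimal sketch. The names produce and consume are hypothetical and not taken from this PR. The reader asks the writer to stop through a separate done channel, and only the writer ever closes the data channel.

```go
package main

import "fmt"

// produce is the only goroutine that sends on out, so it is also the only
// one that closes it. It stops early if the reader closes done.
func produce(done <-chan struct{}, out chan<- int, n int) {
	defer close(out)
	for i := 0; i < n; i++ {
		select {
		case out <- i:
		case <-done:
			return
		}
	}
}

// consume reads up to limit values, then signals the producer through done
// instead of closing out itself.
func consume(limit int) []int {
	done := make(chan struct{})
	out := make(chan int)
	go produce(done, out, 100)

	var got []int
	for v := range out {
		got = append(got, v)
		if len(got) == limit {
			close(done) // the reader only signals; it never closes out
			break
		}
	}
	return got
}

func main() {
	fmt.Println(consume(3))
}
```

If the reader closed out directly, a producer blocked in a send would panic, which is why the signal travels on a second channel.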
client.go
}

go func(streamID uint32, call *callRequest) {
	requests <- streamCall{
Why not just call c.send here directly, rather than pop over to another goroutine?
That could result in several of these goroutines calling c.send concurrently, couldn't it?
If you do keep this model, do you need to select here on ctx.Done() so that this goroutine doesn't hang forever?
(Alternatively, maybe the other goroutine shouldn't select on ctx.Done() and should use some other scheme to determine when it's done.)
Good point, we don't have any way for these to be cleaned up if the connection closes.
I think we need to keep a single sender goroutine that receives messages via channel and calls c.send. That will ensure we don't interleave the bits from multiple messages on the wire.
ctx.Done() seems to be the client's equivalent of the done channel on the server side, so I think that's probably most appropriate to select on to see when we should terminate.
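A rough sketch of the single-sender idea, not the code in this PR: the message type, runSender, and the bracketed framing are all hypothetical stand-ins, and a bytes.Buffer plays the role of the connection. Only one goroutine writes to the transport, and it exits when the context is cancelled.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"sync"
)

// message is a hypothetical stand-in for one framed message.
type message struct {
	streamID uint32
	payload  []byte
}

// runSender is the only goroutine that writes to w, so frames queued by
// different callers cannot interleave on the wire.
func runSender(ctx context.Context, w io.Writer, outgoing <-chan message) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case m := <-outgoing:
			if _, err := fmt.Fprintf(w, "[%d:%s]", m.streamID, m.payload); err != nil {
				return err
			}
		}
	}
}

// demo queues n messages from n goroutines and returns what was written.
func demo(n int) string {
	ctx, cancel := context.WithCancel(context.Background())
	outgoing := make(chan message)
	var buf bytes.Buffer

	stopped := make(chan struct{})
	go func() {
		defer close(stopped)
		_ = runSender(ctx, &buf, outgoing)
	}()

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id uint32) {
			defer wg.Done()
			outgoing <- message{streamID: id, payload: []byte("ping")}
		}(uint32(i))
	}
	wg.Wait() // every message has been handed to the sender
	cancel()  // then ask the sender to stop
	<-stopped // and wait for it, so reading buf is safe
	return buf.String()
}

func main() {
	fmt.Println(demo(3))
}
```

The order of frames in the output depends on goroutine scheduling, but each frame is written whole.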
Answering your question: yes, that's true. I thought that was already possible, but I was wrong. So that's out, I suppose.
A possible problem with this approach is that you've eliminated the backpressure on calls: if the socket is busy, we still keep processing messages from calls, allocating more goroutines without bound. Before, we would stop pulling messages off calls, which would allow someone to select on sending to calls (I doubt this happens, though; I didn't look yet). Also, storing messages on calls is probably more memory- and CPU-efficient than storing them in blocked goroutines.
Not sure if that's a practical consideration.
Ah, indeed we do select on sending to calls. So I think this is a problem worth solving.
I'd suggest trying to process calls directly in the new goroutine. You'll need to come up with a new scheme for synchronizing waiters. Although you could play more games with channels, perhaps it's reasonable to just use a mutex in this case.
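One way the mutex suggestion could look, purely as a sketch: waiterTable, register, and deliver are hypothetical names, and responses are plain strings for brevity. The goroutine that queues a call registers a waiter, and the receive path completes it, with both sides going through the same lock.

```go
package main

import (
	"fmt"
	"sync"
)

// waiterTable is a hypothetical mutex-guarded map from stream ID to the
// channel on which the caller waits for its response.
type waiterTable struct {
	mu      sync.Mutex
	waiters map[uint32]chan string
}

func newWaiterTable() *waiterTable {
	return &waiterTable{waiters: make(map[uint32]chan string)}
}

// register records a waiter for streamID and returns the channel to wait on.
func (t *waiterTable) register(streamID uint32) <-chan string {
	ch := make(chan string, 1) // buffered so deliver never blocks
	t.mu.Lock()
	t.waiters[streamID] = ch
	t.mu.Unlock()
	return ch
}

// deliver hands resp to the waiter for streamID and removes it. It reports
// false if no waiter was registered.
func (t *waiterTable) deliver(streamID uint32, resp string) bool {
	t.mu.Lock()
	ch, ok := t.waiters[streamID]
	delete(t.waiters, streamID)
	t.mu.Unlock()
	if !ok {
		return false
	}
	ch <- resp
	return true
}

func main() {
	wt := newWaiterTable()
	ch := wt.register(1)
	fmt.Println(wt.deliver(1, "ok"), <-ch)
	fmt.Println(wt.deliver(1, "late"))
}
```

Because each waiter is removed under the lock before it is completed, a second response for the same stream is dropped rather than sent on a channel nobody reads.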
I changed the send to a select with a <-c.ctx.Done() case, so we at least won't leak goroutines. I'll look at refactoring the rest of the flow to add back-pressure soon.
I would tend to agree with @jstarks's suggestion here, i.e. process calls in a new goroutine and use a mutex to sync waiters.
Restructures both the client and server connection management so that sending messages on the transport is done by a separate "sender" goroutine. The receiving end was already split out like this. Without this change, it is possible for a send to block if the other end isn't reading fast enough, which then would block the main connection loop and prevent incoming messages from being processed. Signed-off-by: Kevin Parsons <[email protected]>
@fuweid @crosbymichael PTAL
@cpuguy83 @jstarks @katiewasnothere PTAL (I see the PR was updated since your last review comments)
I (finally) revisited this PR and took a different approach. The new PR is #94. Going to close this PR, but PTAL at the new one. :)
Fixes #72
Note: I feel there may be other things that can be cleaned up in the client/server connection code, but with this PR I was focused on fixing this bug specifically since we are seeing it in production.