agent: enable TCP keepalive on database connections #1676
Comments
You didn't mention this in the linked issue. This may be due to cancellation. The async methods on It has some ability to recover, but it depends a lot on where in the process it's cancelled. I can glance through your code and see if there's any obvious sources of cancellation issues.
I see you're using Axum. It's great, but one thing to note is that it does cancel the task if the client disconnects before the response is returned. This is the expected behavior, and actually happens inside Hyper, not Axum itself. If this is the cause, and the connection is left in a state where the database is stuck waiting for a message that will never come, it'll stall at this This would result in connections being effectively leaked from the pool, and future calls returning The thing is, TCP keepalive wouldn't save you here anyway. That's handled entirely at the transport layer, completely hidden from the application. If the database is stalled on a read but the server's kernel is still properly managing the socket, it'll answer the keepalive messages and the connection won't time out. What we need to do is wrap that In the meantime, you could probably mitigate this by setting an
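To make the cancellation hazard described above concrete, here is a minimal std-only sketch. It assumes a hypothetical `FakeConn` with a two-state protocol and a hand-rolled `YieldOnce` future standing in for a socket operation; none of these names come from sqlx, Axum, or Hyper. It only illustrates that dropping a future at an `.await` point can leave the connection mid-exchange.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical protocol states; not sqlx's real state machine.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConnState {
    Idle,
    AwaitingParameters,
}

/// Hypothetical stand-in for a database connection.
struct FakeConn {
    state: ConnState,
}

/// A future that returns `Pending` once, imitating a socket write that
/// cannot complete immediately.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Imitates "send the statement, then send its parameters".
async fn execute(conn: &mut FakeConn) {
    // The statement has been sent; the server now expects parameters.
    conn.state = ConnState::AwaitingParameters;
    // Suspension point: if the future is dropped here, nothing below runs.
    YieldOnce(false).await;
    // Parameters sent and response read; the connection is reusable again.
    conn.state = ConnState::Idle;
}

/// Polls `execute` once, then drops it, as happens when the calling task
/// is cancelled. Returns the state the connection was left in.
fn state_after_cancellation() -> ConnState {
    let mut conn = FakeConn { state: ConnState::Idle };
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(execute(&mut conn));
    assert!(fut.as_mut().poll(&mut cx).is_pending());
    drop(fut); // cancellation: the rest of `execute` never runs

    conn.state
}
```

In this sketch `state_after_cancellation()` returns `ConnState::AwaitingParameters`: the connection object is still alive and its socket would still be healthy, which is why a transport-level keepalive would not notice anything wrong.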
My ultimate plan is to fix cancellation issues for good by just having the connection state machine run on a background task. This is what a ton of other crates do. We originally didn't want it to work that way because we thought it might be more efficient for the connection logic to execute directly on the calling task, but it didn't really turn out that way.
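The background-task idea can be sketched roughly as follows. This is an illustration under stated assumptions, not sqlx's planned design: it uses an OS thread and `std::sync::mpsc` channels in place of an async task, and `Request`, `Handle`, `spawn_connection`, and `demo_background_connection` are hypothetical names. The point is that the worker owns the connection and finishes each exchange even when the caller stops waiting.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request: a query plus a channel for the reply.
struct Request {
    query: String,
    reply: mpsc::Sender<String>,
}

/// Hypothetical caller-side handle to the background connection.
struct Handle {
    tx: mpsc::Sender<Request>,
}

impl Handle {
    /// Queues a query and returns a receiver for its result.
    /// Dropping the receiver models the caller being cancelled.
    fn submit(&self, query: &str) -> mpsc::Receiver<String> {
        let (reply, rx) = mpsc::channel();
        self.tx
            .send(Request { query: query.to_string(), reply })
            .expect("connection worker has stopped");
        rx
    }
}

/// Spawns the worker that owns the (simulated) connection.
/// The join handle yields the number of exchanges that ran to completion.
fn spawn_connection() -> (Handle, thread::JoinHandle<usize>) {
    let (tx, rx) = mpsc::channel::<Request>();
    let worker = thread::spawn(move || {
        let mut completed = 0;
        // The loop ends once every `Handle` has been dropped.
        for req in rx {
            // Stand-in for the full protocol exchange. It always runs to the
            // end here, so the connection never stops mid-message.
            let response = format!("ok: {}", req.query);
            completed += 1;
            // The caller may be gone already; that is not an error.
            let _ = req.reply.send(response);
        }
        completed
    });
    (Handle { tx }, worker)
}

/// First caller gives up immediately; second caller still gets an answer.
fn demo_background_connection() -> (usize, String) {
    let (handle, worker) = spawn_connection();

    // Caller is "cancelled": it drops the receiver without reading it.
    drop(handle.submit("SELECT 1"));

    // A later caller finds the connection in a usable state.
    let answer = handle
        .submit("SELECT 2")
        .recv()
        .expect("worker should reply");

    drop(handle); // closes the request channel so the worker exits
    let completed = worker.join().expect("worker should not panic");
    (completed, answer)
}
```

Both exchanges complete in this sketch, including the one whose caller went away, so the abandoned request cannot leave the simulated connection waiting on a half-sent message. The trade-off is an extra hop through a channel for every call.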
@abonander thanks for your comments and help here. We had some confusion about the So I'm closing this issue for now, since I don't see a specific need for TCP keepalive. We'd definitely enable it if keepalive ends up being supported in sqlx, but it no longer seems at all urgent.
We've seen an issue where the database is waiting on the agent to send prepared statement parameters, which never arrive. While it's not clear what exactly caused this, enabling TCP keepalive might at least reduce the possibility of that sort of thing happening due to a broken connection.
Unfortunately, sqlx doesn't currently have a way to enable keepalive, so we'll need to wait until launchbadge/sqlx#3540 is resolved before we can address this.