DynamoDB returns operation error use of closed network connection
#1825
Comments
Hi @satorunooshie, I saw that you opened 2 issues, but I'm going to answer them both here. The reason you are seeing a … Regarding your other ticket and PR: … Proposed solution: … Let me know if you have any other questions and I can answer them here 🙂. Thank you! |
This issue has not received a response in 1 week. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled. |
We are seeing the same issue as well, can we please resurrect this issue? |
I was able to fix this issue, but I wasn't able to reproduce the error 100% of the time, so I can't create a PR. Before deploying my fix we had ~6 errors per minute; after deploying it there are no errors. Here is my fork; you can give it a try by simply adding it to your project (see the go.mod sketch after this comment).
|
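For readers who want to try the fork approach described above: consuming a patched fork of a Go module is typically done with a replace directive in go.mod. Below is a minimal sketch only; the module path, fork path, and version are placeholders and not the actual fork referenced in the comment above.

module example.com/yourservice

go 1.19

require github.com/aws/smithy-go v1.13.5

// Route smithy-go to a patched fork; the path and version below are
// placeholders, substitute the real fork and a real tag or pseudo-version.
replace github.com/aws/smithy-go => github.com/yourname/smithy-go v0.0.0-20230101000000-0123456789ab |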
@RanVaknin or any other contributor, can you review the solution psmarcin is suggesting? We are having the same issue that is mentioned above:
|
Hi everyone, I did not see the activity on this thread after it closed. Thank you, |
@RanVaknin I have been running my fork in a high-load environment without any problems for about 3 months now. Would love to know the root cause. |
In our case, it is a weird combination of the smithy version with the Go version that causes the issue. The behavior is not consistent: sometimes it fails, sometimes it does not, so this is not a proven OK-or-fail combination, but before our upgrade from Go 1.17 to 1.19 we had no problems (and we were using smithy v1.13.4...). So anyway, it looks to me like @psmarcin's fix that throws out the closing block could work, but what would the downsides of this missing close statement be? In the original commit in which it came in, it was written:
So not sure what effect this removal would have... |
We probably have the same problem on SQS DeleteMessage.
go 1.19.3, aws-sdk-go-v2 v1.17.2 |
I've seen this issue sporadically (4-5 times) in the last few weeks in a failing unit test which connects to DynamoDB Local running in Docker, but I haven't seen it in production.
I have tried to reproduce it locally, but with no success. I've tried multithreading calls to DynamoDB Local, stressing it, etc. In production, I've got 1M+ daily DynamoDB calls. |
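For anyone attempting a similar reproduction, here is a minimal sketch of that kind of stress run against DynamoDB Local. It is only an illustration, not a confirmed reproducer: the endpoint, table name, key schema, region, and goroutine/iteration counts are all assumptions, and Options.BaseEndpoint assumes a reasonably recent service module.

package main

import (
	"context"
	"errors"
	"log"
	"net"
	"sync"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

func main() {
	// Credentials for DynamoDB Local are read from the environment.
	cfg, err := config.LoadDefaultConfig(context.Background(), config.WithRegion("us-east-1"))
	if err != nil {
		log.Fatal(err)
	}

	// One shared client, pointed at DynamoDB Local (endpoint is a placeholder).
	svc := dynamodb.NewFromConfig(cfg, func(o *dynamodb.Options) {
		o.BaseEndpoint = aws.String("http://localhost:8000")
	})

	var wg sync.WaitGroup
	for g := 0; g < 50; g++ { // many goroutines hammering the same client
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				_, err := svc.GetItem(context.Background(), &dynamodb.GetItemInput{
					TableName: aws.String("test-table"),
					Key: map[string]types.AttributeValue{
						"pk": &types.AttributeValueMemberS{Value: "stress"},
					},
				})
				// Surface only the error this thread is about.
				if err != nil && errors.Is(err, net.ErrClosed) {
					log.Printf("use of closed network connection: %v", err)
				}
			}
		}()
	}
	wg.Wait()
} |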
I am using this package for AWS Rekognition and have the same error. I have the latest version; when will this be fixed? Or is there any workaround?
github.com/aws/aws-sdk-go-v2 v1.17.3
github.com/aws/aws-sdk-go-v2/config v1.18.8
github.com/aws/aws-sdk-go-v2/service/rekognition v1.23.0
github.com/aws/smithy-go v1.13.5 |
This is just one of a thousand paper cuts caused by use of smithy-generated code. We've already had to "downgrade" a few services to using |
just upgraded my project from sdk v1 to latest sdk v2, started seeing this error.
|
We are also having significant developer pain when running tests with GitHub Actions. Once we reach a critical mass of tests, they begin to fail consistently, not intermittently. As others have suggested, we tried forking the repo and commenting out the suggested block of code. I also concluded (as did @isaiahvita) that the commented-out block should not have an impact on this issue, but with nothing else to go off, it was worth a try. And removing the block of code did stop failures (at least until we built up another critical mass of tests; so for a month or so, a beautiful, consistent month). I spent more time attempting to debug the issue in smithy itself, as I had a path to reproduce it 100% of the time, and quickly learned that any extra infrastructure I added to help debug (with or without the commented block) tended to "fix" the issue. To me this seems to indicate there is a real race condition somewhere in the code, and that simply adding or subtracting work shuffles load just enough that the issue is less likely to occur. The fact that the issue only occurs after upgrading to SDK v2 should also be a clear sign of this. Alas, I did not have time to continue my investigation and now the error is back. Please re-open the issue. |
Well... another one here. It seems incredible that in 2 years there is no official fix. We are facing the same problem. We are trying to read the body of the response, but we get a "use of closed network connection" error instead. It's clear that closing the body, as @psmarcin and @isaiahvita indicate, causes a race condition. It would be nice to propose a better solution than making our own fork and commenting out those lines. |
I'm also seeing this. It is making my CI basically unusable. |
Hi everyone, Since more and more people are reporting being impacted, I have re-opened this issue. In our previous reproduction attempts we were not able to recreate the reported behavior. The next step is to ask one of the customers here on the thread to provide a minimal GitHub repository with code that can reliably reproduce the behavior (even if it means seeing intermittent errors reliably). That would give us a little more to go off of. Thanks again, |
we are also seeing this issue happen intermittently in all of our environments. |
We have been seeing this intermittently in all of our environments too. I'm curious if, in the short term, avoiding concurrent use of the DynamoDB client would reduce the frequency of occurrence. We will try to put together a minimal example to reproduce the behavior, but I'm not sure when we can. |
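One low-effort way to test the "avoid concurrent use" hypothesis from the comment above is to funnel every call through a mutex. This is a diagnostic sketch, not a fix; the wrapper type and the choice of GetItem as the representative operation are illustrative assumptions.

import (
	"context"
	"sync"

	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

// serializedDDB allows only one request in flight at a time, so you can
// compare error frequency against normal concurrent usage of the client.
type serializedDDB struct {
	mu  sync.Mutex
	svc *dynamodb.Client
}

func (s *serializedDDB) GetItem(ctx context.Context, in *dynamodb.GetItemInput, optFns ...func(*dynamodb.Options)) (*dynamodb.GetItemOutput, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.svc.GetItem(ctx, in, optFns...)
}

The same wrapping pattern applies to any other operation called concurrently; if serializing calls makes the error disappear, that is one more data point suggesting a concurrency-sensitive path in the transport. |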
I got the same issue using github.com/aws/aws-sdk-go-v2/service/sqs v1.31.4. |
Hi @CurryFishBalls9527 , @mcblair , @csepulveda , Please provide a complete reproduction code so we can re-attempt to reproduce this behavior. Thanks, |
To those affected by this issue, 2 things:

retrying these errors

You can make the underlying net.ErrClosed error retryable by wrapping the default retryer:

type retryErrClosed struct {
	aws.Retryer
}

func (r *retryErrClosed) IsErrorRetryable(err error) bool {
	return errors.Is(err, net.ErrClosed) || r.Retryer.IsErrorRetryable(err)
}

svc := s3.NewFromConfig(cfg, func(o *s3.Options) {
	o.Retryer = &retryErrClosed{o.Retryer}
})

This should be safe to do for get-type operations.

root-causing

As of this writing we have been unable to reproduce this in-house whatsoever. I'm going to ask that anyone affected by this issue implement some basic connection tracing. If you are affected by this issue and are in a position to do so, please implement this tracing logic in your clients and share any of the stack traces produced from it:

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"runtime/debug"

	"github.com/aws/aws-sdk-go-v2/service/s3"
awshttp "github.com/aws/aws-sdk-go-v2/aws/transport/http"
)
type dialContext func(ctx context.Context, network, addr string) (net.Conn, error)
func traceConn(dc dialContext) dialContext {
return func(ctx context.Context, network, addr string) (net.Conn, error) {
conn, err := dc(ctx, network, addr)
return &tracedConn{Conn: conn}, err
}
}
type tracedConn struct {
net.Conn
closeTrace string
}
func (c *tracedConn) Read(p []byte) (int, error) {
n, err := c.Conn.Read(p)
if len(c.closeTrace) > 0 {
fmt.Println()
fmt.Println("!!! READ ON CLIENT-CLOSED CONN !!!")
fmt.Println(c.closeTrace)
fmt.Println("!!! -(end trace)-------------- !!!")
fmt.Println()
}
return n, err
}
func (c *tracedConn) Close() error {
c.closeTrace = string(debug.Stack())
return c.Conn.Close()
}
func main() {
// ...
svc := s3.NewFromConfig(cfg, func(o *s3.Options) {
client := o.HTTPClient.(*awshttp.BuildableClient)
o.HTTPClient = client.WithTransportOptions(func(t *http.Transport) {
t.DialContext = traceConn(t.DialContext)
})
})
// ...
} |
I'm also seeing this when using aws-sdk-go-v2 v1.27.2 and aws-sdk-go-v2/service/dynamodb v1.32.8 |
We also see these errors (but in SQS ReceiveMessage) in all of our environments.
github.com/aws/aws-sdk-go-v2/service/sqs v1.33.1 |
Hi @iwata @csepulveda @a-h @CurryFishBalls9527 @drshriveer @cenapguldh @minhquang4334 @satorunooshie @cwd-nial @mikkael131 @BananZG @mcblair @sugymt @SaikyoSaru @zapisanchez @jcarter3 @psmarcin and anyone else I might have missed here. First off, thank you all for your continued patience and detailed contributions to this thread. We recognize that this issue has been persistent and challenging. Despite our efforts, we still haven't been able to reproduce this issue internally, which complicates our ability to pinpoint the exact cause and develop a fix. Both @lucix-aws and I have requested additional information to aid in this investigation, but it seems we need to make these requests more visible. To ensure that all necessary details are clearly visible and to help guide new people running into this issue, we are going to lock this thread, and I'm also going to pin it. We ask that anyone still experiencing this problem please open a new issue and include all the details outlined in @lucix-aws's recent comment. This information is crucial for us to attempt a detailed reproduction of the problem. Thank you once again for your efforts and cooperation. |
Describe the bug
The SDK often returns the error below after updating the SDK version.
actual error:
Expected Behavior
No errors
Current Behavior
Reproduction Steps
Possible Solution
No response
Additional Information/Context
No response
AWS Go SDK V2 Module Versions Used
Compiler and Version used
1.19
Operating System and version
Linux