
DynamoDB returns operation error use of closed network #1825

Closed
satorunooshie opened this issue Aug 31, 2022 · 31 comments
Labels
bug This issue is a bug. needs-reproduction This issue needs reproduction. p2 This is a standard priority issue

Comments

@satorunooshie

Describe the bug

The SDK often returns the error below after we updated the SDK version.

Actual error:

operation error DynamoDB: Query, $tableName: https response error StatusCode: 200, RequestID: 123456, deserialization failed, failed to decode response body, read tcp $src->$dst: use of closed network connection

Expected Behavior

No errors

Current Behavior

operation error DynamoDB: Query, $tableName: https response error StatusCode: 200, RequestID: 123456, deserialization failed, failed to decode response body, read tcp $src->$dst: use of closed network connection

Reproduction Steps

	// defaultCfg is loaded via config.LoadDefaultConfig and expr is built with the
	// feature/dynamodb/expression package (builders omitted here).
	ddb := dynamodb.NewFromConfig(defaultCfg)
	result, err := ddb.Query(
		ctx,
		&dynamodb.QueryInput{
			TableName:                 aws.String(tableName),
			ExpressionAttributeNames:  expr.Names(),
			ExpressionAttributeValues: expr.Values(),
			KeyConditionExpression:    expr.KeyCondition(),
		},
	)
	// err intermittently contains the "use of closed network connection" error above.

Possible Solution

No response

Additional Information/Context

No response

AWS Go SDK V2 Module Versions Used

	github.com/aws/aws-sdk-go-v2 v1.16.8
	github.com/aws/aws-sdk-go-v2/config v1.15.15
	github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue v1.9.8
	github.com/aws/aws-sdk-go-v2/feature/dynamodb/expression v1.4.14
	github.com/aws/aws-sdk-go-v2/service/dynamodb v1.15.10
	github.com/aws/smithy-go v1.12.0

Compiler and Version used

1.19

Operating System and version

Linux

@satorunooshie satorunooshie added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 31, 2022
@RanVaknin RanVaknin self-assigned this Aug 31, 2022
@RanVaknin
Contributor

Hi @satorunooshie ,

I saw that you opened 2 issues, but I'm going to answer them both here:

The reason you are seeing a status 200 followed by a deserialization error is that the DynamoDB server is able to send back a valid response, but the connection errors out before the client has finished reading the body of the successful response. This can happen for many reasons and is difficult to pinpoint, especially when the pattern is inconsistent, as in your case.

Regarding your other ticket and PR:
We cannot accept your PR to override the Retryer behavior because not every API operation is idempotent. The service API models don't carry information to determine whether an operation is idempotent, so the SDKs cannot assume it is safe to retry.

Proposed solution:
You can implement your own Retryer at the client level or at the operation level and configure it to retry when the connection fails.
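For example, a minimal sketch of a per-operation override might look like this (the wrapper type, the table name, and the surrounding ddb/ctx variables are placeholders, not part of the SDK):

import (
	"errors"
	"net"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

// retryNetErrClosed wraps the client's default Retryer and additionally
// treats net.ErrClosed as retryable.
type retryNetErrClosed struct {
	aws.Retryer
}

func (r retryNetErrClosed) IsErrorRetryable(err error) bool {
	return errors.Is(err, net.ErrClosed) || r.Retryer.IsErrorRetryable(err)
}

// Override the retryer for this single Query call via the per-call functional options.
out, err := ddb.Query(ctx, &dynamodb.QueryInput{
	TableName: aws.String("example-table"),
	// ... key condition expression and attribute maps as usual ...
}, func(o *dynamodb.Options) {
	o.Retryer = retryNetErrClosed{o.Retryer}
})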

Let me know if you have any other questions and I can answer them here 🙂.

Thank you!
~Ran

@RanVaknin RanVaknin added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Aug 31, 2022
@github-actions

github-actions bot commented Sep 3, 2022

This issue has not received a response in 1 week. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Sep 3, 2022
@github-actions github-actions bot closed this as completed Sep 8, 2022
@SaikyoSaru

We are seeing the same issue as well; can we please resurrect this issue?

@psmarcin

psmarcin commented Sep 29, 2022

I was able to fix this issue, but I wasn't able to reproduce the error 100% of the time, so I can't create a PR.

Before deploying my fix we had ~6 errors per minute; after the deploy there are no errors.

Here is my fork of smithy-go: https://github.com/psmarcin/smithy-go. There is only one small change: removing the conditional body closing.

You can give it a try by simply adding this to your go.mod file:

replace github.com/aws/smithy-go => github.com/psmarcin/smithy-go v0.0.0-20220923084642-9e31178f1156

@cwd-nial

cwd-nial commented Dec 8, 2022

@RanVaknin or any other contributor, can you review the solution @psmarcin is suggesting? We are having the same issue mentioned above:

operation error DynamoDB: Query, https response error StatusCode: 200, RequestID: 1263a6b7-ebec-4a51-9594-b647545da4fa, deserialization failed, failed to decode response body, read tcp 172.17.0.12:36206->172.17.0.15:8000: use of closed network connection\n

@RanVaknin
Contributor

Hi everyone,

I did not see the activity on this thread after it was closed.
My last attempt at reproducing this was not fruitful, but I will take another look at it and review the fix @psmarcin suggested.

Thank you,
Ran~

@RanVaknin RanVaknin reopened this Dec 9, 2022
@RanVaknin RanVaknin added needs-review This issue or pull request needs review from a core team member. p2 This is a standard priority issue and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. closed-for-staleness labels Dec 9, 2022
@RanVaknin RanVaknin assigned isaiahvita and unassigned RanVaknin Dec 12, 2022
@psmarcin

@RanVaknin I have been running my fork in a high-load environment without any problems for about 3 months now. Would love to know the root cause.

@cwd-nial

cwd-nial commented Dec 15, 2022

In our case, it is a weird combination of the smithy version and the Go version that causes the issue:
go 1.19 + smithy 1.10.0 = OK
go 1.19 + smithy 1.11.1 = fails sometimes
go 1.19 + smithy 1.13.5 = fails sometimes

The behavior is not consistent; it sometimes fails and sometimes does not, so these are not proven OK/fail combinations. But before our upgrade from Go 1.17 to 1.19 we had no problems (and we were using smithy v1.13.4...).

So anyway, it looks to me like @psmarcin's fix of taking out the closing block could work, but what would the downsides of this missing Close call be?

In the original commit that introduced it, the code reads:

// HTTP RoundTripper *should* close the request body. But this may not happen in a timely manner.
// So instead Smithy *Request Build wraps the body to be sent in a safe closer that will clear the
// stream reference so that it can be safely reused.
if builtRequest.Body != nil {
	_ = builtRequest.Body.Close()
}

So I'm not sure what effect removing it would have...

@sugymt

sugymt commented Jan 2, 2023

We probably have the same problem with SQS DeleteMessage.
(Request ID and source IP are masked.)

operation error SQS: DeleteMessage, https response error StatusCode: 200, RequestID: xxxx-xxxx-xxxx-xxxx-xxxx, deserialization failed, failed to discard response body, read tcp xxx.xxx.xxx.xxx:xxx->18.183.37.45:443: use of closed network connection

go 1.19.3, aws-sdk-go-v2 v1.17.2

@RanVaknin RanVaknin added p1 This is a high priority issue and removed p2 This is a standard priority issue labels Jan 4, 2023
@a-h

a-h commented Jan 5, 2023

I've seen this issue sporadically (4-5 times) in the last few weeks in a failing unit test which connects to DynamoDB Local running in Docker, but I haven't seen it in production.

https response error StatusCode: 200, RequestID: 94f949c6-46fe-47ad-9a8a-6227339b616c, deserialization failed, failed to decode response body, read tcp 172.18.0.2:57392->172.18.0.3:8000: use of closed network connection

I have tried to reproduce it locally, but with no success. I've tried multithreading calls to DynamoDB Local, stressing it, etc.

In production, I've got 1M+ daily DynamoDB calls using github.com/aws/aws-sdk-go-v2/service/dynamodb v1.17.8 running without issue.

@iwata

iwata commented Jan 19, 2023

I am using this package with AWS Rekognition and get the same error. I am on the latest version; when will this be fixed? Or is there any workaround?

	github.com/aws/aws-sdk-go-v2 v1.17.3
	github.com/aws/aws-sdk-go-v2/config v1.18.8
	github.com/aws/aws-sdk-go-v2/service/rekognition v1.23.0
	github.com/aws/smithy-go v1.13.5

@lucix-aws lucix-aws added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Aug 24, 2023
@github-actions github-actions bot added closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Aug 29, 2023
@jcarter3

jcarter3 commented Oct 9, 2023

This is just one of a thousand paper cuts caused by the use of smithy-generated code. We've already had to "downgrade" a few services to aws-sdk-go v1 to work around a bunch of S3 edge cases. Is downgrading the preferred solution for all of these issues?

drshriveer added a commit to drshriveer/smithy-go that referenced this issue Jan 30, 2024
@BananZG

BananZG commented Feb 19, 2024

I just upgraded my project from SDK v1 to the latest SDK v2 and started seeing this error.

github.com/aws/aws-sdk-go-v2/service/dynamodb v1.28.1
github.com/aws/smithy-go v1.20.0

@drshriveer

We are also having significant developer pain when running tests with GitHub Actions. Once we reach a critical mass of tests, they begin to fail consistently, not intermittently. As others have suggested, we tried forking the repo and commenting out the suggested block of code. I also concluded (as did @isaiahvita) that the commented-out block should not have an impact on this issue. But with nothing else to go on, it was worth a try. And removing the block of code did stop failures (at least until we built up another critical mass of tests -- so for a month or so, a beautiful, consistent month).

I spent more time attempting to debug the issue in smithy itself, since I had a path to reproduce it 100% of the time, and quickly learned that any extra infrastructure I added to help debug (with or without the commented-out block) tended to "fix" the issue.

To me this seems to indicate there is a real race condition somewhere in the code, and that simply adding or subtracting work shuffles load just enough that the issue is less likely to occur. The fact that the issue only occurs after upgrading to SDK v2 should also be a clear sign of this. Alas, I did not have time to continue my investigation, and now the error is back.

Please re-open the issue.

@zapisanchez

go 1.21
github.com/aws/aws-sdk-go-v2 v1.25.2
github.com/aws/aws-sdk-go-v2/config v1.27.5
github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue v1.13.7
github.com/aws/aws-sdk-go-v2/feature/dynamodb/expression v1.7.7
github.com/aws/aws-sdk-go-v2/service/dynamodb v1.30.2
github.com/aws/aws-sdk-go-v2/service/s3 v1.51.2
github.com/aws/smithy-go v1.20.1

Well... another one here. It is incredible that after 2 years there is still no official fix.

We are facing the same problem. We are trying to read the response body, but we get a "use of closed network connection" error instead.

It's clear that closing the body, as @psmarcin and @isaiahvita indicate, causes a race condition.

It would be nice to have a better solution than maintaining our own fork with those lines commented out.

@stan-stately

I'm also seeing this. It is making my CI basically unusable.

@RanVaknin RanVaknin reopened this Apr 5, 2024
@RanVaknin
Contributor

Hi everyone,

Since more and more people are reporting being impacted, I have re-opened this issue.

In our previous reproduction attempts we were not able to recreate the reported behavior. The next step is to ask one of the customers here on the thread to provide a minimal GitHub repository with code that can reliably reproduce the behavior (even if it means seeing intermittent errors reliably).

That would give us a little more to go off of.

Thanks again,
Ran~

@RanVaknin RanVaknin added bug This issue is a bug. needs-reproduction This issue needs reproduction. p2 This is a standard priority issue and removed guidance Question that needs advice or information. labels Apr 5, 2024
@CurryFishBalls9527

We are also seeing this issue happen intermittently in all of our environments.

@mcblair

mcblair commented Apr 8, 2024

We have been seeing this intermittently in all of our environments too. I'm curious whether, in the short term, avoiding concurrent use of the DynamoDB client would reduce the frequency of occurrence. We will try to put together a minimal example to reproduce the behavior, but I'm not sure when we can.
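Something like this minimal sketch is what I have in mind for that experiment (the type and field names are placeholders; this is not a proposed fix, just a way to test whether concurrency is a factor):

import (
	"context"
	"sync"

	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

// serializedDDB funnels every Query through a mutex so the shared client is
// never used concurrently.
type serializedDDB struct {
	mu  sync.Mutex
	ddb *dynamodb.Client
}

func (s *serializedDDB) Query(ctx context.Context, in *dynamodb.QueryInput) (*dynamodb.QueryOutput, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.ddb.Query(ctx, in)
}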

@csepulveda

I get the same issue using github.com/aws/aws-sdk-go-v2/service/sqs v1.31.4.
The issue showed up after we upgraded from SDK v1 to SDK v2.

@RanVaknin
Contributor

Hi @CurryFishBalls9527 , @mcblair , @csepulveda ,

Please provide a complete reproduction code so we can re-attempt to reproduce this behavior.

#1825 (comment)

Thanks,
Ran~

@lucix-aws
Contributor

To those affected by this issue, two things:

retrying these errors

You can make the underlying error net.ErrClosed retryable in your client with the following:

import (
	"errors"
	"net"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

type retryErrClosed struct {
	aws.Retryer
}

func (r *retryErrClosed) IsErrorRetryable(err error) bool {
	return errors.Is(err, net.ErrClosed) || r.Retryer.IsErrorRetryable(err)
}

// cfg comes from config.LoadDefaultConfig (omitted here).
svc := s3.NewFromConfig(cfg, func(o *s3.Options) {
	o.Retryer = &retryErrClosed{o.Retryer}
})

This should be safe to do for get-type operations such as dynamodb.Query.

root-causing

As of this writing we have been unable to reproduce this in-house whatsoever. I'm going to ask that anyone affected by this issue implement some basic net.Conn tracing to help grant more visibility into where the underlying Close() calls are coming from. The following DialContext implementation, when provided to a transport, will save the stack trace on a Close() call, and dump it to stdout when a Read() on the underlying conn catches this.

If you are affected by this issue and are in a position to do so, please implement this tracing logic in your clients and share any of the stack traces produced from it:

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"runtime/debug"

	awshttp "github.com/aws/aws-sdk-go-v2/aws/transport/http"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

type dialContext func(ctx context.Context, network, addr string) (net.Conn, error)

func traceConn(dc dialContext) dialContext {
	return func(ctx context.Context, network, addr string) (net.Conn, error) {
		conn, err := dc(ctx, network, addr)
		return &tracedConn{Conn: conn}, err
	}
}

type tracedConn struct {
	net.Conn
	closeTrace string
}

func (c *tracedConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	if len(c.closeTrace) > 0 {
		fmt.Println()
		fmt.Println("!!! READ ON CLIENT-CLOSED CONN !!!")
		fmt.Println(c.closeTrace)
		fmt.Println("!!! -(end trace)-------------- !!!")
		fmt.Println()
	}
	return n, err
}

func (c *tracedConn) Close() error {
	c.closeTrace = string(debug.Stack())
	return c.Conn.Close()
}

func main() {
	// ...

	svc := s3.NewFromConfig(cfg, func(o *s3.Options) {
		client := o.HTTPClient.(*awshttp.BuildableClient)
		o.HTTPClient = client.WithTransportOptions(func(t *http.Transport) {
			t.DialContext = traceConn(t.DialContext)
		})
	})

	// ...
}
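The same transport hook can be attached to the DynamoDB client (which most reports in this thread involve); a minimal sketch, assuming the dynamodb service package is imported and cfg is loaded as above:

ddb := dynamodb.NewFromConfig(cfg, func(o *dynamodb.Options) {
	client := o.HTTPClient.(*awshttp.BuildableClient)
	o.HTTPClient = client.WithTransportOptions(func(t *http.Transport) {
		t.DialContext = traceConn(t.DialContext)
	})
})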

@minhquang4334

minhquang4334 commented Jun 23, 2024

I'm also seeing this when using aws-sdk-go-v2 v1.27.2 and aws-sdk-go-v2/service/dynamodb v1.32.8

@cenapguldh

We also see these errors (but in SQS ReceiveMessage) in all of our environments.

operation error SQS: ReceiveMessage, https response error StatusCode: 200, RequestID: xxxx-xxx-xxxxx--xxxx, deserialization failed, failed to decode response body, read tcp src->dst: use of closed network connection

github.com/aws/aws-sdk-go-v2/service/sqs v1.33.1
github.com/aws/aws-sdk-go-v2 v1.30.0
github.com/aws/smithy-go v1.20.2

@RanVaknin
Contributor

Hi @iwata @csepulveda @a-h @CurryFishBalls9527 @drshriveer @cenapguldh @minhquang4334 @satorunooshie @cwd-nial @mikkael131 @BananZG @mcblair @sugymt @SaikyoSaru @zapisanchez @jcarter3 @psmarcin @drshriveer and anyone else I might have missed here.

First off, thank you all for your continued patience and detailed contributions to this thread. We recognize that this issue has been persistent and challenging.

Despite our efforts, we still haven’t been able to reproduce this issue internally, which complicates our ability to pinpoint the exact cause and develop a fix. Both @lucix-aws and I have requested additional information to aid in this investigation, but it seems we need to make these requests more visible.

To ensure that all necessary details are clearly visible and to help guide new people running into this issue, we are going to lock this thread, and I'm also going to pin it. We ask that anyone still experiencing this problem please open a new issue and include all the details outlined in @lucix-aws's recent comment. This information is crucial for us to attempt a detailed reproduction of the problem.

Thank you once again for your efforts and cooperation.
Ran~

@RanVaknin RanVaknin closed this as not planned (won't fix, can't repro, duplicate, stale) Jul 11, 2024
@aws aws locked as off-topic and limited conversation to collaborators Jul 11, 2024
@RanVaknin RanVaknin pinned this issue Jul 11, 2024
@lucix-aws lucix-aws unpinned this issue Sep 10, 2024