Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esp_http_client fails due to base_poll_read() select() usage (IDFGH-13821) #14673

Open
3 tasks done
gonzzor opened this issue Oct 4, 2024 · 9 comments
Open
3 tasks done
Labels
Status: Reviewing Issue is being reviewed Type: Bug bugs in IDF

Comments

@gonzzor
Copy link

gonzzor commented Oct 4, 2024

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v5.3.1

Espressif SoC revision.

ESP32-S3

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

Custom board

Power Supply used.

External 3.3V

What is the expected behavior?

Reliably download files with esp_http_client() even when the last IP packets arrive quickly.

What is the actual behavior?

esp_http_client_read() return an error instead of the last data.

Steps to reproduce.

  1. Use a slow network uplink. I'm using PPP over an uart.
  2. Try to use esp_http_client_open() and esp_http_client_read() to download files.
  3. Sometimes, depending on the file size and buffer size to esp_http_client_read() the call fails.

Debug Logs.

D (15:15:01.618) HTTP_CLIENT: Begin connect to: http://10.0.0.1:41351
D (15:15:01.633) HTTP_CLIENT: Write header[3]: GET /foo.bin HTTP/1.1
User-Agent: ESP32 HTTP Client/1.0
Host: 10.0.0.1:41351
Content-Length: 0


D (15:15:01.735) HTTP_CLIENT: on_message_begin
D (15:15:01.736) HTTP_CLIENT: HEADER=Server:SimpleHTTP/0.6 Python/3.12.6
D (15:15:01.737) HTTP_CLIENT: HEADER=Date:Fri, 27 Sep 2024 15:15:01 GMT
D (15:15:01.737) HTTP_CLIENT: HEADER=Content-type:application/octet-stream
D (15:15:01.738) HTTP_CLIENT: HEADER=Content-Length:129792
D (15:15:01.739) HTTP_CLIENT: HEADER=Last-Modified:Fri, 27 Sep 2024 14:56:50 GMT
D (15:15:01.739) HTTP_CLIENT: http_on_headers_complete, status=200, offset=204, nread=204
D (15:15:01.740) HTTP_CLIENT: http_on_body 308
D (15:15:01.740) HTTP_CLIENT: Body received in fetch header state, 0x3fcebb9c, 308
D (15:15:01.741) HTTP_CLIENT: content_length = 129792
D (15:15:01.742) HTTP_CLIENT: Data processed 308 != Data specified in content length 129792
D (15:15:01.742) HTTP_CLIENT: Data processed 308 != Data specified in content length 129792
D (15:15:01.743) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=129792
D (15:15:01.744) HTTP_CLIENT: need_read=76, byte_to_read=76, rlen=76, ridx=52
D (15:15:01.745) HTTP_CLIENT: http_on_body 76
D (15:15:01.894) HTTP_CLIENT: Data processed 384 != Data specified in content length 129792
D (15:15:01.895) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=129792
D (15:15:01.896) HTTP_CLIENT: need_read=128, byte_to_read=128, rlen=128, ridx=0
D (15:15:01.897) HTTP_CLIENT: http_on_body 128
D (15:15:01.898) HTTP_CLIENT: Data processed 512 != Data specified in content length 129792
D (15:15:01.898) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=129792
D (15:15:01.900) HTTP_CLIENT: need_read=128, byte_to_read=128, rlen=128, ridx=0
D (15:15:01.900) HTTP_CLIENT: http_on_body 128
D (15:15:01.904) HTTP_CLIENT: Data processed 640 != Data specified in content length 129792
D (15:15:01.905) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=129792
D (15:15:01.906) HTTP_CLIENT: need_read=128, byte_to_read=128, rlen=128, ridx=0
D (15:15:01.907) HTTP_CLIENT: http_on_body 128

.... snip ...

D (15:15:20.333) HTTP_CLIENT: Data processed 124416 != Data specified in content length 129792
D (15:15:20.333) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=129792
D (15:15:20.335) HTTP_CLIENT: need_read=128, byte_to_read=128, rlen=128, ridx=0
D (15:15:20.335) HTTP_CLIENT: http_on_body 128
D (15:15:20.340) HTTP_CLIENT: Data processed 124544 != Data specified in content length 129792
D (15:15:20.341) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=129792
D (15:15:20.342) HTTP_CLIENT: need_read=128, byte_to_read=128, rlen=128, ridx=0
D (15:15:20.343) HTTP_CLIENT: http_on_body 128
D (15:15:20.344) HTTP_CLIENT: Data processed 124672 != Data specified in content length 129792
D (15:15:20.344) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=129792
D (15:15:20.346) HTTP_CLIENT: need_read=128, byte_to_read=128, rlen=128, ridx=0
D (15:15:20.346) HTTP_CLIENT: http_on_body 128
D (15:15:20.446) HTTP_CLIENT: Data processed 124800 != Data specified in content length 129792
D (15:15:20.447) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=129792
E (15:15:20.449) transport_base: poll_read select error 128, errno = Socket is not connected, fd = 57
D (15:15:20.450) HTTP_CLIENT: need_read=128, byte_to_read=128, rlen=-2, ridx=0
E (15:15:20.451) HTTP_CLIENT: transport_read: error - 57347 | ERROR
D (15:15:20.451) HTTP_CLIENT: Data processed 124800 != Data specified in content length 129792
D (15:15:20.452) HTTP_CLIENT: Data processed 124800 != Data specified in content length 129792

More Information.

The select() call might return 2. That means two bits are set in total
in the sets used. As there is only one fd it means that the fd both has
data and an error.

This happens when the last data is received and the peer has notified
that it won't send any more data.

If this happens one should use the information in the readset first
before handling any exceptions.

This caused the http_client to fail at the last packet depending on the
timing.

Change this so that it only considers the errorset if it's only one bit
set in the returned sets.

--- a/components/tcp_transport/transport_ssl.c
+++ b/components/tcp_transport/transport_ssl.c
@@ -169,7 +169,7 @@ static int base_poll_read(esp_transport_handle_t t, int timeout_ms)
         return remain;
     }
     ret = select(ssl->sockfd + 1, &readset, NULL, &errset, esp_transport_utils_ms_to_timeval(timeout_ms, &timeout));
-    if (ret > 0 && FD_ISSET(ssl->sockfd, &errset)) {
+    if (ret == 1 && FD_ISSET(ssl->sockfd, &errset)) {
         int sock_errno = 0;
         uint32_t optlen = sizeof(sock_errno);
         getsockopt(ssl->sockfd, SOL_SOCKET, SO_ERROR, &sock_errno, &optlen);
@gonzzor gonzzor added the Type: Bug bugs in IDF label Oct 4, 2024
@github-actions github-actions bot changed the title esp_http_client fails due to base_poll_read() select() usage esp_http_client fails due to base_poll_read() select() usage (IDFGH-13821) Oct 4, 2024
@espressif-bot espressif-bot added the Status: Opened Issue is new label Oct 4, 2024
@AxelLin
Copy link
Contributor

AxelLin commented Oct 13, 2024

@david-cermak
Any chance to take a look? Thanks.

@euripedesrocha
Copy link
Collaborator

@AxelLin the fix is already under review. Should be on master soon.

@AxelLin
Copy link
Contributor

AxelLin commented Nov 5, 2024

@AxelLin the fix is already under review. Should be on master soon.

@euripedesrocha
So you confirm the bug is real and you have a fix for it?
Could you share the patch and test code?
(The fix is not yet in master, it's so slow)

@nileshkale123
Copy link
Collaborator

Hello @AxelLin ,

We've already submitted a merge request for this fix, and we're currently testing it across additional scenarios to ensure stability. It should be merged soon.

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: Opened Issue is new labels Nov 12, 2024
@AxelLin
Copy link
Contributor

AxelLin commented Nov 14, 2024

@nileshkale123 @euripedesrocha

I'm testing on v5.2 branch.
I just realize this is a big issue because now my device cannot update
firmware via PPP due to this issue.
Can you fix this ASAP?

UPDATE:
The error I got seems different, and this issue only happens on PPP.
W (123704) HTTP_CLIENT: esp_transport_read returned:-1 and errno:128

@AxelLin
Copy link
Contributor

AxelLin commented Nov 25, 2024

Hello @AxelLin ,

We've already submitted a merge request for this fix, and we're currently testing it across additional scenarios to ensure stability. It should be merged soon.

Your reply gives people the expect that this will be fix soon.
However, having wait for 3 weeks, still cannot get the fix.

@nileshkale123
Copy link
Collaborator

Hello @AxelLin,

Could you please provide the exact testing scenario in detail to help us reproduce this issue? Specifically, I would like to know how you are starting the server and which example you are running. Previously, I raised an MR based on your explanation, as it seemed logically correct. However, I have not been able to reproduce the issue exactly as you described so far, so any additional details would be really helpful.

Additionally, could you try applying the following patch on the release/v5.2 branch and test it again? Please share any updates or progress on your results.

patch_14673.txt

@AxelLin
Copy link
Contributor

AxelLin commented Nov 26, 2024

@nileshkale123 This issue was reported by @gonzzor, not me.
@gonzzor Could you verify if the patch from #14673 (comment) works?

@gonzzor
Copy link
Author

gonzzor commented Nov 26, 2024

@AxelLin I'm using my suggested change in transport_ssl.c in production and it for sure improved things, haven't seen any more problems with it.

The second part mentioned in #14673 (comment) seem to have been merged already in #14724 so I don't see any problems with that change.

In addition, when testing this I'm using the python http.server module to simply serve files locally to my board. I didn't have problems all the time and it seem to be timing related as the size of file impacted things. I could see the issue when connected via uart and pppd running on Linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Reviewing Issue is being reviewed Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

5 participants