Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The connection to MQTT Backend is not working when trying to use mutual TLS connection (IDFGH-9828) #272

Closed
3 tasks done
franz-ms-muc opened this issue Apr 8, 2023 · 30 comments

Comments

@franz-ms-muc
Copy link
Contributor

Answers checklist.

  • I have read the documentation for esp-protocols components and the issue is not addressed there.
  • I have updated my esp-protocols branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

What component are you using? If you choose Other, provide details in More Information.

esp_modem

component version

1.0.0

IDF version.

ESP-IDF Pre-release v5.0-rc1

More Information.

maybe a similar issue to this one:
#250

Tested on a SIMCOM A7672

Code: https://gist.github.com/franz-ms-muc/158ed5a1208c053403634eb5e3250a59

Log1: https://gist.github.com/franz-ms-muc/1e3699c361b82f92b399b268751f2ceb
Log2: https://gist.github.com/franz-ms-muc/3ddabeb29698d23eea524adb21b951c2
Log3: https://gist.github.com/franz-ms-muc/ed5179199a52c53d6361ff902b5c8070

@github-actions github-actions bot changed the title The connection to MQTT Backend is not working when trying to use mutual TLS connection The connection to MQTT Backend is not working when trying to use mutual TLS connection (IDFGH-9828) Apr 8, 2023
@franz-ms-muc
Copy link
Contributor Author

2 Topics

1 similar comment
@franz-ms-muc
Copy link
Contributor Author

2 Topics

@franz-ms-muc
Copy link
Contributor Author

  1. Topic: why can we have a UART Queue full when we use the 4-wire Mode ? i was thinking RTS/CTS is exactly for this.

@franz-ms-muc
Copy link
Contributor Author

2. Topic: why does it crash ? i mean a full Quque can bring up a Error, Exception what else, but not a Crash ?

@david-cermak
Copy link
Collaborator

I was able to reproduce the UART queue full event (connecting to test.mosquitto.org with 2048 bit RSA keys), but not the crash. Also, it completed the handshake and continued with publishing eventually.
Are you seeing the crash with just the simple_cmux_example (of course including the client and server side verification) or running it as a part of a bigger project?
Which ESP32 chip are you using?

Topic: why can we have a UART Queue full when we use the 4-wire Mode ? i was thinking RTS/CTS is exactly for this.

Flow control signals (RTS/CTS) work together with the UART peripheral HW. But the UART driver needs to pass the incoming data to user task (via FreeRTOS queues) and that seemed to overflow. And the reason for that is IMO the extensive logging on verbose level, e.g. for every byte that we send to one UART (and we send a lot during TLS handshake with mutual authentication) we print ~ 3 characters to another UART.
This error goes away if I switched to DEBUG level or adjusting the queue size to

    dte_config.uart_config.event_queue_size = 128;
  1. Topic: why does it crash ? i mean a full Quque can bring up a Error, Exception what else, but not a Crash ?

This looks like a bug, somewhere, but wasn't able to reproduce it.

@MartinPatarinski
Copy link

MartinPatarinski commented May 2, 2023

When TLS is disabled and the embedded MQTT client uses just authentication with username and password, then the problem is not observed.

@david-cermak
Copy link
Collaborator

When TLS is disabled and the embedded MQTT client uses just authentication with username and password, then the problem is not observed.

This is clear as the TLS connection requires far more bandwidth. It is also expected to see some SW event queue overflows as explained above.
What is not expected, though, is the crash that you've experienced and reported. To help you with that I would need more details and some steps to reproduce it. Without this I am afraid I could not provide much help, and assume that the issue might have been cause by an application/project code rather than this library.

@diplfranzhoepfinger
Copy link
Contributor

diplfranzhoepfinger commented May 19, 2023

@david-cermak
Copy link
Collaborator

Hi @diplfranzhoepfinger

How often do you see the failure? I ran your sample application (4a6e55d51a097c44c346160a00e6d3ddb47a14bf with IDF v5.0.2), ran it several times, tried to decrease stack sizes, played a bit, but no crashes seen (the only thing I changed was Tx/Rx and my APN, tested with SIM7600)

  • Could you please check if you have any local modifications, or forgot to check some files in?
  • Also, could you please remove the urc handler before switching modes? (I'm actually surprised it works, since the urc sample application was meant to work only in command mode -- but once you get to CMUX you can add the handler again as it's again a kind of command mode...)

@franz-ms-muc
Copy link
Contributor Author

Hi @diplfranzhoepfinger

How often do you see the failure?

immediately and every time. 

 I ran your sample application (4a6e55d51a097c44c346160a00e6d3ddb47a14bf with IDF v5.0.2), ran it several times, tried to decrease stack sizes, played a bit, but no crashes seen (the only thing I changed was Tx/Rx and my APN, tested with SIM7600)

OK. i could give it a Test too with 7600 ... 

worth it ? 

maybe we have some speciality in the A7276

  • Could you please check if you have any local modifications, or forgot to check some files in?

no, all checked in. 100% sure. 

did you test with A7276 or a different Modem ? 

  • Also, could you please remove the urc handler before switching modes? (I'm actually surprised it works, since the urc sample application was meant to work only in command mode -- but once you get to CMUX you can add the handler again as it's again a kind of command mode...)

i can. 

could it then also work with the 2nd Modem Model ?

@david-cermak
Copy link
Collaborator

@diplfranzhoepfinger Tested with A7672 and was able to reproduce the issue.

The problem is an internal buffer overflow, which is not currently checked for in esp_modem. The reason why it's not reproducible with other devices is that A7672 supports 2 bytes sized payload (which not many devices do, and that's the reason for me not checking the sizes -- the max CMUX payload would make 128 bytes).
I'll provide a fix.

Thank you for taking the time and effort to create this simple example!

You can disable CONFIG_ESP_MODEM_CMUX_DEFRAGMENT_PAYLOAD as a workaround. (the issue is only present if the modem tries to reconstruct the original payload)

@franz-ms-muc
Copy link
Contributor Author

@diplfranzhoepfinger Tested with A7672 and was able to reproduce the issue.

cooooooooool

The problem is an internal buffer overflow, which is not currently checked for in esp_modem. The reason why it's not reproducible with other devices is that A7672 supports 2 bytes sized payload (which not many devices do, and that's the reason for me not checking the sizes -- the max CMUX payload would make 128 bytes).

ahhhhhhhhhhhhh.

 I'll provide a fix.

very good. 

Thank you for taking the time and effort to create this simple example!

yes, i was also thinking this is the only way forward. 

You can disable CONFIG_ESP_MODEM_CMUX_DEFRAGMENT_PAYLOAD as a workaround. (the issue is only present if the modem tries to reconstruct the original payload)

what negative side effects does this bring ?

diplfranzhoepfinger pushed a commit to diplfranzhoepfinger/esp-protocols-tree-master-components-esp_modem-examples-simple_cmux_client that referenced this issue May 29, 2023
@franz-ms-muc
Copy link
Contributor Author

@franz-ms-muc
Copy link
Contributor Author

@diplfranzhoepfinger
Copy link
Contributor

> Hi @diplfranzhoepfinger
> * Also, could you please remove the urc handler before switching modes? (I'm actually surprised it works, since the urc sample application was meant to work only in command mode -- but once you get to CMUX you can add the handler again as it's again a kind of command mode...)

done with Commit

diplfranzhoepfinger/esp-protocols-tree-master-components-esp_modem-examples-simple_cmux_client@b018980

lets see if this helps

@diplfranzhoepfinger
Copy link
Contributor

#180 <-- Discuss URC there.

@diplfranzhoepfinger
Copy link
Contributor

Hi,
CONFIG_ESP_MODEM_CMUX_DEFRAGMENT_PAYLOAD

seems to fragment Answers from the Modem in the CMUX Terminal which is in AT Mode,
especially when we use GNSS, where the Answers are long.

we will provide you a log about this.

@david-cermak
Copy link
Collaborator

david-cermak commented Jun 1, 2023

Hi Franz,

I'm sorry, haven't thought of that. This is quite possible if the command terminal outputs longer payload.
Could you please apply this quick fix:

diff --git a/components/esp_modem/src/esp_modem_cmux.cpp b/components/esp_modem/src/esp_modem_cmux.cpp
index 0c480f8..283d486 100644
--- a/components/esp_modem/src/esp_modem_cmux.cpp
+++ b/components/esp_modem/src/esp_modem_cmux.cpp
@@ -120,13 +120,17 @@ void CMux::data_available(uint8_t *data, size_t len)
         if (virtual_term < MAX_TERMINALS_NUM && read_cb[virtual_term]) {
             // Post partial data (or defragment to post on CMUX footer)
 #ifdef DEFRAGMENT_CMUX_PAYLOAD
-            if (payload_start == nullptr) {
-                payload_start = data;
-                total_payload_size = 0;
+            if (virtual_term == 0) {   // no need to defragment term network data (term=0), PPP takes care of that
+                read_cb[virtual_term](data, len);
+            } else {
+                if (payload_start == nullptr) {
+                    payload_start = data;
+                    total_payload_size = 0;
+                }
+                total_payload_size += len;
             }
-            total_payload_size += len;
 #else
-            read_cb[virtual_term](data, len);
+
 #endif
         }
     } else if (data == nullptr && type == 0x73 && len == 0) { // notify the initial SABM command
@@ -136,7 +140,9 @@ void CMux::data_available(uint8_t *data, size_t len)
         int virtual_term = dlci - 1;
         if (virtual_term < MAX_TERMINALS_NUM && read_cb[virtual_term]) {
 #ifdef DEFRAGMENT_CMUX_PAYLOAD
-            read_cb[virtual_term](payload_start, total_payload_size);
+            if (payload_start) {
+                read_cb[virtual_term](payload_start, total_payload_size);
+            }
 #endif
         }
     } else if ((type & FT_UIH) == FT_UIH && dlci == 0) { // notify the internal DISC command

and test again? With CONFIG_ESP_MODEM_CMUX_DEFRAGMENT_PAYLOAD=y

This change makes the modem to defragment only command data, as the network data are in PPP format which already uses start/end markers.

Alternatively we can extend buffer sizes and defragment everything, but that would probably mean an unnecessary waste of RAM, especially if your device could post few kilobytes each fragment of CMUX payload.

diplfranzhoepfinger pushed a commit to diplfranzhoepfinger/esp-protocols-tree-master-components-esp_modem-examples-simple_cmux_client that referenced this issue Jun 3, 2023
@diplfranzhoepfinger
Copy link
Contributor

@david-cermak
Copy link
Collaborator

This is looks like a memory corruption and is probably caused by the proposed patch in #272 (comment) (there might still be issue when defragmenting two byte payloads on static buffer which is far less than the max payload) you can increase the buffer size (dte_config.dte_buffer_size) to help with the problem.
To resolve it correctly however, we'd have to allocate (or drop) the buffered data. I'm working on a minor refactor in #312 which should allow users to configure what to do in this case (either drop the packet, or inflate the current buffer and continue)

david-cermak added a commit to david-cermak/esp-protocols that referenced this issue Jun 27, 2023
As a configurable option, if disabled we report an error.

Closes espressif#272
@diplfranzhoepfinger
Copy link
Contributor

diplfranzhoepfinger commented Jul 18, 2023

dte_buffer_size = 512

is the Standard now ...

@diplfranzhoepfinger
Copy link
Contributor

diplfranzhoepfinger commented Jul 18, 2023

in #272 (comment)

refactor in #312 which

does it mean we should combine #272 (comment) and #312 or should we take back #272 (comment) ???

@david-cermak
Copy link
Collaborator

david-cermak commented Jul 19, 2023

Please, use just the fixes in #312 -- this is the official fix of addressing issues with CMUX protocols with 2 byte payloads.
And, use this configuration, pls:

  • ESP_MODEM_CMUX_DEFRAGMENT_PAYLOAD=n
  • ESP_MODEM_USE_INFLATABLE_BUFFER_IF_NEEDED=y

That would basically disable defragmentation on CMUX layer (which won't work reliably with your device), but enable defragmenting of AT replies on DTE layer)

(#272 (comment) was just a quick patch, which might fix some problems, but may introduce others)

@diplfranzhoepfinger
Copy link
Contributor

Tnx.

@diplfranzhoepfinger
Copy link
Contributor

diplfranzhoepfinger commented Jul 21, 2023

@diplfranzhoepfinger
Copy link
Contributor

diplfranzhoepfinger commented Jul 21, 2023

Crazy:
it get a IP if Verbose logging:
https://gist.github.com/diplfranzhoepfinger/6c1a2d05ff8de2474b79a3c3bfce324a

somtimes ...
so most times not get a IP ...
hmm ...

diplfranzhoepfinger pushed a commit to diplfranzhoepfinger/esp-protocols-tree-master-components-esp_modem-examples-simple_cmux_client that referenced this issue Jul 22, 2023
@diplfranzhoepfinger
Copy link
Contributor

diplfranzhoepfinger commented Jul 22, 2023

so, i compare now 2 Stages:

I.

https://github.com/diplfranzhoepfinger/esp-protocols-tree-master-components-esp_modem-examples-simple_cmux_client/tree/Test_again_old

and

II.

https://github.com/diplfranzhoepfinger/esp-protocols-tree-master-components-esp_modem-examples-simple_cmux_client/tree/feature/NewCommands

at I. i get never the cmux_example: Cannot get IP within specified timeout... exiting

at II. i get the cmux_example: Cannot get IP within specified timeout... exiting in about 3 of 4 Trys.

so something must be not OK with #312 .

@david-cermak any idea ?

@diplfranzhoepfinger
Copy link
Contributor

did a cross test after testing I. with II. and immediately same Problem, after 5 Times no Problem with I. 

https://gist.github.com/diplfranzhoepfinger/c7124d0536702eea5d6ac3090b0ff7b8

@diplfranzhoepfinger
Copy link
Contributor

diplfranzhoepfinger commented Jul 22, 2023

we have another Project:

there we had #312 integrated, 

and courious: there we get a IP always, but after getting IP the AT Commands which reply more than a OK are no more forwarded. 

see Log: https://gist.github.com/diplfranzhoepfinger/79907f9f5254b58872dd564c8e2889e7

the line https://github.com/diplfranzhoepfinger/esp-protocols-tree-master-components-esp_modem-examples-simple_cmux_client/blob/2b7a1db59ad0dcc087a07d300ad5835fd7d5f5df/simple_cmux_client/components/esp_modem/src/esp_modem_command_library.cpp#L67 is not reached

here i have committed the Loggers diplfranzhoepfinger/esp-protocols-tree-master-components-esp_modem-examples-simple_cmux_client@2b7a1db

but the Project is a different one, which cannot be shared. 

so we have:

Project II. like mentioned above: seldom get IP

Project "secret" which always get IP but no AT Communication after got IP. 

Thanks

Franz

diplfranzhoepfinger pushed a commit to diplfranzhoepfinger/esp-protocols-tree-master-components-esp_modem-examples-simple_cmux_client that referenced this issue Jul 22, 2023
@diplfranzhoepfinger
Copy link
Contributor

gave you another Log:

https://gist.github.com/diplfranzhoepfinger/b7b9b0366ec2897edf24edddec614185

with more Logging:
https://github.com/diplfranzhoepfinger/esp-protocols-tree-master-components-esp_modem-examples-simple_cmux_client/blob/ca75d1fd9966567476d5a5e2867963a1798dba49/simple_cmux_client/components/esp_modem/src/esp_modem_command_library.cpp#L62

we see that the Log-Output

I (00:00:14.620) command_lib: Received **raw** Response: {AT+CGMR
+CGMR: A011B07A7672M7_F

OK
}


does come in the Beginning,
but after got IP we think that the Line is not called anymore

Thanks,
Franz

david-cermak added a commit to david-cermak/esp-protocols that referenced this issue Aug 17, 2023
As a configurable option, if disabled we report an error.

Closes espressif#272
david-cermak added a commit to david-cermak/esp-protocols that referenced this issue Aug 17, 2023
As a configurable option, if disabled we report an error.

Closes espressif#272
david-cermak added a commit to david-cermak/esp-protocols that referenced this issue Aug 28, 2023
As a configurable option, if disabled we report an error.

Closes espressif#272
david-cermak added a commit to david-cermak/esp-protocols that referenced this issue Sep 25, 2023
1.0.2
Features
- Add factory method for simple creation of custom DCEs (4cf9e50)
- Support custom transport in AT TCP client example (ae629ed)
Bug Fixes
- Fix host test race with async read and d'structor (c89d42f)
- More error handling in cmux protocol (8edbac6)
- Fix netif to set PPP config in C++ way (5287432)
- Fix vfs terminal not to reset read_cb() automatically (25a35e2)
- Fix netif data race causing PPP startup delays (c8c0507, espressif#308)
- Added support for inflatable buffer (cb6e03a, espressif#272)
- Fix LoadProhibited after failed CMUX initialization (IDFGH-10845) (60c87dd)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants