
Network Related Crash on Long Running MQTT connections #263

Open
Spinnaker-design opened this issue Feb 14, 2024 · 7 comments
Labels
topic: code Related to content of the project itself type: imperfection Perceived defect in any part of project

Comments

@Spinnaker-design

I am seeing a crash on the Portenta C33 when using an MQTT client for a long duration (~15 minutes). The crash occurs inside a delay() call, within the lwip_task of CNetIf.cpp. It certainly looks like we are seeing a memory management issue in the networking code.

We are using an SSL Client and certificates for our server authentication.

@per1234 per1234 added type: imperfection Perceived defect in any part of project topic: code Related to content of the project itself labels Feb 14, 2024
@Spinnaker-design
Author

Spinnaker-design commented Feb 14, 2024

Here is the call stack for the crash:

_free_r@0x00060a0a (/_free_r.dbgasm:51)
__gnu_cxx::new_allocator<CMsg>::deallocate@0x0005ae5e (/Users/kylevisner/.platformio/packages/[email protected]/arm-none-eabi/include/c++/7.2.1/ext/new_allocator.h:125)
std::allocator_traits<std::allocator<CMsg> >::deallocate@0x0005ae5e (/Users/kylevisner/.platformio/packages/[email protected]/arm-none-eabi/include/c++/7.2.1/bits/alloc_traits.h:462)
std::_Deque_base<CMsg, std::allocator<CMsg> >::_M_deallocate_node@0x0005ae5e (/Users/kylevisner/.platformio/packages/[email protected]/arm-none-eabi/include/c++/7.2.1/bits/stl_deque.h:609)
std::_Deque_base<CMsg, std::allocator<CMsg> >::_M_destroy_nodes@0x0005ae5e (/Users/kylevisner/.platformio/packages/[email protected]/arm-none-eabi/include/c++/7.2.1/bits/stl_deque.h:743)
std::_Deque_base<CMsg, std::allocator<CMsg> >::~_Deque_base@0x0005ae74 (/Users/kylevisner/.platformio/packages/[email protected]/arm-none-eabi/include/c++/7.2.1/bits/stl_deque.h:665)
std::deque<CMsg, std::allocator<CMsg> >::~deque@0x0005b1c4 (/Users/kylevisner/.platformio/packages/[email protected]/arm-none-eabi/include/c++/7.2.1/bits/stl_deque.h:1045)
std::queue<CMsg, std::deque<CMsg, std::allocator<CMsg> > >::~queue@0x0005b1c4 (/Users/kylevisner/.platformio/packages/[email protected]/arm-none-eabi/include/c++/7.2.1/bits/stl_queue.h:96)
CEspCom::clearToEspQueue@0x0005b1c4 (/CEspCom::clearToEspQueue.dbgasm:109)
esp_host_there_are_data_to_be_tx@0x0005a6e4 (/esp_host_there_are_data_to_be_tx.dbgasm:12)
esp_host_spi_transaction@0x0005a6f8 (/esp_host_spi_transaction.dbgasm:5)
esp_host_perform_spi_communication@0x0005a73e (/esp_host_perform_spi_communication.dbgasm:7)
CEspControl::communicateWithEsp@0x00058ed8 (/CEspControl::communicateWithEsp.dbgasm:10)
CLwipIf::lwip_task@0x0004c0a8 (/CLwipIf::lwip_task.dbgasm:30)
CLwipIf::timer_cb@0x0004c10a (/CLwipIf::timer_cb.dbgasm:4)
r_gpt_call_callback@0x0002e174 (Unknown Source:1719)
<signal handler called>@0xffffffe9 (Unknown Source:0)
bsp_prv_software_delay_loop@0x0002f864 (/bsp_prv_software_delay_loop.dbgasm:1)
delay@0x00023c0a (/delay.dbgasm:4)
SSLClient::read@0x0001f628 (/SSLClient::read.dbgasm:8)
SSLClient::connected@0x0001f5b8 (/SSLClient::connected.dbgasm:10)
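Reading the trace bottom-up: delay() inside SSLClient::read is preempted by a GPT timer interrupt (`<signal handler called>`, `r_gpt_call_callback`), whose callback `CLwipIf::timer_cb` runs `lwip_task`, which ends up destroying a `std::queue<CMsg>` in `CEspCom::clearToEspQueue` and calling `_free_r`. One plausible reading, not confirmed in this thread, is that heap memory is being freed from interrupt context, which newlib's allocator does not generally tolerate if the interrupted code was itself inside malloc/free. The following is a generic C++ sketch of the usual deferred-work pattern, where the timer callback only sets a flag and the queue is drained from the main loop. All names (`sketch::timer_cb`, `net_task`, `loop_once`, `g_to_esp`) are hypothetical, and the interrupt is simulated by a plain function call.

```cpp
#include <atomic>
#include <cstddef>
#include <queue>
#include <string>

// Generic illustration of moving heap work out of interrupt context.
// All names are hypothetical; they are NOT the ArduinoCore-renesas classes.
namespace sketch {

// Raised by the (simulated) timer interrupt, consumed by the main loop.
std::atomic<bool> g_net_work_pending{false};

// Outgoing messages. Touched only from the main context, so pushing and
// popping (which may allocate and free heap memory) never run inside an ISR.
std::queue<std::string> g_to_esp;

std::size_t g_messages_sent = 0;

// Timer callback: keep it minimal, with no allocation and no free.
void timer_cb() {
    g_net_work_pending.store(true, std::memory_order_release);
}

// Network step, run from the main context where heap use is safe.
void net_task() {
    while (!g_to_esp.empty()) {
        g_to_esp.pop();  // destroys the message and frees any heap storage it owns
        ++g_messages_sent;
    }
}

// One pass of the main loop: run the network step only when requested.
void loop_once() {
    if (g_net_work_pending.exchange(false, std::memory_order_acq_rel)) {
        net_task();
    }
}

}  // namespace sketch
```

On real hardware the flag would be set by the timer ISR and loop_once() would be called from loop(); keeping the callback flag-only means every allocator call happens in a single, non-interrupt context.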

@andreagilardoni
Contributor

Thanks for your report. I got the same error while working on #234. In that PR I am trying to deal with all the network-related issues, for the time being Ethernet and WiFi. I will try to address this issue in that PR.

@Spinnaker-design
Author

Thanks, @andreagilardoni. Is there a workaround in the meantime to unblock us until that PR is done?

@andreagilardoni
Contributor

You can try using my PR and disabling the timer inside the network stack:

  • Take the example here as a reference
  • Comment out this line
  • Call CLwipIf::getInstance().task() inside the loop() function
  • Design your application to avoid blocking calls as much as possible
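The steps above amount to servicing the network stack from loop() instead of from a timer interrupt, and replacing blocking waits such as delay() with elapsed-time checks. The following is a minimal, hypothetical C++ sketch of that loop structure only. `service_network()` stands in for `CLwipIf::getInstance().task()` (a name taken from the comment above, which later in this thread failed to compile against the PR), `publish()` stands in for any short application step such as an MQTT publish, and `now_ms` plays the role of millis().

```cpp
#include <cstdint>

// Hypothetical sketch of a polled, non-blocking main loop. Nothing here is
// the real ArduinoCore-renesas API; the member functions are placeholders.
struct PolledApp {
    std::uint32_t last_publish_ms = 0;
    std::uint32_t publish_interval_ms = 1000;
    unsigned net_polls = 0;  // how many times the stack was serviced
    unsigned publishes = 0;  // how many periodic jobs ran

    // Placeholder for CLwipIf::getInstance().task() (assumed, unverified).
    void service_network() { ++net_polls; }

    // Placeholder for a short, non-blocking application step.
    void publish() { ++publishes; }

    // Body of loop(); now_ms is the current millis() value.
    void loop_once(std::uint32_t now_ms) {
        service_network();  // every iteration, always from the main context
        // Unsigned subtraction stays correct when millis() wraps around.
        if (now_ms - last_publish_ms >= publish_interval_ms) {
            last_publish_ms = now_ms;
            publish();
        }
    }
};
```

Because nothing in loop_once() waits, the network stack keeps getting serviced while the application is idle, which is the point of avoiding blocking calls once the timer-driven path is disabled.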

Any kind of feedback on this work is appreciated.

@Spinnaker-design
Author

@andreagilardoni I was able to build with your PR. Two items:

  • If you comment out line 30 of CNetIf.h, you'll get a build error.
  • If you attempt to build it with CLwipIf::getInstance().task(), you'll get the following error:

Compilation error: 'class CLwipIf' has no member named 'task'

@zsnave

zsnave commented Jun 19, 2024

Well, after many weeks of wireless networking problems on the C33 platform, it looks like no fix is coming anytime soon. On our system we even "disable" networking after power-on (after briefly using it to access NTP), but networking still causes a system hang after many hours of running (rare, but fatal). It appears that the class destructors are not cleaning up correctly, since fragments of "WiFi" functionality keep operating after disconnection/shutdown. I think the advertisements for the Arduino C33 should NOT list networking, since it doesn't work correctly yet.

@jeremypy972

Hello
Have you found a way to fix this issue? It is very annoying.

Jérémy

Projects
None yet
Development

No branches or pull requests

5 participants