CPU2 crash report during SHCI_C2_BLE_Init(...) #99

Open
tim-nordell-nimbelink opened this issue Aug 23, 2024 · 3 comments
Labels
ble Bluetooth Low Energy bug Something isn't working internal bug tracker Issue confirmed and logged into the internal bug tracking system projects Projects-related (demos, applications, examples) issue or pull-request. st community Also reported by users on the community.st.com

tim-nordell-nimbelink commented Aug 23, 2024

This is a cross post from the community forum, but I haven't gotten a response there to my bug report yet and since this is crashing within the CPU2 side of things an actual bug report here makes sense. It's incredibly difficult to debug the CPU2 side since the code is delivered as an encrypted blob.

Describe the set-up

  • Nucleo-WB55 or our custom board utilizing a STM32WB55
  • gcc-arm-none-eabi-10-2020-q4-major

Describe the bug
Upon invoking SHCI_C2_BLE_Init(...), CPU2 enters a hard fault within the BLE HCI stacks.

How To Reproduce
I'm not quite sure what it is from our codebase that causes this yet. I could maybe provide a reduced pre-compiled binary to ST, but I cannot provide the full source code from our proprietary project. It's 100% reproducible with our code running on CPU1, and from what I can tell, the HSEM/IPCC/RCC peripherals are all in the same state as the example projects so I'm currently at a loss as to why this occurs. I've also copied all of the SHCI_C2_BLE_Init(...) parameters from some of the newer examples to no avail and validated with gdb that I had the exact same buffer contents being sent to CPU2 in the shared memory through the mailbox mechanism as the transparent mode example codebase.

Within our codebase, I can run v1.11.x through v1.15.0 of the BLE stack and successfully scan for BLE packets. v1.16.x through v1.19.x report a "security attack" upon SHCI_C2_BLE_Init(...) invocation, and v1.20.x has a hard fault. These variations in BLE stack behavior are all without changing the CPU1 firmware.

Here are the hard fault codes of the various v1.20.x HCI stacks as soon as I invoke SHCI_C2_BLE_Init(...) in our codebase:

v1.20.0 of stm32wb5x_BLE_HCILayer_fw.bin has a hard fault:

0x20030000 <TL_RefTable>:       0x1170fd0f      0x00003284      0x00002a33      0x2003f198

v1.20.0 of stm32wb5x_BLE_HCI_AdvScan_fw.bin has a hard fault:

0x20030000 <TL_RefTable>:       0x1170fd0f      0x00003160      0x00001f6f      0x2003ef50

v1.20.0 of stm32wb5x_BLE_HCILayer_extended_fw.bin has a hard fault:

0x20030000 <TL_RefTable>:       0x1170fd0f      0x00003390      0x00002b3f      0x2003f6f8

Please let me know if the PC, SP, and LR inside CPU2 are sufficient for an initial fault analysis, or if I need to prepare a minimal pre-compiled binary for CPU1 exhibiting this. I'm still attempting to narrow down what's different between the examples and our codebase - we integrated the BLE portions of the v1.11.x STM32CubeWB codebase quite a while ago, but are trying to migrate to the newer version of the BLE stack so we can address the errata that require calling the relatively new SHCI_C2_SetSystemClock(...) command.
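For readers who want to pull the registers out of the dumps above, here is a minimal sketch in C. It assumes the four words at SRAM2A_BASE are laid out as key, PC, LR, SP. That ordering is an assumption, not something the dumps confirm, although it is consistent with the values: the third word has the Thumb bit set as a return address would, and the fourth is an SRAM2 address. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Magic word that starts every dump in this report. */
#define CPU2_HARDFAULT_KEY 0x1170fd0fu

/* ASSUMED layout of the four words at SRAM2A_BASE (0x20030000). */
typedef struct {
    uint32_t key;
    uint32_t pc;  /* assumed: faulting program counter */
    uint32_t lr;  /* assumed: link register, Thumb bit set */
    uint32_t sp;  /* assumed: stack pointer inside SRAM2 */
} cpu2_fault_report_t;

/* Hypothetical helper: returns 1 and fills *out if the words start with
 * the hard fault key, returns 0 otherwise. */
int decode_cpu2_fault(const uint32_t words[4], cpu2_fault_report_t *out)
{
    if (words[0] != CPU2_HARDFAULT_KEY) {
        return 0;
    }
    out->key = words[0];
    out->pc  = words[1];
    out->lr  = words[2];
    out->sp  = words[3];
    return 1;
}
```

Feeding it the v1.20.0 stm32wb5x_BLE_HCILayer_fw.bin words (0x1170fd0f, 0x00003284, 0x00002a33, 0x2003f198) gives PC 0x00003284, LR 0x00002a33, and SP 0x2003f198 under that assumed ordering.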

Contributor

RJMSTM commented Aug 27, 2024

ST Internal Reference: 189569

@RJMSTM RJMSTM added bug Something isn't working internal bug tracker Issue confirmed and logged into the internal bug tracking system projects Projects-related (demos, applications, examples) issue or pull-request. ble Bluetooth Low Energy labels Aug 27, 2024
Author

tim-nordell-nimbelink commented Aug 28, 2024

I spent some time yesterday/today and figured out how to dump out the code for CPU2 for these encrypted firmware binaries so that I could properly debug this.

This is what I found:

  • v1.16.0:

    • This firmware started to always validate that p_ble_table->phci_acl_data_buffer points to non-secure SRAM. Previously, this validation depended on whether SHCI_C2_Ble_Init_Cmd_Param_t's Options byte had bit 0 set to 1. I had this bit set to 0 in our code, so the value of p_ble_table->phci_acl_data_buffer was ignored prior to v1.16.0.

      This is sort of mentioned in the release notes of v1.16.0 as:

      ID 136949 : For ACL_DATA activation, the BLE options flag has to be configured with SHCI_C2_BLE_INIT_OPTIONS_LL_ONLY with Full and Full extended stack binaries and no special BLE options flag required in “HCI_ONLY” (ie Light, HCI layer, ext HCI layer binaries)

      but the way it's read makes you think it's required only if you want to use ACL_DATA, and it doesn't mention that this is required with the BLE_HCI_AdvScan_fw now too.

    • (The validation in v1.15.0 of stm32wb5x_BLE_HCI_AdvScan_fw.bin was skipped at instruction offset 0xca8 and jumped to 0xcb8, which skipped the call to the validation function for this parameter.)

  • v1.20.0:

    • This firmware invokes a software breakpoint when it has a security fault just after the security attack key is set into SRAM2A_BASE. The software breakpoint in turn caused a hard fault, which ultimately resulted in SRAM2A_BASE being set twice in a row (once with a security fault, and then once with a hard fault).
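The v1.16.0 change can be modelled in a few lines of C. This is only a sketch of the behavior described above, not the CPU2 firmware: the function names and the `mem_window_t` type are hypothetical, and the bounds of the non-secure window are left as a parameter because they are not stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the non-secure SRAM window. */
typedef struct {
    uint32_t base;
    uint32_t size;
} mem_window_t;

/* True if addr lies inside the window. */
bool addr_in_window(uint32_t addr, mem_window_t w)
{
    return addr >= w.base && (addr - w.base) < w.size;
}

/* Model of when the ACL buffer pointer is checked: before v1.16.0 only
 * if Options bit 0 is set, from v1.16.0 onwards always. */
bool acl_pointer_is_checked(bool fw_is_1_16_or_newer, uint8_t options)
{
    return fw_is_1_16_or_newer || (options & 0x01u) != 0u;
}

/* True if initialization would proceed, false if the pointer would be
 * rejected (reported as a "security attack" in v1.16.x through v1.19.x). */
bool acl_pointer_accepted(bool fw_is_1_16_or_newer, uint8_t options,
                          uint32_t acl_ptr, mem_window_t nonsecure)
{
    if (!acl_pointer_is_checked(fw_is_1_16_or_newer, options)) {
        return true;
    }
    return addr_in_window(acl_ptr, nonsecure);
}
```

With Options bit 0 cleared, a stale pointer outside the window passes on the older firmware and is rejected on the newer one, which matches the version-dependent behavior reported in this thread without any change to the CPU1 firmware.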

I see 2 bugs as a result:

  • Hard fault instead of security error inside the CPU2 firmware in v1.20.0

  • The STM32CubeWB's hci_init(...) call does not initialize phci_acl_data_buffer; instead, whatever value was on the stack at the time of invocation ends up in this pointer. This leads to semi-unpredictable behavior in the CPU2-side initialization routine given the change in v1.16.0, since it validates variables that inherit whatever was on the stack at the time:

    void hci_register_io_bus(tHciIO* fops)
    {
      /* Register IO bus services */
      fops->Init    = TL_BLE_Init;
      fops->Send    = TL_BLE_SendCmd;
    
      return;
    }
    
    void hci_init(void(* UserEvtRx)(void* pData), void* pConf)
    {
      StatusNotCallBackFunction = ((HCI_TL_HciInitConf_t *)pConf)->StatusNotCallBack;
      hciContext.UserEvtRx = UserEvtRx;
    
      hci_register_io_bus (&hciContext.io);
    
      TlInit((TL_CmdPacket_t *)(((HCI_TL_HciInitConf_t *)pConf)->p_cmdbuffer));
    
      return;
    }
    
    typedef struct
    {
      void (* IoBusEvtCallBack) ( TL_EvtPacket_t *phcievt );
      void (* IoBusAclDataTxAck) ( void );
      uint8_t *p_cmdbuffer;
      uint8_t *p_AclDataBuffer;
    } TL_BLE_InitConf_t;
    
    static void TlInit( TL_CmdPacket_t * p_cmdbuffer )
    {
      TL_BLE_InitConf_t Conf;
      ...
    
      /* Initialize low level driver */
      if (hciContext.io.Init)
      {
    
        Conf.p_cmdbuffer = (uint8_t *)p_cmdbuffer;
        /* Several values in Conf are left uninitialized, including phci_acl_data_buffer */
        Conf.IoBusEvtCallBack = TlEvtReceived;
        hciContext.io.Init(&Conf);
      }
    
      return;
    }
    

    I'd suggest at least initializing these fields to 0. This could simply be done by changing TL_BLE_InitConf_t Conf -> TL_BLE_InitConf_t Conf = {}.

    The wireless application manual states this of TL_BLE_Init(...), indicating that the current behavior of hci_init(...) using the stack values to fill in the remaining values is not adhering to the spec given:

    When not in HCI only mode, both p_AclDataBuffer and IoBusAclDataTxAck are not used and must be set to 0.
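A minimal sketch of the zero-initialization idea, written so it compiles without the ST headers. The struct repeats the TL_BLE_InitConf_t layout quoted above, the TL_EvtPacket_t forward declaration is a stand-in for the real type, and make_ble_init_conf is a hypothetical helper rather than part of STM32CubeWB. It uses `= {0}`, which is accepted by every C standard, whereas the empty `= {}` form needs C23 or a compiler extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real TL_EvtPacket_t from the ST headers. */
typedef struct TL_EvtPacket TL_EvtPacket_t;

/* Same field layout as the TL_BLE_InitConf_t quoted above. */
typedef struct
{
  void (* IoBusEvtCallBack) ( TL_EvtPacket_t *phcievt );
  void (* IoBusAclDataTxAck) ( void );
  uint8_t *p_cmdbuffer;
  uint8_t *p_AclDataBuffer;
} TL_BLE_InitConf_t;

/* Hypothetical helper: every field that is not set explicitly is 0
 * instead of inheriting leftover stack contents. */
TL_BLE_InitConf_t make_ble_init_conf(uint8_t *p_cmdbuffer,
                                     void (*evt_cb)(TL_EvtPacket_t *))
{
  TL_BLE_InitConf_t Conf = {0};

  Conf.p_cmdbuffer = p_cmdbuffer;
  Conf.IoBusEvtCallBack = evt_cb;
  return Conf;
}
```

With this pattern, p_AclDataBuffer and IoBusAclDataTxAck are always 0 unless the caller sets them, which is the state the manual text quoted above asks for.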

So, follow-up questions beyond the 2 bugs noted above:

  • Is it expected that the validation changed for phci_acl_data_buffer?
  • Is it expected to be able to utilize the hci_* APIs from ble_hci_le.h to interface with the HCI firmware stack variants, especially the beacon/scan only variant? I don't need passthrough, and I don't need BLE ACL support since I'm only doing BLE advertising packet scans, so I'd have to allocate a completely unused buffer for doing this with v1.16.x and up given what I discovered here.
  • Does the stm32wb5x_BLE_HCI_AdvScan_fw.bin variant even support HCI ACL packets, especially since HCI_LE_CREATE_CONNECTION isn't supported in this stack? Given this, I wouldn't expect this variant of the stack to require phci_acl_data_buffer to be allocated.

@ALABSTM ALABSTM added the st community Also reported by users on the community.st.com label Sep 6, 2024
@tim-nordell-nimbelink
Author

I got some follow-up answers within the community forum that:

  1. The stm32wb5x_BLE_HCI_AdvScan_fw.bin variant does not support HCI ACL packets, but requires the ACL buffer allocation as the firmware checks "is it a HCI variant?".
  2. The validation change for phci_acl_data_buffer is expected.

@RJMSTM RJMSTM added this to the v1.21.0 milestone Nov 5, 2024