ARM CMSIS-NN / TFLite Rounding Mode #3

fpedd · 2022-07-07T06:40:23Z

It's extremely hard (or at least very verbose) to reproduce the bit-exact rounding behaviour of ARM's CMSIS-NN library, because its integer C implementation mimics ARM instructions. Take the arm_nn_requantize() function as an example: it in turn calls arm_nn_doubling_high_mult_no_sat() and arm_nn_divide_by_power_of_two(). These functions translate, more or less, directly into ARM instructions when using the ARM vector extension (Helium / MVE). However, I was unable to reproduce the behaviour using RISC-V vector instructions and the available RISC-V vector rounding modes: about 5% of the results in the test are off by one bit. This is, as far as I can judge, due to the different
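To make the rounding question concrete, here is a minimal scalar sketch of the two-step requantization described above. It is modelled on my reading of the three functions named in this issue and is not a copy of the library source. All `sketch_*` names are hypothetical, and the positive-is-left-shift convention for `shift` is an assumption. The last helper models a plain round-half-up shift (ties toward +infinity), which is what I assume the RISC-V `rnu` fixed-point rounding mode computes per element.

```c
#include <stdint.h>

/* Hypothetical scalar model, not the CMSIS-NN source.
 * Right shifts of negative values are assumed to be arithmetic,
 * which holds on mainstream compilers but is implementation-defined in C. */

/* High half of the doubled 64-bit product, rounded half up, no saturation. */
static int32_t sketch_doubling_high_mult_no_sat(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;
    prod += (int64_t)1 << 30;          /* rounding offset for the shift by 31 */
    return (int32_t)(prod >> 31);
}

/* Divide by 2^exponent, rounding to nearest with ties away from zero. */
static int32_t sketch_divide_by_power_of_two(int32_t dividend, int32_t exponent)
{
    const int32_t mask = (int32_t)((1u << exponent) - 1u);
    const int32_t remainder = dividend & mask;
    int32_t result = dividend >> exponent;   /* floor division */
    int32_t threshold = mask >> 1;
    if (result < 0) {
        threshold++;                         /* negative ties stay at the floor */
    }
    if (remainder > threshold) {
        result++;
    }
    return result;
}

/* Assumed convention: shift > 0 is a left shift, shift < 0 a right shift. */
static int32_t sketch_requantize(int32_t val, int32_t multiplier, int32_t shift)
{
    const int32_t left = shift > 0 ? shift : 0;
    const int32_t right = shift > 0 ? 0 : -shift;
    return sketch_divide_by_power_of_two(
        sketch_doubling_high_mult_no_sat(val * (1 << left), multiplier), right);
}

/* Model of a round-half-up arithmetic right shift (ties toward +infinity). */
static int32_t sketch_shift_round_half_up(int32_t x, int32_t n)
{
    if (n == 0) {
        return x;
    }
    return (int32_t)(((int64_t)x + ((int64_t)1 << (n - 1))) >> n);
}
```

In this model the two shifts agree for positive inputs and for all non-tie cases, but they disagree on negative ties: -3 / 2 becomes -2 with ties away from zero and -1 with round-half-up. Whether this accounts for the mismatches reported above is not something the sketch can confirm.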

Some similar issues were faced by TVM, see here, here, and here. It appears that they have not yet solved the issue.

This recent PR in CMSIS-NN, a response to this PR in TF, has made the whole thing even more interesting. More links with similar content: ruy matrix multiplication library PR, TF issue on this.

I will need to dig into this rabbit hole some more, but the way the rounding is currently implemented in muRISCV-NN using the vector intrinsics is far from optimal, in terms of both readability/maintainability and performance.
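For illustration, one common way to get ties-away-from-zero rounding on top of a round-half-up shift is a sign fix-up: subtract 1 from negative inputs before the rounding shift. The scalar sketch below shows only that arithmetic. The function names are hypothetical, and it is an assumption (not something stated in this issue) that a vector version would map onto a compare or sign step followed by a single rounding shift. It is not a description of the current muRISCV-NN code.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only.
 * Right shifts of negative values are assumed to be arithmetic. */

/* Round-half-up arithmetic right shift (ties toward +infinity). */
static int64_t sketch_shift_round_half_up64(int64_t x, int32_t n)
{
    if (n == 0) {
        return x;
    }
    return (x + ((int64_t)1 << (n - 1))) >> n;
}

/* Ties away from zero, built from the round-half-up shift.
 * The arithmetic is widened to 64 bits so that INT32_MIN - 1 cannot overflow. */
static int32_t sketch_shift_round_half_away(int32_t x, int32_t n)
{
    if (n == 0) {
        return x;                       /* nothing to round */
    }
    const int64_t fixup = (x < 0) ? -1 : 0;
    return (int32_t)sketch_shift_round_half_up64((int64_t)x + fixup, n);
}
```

For example, `sketch_shift_round_half_away(-3, 1)` gives -2, while positive inputs are unchanged relative to the plain round-half-up shift. A 32-bit vector version would need to handle the overflow case that the 64-bit widening avoids here.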

However, to what extent is it actually important that our kernels are bit-exact with the CMSIS-NN kernels? According to this comment, it appears not to be that critical.

PhilippvK added the question and sync labels on Nov 7, 2024
