It's extremely hard (or at least very verbose) to reproduce the bit-exact rounding behaviour of ARM's CMSIS-NN library. This is because their integer C implementation mimics ARM instructions. Take the arm_nn_requantize() function as an example: it in turn calls arm_nn_doubling_high_mult_no_sat() and arm_nn_divide_by_power_of_two(). These functions translate, more or less, directly into ARM instructions when using the ARM vector extension (Helium / MVE). However, I was unable to reproduce the behaviour using RISC-V vector instructions and the available RISC-V vector rounding modes. In about 5% of the test results, I am off by one bit. This is, as far as I can judge, due to the different
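To make the rounding question concrete, here is a minimal scalar sketch in C. It is modeled on my understanding of the CMSIS-NN reference helpers and is not copied from the library; all `sketch_*` names are hypothetical. It also includes a plain round-half-up shift (add half an LSB, then shift) for comparison, which is how I assume the RISC-V `rnu` fixed-point rounding mode behaves. The two conventions differ only for negative values that land exactly on a midpoint. Whether that difference explains the mismatches described above is not established here.

```c
#include <stdint.h>

/* All sketch_* names are hypothetical; this is an illustration, not the
 * CMSIS-NN source. Right shifts of negative values are assumed to be
 * arithmetic (true on common compilers, implementation-defined in C). */

/* Doubling high multiply without saturation: roughly (m1 * m2) / 2^31,
 * rounded by adding 2^30 before the arithmetic right shift by 31. */
static int32_t sketch_doubling_high_mult_no_sat(int32_t m1, int32_t m2)
{
    int64_t acc = (int64_t)m1 * (int64_t)m2 + ((int64_t)1 << 30);
    return (int32_t)(acc >> 31);
}

/* Rounding divide by 2^exponent with ties rounded away from zero.
 * Assumes 0 <= exponent <= 30. */
static int32_t sketch_divide_by_power_of_two(int32_t dividend, int32_t exponent)
{
    const int32_t mask = (int32_t)((1u << exponent) - 1u);
    const int32_t remainder = dividend & mask;
    int32_t result = dividend >> exponent; /* floor division */
    int32_t threshold = mask >> 1;
    if (result < 0) {
        threshold++; /* negative values need a strictly larger remainder */
    }
    if (remainder > threshold) {
        result++;
    }
    return result;
}

/* Comparison point: round-half-up shift (ties toward +infinity).
 * Assumes 0 <= exponent <= 30. */
static int32_t sketch_shift_round_half_up(int32_t value, int32_t exponent)
{
    if (exponent == 0) {
        return value;
    }
    int64_t biased = (int64_t)value + ((int64_t)1 << (exponent - 1));
    return (int32_t)(biased >> exponent);
}

/* Requantize: positive shift means left shift before the multiply,
 * negative shift means rounding right shift after it. */
static int32_t sketch_requantize(int32_t val, int32_t multiplier, int32_t shift)
{
    const int32_t left = shift > 0 ? shift : 0;
    const int32_t right = shift > 0 ? 0 : -shift;
    return sketch_divide_by_power_of_two(
        sketch_doubling_high_mult_no_sat(val * (1 << left), multiplier), right);
}
```

For example, dividing -3 by 2 gives -2 with the ties-away-from-zero helper but -1 with the round-half-up shift, while dividing +3 by 2 gives 2 with both.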
Some similar issues were faced by TVM, see here, here, and here. It appears that they have not yet solved the issue.
This recent PR in CMSIS-NN, made in response to this PR in TF, has made the whole thing even more interesting. More links with similar content: ruy matrix multiplication library PR, TF issue on this.
I will need to dig into this rabbit hole some more. But the way the rounding is currently implemented in muRISCV-NN using the vector intrinsics is far from optimal, in terms of both readability/maintainability and performance!
However: to what extent is it actually important that our kernels are bit-exact to the CMSIS-NN kernels? According to this comment, it appears that it is not that critical.