Do you plan to optimize the leaky_relu and tanh ops for TFLM? #527

nyadla-sys · 2022-08-17T00:09:39Z

We are using a Himax board to run our custom model, which uses the leaky_relu and tanh ops, on an ARC processor. It currently runs on TFLM with the C reference kernels, and inference takes a large number of cycles. Could you please accelerate these ops in TFLM?

nyadla-sys · 2022-08-29T23:06:21Z

We are in the process of enabling the kernel below for TFLM on the Himax WE1 board, because one of our NN models uses the Tanh kernel heavily.
We would therefore like to accelerate Tanh on ARC processors.
mli_status mli_krn_tanh_fx8(const mli_tensor * in, mli_tensor * out);
Do you plan to support this kernel in TFLM? If not, please give us guidance on accelerating it in TFLM for the ARC EM9 processor (Himax WE1 board).

nyadla-sys · 2022-09-09T01:44:44Z

@mfarag13, @JaccovG, @Hakim7267 Please comment on this.
I have ported mli_krn_leaky_relu_fx8(const mli_tensor * in, mli_tensor * slope_coeffs, mli_tensor * out) to TFLM, but it does not produce the expected result.
If you are open to providing feedback, I can share the patch that accelerates leaky_relu on the ARC processor in TFLM.

JaccovG · 2022-09-12T09:49:59Z

Hi,
Feel free to share the patch, I can review it.

nyadla-sys · 2022-09-13T00:26:22Z

@JaccovG
Please refer to the attached patch, along with the changed leaky_relu files:
https://drive.google.com/drive/folders/1lzRuglfxr4QXm_H2NRj3bwYyux_ZL42t?usp=sharing

nyadla-sys · 2022-09-13T00:40:49Z

Here is the output of one test from the TFLM leaky_relu_test.cc:

Entering to prepare of is_mli_applicable
params->alpha 1.0*2^-1

fixed 8: 0x40

tensor->data.int8:0x40

Exiting to prepare of is_mli_applicable
Inside LeakyReluEval params->alpha:1.0*2^-1
Converted to Q7 fixed point tensor->data.int8:0x40

Entering EvalMLI

params->alpha:1.0*2^-1
mli tensor of slope coeffs Q7 fixed point :0x40
res:0

Exiting EvalMLI

expected_data[i] (1.0*2^0) near output_data[i] (1.5999993*2^1) failed at examples/kernel_add_test/add_test.cc:103
expected_data[i] (1.4999999*2^1) near output_data[i] (1.1499994*2^3) failed at examples/kernel_add_test/add_test.cc:103
expected_data[i] (1.0*2^0) near output_data[i] (1.5999993*2^-3) failed at examples/kernel_add_test/add_test.cc:103
expected_data[i] (-1.0*2^-1) near output_data[i] (1.5999993*2^1) failed at examples/kernel_add_test/add_test.cc:103
expected_data[i] (-1.0*2^0) near output_data[i] (1.0*2^-127) failed at examples/kernel_add_test/add_test.cc:103
Testing QuantizedActivationsOpTestLeakyReluInt8_2
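
For readers comparing the expected and actual values in the log above, here is a minimal float reference for leaky ReLU in C. It is only a sketch of the standard definition (y = x for x >= 0, otherwise y = alpha * x); the name `leaky_relu_ref` is hypothetical and is not a TFLM or embARC MLI function.

```c
#include <stddef.h>

/* Hypothetical reference helper, not part of TFLM or embARC MLI.
 * Applies leaky ReLU element-wise: y = x for x >= 0, else y = alpha * x. */
static void leaky_relu_ref(const float *in, float *out, size_t n, float alpha) {
  for (size_t i = 0; i < n; ++i) {
    out[i] = (in[i] >= 0.0f) ? in[i] : alpha * in[i];
  }
}
```

With alpha = 0.5 (printed as 1.0*2^-1 in the log), an input of -1.0 maps to -0.5 and an input of 1.0 stays 1.0, so an accelerated kernel can be checked against this function on the same inputs.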

nyadla-sys · 2022-09-15T19:38:45Z

@JaccovG Could you please review the patch and let us know your feedback?

nyadla-sys · 2022-09-19T19:53:23Z

@JaccovG Gentle reminder!

JaccovG · 2022-09-20T06:56:02Z

I'm not able to access the Google Drive. Could you share it as a GitHub commit or as a PR?

nyadla-sys · 2022-09-20T17:53:31Z

@JaccovG
https://github.com/usefulsensors/for_synopsys_review/blob/main/0001-UsefulSensors-Leanky_relu-optimization-for-ARC.patch
Please review leaky_relu.cc, leaky_relu_common.cc, and leaky_relu.h:
https://github.com/usefulsensors/for_synopsys_review

nyadla-sys · 2022-09-27T02:01:57Z

@JaccovG Gentle reminder!

nyadla-sys · 2022-10-04T23:50:18Z

@JaccovG Gentle reminder!

JaccovG · 2022-10-05T06:43:55Z

Sorry for my late reply, I was very busy.
I had a look at your code, and when you set the slope tensor, you force the exponent to 7.
https://github.com/usefulsensors/for_synopsys_review/blob/main/leaky_relu.cc#L91
I don't know the reason for setting it to 7, but maybe the problem is related to how the slope tensor is constructed.

nyadla-sys · 2022-10-05T15:17:45Z

@JaccovG I created the slope tensor in Q7 format. If something is wrong with that, could you please suggest a fix?
What is the correct implementation for this?

JaccovG · 2022-10-05T15:24:30Z

I couldn't quickly find how you did the conversion. It is fine to use Q7 format as long as you shift the mantissa to match the exponent of 7. So what you need to check is whether the slope value is correctly converted to the fixed-point value.

nyadla-sys · 2022-10-05T15:30:29Z

@JaccovG Please refer here for the conversion part:
https://github.com/usefulsensors/for_synopsys_review/blob/main/0001-UsefulSensors-Leanky_relu-optimization-for-ARC.patch#L983

nyadla-sys · 2022-10-05T15:36:53Z

@JaccovG Also, please refer to the code below, which constructs slope_tensor:
https://github.com/usefulsensors/for_synopsys_review/blob/main/0001-UsefulSensors-Leanky_relu-optimization-for-ARC.patch#L998
