How to quantize Linear/LN/ReLU-like structures with INT8? #4242
Labels
- Embedded: issues when using TensorRT on embedded platforms
- Performance: general performance issues
- triaged: issue has been triaged by maintainers
My TensorRT version is 8.6.10 on Orin.
My model has a Linear/LN/ReLU-like structure, as shown below:
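Roughly, each block looks like this (a minimal PyTorch sketch; the hidden size of 256 is an assumption):

```python
import torch
import torch.nn as nn

# Minimal sketch of one Linear/LN/ReLU block (dimensions are placeholders).
class LinearLNReLU(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(dim, dim)   # exports as MatMul (or Gemm) + Add in ONNX
        self.ln = nn.LayerNorm(dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.ln(self.fc(x)))
```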
I added Q/DQ nodes before the MatMul nodes to run them in INT8, as shown below:
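One way to get those Q/DQ pairs into the ONNX graph is with `torch.fake_quantize_per_tensor_affine`, which exports as a QuantizeLinear/DequantizeLinear pair at opset >= 13 (a sketch only; the scale values are placeholders that would normally come from calibration, e.g. with NVIDIA's pytorch-quantization toolkit):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Fake-quantize both the activation and the weight so the exported ONNX
# graph carries a Q/DQ pair on each MatMul input. The scales below are
# placeholders; real values come from calibration.
class QDQLinear(nn.Module):
    def __init__(self, dim: int = 256, act_scale: float = 0.02, wt_scale: float = 0.005):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.act_scale = act_scale
        self.wt_scale = wt_scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.fake_quantize_per_tensor_affine(x, self.act_scale, 0, -128, 127)
        w = torch.fake_quantize_per_tensor_affine(self.fc.weight, self.wt_scale, 0, -128, 127)
        return F.linear(x, w, self.fc.bias)

torch.onnx.export(QDQLinear(), torch.randn(1, 256), "linear_qdq.onnx", opset_version=13)
```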
However, the INT8 engine turns out to be slower than FP16. I've drawn the INT8 engine graph below.
What is the best practice for quantizing Linear/LN/ReLU-like structures? They account for about 50% of the latency in my model.
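For reference, a minimal sketch of how such a mixed-precision engine can be built with the TensorRT Python API (file names are placeholders; enabling FP16 alongside INT8 is an assumption, so that layers left outside the Q/DQ regions, such as LayerNorm, do not fall back to FP32):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("linear_qdq.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # honor the Q/DQ nodes
config.set_flag(trt.BuilderFlag.FP16)  # let non-quantized layers run in FP16
engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine)
```

The trtexec equivalent is `--int8 --fp16`, and adding `--dumpProfile --separateProfileRun` prints per-layer timings, which helps show where reformat/dequantize overhead lands around the MatMul nodes.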