
How to quantize Linear/LN/ReLU-like structures with int8. #4242

Open
WeixiangXu opened this issue Nov 8, 2024 · 5 comments
Labels: Embedded (issues when using TensorRT on embedded platforms) · Performance (general performance issues) · triaged (issue has been triaged by maintainers)

Comments

@WeixiangXu

My TensorRT version is 8.6.10 on Orin.

My model has a Linear/LN/ReLU-like structure, as shown below:
[image: model structure]

I add Q/DQ nodes before the MatMul node to run it in INT8, as shown below:
[image: ONNX graph with Q/DQ nodes inserted before the MatMul]
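
(A minimal sketch of one common way to insert such Q/DQ nodes, using NVIDIA's pytorch-quantization toolkit; the `Block` module and its dimensions are hypothetical stand-ins, not the issue's actual model:)

```python
import torch
import torch.nn as nn
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

quant_modules.initialize()  # monkey-patch nn.Linear -> quant_nn.QuantLinear (Q/DQ on input + weight)

class Block(nn.Module):  # hypothetical stand-in for the Linear/LN/ReLU structure
    def __init__(self, dim=1024):
        super().__init__()
        self.fc = nn.Linear(dim, dim)  # replaced by QuantLinear at construction time
        self.ln = nn.LayerNorm(dim)    # left unquantized
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.ln(self.fc(x)))

model = Block().eval()
# ... run calibration data through the model to set the quantizer ranges (amax) before export ...

quant_nn.TensorQuantizer.use_fb_fake_quant = True  # emit QuantizeLinear/DequantizeLinear ONNX ops
torch.onnx.export(model, torch.randn(8, 1024), "block_qdq.onnx",
                  opset_version=13)  # LayerNorm is decomposed into multiple ops at this opset
```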

However, INT8 is slower than FP16.

The INT8 engine graph is shown below.
[image: INT8 engine graph]

What is the best practice for quantizing Linear/LN/ReLU-like structures? They account for about 50% of the latency in my model.

@lix19937

lix19937 commented Nov 9, 2024

You can export the ONNX with opset=17, which makes LayerNorm a single node.
On the other hand, running LayerNorm in int8 will usually degrade the model's accuracy significantly.
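
(A minimal sketch of that export, using a toy LayerNorm module rather than the issue's model:)

```python
import torch
import torch.nn as nn

ln = nn.LayerNorm(1024).eval()
torch.onnx.export(
    ln,
    torch.randn(8, 1024),
    "ln_opset17.onnx",
    opset_version=17,  # opset >= 17 exports nn.LayerNorm as a single LayerNormalization node
    input_names=["x"],
    output_names=["y"],
)
```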

@lix19937

lix19937 commented Nov 9, 2024

Also, you can refer to TensorRT-LLM or FasterTransformer to implement a custom layer.

@WeixiangXu
Author

> You can export the ONNX with opset=17, which makes LayerNorm a single node. On the other hand, running LayerNorm in int8 will usually degrade the model's accuracy significantly.

@lix19937 Thanks for your reply!

I upgraded the opset to 17.
[image: exported graph with LayerNorm as a single node]

However, int8 with Q/DQ nodes is still slower than fp16 (int8: 7.5 ms vs. fp16: 6 ms).

@WeixiangXu
Author

@ttyio @zerollzeng Could you please share any thoughts you might have?

@lix19937

> However, int8 with Q/DQ nodes is still slower than fp16 (int8: 7.5 ms vs. fp16: 6 ms).

You can try testing an ONNX that includes only transpose + matmul + ln + add + relu, then compare the latency; see the sketch below.
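
(A sketch of that isolation test; the shapes and names are made up, and the idea is to export just this subgraph and time it in both precisions with trtexec:)

```python
import torch
import torch.nn as nn

class Sub(nn.Module):  # hypothetical: transpose + matmul + ln + add + relu only
    def __init__(self, dim=1024):
        super().__init__()
        self.w = nn.Parameter(torch.randn(dim, dim))
        self.ln = nn.LayerNorm(dim)

    def forward(self, x, residual):
        x = x.transpose(1, 2)            # transpose
        x = torch.matmul(x, self.w)      # matmul
        x = self.ln(x)                   # ln (single node at opset 17)
        return torch.relu(x + residual)  # add + relu

m = Sub().eval()
torch.onnx.export(m, (torch.randn(2, 1024, 64), torch.randn(2, 64, 1024)),
                  "sub.onnx", opset_version=17)
# Then compare the two precisions, e.g.:
#   trtexec --onnx=sub.onnx --fp16
#   trtexec --onnx=sub_qdq.onnx --int8 --fp16   (the Q/DQ variant of the same graph)
```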

@poweiw added the Performance, triaged, and Embedded labels on Nov 18, 2024