How to choose which layers to quantize for faster performance? #3763

luoshiyong · 2024-04-02T07:14:34Z

While quantizing YOLOv8 to INT8, I found that some layers are slower in INT8 than in FP16, and the reformat operations are very time-consuming. For best precision, we can run a sensitive-layer analysis to pick the proper layers to quantize. But for best speed, how should I identify which layers to quantize?
(screenshots below)

lix19937 · 2024-04-02T07:32:03Z

Refer to the PTQ SVG of the engine.

luoshiyong · 2024-04-03T02:47:07Z

> ref ptq svg of engine.

By the time you saw the graph above, I had already looked at the SVG. My problem is how to efficiently find which layers are faster when quantized, rather than spending time going through the graphs by hand.

lix19937 · 2024-04-04T11:17:49Z

See the Q/DQ placement recommendations in the TensorRT developer guide: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#qdq-placement-recs @luoshiyong
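
For the question of finding slower INT8 layers without reading the graphs by hand, one possible approach is to diff per-layer timings from two profiling runs (for example, JSON files written by `trtexec --exportProfile=...` for an FP16 engine and an INT8 engine). The sketch below is only an illustration under assumptions: it assumes each profile is a list of records with `name` and `averageMs` fields, and it matches layers by exact name. In practice fused layer names often differ between the two engines, so the helper names here (`layer_times`, `compare_profiles`) and the name-matching rule are hypothetical and would likely need adjusting.

```python
from typing import Dict, List, Tuple


def layer_times(profile: List[dict], key: str = "averageMs") -> Dict[str, float]:
    """Map layer name -> time, skipping records without a name (e.g. a header record)."""
    return {e["name"]: float(e[key]) for e in profile if "name" in e and key in e}


def compare_profiles(
    fp16_profile: List[dict], int8_profile: List[dict], key: str = "averageMs"
) -> Tuple[List[Tuple[str, float, float, float]], List[Tuple[str, float]]]:
    """Compare two per-layer profiles (assumed record shape, see lead-in).

    Returns:
      rows: (name, fp16_ms, int8_ms, int8_ms - fp16_ms) for layers present in
            both profiles, sorted so the worst INT8 regressions come first.
      int8_only: (name, int8_ms) for layers that appear only in the INT8
            profile (reformat layers may show up here), slowest first.
    """
    fp16 = layer_times(fp16_profile, key)
    int8 = layer_times(int8_profile, key)

    rows = [
        (name, fp16[name], int8[name], int8[name] - fp16[name])
        for name in fp16.keys() & int8.keys()
    ]
    rows.sort(key=lambda r: r[3], reverse=True)

    int8_only = [(name, int8[name]) for name in int8.keys() - fp16.keys()]
    int8_only.sort(key=lambda r: r[1], reverse=True)
    return rows, int8_only


if __name__ == "__main__":
    # Made-up numbers purely to show the call; not measurements.
    fp16_demo = [{"count": 100}, {"name": "conv1", "averageMs": 0.10}]
    int8_demo = [{"count": 100}, {"name": "conv1", "averageMs": 0.06}]
    for name, f, i, delta in compare_profiles(fp16_demo, int8_demo)[0]:
        print(f"{name}: fp16={f:.3f} ms int8={i:.3f} ms delta={delta:+.3f} ms")
```

Layers with a positive delta are candidates to keep in FP16, and the INT8-only list gives a rough view of the extra reformat cost that quantizing a region introduces.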

zerollzeng added the "triaged" label (Issue has been triaged by maintainers) on Apr 4, 2024
