How to choose which layers to quantize for faster performance? #3763

Open
luoshiyong opened this issue Apr 2, 2024 · 3 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@luoshiyong

While quantizing YOLOv8 to INT8, I found that some INT8 layers are slower than their FP16 counterparts, and the reformat operations are very time-consuming. For the best precision, we can run a sensitive-layer analysis to decide which layers to quantize. But for the best speed, how should I identify which layers to quantize?
(Some screenshots below.)
image

@lix19937

lix19937 commented Apr 2, 2024

ref ptq svg of engine.

@luoshiyong
Author

> ref ptq svg of engine.

I had already consulted the SVG by the time I posted the graph above. My question is how to efficiently find which layers run faster when quantized, rather than manually going through the graphs.
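
One way to avoid reading the graphs by hand might be to profile an FP16-only build and an INT8 build, then diff the per-layer times. The following is a minimal sketch under some assumptions: both profiles come from trtexec's `--exportProfile=<file>` option, the JSON is a list of records with `name` and `averageMs` fields (field names may differ between versions), and reformat layers can be recognized by "Reformat" in their name. The function names are hypothetical helpers, not part of TensorRT. Only layers whose names are identical in both engines are compared, since fusion can change names between builds.

```python
import json


def load_layer_times(path):
    """Load a trtexec profile JSON file into {layer_name: averageMs}."""
    with open(path) as f:
        records = json.load(f)
    # Assumption: non-layer records (such as a leading {"count": N}) have no "name".
    return {r["name"]: r["averageMs"] for r in records if "name" in r}


def compare_profiles(fp16, int8, min_gain_ms=0.0):
    """Compare two {name: ms} profiles.

    Returns (faster, slower, unmatched):
      faster    - (name, gain_ms) for layers where INT8 saves more than min_gain_ms,
                  largest gain first
      slower    - (name, gain_ms) for the remaining shared layers, worst first
      unmatched - sorted names that appear in only one of the two profiles
    """
    faster, slower = [], []
    for name in fp16.keys() & int8.keys():
        gain = fp16[name] - int8[name]  # positive means INT8 is faster
        (faster if gain > min_gain_ms else slower).append((name, gain))
    faster.sort(key=lambda item: -item[1])
    slower.sort(key=lambda item: item[1])
    unmatched = sorted(fp16.keys() ^ int8.keys())
    return faster, slower, unmatched


def reformat_overhead(profile, marker="reformat"):
    """Sum the time of layers whose name contains the marker (case-insensitive)."""
    return sum(ms for name, ms in profile.items() if marker in name.lower())
```

Layers that end up in `slower` are candidates to keep in FP16. Whether that helps end to end still needs a rebuild and a new measurement, because changing one layer's precision can move reformat cost to its neighbors; comparing `reformat_overhead` before and after gives a rough check on that.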

@lix19937

lix19937 commented Apr 4, 2024

ref https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#qdq-placement-recs @luoshiyong

@zerollzeng added the triaged label Apr 4, 2024