How to get accuracy closer to FP16 with demo/Diffusion 8-bit PTQ? #3723
Comments
@ttyio ^ ^
@jingyu-ml
I don't have the release date, but I would say it's in the near future and the team is working on it.
TRT 10 EA has been released.
Excellent, thanks for your work!
@TheBge12138 please check the GA release instead of the EA release
thanks!
Hello, I noticed that TensorRT 9.3 added 8-bit quantization to accelerate diffusion models — it's excellent!
However, when running the demo I can't reproduce the FP16 results. I modified get_pipeline in model.py to load my own model and used over 1000 prompts for calibration, but there is still a big gap for some prompts, while others are close but not completely similar. A rough sketch of my calibration setup is below.
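For context, here is a minimal sketch of the kind of calibration loop I mean, assuming the NVIDIA modelopt INT8 PTQ API (`mtq.quantize` with a forward loop); demo/Diffusion's actual quantize script and config may differ, and the checkpoint path and prompts file here are placeholders:

```python
import torch
import modelopt.torch.quantization as mtq
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # placeholder: replace with your own model
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder: >1000 diverse calibration prompts, one per line.
calibration_prompts = open("prompts.txt").read().splitlines()

def forward_loop(unet):
    # Drive full 30-step denoising runs so the calibrator observes
    # activation ranges at every timestep, not just the first step.
    for prompt in calibration_prompts:
        pipe(prompt, num_inference_steps=30)

# INT8_DEFAULT_CFG is modelopt's stock INT8 PTQ recipe; demo/Diffusion
# ships its own tuned config for SDXL, which may give better results.
mtq.quantize(pipe.unet, mtq.INT8_DEFAULT_CFG, forward_loop)
```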
My SDXL pipeline runs 30 steps. I compared the cosine similarity of the UNet output tensors: across 60 prompt tests, the similarity to FP16 fluctuated between 89% and 97%.
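This is how I compute the similarity — a minimal sketch, where the FP16 and INT8 UNet outputs per step are assumed to be captured as tensors:

```python
import torch
import torch.nn.functional as F

def unet_similarity(fp16_out: torch.Tensor, int8_out: torch.Tensor) -> float:
    # Flatten the noise-prediction tensors and compute a single
    # cosine similarity score for one denoising step.
    return F.cosine_similarity(
        fp16_out.flatten().float(), int8_out.flatten().float(), dim=0
    ).item()
```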
In https://developer.nvidia.com/blog/tensorrt-accelerates-stable-diffusion-nearly-2x-faster-with-8-bit-post-training-quantization/
I saw that you can get images nearly identical to the original FP16 precision.
Is there anything I can do to improve the accuracy of my INT8 model?
Thanks!