
Forcing layernorm layers to run in FP32 precision #2781

Closed
de1star opened this issue Mar 17, 2023 · 11 comments
Labels
Accuracy (Output mismatch between TensorRT and other frameworks), triaged (Issue has been triaged by maintainers)

Comments


de1star commented Mar 17, 2023

Hi, when I built the TensorRT engine, there was a warning:
[W] Running layernorm after self-attention in FP16 may cause overflow. Forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
But I could not find a way to force layernorm to run in FP32 precision. Could you help me with that? Thanks a lot!

@rajeevsrao
Collaborator

@de1star is this an ONNX model you are trying to run? If so, can you try exporting to opset 17 (which added the LayerNormalization operator) and running with TRT 8.6? Precision requirements for the LayerNormalization operator are handled automatically by the TensorRT optimizer in 8.6.

rajeevsrao added the triaged and Accuracy labels Mar 18, 2023
Author

de1star commented Mar 20, 2023

Thank you, I will give it a try!

Author

de1star commented Mar 20, 2023

Hi, I tried setting opset_version to 17, but an error was raised:
ValueError: Unsupported ONNX opset version: 17.
It seems torch does not support opset_version=17. Any suggestions?

@rajeevsrao
Collaborator

@de1star you will need to use torch v1.13.0 or a newer version.
https://github.com/pytorch/pytorch/blob/v1.13.0-rc1/torch/onnx/symbolic_opset17.py
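For illustration, a minimal export sketch (toy model; shapes, names, and the output path are placeholders) showing what an opset 17 export looks like with torch >= 1.13:

```python
import torch
import torch.nn as nn

# Toy model containing LayerNorm; shapes, names, and paths are placeholders.
# opset_version=17 requires torch >= 1.13 and exports LayerNorm as a single
# ONNX LayerNormalization node, which TensorRT 8.6 can handle automatically.
model = nn.Sequential(nn.Linear(768, 768), nn.LayerNorm(768)).eval()
dummy = torch.randn(1, 128, 768)

torch.onnx.export(
    model,
    dummy,
    "model_opset17.onnx",
    opset_version=17,
    input_names=["input"],
    output_names=["output"],
)
```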

@ttyio
Collaborator

ttyio commented Apr 24, 2023

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions. Thanks!

ttyio closed this as completed Apr 24, 2023

monsterlyg commented Nov 6, 2023

I've exported the model to opset 17 ONNX, but the warning still exists. @rajeevsrao


jinluyang commented Nov 8, 2023

Sorry, my bad. I found TensorRT 8.6 to be working fine. I got an error because I previously had TensorRT 8.4 installed and missed its libnvonnxparser.so while removing TRT 8.4. Now, with TRT 8.6 and ONNX opset 17, everything works fine. Thank you.


Also, the way to manually set the layernorm layers to FP32 through the TensorRT Python API can be figured out from this link: #1196 (comment)
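For reference, a rough sketch of that approach with the TensorRT Python API; the file paths and the substring-based layer matching below are assumptions, so adapt them to your own network:

```python
import tensorrt as trt

# Rough sketch: build an FP16 engine from ONNX while pinning the layers that
# make up layernorm subgraphs (Pow / ReduceMean / etc.) to FP32.
# File paths and the name-matching heuristic are assumptions.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Without this flag TensorRT may ignore per-layer precision requests.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Pin layers belonging to layernorm subgraphs to FP32.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if any(tag in layer.name for tag in ("LayerNorm", "ReduceMean", "Pow")):
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

engine = builder.build_serialized_network(network, config)
with open("model.trt", "wb") as f:
    f.write(engine)
```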

@1193700079

How can this be done with the TensorRT C++ API?

@w1005444804

@rajeevsrao How do I force layernorm layers to run in FP32 precision with C++? I have already set "config->setFlag(BuilderFlag::kFP16);".

@focusunsink

still no solution

@lantudou

lantudou commented Nov 23, 2024

trtexec --onnx=sim_cnn.onnx --saveEngine=model.trt --fp16 --verbose
You will find something like this in the output:

[11/23/2024-18:16:34] [W] [TRT] Detected layernorm nodes in FP16.
[11/23/2024-18:16:34] [V] [TRT] /downsample_layers.0/downsample_layers.0.1/ReduceMean_1,/downsample_layers.0/downsample_layers.0.1/ReduceMean,/downsample_layers.0/downsample_layers.0.1/Pow,/downsample_layers.1/downsample_layers.1.0/ReduceMean,/downsample_layers.1/downsample_layers.1.0/Pow,/downsample_layers.1/downsample_layers.1.0/ReduceMean_1,/downsample_layers.2/downsample_layers.2.0/ReduceMean,/downsample_layers.2/downsample_layers.2.0/Pow,/downsample_layers.2/downsample_layers.2.0/ReduceMean_1,/downsample_layers.3/downsample_layers.3.0/ReduceMean,/downsample_layers.3/downsample_layers.3.0/Pow,/downsample_layers.3/downsample_layers.3.0/ReduceMean_1
[11/23/2024-18:16:34] [W] [TRT] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.

Copy these layer names and set them to FP32 precision with a command like this:

trtexec --onnx=sim_cnn.onnx --saveEngine=model.trt --fp16 --precisionConstraints=obey --layerPrecisions=/downsample_layers.0/downsample_layers.0.1/ReduceMean_1:fp32,/downsample_layers.0/downsample_layers.0.1/ReduceMean:fp32,/downsample_layers.0/downsample_layers.0.1/Pow:fp32,/downsample_layers.1/downsample_layers.1.0/ReduceMean:fp32,/downsample_layers.1/downsample_layers.1.0/Pow:fp32,/downsample_layers.1/downsample_layers.1.0/ReduceMean_1:fp32,/downsample_layers.2/downsample_layers.2.0/ReduceMean:fp32,/downsample_layers.2/downsample_layers.2.0/Pow:fp32,/downsample_layers.2/downsample_layers.2.0/ReduceMean_1:fp32,/downsample_layers.3/downsample_layers.3.0/ReduceMean:fp32,/downsample_layers.3/downsample_layers.3.0/Pow:fp32,/downsample_layers.3/downsample_layers.3.0/ReduceMean_1:fp32
