Conditions / example of DepSepConvolution fusion #3237

Open
vadimkantorov opened this issue Aug 18, 2023 · 21 comments
Labels: triaged (Issue has been triaged by maintainers)

Comments

vadimkantorov commented Aug 18, 2023

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#fusion-types says:

Depthwise Separable Convolution
A depthwise convolution with activation followed by a convolution with activation may sometimes be fused into a single optimized DepSepConvolution layer. The precision of both convolutions must be INT8 and the device's compute capability must be 7.2 or later.

Are there any other conditions? What types of activations are admissible?

Are there examples of fusable graphs? (This is especially important given that the convs must already be INT8.)

There are almost no examples or mentions of DepSepConvolution/TRT in Google Search.

I also wonder about the constraints on Q/DQ placement and qparams.

Thank you :)

zerollzeng (Collaborator)

@nvpohanh ^ ^

zerollzeng added the "triaged" label Aug 22, 2023
nvpohanh (Collaborator)

You need a Q/DQ pair before the depthwise Conv, a Q/DQ pair before the 1x1 Conv, and a Q/DQ pair before the next Conv (after the 1x1 Conv's activation).

Here is an illustration:
[screenshot: Q/DQ pairs placed before the depthwise Conv, the 1x1 Conv, and the following Conv]
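For concreteness, here is a minimal, unofficial PyTorch sketch of this Q/DQ placement. The module structure, scales, and shapes are illustrative assumptions rather than values from this thread; real scales would come from calibration or QAT, and whether the DepSepConvolution tactic is actually chosen still depends on the builder, shapes, and device.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def qdq(x, scale):
    # One Q/DQ pair (symmetric INT8); torch.onnx.export lowers
    # fake_quantize_per_tensor_affine to QuantizeLinear + DequantizeLinear.
    return torch.fake_quantize_per_tensor_affine(x, scale, 0, -128, 127)

class DepSepBlock(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1,
                            groups=channels, bias=False)      # depthwise conv
        self.pw = nn.Conv2d(channels, channels, 1, bias=False) # 1x1 pointwise conv
        self.nxt = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.act = nn.ReLU()

    def forward(self, x):
        # Q/DQ pair before the depthwise Conv (on its input and weight).
        y = F.conv2d(qdq(x, 0.05), qdq(self.dw.weight, 0.01),
                     padding=1, groups=self.dw.groups)
        y = self.act(y)
        # Q/DQ pair before the 1x1 Conv.
        y = F.conv2d(qdq(y, 0.05), qdq(self.pw.weight, 0.01))
        y = self.act(y)
        # Q/DQ pair before the next Conv, i.e. after the 1x1 Conv's activation.
        y = F.conv2d(qdq(y, 0.05), qdq(self.nxt.weight, 0.01), padding=1)
        return y

# Export with an opset that supports INT8 QuantizeLinear/DequantizeLinear.
torch.onnx.export(DepSepBlock().eval(), torch.randn(1, 32, 56, 56),
                  "depsep_qdq.onnx", opset_version=13)
```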

vadimkantorov (Author) commented Aug 25, 2023

Thank you! We will try this pattern!

It would be awesome to have this fusion example as an .onnx file and maybe a .svg output from TREx (to get a feel for how it looks after fusion).

ttyio (Collaborator) commented Sep 26, 2023

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions. Thanks all!

ttyio closed this as completed Sep 26, 2023
vadimkantorov (Author) commented Sep 27, 2023

The thing is, I cannot reopen it if it was a third party (you) who closed the question :) but yeah, I will add a comment when we have some feedback.

nvpohanh (Collaborator)

Reopening for now. Thanks.

nvpohanh reopened this Sep 27, 2023
aboubezari commented Jan 15, 2024

Hey @nvpohanh, I tried the above graph in a small example, attached below. I got the following error:
[01/15/2024-15:26:23] [E] Error[10]: Could not find any implementation for node StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars/ReadVariableOp:0 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars_QuantizeLinear__18 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/Conv3D.
[01/15/2024-15:26:23] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars/ReadVariableOp:0 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars_QuantizeLinear__18 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/Conv3D.)
[01/15/2024-15:26:23] [E] Engine could not be created from network
[01/15/2024-15:26:23] [E] Building engine failed
I'm using TensorRT version 8.6 and onnx opset 17.
[screenshot of the exported ONNX graph]

nvpohanh (Collaborator)

@aboubezari Could you provide the ONNX file so that we can repro and debug this issue? Thanks

aboubezari

Yes, I've attached the ONNX file as a zip file with just the onnx model in it.
Let me know if you would like me to export different shapes or activations on the Convs. I have already tried using Relu activations instead of BatchNorm with no luck.
aboubezari_debug.zip

nvpohanh (Collaborator)

Filed internal tracker 4454538. Will let you know if we have any findings.

aboubezari

Awesome, thanks.

nzmora-nvidia

@aboubezari unrelated to the problem you've reported, I recommend placing the first BatchNorm after the first convolution (as it appears in the diagram above).
The ONNX file in aboubezari_debug.zip looks like so:
[image of the ONNX graph from aboubezari_debug.zip]

aboubezari

@nzmora-nvidia I realized that I exported the model after tweaking it a bit to figure out the issue, my bad.
Let me know if you need me to export you a new model.

nzmora-nvidia

@aboubezari Thank you, we can recreate the error and do not need the new model.

vadimkantorov (Author)

> The ONNX file in aboubezari_debug.zip looks like so:

I guess it would be awesome to have such example ONNX files (or even complete PyTorch + torch-tensorrt examples) in the TRT docs, especially where fusion is discussed (and given that fusion patterns are often fragile, especially together with quantization)!

nzmora-nvidia

@vadimkantorov That's a fair request. I'll provide some PyTorch examples in the next TREx release.

nvpohanh (Collaborator) commented Apr 1, 2024

This issue has been fixed in TRT 10.0.0 EA. https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html#rel-10-0-0-EA

Thanks for reporting this issue.

vadimkantorov (Author)

@nvpohanh Please add an example *.onnx file or PyTorch example somewhere in the docs showing how to get DepSepConv to be used in TRT :) This is a very important module for speed-ups, and it's important for users to know how to export patterns that it recognizes...

E.g. a complete example of exporting MobileNetV3 (which makes heavy use of depthwise separable convolutions) https://pytorch.org/vision/stable/models/generated/torchvision.models.quantization.mobilenet_v3_large.html#mobilenet-v3-large would be great
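In the meantime, here is a rough, unofficial sketch of how one might check whether the fusion fired, assuming the hypothetical depsep_qdq.onnx from the sketch above and the TensorRT 8.6 Python API: build an INT8 engine with detailed profiling verbosity, then inspect the layer names for a single fused node covering the depthwise and pointwise convolutions.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("depsep_qdq.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)                          # explicit Q/DQ quantization
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED   # keep per-layer names

plan = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(plan)

# Dump per-layer info; fused layers show up as one node with a combined name.
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```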

aboubezari

Thank you @nvpohanh! Look forward to trying it out.

ttyio (Collaborator) commented Apr 16, 2024

I will close this since it is solved. Thanks all!

ttyio closed this as completed Apr 16, 2024
vadimkantorov (Author)

@ttyio I think it's still important to provide, in the docs, ONNX files with examples of fusable graphs, and ideally some complete PyTorch examples that export these ONNX graphs.

ttyio reopened this Apr 16, 2024