Conditions / example of DepSepConvolution fusion #3237

Open
vadimkantorov opened this issue Aug 18, 2023 · 21 comments
Labels: triaged (Issue has been triaged by maintainers)

Comments

vadimkantorov commented Aug 18, 2023

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#fusion-types says:

Depthwise Separable Convolution
A depthwise convolution with activation followed by a convolution with activation may sometimes be fused into a single optimized DepSepConvolution layer. The precision of both convolutions must be INT8 and the device's compute capability must be 7.2 or later.

Are there any other conditions? What types of activations are admissible?

Are there examples of fusable graphs? (This is especially important given that the convs must already be INT8.)

There are almost no examples or mentions of DepSepConvolution/TRT in Google Search.

I also wonder about the constraints on Q/DQ placement and qparams.

Thank you :)

zerollzeng (Collaborator)

@nvpohanh ^ ^

zerollzeng added the "triaged" label Aug 22, 2023
nvpohanh (Collaborator)

You need a Q/DQ pair before the depthwise Conv, a Q/DQ pair before the 1x1 Conv, and a Q/DQ pair before the next Conv (after the 1x1 Conv's activation).

Here is an illustration:
[screenshot: Q/DQ pairs placed before the depthwise Conv, the 1x1 Conv, and the following Conv]
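For concreteness, here is a minimal, unofficial PyTorch sketch of this Q/DQ placement. The module structure, scales, and shapes are illustrative assumptions rather than values from this thread; real scales would come from calibration or QAT, and whether the DepSepConvolution tactic is actually chosen still depends on the builder, shapes, and device.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def qdq(x, scale):
    # One Q/DQ pair (symmetric INT8); torch.onnx.export lowers
    # fake_quantize_per_tensor_affine to QuantizeLinear + DequantizeLinear.
    return torch.fake_quantize_per_tensor_affine(x, scale, 0, -128, 127)

class DepSepBlock(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1,
                            groups=channels, bias=False)      # depthwise conv
        self.pw = nn.Conv2d(channels, channels, 1, bias=False) # 1x1 pointwise conv
        self.nxt = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.act = nn.ReLU()

    def forward(self, x):
        # Q/DQ pair before the depthwise Conv (on its input and weight).
        y = F.conv2d(qdq(x, 0.05), qdq(self.dw.weight, 0.01),
                     padding=1, groups=self.dw.groups)
        y = self.act(y)
        # Q/DQ pair before the 1x1 Conv.
        y = F.conv2d(qdq(y, 0.05), qdq(self.pw.weight, 0.01))
        y = self.act(y)
        # Q/DQ pair before the next Conv, i.e. after the 1x1 Conv's activation.
        y = F.conv2d(qdq(y, 0.05), qdq(self.nxt.weight, 0.01), padding=1)
        return y

# Export with an opset that supports INT8 QuantizeLinear/DequantizeLinear.
torch.onnx.export(DepSepBlock().eval(), torch.randn(1, 32, 56, 56),
                  "depsep_qdq.onnx", opset_version=13)
```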

vadimkantorov (Author) commented Aug 25, 2023

Thank you! We will try this pattern!

It would be awesome to have this fusion example as an .onnx file and maybe a .svg output from TREx (to get a feel for how it looks after fusion).

ttyio (Collaborator) commented Sep 26, 2023

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions. Thanks all!

ttyio closed this as completed Sep 26, 2023
vadimkantorov (Author) commented Sep 27, 2023

The thing is, I cannot reopen it if it was a third party (you) who closed the question :) but yeah, I will add a comment when we have some feedback.

nvpohanh (Collaborator)

Reopening for now. Thanks.

nvpohanh reopened this Sep 27, 2023
aboubezari commented Jan 15, 2024

Hey @nvpohanh, I tried the above graph in a small example, attached below. I got the following error:
[01/15/2024-15:26:23] [E] Error[10]: Could not find any implementation for node StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars/ReadVariableOp:0 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars_QuantizeLinear__18 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/Conv3D.
[01/15/2024-15:26:23] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars/ReadVariableOp:0 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars_QuantizeLinear__18 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/Conv3D.)
[01/15/2024-15:26:23] [E] Engine could not be created from network
[01/15/2024-15:26:23] [E] Building engine failed
I'm using TensorRT version 8.6 and onnx opset 17.
[screenshot of the exported ONNX graph]

nvpohanh (Collaborator)

@aboubezari Could you provide the ONNX file so that we can repro and debug this issue? Thanks

aboubezari

Yes, I've attached the ONNX file as a zip file with just the onnx model in it.
Let me know if you would like me to export different shapes or activations on the Convs. I have already tried using Relu activations instead of BatchNorm with no luck.
aboubezari_debug.zip

nvpohanh (Collaborator)

Filed internal tracker 4454538. Will let you know if we have any findings.

aboubezari

Awesome, thanks.

nzmora-nvidia

@aboubezari unrelated to the problem you've reported, I recommend placing the first BatchNorm after the first convolution (as it appears in the diagram above).
The ONNX file in aboubezari_debug.zip looks like so:
[image of the ONNX graph from aboubezari_debug.zip]

aboubezari

@nzmora-nvidia I realized that I exported the model after tweaking it a bit to figure out the issue, my bad.
Let me know if you need me to export you a new model.

nzmora-nvidia

@aboubezari Thank you, we can recreate the error and do not need the new model.

vadimkantorov (Author)

> The ONNX file in aboubezari_debug.zip looks like so:

I guess it would be awesome to have such example ONNX files (or even complete PyTorch + torch-tensorrt examples) in the TRT docs, especially where fusion is discussed (and given that fusion patterns are often fragile, especially together with quantization)!

nzmora-nvidia

@vadimkantorov That's a fair request. I'll provide some PyTorch examples in the next TREx release.

nvpohanh (Collaborator) commented Apr 1, 2024

This issue has been fixed in TRT 10.0.0 EA. https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html#rel-10-0-0-EA

Thanks for reporting this issue.

vadimkantorov (Author)

@nvpohanh Please add an example *.onnx file or PyTorch example somewhere in the docs showing how to get DepSepConv to be used in TRT :) This is a very important module for speed-ups, and it's important for users to know how to export patterns that it recognizes...

E.g. a complete example of exporting MobileNetV3 (which makes heavy use of depthwise separable convolutions) https://pytorch.org/vision/stable/models/generated/torchvision.models.quantization.mobilenet_v3_large.html#mobilenet-v3-large would be great
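In the meantime, here is a rough, unofficial sketch of how one might check whether the fusion fired, assuming the hypothetical depsep_qdq.onnx from the sketch above and the TensorRT 8.6 Python API: build an INT8 engine with detailed profiling verbosity, then inspect the layer names for a single fused node covering the depthwise and pointwise convolutions.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("depsep_qdq.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)                          # explicit Q/DQ quantization
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED   # keep per-layer names

plan = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(plan)

# Dump per-layer info; fused layers show up as one node with a combined name.
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```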

aboubezari

Thank you @nvpohanh! Look forward to trying it out.

ttyio (Collaborator) commented Apr 16, 2024

I will close this since it is solved. Thanks all!

ttyio closed this as completed Apr 16, 2024
vadimkantorov (Author)

@ttyio I think it's still important to provide, in the docs, ONNX files with examples of fusable graphs, and ideally some complete PyTorch examples that export these ONNX graphs.

ttyio reopened this Apr 16, 2024