
Failure of TensorRT 8.6 on the PyTorch version of Faster-RCNN #3034

Closed · micheleantonazzi opened this issue Jun 3, 2023 · 11 comments
Labels: triaged (Issue has been triaged by maintainers)

micheleantonazzi commented Jun 3, 2023

Description

I'm trying to convert the PyTorch implementation of Faster R-CNN to TensorRT 8.6.
The procedure I followed:

  • Load Faster R-CNN from TorchHub
  • Export it to ONNX
  • Build the TensorRT engine with the trtexec tool

The procedure fails on the If node, which is generated by the MultiScaleRoIAlign class of torchvision.
The error is the following:
[E] Error[4]: /roi_heads/box_roi_pool/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [-1] and [-1,1].
[06/03/2023-17:38:40] [E] [TRT] ModelImporter.cpp:771: While parsing node number 1579 [If -> "/roi_heads/box_roi_pool/If_output_0"]:
[.... node ....]
ModelImporter.cpp:777: ERROR: ModelImporter.cpp:195 In function parseGraph:
[6] Invalid Node - /roi_heads/box_roi_pool/If
/roi_heads/box_roi_pool/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [-1] and [-1,1].

Steps to reproduce:

import torch
import torchvision

# Load the pretrained detector and export it to ONNX with a fixed input shape
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
dummy_input = torch.randn(1, 3, 320, 320)
torch.onnx.export(model,
                  dummy_input,
                  "model_onnx.onnx",
                  export_params=True,
                  )
trtexec --onnx=model_onnx.onnx --saveEngine=resnet_engine_pytorch.trt  --explicitBatch

You can also download the ONNX model from here.

Environment

TensorRT Version: 8.6

NVIDIA GPU: RTX 3050 Mobile

NVIDIA Driver Version: 530

CUDA Version: 11.7 or 12 (both tested)

CUDNN Version: latest

Operating System: Ubuntu 20.04

Python Version (if applicable): 3.8

PyTorch Version (if applicable): 2.0.0

Relevant Files

Model link: link

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): YES

Could you help me solve this issue? Thank you so much in advance.

zerollzeng (Collaborator) commented:

Looks like it triggers a TRT limitation:

IIfConditionalOutputLayer inputs must have the same shape. Shapes are [-1] and [-1,1].

See https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_if_conditional.html and https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work-with-conditionals
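
For illustration (this sketch is not from the original thread), here is a minimal TensorRT Python example of the constraint: the two branch tensors passed to an IIfConditionalOutputLayer must have identical shapes, so a (4,) versus (4, 1) pair fails just like the [-1] versus [-1,1] pair in the error above.

import tensorrt as trt

# Sketch only: build a tiny network whose If-conditional branches disagree in shape.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

cond = network.add_input("cond", trt.bool, ())   # 0-D boolean condition
x = network.add_input("x", trt.float32, (4,))

conditional = network.add_if_conditional()
conditional.set_condition(cond)
x_in = conditional.add_input(x).get_output(0)

# True branch: identity, output shape (4,)
true_out = network.add_identity(x_in).get_output(0)

# False branch: reshape to (4, 1), so the shapes no longer match
shuffle = network.add_shuffle(x_in)
shuffle.reshape_dims = (4, 1)
false_out = shuffle.get_output(0)

# TensorRT rejects this pairing with the same message as in the issue:
# "IIfConditionalOutputLayer inputs must have the same shape."
out_layer = conditional.add_output(true_out, false_out)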

zerollzeng (Collaborator) commented:

If the model has static shapes, have you tried constant folding? It may eliminate the problematic node.
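
A quick way to check whether folding actually removed the conditional is a sketch like the following, using the onnx Python package (the file name folded.onnx is just the hypothetical output of the folding step):

import onnx

# Count the If nodes that survive constant folding; if folding worked,
# the conditional inside MultiScaleRoIAlign should be gone.
model = onnx.load("folded.onnx")
if_nodes = [n for n in model.graph.node if n.op_type == "If"]
print(f"Remaining If nodes: {len(if_nodes)}")
for node in if_nodes:
    print(" ", node.name)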

zerollzeng (Collaborator) commented:

Also, we have an old Faster R-CNN sample (deprecated); see https://github.com/NVIDIA/TensorRT/tree/release/8.4/samples/sampleFasterRCNN

zerollzeng self-assigned this Jun 4, 2023
zerollzeng added the triaged (Issue has been triaged by maintainers) label Jun 4, 2023
micheleantonazzi (Author) commented:

Hi, thank you for your suggestion.

If the model has static shapes, have you tried constant folding? It may eliminate the problematic node.

Yes, I'm working with static shapes (the ONNX model is exported with a fixed dummy input, and trtexec is run with the --explicitBatch argument). Is that correct?
I also tried to sanitize the model with Polygraphy, running the command:

polygraphy surgeon sanitize --fold-constants model_onnx.onnx -o folded.onnx

but the conversion of the sanitized model to TensorRT failed with the following error:

Error[4]: [shapeContext.cpp::operator()::3602] Error Code 4: Shape Error (reshape wildcard -1 has infinite number of solutions or no solution. Reshaping [0,12] to [0,-1].)
[06/05/2023-10:05:11] [E] [TRT] ModelImporter.cpp:771: While parsing node number 545 [Reshape -> "/roi_heads/Reshape_output_0"]:
[06/05/2023-10:05:11] [E] [TRT] ModelImporter.cpp:772: --- Begin node ---
[06/05/2023-10:05:11] [E] [TRT] ModelImporter.cpp:773: input: "/roi_heads/box_predictor/bbox_pred/Gemm_output_0"
input: "/roi_heads/Concat_output_0"
output: "/roi_heads/Reshape_output_0"
name: "/roi_heads/Reshape"
op_type: "Reshape"
attribute {
  name: "allowzero"
  i: 0
  type: INT
}

[06/05/2023-10:05:11] [E] [TRT] ModelImporter.cpp:774: --- End node ---
[06/05/2023-10:05:11] [E] [TRT] ModelImporter.cpp:777: ERROR: ModelImporter.cpp:195 In function parseGraph:
[6] Invalid Node - /roi_heads/Reshape
[shapeContext.cpp::operator()::3602] Error Code 4: Shape Error (reshape wildcard -1 has infinite number of solutions or no solution. Reshaping [0,12] to [0,-1].)
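
To see why the wildcard is ambiguous here: the tensor being reshaped holds 0 × 12 = 0 elements, so any value substituted for -1 yields a valid 0-element shape; that is the "infinite number of solutions" in the message. NumPy rejects the same reshape for the same reason (a quick sketch, not from the original report):

import numpy as np

a = np.zeros((0, 12))   # same shape as the tensor in the TRT error
a.reshape(0, -1)        # raises ValueError: -1 cannot be inferred from 0 elements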

Do you have any other suggestions? Thank you so much again.

zerollzeng (Collaborator) commented:

I checked the ONNX model you provided: it contains a lot of redundant ops, which come from the PyTorch source code. They make constant folding hard to apply and lead to the error you're seeing. I think there are probably many other problems with this ONNX, so I would suggest using a different model, one that at least looks clean in ONNX; it will make the work much simpler.

micheleantonazzi (Author) commented:

I will try a simpler model, or try to re-implement portions of the Faster R-CNN architecture provided by PyTorch.
Thank you so much again @zerollzeng

zerollzeng (Collaborator) commented:

Good luck :-D

ttyio (Collaborator) commented Jul 5, 2023

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!

ttyio closed this as completed Jul 5, 2023
zhurou603 commented:

@micheleantonazzi Hi! Have you solved this problem? I encountered the same error as yours, which is also reported by the Reshape operator.

micheleantonazzi (Author) commented:

Hi @zhurou603,
No, I didn't manage to solve the problem. The PyTorch implementation of Faster R-CNN is not compatible with TensorRT, and fixing it is very complex. I suggest using ONNX Runtime; with that inference engine, the ONNX export of Faster R-CNN works well.
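
For reference, a minimal ONNX Runtime sketch of that workflow (not from the original thread; model_onnx.onnx is the file produced by the export step above):

import numpy as np
import onnxruntime as ort

# Run the exported Faster R-CNN with ONNX Runtime instead of TensorRT.
session = ort.InferenceSession("model_onnx.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy_input = np.random.randn(1, 3, 320, 320).astype(np.float32)

# The torchvision export typically returns boxes, labels, and scores.
outputs = session.run(None, {input_name: dummy_input})
for meta, out in zip(session.get_outputs(), outputs):
    print(meta.name, out.shape)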

xuebuaa commented Jul 11, 2024

Facing the same problem: the Reshape op with a [-1] shape is not supported, so the model cannot be converted to a TRT engine.
