
run int8 model failure of TensorRT 8.4.12 when running yolo on orin DLA #3799

Open
mayulin0206 opened this issue Apr 15, 2024 · 6 comments
Labels: triaged (Issue has been triaged by maintainers)

@mayulin0206

Description

For the quantized INT8 model, the inference results are correct under Orin DLA FP16, and the results are also correct under Orin GPU INT8, but the results are completely incorrect under Orin DLA INT8.

Environment

TensorRT Version: 8.4.12

NVIDIA GPU:

NVIDIA Driver Version:

CUDA Version:

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

@zerollzeng
Collaborator

  1. Could you please try the latest DriveOS/JP release?
  2. We have a yolov5 DLA sample that may be helpful to you: https://github.com/NVIDIA-AI-IOT/cuDLA-samples
  3. Please provide a minimal repro if the latest release still fails.

@zerollzeng zerollzeng self-assigned this Apr 18, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 18, 2024
@lix19937

For the quantized INT8 model, the inference results are correct under Orin DLA FP16, and the results are also correct under Orin GPU INT8, but the results are completely incorrect under Orin DLA INT8.

You should do a QAT-to-PTQ conversion: extract the scales from the QAT ONNX model and save them as a calibration table to run INT8 on DLA.
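
For reference, a rough sketch of what that conversion can look like (this is not the cuDLA-samples export script; it assumes per-tensor activation scales stored as QuantizeLinear initializers, and the file names are illustrative):

```python
# Rough sketch: pull per-tensor activation scales out of a QAT ONNX model's
# QuantizeLinear nodes and write them in the plain-text TensorRT
# calibration-cache format.
import struct
import onnx
from onnx import numpy_helper

model = onnx.load("yolov5_trimmed_qat.onnx")
inits = {t.name: numpy_helper.to_array(t) for t in model.graph.initializer}

scales = {}
for node in model.graph.node:
    if node.op_type == "QuantizeLinear" and node.input[1] in inits:
        scale = inits[node.input[1]]
        if scale.size == 1:  # keep per-tensor (activation) scales only
            scales[node.input[0]] = float(scale)

with open("qat2ptq.cache", "w") as f:
    # header is "TRT-<version>-<calibrator>"; match your TensorRT version
    f.write("TRT-8400-EntropyCalibration2\n")
    for name, s in scales.items():
        # each entry is "tensor name: hex of the float32 scale bits"
        f.write(f"{name}: {struct.pack('>f', s).hex()}\n")
```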

@mayulin0206
Author

For the quantized INT8 model, the inference results are correct under Orin DLA FP16, and the results are also correct under Orin GPU INT8, but the results are completely incorrect under Orin DLA INT8.

You should do a QAT-to-PTQ conversion: extract the scales from the QAT ONNX model and save them as a calibration table to run INT8 on DLA.

Yes, I did this, but the results are still completely wrong. The inference results are correct under Orin GPU INT8, but completely incorrect under Orin DLA INT8.

@mayulin0206
Author

mayulin0206 commented Apr 22, 2024

  1. Could you please try the latest DriveOS/JP release?
  2. We have a yolov5 DLA sample that may be helpful to you: https://github.com/NVIDIA-AI-IOT/cuDLA-samples
  3. Please provide a minimal repro if the latest release still fails.

@zerollzeng
Following your advice, I ran the yolov5 DLA sample (https://github.com/NVIDIA-AI-IOT/cuDLA-samples) on the Orin DLA, but encountered the issues shown below.

  • Under cuDLA hybrid mode, I ran the following command and hit the problem below:

make run

/usr/local/cuda//bin/nvcc -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include -gencode arch=compute_87,code=sm_87 -c -o build/decode_nms.o src/decode_nms.cu
g++ -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -O2 -c -o build/validate_coco.o src/validate_coco.cpp
g++ -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -O2 -c -o build/yolov5.o src/yolov5.cpp
g++ -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -O2 -c -o build/cudla_context_hybrid.o src/cudla_context_hybrid.cpp
g++ --std=c++14 -Wno-deprecated-declarations -Wall -O2 -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include -o ./build/cudla_yolov5_app build/decode_nms.o build/validate_coco.o build/yolov5.o build/cudla_context_hybrid.o -l cudla -L/usr/local/cuda//lib64 -l cuda -l cudart -l nvinfer -L /usr/lib/aarch64-linux-gnu/ -l opencv_objdetect -l opencv_highgui -l opencv_imgproc -l opencv_core -l opencv_imgcodecs -L ./src/matx_reformat/build/ -l matx_reformat -l jsoncpp -lnvscibuf -lnvscisync
././build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8
[hybrid mode] create cuDLA device SUCCESS
[hybrid mode] load cuDLA module from memory FAILED in src/cudla_context_hybrid.cpp:96, CUDLA ERR: 7
make: *** [Makefile:80: run] Error 1

  • Under buildDLAStandalone mode, I ran the following command and hit the problem below:

# Build INT8 and FP16 loadable from ONNX in this project
bash data/model/build_dla_standalone_loadable.sh

[04/22/2024-19:51:27] [E] Error[3]: [builderConfig.cpp::setFlag::65] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/builderConfig.cpp::setFlag::65, condition: builderFlag != BuilderFlag::kPREFER_PRECISION_CONSTRAINTS || !flags[BuilderFlag::kOBEY_PRECISION_CONSTRAINTS]. kPREFER_PRECISION_CONSTRAINTS cannot be set if kOBEY_PRECISION_CONSTRAINTS is set.
)
[04/22/2024-19:51:27] [E] Error[2]: [nvmRegionOptimizer.cpp::forceToUseNvmIO::175] Error Code 2: Internal Error (Assertion std::all_of(a->consumers.begin(), a->consumers.end(), [](Node* n) { return isDLA(n->backend); }) failed. )
[04/22/2024-19:51:27] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[04/22/2024-19:51:27] [E] Engine could not be created from network
[04/22/2024-19:51:27] [E] Building engine failed
[04/22/2024-19:51:27] [E] Failed to create engine from model or file.
[04/22/2024-19:51:27] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8401] # /usr/src/tensorrt/bin/trtexec --minShapes=images:1x3x672x672 --maxShapes=images:1x3x672x672 --optShapes=images:1x3x672x672 --shapes=images:1x3x672x672 --onnx=data/model/yolov5_trimmed_qat.onnx --useDLACore=0 --buildDLAStandalone --saveEngine=data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --inputIOFormats=int8:dla_hwc4 --outputIOFormats=fp16:chw16 --int8 --fp16 --calib=data/model/qat2ptq.cache --precisionConstraints=obey --layerPrecisions=/model.24/m.0/Conv:fp16,/model.24/m.1/Conv:fp16,/model.24/m.2/Conv:fp16
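
The first error is the builder-config check that kPREFER_PRECISION_CONSTRAINTS and kOBEY_PRECISION_CONSTRAINTS cannot both be set. For illustration, a minimal sketch of the relevant flags using the TensorRT Python API (not the sample's build script; the DLA settings are placeholders):

```python
# Illustrative only: the precision-constraint flags on the builder config are
# mutually exclusive, so a DLA INT8 build sets exactly one of them.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0

# pick one precision-constraint mode, never both
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
# config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)  # not together with OBEY
```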

@mayulin0206
Author

mayulin0206 commented Apr 22, 2024

@zerollzeng @lix19937
I also have a few other questions about DLA; for reference, a format-pinning sketch follows the list.
Under DLA INT8 mode:

  1. Is the default tensor format for computation kDLA_HWC4?
  2. Since the tensor format for computation on my GPU is kLINEAR, is a format conversion necessary under the DLA INT8 mode?
  3. If the default tensor format for computation under DLA INT8 mode is kDLA_HWC4, and some layers in the model fall back to the GPU, will there be an automatic format conversion for the layers that fall back to the GPU, i.e. will they automatically convert to kLINEAR?
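
For context on the format questions, here is a minimal sketch of pinning DLA-friendly I/O formats explicitly with the TensorRT Python API (illustrative only; it assumes a single input and output and mirrors the --inputIOFormats=int8:dla_hwc4 / --outputIOFormats=fp16:chw16 flags from the trtexec command above):

```python
# Minimal sketch: pin DLA-friendly I/O formats on the network so any reformat
# to or from the GPU-side linear layout is visible in the network definition
# rather than inserted implicitly.
import tensorrt as trt

def pin_dla_io_formats(network: trt.INetworkDefinition) -> None:
    inp = network.get_input(0)
    inp.dtype = trt.int8
    inp.allowed_formats = 1 << int(trt.TensorFormat.DLA_HWC4)

    out = network.get_output(0)
    out.dtype = trt.float16
    out.allowed_formats = 1 << int(trt.TensorFormat.CHW16)
```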

@e2crawfo

@mayulin0206 Did you ever solve this? I'm facing the exact same pattern of issues: things work on GPU INT8 and DLA FP16, but produce nonsense for DLA INT8. I'm also on an Orin with JetPack 5.1, running TensorRT 8.5.2.2 through Python. Upgrading JetPack isn't an option for me, though I could try an updated TensorRT.
