
run int8 model failure of TensorRT 8.4.12 when running yolo on orin DLA #3799

Open
mayulin0206 opened this issue Apr 15, 2024 · 6 comments
Labels: triaged (Issue has been triaged by maintainers)

@mayulin0206

Description

For the quantized INT8 model, the inference results are correct under Orin DLA FP16, and the results are also correct under Orin GPU INT8, but the results are completely incorrect under Orin DLA INT8.

Environment

TensorRT Version: 8.4.12

NVIDIA GPU:

NVIDIA Driver Version:

CUDA Version:

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

@zerollzeng
Collaborator

  1. Could you please try the latest DriveOS/JP release?
  2. We have a yolov5 DLA sample that may be helpful to you: https://github.com/NVIDIA-AI-IOT/cuDLA-samples
  3. Please provide a minimal repro if the latest release still fails.

@zerollzeng zerollzeng self-assigned this Apr 18, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 18, 2024
@lix19937

For the quantized INT8 model, the inference results are correct under Orin DLA FP16, and the results are also correct under Orin GPU INT8, but the results are completely incorrect under Orin DLA INT8.

You should do a QAT-to-PTQ conversion: extract the scales from the QAT ONNX model and save them as a calibration table to run INT8 on DLA.
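
For reference, a rough sketch of what that conversion can look like (this is not the cuDLA-samples export script; it assumes per-tensor activation scales stored as QuantizeLinear initializers, and the file names are illustrative):

```python
# Rough sketch: pull per-tensor activation scales out of a QAT ONNX model's
# QuantizeLinear nodes and write them in the plain-text TensorRT
# calibration-cache format.
import struct
import onnx
from onnx import numpy_helper

model = onnx.load("yolov5_trimmed_qat.onnx")
inits = {t.name: numpy_helper.to_array(t) for t in model.graph.initializer}

scales = {}
for node in model.graph.node:
    if node.op_type == "QuantizeLinear" and node.input[1] in inits:
        scale = inits[node.input[1]]
        if scale.size == 1:  # keep per-tensor (activation) scales only
            scales[node.input[0]] = float(scale)

with open("qat2ptq.cache", "w") as f:
    # header is "TRT-<version>-<calibrator>"; match your TensorRT version
    f.write("TRT-8400-EntropyCalibration2\n")
    for name, s in scales.items():
        # each entry is "tensor name: hex of the float32 scale bits"
        f.write(f"{name}: {struct.pack('>f', s).hex()}\n")
```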

@mayulin0206
Author

For the quantized INT8 model, the inference results are correct under Orin DLA FP16, and the results are also correct under Orin GPU INT8, but the results are completely incorrect under Orin DLA INT8.

You should do a QAT-to-PTQ conversion: extract the scales from the QAT ONNX model and save them as a calibration table to run INT8 on DLA.

Yes, I did this, but the results are still completely wrong. The inference results are correct under Orin GPU INT8, but completely incorrect under Orin DLA INT8.

@mayulin0206
Author

mayulin0206 commented Apr 22, 2024

  1. Could you please try the latest DriveOS/JP release?
  2. We have a yolov5 DLA sample that may be helpful to you: https://github.com/NVIDIA-AI-IOT/cuDLA-samples
  3. Please provide a minimal repro if the latest release still fails.

@zerollzeng
Following your advice, I ran the yolov5 DLA sample (https://github.com/NVIDIA-AI-IOT/cuDLA-samples) on the Orin DLA, but encountered the issues shown below.

  • Under cuDLA hybrid mode, I ran the following command and hit the problem below:

make run

/usr/local/cuda//bin/nvcc -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include -gencode arch=compute_87,code=sm_87 -c -o build/decode_nms.o src/decode_nms.cu
g++ -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -O2 -c -o build/validate_coco.o src/validate_coco.cpp
g++ -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -O2 -c -o build/yolov5.o src/yolov5.cpp
g++ -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -O2 -c -o build/cudla_context_hybrid.o src/cudla_context_hybrid.cpp
g++ --std=c++14 -Wno-deprecated-declarations -Wall -O2 -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include -o ./build/cudla_yolov5_app build/decode_nms.o build/validate_coco.o build/yolov5.o build/cudla_context_hybrid.o -l cudla -L/usr/local/cuda//lib64 -l cuda -l cudart -l nvinfer -L /usr/lib/aarch64-linux-gnu/ -l opencv_objdetect -l opencv_highgui -l opencv_imgproc -l opencv_core -l opencv_imgcodecs -L ./src/matx_reformat/build/ -l matx_reformat -l jsoncpp -lnvscibuf -lnvscisync
././build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8
[hybrid mode] create cuDLA device SUCCESS
[hybrid mode] load cuDLA module from memory FAILED in src/cudla_context_hybrid.cpp:96, CUDLA ERR: 7
make: *** [Makefile:80: run] Error 1

  • Under buildDLAStandalone mode, I ran the following command and hit the problem below:

# Build INT8 and FP16 loadable from ONNX in this project
bash data/model/build_dla_standalone_loadable.sh

[04/22/2024-19:51:27] [E] Error[3]: [builderConfig.cpp::setFlag::65] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/builderConfig.cpp::setFlag::65, condition: builderFlag != BuilderFlag::kPREFER_PRECISION_CONSTRAINTS || !flags[BuilderFlag::kOBEY_PRECISION_CONSTRAINTS]. kPREFER_PRECISION_CONSTRAINTS cannot be set if kOBEY_PRECISION_CONSTRAINTS is set.
)
[04/22/2024-19:51:27] [E] Error[2]: [nvmRegionOptimizer.cpp::forceToUseNvmIO::175] Error Code 2: Internal Error (Assertion std::all_of(a->consumers.begin(), a->consumers.end(), [](Node* n) { return isDLA(n->backend); }) failed. )
[04/22/2024-19:51:27] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[04/22/2024-19:51:27] [E] Engine could not be created from network
[04/22/2024-19:51:27] [E] Building engine failed
[04/22/2024-19:51:27] [E] Failed to create engine from model or file.
[04/22/2024-19:51:27] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8401] # /usr/src/tensorrt/bin/trtexec --minShapes=images:1x3x672x672 --maxShapes=images:1x3x672x672 --optShapes=images:1x3x672x672 --shapes=images:1x3x672x672 --onnx=data/model/yolov5_trimmed_qat.onnx --useDLACore=0 --buildDLAStandalone --saveEngine=data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --inputIOFormats=int8:dla_hwc4 --outputIOFormats=fp16:chw16 --int8 --fp16 --calib=data/model/qat2ptq.cache --precisionConstraints=obey --layerPrecisions=/model.24/m.0/Conv:fp16,/model.24/m.1/Conv:fp16,/model.24/m.2/Conv:fp16
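
The first error is the builder-config check that kPREFER_PRECISION_CONSTRAINTS and kOBEY_PRECISION_CONSTRAINTS cannot both be set. For illustration, a minimal sketch of the relevant flags using the TensorRT Python API (not the sample's build script; the DLA settings are placeholders):

```python
# Illustrative only: the precision-constraint flags on the builder config are
# mutually exclusive, so a DLA INT8 build sets exactly one of them.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0

# pick one precision-constraint mode, never both
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
# config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)  # not together with OBEY
```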

@mayulin0206
Author

mayulin0206 commented Apr 22, 2024

@zerollzeng @lix19937
I also have a few other questions about DLA; for reference, a format-pinning sketch follows the list.
Under DLA INT8 mode:

  1. Is the default tensor format for computation kDLA_HWC4?
  2. Since the tensor format for computation on my GPU is kLINEAR, is a format conversion necessary under the DLA INT8 mode?
  3. If the default tensor format for computation under DLA INT8 mode is kDLA_HWC4, and some layers in the model fall back to the GPU, will there be an automatic format conversion for the layers that fall back to the GPU, i.e. will they automatically convert to kLINEAR?
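
For context on the format questions, here is a minimal sketch of pinning DLA-friendly I/O formats explicitly with the TensorRT Python API (illustrative only; it assumes a single input and output and mirrors the --inputIOFormats=int8:dla_hwc4 / --outputIOFormats=fp16:chw16 flags from the trtexec command above):

```python
# Minimal sketch: pin DLA-friendly I/O formats on the network so any reformat
# to or from the GPU-side linear layout is visible in the network definition
# rather than inserted implicitly.
import tensorrt as trt

def pin_dla_io_formats(network: trt.INetworkDefinition) -> None:
    inp = network.get_input(0)
    inp.dtype = trt.int8
    inp.allowed_formats = 1 << int(trt.TensorFormat.DLA_HWC4)

    out = network.get_output(0)
    out.dtype = trt.float16
    out.allowed_formats = 1 << int(trt.TensorFormat.CHW16)
```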

@e2crawfo

@mayulin0206 Did you ever solve this? I'm facing the exact same pattern of issues: things work on GPU INT8 and DLA FP16, but produce nonsense for DLA INT8. I'm also on an Orin with JetPack 5.1, running TensorRT 8.5.2.2 through Python. Upgrading JetPack isn't an option for me, though I could try an updated TensorRT.
