Fix refinedet trt8 #1329

Merged
merged 3 commits into jolibrain:master from fix_refinedet_trt8 on Aug 26, 2021
Conversation

fantes (Contributor) commented on Aug 17, 2021

This PR addresses #1324.

It also updates dependencies to TRT 8.x.

BIG FAT WARNING: TRT 8.0.1.x is subject to this bug: https://forums.developer.nvidia.com/t/build-engine-error-when-use-pointnet-like-structure-and-tensorrt-8-0-1-6/183569, which breaks SSD models!
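
For reference, a minimal CMake sketch of guarding against that point release (not part of this PR; it assumes TENSORRT_INC_DIR points at the TensorRT headers, and the variable names are hypothetical):

# Parse the TensorRT version macros from NvInferVersion.h and warn on 8.0.1.x,
# which is known to break SSD models (see the forum thread above).
file(READ "${TENSORRT_INC_DIR}/NvInferVersion.h" _nvinfer_version_header)
string(REGEX MATCH "NV_TENSORRT_MAJOR ([0-9]+)" _ "${_nvinfer_version_header}")
set(_trt_major ${CMAKE_MATCH_1})
string(REGEX MATCH "NV_TENSORRT_MINOR ([0-9]+)" _ "${_nvinfer_version_header}")
set(_trt_minor ${CMAKE_MATCH_1})
string(REGEX MATCH "NV_TENSORRT_PATCH ([0-9]+)" _ "${_nvinfer_version_header}")
set(_trt_patch ${CMAKE_MATCH_1})
if (_trt_major EQUAL 8 AND _trt_minor EQUAL 0 AND _trt_patch EQUAL 1)
  message(WARNING "TensorRT ${_trt_major}.${_trt_minor}.${_trt_patch} detected: SSD models are known to fail on 8.0.1.x")
endif()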

CMakeLists.txt Outdated
@@ -916,26 +916,22 @@ if (USE_TENSORRT)
set(TENSORRT_INC_DIR /usr/include/x86_64-linux-gnu)
endif()

-if (NOT EXISTS "${TRTTESTDIR}/libnvinfer.so.7")
+if (NOT EXISTS "${TRTTESTDIR}/libnvinfer.so.8")
message(FATAL_ERROR "Could not find TensorRT ${TENSORRT_LIB_DIR}/libnvinfer.so.7, please provide tensorRT location as TENSORRT_DIR or (TENSORRT_LIB_DIR _and_ TENSORRT_INC_DIR)")
Collaborator
libnvinfer.so.8 instead

Contributor Author

This was already done locally but not pushed, sorry.

beniz (Collaborator) commented on Aug 19, 2021

@fantes I confirm the Docker image builds with the latest NVIDIA TensorRT container image; the patch below could be integrated into this PR:

diff --git a/docker/gpu_tensorrt.Dockerfile b/docker/gpu_tensorrt.Dockerfile
index 59ce5317..03de8d62 100644
--- a/docker/gpu_tensorrt.Dockerfile
+++ b/docker/gpu_tensorrt.Dockerfile
@@ -1,5 +1,5 @@
 # syntax = docker/dockerfile:1.0-experimental
-FROM nvcr.io/nvidia/tensorrt:21.04-py3 AS build
+FROM nvcr.io/nvidia/tensorrt:21.07-py3 AS build
 
 ARG DEEPDETECT_RELEASE=OFF
 ARG DEEPDETECT_ARCH=gpu
@@ -110,7 +110,7 @@ RUN --mount=type=cache,target=/ccache/ mkdir build && cd build && ../build.sh
 RUN ./docker/get_libs.sh
 
 # Build final Docker image
-FROM nvcr.io/nvidia/tensorrt:21.04-py3 AS runtime
+FROM nvcr.io/nvidia/tensorrt:21.07-py3 AS runtime
 
 ARG DEEPDETECT_ARCH=gpu
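
For context, the base-image bump to the 21.07 NGC TensorRT container is presumably what brings in a TensorRT 8.x runtime, matching the new libnvinfer.so.8 check in CMakeLists.txt. Building this Dockerfile requires BuildKit (e.g. DOCKER_BUILDKIT=1), since it relies on the experimental Dockerfile syntax and the --mount=type=cache instruction.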

CMakeLists.txt Outdated
if (EXISTS "${TRTTESTDIR}/libnvinfer.so.8")
  set(TENSORRT_VERSION 21.08)
  message(STATUS "Found TensorRT libraries version 8.x")
elseif (EXISTS "${TRTTESTDIR}/libnvinfer.so.8")
Collaborator

This is the same test twice, no?

Contributor Author

right, fixed
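
For readers following along, the corrected branch presumably ends up along these lines (a sketch only; the values in the 7.x branch are assumptions, the actual change is in the PR commits):

# Sketch: check for the TensorRT 8 library first, then fall back to 7.x,
# instead of testing libnvinfer.so.8 twice.
if (EXISTS "${TRTTESTDIR}/libnvinfer.so.8")
  set(TENSORRT_VERSION 21.08)
  message(STATUS "Found TensorRT libraries version 8.x")
elseif (EXISTS "${TRTTESTDIR}/libnvinfer.so.7")
  set(TENSORRT_VERSION 21.04)
  message(STATUS "Found TensorRT libraries version 7.x")
else()
  message(FATAL_ERROR "Could not find TensorRT libraries in ${TRTTESTDIR}")
endif()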

beniz (Collaborator) commented on Aug 23, 2021

At the moment, several models appear to be broken:

  • Squeezenet SSD (and probably other SSDs as well), from the unit tests:
[2021-08-23 11:48:46.032] [imgserv] [error] [resources.cpp::~ScopedCudaEvent::438] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[2021-08-23 11:48:46.032] [imgserv] [error] [resources.cpp::~ScopedCudaEvent::438] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[2021-08-23 11:48:46.032] [imgserv] [error] [resources.cpp::~ScopedCudaEvent::438] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[2021-08-23 11:48:46.032] [imgserv] [error] /data1/beniz/code/deepdetect/build_trt/tensorrt-oss/src/tensorrt-oss/plugin/priorBoxPlugin/priorBoxPlugin.cpp (253) - Cuda Error in destroy: 700 (an illegal memory access was encountered)
terminate called after throwing an instance of 'nvinfer1::plugin::CudaError'
  what():  std::exception
Aborted (core dumped)
  • refinedet in fp16:
[2021-08-23 11:51:40.440] [imgserv] [info] --------------- Timing Runner: detection_out (PluginV2)
[2021-08-23 11:51:40.453] [imgserv] [info] Deleting timing cache: 489 entries, 505 hits
[2021-08-23 11:51:40.466] [imgserv] [info] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 2745 (MiB)
[2021-08-23 11:51:40.467] [imgserv] [error] 2: [pluginV2Runner.cpp::execute::267] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)
[2021-08-23 11:51:40.467] [imgserv] [error] 2: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
  • model saving? The .bs model cannot be found after TensorRT has compiled the model.

@fantes force-pushed the fix_refinedet_trt8 branch 3 times, most recently from 469b066 to e166442 on August 23, 2021 at 14:17
The mergify bot merged commit bdff2ae into jolibrain:master on Aug 26, 2021.