[TensorRT EP] How can I disable generating cache when using trt execution provider #22822

noahzn · 2024-11-13T15:23:18Z

I have already generated some TensorRT caches while running inference on my ONNX model with the TensorRT Execution Provider. Then, for online testing of my model, I set so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL, but new caches still seem to be generated. I only want to reuse the old caches without generating new ones. How can I do that? Thanks in advance!

import os

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
trt_engine_cache_path = "weights/.trtcache_engines"
trt_timing_cache_path = "weights/.trtcache_timings"

# Create the 'weights' directory if it doesn't exist
os.makedirs(os.path.dirname(trt_engine_cache_path), exist_ok=True)

if conf.trt:
    # Put the TensorRT EP first so it is preferred over the CUDA and CPU fallbacks
    providers = [
        (
            "TensorrtExecutionProvider",
            {
                "trt_max_workspace_size": 2 * 1024 * 1024 * 1024,
                "trt_fp16_enable": True,
                "trt_engine_cache_enable": True,
                "trt_timing_cache_enable": True,
                "trt_engine_cache_path": trt_engine_cache_path,
                "trt_timing_cache_path": trt_timing_cache_path,
            },
        )
    ] + providers

yf711 · 2024-11-13T23:50:13Z

Hi @noahzn, your old engine/profile might not be reused by TRT EP if the current inference parameters, cache name, environment variables, or hardware environment change.

Here's more info about engine reusability: https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#trt_engine_cache_enable

I wonder: if you replace your old engine/profile with the newly generated ones, is the new engine going to be reused, or does an even newer engine need to be generated?

noahzn · 2024-11-14T04:11:44Z

@yf711 Thanks for your reply!
My networks perform keypoint detection and matching. I think the issue is that we cannot guarantee extracting the same number of keypoints from both images. I have warmed up the networks using about 10k pairs of images, but new engines are still generated for some image pairs. I think the previously generated engines are still used, because inference is indeed accelerated.
What can I do in this case? Will trt_profile_min_shapes and trt_profile_max_shapes help? I tried setting them for the input dimensions, but that is not enough.
Following input(s) has no associated shape profiles provided: /Reshape_3_output_0,/norm/Div_output_0,/Resize_output_0,/Unsqueeze_18_output_0,/NonZero_output_0. Maybe some intermediate layers also need to be given dimension ranges?

chilo-ms · 2024-11-21T18:42:50Z

@noahzn

It's not related to the dimension ranges of the intermediate layers' inputs.

The engine cache name,
e.g. TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_17097719564268968195_0_0_sm80.engine,
contains a hash value, which is the return value of a hash function that takes the following metadata as input:

- model/graph
- model's file name (the file name is used instead of the entire path to avoid cache regeneration if the path changes)
- input names of the graph
- output name of each node
- TRT version (determined at build time)
- ORT version (determined at build time)
- CUDA version (determined at build time)

Also, the cache name contains compute capability, e.g. sm80.

Did any of the metadata above change between the run that generated the cache and the run that is supposed to use the old cache?
If so, TRT EP won't use the old cache and will generate a new one instead.
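
To make the reasoning above concrete, here is a minimal sketch of a cache key derived from the listed metadata. It is an illustration only: it uses SHA-256 from Python's standard library rather than ONNX Runtime's actual hash function, and the function name, input names, and version strings are all hypothetical. The point it shows is that the key depends on the model's file name, not its directory, and that tensor shapes are not part of it.

```python
import hashlib
import os


def illustrative_cache_key(model_path, graph_name, input_names,
                           node_output_names, versions):
    """Hypothetical stand-in for the cache-name hash; NOT ONNX Runtime's real function."""
    parts = [
        graph_name,
        os.path.basename(model_path),  # file name only, so moving the model keeps the key
        *input_names,
        *node_output_names,
        *versions,                     # TRT, ORT and CUDA versions
    ]
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent fields cannot run together
    return digest.hexdigest()[:16]


common = dict(
    graph_name="main_graph",
    input_names=["image0", "image1"],        # hypothetical input names
    node_output_names=["/Resize_output_0"],
    versions=["trt-x", "ort-y", "cuda-z"],   # placeholder version strings
)
key_a = illustrative_cache_key("/data/run1/model.onnx", **common)
key_b = illustrative_cache_key("/data/run2/model.onnx", **common)
key_c = illustrative_cache_key("/data/run1/model_v2.onnx", **common)
print(key_a == key_b)  # same file name, different directory
print(key_a == key_c)  # different file name
```

Under these assumptions, moving the model to another directory leaves the key unchanged, while renaming the file produces a different key and therefore a different cache file name.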

noahzn · 2024-11-22T06:09:50Z

@chilo-ms Thanks for your reply. I don't think the above metadata changes. The model's file name never changes, and the input names of the graph are fixed in the ONNX model. Concerning the graph, since the numbers of keypoints are different, it may change. I have now tried setting min. and max. shapes for some intermediate layers, and it seems that new caches are generated less frequently than before.

For example, these are the cached files in the folder.

-rw-r--r-- 1 root root  165668 Nov 22 07:33 TensorrtExecutionProvider_TRTKernel_graph_main_graph_5143105182468268169_0_0_fp16_sm87.engine
-rw-r--r-- 1 root root 1894027 Nov 22 11:07 TensorrtExecutionProvider_TRTKernel_graph_main_graph_5143105182468268169_1_1_fp16_sm87.engine
-rw-r--r-- 1 root root      38 Nov 22 11:07 TensorrtExecutionProvider_TRTKernel_graph_main_graph_5143105182468268169_1_1_fp16_sm87.profile
-rw-r--r-- 1 root root  392336 Nov 22 11:07 TensorrtExecutionProvider_TRTKernel_graph_main_graph_5143105182468268169_2_2_fp16_sm87.engine
-rw-r--r-- 1 root root     129 Nov 22 11:07 TensorrtExecutionProvider_TRTKernel_graph_main_graph_5143105182468268169_2_2_fp16_sm87.profile
-rw-r--r-- 1 root root  387743 Nov 22 07:40 TensorrtExecutionProvider_TRTKernel_graph_main_graph_5143105182468268169_3_3_fp16_sm87.engine
-rw-r--r-- 1 root root     139 Nov 22 07:40 TensorrtExecutionProvider_TRTKernel_graph_main_graph_5143105182468268169_3_3_fp16_sm87.profile
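
One way to check whether the hash changed between runs is to parse the cache file names and group them by hash. The sketch below assumes a naming pattern inferred only from the file names quoted in this thread (prefix, graph name, hash, two indices, optional flags such as fp16, compute capability, extension); it is not a documented format, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the file names in this thread (an assumption,
# not a documented format).
CACHE_NAME = re.compile(
    r"TensorrtExecutionProvider_TRTKernel_graph_(?P<graph>.+)_(?P<hash>\d+)"
    r"_(?P<first>\d+)_(?P<second>\d+)(?:_(?P<flags>\w+?))?_sm(?P<sm>\d+)"
    r"\.(?P<ext>engine|profile)"
)


def parse_cache_name(file_name):
    """Return the name's fields as a dict, or None if it does not match."""
    match = CACHE_NAME.fullmatch(file_name)
    return match.groupdict() if match else None


def group_by_hash(file_names):
    """Map each hash value to the cache files that carry it."""
    groups = defaultdict(list)
    for name in file_names:
        info = parse_cache_name(name)
        if info is not None:
            groups[info["hash"]].append(name)
    return dict(groups)


prefix = "TensorrtExecutionProvider_TRTKernel_graph_main_graph_5143105182468268169"
names = [
    prefix + "_0_0_fp16_sm87.engine",
    prefix + "_1_1_fp16_sm87.engine",
    prefix + "_1_1_fp16_sm87.profile",
    prefix + "_2_2_fp16_sm87.engine",
]
groups = group_by_hash(names)
for hash_value, files in groups.items():
    print(hash_value, len(files))
```

All files in the listing above carry the same hash value and differ only in the trailing indices, so grouping them this way yields a single group.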

chilo-ms · 2024-11-22T22:54:47Z

> Concerning the graph, since the numbers of keypoints are different

I assume the shape of the input/output tensors reflects the number of keypoints, right?
But shape is not part of the metadata that gets hashed.

I suspect it's the model's file name. Could you confirm that you use the exact same path for the first run and the test run?
Could you also paste the new engine file name here?
If you could share the model as well as the repro code, we can try to reproduce it on our side.
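
To find the new engine file names, one option is to snapshot the cache directory before and after a test run and report what was created or rewritten. This is a minimal sketch using only the standard library; the helper names are hypothetical, and a temporary directory stands in for the real cache path.

```python
import os
import tempfile


def snapshot(cache_dir):
    """Map each file name in the cache directory to its modification time."""
    return {
        entry.name: entry.stat().st_mtime_ns
        for entry in os.scandir(cache_dir)
        if entry.is_file()
    }


def new_or_rewritten(before, after):
    """Names that appeared, or whose modification time changed, between snapshots."""
    return sorted(name for name, mtime in after.items() if before.get(name) != mtime)


# Self-contained demo: a temporary directory stands in for the cache path.
with tempfile.TemporaryDirectory() as cache_dir:
    open(os.path.join(cache_dir, "old_0_0_sm87.engine"), "wb").close()
    before = snapshot(cache_dir)
    # In real use, creating the session and calling session.run(...) goes here.
    open(os.path.join(cache_dir, "new_1_1_sm87.engine"), "wb").close()
    after = snapshot(cache_dir)
    created = new_or_rewritten(before, after)
    print(created)
```

Pointing `snapshot` at the directory given in trt_engine_cache_path would list exactly the engine and profile files touched by a given run.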

Setting trt_profile_min_shapes, trt_profile_max_shapes and trt_profile_opt_shapes to cover the minimum and maximum sizes of the input image doesn't help with this issue, but it can prevent the TRT engine from being rebuilt across multiple inference runs with different input images.

> Following input(s) has no associated shape profiles provided: /Reshape_3_output_0,/norm/Div_output_0,/Resize_output_0,/Unsqueeze_18_output_0,/NonZero_output_0. Maybe some intermediate layers also need to be given dimension ranges?

Yes, in your case the model is partitioned into multiple subgraphs that are run by TRT EP, plus several other nodes that are run by CUDA EP or CPU.
The shape could be one of the inputs of a subgraph run by TRT EP, and it requires shape info.
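
As a rough sketch of how the three profile options could be built, the helper below renders shape ranges in the `name:d0xd1x...,name2:...` string format that the TensorRT EP documentation describes for trt_profile_min_shapes, trt_profile_opt_shapes and trt_profile_max_shapes. The helper names, the input name `image`, and all dimension values are hypothetical; the real ranges depend on the model and on which tensors end up as TRT subgraph inputs.

```python
def format_profile(shapes):
    """Render {tensor_name: (d0, d1, ...)} as 'name:d0xd1x...,name2:...'."""
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items()
    )


def build_trt_profile_options(min_shapes, opt_shapes, max_shapes):
    """Validate the ranges and return the three provider options as strings."""
    if not (min_shapes.keys() == opt_shapes.keys() == max_shapes.keys()):
        raise ValueError("min, opt and max must describe the same tensors")
    for name in min_shapes:
        lo, opt, hi = min_shapes[name], opt_shapes[name], max_shapes[name]
        if not (len(lo) == len(opt) == len(hi)):
            raise ValueError(f"rank mismatch for {name}")
        if any(not (a <= b <= c) for a, b, c in zip(lo, opt, hi)):
            raise ValueError(f"need min <= opt <= max for {name}")
    return {
        "trt_profile_min_shapes": format_profile(min_shapes),
        "trt_profile_opt_shapes": format_profile(opt_shapes),
        "trt_profile_max_shapes": format_profile(max_shapes),
    }


# Hypothetical ranges: one graph input plus one intermediate tensor that is
# an input of a TRT subgraph (name taken from the warning quoted above).
options = build_trt_profile_options(
    min_shapes={"image": (1, 1, 240, 320), "/Resize_output_0": (1, 1, 120, 160)},
    opt_shapes={"image": (1, 1, 480, 640), "/Resize_output_0": (1, 1, 240, 320)},
    max_shapes={"image": (1, 1, 960, 1280), "/Resize_output_0": (1, 1, 480, 640)},
)
print(options["trt_profile_min_shapes"])
```

The resulting dictionary can be merged into the TensorrtExecutionProvider options alongside the cache settings. All three options need to be provided together for the explicit profile to take effect.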

github-actions bot added the ep:TensorRT issues related to TensorRT execution provider label Nov 13, 2024

noahzn changed the title ~~how can I disable generating cache when using trt execution provider~~ [TensorRT EP] How can I disable generating cache when using trt execution provider Nov 13, 2024

noahzn mentioned this issue Nov 15, 2024

long inference time of using TensorrtExecutionProvider fabio-sim/LightGlue-ONNX#97

Open
