I used polygraphy to convert an ONNX model to a TensorRT engine. My first question: I set tactic_sources = ['CUBLAS_LT', 'CUDNN'] (my model is FP16, so I assumed cuBLAS is unnecessary, and I want to reduce GPU memory consumption), yet the log shows that the cuBLAS tactic source is still used. Why?
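For context, the conversion step described above can be sketched with polygraphy's Python API roughly as follows. This is a minimal sketch, not my actual script: the file names are placeholders, and the API shown is as of polygraphy 0.3x / TensorRT 8.x.

```python
import tensorrt as trt
from polygraphy.backend.trt import (
    CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, SaveEngine
)

# Restrict tactic sources to cuBLASLt and cuDNN (cuBLAS excluded),
# enable FP16, and cap the builder workspace at ~1.7 GB.
config = CreateConfig(
    fp16=True,
    max_workspace_size=int(1.7 * 1024**3),
    tactic_sources=[trt.TacticSource.CUBLAS_LT, trt.TacticSource.CUDNN],
)

build_engine = NetworkFromOnnxPath("model.onnx")  # placeholder path
build_engine = EngineFromNetwork(build_engine, config=config)
build_engine = SaveEngine(build_engine, path="model_bs4_fp16.trt")
engine = build_engine()  # triggers the actual build
```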
Then I used trtexec to run the converted TRT engine: trtexec --loadEngine=weights/model_bs4_fp16_1.7GB_CUBLAS_LT_CUDNN.trt --shapes=audio_seqs__0:4x1x80x16,img_seqs__1:4x6x256x256. The log is:
[10/21/2024-21:45:10] [I] === Model Options ===
[10/21/2024-21:45:10] [I] Format: *
[10/21/2024-21:45:10] [I] Model:
[10/21/2024-21:45:10] [I] Output:
[10/21/2024-21:45:10] [I] === Build Options ===
[10/21/2024-21:45:10] [I] Max batch: explicit
[10/21/2024-21:45:10] [I] Workspace: 16 MiB
[10/21/2024-21:45:10] [I] minTiming: 1
[10/21/2024-21:45:10] [I] avgTiming: 8
[10/21/2024-21:45:10] [I] Precision: FP32
[10/21/2024-21:45:10] [I] Calibration:
[10/21/2024-21:45:10] [I] Refit: Disabled
[10/21/2024-21:45:10] [I] Sparsity: Disabled
[10/21/2024-21:45:10] [I] Safe mode: Disabled
[10/21/2024-21:45:10] [I] Restricted mode: Disabled
[10/21/2024-21:45:10] [I] Save engine:
[10/21/2024-21:45:10] [I] Load engine: weights/wav2lip/wav2lip_bs4_fp16_1.5GB_CUBLAS_LT_CUDNN.trt
[10/21/2024-21:45:10] [I] NVTX verbosity: 0
[10/21/2024-21:45:10] [I] Tactic sources: Using default tactic sources
[10/21/2024-21:45:10] [I] timingCacheMode: local
[10/21/2024-21:45:10] [I] timingCacheFile:
[10/21/2024-21:45:10] [I] Input(s)s format: fp32:CHW
[10/21/2024-21:45:10] [I] Output(s)s format: fp32:CHW
[10/21/2024-21:45:10] [I] Input build shape: audio_seqs__0=4x1x80x16+4x1x80x16+4x1x80x16
[10/21/2024-21:45:10] [I] Input build shape: img_seqs__1=4x6x256x256+4x6x256x256+4x6x256x256
[10/21/2024-21:45:10] [I] Input calibration shapes: model
[10/21/2024-21:45:10] [I] === System Options ===
[10/21/2024-21:45:10] [I] Device: 0
[10/21/2024-21:45:10] [I] DLACore:
[10/21/2024-21:45:10] [I] Plugins:
[10/21/2024-21:45:10] [I] === Inference Options ===
[10/21/2024-21:45:10] [I] Batch: Explicit
[10/21/2024-21:45:10] [I] Input inference shape: img_seqs__1=4x6x256x256
[10/21/2024-21:45:10] [I] Input inference shape: audio_seqs__0=4x1x80x16
[10/21/2024-21:45:10] [I] Iterations: 10
[10/21/2024-21:45:10] [I] Duration: 3s (+ 200ms warm up)
[10/21/2024-21:45:10] [I] Sleep time: 0ms
[10/21/2024-21:45:10] [I] Streams: 1
[10/21/2024-21:45:10] [I] ExposeDMA: Disabled
[10/21/2024-21:45:10] [I] Data transfers: Enabled
[10/21/2024-21:45:10] [I] Spin-wait: Disabled
[10/21/2024-21:45:10] [I] Multithreading: Disabled
[10/21/2024-21:45:10] [I] CUDA Graph: Disabled
[10/21/2024-21:45:10] [I] Separate profiling: Disabled
[10/21/2024-21:45:10] [I] Time Deserialize: Disabled
[10/21/2024-21:45:10] [I] Time Refit: Disabled
[10/21/2024-21:45:10] [I] Skip inference: Disabled
[10/21/2024-21:45:10] [I] Inputs:
[10/21/2024-21:45:10] [I] === Reporting Options ===
[10/21/2024-21:45:10] [I] Verbose: Disabled
[10/21/2024-21:45:10] [I] Averages: 10 inferences
[10/21/2024-21:45:10] [I] Percentile: 99
[10/21/2024-21:45:10] [I] Dump refittable layers:Disabled
[10/21/2024-21:45:10] [I] Dump output: Disabled
[10/21/2024-21:45:10] [I] Profile: Disabled
[10/21/2024-21:45:10] [I] Export timing to JSON file:
[10/21/2024-21:45:10] [I] Export output to JSON file:
[10/21/2024-21:45:10] [I] Export profile to JSON file:
[10/21/2024-21:45:10] [I]
[10/21/2024-21:45:11] [I] === Device Information ===
[10/21/2024-21:45:11] [I] Selected Device: NVIDIA A30
[10/21/2024-21:45:11] [I] Compute Capability: 8.0
[10/21/2024-21:45:11] [I] SMs: 56
[10/21/2024-21:45:11] [I] Compute Clock Rate: 1.44 GHz
[10/21/2024-21:45:11] [I] Device Global Memory: 24060 MiB
[10/21/2024-21:45:11] [I] Shared Memory per SM: 164 KiB
[10/21/2024-21:45:11] [I] Memory Bus Width: 3072 bits (ECC enabled)
[10/21/2024-21:45:11] [I] Memory Clock Rate: 1.215 GHz
[10/21/2024-21:45:11] [I]
[10/21/2024-21:45:11] [I] TensorRT version: 8003
[10/21/2024-21:45:16] [I] [TRT] [MemUsageChange] Init CUDA: CPU +502, GPU +0, now: CPU 633, GPU 521 (MiB)
[10/21/2024-21:45:16] [I] [TRT] Loaded engine size: 123 MB
[10/21/2024-21:45:16] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 633 MiB, GPU 521 MiB
[10/21/2024-21:45:24] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +793, GPU +342, now: CPU 1426, GPU 987 (MiB)
[10/21/2024-21:45:30] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +198, GPU +342, now: CPU 1624, GPU 1329 (MiB)
[10/21/2024-21:45:30] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1624, GPU 1311 (MiB)
[10/21/2024-21:45:30] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1624 MiB, GPU 1311 MiB
[10/21/2024-21:45:30] [I] Engine loaded in 19.0488 sec.
[10/21/2024-21:45:30] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1500 MiB, GPU 1311 MiB
[10/21/2024-21:45:30] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +10, now: CPU 1501, GPU 1321 (MiB)
[10/21/2024-21:45:30] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1501, GPU 1329 (MiB)
[10/21/2024-21:45:30] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1501 MiB, GPU 1605 MiB
[10/21/2024-21:45:30] [I] Created input binding for audio_seqs__0 with dimensions 4x1x80x16
[10/21/2024-21:45:30] [I] Created input binding for img_seqs__1 with dimensions 4x6x256x256
[10/21/2024-21:45:30] [I] Created output binding for value__0 with dimensions 4x3x256x256
[10/21/2024-21:45:30] [I] Starting inference
[10/21/2024-21:45:33] [I] Warmup completed 26 queries over 200 ms
[10/21/2024-21:45:33] [I] Timing trace has 607 queries over 3.01887 s
[10/21/2024-21:45:33] [I]
[10/21/2024-21:45:33] [I] === Trace details ===
[10/21/2024-21:45:33] [I] Trace averages of 10 runs:
[10/21/2024-21:45:33] [I] Average on 10 runs - GPU latency: 6.70789 ms - Host latency: 7.19922 ms (end to end 13.276 ms, enqueue 1.39257 ms)
...
[10/21/2024-21:45:33] [I] Average on 10 runs - GPU latency: 4.63984 ms - Host latency: 5.05623 ms (end to end 8.15396 ms, enqueue 1.21301 ms)
[10/21/2024-21:45:33] [I]
[10/21/2024-21:45:33] [I] === Performance summary ===
[10/21/2024-21:45:33] [I] Throughput: 201.069 qps
[10/21/2024-21:45:33] [I] Latency: min = 4.92627 ms, max = 7.21841 ms, mean = 5.10661 ms, median = 5.06763 ms, percentile(99%) = 7.19328 ms
[10/21/2024-21:45:33] [I] End-to-End Host Latency: min = 5.026 ms, max = 13.315 ms, mean = 9.14545 ms, median = 9.25452 ms, percentile(99%) = 13.2415 ms
[10/21/2024-21:45:33] [I] Enqueue Time: min = 0.482788 ms, max = 2.88159 ms, mean = 1.24856 ms, median = 1.29077 ms, percentile(99%) = 2.15625 ms
[10/21/2024-21:45:33] [I] H2D Latency: min = 0.258789 ms, max = 0.338821 ms, mean = 0.267017 ms, median = 0.264648 ms, percentile(99%) = 0.321045 ms
[10/21/2024-21:45:33] [I] GPU Compute Time: min = 4.53687 ms, max = 6.72342 ms, mean = 4.71299 ms, median = 4.67749 ms, percentile(99%) = 6.70314 ms
[10/21/2024-21:45:33] [I] D2H Latency: min = 0.124268 ms, max = 0.170563 ms, mean = 0.126614 ms, median = 0.125732 ms, percentile(99%) = 0.168869 ms
[10/21/2024-21:45:33] [I] Total Host Walltime: 3.01887 s
[10/21/2024-21:45:33] [I] Total GPU Compute Time: 2.86078 s
[10/21/2024-21:45:33] [I] Explanations of the performance metrics are printed in the verbose logs.
[10/21/2024-21:45:33] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8003] # trtexec --loadEngine=weights/model_bs4_fp16_1.5GB_CUBLAS_LT_CUDNN.trt --shapes=audio_seqs__0:4x1x80x16,img_seqs__1:4x6x256x256
[10/21/2024-21:45:33] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1501, GPU 1525 (MiB)
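As a quick sanity check, the summary figures in the log above are internally consistent; this small script (numbers copied from the log) reproduces the reported throughput and total GPU compute time:

```python
# Figures copied from the trtexec performance summary above.
queries = 607          # "Timing trace has 607 queries"
walltime_s = 3.01887   # "Total Host Walltime"
mean_gpu_ms = 4.71299  # mean "GPU Compute Time"

throughput_qps = queries / walltime_s         # matches "Throughput: 201.069 qps"
total_gpu_s = queries * mean_gpu_ms / 1000.0  # matches "Total GPU Compute Time: 2.86078 s"

print(f"{throughput_qps:.3f} qps, {total_gpu_s:.5f} s GPU compute")
```

Note that each query here is a batch of 4, so the effective sample throughput is about 4x the qps figure.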
Note that I set max_workspace_size = 1.7 GB and fp16 = True in the polygraphy conversion Python script, but the trtexec log above shows "Workspace: 16 MiB" and "Precision: FP32". Why are they inconsistent?
Answer: when an engine is loaded for inference, the Workspace value printed is trtexec's default and the Precision value is whatever was given on the command line; both are Build Options, which do not apply to an already-built engine. So the behavior of both programs (polygraphy and trtexec) is normal.
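To make the distinction concrete: these options only take effect when trtexec builds the engine itself rather than loading a prebuilt one. A build invocation would pass them explicitly, e.g. (a sketch with placeholder paths; flags per trtexec in TensorRT 8.x, workspace in MiB):

```shell
# Build from ONNX with explicit workspace, FP16, and restricted tactic sources,
# so the printed Build Options match what is actually baked into the engine.
trtexec --onnx=model.onnx \
        --fp16 \
        --workspace=1740 \
        --tacticSources=+CUBLAS_LT,+CUDNN,-CUBLAS \
        --saveEngine=model_fp16.trt
```

When you later run with --loadEngine, these build-time choices come from the serialized engine, not from the command line.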