
BertQA sample throws segmentation fault (TensorRT 10.3) when running GPU Jetson Orin Nano #4220

Open
krishnarajk opened this issue Oct 23, 2024 · 8 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@krishnarajk

krishnarajk commented Oct 23, 2024

Description

I tried running the BERT QA sample on a Jetson Orin Nano with JetPack 6.1.
I used BERT Base, because BERT Large gets killed while building the engine (possibly due to a memory issue).

[10/23/2024-13:27:53] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +67, now: CPU 2160, GPU 6001 (MiB)
[10/23/2024-13:27:53] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[10/23/2024-13:28:39] [TRT] [I] Detected 3 inputs and 1 output network tensors.
[10/23/2024-13:28:42] [TRT] [I] Total Host Persistent Memory: 316288
[10/23/2024-13:28:42] [TRT] [I] Total Device Persistent Memory: 110592
[10/23/2024-13:28:42] [TRT] [I] Total Scratch Memory: 0
[10/23/2024-13:28:42] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 164 steps to complete.
[10/23/2024-13:28:43] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 3.28999ms to assign 5 blocks to 164 nodes requiring 1378304 bytes.
[10/23/2024-13:28:43] [TRT] [I] Total Activation Memory: 1378304
[10/23/2024-13:28:43] [TRT] [I] Total Weights Memory: 170059792
[10/23/2024-13:28:43] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU -1, now: CPU 2372, GPU 6707 (MiB)
[10/23/2024-13:28:43] [TRT] [I] Engine generation completed in 51.1302 seconds.
[10/23/2024-13:28:43] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 4 MiB, GPU 384 MiB
[10/23/2024-13:28:43] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3087 MiB
[10/23/2024-13:28:43] [TRT] [I] build engine in 52.969 Sec
[10/23/2024-13:28:44] [TRT] [I] Saving Engine to engines/bert_base_128.engine
[10/23/2024-13:28:44] [TRT] [I] Done.

Then I used inference.py with the same sample given in the examples:
python3 inference.py -e engines/bert_base_128.engine -p "TensorRT is a high performance deep learning inference platform that delivers low latency and high throughput for apps such as recommenders, speech and image/video on NVIDIA GPUs. It includes parsers to import models, and plugins to support novel ops and layers before applying optimizations for inference. Today NVIDIA is open-sourcing parsers and plugins in TensorRT so that the deep learning community can customize and extend these components to take advantage of powerful TensorRT optimizations for your apps." -q "What is TensorRT?" -v models/fine-tuned/bert_tf_ckpt_base_qa_squad2_amp_128_v19.03.1/vocab.txt
It throws a segmentation fault:
[10/23/2024-13:30:07] [TRT] [I] Loaded engine size: 208 MiB
[10/23/2024-13:30:08] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +8, GPU +70, now: CPU 317, GPU 4590 (MiB)
[10/23/2024-13:30:08] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +64, now: CPU 109, GPU 4379 (MiB)
[10/23/2024-13:30:08] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 0, GPU 163 (MiB)

Passage: TensorRT is a high performance deep learning inference platform that delivers low latency and high throughput for apps such as recommenders, speech and image/video on NVIDIA GPUs. It includes parsers to import models, and plugins to support novel ops and layers before applying optimizations for inference. Today NVIDIA is open-sourcing parsers and plugins in TensorRT so that the deep learning community can customize and extend these components to take advantage of powerful TensorRT optimizations for your apps.

Question: What is TensorRT?
Segmentation fault (core dumped)
** https://github.com/NVIDIA/TensorRT/tree/release/10.3/demo/BERT#model-overview
** I don't use the OSS container; I installed these packages on the device:
[Image attachment]

Please help me out here.

Environment

TensorRT Version: 10.3

NVIDIA GPU: Ampere (Jetson Orin Nano)

NVIDIA Driver Version: JetPack 6.1

CUDA Version: 12.6

CUDNN Version:

Operating System: Ubuntu 22.04

Python Version (if applicable): 3.10

@krishnarajk krishnarajk changed the title BertQA sample throws segmentation fault on TensorRT 10.3 when running GPU Jetson Orin Nano BertQA sample throws segmentation fault (TensorRT 10.3) when running GPU Jetson Orin Nano Oct 23, 2024
@lix19937

You can use tegrastats to watch the RAM usage while trtexec loads the engine and runs inference.
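
For example, a small Python sketch (purely illustrative, not part of the BERT demo) that launches tegrastats and prints just the RAM figure while the inference script runs in another shell; the regex follows the tegrastats lines pasted later in this thread, and the interval value is arbitrary.

import re
import subprocess

# Start tegrastats (ships with JetPack); --interval is in milliseconds.
proc = subprocess.Popen(
    ["tegrastats", "--interval", "1000"],
    stdout=subprocess.PIPE, text=True,
)
try:
    for line in proc.stdout:
        # Lines look like: "RAM 4518/7620MB (lfb 1x1MB) CPU [...] ..."
        m = re.search(r"RAM (\d+)/(\d+)MB", line)
        if m:
            used, total = map(int, m.groups())
            print(f"RAM {used}/{total} MB ({100 * used / total:.0f}% used)")
finally:
    proc.terminate()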

@krishnarajk

krishnarajk commented Oct 24, 2024

This is the RAM usage when I run the inference.
10-24-2024 14:19:16 RAM 4518/7620MB (lfb 1x1MB) CPU [8%@729,21%@729,10%@729,8%@729,100%@1510,10%@1510] GR3D_FREQ 76% [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] VDD_IN 5291mW/4526mW VDD_CPU_GPU_CV 1500mW/997mW VDD_SOC 1500mW/1358mW

10-24-2024 14:19:17 RAM 4516/7620MB (lfb 1x4MB) CPU [6%@1510,6%@1510,5%@1510,6%@1510,100%@1510,8%@1510] GR3D_FREQ 99% [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] VDD_IN 4864mW/4538mW VDD_CPU_GPU_CV 1263mW/1006mW VDD_SOC 1461mW/1361mW

10-24-2024 14:19:18 RAM 4517/7620MB (lfb 1x4MB) CPU [11%@729,12%@729,17%@729,9%@729,99%@1510,3%@1510] GR3D_FREQ 0% [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] VDD_IN 4746mW/4545mW VDD_CPU_GPU_CV 1145mW/1011mW VDD_SOC 1421mW/1363mW
Could a shortage of memory be causing the segmentation fault in this case?

@krishnarajk

I also tried increasing the swap memory by 4 GB.

@lix19937

Can you try using trtexec to load the engine and run inference?

@krishnarajk

krishnarajk commented Oct 27, 2024

How can I do that? I am new to TensorRT and was trying to run the sample application. I have TensorRT installed on my container, but it shows:

trtexec --help
bash: trtexec: command not found

dpkg -l|grep -i tensorrt
ii  libnvinfer-dev                       10.3.0.26-1+cuda12.5                    arm64        TensorRT development libraries
ii  libnvinfer-dispatch-dev              10.3.0.26-1+cuda12.5                    arm64        TensorRT development dispatch runtime libraries
ii  libnvinfer-dispatch10                10.3.0.26-1+cuda12.5                    arm64        TensorRT dispatch runtime library
ii  libnvinfer-headers-dev               10.3.0.26-1+cuda12.5                    arm64        TensorRT development headers
ii  libnvinfer-headers-plugin-dev        10.3.0.26-1+cuda12.5                    arm64        TensorRT plugin headers
ii  libnvinfer-lean-dev                  10.3.0.26-1+cuda12.5                    arm64        TensorRT lean runtime libraries
ii  libnvinfer-lean10                    10.3.0.26-1+cuda12.5                    arm64        TensorRT lean runtime library
ii  libnvinfer-plugin-dev                10.3.0.26-1+cuda12.5                    arm64        TensorRT plugin libraries
ii  libnvinfer-plugin10                  10.3.0.26-1+cuda12.5                    arm64        TensorRT plugin libraries
ii  libnvinfer-vc-plugin-dev             10.3.0.26-1+cuda12.5                    arm64        TensorRT vc-plugin library
ii  libnvinfer-vc-plugin10               10.3.0.26-1+cuda12.5                    arm64        TensorRT vc-plugin library
ii  libnvinfer10                         10.3.0.26-1+cuda12.5                    arm64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                 10.3.0.26-1+cuda12.5                    arm64        TensorRT ONNX libraries
ii  libnvonnxparsers10                   10.3.0.26-1+cuda12.5                    arm64        TensorRT ONNX libraries
ii  python3-libnvinfer                   10.3.0.26-1+cuda12.5                    arm64        Python 3 bindings for TensorRT standard runtime

@lix19937

trtexec --onnx=your_onnx_file --verbose @krishnarajk

@krishnarajk

krishnarajk commented Oct 28, 2024

This is for loading the ONNX model, right? How do I run inference with an engine using trtexec?

I tried
./trtexec --loadEngine=/TensorRT/demo/BERT/engines/bert_base_128.engine --verbose

and got this log:

[10/28/2024-21:14:18] [I] === Performance summary ===
[10/28/2024-21:14:18] [I] Throughput: 171.08 qps
[10/28/2024-21:14:18] [I] Latency: min = 5.69727 ms, max = 10.7233 ms, mean = 5.87895 ms, median = 5.71912 ms, percentile(90%) = 5.73627 ms, percentile(95%) = 7.51221 ms, percentile(99%) = 8.71997 ms
[10/28/2024-21:14:18] [I] Enqueue Time: min = 0.673218 ms, max = 1.84448 ms, mean = 1.21463 ms, median = 1.21777 ms, percentile(90%) = 1.3186 ms, percentile(95%) = 1.34717 ms, percentile(99%) = 1.51196 ms
[10/28/2024-21:14:18] [I] H2D Latency: min = 0.0236206 ms, max = 1.71045 ms, mean = 0.0453783 ms, median = 0.0424805 ms, percentile(90%) = 0.0490723 ms, percentile(95%) = 0.0534668 ms, percentile(99%) = 0.067749 ms
[10/28/2024-21:14:18] [I] GPU Compute Time: min = 5.65784 ms, max = 10.6769 ms, mean = 5.82668 ms, median = 5.66943 ms, percentile(90%) = 5.677 ms, percentile(95%) = 7.46094 ms, percentile(99%) = 8.66394 ms
[10/28/2024-21:14:18] [I] D2H Latency: min = 0.00488281 ms, max = 0.00933838 ms, mean = 0.00689694 ms, median = 0.00695801 ms, percentile(90%) = 0.00805664 ms, percentile(95%) = 0.00830078 ms, percentile(99%) = 0.00897217 ms
[10/28/2024-21:14:18] [I] Total Host Walltime: 3.02199 s
[10/28/2024-21:14:18] [I] Total GPU Compute Time: 3.01239 s
[10/28/2024-21:14:18] [W] * GPU compute time is unstable, with coefficient of variance = 10.5757%.
[10/28/2024-21:14:18] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[10/28/2024-21:14:18] [I] Explanations of the performance metrics are printed in the verbose logs.
[10/28/2024-21:14:18] [V] 
[10/28/2024-21:14:18] [V] === Explanations of the performance metrics ===
[10/28/2024-21:14:18] [V] Total Host Walltime: the host walltime from when the first query (after warmups) is enqueued to when the last query is completed.
[10/28/2024-21:14:18] [V] GPU Compute Time: the GPU latency to execute the kernels for a query.
[10/28/2024-21:14:18] [V] Total GPU Compute Time: the summation of the GPU Compute Time of all the queries. If this is significantly shorter than Total Host Walltime, the GPU may be under-utilized because of host-side overheads or data transfers.
[10/28/2024-21:14:18] [V] Throughput: the observed throughput computed by dividing the number of queries by the Total Host Walltime. If this is significantly lower than the reciprocal of GPU Compute Time, the GPU may be under-utilized because of host-side overheads or data transfers.
[10/28/2024-21:14:18] [V] Enqueue Time: the host latency to enqueue a query. If this is longer than GPU Compute Time, the GPU may be under-utilized.
[10/28/2024-21:14:18] [V] H2D Latency: the latency for host-to-device data transfers for input tensors of a single query.
[10/28/2024-21:14:18] [V] D2H Latency: the latency for device-to-host data transfers for output tensors of a single query.
[10/28/2024-21:14:18] [V] Latency: the summation of H2D Latency, GPU Compute Time, and D2H Latency. This is the latency to infer a single query.
[10/28/2024-21:14:18] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v100300] # ./trtexec --loadEngine=/home/vwif/Documents/thesis/TensorRT/demo/BERT/engines/bert_base_128.engine --verbose

I hope this means the engine doesn't have any problem, but I still get the segmentation fault when I try to run the sample inference.py.

@lix19937

This is for loading the ONNX model, right? How do I run inference with an engine using trtexec?

trtexec --onnx=your_onnx_file --verbose --saveEngine=your_plan will load the ONNX model, build an engine, and then run inference with it.

I hope this means the engine doesn't have any problem, but I still get the segmentation fault when I try to run the sample inference.py.

Maybe your code has a bug. You can use trtexec to get the engine file, then use the following Python script: https://github.com/lix19937/tensorrt-insight/blob/main/tool/infer_from_engine.py @krishnarajk
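
For reference, a minimal sketch (not the linked script) of deserializing an engine and running one inference with the TensorRT 10 Python API plus pycuda, which the BERT demo already uses. It assumes a static-shape engine; the engine path comes from this thread, and the zero-filled host buffers are placeholders for real tokenized inputs.

import numpy as np
import pycuda.autoinit          # creates and activates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
with open("engines/bert_base_128.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
stream = cuda.Stream()

host_bufs, dev_bufs = {}, {}
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    # For engines with dynamic input shapes, call context.set_input_shape(name, ...)
    # on every input before querying shapes here.
    shape = tuple(context.get_tensor_shape(name))
    dtype = trt.nptype(engine.get_tensor_dtype(name))
    host_bufs[name] = np.zeros(shape, dtype=dtype)   # replace with real tokenized inputs
    dev_bufs[name] = cuda.mem_alloc(host_bufs[name].nbytes)
    context.set_tensor_address(name, int(dev_bufs[name]))
    if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
        cuda.memcpy_htod_async(dev_bufs[name], host_bufs[name], stream)

context.execute_async_v3(stream_handle=stream.handle)
for name in host_bufs:
    if engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT:
        cuda.memcpy_dtoh_async(host_bufs[name], dev_bufs[name], stream)
stream.synchronize()
print({n: b.shape for n, b in host_bufs.items()})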

@yuanyao-nv yuanyao-nv added the triaged Issue has been triaged by maintainers label Oct 31, 2024