converting to TensorRT barely increases performance #3646
Comments
Normally it's caused by how you measure the perf. Could you please try to get a perf summary using trtexec? Usage would be like:
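(The exact command was truncated in this thread; a typical invocation, with the model/engine paths as placeholders, might look like:)

```
trtexec --onnx=model.onnx --saveEngine=model.engine   # build the engine and print a latency/throughput summary
trtexec --loadEngine=model.engine --dumpProfile       # re-run an existing engine and dump per-layer timings
```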
Also make sure you feed the same input shape to TRT and PyTorch; also, disabling dynamic shapes is fairer to TRT.
Hi @zerollzeng, I apologize: a batch of 8 in regular PyTorch takes about 160 ms. I used time.time() to measure, but perhaps I placed it in the wrong place. For testing I have now converted two TensorRT engines with static batch sizes of 1 and 8, so the updated value is: batch of 8, PyTorch, 160 ms. So I apologize, I will change the title; there is not a performance regression. But TensorRT is advertised as being able to speed up inference by 5x, and I'm just not seeing those results... So I would like to keep this issue open. Perhaps you know what should be done to make inference faster? Is it the engine conversion part, or is special inference code needed to utilize the GPU as well as possible?
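(As an aside on measurement: a common pitfall when timing GPU code with time.time() is missing synchronization and warm-up. A minimal sketch of a fairer PyTorch-side timing loop; `model` and `batch_input` are hypothetical placeholders, not the poster's actual code:)

```python
import time
import torch

model = model.eval().cuda()            # hypothetical model
batch_input = batch_input.cuda()       # hypothetical input batch

with torch.no_grad():
    for _ in range(10):                # warm-up: lazy init, cuDNN autotuning
        model(batch_input)
    torch.cuda.synchronize()           # drain pending GPU work before starting the clock
    start = time.time()
    for _ in range(100):
        model(batch_input)
    torch.cuda.synchronize()           # wait for the timed kernels to actually finish
print(f"{(time.time() - start) / 100 * 1000:.1f} ms per batch")
```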
@zerollzeng I understand that this might be a very difficult task, but perhaps you can share your experience on which direction I should be looking in? Should I look into learning CUDA programming? Should I learn to write custom plugins? How can I make this engine utilize more GPU resources? Because in my logic, since batches are independent of each other, we should be able to run them in parallel at the speed of a single batch?
@zerollzeng so now I am trying to run TensorRT inference on each batch in parallel. I am using a static single-batch engine that has an inference time of 16 ms. In this example I try to run 10 batches in parallel, so I create 10 execution contexts and 10 streams, allocate input and output memory separately for each context, enqueue execution on all of them, and synchronize only at the end. This is the inference code:

```python
batch = 10
```

If you try to reproduce this, you should find that the part before the synchronize completes quickly, but executing each synchronize takes as long as normal sequential execution. So I cannot understand: are enqueue_v2 and the execution contexts still not running in parallel? What do I do to make my engine faster? When using it in a real-time app, tracking multiple objects really slows everything down. Or does the inference need to be coded in C++, or will that also give the same results?

ONNX model used here: https://drive.google.com/file/d/1U_djLvIbDYv-Fxh60_coB7H9twPfcmP7/view?usp=sharing

Code to convert from ONNX to TensorRT:

```python
logger = trt.Logger(trt.Logger.WARNING)
```
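(The inference snippet earlier in this comment was truncated in the thread. For reference, a minimal sketch of the multi-context / multi-stream pattern it describes, assuming a single-input, single-output static-shape engine; the engine path, tensor shapes, and dtype are placeholders, not the poster's actual values:)

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a default CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

N_PARALLEL = 10                                   # number of single-batch inferences to overlap
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

with open("model.engine", "rb") as f:             # placeholder engine path
    engine = runtime.deserialize_cuda_engine(f.read())

# Assumed single-input / single-output engine with static shapes (placeholder shapes).
inp = np.random.rand(1, 3, 224, 224).astype(np.float32)
out_shape = (1, 1000)

contexts, streams, d_ins, d_outs, h_outs = [], [], [], [], []
for _ in range(N_PARALLEL):
    contexts.append(engine.create_execution_context())
    streams.append(cuda.Stream())
    d_ins.append(cuda.mem_alloc(inp.nbytes))
    h_out = cuda.pagelocked_empty(out_shape, dtype=np.float32)
    d_outs.append(cuda.mem_alloc(h_out.nbytes))
    h_outs.append(h_out)

# Enqueue all copies and executions first; synchronize only at the end.
for ctx, stream, d_in, d_out, h_out in zip(contexts, streams, d_ins, d_outs, h_outs):
    cuda.memcpy_htod_async(d_in, inp, stream)
    ctx.execute_async_v2(bindings=[int(d_in), int(d_out)], stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(h_out, d_out, stream)

for stream in streams:
    stream.synchronize()
```

Note that streams only overlap when the GPU has idle resources; if a single inference already saturates the SMs, the streams effectively serialize, which is consistent with the behaviour described above.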
That usually means the GPU is fully utilized.
Try FP16/INT8, e.g. run trtexec with the --fp16 or --int8 flag.
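(Along these lines; the model path is a placeholder, and --int8 without calibration data is only meaningful for performance testing:)

```
trtexec --onnx=model.onnx --fp16      # build with FP16 kernels enabled
trtexec --onnx=model.onnx --int8      # INT8 with dummy scales, perf testing only
```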
I tried FP16; only 7% of the GPU is utilized. In that case, how can I improve the performance?
Increase the batch size or use multiple threads?
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks all!
Description
Hello everyone
I am working on converting a PyTorch object tracking model to TensorRT for faster inference.
When running TensorRT inference with a single batch the model is about 2x faster, but when adding batches it becomes SLOWER:
Batch of 1 inference time:
- PyTorch: 40 ms
- TensorRT: 20 ms

Batch of 8 inference time:
- PyTorch: 160 ms
- TensorRT: 100 ms
Shouldn't TensorRT be 5x faster? What can be done to improve this?
I have profiled the model with batch size 1 and batch size 4 in Nsight Systems:
1 batch inference test: https://drive.google.com/file/d/1achvISpSc1pvlV2RLfSNLxCLlRsZHcnT/view?usp=sharing
4 batch inference test: https://drive.google.com/file/d/1ZuHsO28LIlETNIcWk6lh7miv2Lovco9D/view?usp=sharing
Can anybody help? Why is this speed regression happening?
This is the code that I use to build the engine:
```python
import pycuda.driver as cuda
import pycuda.autoinit
```
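(The build script above was truncated in the thread. For context, a minimal sketch of a typical ONNX-to-engine build with the TensorRT Python API; the file paths, workspace size, and FP16 flag are assumptions, not the poster's exact settings:)

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:          # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB workspace (assumed)
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)    # enable FP16 if the GPU supports it

serialized_engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:        # placeholder output path
    f.write(serialized_engine)
```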
and I use polygraphy for inference:
```python
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
```
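(The Polygraphy snippet above was also truncated. A minimal sketch of how those imports are typically used; the ONNX path, input name, and shape are placeholders:)

```python
import numpy as np
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner

# Lazily parses the ONNX file and builds a TensorRT engine on first use.
build_engine = EngineFromNetwork(NetworkFromOnnxPath("model.onnx"))

with TrtRunner(build_engine) as runner:
    feed = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}  # assumed input name/shape
    outputs = runner.infer(feed_dict=feed)   # time this call (after warm-up) for a fair comparison
```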
Perhaps I need to convert the engine differently?
Or maybe running inference with polygraphy isn't a good idea?
Or perhaps the issue is elsewhere?
If anybody has any idea, please let me know.
link to the onnx model: https://drive.google.com/file/d/1kxWaGbrk3M1slN1-v4C3524vtPeTMSCr/view?usp=sharing
Thank you
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: GTX 1660 Ti
NVIDIA Driver Version: 546.01
CUDA Version: 12.1
CUDNN Version: 8.9.7
Operating System:
Python Version (if applicable): 3.10.13
PyTorch Version (if applicable): 2.1.2+cu121