ICudaEngine.getTensorShape function gets the output dimension of the model that contains -1. #4120
The presence of -1 in the output dimension indicates a dynamic shape. In your specific case, to resolve the dynamic shape, try to run inference with actual input data and then query the output shape again.
TensorRT inference needs the addresses of the input and output data to be passed in. If I don't know the dimensions of the output data, how do I allocate memory for it? In other words, I have to know the output dimensions before I run inference. Could you please give a specific usage example?
How about this example? (EDIT: Sorry, the "try to run inference with actual input data" advice might have been confusing without an example.)
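(For reference, a minimal sketch of that idea; the engine and context are assumed to already exist, and the tensor names and dimensions below are placeholders. The key point is to query the shape from the execution context after the input shape has been set, keeping in mind that data-dependent output dimensions may still report -1 until inference has actually run.)
// Sketch: query output shapes from the execution context, not the engine.
nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();
ctx->setInputShape("images", nvinfer1::Dims4{1, 3, 640, 640});   // placeholder input name and shape
nvinfer1::Dims out = ctx->getTensorShape("output");              // placeholder output name
for (int i = 0; i < out.nbDims; ++i) {
    printf("%ld ", out.d[i]);   // data-dependent dims can still be -1 at this point
}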
The getBindingDimensions API has been removed in TensorRT 10.3 and replaced with the getTensorShape function. Going back to the original problem, getTensorShape still reports an output dimension containing -1. How can I get the true output dimensions of the model?
I found that I don't have this problem with ONNX models exported using torch 1.12 and opset 16, but I do have it with ONNX models exported using torch 1.13 and opset 17.
Are you able to get the true output dimensions of the model after this sequence of calls?
If not, please share the original ONNX model here and I'll file an internal bug for someone to take over.
I can't get the true output dimensions of the model using your sequence of calls. This is the model link:
Hi @moraxu, has there been any progress on this issue?
Apologies for the late reply, there were company holidays last week. TRT may also need certain optimizations enabled to fully resolve dynamic shapes; you can enable that by using optimization profiles when building the engine. For example, when I ran:
It seems the output shapes were resolved, so maybe try using those specific profiles by passing them to the config.
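(In the C++ API, that corresponds to something along these lines; the input tensor name and the min/opt/max shapes below are placeholders and should be replaced with the ones from your model.)
// Sketch: register an optimization profile on the builder config before building.
nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
profile->setDimensions("images", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 3, 640, 640});
profile->setDimensions("images", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{1, 3, 640, 640});
profile->setDimensions("images", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{4, 3, 640, 640});
config->addOptimizationProfile(profile);
// ... then build with builder->buildSerializedNetwork(*network, *config) as usual.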
I also have no problem using the polygraphy command. I only get this problem when using the TensorRT C++ API; have you tried the TensorRT C++ API?
Sorry, I had to do more digging because I wasn't familiar with that part of the codebase. To unblock you for now, you can use getMaxOutputSize. To query for the exact size, one would have to use a class implementing IOutputAllocator.
Thank you for taking this so seriously.
@demuxin, I believe for now you will have to use getMaxOutputSize.
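(A rough sketch of that upper-bound approach, assuming an existing execution context; the tensor name is a placeholder and error checking is mostly omitted.)
// Sketch: size the output buffer from getMaxOutputSize, which is an upper bound in bytes.
const char* outName = "output";                          // placeholder tensor name
int64_t maxBytes = context->getMaxOutputSize(outName);   // -1 means the bound could not be determined
void* outPtr = nullptr;
if (maxBytes > 0)
{
    cudaMalloc(&outPtr, static_cast<size_t>(maxBytes));
    context->setTensorAddress(outName, outPtr);
}
// After enqueueV3 and a stream sync, only the valid portion of this buffer holds real data.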
The value I get via getMaxOutputSize() is 24064, but this value is a bit strange. The output dimension is [1, 1000, 6], so the number of bytes should be either 1000 * 6 * sizeof(float) = 24000 or 1000 * 6 * sizeof(half) = 12000.
Can you paste your full standalone C++ snippet, or at least the part where you build the engine and query getMaxOutputSize?
Our code has a relatively high degree of encapsulation, so I've extracted the key parts. It can't run as-is, but that shouldn't matter much.
{
    std::shared_ptr<nvinfer1::IBuilder> builder(nvinfer1::createInferBuilder(gLogger), destroyNV<nvinfer1::IBuilder>);
    std::shared_ptr<nvinfer1::IBuilderConfig> config(builder->createBuilderConfig(), destroyNV<nvinfer1::IBuilderConfig>);
    std::shared_ptr<nvinfer1::INetworkDefinition> network;
    network = std::shared_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(flag), destroyNV<nvinfer1::INetworkDefinition>);
    std::shared_ptr<nvonnxparser::IParser> onnxParser;
    onnxParser.reset(nvonnxparser::createParser(*network, gLogger), destroyNV<nvonnxparser::IParser>);
    onnxParser->parse(onnxmodel, model_size);
    config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 1LU << 31);
    std::shared_ptr<nvinfer1::IHostMemory> seridata(builder->buildSerializedNetwork(*network, *config), destroyNV<nvinfer1::IHostMemory>);
    std::shared_ptr<nvinfer1::IRuntime> runtime_ = std::shared_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(gLogger), destroyNV<nvinfer1::IRuntime>);
    std::shared_ptr<nvinfer1::ICudaEngine> engine_ = std::shared_ptr<nvinfer1::ICudaEngine>(runtime_->deserializeCudaEngine(seridata->data(), seridata->size()), destroyNV<nvinfer1::ICudaEngine>);
    std::shared_ptr<nvinfer1::IExecutionContext> context_ = std::shared_ptr<nvinfer1::IExecutionContext>(engine_->createExecutionContext(), destroyNV<nvinfer1::IExecutionContext>);
    int output_tensor_id = 1;   // assumes the I/O tensor at index 1 is the output
    const char* tensor_name = engine_->getIOTensorName(output_tensor_id);
    printf("--------->>>>>> %ld\n", context_->getMaxOutputSize(tensor_name));   // prints 24064
    auto dims = engine_->getTensorShape(tensor_name);
    for (int j = 0; j < dims.nbDims; ++j) {
        printf("%ld ", dims.d[j]);   // output: [1, -1, 6] (Dims::d is int64_t in TensorRT 10)
    }
}
@demuxin I also got 24064 on my end; keep in mind it is just an upper bound on the output size.
Yes, but when I set half precision, I still get 24064 via getMaxOutputSize().
This is an upper bound, like I said, so it won't necessarily match the exact output size; feel free to divide it by 2 yourself for half precision.
OK. Do you have any ideas about the -1 dimension problem?
The issue with the -1 dimension in the output is caused by a data-dependent operator in the model. For such cases, it's simply not possible to provide a fixed output shape at build time, as the shape is determined dynamically during execution. This is why you're seeing dynamic shapes (-1) in the output shape, indicating that the dimension can only be resolved after the input data is known.
Please simply use getMaxOutputSize as an upper bound for now.
Thank you for your work. But the output of getMaxOutputSize still seems a bit strange to me.
Can I solve this problem by editing the output dimensions of the ONNX model?
OK, I'll tag my colleague here who maintains the part of the codebase that handles shape inference during actual inference. He's currently on a short sick leave, apologies for the inconvenience.
No, it's just that your ONNX model exhibits data-dependent shapes due to the operator mentioned above.
Thank you in advance for your response.
Thanks @demuxin for your patience, and thanks @moraxu for helping out. There are 3 possible categories of output shapes: (1) shapes that are fully known at build time, (2) shapes that depend only on the input shapes and are resolved once the input dimensions are set, and (3) data-dependent shapes that depend on the actual input values.
Data-dependent shapes are unknown until a layer has executed, because the output shape depends on the input data rather than the input shape. Since the input data is not known until a certain point in execution, hence the problem. There are two approaches: (1) allocate an upper-bound output buffer using getMaxOutputSize, or (2) let TensorRT drive the allocation at runtime by implementing IOutputAllocator and attaching it with setOutputAllocator.
To answer the specifics:
Also, regarding why the max output size is 24064 rather than exactly 24000: getMaxOutputSize only returns an upper bound, which may include padding, so it does not have to match the exact byte count.
You can read more about this in the TensorRT developer guide. Let us know if this resolves the issue at your end. Thanks again @moraxu for your help!
Thanks for your detailed answer, I'll try your method when my vacation is over.
Hi @jhalakpatel, I read the dev guide, but I don't understand how the output allocator is supposed to be used. This is the example from the guide:
std::unordered_map<std::string, MyOutputAllocator> allocatorMap;
for (const char* name : names of outputs)
{
    Dims extent = context->getTensorShape(name);
    void* ptr;
    if (engine->getTensorLocation(name) == TensorLocation::kDEVICE)
    {
        if (extent.d contains a -1)
        {
            auto allocator = std::make_unique<MyOutputAllocator>();
            context->setOutputAllocator(name, allocator.get());
            allocatorMap.emplace(name, std::move(allocator));
        }
        else
        {
            ptr = allocate device memory per extent and format
        }
    }
    else
    {
        ptr = allocate cpu memory per extent and format
    }
    context->setTensorAddress(name, ptr);
}
Then regarding the size of the output allocation: if it's me who sets it, how do I know what the value of size is, and if it's TensorRT, what determines the value of size?
@demuxin Yes, TensorRT computes the size for you. Your implementation would look like this:
where the overall class could look like:
You could instantiate the class as:
Let me know how it goes.
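(For readers following along, a minimal sketch of such an allocator against the TensorRT 10 nvinfer1::IOutputAllocator interface; the class name, member names, and the "output" tensor name are illustrative, and error handling is kept minimal.)
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch: TensorRT calls reallocateOutputAsync with the exact size it needs,
// and notifyShape with the final, concrete output dimensions.
class MyOutputAllocator : public nvinfer1::IOutputAllocator
{
public:
    void* reallocateOutputAsync(char const* /*tensorName*/, void* /*currentMemory*/,
                                uint64_t size, uint64_t /*alignment*/,
                                cudaStream_t /*stream*/) noexcept override
    {
        if (size > mCapacity)   // grow only when the requested size exceeds what is already held
        {
            cudaFree(mPtr);
            mPtr = nullptr;
            mCapacity = 0;
            if (cudaMalloc(&mPtr, size) != cudaSuccess)
            {
                return nullptr;   // tells TensorRT the allocation failed
            }
            mCapacity = size;
        }
        return mPtr;
    }

    void notifyShape(char const* /*tensorName*/, nvinfer1::Dims const& dims) noexcept override
    {
        mDims = dims;   // the resolved output shape, available after enqueueV3 completes
    }

    ~MyOutputAllocator() override { cudaFree(mPtr); }

    void* data() const noexcept { return mPtr; }
    nvinfer1::Dims shape() const noexcept { return mDims; }

private:
    void* mPtr{nullptr};
    uint64_t mCapacity{0};
    nvinfer1::Dims mDims{};
};

// Instantiation sketch (inside your inference setup, one allocator per data-dependent output):
//   MyOutputAllocator outAlloc;
//   context->setOutputAllocator("output", &outAlloc);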
This solution seems to work, but the problem is that a reallocation occurs on every inference.
@abysslover You would only reallocate if the new size is larger than the existing allocated memory size.
@jhalakpatel Unfortunately, your suggestion did not answer my question and, in turn, did not solve the problem. I have tested the code and compared the pointer addresses before and after executing the code you initially provided, with minor modifications. I will provide the solution for future readers of these comments.
class OutputAllocator : public nvinfer1::IOutputAllocator {
(...)
    if (mOutputPtr != nullptr) {
        // Record new memory size for the newly allocated memory
        mOutputSize = size;
    }
    if (mOutputPtr == mCurrentMemory) {
        mReallocateOutputCalled = true;
    }
    mCurrentMemory = mOutputPtr;
    return mOutputPtr;
(...)
/// Notify the shape of the tensor.
void OutputAllocator::notifyShape(char const* tensorName, nvinfer1::Dims const& dims) noexcept {
    mOutputDims = dims;
}
Hello @jhalakpatel, I read the docs and your replies thoroughly, but I still fail to run inference with dynamically shaped outputs. My pipeline goes like this:
That's the part I don't get. After enqueue, the output bindings of my buffer are null because I never allocated the memory myself. Is the output data stored in the memory pointed to by OutputAllocator->outputPtr? @demuxin, if you have any insight, it would be highly appreciated as well.
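(For what it's worth, a sketch of how the pieces are usually wired together with an allocator like the one sketched above; the tensor names, input buffer, and stream are placeholders, and the same includes are assumed. The output memory is owned by the allocator, which is why no up-front cudaMalloc for the output is needed, and the resolved shape arrives through notifyShape.)
// Sketch: run inference with a data-dependent output handled by the allocator.
void runOnce(nvinfer1::IExecutionContext* context, void* dInput, cudaStream_t stream)
{
    MyOutputAllocator outAlloc;                                       // from the sketch above
    context->setOutputAllocator("output", &outAlloc);                 // defer output allocation to TensorRT
    context->setInputShape("images", nvinfer1::Dims4{1, 3, 640, 640});
    context->setTensorAddress("images", dInput);                      // input buffer you allocated yourself
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);

    nvinfer1::Dims outDims = outAlloc.shape();   // e.g. [1, N, 6] with N now concrete
    void* dOutput = outAlloc.data();             // device memory owned by the allocator
    // ... copy dOutput back to the host or post-process on device here.
    // In practice, keep the allocator alive for as long as the context uses it.
    (void)outDims; (void)dOutput;
}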
Description
I get an ICudaEngine object after building the model with TensorRT, and I query the model's output dimensions with ICudaEngine::getTensorShape.
Sometimes the output dimensions contain -1, and sometimes they contain the correct concrete values.
What determines the value of the output dimensions, and under what circumstances does a -1 appear in them?
This is my ONNX model info:
But getTensorShape returns an output dimension of [1, -1, 6].
Environment
TensorRT Version: 10.3
NVIDIA GPU: RTX 3090
NVIDIA Driver Version: 535.183.01
CUDA Version: 12.2
Operating System: Ubuntu 22.04
PyTorch Version (if applicable): 1.13.0