Dynamic Shapes

Description

Sometimes I get models from others on my team which I need to convert to ONNX and then run inference on to measure some performance metrics. I notice that sometimes the models have a dynamic shape on the input tensor, but I run my metrics on fixed shapes. For example, I’ve received models with input tensor shape (?, C, H, W).

In those cases, C, H, and W are fixed, but the first dimension is not defined in the ONNX model file (though I already know the fixed value I want to run inference with). I’ve noticed that if I want to use these with an optimization engine like TensorRT, this can cause a problem: in order to use the model I have to perform some intermediary steps where I have both a preprocessor and a prediction engine in order to reshape and run inference.

Correct me if I am wrong, but I believe this could add overhead that I would like to avoid. If the fixed inference shape is known, is it possible to build the engine in such a way that it can be serialized, saved, and loaded afterward for inference without having to do reshaping?

Environment

TensorRT Version: 7
CUDA Version: 10.2
CUDNN Version: 7.6
Operating System + Version: Windows 10 64-bit

Please refer to the link below for working with dynamic shapes:

You can fine-tune the model using optimization profiles for a specific input dimension range.
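For example, a minimal sketch with the C++ builder API might look like the following. This is not the full sample; it assumes you already have a builder and a builder config, and the tensor name "input" and the dimension values are placeholders:

#include "NvInfer.h"

// Minimal sketch (TensorRT 7 C++ API): bound a dynamic batch dimension with an
// optimization profile before building the engine. The tensor name "input" and
// the min/opt/max values below are placeholders, not taken from any specific model.
void addInputProfile(nvinfer1::IBuilder& builder, nvinfer1::IBuilderConfig& config)
{
    nvinfer1::IOptimizationProfile* profile = builder.createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 1, 224, 224});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{16, 1, 224, 224});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{32, 1, 224, 224});
    config.addOptimizationProfile(profile);
    // An engine built from this config accepts batch sizes 1-32 and is tuned for 16.
}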

Thanks

@SunilJB So I tried to follow the example provided by TensorRT, but I seem to have run into a problem. The model I am trying to work with is: models/vision/super_resolution/sub_pixel_cnn_2016 at main · onnx/models · GitHub

The code I have implemented:

samplesCommon::OnnxSampleParams initializeSampleParams()
{
    samplesCommon::OnnxSampleParams params;
    params.dataDirs.push_back("data/mnist/");
    params.dataDirs.push_back("data/samples/mnist/");
    /*params.onnxFileName = "mnist.onnx";
    params.inputTensorNames.push_back("Input3");
    params.outputTensorNames.push_back("Plus214_Output_0");*/
    params.onnxFileName = "super_resolution.onnx";
    params.inputTensorNames.push_back("input");
    params.outputTensorNames.push_back("output");
    params.batchSize = 1;
    params.int8 = false;
    params.fp16 = false;
    return params;
}

void SampleDynamicReshape::build()
{
    auto builder = makeUnique(nvinfer1::createInferBuilder(gLogger.getTRTLogger()));
    // This function will also set mPredictionInputDims and mPredictionOutputDims,
    // so it needs to be called before building the preprocessor.
    buildPredictionEngine(builder);
    buildPreprocessorEngine(builder);
}

void SampleDynamicReshape::buildPreprocessorEngine(const SampleUniquePtr<nvinfer1::IBuilder>& builder)
{
    // Create the preprocessor engine using a network that supports full dimensions (createNetworkV2).
    auto preprocessorNetwork = makeUnique(
        builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));

    // Reshape a dynamically shaped input to the size expected by the model (mPredictionInputDims).
    //auto input = preprocessorNetwork->addInput("input", nvinfer1::DataType::kFLOAT, Dims4{ 1, 1, -1, -1 });
    auto input = preprocessorNetwork->addInput("input", nvinfer1::DataType::kFLOAT, Dims4{ -1, 1, 1, 1 });
    auto resizeLayer = preprocessorNetwork->addResize(*input);
    resizeLayer->setOutputDimensions(mPredictionInputDims);
    preprocessorNetwork->markOutput(*resizeLayer->getOutput(0));

    // Finally, configure and build the preprocessor engine.
    auto preprocessorConfig = makeUnique(builder->createBuilderConfig());

    // Create an optimization profile so that we can specify a range of input dimensions.
    auto profile = builder->createOptimizationProfile();

    // This profile will be valid for all inputs whose dimensions fall in the range [(1, 1, 1, 1), (1, 1, 224, 224)],
    // and TensorRT will optimize for (1, 1, 224, 224).
    /*profile->setDimensions(input->getName(), OptProfileSelector::kMIN, Dims4{ 1, 1, 1, 1 });
    profile->setDimensions(input->getName(), OptProfileSelector::kOPT, Dims4{ 1, 1, 28, 28 });
    profile->setDimensions(input->getName(), OptProfileSelector::kMAX, Dims4{ 1, 1, 56, 56 });*/
    profile->setDimensions(input->getName(), OptProfileSelector::kMIN, Dims4{ 1, 1, 1, 1 });
    profile->setDimensions(input->getName(), OptProfileSelector::kOPT, Dims4{ 1, 1, 224, 224 });
    profile->setDimensions(input->getName(), OptProfileSelector::kMAX, Dims4{ 1, 1, 224, 224 });
    preprocessorConfig->addOptimizationProfile(profile);

    mPreprocessorEngine = makeUnique(builder->buildEngineWithConfig(*preprocessorNetwork, *preprocessorConfig));
    gLogInfo << "Profile dimensions in preprocessor engine:\n";
    gLogInfo << "    Minimum = " << mPreprocessorEngine->getProfileDimensions(0, 0, OptProfileSelector::kMIN) << '\n';
    gLogInfo << "    Optimum = " << mPreprocessorEngine->getProfileDimensions(0, 0, OptProfileSelector::kOPT) << '\n';
    gLogInfo << "    Maximum = " << mPreprocessorEngine->getProfileDimensions(0, 0, OptProfileSelector::kMAX) << std::endl;
}

void SampleDynamicReshape::buildPredictionEngine(const SampleUniquePtr<nvinfer1::IBuilder>& builder)
{
    // Create a network using the parser.
    const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = makeUnique(builder->createNetworkV2(explicitBatch));
    auto parser = nvonnxparser::createParser(*network, gLogger.getTRTLogger());
    bool parsingSuccess = parser->parseFromFile(
        locateFile(mParams.onnxFileName, mParams.dataDirs).c_str(),
        static_cast<int>(gLogger.getReportableSeverity()));
    if (!parsingSuccess)
    {
        throw std::runtime_error{ "Failed to parse model" };
    }

    /*// Attach a softmax layer to the end of the network.
    auto softmax = network->addSoftMax(*network->getOutput(0));
    // Set softmax axis to 1 since network output has shape [1, 10] in full dims mode
    softmax->setAxes(1 << 1);
    network->unmarkOutput(*network->getOutput(0));
    network->markOutput(*softmax->getOutput(0));*/

    // Get information about the inputs/outputs directly from the model.
    mPredictionInputDims = network->getInput(0)->getDimensions();
    mPredictionOutputDims = network->getOutput(0)->getDimensions();

    // Create a builder config.
    auto config = makeUnique(builder->createBuilderConfig());
    config->setMaxWorkspaceSize(16_MiB);
    if (mParams.fp16)
    {
        config->setFlag(BuilderFlag::kFP16);
    }
    if (mParams.int8)
    {
        config->setFlag(BuilderFlag::kINT8);
        samplesCommon::setAllTensorScales(network.get(), 127.0f, 127.0f);
    }

    // Build the prediction engine.
    mPredictionEngine = makeUnique(builder->buildEngineWithConfig(*network, *config));
}

So the model seems to load and the input/output dimensions are read, but I get an error on the following line:
mPredictionEngine = makeUnique(builder->buildEngineWithConfig(*network, *config));

[E] [TRT] Network has dynamic or shape inputs, but no optimization profile has been defined.
[E] [TRT] Network validation failed.

It seems that the right input shape is detected:

Hi,

The input dimension of the model is input: [batch_size, 1, 224, 224].
Since the batch size is the only dynamic dimension, trying to change any other dimension will fail.

trtexec --onnx=super-resolution-10.onnx --explicitBatch --verbose --minShapes=input:1x1x1x1 --optShapes=input:1x1x28x28 --maxShapes=input:1x1x56x56

[06/23/2020-04:58:53] [E] [TRT] input: for dimension number 2 in profile 0 does not match network definition (got min=1, opt=28, max=56), expected min=opt=max=224).
[06/23/2020-04:58:53] [E] [TRT] Network validation failed.
[06/23/2020-04:58:53] [E] Engine creation failed
[06/23/2020-04:58:53] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # trtexec --onnx=super-resolution-10.onnx --explicitBatch --verbose --minShapes=input:1x1x1x1 --optShapes=input:1x1x28x28 --maxShapes=input:1x1x56x56

You have to use an optimization profile, something like this:
trtexec --onnx=super-resolution-10.onnx --explicitBatch --verbose --minShapes=input:1x1x224x224 --optShapes=input:16x1x224x224 --maxShapes=input:32x1x224x224
[06/23/2020-05:04:32] [I] GPU Compute
[06/23/2020-05:04:32] [I] min: 16.8679 ms
[06/23/2020-05:04:32] [I] max: 17.492 ms
[06/23/2020-05:04:32] [I] mean: 17.2015 ms
[06/23/2020-05:04:32] [I] median: 17.2153 ms
[06/23/2020-05:04:32] [I] percentile: 17.4838 ms at 99%
[06/23/2020-05:04:32] [I] total compute time: 3.04467 s
&&&& PASSED TensorRT.trtexec # trtexec --onnx=super-resolution-10.onnx --explicitBatch --verbose --minShapes=input:1x1x224x224 --optShapes=input:16x1x224x224 --maxShapes=input:32x1x224x224

Thanks

@SunilJB Thanks for the reply.

So as a more general question, when would one use trtexec versus using the C++ API? Would using trtexec the way you did end up creating a saved serialized engine that you can later load from file using the C++ API?

Thanks for the clarification on the specific dimensions that can be changed for this example. I tried using your shapes for min, opt, and max, but I still get the same error on line:

mPredictionEngine = makeUnique(builder->buildEngineWithConfig(*network, *config)); 

I get the following on the console:
[E] [TRT] Network has dynamic or shape inputs, but no optimization profile has been defined.
[E] [TRT] Network validation failed.

In fact, this error occurs before even entering the buildPreprocessorEngine(builder) method, which is where the shapes are specified.

The calling order is:
void SampleDynamicReshape::build()
{
auto builder = makeUnique(nvinfer1::createInferBuilder(gLogger.getTRTLogger()));

    // This function will also set mPredictionInputDims and mPredictionOutputDims,
    // so it needs to be called before building the preprocessor.
    buildPredictionEngine(builder);
    buildPreprocessorEngine(builder);
}

The error occurs within the buildPredictionEngine(builder) call, and an optimization profile is not specified until the buildPreprocessorEngine(builder) call.

I am taking this from the Github example: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleDynamicReshape

It seems that I must be doing something out of order, but it looks like the example does it the same way.
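If it helps narrow things down, my current guess is that the prediction network itself also needs an optimization profile, because the super_resolution model (unlike the sample's mnist.onnx) has a dynamic batch dimension. Something along these lines right before buildEngineWithConfig() may be what is missing (the 1/16/32 batch range here is only an illustrative guess):

    // Speculative sketch, placed in buildPredictionEngine() before buildEngineWithConfig():
    // give the parsed network its own optimization profile for the dynamic batch dimension.
    // The 1/16/32 batch range is only a guess for illustration.
    auto predictionProfile = builder->createOptimizationProfile();
    const char* inputName = network->getInput(0)->getName(); // "input" for this model
    predictionProfile->setDimensions(inputName, OptProfileSelector::kMIN, Dims4{ 1, 1, 224, 224 });
    predictionProfile->setDimensions(inputName, OptProfileSelector::kOPT, Dims4{ 16, 1, 224, 224 });
    predictionProfile->setDimensions(inputName, OptProfileSelector::kMAX, Dims4{ 32, 1, 224, 224 });
    config->addOptimizationProfile(predictionProfile);
    mPredictionEngine = makeUnique(builder->buildEngineWithConfig(*network, *config));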

trtexec is a tool to quickly utilize TensorRT without having to develop your own application.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can add the --saveEngine=<filename> argument to the trtexec command to save the generated engine.
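For example, combining the working shapes from above with --saveEngine (the output file name is arbitrary):

trtexec --onnx=super-resolution-10.onnx --explicitBatch --minShapes=input:1x1x224x224 --optShapes=input:16x1x224x224 --maxShapes=input:32x1x224x224 --saveEngine=super_resolution.engine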

Please refer to the link below for more details about the sample:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleDynamicReshape#creating-the-preprocessing-network
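Regarding loading the saved engine from the C++ API afterward, a rough sketch for TensorRT 7 is below (error handling and destruction of the runtime/engine are omitted, and the file name simply matches whatever was passed to --saveEngine):

#include <fstream>
#include <string>
#include <vector>
#include "NvInfer.h"

// Sketch: deserialize an engine previously saved by trtexec --saveEngine (TensorRT 7 API).
// Error handling and cleanup of the runtime/engine are omitted for brevity.
nvinfer1::ICudaEngine* loadEngine(const std::string& path, nvinfer1::ILogger& logger)
{
    // Read the whole serialized engine file into memory.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    std::vector<char> blob(static_cast<size_t>(file.tellg()));
    file.seekg(0);
    file.read(blob.data(), blob.size());

    // Deserialize it with the TensorRT runtime.
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    return runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
}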

Thanks

@SunilJB Is the serialized engine that’s saved supposed to have a fixed or dynamic input size? I tried it, and it seems that the input and output tensor dimensions are still dynamically shaped.