TensorRT: Cannot set bindings for dynamic shapes

Description

Hi, I somehow cannot get the following script running. Apparently the error originates from trying to set the binding shape for a dynamic-shape input. Weirdly, the error only pops up with TensorRT 8 on my laptop; I had a nearly identical script running with TensorRT 7 on an AGX Xavier without any hassle:

trt.py:

import tensorflow as tf
import keras2onnx
import tensorrt as trt
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit
import time

tf.compat.v1.disable_v2_behavior()

INPUT_NAME = 'input'
BATCH_SIZE = 100
LEN_INPUT = 5
LEN_OUTPUT = 3
TF_FILE_PATH = './model.h5'
ONNX_FILE_PATH = './model.onnx'
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def convert_to_onnx():
    model = tf.keras.models.load_model(TF_FILE_PATH)
    onnx_model = keras2onnx.convert_keras(model)
    keras2onnx.save_model(onnx_model, ONNX_FILE_PATH)


def build_engine():
    network_creation_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    # initialize TensorRT engine and parse ONNX model
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(network_creation_flag)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()

    # set input shape min/opt/max for optimization profile
    profile.set_shape(INPUT_NAME, (BATCH_SIZE, LEN_INPUT), \
        (BATCH_SIZE, LEN_INPUT), (BATCH_SIZE, LEN_INPUT))
    print('Profile shape: ' + str(profile.get_shape(INPUT_NAME)))
    add_succ = config.add_optimization_profile(profile)
    print('Added optimization profile successfully: ' + str(add_succ))

    # specify batch size for builder
    builder.max_batch_size = BATCH_SIZE

    # parse ONNX
    print('Beginning ONNX file parsing')
    with open(ONNX_FILE_PATH, 'rb') as model:
        parser.parse(model.read())
    print('Completed parsing of ONNX file')

    # generate TensorRT engine optimized for the target platform
    print('Building an engine...')
    engine = builder.build_engine(network, config=config)
    context = engine.create_execution_context()
    print("Completed creating Engine")

    # set input dimensions at runtime
    print('Engine binding shape: ' + str(engine.get_profile_shape(profile_index=0, binding=0)))
    context.set_binding_shape(0, (BATCH_SIZE, LEN_INPUT))
    print('Binding set')

    return engine, context


def inference(engine, context):
    d_input = cuda.mem_alloc(BATCH_SIZE * LEN_INPUT * np.dtype(np.float32).itemsize)
    d_output = cuda.mem_alloc(BATCH_SIZE * LEN_INPUT * np.dtype(np.float32).itemsize)
    bindings = [int(d_input), int(d_output)]
    stream = cuda.Stream()

    t1 = time.time()
    for i in range(500):
        input = np.random.random((BATCH_SIZE,LEN_INPUT)).astype(np.float32)
        output = np.random.random((BATCH_SIZE,LEN_OUTPUT)).astype(np.float32)
        cuda.memcpy_htod_async(d_input, input, stream)
        context.execute_async(BATCH_SIZE, bindings, stream.handle, None)
        cuda.memcpy_dtoh_async(output, d_output, stream)
        stream.synchronize()
        #print ("Prediction: " + str(output))
    print('Total execution time: ' + str(time.time() - t1))


if __name__ == '__main__':
    convert_to_onnx()
    engine, context = build_engine()
    inference(engine, context)

console output:

2021-06-21 09:06:08.543047: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From /home/bcaie/environments/tensorrt_test/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /home/bcaie/environments/tensorrt_test/lib/python3.8/site-packages/tensorflow/python/keras/initializers/initializers_v1.py:47: calling RandomNormal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2021-06-21 09:06:09.918908: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-21 09:06:09.919050: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-21 09:06:09.919222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-21 09:06:09.919590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA Quadro M2000M computeCapability: 5.0
coreClock: 1.137GHz coreCount: 5 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 74.65GiB/s
2021-06-21 09:06:09.919607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-21 09:06:09.919677: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-21 09:06:09.919748: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-21 09:06:09.920580: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-21 09:06:09.920613: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-21 09:06:09.920770: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64:/home/bcaie/xavier_hybrid_models/catkin_ws/devel/lib:/opt/ros/noetic/lib
2021-06-21 09:06:09.921560: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-21 09:06:09.921591: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-21 09:06:09.921599: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-06-21 09:06:09.922140: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-21 09:06:09.922166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-21 09:06:09.922175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
2021-06-21 09:06:09.925542: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-06-21 09:06:09.926639: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2799925000 Hz
tf executing eager_mode: False
tf.keras model eager_mode: False
The ONNX operator number change on the optimization: 13 -> 8
The maximum opset needed by this model is only 9.
None
Profile shape: [(100, 5), (100, 5), (100, 5)]
Added optimization profile successfully: 0
Beginning ONNX file parsing
Completed parsing of ONNX file
Building an engine...
trt.py:56: DeprecationWarning: Use build_serialized_network instead.
  engine = builder.build_engine(network, config=config)
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
Completed creating Engine
Engine binding shape: [(0, 5), (0, 5), (0, 5)]
[TensorRT] ERROR: [executionContext.cpp::setBindingDimensions::954] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::954, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [100,5] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 0, minimum dimension in profile is 0, but supplied dimension is 100. )
Binding set
Total execution time: 0.014324188232421875
terminate called after throwing an instance of 'nvinfer1::CudaDriverError'
  what():  TensorRT internal error
Aborted (core dumped)

Environment

TensorRT Version: 8.0.0.3
GPU Type: Quadro M2000
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8.2.0.53
Operating System + Version: Ubuntu 20.04.1
Python Version (if applicable): 3.8.5
TensorFlow Version (if applicable): 2.5.0
Baremetal or Container (if container which image + tag): Baremetal + python virtualenv

Relevant Files

I’ve attached both the original .h5 file from Keras and the converted .onnx file, together with the Netron outputs (in case it helps):
model.h5 (33.8 KB)
model.onnx (1.8 KB)


Steps To Reproduce

python3 trt.py

Hi,
This looks like a Jetson issue. We recommend raising it on the respective platform forum via the link below.

Thanks!

Thanks, but it is NOT a Jetson issue. The code runs fine (in a slightly modified version) on a Jetson with TensorRT 7 but fails on an HP ZBook with the configuration described above.

EDIT: I was wondering if it is related to the name of the input. INPUT_NAME is currently set to “input”, but the .onnx file has “dense_input” as its input. However, the script also crashes if I change INPUT_NAME to “dense_input”.
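
One way to check the name hypothesis is to print what the ONNX parser actually registered on the network, right after parser.parse(...) returns. The following is only a minimal sketch: the helper names describe_inputs and profile_name_matches are made up for illustration, and it assumes `network` is the object returned by builder.create_network() in the script above (it only uses network.num_inputs and network.get_input(i)).

```python
def describe_inputs(network):
    """List (name, shape) for every input of a parsed TensorRT network.

    Only network.num_inputs and network.get_input(i) are used, so any
    object exposing those two members works (useful for a dry run).
    """
    inputs = []
    for i in range(network.num_inputs):
        tensor = network.get_input(i)
        inputs.append((tensor.name, tuple(tensor.shape)))
    return inputs


def profile_name_matches(network, profile_input_name):
    """True if the name given to profile.set_shape() is a real network input."""
    names = [name for name, _ in describe_inputs(network)]
    return profile_input_name in names
```

In build_engine() this could be called as print(describe_inputs(network)). A -1 in a reported shape marks a dynamic dimension. If profile_name_matches(network, INPUT_NAME) is False, the shapes passed to profile.set_shape() are registered under a name the network does not have, which would be consistent with the (0, 5) profile shape the engine reports.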

Thanks, I was able to fix the problem:

  • set the input name to “dense_input”
  • make the first dimension of the input shape dynamic using graphsurgeon
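
For anyone landing here later, the second step can be sketched roughly as follows. This is not the exact code I used, just a minimal sketch: it assumes the onnx and onnx_graphsurgeon packages are installed, the helper names and the output file name './model_dynamic.onnx' are made up for illustration, and it only touches the graph inputs.

```python
def with_dynamic_batch(shape, dynamic=-1):
    """Return a copy of `shape` with its first (batch) dimension marked dynamic."""
    return [dynamic] + list(shape[1:])


def make_batch_dynamic(src_path, dst_path):
    """Rewrite every graph input of an ONNX file to have a dynamic batch dimension."""
    # Imported here so the pure helper above works without these packages.
    import onnx
    import onnx_graphsurgeon as gs

    graph = gs.import_onnx(onnx.load(src_path))
    for tensor in graph.inputs:
        if tensor.shape is not None:
            tensor.shape = with_dynamic_batch(tensor.shape, gs.Tensor.DYNAMIC)
    onnx.save(gs.export_onnx(graph), dst_path)


# Hypothetical usage with the file from the original post:
# make_batch_dynamic('./model.onnx', './model_dynamic.onnx')
```

After that, the optimization profile is set with the real input name, i.e. profile.set_shape('dense_input', ...) instead of 'input', and the shape passed to context.set_binding_shape() has to lie within the profile's min/max range.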