Generation of Triton Inference Server configuration for TensorRT exported model of TAO classification (resnet)

nhorro · June 6, 2022, 12:25am

Hello,

I’m following the instructions from the image classification (ResNet) tutorial in the NVIDIA TAO CV samples, using a custom dataset with six classes. The notebook and the documentation describe how to export and deploy to the DeepStream SDK, but not how to export and deploy to Triton Inference Server.

The Triton documentation specifies that TensorRT models for Triton Inference Server must have this format:

<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.plan
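A minimal Python sketch, assuming the repository layout shown above and a strict configuration (config.pbtxt required), that checks a model directory before starting the server. The helper name check_tensorrt_model_dir is hypothetical; the rule that version directories are numeric and that names starting with 0 are ignored follows the Triton model repository documentation.

```python
from pathlib import Path


def check_tensorrt_model_dir(repo: Path, model_name: str, require_config: bool = True) -> list:
    """Return a list of layout problems for <repo>/<model_name>; empty means it looks right."""
    model_dir = repo / model_name
    problems = []
    if require_config and not (model_dir / "config.pbtxt").is_file():
        problems.append(f"missing {model_name}/config.pbtxt")
    # Version directories must be numeric; names starting with "0" are ignored by Triton.
    versions = []
    if model_dir.is_dir():
        versions = [p for p in model_dir.iterdir()
                    if p.is_dir() and p.name.isdigit() and not p.name.startswith("0")]
    if not versions:
        problems.append(f"no numeric version directory (e.g. 1/) under {model_name}/")
    for v in sorted(versions):
        if not (v / "model.plan").is_file():
            problems.append(f"missing {model_name}/{v.name}/model.plan")
    return problems


if __name__ == "__main__":
    print(check_tensorrt_model_dir(Path("model_repository"), "clasificador1"))
```

Pass require_config=False when relying on --strict-model-config=false, where Triton tries to complete the configuration on its own.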

But after the procedure is completely executed, the output consists of the following files:

!ls $TAO_EXPERIMENTS_DIR/classification/export
calibration.tensor  final_model_int8_cache.bin  final_model.etlt  final_model.trt

Questions:

How is config.pbtxt generated?
Is there an automatic tool, or a TAO tool, to inspect an optimized model and obtain its last layer name, format, etc.? Trying the --strict-model-config=false option gives an error (details below).
Is it correct to copy final_model.trt into the Triton model repository and rename it to model.plan?
Does Triton Server support final_model.etlt?
Any reference to documentation or tutorial is appreciated.

Details:

• Hardware: NVIDIA RTX3070MaxQ.
• Network Type: Classification
• TLT Version: 3.22.02
• Triton Server Version: 22.05
• Training spec file:

model_config {
  arch: "resnet",
  n_layers: 18
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/workspace/tao-experiments/data/split/train"
  val_dataset_path: "/workspace/tao-experiments/data/split/val"
  pretrained_model_path: "/workspace/tao-experiments/classification/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 64
  n_epochs: 80
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }
  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tao-experiments/data/split/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}

Details of the error when running with --strict-model-config=false:

Command:

export TRITON_SERVER_IMAGE="nvcr.io/nvidia/tritonserver:22.05-py3"
docker run --gpus 1 --rm \
  --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v"$PWD/model_repository":/models \
  $TRITON_SERVER_IMAGE /bin/bash -c "tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=16 --log-verbose=1"

Result:

=============================
== Triton Inference Server ==
=============================
NVIDIA Release 22.05 (build 38317651)
Triton Server Version 2.22.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 510.73.05 which has support for CUDA 11.6. This container
  was built with CUDA 11.7 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
I0606 00:43:20.225971 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fc61c000000' with size 268435456
I0606 00:43:20.226203 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0606 00:43:20.226517 1 model_config_utils.cc:645] Server side auto-completed config: name: "clasificador1"
platform: "tensorrt_plan"
default_model_filename: "model.plan"
backend: "tensorrt"
I0606 00:43:20.227006 1 model_repository_manager.cc:1191] loading: clasificador1:1
I0606 00:43:20.327700 1 backend_model.cc:292] Adding default backend config setting: default-max-batch-size,4
I0606 00:43:20.327769 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
I0606 00:43:20.452357 1 tensorrt.cc:5294] TRITONBACKEND_Initialize: tensorrt
I0606 00:43:20.452377 1 tensorrt.cc:5304] Triton TRITONBACKEND API version: 1.9
I0606 00:43:20.452382 1 tensorrt.cc:5310] 'tensorrt' TRITONBACKEND API version: 1.9
I0606 00:43:20.452384 1 tensorrt.cc:5333] Registering TensorRT Plugins
I0606 00:43:20.452392 1 logging.cc:52] Registered plugin creator - ::BatchTilePlugin_TRT version 1
I0606 00:43:20.452395 1 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I0606 00:43:20.452398 1 logging.cc:52] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I0606 00:43:20.452401 1 logging.cc:52] Registered plugin creator - ::CoordConvAC version 1
I0606 00:43:20.452404 1 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I0606 00:43:20.452407 1 logging.cc:52] Registered plugin creator - ::CropAndResizeDynamic version 1
I0606 00:43:20.452410 1 logging.cc:52] Registered plugin creator - ::DecodeBbox3DPlugin version 1
I0606 00:43:20.452413 1 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I0606 00:43:20.452416 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_TRT version 1
I0606 00:43:20.452420 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
I0606 00:43:20.452423 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1
I0606 00:43:20.452426 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1
I0606 00:43:20.452429 1 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I0606 00:43:20.452432 1 logging.cc:52] Registered plugin creator - ::GenerateDetection_TRT version 1
I0606 00:43:20.452436 1 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I0606 00:43:20.452438 1 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0606 00:43:20.452574 1 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0606 00:43:20.452580 1 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I0606 00:43:20.452583 1 logging.cc:52] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
I0606 00:43:20.452588 1 logging.cc:52] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
I0606 00:43:20.452591 1 logging.cc:52] Registered plugin creator - ::DMHA version 1
I0606 00:43:20.452594 1 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I0606 00:43:20.452596 1 logging.cc:52] Registered plugin creator - ::NMSDynamic_TRT version 1
I0606 00:43:20.452600 1 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I0606 00:43:20.452603 1 logging.cc:52] Registered plugin creator - ::PillarScatterPlugin version 1
I0606 00:43:20.452607 1 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I0606 00:43:20.452610 1 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I0606 00:43:20.452621 1 logging.cc:52] Registered plugin creator - ::Proposal version 1
I0606 00:43:20.452624 1 logging.cc:52] Registered plugin creator - ::ProposalDynamic version 1
I0606 00:43:20.452628 1 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0606 00:43:20.452631 1 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I0606 00:43:20.452635 1 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I0606 00:43:20.452638 1 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I0606 00:43:20.452643 1 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I0606 00:43:20.452646 1 logging.cc:52] Registered plugin creator - ::ScatterND version 1
I0606 00:43:20.452653 1 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I0606 00:43:20.452656 1 logging.cc:52] Registered plugin creator - ::Split version 1
I0606 00:43:20.452660 1 logging.cc:52] Registered plugin creator - ::VoxelGeneratorPlugin version 1
I0606 00:43:20.452664 1 tensorrt.cc:5353] backend configuration: {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0606 00:43:20.452713 1 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: clasificador1 (version 1)
I0606 00:43:20.453274 1 model_config_utils.cc:1597] ModelConfig 64-bit fields:
I0606 00:43:20.453280 1 model_config_utils.cc:1599]	ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0606 00:43:20.453281 1 model_config_utils.cc:1599]	ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0606 00:43:20.453283 1 model_config_utils.cc:1599]	ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0606 00:43:20.453284 1 model_config_utils.cc:1599]	ModelConfig::ensemble_scheduling::step::model_version
I0606 00:43:20.453285 1 model_config_utils.cc:1599]	ModelConfig::input::dims
I0606 00:43:20.453287 1 model_config_utils.cc:1599]	ModelConfig::input::reshape::shape
I0606 00:43:20.453288 1 model_config_utils.cc:1599]	ModelConfig::instance_group::secondary_devices::device_id
I0606 00:43:20.453290 1 model_config_utils.cc:1599]	ModelConfig::model_warmup::inputs::value::dims
I0606 00:43:20.453291 1 model_config_utils.cc:1599]	ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0606 00:43:20.453294 1 model_config_utils.cc:1599]	ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0606 00:43:20.453295 1 model_config_utils.cc:1599]	ModelConfig::output::dims
I0606 00:43:20.453296 1 model_config_utils.cc:1599]	ModelConfig::output::reshape::shape
I0606 00:43:20.453298 1 model_config_utils.cc:1599]	ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0606 00:43:20.453299 1 model_config_utils.cc:1599]	ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0606 00:43:20.453300 1 model_config_utils.cc:1599]	ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0606 00:43:20.453302 1 model_config_utils.cc:1599]	ModelConfig::sequence_batching::state::dims
I0606 00:43:20.453303 1 model_config_utils.cc:1599]	ModelConfig::sequence_batching::state::initial_state::dims
I0606 00:43:20.453304 1 model_config_utils.cc:1599]	ModelConfig::version_policy::specific::versions
I0606 00:43:20.909592 1 logging.cc:49] [MemUsageChange] Init CUDA: CPU +464, GPU +0, now: CPU 485, GPU 2821 (MiB)
I0606 00:43:20.917427 1 logging.cc:49] Loaded engine size: 10 MiB
E0606 00:43:20.919569 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
E0606 00:43:20.919583 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
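With --strict-model-config=false, Triton builds the configuration itself (the "Server side auto-completed config" line in the log above). Once an engine actually loads, which it does not here because of the deserialization error, the completed configuration can be read back from Triton's model-configuration endpoint, GET /v2/models/<name>/config, and used as a starting point for config.pbtxt. A minimal sketch using only the Python standard library, assuming the HTTP port mapping 8000:8000 from the docker command above; the helper names are hypothetical, and the endpoint returns JSON rather than protobuf text.

```python
import json
import urllib.request

TRITON_HTTP = "http://localhost:8000"  # ASSUMPTION: -p 8000:8000 as in the docker command


def model_config_url(model_name: str, base: str = TRITON_HTTP) -> str:
    """URL of Triton's model-configuration endpoint for a loaded model."""
    return f"{base}/v2/models/{model_name}/config"


def fetch_model_config(model_name: str, base: str = TRITON_HTTP, timeout: float = 5.0) -> dict:
    """Return the server's (possibly auto-completed) configuration, parsed from JSON."""
    with urllib.request.urlopen(model_config_url(model_name, base), timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_model_config("clasificador1") followed by a look at its "input" and "output" entries shows the tensor names and shapes Triton read from the engine.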

Thanks in advance.
Kind regards,

Nicolás

Morganh · June 6, 2022, 1:06pm

For classification inference in Triton, please refer to the classification section of NVIDIA-AI-IOT/tao-toolkit-triton-apps (sample app code for deploying TAO Toolkit trained models to Triton) on GitHub:
https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps/blob/main/docs/configuring_the_client.md#classification
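For a classification engine like this one, config.pbtxt can also be written by hand. The sketch below renders one under several assumptions: the input tensor name input_1, FP32 input and output, and an output shape of [num_classes, 1, 1] are assumptions to verify against the engine (for example with Polygraphy), while predictions/Softmax, 3,224,224 and the maximum batch size of 64 come from the tao converter command later in this thread. The function name classification_config is hypothetical.

```python
from typing import Tuple


def classification_config(
    model_name: str,
    num_classes: int,
    input_name: str = "input_1",               # ASSUMPTION: verify against the engine
    output_name: str = "predictions/Softmax",  # from the converter's -o flag
    dims: Tuple[int, ...] = (3, 224, 224),     # from -d / input_image_size (C,H,W)
    max_batch_size: int = 64,                  # must not exceed the engine's -m value
) -> str:
    """Render a minimal config.pbtxt for a TensorRT classification engine."""
    dims_str = ", ".join(str(d) for d in dims)
    return f"""name: "{model_name}"
platform: "tensorrt_plan"
max_batch_size: {max_batch_size}
input [
  {{
    name: "{input_name}"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ {dims_str} ]
  }}
]
output [
  {{
    name: "{output_name}"
    data_type: TYPE_FP32
    dims: [ {num_classes}, 1, 1 ]  # ASSUMPTION: verify against the engine
  }}
]
dynamic_batching {{ }}
"""


if __name__ == "__main__":
    print(classification_config("clasificador1", num_classes=6))
```

The name field should match the model directory name in the repository (clasificador1 in the log above).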

For the input and output layer names, refer to the forum post "TAO unet input and output tensor shapes and order" (reply #3 by Morganh).

Yes, you can rename.
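Copying and renaming can be scripted. A minimal sketch, assuming the engine file is final_model.trt and the model name is clasificador1 (from the log above); install_engine is a hypothetical helper. The auto-completed config in the log shows model.plan as the default file name for the tensorrt_plan platform; the engine bytes are unchanged, so the engine must still match the TensorRT version inside the Triton container.

```python
import shutil
from pathlib import Path


def install_engine(engine: Path, repo: Path, model_name: str, version: int = 1) -> Path:
    """Copy a serialized TensorRT engine to <repo>/<model_name>/<version>/model.plan."""
    if version < 1:
        raise ValueError("Triton version directories are positive integers")
    target_dir = repo / model_name / str(version)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "model.plan"  # default file name for the tensorrt_plan platform
    shutil.copyfile(engine, target)     # same bytes, new name
    return target
```

For example: install_engine(Path("export/final_model.trt"), Path("model_repository"), "clasificador1").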

Triton does not load the .etlt file directly; you need to generate a TensorRT engine (i.e., model.plan) from it.
See https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps/blob/main/scripts/download_and_convert.sh#L30

nhorro · June 6, 2022, 3:01pm

Thanks for the fast and clear answer. Tagged as solution.
Best wishes,

Nicolás

nhorro · June 7, 2022, 2:42am

Update. Some additional details in case someone finds them useful.

Regarding the error:

E0606 00:43:20.919569 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
E0606 00:43:20.919583 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)

It seems that the TensorRT version used to build the engine must be exactly the same as the one used for inference; it is not enough that both are 8.x.
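The difference between the check that is sufficient and the one that is not can be made explicit. A minimal sketch, assuming version strings like those printed by pip (e.g. 8.0.1.6); the function names are hypothetical, and exact equality is a conservative rule, not an official TensorRT compatibility API.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'8.0.1.6' -> (8, 0, 1, 6)"""
    return tuple(int(part) for part in version.strip().split("."))


def same_major(build_version: str, runtime_version: str) -> bool:
    """The weak check (both 8.x); NOT enough for a serialized engine to load."""
    return parse_version(build_version)[0] == parse_version(runtime_version)[0]


def engine_versions_match(build_version: str, runtime_version: str) -> bool:
    """Conservative rule: build and runtime TensorRT versions must be identical."""
    return parse_version(build_version) == parse_version(runtime_version)
```

Two different 8.x versions pass same_major but fail engine_versions_match, which is the situation behind the "Version tag does not match" error.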

In my case, I was using TAO Toolkit 3.22.02, which, when the converter is invoked with this command:

tao converter $USER_EXPERIMENT_DIR/export/final_model.etlt \
  -k $KEY \
  -c $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
  -o predictions/Softmax \
  -d 3,224,224 \
  -i nchw \
  -m 64 -t int8 \
  -e $USER_EXPERIMENT_DIR/export/final_model.trt \
  -b 64

reports that the nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 container is being used:

2022-06-05 17:56:21,846 [INFO] root: Registry: ['nvcr.io']
2022-06-05 17:56:21,887 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
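Because the converter runs inside that container, the engine is serialized with that container's TensorRT version. The sketch below rebuilds the converter command shown above as an argument list so the values can be reused elsewhere, for example to keep -m in sync with max_batch_size in config.pbtxt. The flag comments summarize the tao-converter options as understood here and should be checked against the documentation for your version; tao_converter_cmd is a hypothetical helper and KEY is a placeholder.

```python
import shlex
from typing import List


def tao_converter_cmd(etlt: str, key: str, cal_cache: str, engine: str,
                      output_node: str = "predictions/Softmax",
                      input_dims: str = "3,224,224",
                      max_batch: int = 64, calib_batch: int = 64,
                      dtype: str = "int8") -> List[str]:
    """Mirror the tao converter invocation above as an argv list."""
    return [
        "tao", "converter", etlt,
        "-k", key,                # key used when the model was exported
        "-c", cal_cache,          # INT8 calibration cache
        "-o", output_node,        # output node name(s)
        "-d", input_dims,         # input dimensions (C,H,W)
        "-i", "nchw",             # input dimension ordering
        "-m", str(max_batch),     # maximum batch size of the engine
        "-t", dtype,              # engine precision
        "-e", engine,             # path of the serialized engine to write
        "-b", str(calib_batch),   # calibration batch size
    ]


if __name__ == "__main__":
    cmd = tao_converter_cmd("final_model.etlt", "KEY", "final_model_int8_cache.bin", "final_model.trt")
    print(shlex.join(cmd))  # shell-quoted form of the same command
```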

The following command can be used to obtain the TensorRT version in this container:

docker run --rm -it nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 /bin/bash -c "pip list | grep tensorrt"
tensorrt 8.0.1.6

So a Triton Server release built with TensorRT 8.0.1.6 is required.
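The version lookup can be scripted as well. A minimal sketch, assuming pip list output in the "<name> <version>" form shown above; tensorrt_version_from_pip is a hypothetical helper, and it matches only the package named exactly tensorrt.

```python
import re
from typing import Optional


def tensorrt_version_from_pip(pip_list_output: str) -> Optional[str]:
    """Return the version of the 'tensorrt' package in pip list output, or None."""
    for line in pip_list_output.splitlines():
        # pip list prints "<name> <version>" with variable spacing.
        match = re.fullmatch(r"tensorrt\s+(\S+)", line.strip())
        if match:
            return match.group(1)
    return None
```

For example, capture the stdout of the docker command above (without -it) via subprocess.run(..., capture_output=True, text=True), pass it to this function, and compare the result with the TensorRT version of the Triton container.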

The same applies when inspecting the model with Polygraphy; install the matching TensorRT version:

pip install nvidia-tensorrt==8.0.1.6 polygraphy

Morganh · June 7, 2022, 3:13am

You can copy your .etlt model over resnet18_vehicletypenet_pruned.etlt and then let the Triton server generate its TensorRT engine.

nhorro · June 9, 2022, 2:13am

I tried dropping the .etlt file in place of model.plan, but at least nvcr.io/nvidia/tritonserver:21.08-py3 reports that it is unable to load the model after trying all backends (ONNX, TensorFlow, TensorRT, etc.). Is a backend for .etlt added in a later version, or is it installed in an additional step?

Morganh · June 9, 2022, 6:40am

The Triton app generates model.plan (the TensorRT engine) from the .etlt model. See scripts/download_and_convert.sh in NVIDIA-AI-IOT/tao-toolkit-triton-apps on GitHub.
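The "build the engine where it will be served" idea can be sketched in Python. This is not the actual download_and_convert.sh script: ensure_engine is a hypothetical helper that runs a converter command only when model.plan is missing, and it assumes that command uses the same TensorRT version as the Triton container that will load the engine.

```python
import subprocess
from pathlib import Path
from typing import List


def ensure_engine(plan: Path, convert_cmd: List[str]) -> bool:
    """Run convert_cmd only if plan does not exist yet; return True if it ran."""
    if plan.is_file():
        return False  # reuse the existing engine
    plan.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(convert_cmd, check=True)  # raises CalledProcessError on failure
    if not plan.is_file():
        raise FileNotFoundError(f"converter finished but {plan} was not created")
    return True
```

Running the conversion at deployment time, rather than copying an engine built elsewhere, is what avoids the serialization version mismatch reported earlier in the thread.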

system · June 23, 2022, 6:40am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
