Hello,
I’m following the instructions from the image classification tutorial with ResNet from the NVIDIA TAO CV samples on a custom dataset of six classes. The notebook and the site documentation describe how to export and deploy to the DeepStream SDK, but not how to export and deploy to Triton Inference Server.
The Triton documentation specifies that TensorRT models for Triton Inference Server must have this format:

<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.plan

But after the procedure has been fully executed, the output consists of the following files:

!ls $TAO_EXPERIMENTS_DIR/classification/export
calibration.tensor  final_model_int8_cache.bin  final_model.etlt  final_model.trt

Questions:
- How is config.pbtxt generated?
- Is there an automatic tool or a TAO tool to inspect an optimized model architecture to obtain the last layer name, format, etc.? Trying with the option '--strict-model-config=false' gives an error (details below).
- Is it correct to copy final_model.trt into the Triton model repository and rename it to model.plan?
- Does Triton Server support final_model.etlt?
- Any reference to documentation or tutorial is appreciated.
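
To make the first and third questions concrete, here is a minimal Python sketch of what I am attempting: copy the exported engine into the layout Triton expects and write a hand-made config.pbtxt next to it. The function name build_triton_repo and the template are my own, and the tensor names, the FP32 data type and the output dims [num_classes, 1, 1] are unverified assumptions about the exported engine, not something I found in the documentation. Only platform "tensorrt_plan", the file name model.plan and the 3x224x224 input size come from the Triton log and my training spec.

```python
import shutil
from pathlib import Path

# Hand-written template. Everything except `platform` and the 3x224x224
# input size is an ASSUMPTION that must be checked against the real engine.
CONFIG_TEMPLATE = """name: "{name}"
platform: "tensorrt_plan"
max_batch_size: {max_batch_size}
input [
  {{
    name: "{input_name}"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }}
]
output [
  {{
    name: "{output_name}"
    data_type: TYPE_FP32
    dims: [ {num_classes}, 1, 1 ]
  }}
]
"""


def build_triton_repo(engine_path, repo_root, model_name,
                      input_name, output_name, num_classes,
                      max_batch_size=1, version="1"):
    """Create <repo_root>/<model_name>/{config.pbtxt, <version>/model.plan}."""
    model_dir = Path(repo_root) / model_name
    version_dir = model_dir / version
    version_dir.mkdir(parents=True, exist_ok=True)

    # Copy the TensorRT engine under the default file name Triton looks for.
    shutil.copyfile(engine_path, version_dir / "model.plan")

    # Write the configuration file next to the version directory.
    config = CONFIG_TEMPLATE.format(
        name=model_name,
        max_batch_size=max_batch_size,
        input_name=input_name,
        output_name=output_name,
        num_classes=num_classes,
    )
    (model_dir / "config.pbtxt").write_text(config)
    return model_dir
```

I would call it as build_triton_repo("final_model.trt", "model_repository", "clasificador1", "input_1", "predictions/Softmax", 6), where "input_1" and "predictions/Softmax" are guesses at the tensor names. Copying alone may not be enough, though: as far as I understand, a serialized TensorRT engine can only be loaded by the TensorRT version that built it, which could be related to the deserialization error shown in the details below.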
Details:
• Hardware: NVIDIA RTX3070MaxQ.
• Network Type (Classification)
• TLT Version: 3.22.02
• Triton Server Version: 22.05
• Training spec file:

model_config {
  arch: "resnet",
  n_layers: 18
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/workspace/tao-experiments/data/split/train"
  val_dataset_path: "/workspace/tao-experiments/data/split/val"
  pretrained_model_path: "/workspace/tao-experiments/classification/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 64
  n_epochs: 80
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }
  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tao-experiments/data/split/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}

Details for the error when running with '--strict-model-config=false':
Command:

export TRITON_SERVER_IMAGE="nvcr.io/nvidia/tritonserver:22.05-py3"
docker run --gpus 1 --rm \
  --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v"$PWD/model_repository":/models \
  $TRITON_SERVER_IMAGE /bin/bash -c "tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=16 --log-verbose=1"

Result:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 22.05 (build 38317651)
Triton Server Version 2.22.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 510.73.05 which has support for CUDA 11.6. This container
  was built with CUDA 11.7 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0606 00:43:20.225971 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fc61c000000' with size 268435456
I0606 00:43:20.226203 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0606 00:43:20.226517 1 model_config_utils.cc:645] Server side auto-completed config: name: "clasificador1"
platform: "tensorrt_plan"
default_model_filename: "model.plan"
backend: "tensorrt"
I0606 00:43:20.227006 1 model_repository_manager.cc:1191] loading: clasificador1:1
I0606 00:43:20.327700 1 backend_model.cc:292] Adding default backend config setting: default-max-batch-size,4
I0606 00:43:20.327769 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
I0606 00:43:20.452357 1 tensorrt.cc:5294] TRITONBACKEND_Initialize: tensorrt
I0606 00:43:20.452377 1 tensorrt.cc:5304] Triton TRITONBACKEND API version: 1.9
I0606 00:43:20.452382 1 tensorrt.cc:5310] 'tensorrt' TRITONBACKEND API version: 1.9
I0606 00:43:20.452384 1 tensorrt.cc:5333] Registering TensorRT Plugins
I0606 00:43:20.452392 1 logging.cc:52] Registered plugin creator - ::BatchTilePlugin_TRT version 1
I0606 00:43:20.452395 1 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I0606 00:43:20.452398 1 logging.cc:52] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I0606 00:43:20.452401 1 logging.cc:52] Registered plugin creator - ::CoordConvAC version 1
I0606 00:43:20.452404 1 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I0606 00:43:20.452407 1 logging.cc:52] Registered plugin creator - ::CropAndResizeDynamic version 1
I0606 00:43:20.452410 1 logging.cc:52] Registered plugin creator - ::DecodeBbox3DPlugin version 1
I0606 00:43:20.452413 1 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I0606 00:43:20.452416 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_TRT version 1
I0606 00:43:20.452420 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
I0606 00:43:20.452423 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1
I0606 00:43:20.452426 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1
I0606 00:43:20.452429 1 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I0606 00:43:20.452432 1 logging.cc:52] Registered plugin creator - ::GenerateDetection_TRT version 1
I0606 00:43:20.452436 1 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I0606 00:43:20.452438 1 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0606 00:43:20.452574 1 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0606 00:43:20.452580 1 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I0606 00:43:20.452583 1 logging.cc:52] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
I0606 00:43:20.452588 1 logging.cc:52] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
I0606 00:43:20.452591 1 logging.cc:52] Registered plugin creator - ::DMHA version 1
I0606 00:43:20.452594 1 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I0606 00:43:20.452596 1 logging.cc:52] Registered plugin creator - ::NMSDynamic_TRT version 1
I0606 00:43:20.452600 1 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I0606 00:43:20.452603 1 logging.cc:52] Registered plugin creator - ::PillarScatterPlugin version 1
I0606 00:43:20.452607 1 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I0606 00:43:20.452610 1 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I0606 00:43:20.452621 1 logging.cc:52] Registered plugin creator - ::Proposal version 1
I0606 00:43:20.452624 1 logging.cc:52] Registered plugin creator - ::ProposalDynamic version 1
I0606 00:43:20.452628 1 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0606 00:43:20.452631 1 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I0606 00:43:20.452635 1 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I0606 00:43:20.452638 1 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I0606 00:43:20.452643 1 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I0606 00:43:20.452646 1 logging.cc:52] Registered plugin creator - ::ScatterND version 1
I0606 00:43:20.452653 1 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I0606 00:43:20.452656 1 logging.cc:52] Registered plugin creator - ::Split version 1
I0606 00:43:20.452660 1 logging.cc:52] Registered plugin creator - ::VoxelGeneratorPlugin version 1
I0606 00:43:20.452664 1 tensorrt.cc:5353] backend configuration: {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0606 00:43:20.452713 1 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: clasificador1 (version 1)
I0606 00:43:20.453274 1 model_config_utils.cc:1597] ModelConfig 64-bit fields:
I0606 00:43:20.453280 1 model_config_utils.cc:1599] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0606 00:43:20.453281 1 model_config_utils.cc:1599] ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0606 00:43:20.453283 1 model_config_utils.cc:1599] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0606 00:43:20.453284 1 model_config_utils.cc:1599] ModelConfig::ensemble_scheduling::step::model_version
I0606 00:43:20.453285 1 model_config_utils.cc:1599] ModelConfig::input::dims
I0606 00:43:20.453287 1 model_config_utils.cc:1599] ModelConfig::input::reshape::shape
I0606 00:43:20.453288 1 model_config_utils.cc:1599] ModelConfig::instance_group::secondary_devices::device_id
I0606 00:43:20.453290 1 model_config_utils.cc:1599] ModelConfig::model_warmup::inputs::value::dims
I0606 00:43:20.453291 1 model_config_utils.cc:1599] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0606 00:43:20.453294 1 model_config_utils.cc:1599] ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0606 00:43:20.453295 1 model_config_utils.cc:1599] ModelConfig::output::dims
I0606 00:43:20.453296 1 model_config_utils.cc:1599] ModelConfig::output::reshape::shape
I0606 00:43:20.453298 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0606 00:43:20.453299 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0606 00:43:20.453300 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0606 00:43:20.453302 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::state::dims
I0606 00:43:20.453303 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::state::initial_state::dims
I0606 00:43:20.453304 1 model_config_utils.cc:1599] ModelConfig::version_policy::specific::versions
I0606 00:43:20.909592 1 logging.cc:49] [MemUsageChange] Init CUDA: CPU +464, GPU +0, now: CPU 485, GPU 2821 (MiB)
I0606 00:43:20.917427 1 logging.cc:49] Loaded engine size: 10 MiB
E0606 00:43:20.919569 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
E0606 00:43:20.919583 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)

Thanks in advance.
Kind regards,
Nicolás