Inference time is linear respective to batch size while using TENSORRT MODEL

bschandu67 · February 2, 2021, 12:41pm

Description

Inference time linear proportionality with batch size while using Tensorrt engine for scaledyolov4 for object detection(scaled yolov4).
A clear and concise description of the bug or issue.
When I am increasing batch size, inference time is increasing linearly.

Environment

TensorRT Version:
Checked on two versions (7.2.2 and 7.0.0)
GPU Type:
Tesla T4
Nvidia Driver Version:
455
CUDA Version:
7.2.2 with cuda-11.1 and 7.0.0 with cuda-10.2
CUDNN Version:
7 with trt-7.0.0 and 8 with trt-7.2.2
Operating System + Version:
ubuntu-18.04
Python Version (if applicable):
3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
nvcr.io/nvidia/tensorrt:20.12-py3 - trt-7.2.2
nvcr.io/nvidia/tensorrt:20.03-py3 - trt-7.0.0

FOR BATCH SIZE - 1:

Inference take: 48.5283 ms. Inference take: 48.518 ms. Inference take: 40.1897 ms. Inference take: 40.0713 ms. Inference take: 38.54 ms. Inference take: 38.7829 ms. Inference take: 38.6083 ms. Inference take: 38.6635 ms. Inference take: 38.1827 ms. Inference take: 38.1016 ms

FOR BATCH SIZE - 2:

Inference take: 76.3045 ms. Inference take: 74.9346 ms. Inference take: 73.3341 ms. Inference take: 73.9554 ms. Inference take: 73.4185 ms. Inference take: 75.4546 ms. Inference take: 77.7809 ms. Inference take: 78.3289 ms. Inference take: 79.5533 ms. Inference take: 79.0556 ms. Inference take: 79.2939 ms. Inference take: 77.214 ms.

FOR BATCH SIZE - 4:

Inference take: 158.327 ms. Inference take: 157.001 ms. Inference take: 157.107 ms. Inference take: 154.237 ms. Inference take: 155.899 ms. Inference take: 157.408 ms. Inference take: 155.758 ms. Inference take: 155.906 ms.

I expected batch size not to have this proportionality. Can anything done to improve the inference time using batching?
TIY.

NVES · February 2, 2021, 1:07pm

Hi, Request you to share your model and script, so that we can help you better.

Alternatively, you can try running your model with trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks!

bschandu67 · February 2, 2021, 1:21pm

ScaledYOLOv4 model
I am using this model for object detection.

Model to ONNX conversion script

Followed this repository to convert my onnx model to tensorrt model.

I will try to run my model with trtexec. But at the EOD, I need to export my model as a python library. So I can’t use trtexec(which hinders my end goal ).
Thanks!

spolisetty · February 3, 2021, 9:31am

Hi @bschandu67,

Could you please check the gpu utilization.

Thank you.

bschandu67 · February 3, 2021, 12:35pm

83% volatile GPU and 1.7 GB memory was used for batch size 1 model. and 100% volatile GPU and 2.7 GB memory was used for batch size 4 model.
While using baremetal tensorrt engine for inference using python, 46% volatile for batch size 1 and 100% volatile for batch size 4.

spolisetty · February 5, 2021, 11:38am

Hi @bschandu67,

Thanks for providing gpu utilization. It is fine. Could you please also provide the engine build verbose log, and inference layer performance.

build: trtexec --verbose ..... inference: trtexec --dumpProfile --separateProfileRun

Thank you.

Aref · May 5, 2021, 11:33am

Hi @bschandu67 .

Have you been able to reduce the runtime with higher batch sizes?

Thanks!

bschandu67 · May 5, 2021, 11:59am

Yes.
But not by a huge margin.
With batching, we were able to achieve 4ms less per frame.

Aref · May 5, 2021, 2:18pm

Right. I realized that when the inference uses GPU’s full capacity even with a batch size of 1, increasing the batch size wouldn’t help much, according to this thread.

Was it the same case with your problem?

Thanks a lot!

Topic		Replies	Views
TensorRT Batching Speed scales poorly TensorRT tensorrt , cuda	6	1822	September 30, 2021
TensorRT 5.0.2 Batch Size Problem: bigger batch size Inference Time increase??? General	6	1607	October 12, 2021
TRT inference on batches is not giving any performance benefit Jetson TX2 tensorrt , nvbugs	11	1303	October 18, 2021
No speedup on batch size larger than 1 TensorRT tensorrt , pytorch	4	1680	July 31, 2020
Inference on large batch size TensorRT	5	4670	September 21, 2018
Batch Inference using BatchSize=8 takes nearly as long as 8 individual runs of BatchSize=1 TensorRT	3	1266	July 20, 2021
Inference time is not improving with the increase in batch size TensorRT	8	2011	June 1, 2022
ResNet18: Batch size 1 works, but batch size 10, 32 only has minor acceleration TensorRT	2	1809	February 20, 2020
The larger the batch size, the better when build engine? TensorRT tensorrt	3	1790	July 29, 2020
Latency linearly increases when increased batch size or concurrent models Tensorrt Triton Inference Server (archived) tensorrt	3	1895	October 1, 2021

Inference time is linear respective to batch size while using TENSORRT MODEL

Description

Environment

Related topics