TensorRT 8.0.3 ImageNet ResNet model INT8 conversion: identical output for different inputs after calibration

Description

I am trying to convert my resnet100 classification model to INT8. I have inference and validation code, and I confirmed that both the FP32 and FP16 engines give accurate results. INT8, however, has given me a lot of trouble.

After carefully following image_batcher.py and build_engine.py from TensorRT/samples/python/efficientnet at master · NVIDIA/TensorRT · GitHub, I made the batcher use preprocessing identical to my inference preprocessing, and I validated the value range of the preprocessed input tensors; they looked correct. The program generated a calibration file from my directory of 5000 images (the sample below used fewer) and eventually saved an engine to disk. However, when I examined the feature vectors, different input images produced identical (and incorrect) outputs, unlike the same code in FP32 and FP16 modes. My evaluation ended up with 0 accuracy.
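To make the symptom concrete, this is essentially the check that surfaced the problem (a sketch with made-up feature vectors; `feats_fp32` / `feats_int8` are hypothetical placeholders for what each engine returns for three *different* input images):

```python
import numpy as np

def all_identical(feats):
    """True if every feature vector in the list is numerically the same."""
    return all(np.allclose(feats[0], f) for f in feats[1:])

# Hypothetical outputs for three different input images:
feats_fp32 = [np.array([0.1, 0.9]), np.array([0.4, 0.2]), np.array([0.7, 0.5])]
feats_int8 = [np.array([0.3, 0.3]), np.array([0.3, 0.3]), np.array([0.3, 0.3])]

print(all_identical(feats_fp32))  # False -- FP32/FP16 behave normally
print(all_identical(feats_int8))  # True  -- the INT8 symptom described above
```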

My ONNX model has a dynamic batch axis, so I added a dynamic profile with batch sizes ranging from 1 to 256. The batch size used for calibration made no difference.
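Note that the build log below warns "Calibration Profile is not defined", so calibration falls back to Profile 0. A hedged sketch of pinning an explicit calibration profile instead (TRT 8's `IBuilderConfig.set_calibration_profile`; the function and fake objects here are only to illustrate the call pattern, the shape 8×3×224×224 is an assumed fixed calibration shape):

```python
def add_calibration_profile(builder, config, input_name, shape):
    """Attach an explicit fixed-shape calibration profile so INT8
    calibration does not fall back to Profile 0 on a dynamic-batch network.
    `builder` and `config` are TensorRT IBuilder / IBuilderConfig objects."""
    profile = builder.create_optimization_profile()
    # min == opt == max: calibrate at one fixed shape
    profile.set_shape(input_name, shape, shape, shape)
    config.set_calibration_profile(profile)
    return profile

# Real usage would be:
#   add_calibration_profile(self.builder, self.config, "input.1", (8, 3, 224, 224))
```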

Environment

TensorRT Version : 8.0.3
GPU Type : Tesla T4
Nvidia Driver Version : 450.119.03
CUDA Version : In container cuda-11.5
CUDNN Version :
Operating System + Version : ec2 instance
Python Version (if applicable) : 3.8.10
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) : Container nvcr.io/nvidia/tensorrt:21.11-py3

Steps To Reproduce

Here is my customization of TensorRT/build_engine.py at master · NVIDIA/TensorRT · GitHub:

```python
log.info("Network Description")
for input in inputs:
    self.batch_size = args.calib_batch_size
    log.info("Input '{}' with shape {} and dtype {}".format(input.name, input.shape, input.dtype))
    profile = self.builder.create_optimization_profile()
    profile.set_shape(input.name,
                      (1, 3, input.shape[2], input.shape[3]),
                      (128, 3, input.shape[2], input.shape[3]),
                      (256, 3, input.shape[2], input.shape[3]))
    self.config.add_optimization_profile(profile)
for output in outputs:
    log.info("Output '{}' with shape {} and dtype {}".format(output.name, output.shape, output.dtype))
assert self.batch_size > 0
self.builder.max_batch_size = 256
```

Here is the output from that build script:

[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +321, GPU +0, now: CPU 342, GPU 252 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
INFO:EngineBuilder:Network Description
INFO:EngineBuilder:Input 'input.1' with shape (-1, 3, 224, 224) and dtype DataType.FLOAT
INFO:EngineBuilder:Output '1333' with shape (-1, 512) and dtype DataType.FLOAT
INFO:EngineBuilder:Building int8 Engine in /training/models/resnet100.trt
/training/docker-tensorrt-workbench/build_engine.py:209: DeprecationWarning: Use build_serialized_network instead.
  with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
[TensorRT] INFO: [MemUsageSnapshot] Builder begin: CPU 592 MiB, GPU 254 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +507, GPU +220, now: CPU 1101, GPU 474 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +114, GPU +52, now: CPU 1215, GPU 526 (MiB)
[TensorRT] WARNING: Calibration Profile is not defined. Running calibration with Profile 0
[TensorRT] INFO: Detected 1 inputs and 1 output network tensors.
[TensorRT] INFO: Total Host Persistent Memory: 22528
[TensorRT] INFO: Total Device Persistent Memory: 0
[TensorRT] INFO: Total Scratch Memory: 0
[TensorRT] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1738, GPU 1002 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 1739, GPU 1010 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1738, GPU 994 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1738, GPU 978 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation begin: CPU 1738 MiB, GPU 978 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1738, GPU 986 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1738, GPU 994 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation end: CPU 1738 MiB, GPU 1902 MiB
[TensorRT] INFO: Starting Calibration.
INFO:EngineBuilder:Calibrating image 4 / 40
[TensorRT] INFO: Calibrated batch 0 in 3.5102 seconds.
INFO:EngineBuilder:Calibrating image 8 / 40
[TensorRT] INFO: Calibrated batch 1 in 3.51987 seconds.
INFO:EngineBuilder:Calibrating image 12 / 40
[TensorRT] INFO: Calibrated batch 2 in 3.54024 seconds.
INFO:EngineBuilder:Calibrating image 16 / 40
[TensorRT] INFO: Calibrated batch 3 in 3.54137 seconds.
INFO:EngineBuilder:Calibrating image 20 / 40
[TensorRT] INFO: Calibrated batch 4 in 3.5381 seconds.
INFO:EngineBuilder:Calibrating image 24 / 40
[TensorRT] INFO: Calibrated batch 5 in 3.53881 seconds.
INFO:EngineBuilder:Calibrating image 28 / 40
[TensorRT] INFO: Calibrated batch 6 in 3.56045 seconds.
INFO:EngineBuilder:Calibrating image 32 / 40
[TensorRT] INFO: Calibrated batch 7 in 3.56059 seconds.
INFO:EngineBuilder:Calibrating image 36 / 40
[TensorRT] INFO: Calibrated batch 8 in 3.55971 seconds.
INFO:EngineBuilder:Calibrating image 40 / 40
[TensorRT] INFO: Calibrated batch 9 in 3.56199 seconds.
INFO:EngineBuilder:Finished calibration batches
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1746, GPU 1886 (MiB)
[TensorRT] INFO: Post Processing Calibration data in 38.0515 seconds.
[TensorRT] INFO: Calibration completed in 78.6834 seconds.
[TensorRT] INFO: Writing Calibration Cache for calibrator: TRT-8003-EntropyCalibration2
INFO:EngineBuilder:Writing calibration cache data to: calibration.cache
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1781, GPU 734 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1781, GPU 742 (MiB)
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] INFO: Detected 1 inputs and 1 output network tensors.
[TensorRT] INFO: Total Host Persistent Memory: 243872
[TensorRT] INFO: Total Device Persistent Memory: 53171200
[TensorRT] INFO: Total Scratch Memory: 512
[TensorRT] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 148 MiB, GPU 910 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1870, GPU 828 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1870, GPU 836 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1870, GPU 820 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1869, GPU 804 (MiB)

Here is my generated calibration cache:

TRT-8003-EntropyCalibration2 input.1: 7f800000 1334: 7f800000 (Unnamed Layer* 1) [Constant]_output: 3bd7b8b7 (Unnamed Layer* 2) [Shuffle]_output: 3bd7b8b7 929: 7f800000 930: 7f800000 1337: 7f800000 (Unnamed Layer* 6) [Constant]_output: 3b0efec6 (Unnamed Layer* 7) [Shuffle]_output: 3b0efec6 934: 7f800000 1340: 7f800000 1343: 7ab458b8 939: 7f800000 940: 7f800000 1346: 7f800000 (Unnamed Layer* 14) [Constant]_output: 3ad4d621 (Unnamed Layer* 15) [Shuffle]_output: 3ad4d621 944: 7f800000 1349: 7a90e807 947: 7f800000 948: 7f800000 1352: 7f800000 (Unnamed Layer* 21) [Constant]_output: 3a969a81 (Unnamed Layer* 22) [Shuffle]_output: 3a969a81 952: 7f800000 1355: 7b035e84 955: 7f800000 956: 7f800000 1358: 7f800000 (Unnamed Layer* 28) [Constant]_output: 3b0ee760 (Unnamed Layer* 29) [Shuffle]_output: 3b0ee760 960: 7f800000 1361: 7a4e9ef7 1364: 7a0ed897 965: 7a805efb 966: 79ff7b20 1367: 7a0bd9a8 (Unnamed Layer* 36) [Constant]_output: 3acaee93 (Unnamed Layer* 37) [Shuffle]_output: 3acaee93 970: 79a74866 1370: 7988f33e 973: 7a24191d 974: 79b4ce9c 1373: 7984a58b (Unnamed Layer* 43) [Constant]_output: 3aab3324 (Unnamed Layer* 44) [Shuffle]_output: 3aab3324 978: 78901743 1376: 760b50ec 981: 760b50ec 982: 761d6dcb 1379: 75e783b4 (Unnamed Layer* 50) [Constant]_output: 3a8b7a92 (Unnamed Layer* 51) [Shuffle]_output: 3a8b7a92 986: 750782a1 1382: 3ad03e59 989: 3bbd7be6 990: 3aff2390 1385: 3b75658c (Unnamed Layer* 57) [Constant]_output: 3a9a2797 (Unnamed Layer* 58) [Shuffle]_output: 3a9a2797 994: 3b367975 1388: 3b578819 997: 3c0d3eb2 998: 3b7a7d0e 1391: 3b92bad5 (Unnamed Layer* 64) [Constant]_output: 3a9f9490 (Unnamed Layer* 65) [Shuffle]_output: 3a9f9490 1002: 3b2019b6 1394: 3b81d8d4 1005: 3c50a850 1006: 3ba35a12 1397: 3bc35d71 (Unnamed Layer* 71) [Constant]_output: 3af3ad53 (Unnamed Layer* 72) [Shuffle]_output: 3af3ad53 1010: 3b2e08b5 1400: 3b8c3ecb 1013: 3c5b1b0f 1014: 3b954245 1403: 3ba2990d (Unnamed Layer* 78) [Constant]_output: 3aca1921 (Unnamed Layer* 79) [Shuffle]_output: 3aca1921 
1018: 3b1c5c5f 1406: 3bcbac74 1021: 3c5ee144 1022: 3b990fbf 1409: 3baa9825 (Unnamed Layer* 85) [Constant]_output: 3b05ee7a (Unnamed Layer* 86) [Shuffle]_output: 3b05ee7a 1026: 3b207e7b 1412: 3b8a2d2d 1029: 3c6ec711 1030: 3ba839f0 1415: 3b90c2e7 (Unnamed Layer* 92) [Constant]_output: 3abfafff (Unnamed Layer* 93) [Shuffle]_output: 3abfafff 1034: 3b12b5ec 1418: 3b84d275 1037: 3c828356 1038: 3b62060b 1421: 3b82c46e (Unnamed Layer* 99) [Constant]_output: 3ad0594a (Unnamed Layer* 100) [Shuffle]_output: 3ad0594a 1042: 3b23cf3f 1424: 3bc19cb1 1045: 3c9c3911 1046: 3b89f6de 1427: 3b8b7f01 (Unnamed Layer* 106) [Constant]_output: 3aa2b279 (Unnamed Layer* 107) [Shuffle]_output: 3aa2b279 1050: 3b0d6831 1430: 3bbff987 1053: 3cb5e90b 1054: 3b989c5e 1433: 3b7f3220 (Unnamed Layer* 113) [Constant]_output: 3aacd63c (Unnamed Layer* 114) [Shuffle]_output: 3aacd63c 1058: 3b186342 1436: 3ba70bdc 1061: 3cde4ad0 1062: 3bd0bc14 1439: 3bf217bf (Unnamed Layer* 120) [Constant]_output: 3ad4cfe5 (Unnamed Layer* 121) [Shuffle]_output: 3ad4cfe5 1066: 3bf217bf 1442: 3b567412 1445: 3b1ea081 1071: 3b9bc123 1072: 3ab5bc2e 1448: 3b6c1ffb (Unnamed Layer* 128) [Constant]_output: 3a8a53d7 (Unnamed Layer* 129) [Shuffle]_output: 3a8a53d7 1076: 3b0f86e4 1451: 3ae20f9c 1079: 3b9b02ba 1080: 3b61be4c 1454: 3b88f7e0 (Unnamed Layer* 135) [Constant]_output: 3a7a4331 (Unnamed Layer* 136) [Shuffle]_output: 3a7a4331 1084: 3ab1e180 1457: 3ac3fe16 1087: 3bbc7fbb 1088: 3b401555 1460: 3b5af4f0 (Unnamed Layer* 142) [Constant]_output: 3a8e7af2 (Unnamed Layer* 143) [Shuffle]_output: 3a8e7af2 1092: 3af870e7 1463: 3ac71e86 1095: 3bcb401c 1096: 3b603f99 1466: 3bd4708f (Unnamed Layer* 149) [Constant]_output: 3a84c7f1 (Unnamed Layer* 150) [Shuffle]_output: 3a84c7f1 1100: 3bdb0064 1469: 3b16dfcb 1103: 3be5579d 1104: 3b74ecb7 1472: 3ba9e62c (Unnamed Layer* 156) [Constant]_output: 3a6ebd68 (Unnamed Layer* 157) [Shuffle]_output: 3a6ebd68 1108: 3baefd5f 1475: 3b1ca833 1111: 3be97e83 1112: 3b8e5559 1478: 3bb9d60f (Unnamed Layer* 163) 
[Constant]_output: 3a8852d7 (Unnamed Layer* 164) [Shuffle]_output: 3a8852d7 1116: 3b895b6f 1481: 3b35a312 1119: 3bd62fe7 1120: 3b84fbed 1484: 3baef164 (Unnamed Layer* 170) [Constant]_output: 3afc2d20 (Unnamed Layer* 171) [Shuffle]_output: 3afc2d20 1124: 3baef164 1487: 3b81bb3c 1127: 3c0d8b7d 1128: 3b8e2fbd 1490: 3ba869e3 (Unnamed Layer* 177) [Constant]_output: 3a82e480 (Unnamed Layer* 178) [Shuffle]_output: 3a82e480 1132: 3ae0018f 1493: 3b33bcab 1135: 3be3b76a 1136: 3b8dc703 1496: 3b96fe57 (Unnamed Layer* 184) [Constant]_output: 3a8c7e8e (Unnamed Layer* 185) [Shuffle]_output: 3a8c7e8e 1140: 3b1ec9e1 1499: 3b515594 1143: 3c02dfd2 1144: 3b92e7f4 1502: 3b941d08 (Unnamed Layer* 191) [Constant]_output: 3a8f4047 (Unnamed Layer* 192) [Shuffle]_output: 3a8f4047 1148: 3b1e490d 1505: 3b08736e 1151: 3c194e1c 1152: 3b65e9cf 1508: 3ba6edda (Unnamed Layer* 198) [Constant]_output: 3a94b4c9 (Unnamed Layer* 199) [Shuffle]_output: 3a94b4c9 1156: 3af0949b 1511: 3aec2aab 1159: 3c19bd98 1160: 3b5d276d 1514: 3b61ef4c (Unnamed Layer* 205) [Constant]_output: 3a90d257 (Unnamed Layer* 206) [Shuffle]_output: 3a90d257 1164: 3ac0786d 1517: 3aff5b5b 1167: 3bf868e0 1168: 3b262f60 1520: 3b8cca4b (Unnamed Layer* 212) [Constant]_output: 3a83b99b (Unnamed Layer* 213) [Shuffle]_output: 3a83b99b 1172: 3ab3616a 1523: 3ac8cd82 1175: 3c299b8e 1176: 3b14345f 1526: 3b832ddd (Unnamed Layer* 219) [Constant]_output: 3a8d6282 (Unnamed Layer* 220) [Shuffle]_output: 3a8d6282 1180: 3aaa8079 1529: 3ac64a56 1183: 3c1b556f 1184: 3b08705c 1532: 3b434171 (Unnamed Layer* 226) [Constant]_output: 3aa7d55d (Unnamed Layer* 227) [Shuffle]_output: 3aa7d55d 1188: 3ab8e3b8 1535: 3b0429e5 1191: 3c06cd05 1192: 3b4151d5 1538: 3b810394 (Unnamed Layer* 233) [Constant]_output: 3a9977d9 (Unnamed Layer* 234) [Shuffle]_output: 3a9977d9 1196: 3aa41975 1541: 3acf2a74 1199: 3c25753f 1200: 3b2ee3d3 1544: 3b740030 (Unnamed Layer* 240) [Constant]_output: 3a800fad (Unnamed Layer* 241) [Shuffle]_output: 3a800fad 1204: 3ad16109 1547: 3b1a54fe 
1207: 3c24e6fc 1208: 3b31d579 1550: 3b6d1c5f (Unnamed Layer* 247) [Constant]_output: 3a810c04 (Unnamed Layer* 248) [Shuffle]_output: 3a810c04 1212: 3a98343e 1553: 3b006418 1215: 3c2aba7e 1216: 3b258b3c 1556: 3b61e917 (Unnamed Layer* 254) [Constant]_output: 3a833126 (Unnamed Layer* 255) [Shuffle]_output: 3a833126 1220: 3ab9ad44 1559: 3b346df2 1223: 3c0c8b8f 1224: 3b279bc7 1562: 3b7c5105 (Unnamed Layer* 261) [Constant]_output: 3a859747 (Unnamed Layer* 262) [Shuffle]_output: 3a859747 1228: 3ac08109 1565: 3b2a6c65 1231: 3c4d5550 1232: 3b2f53ac 1568: 3b49f838 (Unnamed Layer* 268) [Constant]_output: 3a96c0f3 (Unnamed Layer* 269) [Shuffle]_output: 3a96c0f3 1236: 3aa7961d 1571: 3b2b98bd 1239: 3c3e7f44 1240: 3b324498 1574: 3b715eae (Unnamed Layer* 275) [Constant]_output: 3a899b80 (Unnamed Layer* 276) [Shuffle]_output: 3a899b80 1244: 3abc6eae 1577: 3b50fa3f 1247: 3c4f6214 1248: 3b3df409 1580: 3b6e2524 (Unnamed Layer* 282) [Constant]_output: 3a9bc15c (Unnamed Layer* 283) [Shuffle]_output: 3a9bc15c 1252: 3a9968a7 1583: 3b45157d 1255: 3c51d5c1 1256: 3b33bfa2 1586: 3b7c72dc (Unnamed Layer* 289) [Constant]_output: 3a8b47e3 (Unnamed Layer* 290) [Shuffle]_output: 3a8b47e3 1260: 3ac21a64 1589: 3b461591 1263: 3c557fe0 1264: 3b255ca6 1592: 3b72b93c (Unnamed Layer* 296) [Constant]_output: 3aaa84ce (Unnamed Layer* 297) [Shuffle]_output: 3aaa84ce 1268: 3a949169 1595: 3b999bf0 1271: 3c5c96a2 1272: 3b2b422f 1598: 3b92f3aa (Unnamed Layer* 303) [Constant]_output: 3a92430a (Unnamed Layer* 304) [Shuffle]_output: 3a92430a 1276: 3a8f6092 1601: 3b8d44dc 1279: 3c729db1 1280: 3b42c812 1604: 3ba398b1 (Unnamed Layer* 310) [Constant]_output: 3a9426dd (Unnamed Layer* 311) [Shuffle]_output: 3a9426dd 1284: 3acf4f80 1607: 3b566ea9 1287: 3c5d79a7 1288: 3b2e0069 1610: 3b8acd90 (Unnamed Layer* 317) [Constant]_output: 3a938d19 (Unnamed Layer* 318) [Shuffle]_output: 3a938d19 1292: 3ade2fe6 1613: 3bc06970 1295: 3c673039 1296: 3b29fe0a 1616: 3b71492c (Unnamed Layer* 324) [Constant]_output: 3a774f86 (Unnamed 
Layer* 325) [Shuffle]_output: 3a774f86 1300: 3ae6a4b3 1619: 3b9b0e4b 1303: 3c7e875d 1304: 3b8354c6 1622: 3b6ce6b1 (Unnamed Layer* 331) [Constant]_output: 3ae74fa8 (Unnamed Layer* 332) [Shuffle]_output: 3ae74fa8 1308: 3b1469cb 1625: 3ad35bc4 1628: 3a41e752 1313: 3b195750 1314: 3afa5103 1631: 3bafb5d5 (Unnamed Layer* 339) [Constant]_output: 3ace4739 (Unnamed Layer* 340) [Shuffle]_output: 3ace4739 1318: 3ac85277 1634: 3ae1731b 1321: 3b3926ee 1322: 3b39e7e9 1637: 3b935125 (Unnamed Layer* 346) [Constant]_output: 3b0ad1c8 (Unnamed Layer* 347) [Shuffle]_output: 3b0ad1c8 1326: 3ad43734 1640: 3b0e8937 1329: 3b551758 1330: 3b54c765 1331: 3b54c765 (Unnamed Layer* 363) [Shuffle]_output: 3b54c765 (Unnamed Layer* 364) [Fully Connected]_output: 3b14cb0e 1332: 3b14cb0e (Unnamed Layer* 374) [Shuffle]_output: 3b14cb0e (Unnamed Layer* 375) [Scale]_output: 3cb503e3 1333: 3cb503e3 

I don’t know how to decipher this.
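For what it's worth, each cache entry appears to be the big-endian hex encoding of a float32 (the per-tensor dynamic-range scale), so a quick way to read them is (`decode_trt_scale` is a made-up helper name):

```python
import struct

def decode_trt_scale(hex_str):
    """Interpret a TensorRT calibration-cache value as the IEEE-754
    float32 it encodes (the per-tensor dynamic-range scale)."""
    return struct.unpack(">f", bytes.fromhex(hex_str.zfill(8)))[0]

# Values taken from the cache above:
print(decode_trt_scale("7f800000"))  # inf -- many early tensors decode to +infinity
print(decode_trt_scale("3cb503e3"))  # a small finite scale for the final output '1333'
```

If this reading is right, the many `7f800000` entries decode to +inf, which looks suspicious for a calibrated range.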

Hi, please refer to the links below on performing inference in INT8.

Thanks!

I read those two articles. I am asking how the Python example TensorRT/samples/python/efficientnet at master · NVIDIA/TensorRT · GitHub differs from your two links, and what can be inferred from my description of the problem.

Hi,

Could you please try the latest TRT 8.2 release? If you still face this issue, we recommend posting it on Issues · NVIDIA/TensorRT · GitHub to get better help.

Thank you.