@@ -30,7 +30,6 @@ This subfolder of the BERT TensorFlow repository, tested and maintained by NVIDI
3030 * [TensorRT inference benchmark](#tensorrt-inference-benchmark)
3131 * [Results](#results)
3232 * [Inference performance: NVIDIA A100](#inference-performance-nvidia-a100-40gb)
33- * [Inference performance: NVIDIA A30](#inference-performance-nvidia-a30)
3433
3534
3635## Model overview
@@ -434,158 +433,75 @@ Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` o
434433| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
435434| -----------------| ------------| -----------------| -----------------| ---------| -----------------| -----------------| ---------|
436435| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
437- | 128 | 1 | 0.64 | 0.69 | 0.56 | 0.79 | 0.79 | 0.63 |
438- | 128 | 2 | 0.78 | 0.78 | 0.62 | 0.80 | 0.80 | 0.73 |
439- | 128 | 4 | 0.74 | 0.74 | 0.74 | 1.12 | 1.20 | 0.95 |
440- | 128 | 8 | 1.22 | 1.23 | 0.96 | 1.31 | 1.31 | 1.31 |
441- | 128 | 12 | 1.29 | 1.30 | 1.21 | 1.70 | 1.70 | 1.70 |
442- | 128 | 16 | 1.34 | 1.34 | 1.34 | 2.10 | 2.10 | 2.08 |
443- | 128 | 24 | 1.83 | 1.84 | 1.83 | 3.07 | 3.08 | 3.04 |
444- | 128 | 32 | 2.25 | 2.26 | 2.25 | 3.95 | 3.95 | 3.92 |
445- | 128 | 64 | 4.19 | 4.20 | 4.17 | 7.68 | 7.74 | 7.63 |
446- | 128 | 128 | 8.15 | 8.16 | 8.10 | 15.45 | 15.46 | 15.30 |
447- | 384 | 1 | 1.14 | 1.46 | 1.15 | 1.26 | 1.62 | 1.26 |
448- | 384 | 2 | 1.32 | 1.32 | 1.32 | 1.55 | 1.55 | 1.55 |
449- | 384 | 4 | 1.68 | 1.72 | 1.68 | 2.11 | 2.11 | 2.11 |
450- | 384 | 8 | 2.22 | 2.23 | 2.22 | 3.38 | 3.42 | 3.35 |
451- | 384 | 12 | 3.34 | 3.34 | 3.34 | 4.84 | 4.86 | 4.81 |
452- | 384 | 16 | 4.02 | 4.03 | 4.02 | 6.41 | 6.41 | 6.39 |
453- | 384 | 24 | 5.73 | 5.73 | 5.73 | 9.47 | 9.47 | 9.36 |
454- | 384 | 32 | 7.75 | 7.77 | 7.68 | 13.05 | 13.12 | 12.92 |
455- | 384 | 64 | 14.96 | 14.96 | 14.85 | 25.24 | 25.36 | 24.93 |
456- | 384 | 128 | 29.13 | 29.14 | 28.89 | 49.27 | 49.37 | 48.84 |
436+ | 128 | 1 | 0.68 | 0.68 | 0.55 | 0.67 | 0.79 | 0.63 |
437+ | 128 | 2 | 0.60 | 0.76 | 0.60 | 0.91 | 0.91 | 0.73 |
438+ | 128 | 4 | 0.73 | 0.93 | 0.73 | 1.19 | 1.19 | 0.94 |
439+ | 128 | 8 | 1.21 | 1.21 | 0.96 | 1.31 | 1.31 | 1.31 |
440+ | 128 | 12 | 1.20 | 1.52 | 1.20 | 1.72 | 1.72 | 1.71 |
441+ | 128 | 16 | 1.34 | 1.72 | 1.35 | 2.07 | 2.32 | 2.06 |
442+ | 128 | 24 | 1.82 | 1.82 | 1.82 | 3.02 | 3.08 | 3.02 |
443+ | 128 | 32 | 2.24 | 2.24 | 2.24 | 3.91 | 3.91 | 3.89 |
444+ | 128 | 64 | 4.15 | 4.19 | 4.12 | 7.62 | 7.64 | 7.57 |
445+ | 128 | 128 | 8.11 | 8.12 | 8.03 | 15.34 | 15.38 | 15.21 |
446+ | 384 | 1 | 1.13 | 1.13 | 1.13 | 1.24 | 1.60 | 1.25 |
447+ | 384 | 2 | 1.31 | 1.31 | 1.31 | 1.54 | 1.54 | 1.54 |
448+ | 384 | 4 | 1.66 | 1.66 | 1.66 | 2.08 | 2.08 | 2.08 |
449+ | 384 | 8 | 2.21 | 2.21 | 2.21 | 3.37 | 3.37 | 3.32 |
450+ | 384 | 12 | 3.32 | 3.32 | 3.32 | 4.78 | 4.82 | 4.77 |
451+ | 384 | 16 | 4.01 | 4.01 | 4.00 | 6.37 | 6.37 | 6.36 |
452+ | 384 | 24 | 5.70 | 5.70 | 5.70 | 9.34 | 9.39 | 9.29 |
453+ | 384 | 32 | 7.63 | 7.63 | 7.63 | 12.99 | 13.03 | 12.85 |
454+ | 384 | 64 | 14.86 | 14.87 | 14.72 | 24.89 | 25.12 | 24.70 |
455+ | 384 | 128 | 28.96 | 28.96 | 28.69 | 48.93 | 49.02 | 48.59 |
457456
458457##### BERT Large
459458
460459| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
461460| -----------------| ------------| -----------------| -----------------| ---------| -----------------| -----------------| ---------|
462461| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
463- | 128 | 1 | 1.24 | 1.24 | 1.23 | 1.56 | 1.56 | 1.56 |
464- | 128 | 2 | 1.44 | 1.83 | 1.45 | 1.83 | 1.83 | 1.83 |
465- | 128 | 4 | 1.78 | 1.78 | 1.78 | 2.55 | 2.56 | 2.55 |
466- | 128 | 8 | 2.66 | 2.66 | 2.66 | 3.96 | 3.97 | 3.93 |
467- | 128 | 12 | 3.11 | 3.11 | 3.10 | 5.07 | 5.12 | 5.05 |
468- | 128 | 16 | 4.07 | 4.07 | 4.06 | 6.96 | 6.97 | 6.91 |
469- | 128 | 24 | 5.31 | 5.32 | 5.31 | 9.72 | 9.82 | 9.63 |
470- | 128 | 32 | 7.04 | 7.07 | 7.02 | 13.00 | 13.04 | 12.95 |
471- | 128 | 64 | 12.96 | 12.96 | 12.86 | 24.90 | 25.07 | 24.71 |
472- | 128 | 128 | 25.20 | 25.21 | 25.16 | 49.29 | 49.55 | 48.86 |
473- | 384 | 1 | 2.57 | 2.57 | 2.57 | 2.98 | 2.98 | 2.98 |
474- | 384 | 2 | 3.06 | 3.07 | 3.06 | 3.93 | 3.93 | 3.92 |
475- | 384 | 4 | 4.03 | 4.03 | 4.03 | 5.78 | 5.79 | 5.74 |
476- | 384 | 8 | 7.20 | 7.21 | 7.19 | 11.16 | 11.19 | 11.04 |
477- | 384 | 12 | 9.18 | 9.18 | 9.17 | 15.51 | 15.51 | 15.39 |
478- | 384 | 16 | 12.34 | 12.34 | 12.33 | 21.25 | 21.25 | 21.03 |
479- | 384 | 24 | 17.74 | 17.79 | 17.69 | 31.13 | 31.14 | 30.82 |
480- | 384 | 32 | 23.37 | 23.37 | 23.16 | 41.26 | 41.43 | 40.83 |
481- | 384 | 64 | 45.08 | 45.09 | 45.01 | 79.88 | 80.21 | 79.18 |
482- | 384 | 128 | 88.34 | 88.37 | 88.06 | 156.43 | 157.17 | 155.47 |
462+ | 128 | 1 | 1.39 | 1.39 | 1.24 | 1.54 | 1.55 | 1.54 |
463+ | 128 | 2 | 1.42 | 1.42 | 1.41 | 1.82 | 1.82 | 1.82 |
464+ | 128 | 4 | 1.78 | 1.95 | 1.79 | 2.50 | 2.50 | 2.50 |
465+ | 128 | 8 | 2.64 | 2.64 | 2.64 | 3.97 | 3.97 | 3.97 |
466+ | 128 | 12 | 3.09 | 3.09 | 3.09 | 5.02 | 5.03 | 4.99 |
467+ | 128 | 16 | 4.03 | 4.03 | 4.03 | 6.93 | 6.93 | 6.86 |
468+ | 128 | 24 | 5.28 | 5.31 | 5.28 | 9.64 | 9.65 | 9.56 |
469+ | 128 | 32 | 7.01 | 7.01 | 6.95 | 12.95 | 13.07 | 12.86 |
470+ | 128 | 64 | 12.84 | 12.86 | 12.72 | 24.80 | 25.05 | 24.68 |
471+ | 128 | 128 | 25.26 | 25.27 | 25.01 | 49.09 | 49.25 | 48.71 |
472+ | 384 | 1 | 2.55 | 2.55 | 2.55 | 2.96 | 2.96 | 2.95 |
473+ | 384 | 2 | 3.04 | 3.04 | 3.04 | 3.90 | 3.90 | 3.90 |
474+ | 384 | 4 | 4.01 | 4.02 | 4.01 | 5.74 | 5.80 | 5.68 |
475+ | 384 | 8 | 7.18 | 7.18 | 7.17 | 10.98 | 11.00 | 10.91 |
476+ | 384 | 12 | 9.15 | 9.15 | 9.14 | 15.43 | 15.44 | 15.33 |
477+ | 384 | 16 | 12.28 | 12.29 | 12.28 | 21.13 | 21.14 | 20.90 |
478+ | 384 | 24 | 17.67 | 17.67 | 17.56 | 30.98 | 31.07 | 30.71 |
479+ | 384 | 32 | 23.22 | 23.23 | 23.02 | 41.22 | 41.28 | 40.63 |
480+ | 384 | 64 | 45.16 | 45.30 | 44.83 | 79.64 | 79.98 | 79.24 |
481+ | 384 | 128 | 87.81 | 87.82 | 87.73 | 156.66 | 157.03 | 155.65 |
483482
484483##### Megatron Large with Sparsity
485484
486485| Sequence Length | Batch Size | INT8 QAT Latency (ms) | | |
487486| -----------------| ------------| -----------------| -----------------| ---------|
488487| | | 95th Percentile | 99th Percentile | Average |
489- | 128 | 1 | 1.17 | 1.48 | 1.18 |
490- | 128 | 2 | 1.49 | 1.88 | 1.50 |
491- | 128 | 4 | 1.79 | 1.79 | 1.79 |
488+ | 128 | 1 | 1.12 | 1.41 | 1.13 |
489+ | 128 | 2 | 1.37 | 1.70 | 1.38 |
490+ | 128 | 4 | 1.77 | 1.78 | 1.77 |
492491| 128 | 8 | 2.54 | 2.54 | 2.53 |
493- | 128 | 12 | 2.95 | 2.95 | 2.94 |
494- | 128 | 16 | 3.97 | 3.97 | 3.96 |
495- | 128 | 24 | 4.91 | 4.91 | 4.90 |
496- | 128 | 32 | 6.90 | 6.92 | 6.86 |
497- | 128 | 64 | 11.61 | 11.64 | 11.59 |
498- | 128 | 128 | 21.34 | 21.35 | 21.21 |
499- | 384 | 1 | 1.71 | 1.72 | 1.71 |
492+ | 128 | 12 | 3.13 | 3.13 | 3.12 |
493+ | 128 | 16 | 3.99 | 3.99 | 3.98 |
494+ | 128 | 24 | 4.90 | 4.90 | 4.90 |
495+ | 128 | 32 | 7.04 | 7.06 | 7.00 |
496+ | 128 | 64 | 11.62 | 11.63 | 11.61 |
497+ | 128 | 128 | 21.24 | 21.34 | 21.12 |
498+ | 384 | 1 | 1.71 | 2.15 | 1.71 |
500499| 384 | 2 | 2.21 | 2.21 | 2.21 |
501- | 384 | 4 | 3.47 | 3.47 | 3.47 |
502- | 384 | 8 | 5.75 | 5.75 | 5.74 |
503- | 384 | 12 | 8.37 | 8.38 | 8.35 |
504- | 384 | 16 | 10.39 | 10.40 | 10.37 |
505- | 384 | 24 | 14.61 | 14.62 | 14.59 |
506- | 384 | 32 | 18.80 | 18.96 | 18.78 |
507- | 384 | 64 | 35.90 | 35.92 | 35.62 |
508- | 384 | 128 | 67.74 | 67.77 | 67.60 |
509-
510- #### Inference performance: NVIDIA A30
511-
512- Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` on NVIDIA A30.
513-
514- ##### BERT Base
515-
516- | Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
517- | -----------------| ------------| -----------------| -----------------| ---------| -----------------| -----------------| ---------|
518- | | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
519- | 128 | 1 | 0.88 | 0.88 | 0.61 | 0.78 | 1.14 | 0.79 |
520- | 128 | 2 | 1.03 | 1.04 | 0.77 | 0.97 | 1.45 | 0.98 |
521- | 128 | 4 | 1.04 | 1.56 | 1.05 | 1.43 | 1.44 | 1.41 |
522- | 128 | 8 | 1.44 | 1.46 | 1.43 | 2.43 | 2.44 | 2.41 |
523- | 128 | 12 | 1.92 | 1.92 | 1.91 | 3.44 | 3.45 | 3.39 |
524- | 128 | 16 | 2.38 | 2.43 | 2.35 | 4.36 | 4.37 | 4.28 |
525- | 128 | 24 | 3.47 | 3.50 | 3.44 | 6.56 | 6.65 | 6.48 |
526- | 128 | 32 | 4.42 | 4.45 | 4.38 | 8.42 | 8.58 | 8.36 |
527- | 128 | 64 | 8.58 | 8.66 | 8.49 | 16.58 | 16.60 | 16.40 |
528- | 128 | 128 | 16.56 | 16.62 | 16.39 | 32.13 | 32.30 | 31.93 |
529- | 384 | 1 | 1.31 | 2.01 | 1.32 | 1.63 | 1.63 | 1.62 |
530- | 384 | 2 | 1.67 | 1.67 | 1.66 | 2.29 | 2.35 | 2.26 |
531- | 384 | 4 | 2.29 | 2.34 | 2.27 | 3.74 | 3.77 | 3.71 |
532- | 384 | 8 | 4.23 | 4.24 | 4.20 | 7.25 | 7.30 | 7.15 |
533- | 384 | 12 | 6.05 | 6.10 | 6.00 | 10.21 | 10.27 | 10.12 |
534- | 384 | 16 | 8.07 | 8.11 | 8.02 | 13.97 | 14.05 | 13.84 |
535- | 384 | 24 | 11.85 | 11.86 | 11.71 | 20.31 | 20.42 | 20.16 |
536- | 384 | 32 | 15.45 | 15.47 | 15.29 | 26.86 | 27.04 | 26.65 |
537- | 384 | 64 | 30.49 | 30.74 | 30.25 | 52.21 | 52.34 | 51.75 |
538- | 384 | 128 | 60.21 | 60.48 | 59.56 | 103.20 | 103.58 | 102.66 |
539-
540- ##### BERT Large
541-
542- | Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
543- | -----------------| ------------| -----------------| -----------------| ---------| -----------------| -----------------| ---------|
544- | | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
545- | 128 | 1 | 1.46 | 1.46 | 1.45 | 2.01 | 2.01 | 2.01 |
546- | 128 | 2 | 1.83 | 1.85 | 1.83 | 2.80 | 2.83 | 2.75 |
547- | 128 | 4 | 2.71 | 2.71 | 2.69 | 4.34 | 4.36 | 4.29 |
548- | 128 | 8 | 4.33 | 4.35 | 4.28 | 8.12 | 8.20 | 8.03 |
549- | 128 | 12 | 5.71 | 5.72 | 5.61 | 10.65 | 10.65 | 10.51 |
550- | 128 | 16 | 7.62 | 7.64 | 7.55 | 14.57 | 14.66 | 14.55 |
551- | 128 | 24 | 10.58 | 10.62 | 10.46 | 20.64 | 20.79 | 20.45 |
552- | 128 | 32 | 14.18 | 14.26 | 13.99 | 28.17 | 28.31 | 28.01 |
553- | 128 | 64 | 26.87 | 27.00 | 26.61 | 53.44 | 53.71 | 53.31 |
554- | 128 | 128 | 52.36 | 52.71 | 51.90 | 105.42 | 105.95 | 104.96 |
555- | 384 | 1 | 3.33 | 3.33 | 3.33 | 4.23 | 4.24 | 4.19 |
556- | 384 | 2 | 4.26 | 4.26 | 4.23 | 6.63 | 6.65 | 6.57 |
557- | 384 | 4 | 7.26 | 7.26 | 7.25 | 12.00 | 12.06 | 11.88 |
558- | 384 | 8 | 12.91 | 12.99 | 12.83 | 22.61 | 22.69 | 22.45 |
559- | 384 | 12 | 18.73 | 18.85 | 18.53 | 33.43 | 33.64 | 33.28 |
560- | 384 | 16 | 24.06 | 24.22 | 24.02 | 44.35 | 44.64 | 44.06 |
561- | 384 | 24 | 35.83 | 35.95 | 35.49 | 64.84 | 64.90 | 64.78 |
562- | 384 | 32 | 47.05 | 47.27 | 46.73 | 85.89 | 86.17 | 85.11 |
563- | 384 | 64 | 92.09 | 92.32 | 91.34 | 168.09 | 168.48 | 167.24 |
564- | 384 | 128 | 180.47 | 180.90 | 179.75 | 330.71 | 331.31 | 329.53 |
565-
566- ##### Megatron Large with Sparsity
567-
568- | Sequence Length | Batch Size | INT8 QAT Latency (ms) | | |
569- | -----------------| ------------| -----------------| -----------------| ---------|
570- | | | 95th Percentile | 99th Percentile | Average |
571- | 128 | 1 | 1.44 | 1.45 | 1.44 |
572- | 128 | 2 | 1.84 | 1.84 | 1.84 |
573- | 128 | 4 | 2.76 | 2.76 | 2.75 |
574- | 128 | 8 | 4.12 | 4.12 | 4.11 |
575- | 128 | 12 | 5.26 | 5.28 | 5.22 |
576- | 128 | 16 | 7.52 | 7.52 | 7.51 |
577- | 128 | 24 | 9.97 | 9.99 | 9.89 |
578- | 128 | 32 | 12.84 | 12.85 | 12.80 |
579- | 128 | 64 | 24.35 | 24.46 | 24.15 |
580- | 128 | 128 | 46.38 | 46.60 | 45.96 |
581- | 384 | 1 | 2.37 | 2.37 | 2.36 |
582- | 384 | 2 | 3.88 | 3.88 | 3.87 |
583- | 384 | 4 | 6.10 | 6.11 | 6.05 |
584- | 384 | 8 | 11.60 | 11.63 | 11.49 |
585- | 384 | 12 | 15.73 | 15.78 | 15.64 |
586- | 384 | 16 | 20.95 | 21.01 | 20.90 |
587- | 384 | 24 | 29.83 | 29.93 | 29.71 |
588- | 384 | 32 | 40.01 | 40.09 | 39.75 |
589- | 384 | 64 | 76.46 | 76.67 | 76.28 |
590- | 384 | 128 | 148.96 | 149.23 | 148.11 |
591-
500+ | 384 | 4 | 3.63 | 3.64 | 3.63 |
501+ | 384 | 8 | 5.74 | 5.74 | 5.73 |
502+ | 384 | 12 | 8.22 | 8.23 | 8.21 |
503+ | 384 | 16 | 10.33 | 10.33 | 10.31 |
504+ | 384 | 24 | 14.52 | 14.52 | 14.51 |
505+ | 384 | 32 | 18.72 | 18.73 | 18.71 |
506+ | 384 | 64 | 35.79 | 35.81 | 35.50 |
507+ | 384 | 128 | 67.72 | 67.86 | 67.55 |
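
The latency tables can be turned into throughput and INT8-versus-FP16 speedup figures with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical and not part of the repository, the input values are copied from the updated A100 BERT Base row (sequence length 128, batch size 128), and it assumes one batch completes per average latency with no overlap between batches.

```python
# Hypothetical helpers for reading the latency tables; not part of the repository.
# Assumption: one batch finishes every `avg_latency_ms`, with no pipelining.


def throughput_seq_per_s(batch_size: int, avg_latency_ms: float) -> float:
    """Sequences processed per second for a given batch size and average latency."""
    return batch_size / (avg_latency_ms / 1000.0)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Values from the A100 BERT Base table: sequence length 128, batch size 128.
batch_size = 128
int8_avg_ms = 8.03
fp16_avg_ms = 15.21

int8_tput = throughput_seq_per_s(batch_size, int8_avg_ms)
fp16_tput = throughput_seq_per_s(batch_size, fp16_avg_ms)
int8_vs_fp16 = speedup(fp16_avg_ms, int8_avg_ms)

print(f"INT8 throughput: {int8_tput:.0f} sequences/s")
print(f"FP16 throughput: {fp16_tput:.0f} sequences/s")
print(f"INT8 speedup over FP16: {int8_vs_fp16:.2f}x")
```

The same two functions apply to any row. Substituting the 95th or 99th percentile latency for the average gives a more conservative throughput estimate.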