Commit 6d2fa4d

Merge pull request NVIDIA#3886 from asfiyab-nvidia/dev-bert-benchmark-update
Publish demoBERT benchmark data
2 parents 0598d5d + f703408 commit 6d2fa4d

1 file changed, +58 -142 lines changed

demo/BERT/README.md

Lines changed: 58 additions & 142 deletions
Original file line number | Diff line number | Diff line change
@@ -30,7 +30,6 @@ This subfolder of the BERT TensorFlow repository, tested and maintained by NVIDI
3030
* [TensorRT inference benchmark](#tensorrt-inference-benchmark)
3131
* [Results](#results)
3232
* [Inference performance: NVIDIA A100](#inference-performance-nvidia-a100-40gb)
33-
* [Inference performance: NVIDIA A30](#inference-performance-nvidia-a30)
3433

3534

3635
## Model overview
@@ -434,158 +433,75 @@ Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` o
434433
| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
435434
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
436435
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
437-
| 128 | 1 | 0.64 | 0.69 | 0.56 | 0.79 | 0.79 | 0.63 |
438-
| 128 | 2 | 0.78 | 0.78 | 0.62 | 0.80 | 0.80 | 0.73 |
439-
| 128 | 4 | 0.74 | 0.74 | 0.74 | 1.12 | 1.20 | 0.95 |
440-
| 128 | 8 | 1.22 | 1.23 | 0.96 | 1.31 | 1.31 | 1.31 |
441-
| 128 | 12 | 1.29 | 1.30 | 1.21 | 1.70 | 1.70 | 1.70 |
442-
| 128 | 16 | 1.34 | 1.34 | 1.34 | 2.10 | 2.10 | 2.08 |
443-
| 128 | 24 | 1.83 | 1.84 | 1.83 | 3.07 | 3.08 | 3.04 |
444-
| 128 | 32 | 2.25 | 2.26 | 2.25 | 3.95 | 3.95 | 3.92 |
445-
| 128 | 64 | 4.19 | 4.20 | 4.17 | 7.68 | 7.74 | 7.63 |
446-
| 128 | 128 | 8.15 | 8.16 | 8.10 | 15.45 | 15.46 | 15.30 |
447-
| 384 | 1 | 1.14 | 1.46 | 1.15 | 1.26 | 1.62 | 1.26 |
448-
| 384 | 2 | 1.32 | 1.32 | 1.32 | 1.55 | 1.55 | 1.55 |
449-
| 384 | 4 | 1.68 | 1.72 | 1.68 | 2.11 | 2.11 | 2.11 |
450-
| 384 | 8 | 2.22 | 2.23 | 2.22 | 3.38 | 3.42 | 3.35 |
451-
| 384 | 12 | 3.34 | 3.34 | 3.34 | 4.84 | 4.86 | 4.81 |
452-
| 384 | 16 | 4.02 | 4.03 | 4.02 | 6.41 | 6.41 | 6.39 |
453-
| 384 | 24 | 5.73 | 5.73 | 5.73 | 9.47 | 9.47 | 9.36 |
454-
| 384 | 32 | 7.75 | 7.77 | 7.68 | 13.05 | 13.12 | 12.92 |
455-
| 384 | 64 | 14.96 | 14.96 | 14.85 | 25.24 | 25.36 | 24.93 |
456-
| 384 | 128 | 29.13 | 29.14 | 28.89 | 49.27 | 49.37 | 48.84 |
436+
| 128 | 1 | 0.68 | 0.68 | 0.55 | 0.67 | 0.79 | 0.63 |
437+
| 128 | 2 | 0.60 | 0.76 | 0.60 | 0.91 | 0.91 | 0.73 |
438+
| 128 | 4 | 0.73 | 0.93 | 0.73 | 1.19 | 1.19 | 0.94 |
439+
| 128 | 8 | 1.21 | 1.21 | 0.96 | 1.31 | 1.31 | 1.31 |
440+
| 128 | 12 | 1.20 | 1.52 | 1.20 | 1.72 | 1.72 | 1.71 |
441+
| 128 | 16 | 1.34 | 1.72 | 1.35 | 2.07 | 2.32 | 2.06 |
442+
| 128 | 24 | 1.82 | 1.82 | 1.82 | 3.02 | 3.08 | 3.02 |
443+
| 128 | 32 | 2.24 | 2.24 | 2.24 | 3.91 | 3.91 | 3.89 |
444+
| 128 | 64 | 4.15 | 4.19 | 4.12 | 7.62 | 7.64 | 7.57 |
445+
| 128 | 128 | 8.11 | 8.12 | 8.03 | 15.34 | 15.38 | 15.21 |
446+
| 384 | 1 | 1.13 | 1.13 | 1.13 | 1.24 | 1.60 | 1.25 |
447+
| 384 | 2 | 1.31 | 1.31 | 1.31 | 1.54 | 1.54 | 1.54 |
448+
| 384 | 4 | 1.66 | 1.66 | 1.66 | 2.08 | 2.08 | 2.08 |
449+
| 384 | 8 | 2.21 | 2.21 | 2.21 | 3.37 | 3.37 | 3.32 |
450+
| 384 | 12 | 3.32 | 3.32 | 3.32 | 4.78 | 4.82 | 4.77 |
451+
| 384 | 16 | 4.01 | 4.01 | 4.00 | 6.37 | 6.37 | 6.36 |
452+
| 384 | 24 | 5.70 | 5.70 | 5.70 | 9.34 | 9.39 | 9.29 |
453+
| 384 | 32 | 7.63 | 7.63 | 7.63 | 12.99 | 13.03 | 12.85 |
454+
| 384 | 64 | 14.86 | 14.87 | 14.72 | 24.89 | 25.12 | 24.70 |
455+
| 384 | 128 | 28.96 | 28.96 | 28.69 | 48.93 | 49.02 | 48.59 |
457456

458457
##### BERT Large
459458

460459
| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
461460
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
462461
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
463-
| 128 | 1 | 1.24 | 1.24 | 1.23 | 1.56 | 1.56 | 1.56 |
464-
| 128 | 2 | 1.44 | 1.83 | 1.45 | 1.83 | 1.83 | 1.83 |
465-
| 128 | 4 | 1.78 | 1.78 | 1.78 | 2.55 | 2.56 | 2.55 |
466-
| 128 | 8 | 2.66 | 2.66 | 2.66 | 3.96 | 3.97 | 3.93 |
467-
| 128 | 12 | 3.11 | 3.11 | 3.10 | 5.07 | 5.12 | 5.05 |
468-
| 128 | 16 | 4.07 | 4.07 | 4.06 | 6.96 | 6.97 | 6.91 |
469-
| 128 | 24 | 5.31 | 5.32 | 5.31 | 9.72 | 9.82 | 9.63 |
470-
| 128 | 32 | 7.04 | 7.07 | 7.02 | 13.00 | 13.04 | 12.95 |
471-
| 128 | 64 | 12.96 | 12.96 | 12.86 | 24.90 | 25.07 | 24.71 |
472-
| 128 | 128 | 25.20 | 25.21 | 25.16 | 49.29 | 49.55 | 48.86 |
473-
| 384 | 1 | 2.57 | 2.57 | 2.57 | 2.98 | 2.98 | 2.98 |
474-
| 384 | 2 | 3.06 | 3.07 | 3.06 | 3.93 | 3.93 | 3.92 |
475-
| 384 | 4 | 4.03 | 4.03 | 4.03 | 5.78 | 5.79 | 5.74 |
476-
| 384 | 8 | 7.20 | 7.21 | 7.19 | 11.16 | 11.19 | 11.04 |
477-
| 384 | 12 | 9.18 | 9.18 | 9.17 | 15.51 | 15.51 | 15.39 |
478-
| 384 | 16 | 12.34 | 12.34 | 12.33 | 21.25 | 21.25 | 21.03 |
479-
| 384 | 24 | 17.74 | 17.79 | 17.69 | 31.13 | 31.14 | 30.82 |
480-
| 384 | 32 | 23.37 | 23.37 | 23.16 | 41.26 | 41.43 | 40.83 |
481-
| 384 | 64 | 45.08 | 45.09 | 45.01 | 79.88 | 80.21 | 79.18 |
482-
| 384 | 128 | 88.34 | 88.37 | 88.06 | 156.43 | 157.17 | 155.47 |
462+
| 128 | 1 | 1.39 | 1.39 | 1.24 | 1.54 | 1.55 | 1.54 |
463+
| 128 | 2 | 1.42 | 1.42 | 1.41 | 1.82 | 1.82 | 1.82 |
464+
| 128 | 4 | 1.78 | 1.95 | 1.79 | 2.50 | 2.50 | 2.50 |
465+
| 128 | 8 | 2.64 | 2.64 | 2.64 | 3.97 | 3.97 | 3.97 |
466+
| 128 | 12 | 3.09 | 3.09 | 3.09 | 5.02 | 5.03 | 4.99 |
467+
| 128 | 16 | 4.03 | 4.03 | 4.03 | 6.93 | 6.93 | 6.86 |
468+
| 128 | 24 | 5.28 | 5.31 | 5.28 | 9.64 | 9.65 | 9.56 |
469+
| 128 | 32 | 7.01 | 7.01 | 6.95 | 12.95 | 13.07 | 12.86 |
470+
| 128 | 64 | 12.84 | 12.86 | 12.72 | 24.80 | 25.05 | 24.68 |
471+
| 128 | 128 | 25.26 | 25.27 | 25.01 | 49.09 | 49.25 | 48.71 |
472+
| 384 | 1 | 2.55 | 2.55 | 2.55 | 2.96 | 2.96 | 2.95 |
473+
| 384 | 2 | 3.04 | 3.04 | 3.04 | 3.90 | 3.90 | 3.90 |
474+
| 384 | 4 | 4.01 | 4.02 | 4.01 | 5.74 | 5.80 | 5.68 |
475+
| 384 | 8 | 7.18 | 7.18 | 7.17 | 10.98 | 11.00 | 10.91 |
476+
| 384 | 12 | 9.15 | 9.15 | 9.14 | 15.43 | 15.44 | 15.33 |
477+
| 384 | 16 | 12.28 | 12.29 | 12.28 | 21.13 | 21.14 | 20.90 |
478+
| 384 | 24 | 17.67 | 17.67 | 17.56 | 30.98 | 31.07 | 30.71 |
479+
| 384 | 32 | 23.22 | 23.23 | 23.02 | 41.22 | 41.28 | 40.63 |
480+
| 384 | 64 | 45.16 | 45.30 | 44.83 | 79.64 | 79.98 | 79.24 |
481+
| 384 | 128 | 87.81 | 87.82 | 87.73 | 156.66 | 157.03 | 155.65 |
483482

484483
##### Megatron Large with Sparsity
485484

486485
| Sequence Length | Batch Size | INT8 QAT Latency (ms) | | |
487486
|-----------------|------------|-----------------|-----------------|---------|
488487
| | | 95th Percentile | 99th Percentile | Average |
489-
| 128 | 1 | 1.17 | 1.48 | 1.18 |
490-
| 128 | 2 | 1.49 | 1.88 | 1.50 |
491-
| 128 | 4 | 1.79 | 1.79 | 1.79 |
488+
| 128 | 1 | 1.12 | 1.41 | 1.13 |
489+
| 128 | 2 | 1.37 | 1.70 | 1.38 |
490+
| 128 | 4 | 1.77 | 1.78 | 1.77 |
492491
| 128 | 8 | 2.54 | 2.54 | 2.53 |
493-
| 128 | 12 | 2.95 | 2.95 | 2.94 |
494-
| 128 | 16 | 3.97 | 3.97 | 3.96 |
495-
| 128 | 24 | 4.91 | 4.91 | 4.90 |
496-
| 128 | 32 | 6.90 | 6.92 | 6.86 |
497-
| 128 | 64 | 11.61 | 11.64 | 11.59 |
498-
| 128 | 128 | 21.34 | 21.35 | 21.21 |
499-
| 384 | 1 | 1.71 | 1.72 | 1.71 |
492+
| 128 | 12 | 3.13 | 3.13 | 3.12 |
493+
| 128 | 16 | 3.99 | 3.99 | 3.98 |
494+
| 128 | 24 | 4.90 | 4.90 | 4.90 |
495+
| 128 | 32 | 7.04 | 7.06 | 7.00 |
496+
| 128 | 64 | 11.62 | 11.63 | 11.61 |
497+
| 128 | 128 | 21.24 | 21.34 | 21.12 |
498+
| 384 | 1 | 1.71 | 2.15 | 1.71 |
500499
| 384 | 2 | 2.21 | 2.21 | 2.21 |
501-
| 384 | 4 | 3.47 | 3.47 | 3.47 |
502-
| 384 | 8 | 5.75 | 5.75 | 5.74 |
503-
| 384 | 12 | 8.37 | 8.38 | 8.35 |
504-
| 384 | 16 | 10.39 | 10.40 | 10.37 |
505-
| 384 | 24 | 14.61 | 14.62 | 14.59 |
506-
| 384 | 32 | 18.80 | 18.96 | 18.78 |
507-
| 384 | 64 | 35.90 | 35.92 | 35.62 |
508-
| 384 | 128 | 67.74 | 67.77 | 67.60 |
509-
510-
#### Inference performance: NVIDIA A30
511-
512-
Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` on NVIDIA A30.
513-
514-
##### BERT Base
515-
516-
| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
517-
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
518-
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
519-
| 128 | 1 | 0.88 | 0.88 | 0.61 | 0.78 | 1.14 | 0.79 |
520-
| 128 | 2 | 1.03 | 1.04 | 0.77 | 0.97 | 1.45 | 0.98 |
521-
| 128 | 4 | 1.04 | 1.56 | 1.05 | 1.43 | 1.44 | 1.41 |
522-
| 128 | 8 | 1.44 | 1.46 | 1.43 | 2.43 | 2.44 | 2.41 |
523-
| 128 | 12 | 1.92 | 1.92 | 1.91 | 3.44 | 3.45 | 3.39 |
524-
| 128 | 16 | 2.38 | 2.43 | 2.35 | 4.36 | 4.37 | 4.28 |
525-
| 128 | 24 | 3.47 | 3.50 | 3.44 | 6.56 | 6.65 | 6.48 |
526-
| 128 | 32 | 4.42 | 4.45 | 4.38 | 8.42 | 8.58 | 8.36 |
527-
| 128 | 64 | 8.58 | 8.66 | 8.49 | 16.58 | 16.60 | 16.40 |
528-
| 128 | 128 | 16.56 | 16.62 | 16.39 | 32.13 | 32.30 | 31.93 |
529-
| 384 | 1 | 1.31 | 2.01 | 1.32 | 1.63 | 1.63 | 1.62 |
530-
| 384 | 2 | 1.67 | 1.67 | 1.66 | 2.29 | 2.35 | 2.26 |
531-
| 384 | 4 | 2.29 | 2.34 | 2.27 | 3.74 | 3.77 | 3.71 |
532-
| 384 | 8 | 4.23 | 4.24 | 4.20 | 7.25 | 7.30 | 7.15 |
533-
| 384 | 12 | 6.05 | 6.10 | 6.00 | 10.21 | 10.27 | 10.12 |
534-
| 384 | 16 | 8.07 | 8.11 | 8.02 | 13.97 | 14.05 | 13.84 |
535-
| 384 | 24 | 11.85 | 11.86 | 11.71 | 20.31 | 20.42 | 20.16 |
536-
| 384 | 32 | 15.45 | 15.47 | 15.29 | 26.86 | 27.04 | 26.65 |
537-
| 384 | 64 | 30.49 | 30.74 | 30.25 | 52.21 | 52.34 | 51.75 |
538-
| 384 | 128 | 60.21 | 60.48 | 59.56 | 103.20 | 103.58 | 102.66 |
539-
540-
##### BERT Large
541-
542-
| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
543-
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
544-
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
545-
| 128 | 1 | 1.46 | 1.46 | 1.45 | 2.01 | 2.01 | 2.01 |
546-
| 128 | 2 | 1.83 | 1.85 | 1.83 | 2.80 | 2.83 | 2.75 |
547-
| 128 | 4 | 2.71 | 2.71 | 2.69 | 4.34 | 4.36 | 4.29 |
548-
| 128 | 8 | 4.33 | 4.35 | 4.28 | 8.12 | 8.20 | 8.03 |
549-
| 128 | 12 | 5.71 | 5.72 | 5.61 | 10.65 | 10.65 | 10.51 |
550-
| 128 | 16 | 7.62 | 7.64 | 7.55 | 14.57 | 14.66 | 14.55 |
551-
| 128 | 24 | 10.58 | 10.62 | 10.46 | 20.64 | 20.79 | 20.45 |
552-
| 128 | 32 | 14.18 | 14.26 | 13.99 | 28.17 | 28.31 | 28.01 |
553-
| 128 | 64 | 26.87 | 27.00 | 26.61 | 53.44 | 53.71 | 53.31 |
554-
| 128 | 128 | 52.36 | 52.71 | 51.90 | 105.42 | 105.95 | 104.96 |
555-
| 384 | 1 | 3.33 | 3.33 | 3.33 | 4.23 | 4.24 | 4.19 |
556-
| 384 | 2 | 4.26 | 4.26 | 4.23 | 6.63 | 6.65 | 6.57 |
557-
| 384 | 4 | 7.26 | 7.26 | 7.25 | 12.00 | 12.06 | 11.88 |
558-
| 384 | 8 | 12.91 | 12.99 | 12.83 | 22.61 | 22.69 | 22.45 |
559-
| 384 | 12 | 18.73 | 18.85 | 18.53 | 33.43 | 33.64 | 33.28 |
560-
| 384 | 16 | 24.06 | 24.22 | 24.02 | 44.35 | 44.64 | 44.06 |
561-
| 384 | 24 | 35.83 | 35.95 | 35.49 | 64.84 | 64.90 | 64.78 |
562-
| 384 | 32 | 47.05 | 47.27 | 46.73 | 85.89 | 86.17 | 85.11 |
563-
| 384 | 64 | 92.09 | 92.32 | 91.34 | 168.09 | 168.48 | 167.24 |
564-
| 384 | 128 | 180.47 | 180.90 | 179.75 | 330.71 | 331.31 | 329.53 |
565-
566-
##### Megatron Large with Sparsity
567-
568-
| Sequence Length | Batch Size | INT8 QAT Latency (ms) | | |
569-
|-----------------|------------|-----------------|-----------------|---------|
570-
| | | 95th Percentile | 99th Percentile | Average |
571-
| 128 | 1 | 1.44 | 1.45 | 1.44 |
572-
| 128 | 2 | 1.84 | 1.84 | 1.84 |
573-
| 128 | 4 | 2.76 | 2.76 | 2.75 |
574-
| 128 | 8 | 4.12 | 4.12 | 4.11 |
575-
| 128 | 12 | 5.26 | 5.28 | 5.22 |
576-
| 128 | 16 | 7.52 | 7.52 | 7.51 |
577-
| 128 | 24 | 9.97 | 9.99 | 9.89 |
578-
| 128 | 32 | 12.84 | 12.85 | 12.80 |
579-
| 128 | 64 | 24.35 | 24.46 | 24.15 |
580-
| 128 | 128 | 46.38 | 46.60 | 45.96 |
581-
| 384 | 1 | 2.37 | 2.37 | 2.36 |
582-
| 384 | 2 | 3.88 | 3.88 | 3.87 |
583-
| 384 | 4 | 6.10 | 6.11 | 6.05 |
584-
| 384 | 8 | 11.60 | 11.63 | 11.49 |
585-
| 384 | 12 | 15.73 | 15.78 | 15.64 |
586-
| 384 | 16 | 20.95 | 21.01 | 20.90 |
587-
| 384 | 24 | 29.83 | 29.93 | 29.71 |
588-
| 384 | 32 | 40.01 | 40.09 | 39.75 |
589-
| 384 | 64 | 76.46 | 76.67 | 76.28 |
590-
| 384 | 128 | 148.96 | 149.23 | 148.11 |
591-
500+
| 384 | 4 | 3.63 | 3.64 | 3.63 |
501+
| 384 | 8 | 5.74 | 5.74 | 5.73 |
502+
| 384 | 12 | 8.22 | 8.23 | 8.21 |
503+
| 384 | 16 | 10.33 | 10.33 | 10.31 |
504+
| 384 | 24 | 14.52 | 14.52 | 14.51 |
505+
| 384 | 32 | 18.72 | 18.73 | 18.71 |
506+
| 384 | 64 | 35.79 | 35.81 | 35.50 |
507+
| 384 | 128 | 67.72 | 67.86 | 67.55 |
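
Note on the table columns: each row reports a 95th percentile, a 99th percentile, and an average latency in milliseconds. The diff does not show how `scripts/inference_benchmark.sh` derives these values, so the following is only a minimal sketch of one common way to reduce per-iteration timings to those three numbers. The function name `summarize_latencies` and the nearest-rank percentile rule are assumptions made for illustration, not the benchmark script's actual method.

```python
def summarize_latencies(samples_ms):
    """Reduce per-iteration latencies (ms) to (p95, p99, average).

    Hypothetical helper: the percentile rule used by the real benchmark
    script is not visible in this diff. Nearest-rank is assumed here.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    count = len(ordered)

    def nearest_rank(pct):
        # Nearest-rank percentile: the smallest sample such that at least
        # pct percent of all samples are less than or equal to it.
        # Integer arithmetic computes ceil(pct * count / 100) without
        # floating-point rounding surprises.
        rank = (pct * count + 99) // 100
        return ordered[max(rank, 1) - 1]

    average = sum(ordered) / count
    return nearest_rank(95), nearest_rank(99), average
```

With this rule, a row whose 95th and 99th percentile are identical (as many rows above are) simply means both ranks landed on samples with the same measured value. Other percentile definitions, such as linear interpolation, can give slightly different numbers for the same samples.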
