A library that lets you easily increase efficiency of your deep learning models with no loss of accuracy.


DNN Bench

DNN Bench is a library that lets you benchmark inference speed of your deep learning models against various frameworks, hardware and execution providers with a single command. See Comprehensive Analysis.

With DNN Bench you can answer questions like:

  • to which hardware should I deploy my model?
  • which backend should I use?
  • should I apply an optimisation technique, e.g. quantisation, before I deploy it?

The goal is to make it easy for developers to choose the optimal deployment configuration (optimization on/off, backend, hardware) for their particular use case.

Side note: Models are benchmarked within docker containers.

Example

Performance of BERT-Squad and ResNet on c5a.4xlarge, an AWS EC2 CPU compute instance. The charts show the number of processed samples per second; higher is better.

(Charts: Bert-CPU and Resnet-CPU, samples per second)

See Comprehensive Analysis for more models benchmarked on different hardware.

Supported devices and backends

      PyTorch  TensorFlow  ONNX-Runtime  OpenVINO*  Nuphar*  CUDA*  TensorRT*
CPU   ✓        ✓           ✓             ✓          ✓
GPU   ✓        ✓                                             ✓      ✓
ARM                        ✓

*Marked backends are executed within the ONNX-Runtime framework.

Installation

Dependencies

Ubuntu

./install_dependencies.sh cpu 

Replace the cpu argument with gpu for nvidia-docker.

Other

Deep learning backends

You can use pre-compiled images from Docker Hub. They will be downloaded automatically when running ./bench_model.sh

Optionally, prepare Docker images for the various deep learning backends locally:

./prepare_images.sh cpu 

Replace the cpu argument with gpu for GPU backends or arm for ARM backends.

Usage

Benchmark an ONNX model against different backends:

./bench_model.sh path_to_model --repeat=100 --number=1 --warmup=10 --device=cpu \
    --tf --onnxruntime --openvino --pytorch --nuphar

Possible backends:

--tf            (with --device=cpu or gpu)
--onnxruntime   (with --device=cpu or arm)
--openvino      (with --device=cpu)
--pytorch       (with --device=cpu or gpu)
--nuphar        (with --device=cpu)
--ort-cuda      (with --device=gpu)
--ort-tensorrt  (with --device=gpu)

Additional Parameters:

--output OUTPUT   Directory of benchmarking results. Default: ./results
--repeat REPEAT   Benchmark repeats. Default: 1000
--number NUMBER   Benchmark number. Default: 1
--warmup WARMUP   Benchmark warmup repeats that are discarded. Default: 100
--device DEVICE   Device backend: CPU or GPU or ARM. Default: CPU
--quantize        Dynamic quantization in a corresponding backend.

Results

Results are stored by default in the ./results directory. Each benchmarking result is stored in JSON format.

{
    "model_path": "/models/efficientnet-lite4.onnx",
    "output_path": "/results/efficientnet-lite4-onnxruntime-openvino.json",
    "backend": "onnxruntime",
    "backend_meta": "openvino",
    "device": "cpu",
    "number": 1,
    "repeat": 100,
    "warmup": 10,
    "size": 51946641,
    "input_size": [[1, 224, 224, 3]],
    "min": 0.038544699986232445,
    "max": 0.05930669998633675,
    "mean": 0.04293907555596282,
    "std": 0.0039751552053260125,
    "data": [0.04748649999964982, 0.05760759999975562, ...]
}
  • model_path: path to the input model
  • output_path: path to the results file
  • backend: deep learning backend used to produce the results
  • backend_meta: special parameters used with the backend, e.g. onnxruntime used with openvino
  • device: cpu, gpu, arm, etc. on which the model was benchmarked
  • number: number of inferences in a single experiment
  • repeat: number of repeated experiments
  • warmup: number of discarded warmup experiments; inference might not reach its optimal performance in the first few runs
  • size: size of the model in bytes
  • input_size: shape(s) of the model input(s)
  • min: minimum time of an experiment run
  • max: maximum time of an experiment run
  • mean: mean time of an experiment run
  • std: standard deviation across experiment runs
  • data: all measurements of the experiment runs
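
The per-run statistics can be converted into a throughput figure (samples per second, as in the example charts above). Below is a minimal sketch, assuming the result file from the example output above and that times are recorded in seconds:

    import json

    # Load one benchmarking result produced by ./bench_model.sh
    with open("results/efficientnet-lite4-onnxruntime-openvino.json") as f:
        result = json.load(f)

    # Each entry in "data" is the time of one experiment of `number` inferences,
    # so throughput = number / mean time per experiment (assuming seconds).
    throughput = result["number"] / result["mean"]
    print(f"{result['backend']} ({result['backend_meta']}): "
          f"{throughput:.1f} samples/s, std {result['std']:.4f} s")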

Plotting

A simple utility for generating quick plots is available in vis/plot_results.py.

  • Dependencies:
    pip install seaborn matplotlib pandas
  • Usage:
    python vis/plot_results.py results_dir plots_dir
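
For a custom view, the result files can also be aggregated by hand with the dependencies listed above. A minimal sketch, assuming the JSON schema shown in Results, times in seconds, and a ./results directory produced by ./bench_model.sh:

    import json
    import pathlib

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Collect throughput per backend from every result file.
    rows = []
    for path in pathlib.Path("results").glob("*.json"):
        r = json.loads(path.read_text())
        label = r["backend"] + (f"-{r['backend_meta']}" if r.get("backend_meta") else "")
        rows.append({"backend": label, "samples_per_s": r["number"] / r["mean"]})

    # Bar plot of samples per second per backend, similar to the charts above.
    df = pd.DataFrame(rows)
    sns.barplot(data=df, x="backend", y="samples_per_s")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.savefig("throughput.png")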

Limitations and known issues

  • The --quantize flag is not supported for --ort-cuda, --ort-tensorrt and --tf.
  • The current version supports ONNX models only. To convert models from other frameworks,
    follow these examples (a PyTorch-to-ONNX sketch is shown after this list).
  • The following Docker images for CPU execution utilize only half of the CPUs on Linux EC2 instances:
    • onnxruntime with openvino,
    • pytorch
  • onnxruntime with nuphar utilizes the total CPU count minus 1 on Linux EC2 instances.
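
As an example of such a conversion, a PyTorch model can be exported to ONNX with torch.onnx.export before benchmarking. This is a minimal sketch; the model and file names are hypothetical and not part of DNN Bench:

    import torch
    import torchvision

    # Hypothetical example: export a torchvision ResNet-50 to an ONNX file.
    model = torchvision.models.resnet50().eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    torch.onnx.export(
        model,
        dummy_input,
        "resnet50.onnx",  # this path can then be passed to ./bench_model.sh
        input_names=["input"],
        output_names=["output"],
        opset_version=13,
    )

The resulting resnet50.onnx can then be benchmarked as shown in Usage.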

Troubleshooting

  • If running the TensorFlow image fails due to onnx-tf conversion, rebuild the image locally: docker build -f dockerfiles/Dockerfile.tf -t toriml/tensorflow:latest .
  • If you get permission errors when running Docker, add yourself to the docker group with sudo usermod -aG docker $USER and re-login with su - $USER.
