A library that lets you easily increase efficiency of your deep learning models with no loss of accuracy.


DNN Bench

DNN Bench is a library that lets you benchmark inference speed of your deep learning models against various frameworks, hardware and execution providers with a single command. See Comprehensive Analysis.

With DNN Bench you can answer questions like:

  • to which hardware should I deploy my model?
  • which backend should I use?
  • should I apply an optimisation technique, e.g. quantisation, before I deploy it?

The goal is to make it easy for developers to choose the optimal deployment configuration (optimization on/off, backend, hardware) for their particular use case.

Side note: Models are benchmarked within docker containers.

Example

Performance of BERT-Squad and ResNet on c5a.4xlarge, an AWS EC2 CPU compute instance. The charts show the number of processed samples per second; higher is better.

(Charts: Bert-CPU and Resnet-CPU, samples per second)

See Comprehensive Analysis for more models benchmarked on different hardware.

Supported devices and backends

      PyTorch  TensorFlow  ONNX-Runtime  OpenVINO*  Nuphar*  CUDA*  TensorRT*
CPU   ✓        ✓           ✓             ✓          ✓
GPU   ✓        ✓                                             ✓      ✓
ARM                        ✓

*Marked backends are executed within the ONNX-Runtime framework.

Installation

Dependencies

Ubuntu

./install_dependencies.sh cpu 

Replace the cpu argument with gpu for nvidia-docker.

Other

Deep learning backends

You can use pre-compiled images from Docker Hub. They will be downloaded automatically when running ./bench_model.sh

Optionally, prepare Docker images for the various deep learning backends locally:

./prepare_images.sh cpu 

Replace the cpu argument with gpu for GPU backends or arm for ARM backends.

Usage

Benchmark an ONNX model against different backends:

./bench_model.sh path_to_model --repeat=100 --number=1 --warmup=10 --device=cpu \
    --tf --onnxruntime --openvino --pytorch --nuphar

Possible backends:

--tf            (with --device=cpu or gpu)
--onnxruntime   (with --device=cpu or arm)
--openvino      (with --device=cpu)
--pytorch       (with --device=cpu or gpu)
--nuphar        (with --device=cpu)
--ort-cuda      (with --device=gpu)
--ort-tensorrt  (with --device=gpu)

Additional Parameters:

--output OUTPUT   Directory of benchmarking results. Default: ./results
--repeat REPEAT   Benchmark repeats. Default: 1000
--number NUMBER   Benchmark number. Default: 1
--warmup WARMUP   Benchmark warmup repeats that are discarded. Default: 100
--device DEVICE   Device backend: CPU or GPU or ARM. Default: CPU
--quantize        Dynamic quantization in a corresponding backend.

Results

Results are stored by default in the ./results directory. Each benchmarking result is stored in JSON format.

{
    "model_path": "/models/efficientnet-lite4.onnx",
    "output_path": "/results/efficientnet-lite4-onnxruntime-openvino.json",
    "backend": "onnxruntime",
    "backend_meta": "openvino",
    "device": "cpu",
    "number": 1,
    "repeat": 100,
    "warmup": 10,
    "size": 51946641,
    "input_size": [[1, 224, 224, 3]],
    "min": 0.038544699986232445,
    "max": 0.05930669998633675,
    "mean": 0.04293907555596282,
    "std": 0.0039751552053260125,
    "data": [0.04748649999964982, 0.05760759999975562, ...]
}
  • model_path: path to the input model
  • output_path: path to the results file
  • backend: deep learning backend used to produce the results
  • backend_meta: special parameters used with the backend, e.g. onnxruntime used with openvino
  • device: cpu, gpu, arm, etc. on which the model was benchmarked
  • number: number of inferences in a single experiment
  • repeat: number of repeated experiments
  • warmup: number of discarded warmup experiments; inference might not reach its optimal performance in the first few runs
  • size: size of the model in bytes
  • input_size: shape(s) of the model input(s)
  • min: minimum time of an experiment run
  • max: maximum time of an experiment run
  • mean: mean time of an experiment run
  • std: standard deviation across experiment runs
  • data: all measurements of the experiment runs
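
The per-run statistics can be converted into a throughput figure (samples per second, as in the example charts above). Below is a minimal sketch, assuming the result file from the example output above and that times are recorded in seconds:

    import json

    # Load one benchmarking result produced by ./bench_model.sh
    with open("results/efficientnet-lite4-onnxruntime-openvino.json") as f:
        result = json.load(f)

    # Each entry in "data" is the time of one experiment of `number` inferences,
    # so throughput = number / mean time per experiment (assuming seconds).
    throughput = result["number"] / result["mean"]
    print(f"{result['backend']} ({result['backend_meta']}): "
          f"{throughput:.1f} samples/s, std {result['std']:.4f} s")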

Plotting

A simple utility for generating quick plots is available in vis/plot_results.py.

  • Dependencies:
    pip install seaborn matplotlib pandas
  • Usage:
    python vis/plot_results.py results_dir plots_dir
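
For a custom view, the result files can also be aggregated by hand with the dependencies listed above. A minimal sketch, assuming the JSON schema shown in Results, times in seconds, and a ./results directory produced by ./bench_model.sh:

    import json
    import pathlib

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Collect throughput per backend from every result file.
    rows = []
    for path in pathlib.Path("results").glob("*.json"):
        r = json.loads(path.read_text())
        label = r["backend"] + (f"-{r['backend_meta']}" if r.get("backend_meta") else "")
        rows.append({"backend": label, "samples_per_s": r["number"] / r["mean"]})

    # Bar plot of samples per second per backend, similar to the charts above.
    df = pd.DataFrame(rows)
    sns.barplot(data=df, x="backend", y="samples_per_s")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.savefig("throughput.png")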

Limitations and known issues

  • The --quantize flag is not supported for --ort-cuda, --ort-tensorrt and --tf.
  • The current version supports ONNX models only. To convert models from other frameworks,
    follow these examples (a PyTorch-to-ONNX sketch is shown after this list).
  • The following Docker images for CPU execution utilize only half of the CPUs on Linux EC2 instances:
    • onnxruntime with openvino,
    • pytorch
  • onnxruntime with nuphar utilizes the total CPU count minus 1 on Linux EC2 instances.
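
As an example of such a conversion, a PyTorch model can be exported to ONNX with torch.onnx.export before benchmarking. This is a minimal sketch; the model and file names are hypothetical and not part of DNN Bench:

    import torch
    import torchvision

    # Hypothetical example: export a torchvision ResNet-50 to an ONNX file.
    model = torchvision.models.resnet50().eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    torch.onnx.export(
        model,
        dummy_input,
        "resnet50.onnx",  # this path can then be passed to ./bench_model.sh
        input_names=["input"],
        output_names=["output"],
        opset_version=13,
    )

The resulting resnet50.onnx can then be benchmarked as shown in Usage.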

Troubleshooting

  • If running the TensorFlow image fails due to onnx-tf conversion, rebuild the image locally: docker build -f dockerfiles/Dockerfile.tf -t toriml/tensorflow:latest .
  • If you get permission errors when running Docker, add yourself to the docker group with sudo usermod -aG docker $USER and re-login with su - $USER.
