# GPT-2 speed-up using OpenVINO
If you have Python 3.9 installed, you will need to install Python 3.7.9 for this project to work. Follow the instructions below when building for the first time (build verified on macOS):
```sh
brew install pyenv                              # for managing multiple Python versions on the machine
pip3 install virtualenv                         # virtual-environment maker; any other package works too
pyenv install 3.7.9                             # install the specific version
pyenv local 3.7.9                               # set local (this folder) version to 3.7.9
export LOCAL_PY_VER_PATH=`pyenv which python3`  # store the path for convenience
echo $LOCAL_PY_VER_PATH                         # [opt.] check the path
$LOCAL_PY_VER_PATH -m venv .                    # build a virtual environment in this folder using the path above
source bin/activate                             # activate the local env
pip3 install -r requirements.txt                # install run dependencies
```
When coming back to this project, simply activate the virtualenv and the rest will be ready for you:
```sh
source bin/activate
```
To get the model in the ONNX format, first run the file `convert.py`; this should dump a `gpt2.onnx` file:

```sh
python3 convert.py
```
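For reference, a minimal sketch of such an export using `transformers` and `torch.onnx` (the actual `convert.py` in this repo may differ; the checkpoint name, opset version, and axis names here are my assumptions):

```python
# Hypothetical sketch of a GPT-2 -> ONNX export; the repo's convert.py may differ.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Dummy input used to trace the graph.
dummy_ids = tokenizer.encode("hello world", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy_ids,),
    "gpt2.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                  "logits": {0: "batch", 1: "sequence"}},
    opset_version=11,
)
```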
For the next step (converting the ONNX model to OpenVINO IR), you must first have OpenVINO installed on your system; you can download it from here. I have added most of the requirements to my `requirements.txt` file, but you should also install the ones for OpenVINO. After that, run the following commands to set up the environment variables:
```sh
export OPENVINO_FOLDER="path/to/openvino_2021"
cd $OPENVINO_FOLDER/bin
source setupvars.sh
cd $OPENVINO_FOLDER/deployment_tools/model_optimizer
pip3 install -r requirements.txt
pip3 install -r requirements_onnx.txt
```
If everything works correctly, you will see an output like this:
```
[setupvars.sh] OpenVINO environment initialized
```
Now come back to this repo; note that the OpenVINO environment setup works correctly only if you `source` it from inside the `openvino_2021/bin` folder. Next, we run the script `mo_onnx.py`:
```sh
mo_onnx.py --help                     # to see what each argument means
mkdir full_precision half_precision   # full_precision is FP32 and the other is FP16
mo_onnx.py --input_model gpt2.onnx \
    --data_type=FP32 \
    --output_dir=full_precision       # use FP16 / half_precision for the half-precision model
```
If everything works correctly, you should see 3 files in the output folder (e.g. `full_precision/`):

```
gpt2.bin
gpt2.mapping
gpt2.xml
```
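These three files are the OpenVINO IR (weights, mapping, and network graph). As a reference for how they get consumed, here is a minimal sketch using the 2021.x Inference Engine Python API; this is roughly what `run.py` does, though the actual input and output names and the input shape depend on how the model was exported:

```python
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="full_precision/gpt2.xml",
                      weights="full_precision/gpt2.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

# Input/output names come from the ONNX export; adjust to your model.
input_name = next(iter(net.input_info))
output_name = next(iter(net.outputs))

token_ids = np.random.randint(0, 50257, size=(1, 127))  # dummy GPT-2 token ids
logits = exec_net.infer({input_name: token_ids})[output_name]
```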
To check that everything works fine, run the script `run.py`. You should start seeing the outputs; the following numbers are from a machine with this configuration:
```
MacBook Pro (13-inch, 2020, Four Thunderbolt 3 ports)
Processor: 2 GHz Quad-Core Intel Core i5
Memory: 16 GB 3733 MHz LPDDR4X
Graphics: Intel Iris Plus Graphics 1536 MB
```
The performance results are as follows (a ~2x boost):
```
----------------------------------------------------------------------
Loading Pytorch model
:: Pytorch inference in 0.59065s
----------------------------------------------------------------------
Creating Inference Engine...
Loading network
Loading IR to the plugin...
exec_net: <openvino.inference_engine.ie_api.ExecutableNetwork object at 0x12c531fb0>
:: OpenVino inference in 0.26206s
----------------------------------------------------------------------
```
To test the generation capabilities you can pass the `--g` flag, which gives the following results:
```
----------------------------------------------------------------------
Loading Pytorch model
Text shape: torch.Size([1, 127])
:: Pytorch inference in 0.46476s
----------------------------------------------------------------------
Testing generation
:: Pytorch generation took (40 steps): 17.663s
----------------------------------------------------------------------
Creating Inference Engine...
Loading network
Loading IR to the plugin...
exec_net: <openvino.inference_engine.ie_api.ExecutableNetwork object at 0x130aaffb0>
:: OpenVino inference in 0.23262s
----------------------------------------------------------------------
Testing generation
:: OpenVino generation took (40 steps): 6.220s
----------------------------------------------------------------------
```
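Under the hood, greedy generation against the OpenVINO network is just repeated single-step inference with the argmax token appended. A minimal hypothetical sketch, reusing `exec_net`, `input_name`, and `output_name` from the loading sketch above:

```python
import numpy as np

def greedy_generate(exec_net, input_name, output_name, token_ids, steps=40):
    # NOTE: this assumes the network accepts the growing sequence length; with
    # the 2021.x API you may instead need to reshape the network per step or
    # keep token_ids padded to a fixed length.
    for _ in range(steps):
        logits = exec_net.infer({input_name: token_ids})[output_name]
        next_id = int(logits[0, -1].argmax())  # greedy: most likely next token
        token_ids = np.concatenate([token_ids, [[next_id]]], axis=1)
    return token_ids
```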
When running on an AWS `c5.12xlarge` instance and batching the data into batches of 128 samples, we see a larger performance increase:
```
----------------------------------------------------------------------
Loading Pytorch model
Pytorch inference in 3.55126s
----------------------------------------------------------------------
Creating Inference Engine...
Loading network
Loading IR to the plugin...
exec_net: <openvino.inference_engine.ie_api.ExecutableNetwork object at 0x12c531fb0>
----------------------------------------------------------------------
OpenVino inference in 0.78668s
----------------------------------------------------------------------
```
That is roughly a 4.5x boost. Using the OpenVINO benchmarking tool we saw even higher throughput: 134.29 ms for the first inference and an average processing time of 17 ms across 3522 runs. Against the 3.55 s PyTorch batch time, that is a massive ~209x speed improvement.
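For reference, those numbers come from OpenVINO's bundled benchmark tool; a typical invocation, assuming the standard `openvino_2021` directory layout used above, looks like this:

```sh
python3 $OPENVINO_FOLDER/deployment_tools/tools/benchmark_tool/benchmark_app.py \
    -m full_precision/gpt2.xml
```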
This supports our hypothesis that larger CPU machines can take advantage of OpenVINO's performance in a super-linear fashion.