Commit fe3554e

astyled, added conference submission docs
1 parent 851c8c9 commit fe3554e

17 files changed

+1927
-1213
lines changed

benchmarks/benchmarking.ipynb

Lines changed: 572 additions & 86 deletions
Large diffs are not rendered by default.

docs/benchmarking.png

68.1 KB

docs/flow_graph.png

38.1 KB

docs/proposal.rst

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
============================================================================================
2+
Keras2c: A simple library for converting Keras neural networks to real-time friendly C code.
3+
============================================================================================
4+
5+
Abstract
6+
********
7+
With the growth of machine learning models and neural networks in measurement and control systems comes the need to deploy these models in a way that is compatible with existing systems. Existing options for deploying neural networks either introduce very high latency, require expensive and time-consuming work to integrate into existing code bases, or support only a very limited subset of model types. We have therefore developed Keras2c, a simple library for converting Keras/TensorFlow neural network models into real-time compatible C code. It supports a wide range of Keras layer and model types, including multidimensional convolutions, recurrent layers, multi-input/output models, and shared layers. Keras2c re-implements the core components of Keras/TensorFlow required for predictive forward passes through neural networks in pure C, relying only on standard library functions. The core functionality consists of only ~1200 lines of code, making it extremely lightweight and easy to integrate into existing codebases. Keras2c has been successfully tested in experiments and is currently in use on the plasma control system at the DIII-D National Fusion Facility at General Atomics in San Diego.
8+
9+
Motivation
10+
**********
11+
TensorFlow is one of the most popular libraries for developing and training neural networks, and contains a high-level Python API called Keras that has become extremely popular due to its ease of use and rich feature set. As the use of machine learning and neural networks grows in the field of diagnostic and control systems, one of the central challenges remains how to deploy the resulting trained models in a way that can be easily integrated into existing systems, particularly for real-time predictions. Because most machine learning development takes place in Python, most deployment schemes involve calling out to a Python process (often running on a remote, network-connected server) and using the existing Python libraries to pass data through the model. This introduces large latency and is generally not feasible for real-time applications. Another option is to rewrite the entire network using the existing TensorFlow C/C++ API, but this is extremely time-consuming and requires linking the resulting code against the full TensorFlow library, which contains millions of lines of code and has a binary size of up to several GB. The release of TensorFlow 2.0 contained a new possibility, called "TensorFlow Lite", a reduced library designed to run on mobile and IoT devices. However, TensorFlow Lite supports only a very limited subset of the full Keras API. Therefore, we present a new option, Keras2c, a simple library for converting Keras/TensorFlow neural network models into real-time compatible C code.
12+
13+
Method
14+
******
15+
16+
Keras2c consists of two primary components: a backend library of C functions that each implement a single layer of a neural net (e.g., Dense, Conv2D, LSTM), and a Python script that generates C code to call the layer functions in the right order to implement the network. The backend library of layer functions totals only ~1200 lines of code and uses only C standard library functions, yet covers a very wide range of Keras functionality, summarized below:
17+
18+
Supported Layers
19+
################
20+
- **Core Layers**: Dense, Activation, Flatten, Input, Reshape, Permute, RepeatVector
21+
- **Convolution Layers**: Convolution (1D/2D/3D, with arbitrary stride/dilation/padding), Cropping (1D/2D/3D), UpSampling (1D/2D/3D), ZeroPadding (1D/2D/3D)
22+
- **Pooling Layers**: MaxPooling (1D/2D/3D), AveragePooling (1D/2D/3D), GlobalMaxPooling (1D/2D/3D), GlobalAveragePooling (1D/2D/3D)
23+
- **Recurrent Layers**: SimpleRNN, GRU, LSTM (stateful or stateless)
24+
- **Embedding Layers**: Embedding
25+
- **Merge Layers**: Add, Subtract, Multiply, Average, Maximum, Minimum, Concatenate, Dot
26+
- **Normalization Layers**: BatchNormalization
27+
- **Layer Wrappers**: TimeDistributed, Bidirectional
28+
- **Activations**: ReLU, tanh, sigmoid, hard sigmoid, exponential, softplus, softmax, softsign, LeakyReLU, PReLU, ELU, ThresholdedReLU
29+
30+
31+
.. figure:: flow_graph.png
32+
:align: center
33+
:scale: 50 %
34+
35+
Workflow of converting a Keras model to C code with Keras2c
36+
37+
The Keras2c Python script takes in a trained Keras model, extracts the weights and other parameters, and parses the graph structure to determine the order in which the layer functions must be called to obtain the correct results. It then generates C code for a predictor function that can be called with a set of inputs to generate predictions. It also generates helper functions for initialization and cleanup to handle memory allocation (by default all variables are declared on the stack, though dynamically allocating memory before execution is also supported). In addition to simple sequential models, Keras2c also supports more complicated architectures created using the Keras functional API, including multi-input/multi-output networks with complicated internal branching and merging.
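
The ordering step described above can be illustrated with a small sketch. It is not the Keras2c implementation: the graph layout (a dict mapping each layer name to the layers that feed it), the helper names ``call_order`` and ``emit_calls``, and the shape of the emitted call strings are all assumptions made for illustration. The sketch uses Kahn's algorithm for topological sorting, which is one common way to guarantee that every layer is called after its inputs.

```python
from collections import deque


def call_order(inbound):
    """Return layer names so that every layer comes after its inputs.

    `inbound` maps each layer name to the list of layers feeding it.
    This layout is hypothetical, chosen only for the example.
    """
    remaining = {name: len(srcs) for name, srcs in inbound.items()}
    consumers = {name: [] for name in inbound}
    for name, srcs in inbound.items():
        for src in srcs:
            consumers[src].append(name)

    # Layers with no inbound connections (the model inputs) are ready first.
    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for nxt in consumers[name]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(inbound):
        raise ValueError("layer graph contains a cycle")
    return order


def emit_calls(inbound):
    """Emit one placeholder C call per non-input layer, in a valid order."""
    lines = []
    for name in call_order(inbound):
        srcs = inbound[name]
        if not srcs:
            continue  # input buffers are filled by the caller
        args = ", ".join(f"{src}_out" for src in srcs)
        lines.append(f"{name}({args}, {name}_out);")
    return lines


# A two-input network whose branches are merged before the output layer.
graph = {
    "input_a": [],
    "input_b": [],
    "dense_a": ["input_a"],
    "dense_b": ["input_b"],
    "merge": ["dense_a", "dense_b"],
    "output": ["merge"],
}

for line in emit_calls(graph):
    print(line)
```

Because the merge layer is only released once both of its branches have been emitted, the same approach extends to the branching and merging structures produced by the functional API.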
38+
39+
To confirm that the generated code accurately reproduces the outputs of the original model, Keras2c also generates sample input/output pairs from the original network. It then automatically tests the generated code with the same inputs to verify that the generated code produces equivalent outputs.
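
A check of this kind can be sketched as a tolerance comparison between reference outputs and the outputs of a re-implementation. The sketch below is illustrative only: the function names and the tolerance of ``1e-5`` are assumptions, not values taken from Keras2c, and two algebraically equivalent sigmoid formulas stand in for "the original network" and "the generated code".

```python
import math


def max_abs_error(expected, actual):
    """Largest element-wise absolute difference between two output vectors."""
    if len(expected) != len(actual):
        raise ValueError("output lengths differ")
    return max((abs(e - a) for e, a in zip(expected, actual)), default=0.0)


def outputs_match(expected, actual, tol=1e-5):
    """True when every output agrees to within `tol` (an assumed tolerance)."""
    return max_abs_error(expected, actual) <= tol


def reference_sigmoid(x):
    # Stand-in for the original model's output.
    return 1.0 / (1.0 + math.exp(-x))


def ported_sigmoid(x):
    # Stand-in for a re-implementation: same function, different arithmetic.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


inputs = [-4.0, -1.0, 0.0, 0.5, 3.0]
expected = [reference_sigmoid(x) for x in inputs]
actual = [ported_sigmoid(x) for x in inputs]

print(outputs_match(expected, actual))
```

Comparing against a tolerance rather than testing exact equality matters because two correct implementations rarely agree to the last floating-point bit.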
40+
41+
Benchmarks
42+
**********
43+
44+
Keras2c has also been benchmarked against Python Keras/TensorFlow for single-CPU performance, and the generated code has been shown to be significantly faster for small- to medium-sized models.
45+
(All tests were conducted on an Intel Core i7-8750H CPU @ 2.20GHz, single threaded, with 32GB RAM. Keras2c was compiled with GCC 7.4.0 with -O3 optimization. Python Keras v2.2.4, TensorFlowCPU v1.13.1, mkl v2019.1.)
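
For readers who want to take a comparable single-threaded measurement on the Python side, a minimal timing harness might look like the following. This is a sketch and not the benchmarking notebook from this commit: the helper name ``median_latency``, the warm-up and repeat counts, and the toy ``predict`` function are all assumptions.

```python
import statistics
import time


def median_latency(fn, arg, repeats=200, warmup=20):
    """Median wall-clock time in seconds of a single call to `fn(arg)`."""
    for _ in range(warmup):
        fn(arg)  # warm caches before timing
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def predict(x):
    # Toy stand-in for a model's forward pass: a dot product with fixed weights.
    return sum(w * v for w, v in zip(range(len(x)), x))


latency = median_latency(predict, [0.5] * 256)
print(f"median latency: {latency * 1e6:.2f} microseconds")
```

Timing one call at a time and reporting the median reflects the real-time use case, where the latency of each individual prediction matters more than batch throughput.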
46+
47+
.. figure:: benchmarking.png
48+
:align: center
49+
50+
Benchmarking results, Keras2c vs Keras/TensorFlow in Python.
51+
52+

docs/proposal.tex

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
1+
\documentclass{article}
2+
\usepackage[letterpaper, portrait, margin=1in]{geometry}
3+
\usepackage[utf8]{inputenc}
4+
\usepackage{amsfonts,amsmath,amssymb,amsthm}
5+
\usepackage{bm,nicefrac}
6+
\usepackage{graphicx}
7+
\usepackage{hyperref}
8+
\usepackage{subcaption}
9+
\usepackage[section]{placeins}
10+
\usepackage{textgreek}
11+
\usepackage{changepage}
12+
\usepackage{authblk}
13+
14+
\title{Keras2c: A simple library for converting Keras neural networks
15+
to real-time friendly C}
16+
\author[1]{Rory Conlin}
17+
\author[2]{Keith Erickson}
18+
\author[3]{Joe Abbate}
19+
\author[1,2]{Egemen Kolemen}
20+
\affil[1]{Dept. of Mechanical \& Aerospace Engineering, Princeton University}
21+
\affil[2]{Princeton Plasma Physics Laboratory}
22+
\affil[3]{Dept. of Astrophysical Sciences, Princeton University}
23+
24+
25+
\begin{document}
26+
27+
\maketitle
28+
29+
30+
\section*{Abstract}\label{abstract}
31+
32+
With the growth of machine learning models and neural networks in
33+
measurement and control systems comes the need to deploy these models in
34+
a way that is compatible with existing systems. Existing options for
35+
deploying neural networks either introduce very high latency, require
36+
expensive and time-consuming work to integrate into existing code bases,
37+
or only support a very limited subset of model types. We have therefore
38+
developed a new method, called Keras2c, which is a simple library for
39+
converting Keras/TensorFlow neural network models into real time
40+
compatible C code. It supports a wide range of Keras layer and model
41+
types, including multidimensional convolutions, recurrent layers, as well
42+
as multi-input/output models, and shared layers. Keras2c re-implements
43+
the core components of Keras/TensorFlow required for predictive forward
44+
passes through neural networks in pure C, relying only on standard
45+
library functions. The core functionality consists of only
46+
\textasciitilde{}1200 lines of code, making it extremely lightweight and
47+
easy to integrate into existing codebases. Keras2c has been successfully
48+
tested in experiments and is currently in use on the plasma control
49+
system at the DIII-D National Fusion Facility at General Atomics in San
50+
Diego.
51+
52+
53+
\section*{Motivation}\label{motivation}
54+
55+
TensorFlow is one of the most popular libraries for developing and
56+
training neural networks, and contains a high level Python API called
57+
Keras that has become extremely popular due to its ease of use and rich
58+
feature set. As the use of machine learning and neural networks grows in
59+
the field of diagnostic and control systems, one of the central
60+
challenges remains how to deploy the resulting trained models in a way
61+
that can be easily integrated into existing systems, particularly for
62+
real time predictions using machine learning models. Given that most
63+
machine learning development traditionally takes place in Python, most
64+
deployment schemes involve calling out to a Python process (often
65+
running on a distant network connected server) and using the existing
66+
Python libraries to pass data through the model. This introduces large
67+
latency, and is generally not feasible for real time applications. Other
68+
options include rewriting the entire network using the existing
69+
TensorFlow C/C++ API, though this is extremely time consuming, and
70+
requires linking the resulting code against the full TensorFlow library,
71+
containing millions of lines of code and with a binary size up to
72+
several GB. The release of TensorFlow 2.0 contained a new possibility,
73+
called ``TensorFlow Lite'', a reduced library designed to run on mobile
74+
and IoT devices. However, TensorFlow Lite only supports a very limited
75+
subset of the full Keras API. Therefore, we present a new option,
76+
Keras2c, a simple library for converting Keras/TensorFlow neural network
77+
models into real time compatible C code.
78+
79+
80+
\section*{Method}\label{method}
81+
82+
83+
84+
Keras2c consists of two primary components: a backend library of C
85+
functions that each implement a single layer of a neural net (e.g., Dense,
86+
Conv2D, LSTM), and a Python script that generates C code to call the
87+
layer functions in the right order to implement the network. The total
88+
library of backend layer functions is only $\sim$1200 lines
89+
of code, and uses only C standard library functions, yet covers a very
90+
wide range of Keras functionality, summarized below:
91+
92+
93+
94+
\subsubsection*{Supported Functionality}\label{supported-layers}
95+
96+
\begin{itemize}
97+
\setlength\itemsep{0em}
98+
\item
99+
\textbf{Core Layers}: Dense, Activation, Flatten, Input, Reshape, Permute, RepeatVector
100+
\item
101+
\textbf{Convolution Layers}: Convolution (1D/2D/3D, with arbitrary stride/dilation/padding), Cropping (1D/2D/3D), UpSampling (1D/2D/3D), ZeroPadding (1D/2D/3D)
102+
\item
103+
\textbf{Pooling Layers}: MaxPooling (1D/2D/3D), AveragePooling (1D/2D/3D), GlobalMaxPooling (1D/2D/3D), GlobalAveragePooling (1D/2D/3D)
104+
\item
105+
\textbf{Recurrent Layers}: SimpleRNN, GRU, LSTM (stateful or stateless)
106+
\item
107+
\textbf{Embedding Layers}: Embedding
108+
\item
109+
\textbf{Merge Layers}: Add, Subtract, Multiply, Average, Maximum, Minimum, Concatenate, Dot
110+
\item
111+
\textbf{Normalization Layers}: BatchNormalization
112+
\item
113+
\textbf{Layer Wrappers}: TimeDistributed, Bidirectional
114+
\item
115+
\textbf{Activations}: ReLU, tanh, sigmoid, hard sigmoid, exponential, softplus, softmax, softsign, LeakyReLU, PReLU, ELU, ThresholdedReLU
116+
\end{itemize}
117+
118+
119+
120+
The Keras2c Python script takes in a trained Keras model and extracts
121+
the weights and other parameters, and parses the graph structure to
122+
determine the order in which functions should be called to obtain the
123+
correct results. It then generates C code for a predictor function that
124+
can be called with a set of inputs to generate predictions. It also
125+
generates helper functions for initialization and cleanup, to handle
126+
memory allocation (by default all variables are declared on the stack,
127+
though it also supports the option of dynamically allocating memory
128+
before execution). In addition to simple sequential models, Keras2c also
129+
supports more complicated architectures created using the Keras
130+
functional API, including multi-input/multi-output networks with
131+
complicated branching and merging internal structures.
132+
\begin{figure}[h!]
133+
\centering
134+
\includegraphics[width=3.5in]{flow_graph.png}
135+
\caption{Workflow of converting a Keras model to C code with Keras2c}
136+
\end{figure}
137+
138+
To confirm that the generated code accurately reproduces the outputs of
139+
the original model, Keras2c also generates sample input/output pairs
140+
from the original network. It then automatically tests the generated
141+
code with the same inputs to verify that the generated code produces
142+
equivalent outputs.
143+
144+
145+
\section*{Benchmarks}\label{benchmarks}
146+
147+
Keras2c has also been benchmarked against Python Keras/TensorFlow for
148+
single CPU performance, and the generated code has been shown to be
149+
significantly faster for small to medium sized models. (All tests
150+
were conducted on an Intel Core i7-8750H CPU @ 2.20GHz, single threaded, 32GB
151+
RAM. Keras2c compiled with GCC 7.4.0 with -O3 optimization. Python Keras
152+
v2.2.4, TensorFlowCPU v1.13.1, mkl v2019.1)
153+
154+
\begin{figure}[h]
155+
\centering
156+
\includegraphics[width=6in]{benchmarking.png}
157+
\caption{Benchmarking results, Keras2c vs Keras/TensorFlow in Python.}
158+
\end{figure}
159+
160+
\end{document}
