|
| 1 | +\documentclass{article} |
| 2 | +\usepackage[letterpaper, portrait, margin=1in]{geometry} |
| 3 | +\usepackage[utf8]{inputenc} |
| 4 | +\usepackage{amsfonts,amsmath,amssymb,amsthm} |
| 5 | +\usepackage{bm,nicefrac} |
| 6 | +\usepackage{graphicx} |
| 7 | +\usepackage{hyperref} |
| 8 | +\usepackage{subcaption} |
| 9 | +\usepackage[section]{placeins} |
| 10 | +\usepackage{textgreek} |
| 11 | +\usepackage{changepage} |
| 12 | +\usepackage{authblk} |
| 13 | + |
| 14 | +\title{Keras2c: A simple library for converting Keras neural networks |
| 15 | +to real-time friendly C} |
| 16 | +\author[1]{Rory Conlin} |
| 17 | +\author[2]{Keith Erickson} |
| 18 | +\author[3]{Joe Abbate} |
| 19 | +\author[1,2]{Egemen Kolemen} |
| 20 | +\affil[1]{Dept. of Mechanical \& Aerospace Engineering, Princeton University} |
| 21 | +\affil[2]{Princeton Plasma Physics Laboratory} |
| 22 | +\affil[3]{Dept. of Astrophysical Sciences, Princeton University} |
| 23 | + |
| 24 | + |
| 25 | +\begin{document} |
| 26 | + |
| 27 | +\maketitle |
| 28 | + |
| 29 | + |
| 30 | +\section*{Abstract}\label{abstract} |
| 31 | + |
| 32 | +With the growth of machine learning models and neural networks in |
| 33 | +measurement and control systems comes the need to deploy these models in |
| 34 | +a way that is compatible with existing systems. Existing options for |
| 35 | +deploying neural networks either introduce very high latency, require |
| 36 | +expensive and time-consuming work to integrate into existing codebases, |
| 37 | +or support only a very limited subset of model types. We have therefore |
| 38 | +developed Keras2c, a simple library for |
| 39 | +converting Keras/TensorFlow neural network models into real-time |
| 40 | +compatible C code. It supports a wide range of Keras layer and model |
| 41 | +types, including multidimensional convolutions, recurrent layers, |
| 42 | +multi-input/output models, and shared layers. Keras2c re-implements |
| 43 | +the core components of Keras/TensorFlow required for predictive forward |
| 44 | +passes through neural networks in pure C, relying only on standard |
| 45 | +library functions. The core functionality consists of only |
| 46 | +$\sim$1200 lines of code, making it extremely lightweight and |
| 47 | +easy to integrate into existing codebases. Keras2c has been successfully |
| 48 | +tested in experiments and is currently in use on the plasma control |
| 49 | +system at the DIII-D National Fusion Facility at General Atomics in San |
| 50 | +Diego. |
| 51 | + |
| 52 | + |
| 53 | +\section*{Motivation}\label{motivation} |
| 54 | + |
| 55 | +TensorFlow is one of the most popular libraries for developing and |
| 56 | +training neural networks, and contains a high-level Python API called |
| 57 | +Keras that has become extremely popular due to its ease of use and rich |
| 58 | +feature set. As the use of machine learning and neural networks grows in |
| 59 | +the field of diagnostic and control systems, one of the central |
| 60 | +challenges remains how to deploy the resulting trained models in a way |
| 61 | +that can be easily integrated into existing systems, particularly for |
| 62 | +real-time predictions using machine learning models. Given that most |
| 63 | +machine learning development traditionally takes place in Python, most |
| 64 | +deployment schemes involve calling out to a Python process (often |
| 65 | +running on a remote, network-connected server) and using the existing |
| 66 | +Python libraries to pass data through the model. This introduces large |
| 67 | +latency, and is generally not feasible for real-time applications. Other |
| 68 | +options include rewriting the entire network using the existing |
| 69 | +TensorFlow C/C++ API, though this is extremely time-consuming and |
| 70 | +requires linking the resulting code against the full TensorFlow library, |
| 71 | +which contains millions of lines of code and has a binary size of up to |
| 72 | +several GB. The release of TensorFlow 2.0 contained a new possibility, |
| 73 | +called ``TensorFlow Lite'', a reduced library designed to run on mobile |
| 74 | +and IoT devices. However, TensorFlow Lite only supports a very limited |
| 75 | +subset of the full Keras API. Therefore, we present a new option, |
| 76 | +Keras2c, a simple library for converting Keras/TensorFlow neural network |
| 77 | +models into real-time compatible C code. |
| 78 | + |
| 79 | + |
| 80 | +\section*{Method}\label{method} |
| 81 | + |
| 82 | + |
| 83 | + |
| 84 | +Keras2c consists of two primary components: a backend library of C |
| 85 | +functions that each implement a single layer of a neural network (e.g., Dense, |
| 86 | +Conv2D, LSTM), and a Python script that generates C code to call the |
| 87 | +layer functions in the right order to implement the network. The total |
| 88 | +library of backend layer functions is only $\sim$1200 lines |
| 89 | +of code, and uses only C standard library functions, yet covers a very |
| 90 | +wide range of Keras functionality, summarized below: |
| 91 | + |
| 92 | + |
| 93 | + |
| 94 | +\subsubsection*{Supported Functionality}\label{supported-layers} |
| 95 | + |
| 96 | +\begin{itemize} |
| 97 | + \setlength\itemsep{0em} |
| 98 | +\item |
| 99 | + \textbf{Core Layers}: Dense, Activation, Flatten, Input, Reshape, Permute, RepeatVector |
| 100 | +\item |
| 101 | + \textbf{Convolution Layers}: Convolution (1D/2D/3D, with arbitrary stride/dilation/padding), Cropping (1D/2D/3D), UpSampling (1D/2D/3D), ZeroPadding (1D/2D/3D) |
| 102 | +\item |
| 103 | + \textbf{Pooling Layers}: MaxPooling (1D/2D/3D), AveragePooling (1D/2D/3D), GlobalMaxPooling (1D/2D/3D), GlobalAveragePooling (1D/2D/3D) |
| 104 | +\item |
| 105 | + \textbf{Recurrent Layers}: SimpleRNN, GRU, LSTM (stateful or stateless) |
| 106 | +\item |
| 107 | + \textbf{Embedding Layers}: Embedding |
| 108 | +\item |
| 109 | + \textbf{Merge Layers}: Add, Subtract, Multiply, Average, Maximum, Minimum, Concatenate, Dot |
| 110 | +\item |
| 111 | + \textbf{Normalization Layers}: BatchNormalization |
| 112 | +\item |
| 113 | + \textbf{Layer Wrappers}: TimeDistributed, Bidirectional |
| 114 | +\item |
| 115 | + \textbf{Activations}: ReLU, tanh, sigmoid, hard sigmoid, exponential, softplus, softmax, softsign, LeakyReLU, PReLU, ELU, ThresholdedReLU |
| 116 | +\end{itemize} |
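
To make the idea of a backend layer function concrete, the following is a minimal sketch of a fully connected (Dense) layer with a ReLU activation, written using only the C standard library. The function name \texttt{dense\_relu\_forward} and its argument list are hypothetical and chosen for illustration; they are not the actual Keras2c interface. The sketch assumes the kernel is stored row-major with shape (inputs, units), which is the shape Keras uses for Dense weights.

```c
#include <stddef.h>

/* Illustrative sketch only: the name and signature are hypothetical,
 * not the actual Keras2c backend interface.
 *
 * Computes out[j] = max(0, bias[j] + sum_i in[i] * kernel[i * n_out + j]).
 * kernel is row-major with shape (n_in, n_out). */
void dense_relu_forward(const float *in, size_t n_in,
                        const float *kernel, const float *bias,
                        float *out, size_t n_out)
{
    for (size_t j = 0; j < n_out; ++j) {
        float acc = bias[j];
        for (size_t i = 0; i < n_in; ++i) {
            acc += in[i] * kernel[i * n_out + j];
        }
        /* ReLU activation applied to the accumulated sum. */
        out[j] = (acc > 0.0f) ? acc : 0.0f;
    }
}
```

Because the caller supplies every buffer, a function of this shape performs no allocation of its own, which is one way to keep the execution time of a forward pass predictable.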
| 117 | + |
| 118 | + |
| 119 | + |
| 120 | +The Keras2c Python script takes a trained Keras model, extracts |
| 121 | +the weights and other parameters, and parses the graph structure to |
| 122 | +determine the order in which functions should be called to obtain the |
| 123 | +correct results. It then generates C code for a predictor function that |
| 124 | +can be called with a set of inputs to generate predictions. It also |
| 125 | +generates helper functions for initialization and cleanup to handle |
| 126 | +memory allocation (by default all variables are declared on the stack, |
| 127 | +though it also supports the option of dynamically allocating memory |
| 128 | +before execution). In addition to simple sequential models, Keras2c also |
| 129 | +supports more complicated architectures created using the Keras |
| 130 | +functional API, including multi-input/multi-output networks with |
| 131 | +complicated branching and merging internal structures. |
| 132 | +\begin{figure}[h!] |
| 133 | +\centering |
| 134 | +\includegraphics[width=3.5in]{flow_graph.png} |
| 135 | +\caption{Workflow of converting a Keras model to C code with Keras2c.} |
| 136 | +\end{figure} |
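
Determining a valid call order for a branching and merging network amounts to a topological sort of the layer graph. The sketch below uses Kahn's algorithm as one standard way to do this; it is an assumption for illustration and not necessarily the method used by the Keras2c script, which is written in Python. The function \texttt{call\_order}, the constant \texttt{MAX\_LAYERS}, and the flat adjacency-matrix encoding are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LAYERS 16 /* hypothetical upper bound for this sketch */

/* Kahn's algorithm on a layer graph with n nodes.
 * adj is a flat n-by-n matrix: adj[i * n + j] != 0 means the output of
 * layer i is an input of layer j. On return, order[] lists the layers so
 * that each one appears after all of its inputs. The return value is the
 * number of layers placed; a value smaller than n indicates a cycle. */
size_t call_order(size_t n, const unsigned char *adj, size_t *order)
{
    size_t indegree[MAX_LAYERS] = {0};
    size_t count = 0; /* number of layers written to order[] */
    size_t head = 0;  /* next entry of order[] to process */

    assert(n <= MAX_LAYERS);

    /* Count how many inputs each layer is still waiting for. */
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            if (adj[i * n + j]) {
                ++indegree[j];
            }
        }
    }

    /* Layers with no inputs (for example Input layers) are ready. */
    for (size_t j = 0; j < n; ++j) {
        if (indegree[j] == 0) {
            order[count++] = j;
        }
    }

    /* order[] doubles as the work queue: each processed layer releases
     * the layers that depend on it. */
    while (head < count) {
        size_t i = order[head++];
        for (size_t j = 0; j < n; ++j) {
            if (adj[i * n + j] && --indegree[j] == 0) {
                order[count++] = j;
            }
        }
    }
    return count;
}
```

For a four-layer graph in which two inputs (0 and 1) feed a Dense layer (2, fed by 0) and a merge layer (3, fed by 1 and 2), the function places the merge layer last, since it becomes ready only after both of its inputs have been scheduled.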
| 137 | + |
| 138 | +To confirm that the generated code accurately reproduces the outputs of |
| 139 | +the original model, Keras2c also generates sample input/output pairs |
| 140 | +from the original network. It then automatically tests the generated |
| 141 | +code with the same inputs to verify that the generated code produces |
| 142 | +equivalent outputs. |
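
A check of this kind can be sketched as an element-wise comparison of the reference outputs against the outputs of the generated code. The function name \texttt{outputs\_match} and the use of an absolute tolerance are assumptions made for illustration; the text above does not specify how equivalence is measured.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only: hypothetical name and tolerance criterion.
 * Returns 1 if every element of actual[] is within tol of the matching
 * element of expected[], and 0 otherwise. */
int outputs_match(const float *expected, const float *actual,
                  size_t n, float tol)
{
    assert(tol >= 0.0f);

    for (size_t i = 0; i < n; ++i) {
        float diff = expected[i] - actual[i];
        if (diff < 0.0f) {
            diff = -diff;
        }
        /* Written as !(diff <= tol) so that a NaN counts as a mismatch. */
        if (!(diff <= tol)) {
            return 0;
        }
    }
    return 1;
}
```

Some tolerance is needed because a C reimplementation may sum floating-point terms in a different order than the original framework, so bitwise-identical results should not be expected in general.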
| 143 | + |
| 144 | + |
| 145 | +\section*{Benchmarks}\label{benchmarks} |
| 146 | + |
| 147 | +Keras2c has also been benchmarked against Python Keras/TensorFlow for |
| 148 | +single-CPU performance, and the generated code has been shown to be |
| 149 | +significantly faster for small to medium-sized models. All tests were |
| 150 | +conducted on an Intel Core i7-8750H CPU @ 2.20 GHz, single-threaded, with 32 GB of |
| 151 | +RAM. Keras2c was compiled with GCC 7.4.0 using \texttt{-O3} optimization; the Python side used Keras |
| 152 | +v2.2.4, TensorFlow (CPU) v1.13.1, and MKL v2019.1. |
| 153 | + |
| 154 | +\begin{figure}[h] |
| 155 | +\centering |
| 156 | +\includegraphics[width=6in]{benchmarking.png} |
| 157 | +\caption{Benchmarking results, Keras2c vs Keras/TensorFlow in Python.} |
| 158 | +\end{figure} |
| 159 | + |
| 160 | +\end{document} |