Deep Learning for IoT Big Data and Streaming Analytics: A Survey
Mehdi Mohammadi, Graduate Student Member, IEEE, Ala Al-Fuqaha, Senior Member, IEEE,
Sameh Sorour, Senior Member, IEEE, Mohsen Guizani, Fellow, IEEE

Mehdi Mohammadi and Ala Al-Fuqaha are with the Department of
Computer Science, Western Michigan University, Kalamazoo, MI 49008
USA (E-mail: {mehdi.mohammadi,ala.al-fuqaha}@wmich.edu). Sameh
Sorour (E-mail: samehsorour@uidaho.edu) and Mohsen Guizani (E-mail:
mguizani@ieee.org) are with the Department of Electrical and Computer
Engineering, University of Idaho, Moscow, ID 83844 USA.
arXiv:1712.04301v1 [cs.NI] 9 Dec 2017

Abstract—In the era of the Internet of Things (IoT), an
enormous number of sensing devices collect and/or generate
various sensory data over time for a wide range of fields
and applications. Depending on the nature of the application,
these devices produce big or fast/real-time data streams. Applying
analytics over such data streams to discover new information,
predict future insights, and make control decisions is a crucial
process that makes IoT a worthy paradigm for businesses and a
quality-of-life improving technology. In this paper, we provide a
thorough overview of using a class of advanced machine learning
techniques, namely Deep Learning (DL), to facilitate analytics
and learning in the IoT domain. We start by articulating IoT
data characteristics and identifying two major treatments for
IoT data from a machine learning perspective, namely IoT big
data analytics and IoT streaming data analytics. We also discuss
why DL is a promising approach to achieve the desired analytics
in these types of data and applications. The potential of using
emerging DL techniques for IoT data analytics is then discussed,
and its promises and challenges are introduced. We present a
comprehensive background on different DL architectures and
algorithms. We also analyze and summarize major reported
research attempts that leveraged DL in the IoT domain. Smart
IoT devices that have incorporated DL in their intelligence
background are also discussed. DL implementation approaches
on the fog and cloud centers in support of IoT applications are
also surveyed. Finally, we shed light on some challenges and
potential directions for future research. At the end of each section,
we highlight the lessons learned based on our experiments and
review of the recent literature.

Keywords-Deep Learning, Deep Neural Network, Internet of
Things, On-device Intelligence, IoT Big Data, Fast data analytics,
Cloud-based analytics.

I. INTRODUCTION
The vision of the Internet of Things (IoT) is to transform
traditional objects into smart ones by exploiting a wide range of
advanced technologies, from embedded devices and communication
technologies to Internet protocols, data analytics, and
so forth [1]. The potential economic impact of IoT is expected
to bring many business opportunities and to accelerate the
economic growth of IoT-based services. According to McKinsey's
report on the economic impact of IoT by 2025 [2], the annual
economic impact of IoT would be in the range of $2.7 to
$6.2 trillion. Healthcare constitutes the largest part, about 41%
of this market, followed by industry and energy with 33%
and 7% of the IoT market, respectively. Other domains such
as transportation, agriculture, urban infrastructure, security,
and retail together account for about 15% of the IoT market.
These expectations imply a tremendous and steep growth of IoT
services, their generated data, and consequently their related
market in the years ahead.

In recent years, many IoT applications have arisen in different
vertical domains, e.g., health, transportation, smart home, smart
city, agriculture, and education. The main element of most
of these applications is an intelligent learning mechanism
for prediction (i.e., classification or regression) or clustering.
Among the many machine learning approaches, Deep Learning
(DL) has been actively utilized in many IoT applications in
recent years. These two technologies (i.e., DL and IoT) were
among the top three strategic technology trends for 2017
announced at Gartner Symposium/ITxpo 2016 [3]. This
intensive publicity for DL stems from the fact that
traditional machine learning approaches do not address the
emerging analytic needs of IoT systems. Instead, IoT systems
need different modern data analytic approaches and artificial
intelligence (AI) methods according to the hierarchy of IoT
data generation and management, as illustrated in Figure 1.

Fig. 1. IoT data generation at different levels and deep learning models to
address their knowledge abstraction.

The growing interest in the Internet of Things (IoT) and
its derivative big data requires stakeholders to clearly understand
their definition, building blocks, potentials, and challenges. IoT
and big data have a two-way relationship. On one hand, IoT
is a main producer of big data, and on the other hand, it is an
important target for big data analytics to improve the processes
and services of IoT [4]. Moreover, IoT big data analytics
has proven to bring value to society. For example, it is
reported that, by detecting damaged pipes and fixing them, the
Department of Park Management in Miami has saved about one
million USD on its water bills [5].

IoT data are different from general big data. To better
understand the requirements for IoT data analytics, we need to
explore the properties of IoT data and how they differ from
those of general big data. IoT data exhibit the following
characteristics [5]:
• Large-Scale Streaming Data: A myriad of data capturing
devices are distributed and deployed for IoT applications
and continuously generate streams of data. This leads to
a huge volume of continuous data.
• Heterogeneity: Various IoT data acquisition devices
gather different information, resulting in data heterogeneity.
• Time and space correlation: In most IoT applications,
sensor devices are attached to a specific location, and thus
each data item has a location and a time-stamp.
• High noise data: Because IoT applications handle many tiny
pieces of data, much of these data may be subject to errors
and noise during acquisition and transmission.

Although obtaining hidden knowledge and information out
of big data is promising to enhance the quality of our lives, it
is not an easy and straightforward task. Such a complex
and challenging task goes beyond the capabilities of
traditional inference and learning approaches and requires new
technologies, algorithms, and infrastructures [6]. Luckily,
recent progress in both fast computing and advanced
machine learning techniques is opening the doors for big
data analytics and knowledge extraction suitable for
IoT applications.

Beyond big data analytics, IoT data calls for another new
class of analytics, namely fast and streaming data analytics, to
support applications with high-speed data streams that require
time-sensitive (i.e., real-time or near real-time) actions.
Indeed, applications such as autonomous driving, fire
prediction, and driver/elderly posture (and thus consciousness
and/or health condition) recognition demand fast processing of
incoming data and quick actions to achieve their targets. Several
researchers have proposed approaches and frameworks for
fast streaming data analytics that leverage the capabilities of
cloud infrastructures and services [7], [8]. However, for the
aforementioned IoT applications among others, we need fast
analytics on smaller scale platforms (i.e., at the system edge) or
even on the IoT devices themselves. For example, autonomous
cars need to make fast decisions on driving actions such as
lane or speed changes. Such decisions should
be supported by fast analytics of possibly multi-modal data
streaming from several sources, including the multiple vehicle
sensors (e.g., cameras, radars, lidars, speedometer, left/right
signals), communications from other vehicles, and traffic
entities (e.g., traffic lights, traffic signs). In this case, transferring
data to a cloud server for analysis and sending back the
response incurs latency that could cause traffic violations
or accidents. A more critical scenario would be detecting
pedestrians by such vehicles: accurate recognition has to be
performed in strict real time to prevent fatal accidents. These
scenarios imply that fast data analytics for IoT have to be
close to or at the source of data to remove unnecessary and
prohibitive communication delays.

A. Survey Scope
DL models in general bring two important improvements
over traditional machine learning approaches in the two
phases of training and prediction. First, they reduce the need
for hand-crafted and engineered feature sets to be used for
training. Consequently, some features that might not be
apparent to a human can be extracted easily by DL
models. In addition, DL models improve the accuracy of the
prediction model when large amounts of data are available.

In this paper, we review a wide range of deep neural network
(DNN) architectures and explore the IoT applications that
have benefited from DL algorithms. The paper identifies five
main foundational IoT services that can be used in different
vertical domains beyond the specific services in each domain.
It also discusses the characteristics of IoT applications and
provides guidelines for matching them with the most appropriate DL
model. This survey focuses on the confluence of two emerging
technologies, one in communication networks, i.e., IoT, and the
other in artificial intelligence, i.e., DL, detailing their potential
applications and open issues. The survey does not cover
traditional machine learning algorithms for IoT data analytics,
as some other attempts, mentioned in Section I-B,
have covered such approaches. Moreover, this survey
does not go into the details of the IoT infrastructure from a
communications and networking perspective.

B. Related Work
To the best of our knowledge, there is no article
in the literature dedicated to surveying the specific
relation between IoT data and DL as well as the applications of
DL methods in IoT. There are few works presenting common
data mining and machine learning methods that have been
used in IoT environments. The work presented in [9] by Tsai
et al. focused on data mining approaches in IoT. It addressed
different classification, clustering, and frequent pattern mining
algorithms for the IoT infrastructure and services. However,
that work did not consider DL approaches, which are the focus
of our survey. Moreover, their focus is mainly on offline data
mining, while we also consider learning and mining for both
real-time (i.e., fast) and big data analytics.

In [10], Perera et al. reviewed different classes of
machine learning approaches (supervised and unsupervised,
rules, fuzzy logic, etc.) in the reasoning phase of a context-aware
computing system, and discussed the potential of
applying those methods in IoT systems. Nonetheless, they also
did not study the role of DL in context reasoning.

The work in [11] by Alsheikh et al. provides a survey of machine
learning methods for wireless sensor networks (WSNs).
In that work, the authors studied machine learning methods in
the functional aspects of WSNs, such as routing, localization,
and clustering, as well as non-functional requirements, such
as security and quality of service. They reviewed several
algorithms in supervised, unsupervised, and reinforcement
learning approaches. This work focuses on the infrastructure of
WSN (which is one potential infrastructure for implementing
IoT applications), while our work is not dependent on the
sources of data (i.e., IoT infrastructures) and covers a wide
range of IoT applications and services. Moreover, the focus
of [11] was on traditional machine learning methods, whereas
this article focuses on advanced and DL techniques.

Finally, Fadlullah et al. [12] addressed DL approaches in
network traffic control systems. That work primarily
focuses on the network infrastructure, whereas our
work focuses on the usage of DL in IoT applications.

C. Contributions
This paper is intended for IoT researchers and developers
who want to build analytics, AI systems, and learning solutions
on top of their IoT infrastructure using emerging DL
approaches. The contributions of this paper
can be summarized as follows:
• In order to adopt DL approaches in IoT ecosystems,
we identify the key characteristics and issues of IoT data.
• Compared to some related work in the literature that has
addressed machine learning for IoT, we review the
state-of-the-art DL methods and their applicability in the IoT
domain for both big data and streaming data analytics.
• We review a wide range of IoT applications that have
used DL in their context. We also provide a comparison
and a guideline for using different types of DNN in
various IoT domains and applications.
• We review recent approaches and technologies for
deploying DL on all levels of the IoT hierarchy, from resource-constrained
devices to the fog and the cloud.
• We highlight the challenges and future research directions
for the successful and fruitful merging of DL and IoT
applications.

The rest of this paper is organized as follows. In Section II,
we highlight the IoT data characteristics and describe what
IoT big data as well as fast and streaming data are, and
how they differ from general big data. Section
III presents several common and successful architectures of
DNNs. It also includes a brief description of advancements
toward real-time and fast DL architectures as well as
state-of-the-art algorithms that are combined with DL. A succinct review
of several frameworks and tools with different capabilities
and algorithms that support DNNs is also presented. IoT
applications in different domains (e.g., healthcare, agriculture,
ITS) that have used DL are surveyed in Section IV.
Section V reviews the attempts to bring DNNs to resource-constrained
devices. Section VI explains the works that investigated
bringing DNN models to the scale of fog and cloud
computing. Future research directions and open challenges are
presented in Section VII. The paper is concluded in Section
VIII with a summary of its main take-away messages. Figure 2
depicts the structure of the paper.

Fig. 2. Structure of the survey.

II. IOT DATA CHARACTERISTICS AND REQUIREMENTS FOR ANALYTICS
IoT data can be streamed continuously or accumulated
as a source of big data. Streaming data refers to data
generated or captured within tiny intervals of time that need
to be promptly analyzed to extract immediate insights and/or
make fast decisions. Big data refers to huge datasets that
commonly used hardware and software platforms are not able
to store, manage, process, and analyze. These two approaches
should be treated differently since their requirements for
analytic response are not the same. Insight from big data
analytics can be delivered several days after data generation,
but insight from streaming data analytics should be ready within
a few hundred milliseconds to a few seconds.

A. IoT fast and streaming data
Many research attempts suggested streaming data analytics
that can be mainly deployed on high-performance computing
systems or cloud platforms. Streaming data analytics on
such frameworks is based on data parallelism and incremental
processing [13]. With data parallelism, a large dataset is
partitioned into several smaller datasets, on which parallel analytics
are performed simultaneously. Incremental processing refers
to fetching a small batch of data to be processed quickly in
a pipeline of computation tasks. Although these techniques
reduce the latency of returning a response from the streaming
data analytic framework, they are not the best possible solution
for time-stringent IoT applications. By bringing streaming data
analytics closer to the source of data (i.e., IoT devices or
edge devices), the need for data parallelism and incremental
processing becomes less pressing, as the size of the data at the
source allows it to be processed rapidly. However, bringing
fast analytics onto IoT devices introduces its own challenges,
such as limited computing, storage, and power resources
at the source of data.

B. IoT Big data
IoT is well known to be one of the major sources of big
data, as it is based on connecting a huge number of smart
devices to the Internet to frequently report the captured status
of their environments. Recognizing and extracting meaningful
patterns from enormous raw input data is the core utility of
big data analytics, as it results in higher levels of insight for
decision-making and trend prediction. Therefore, extracting
these insights and knowledge from big data is of extreme
importance to many businesses, since it enables them to
gain competitive advantages. In social sciences, Hilbert [14]
compares the impact of big data analytics to that of the
invention of the telescope and microscope for astronomy and
biology, respectively.

Several works have described the general features of big
data from different aspects [14]–[17] in terms of volume,
velocity, and variety. However, we adopt the general definition
of big data to characterize IoT big data through the
following "6V's" features:
• Volume: Data volume is a determining factor in considering
a dataset as big data rather than traditional massive/very large
data. The quantity of data generated by IoT devices is
much larger than before and clearly fits this feature.
• Velocity: The rate of IoT big data production and
processing is high enough to support the availability of big
data in real time. This justifies the need for advanced
analytics tools and technologies that operate efficiently
given this high rate of data production.
• Variety: Generally, big data comes in different forms and
types. It may consist of structured, semi-structured, and
unstructured data. A wide variety of data types may be
produced by IoT, such as text, audio, video, and sensory data.
• Veracity: Veracity refers to the quality, consistency, and
trustworthiness of the data, which in turn leads to accurate
analytics. This property needs special attention in
IoT applications, especially those with crowd-sensing
data.
• Variability: This property refers to the different rates of
data flow. Depending on the nature of IoT applications,
different data-generating components may have inconsistent
data flows. Moreover, a data source may
have different data loads at specific
times. For example, a parking service application that
utilizes IoT sensors may have a peak data load during rush
hours.
• Value: Value is the transformation of big data into useful
information and insights that bring competitive advantage
to organizations. The value of data highly depends on both the
underlying processes/services and the way the data is
treated. For example, a certain application (e.g., medical
vital sign monitoring) may need to capture all sensor data,
while a weather forecast service may need just random
samples of data from its sensors. As another example, a
credit card provider may need to keep data for a specific
period of time and discard it thereafter.

Beyond the aforementioned properties, researchers [14], [16]
have identified other characteristics such as:
• Big data can be a byproduct or footprint of a digital
activity or IoT interplay. The use of Google's most
common search terms to predict seasonal flu is a good
example of such a digital byproduct.
• Big data systems should be horizontally scalable, that
is, big data sources should be able to be expanded to
multiple datasets. This attribute also leads to the
complexity attribute of big data, which in turn imposes other
challenges like transferring and cleansing data.

Performing analytics over continuous data flows is typically
referred to as stream processing or sometimes complex
event processing (CEP) in the literature. Strohbach et al. [18]
proposed a big data analytics framework for IoT to support
the volume and velocity attributes of IoT data analytics. The
integration of IoT big data and streaming data analytics, an
open issue that needs more investigation, has also been studied
as part of that work. However, their proposed framework is
designed to be deployed on cloud infrastructures. Moreover,
their focus is on the data management aspect of the framework,
and they did not use advanced machine learning models such as
DL. Other off-the-shelf products such as Apache Storm are
also available for real-time analytics on the cloud. A big gap
in this area is the lack of frameworks and algorithms that can
be deployed on the fog (i.e., system edge) or even on the IoT
devices. When DL comes into play in such cases, a trade-off
between the depth and performance of the DNN should be
considered.

III. DEEP LEARNING
DL consists of supervised or unsupervised learning techniques
based on many layers of artificial neural networks
that are able to learn hierarchical representations in deep
architectures. DL architectures consist of multiple processing
layers. Each layer is able to produce non-linear responses
based on the data from its input layer. The functionality of DL
imitates the mechanisms of the human brain and neurons
for processing signals.

DL architectures have gained more attention in recent years
compared to other traditional machine learning approaches.
Such approaches can be considered shallow-structured
learning architectures, i.e., a limited subset of DL.
Figure 3 shows the search trend of five popular machine
learning algorithms in Google Trends, in which DL is becoming
more popular than the others. Although Artificial Neural
Networks (ANNs) were introduced decades ago,
the growing trend for DNNs started in 2006 when G. Hinton
et al. presented the concept of deep belief networks [19].
Thereafter, the state-of-the-art performance of this technology
has been observed in different fields of AI, including image
recognition, image retrieval, search engines and information
retrieval, and natural language processing.

Fig. 3. Google Trends showing more attention toward deep learning in recent
years.

A. Architectures
In this section, we present a brief overview of several
common DL models as well as the most cutting-edge architectures
introduced in recent years. Interested readers
can refer to other literature that surveyed the models and
architectures of DL in more detail, such as [20]. Table I
summarizes these models, their attributes, and their characteristics.

A DNN consists of an input layer, several hidden layers,
and an output layer. Each layer includes several units called
neurons. A neuron receives several inputs, performs a weighted
summation over its inputs, and then passes the resulting sum through
an activation function to produce an output. Each neuron has a
vector of weights matching its input size as well as a bias,
which should be optimized during the training process. Figure 4
depicts the structure of a neuron.

Fig. 4. A neuron is a unit of artificial neural networks, with several inputs
and trainable weights and bias.

In the training process, the input layer assigns (usually
random) weights to the input training data and passes it
to the next layer. Each subsequent layer also assigns weights
to its input and produces its output, which serves as the
input for the following layer. At the last layer, the final output
representing the model prediction is produced. A loss
function determines how right or wrong this prediction is by
computing the error between the prediction and the true value.
The error is propagated back across the network to the
input layer. The network then repeats this training cycle,
adjusting the weights of each neuron in each cycle, until the
error falls below a desired threshold. At this point, the
DNN is trained and is ready for inference. Figure 5 illustrates
the high-level mechanism of training for DL models.

Fig. 5. The overall mechanism of training of a DL model.

In a broad categorization, DL models fall into three
categories, namely generative, discriminative, and hybrid
models. Though the boundary is not firm, discriminative
models usually provide supervised learning approaches, while
generative models are used for unsupervised learning. Hybrid
models incorporate the benefits of both discriminative and
generative models.
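
The neuron of Figure 4 and the training cycle of Figure 5 can be illustrated with a minimal pure-Python sketch. It is not the procedure of any specific surveyed work, and every choice below is our own assumption for illustration: the "network" is a single sigmoid neuron, the loss is binary cross-entropy, weights are updated by plain full-batch gradient descent, and the toy OR dataset, learning rate (0.5), stopping threshold (0.05), and epoch cap are arbitrary.

```python
import math
import random


def sigmoid(z):
    """Logistic activation: squashes the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class Neuron:
    """One unit as in Fig. 4: weighted sum of the inputs plus a bias, then an activation."""

    def __init__(self, n_inputs, rng):
        # Random initial weights (as the text notes is usual); the bias starts at zero.
        self.weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        self.bias = 0.0

    def forward(self, x):
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return sigmoid(z)


def bce_loss(p, y):
    """Binary cross-entropy between prediction p and label y (clamped to avoid log(0))."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def train(neuron, data, lr=0.5, threshold=0.05, max_epochs=5000):
    """Forward pass -> loss -> gradient -> weight update, repeated until the loss is small."""
    history = []
    n = len(data)
    for _ in range(max_epochs):
        grad_w = [0.0] * len(neuron.weights)
        grad_b = 0.0
        total = 0.0
        for x, y in data:
            p = neuron.forward(x)
            total += bce_loss(p, y)
            err = p - y  # dLoss/dz for a sigmoid output with cross-entropy loss
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        history.append(total / n)
        if history[-1] < threshold:  # error below the desired threshold: stop training
            break
        # Gradient-descent step: move every weight against its averaged gradient.
        neuron.weights = [w - lr * g / n for w, g in zip(neuron.weights, grad_w)]
        neuron.bias -= lr * grad_b / n
    return history


# Toy, illustrative dataset: logical OR of two binary inputs.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

unit = Neuron(n_inputs=2, rng=random.Random(0))
losses = train(unit, OR_DATA)
predictions = [round(unit.forward(x)) for x, _ in OR_DATA]
print(f"epochs={len(losses)} final_loss={losses[-1]:.4f} predictions={predictions}")
```

With a single neuron, propagating the error back reduces to one gradient computation (p - y multiplied by each input); in a DNN the same chain rule is applied layer by layer, from the output back toward the input layer.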

1) Convolutional Neural Networks (CNNs):
For vision-based tasks, DNNs with dense connections between
layers are hard to train and do not scale well. One
important reason is that such models are not translation
invariant: they do not learn features that might move or
transform within the image (e.g., rotation of a hand in pose detection).
CNNs have addressed this problem by supporting translation-equivariant
computations. A CNN receives a 2-D input (e.g.,
an image or a speech signal) and extracts high-level features
through a series of hidden layers. The hidden layers consist of
convolution layers as well as fully connected layers at the end.

The convolution layer is at the core of a CNN and consists of
a set of learnable parameters, called filters, that have the same
depth as the input but smaller spatial dimensions. In the
training process, the filter of each convolutional layer slides
through the whole input volume (e.g., in the case of an image, it
goes across the width and height of the image) and calculates
an inner product of the input and the filter at each position. This
computation over the whole input produces a feature map of the filter.

Another building block of a CNN is the pooling layer,
which operates on the feature maps. The objective of
pooling layers is to reduce the spatial size of the representation,
in order both to cut down the number of parameters and the
computation time and to reduce the chance of overfitting. Max
pooling is a common approach that partitions the input space
into non-overlapping regions and picks the maximum value
of each region.

The last important component in a CNN is the Rectified
Linear Unit (ReLU), i.e., neurons with an activation
function of the form $f(x) = \max(0, x)$. Introducing
this activation function in CNNs results in faster training
without noticeably harming the generalization of the
network [21].

A main difference between CNNs and fully connected
networks is that each neuron in a CNN is connected only
to a small subset of the input. This decreases the total
number of parameters in the network and reduces the time
complexity of the training process. This property is called
local connectivity.

2) Recurrent Neural Networks (RNNs):
In many tasks, prediction depends on several previous
samples, such that, in addition to classifying individual
samples, we also need to analyze the sequences of inputs.
In such applications, a feed-forward neural network is not
applicable since it assumes no dependency between successive
inputs. RNNs have been developed to address this issue
in sequential (e.g., speech or text) or time-series problems
(e.g., sensor data) with varying lengths. The input to an RNN
consists of both the current sample and information from the
previously observed samples. In other words, the output of an
RNN at time step $t-1$ affects the output at time step $t$. Each
neuron is equipped with a feedback loop that returns the current
output as an input for the next step. This structure can be
expressed such that each neuron in an RNN has an internal memory
that keeps the information of the computations from the previous input.

To train the network, an extension of the backpropagation
algorithm, called Backpropagation Through Time (BPTT), is
used. Due to the existence of cycles on the neurons, we cannot
use the original backpropagation here, since it works based on
error derivation with respect to the weights in the upper layer,
while RNNs do not have a stacked layer model. The core
of the BPTT algorithm is a technique called unrolling the RNN,
such that we obtain a feed-forward network over time
spans. Figure 6 depicts the structure of an RNN and the unrolled
concept.

Fig. 6. Structure of a recurrent neural network.

Traditional RNNs can be considered deep models since
they can be seen as several non-linear layers of neurons
between the input layer and the output layer when they are
unfolded in time [22]. However, considering the architecture
and the functionality of RNNs, the hidden layers in RNNs
are supposed to provide a memory instead of a hierarchical
representation of features [23]. There are several approaches
to make RNNs deeper, including adding more layers between
the input and hidden layers, stacking more hidden layers, and
adding more layers between the hidden layers and the output
layer [22].

3) Long Short Term Memory (LSTM):
LSTM is an extension of RNNs. Different variations of LSTM
have been proposed, though most of them have followed the
same design as the original network [24]. LSTM uses the
concept of gates for its units, each computing a value between
0 and 1 based on its input. In addition to a feedback loop
to store the information, each neuron in LSTM (also called a
memory cell) has a multiplicative forget gate, read gate, and
write gate. These gates are introduced to control access
to memory cells and to protect them from perturbation by
irrelevant inputs. When the forget gate is active, the neuron
writes its data into itself. When the forget gate is turned off
by sending a 0, the neuron forgets its last content. When the
write gate is set to 1, other connected neurons can write to that
neuron. If the read gate is set to 1, the connected neurons can
read the content of the neuron. Figure 7 depicts this structure.

Fig. 7. Structure of an LSTM memory cell. Solid arrow lines show the flow
of data and dashed arrow lines show the signals coming from gates.
Fig. 8. Structure of an autoencoder network.

An important difference of LSTMs compared to RNNs
is that LSTM units utilize forget gates to actively control
the cell states and ensure they do not degrade. The gates
can use sigmoid or tanh as their activation function. In fact,
these activation functions cause the problem of vanishing
gradients during backpropagation in the training phase of
other models using them. By learning what data to remember,
LSTMs keep the computations stored in their memory cells from
being distorted over time. BPTT is a common method for training
the network to minimize the error.
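
The forward computations described in the CNN, RNN, and LSTM subsections can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions, not the implementation of any surveyed work: a single-channel input, a hand-set 2x2 vertical-edge filter, a single-unit LSTM whose "write" and "read" gates correspond to what most libraries call the input and output gates, and made-up parameter values (the params dictionary and its gate names are hypothetical). Learning (backpropagation for the CNN, BPTT for the LSTM) is deliberately omitted.

```python
import math


def conv2d(image, kernel):
    """'Valid' 2-D convolution, computed as cross-correlation like most DL libraries:
    slide the kernel over the image and take the inner product at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise ReLU, f(x) = max(0, x)."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Max pooling over non-overlapping size x size regions (leftover rows/cols dropped)."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell; params maps gate name -> (w_x, w_h, b)."""

    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    forget = gate("forget", sigmoid)          # near 1 keeps the old cell content, near 0 erases it
    write = gate("write", sigmoid)            # how much new content enters the cell (input gate)
    read = gate("read", sigmoid)              # how much of the cell is exposed (output gate)
    candidate = gate("candidate", math.tanh)  # new content proposed from x and h_prev
    c = forget * c_prev + write * candidate
    h = read * math.tanh(c)
    return h, c


def run_lstm(sequence, params):
    """Unroll the cell over time: the state produced at step t-1 is fed into step t."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs, c


# CNN path: toy 5x5 image with a dark-to-bright and a bright-to-dark vertical edge.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]      # hypothetical hand-set vertical-edge filter
fmap = conv2d(image, kernel)     # each row is [0, 2, 0, -2]
pooled = max_pool(relu(fmap))    # [[2, 0], [2, 0]]

# LSTM path: one input pulse followed by zeros; all parameter values are illustrative.
remember = {
    "forget": (0.0, 0.0, 5.0),
    "write": (4.0, 0.0, -2.0),
    "read": (0.0, 0.0, 5.0),
    "candidate": (3.0, 0.0, 0.0),
}
forgetful = dict(remember, forget=(0.0, 0.0, -5.0))
pulse = [1.0, 0.0, 0.0, 0.0]
outputs, c_keep = run_lstm(pulse, remember)  # cell still holds most of the pulse
_, c_drop = run_lstm(pulse, forgetful)       # cell content is wiped out
print(fmap[0], pooled, [round(h, 3) for h in outputs], round(c_keep, 3), c_drop)
```

ReLU removes the negative response of the bright-to-dark edge before pooling, and comparing c_keep with c_drop shows the role of the forget gate described above: with the forget gate near 1 the cell retains the pulse across the zero inputs, while with it near 0 the content is erased after one step.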

4) Autoencoders (AEs):
AEs consist of an input layer and an output layer that are connected through one or more hidden layers. AEs have the same number of input and output units. This network aims to reconstruct the input by transforming inputs into outputs in the simplest possible way, such that it does not distort the input very much. This kind of neural network has been used mainly for solving unsupervised learning problems as well as transfer learning [25].
AEs have two main components: an encoder and a decoder. The encoder receives the input and transforms it to a new representation, which is usually called a code or latent variable. The decoder receives the code generated by the encoder and transforms it to a reconstruction of the original input. The training procedure in AEs involves minimizing the reconstruction error, i.e., the difference between the output and the input. Figure 8 illustrates the structure of a typical AE. There are several variations and extensions of AEs, such as denoising AE, contractive AE, stacked AE, sparse AE, and variational AE.

5) Variational Autoencoders (VAEs):
VAEs, introduced in 2013, are a popular generative model framework. Their assumptions on the structure of the data are not strong, and they have a fast training process through backpropagation [26]. Moreover, this model has been used for semi-supervised learning [27]. Therefore, it is a good fit for IoT solutions that present a variety of data as well as a scarcity of labeled data. For each data point x, there is a vector of corresponding latent variables denoted by z. The training architecture of a VAE consists of an encoder and a decoder with parameters φ and θ, respectively. A fixed-form distribution qφ(z|x) helps the encoder in estimating the posterior distribution pθ(z|x). The model consists of two networks: one generating samples and the other performing approximate inference. A schematic of the VAE is depicted in Figure 9.

Fig. 9. Structure of a variational autoencoder network.

6) Generative Adversarial Networks (GANs):
GANs, introduced by Goodfellow et al. [28], consist of two neural networks, namely the generative and discriminative networks, which work together to produce high-quality synthetic data. The former network (a.k.a. the generator) is in charge of generating new data after it learns the data distribution from a training dataset. The latter network (a.k.a. the discriminator) discriminates between real data (coming from training data) and fake input data (coming from the generator). The generative network is optimized to produce input data that deceives the discriminator (i.e., data that the discriminator cannot easily classify as fake or real). In other words, the generative network is competing with an adversary discriminative network. Figure 10 depicts the concept of GANs.

Fig. 10. Concept of a generative adversarial network.
Fig. 11. Structure of a restricted Boltzmann machine. The visible and hidden layers have separate biases.

The objective function in GANs is based on minimax games: one network tries to maximize the value function and the other tries to minimize it. In each step of this imaginary game, the generator, willing to fool the discriminator, plays by producing sample data from random noise. The discriminator, on the other hand, receives several real data examples from the training set along with the samples from the generator. Its task is then to discriminate real and fake data. The discriminator is considered to perform satisfactorily if its classifications are correct. The generator, in turn, performs well if its examples fool the discriminator. The parameters of both the discriminator and the generator are then updated to be ready for the next round of the game. The discriminator’s output helps the generator to optimize its generated data for the next round.
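The minimax game above corresponds to the value function of the original GAN formulation [28]:

min_G max_D V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))]

Here x is drawn from the training data and z from the noise prior. The NumPy sketch below only evaluates this objective from a batch of discriminator outputs. It does not train any network, and the function names are hypothetical.

```python
import numpy as np


def gan_value(d_real, d_fake, eps=1e-12):
    """Batch estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real training samples, in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generator samples, in (0, 1)
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))


def discriminator_loss(d_real, d_fake):
    """The discriminator maximizes V, i.e., minimizes -V."""
    return -gan_value(d_real, d_fake)


def generator_loss(d_fake, eps=1e-12):
    """The generator minimizes V; only the E_z[log(1 - D(G(z)))] term depends on G."""
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(1.0 - d_fake)))


# Discriminator that cannot tell real from fake (D = 1/2 everywhere): V = -log 4.
print(round(gan_value([0.5, 0.5], [0.5, 0.5]), 4))  # -1.3863
# Confident, correct discriminator: V is higher, so the discriminator is "winning".
print(round(gan_value([0.9, 0.9], [0.1, 0.1]), 4))  # -0.2107
```

In each round of the game, the discriminator's parameters would be updated to decrease `discriminator_loss` and the generator's to decrease `generator_loss`. The value −log 4 is the one the original formulation identifies at the equilibrium where the generator matches the data distribution.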

7) Restricted Boltzmann Machine (RBMs):
An RBM is a stochastic ANN that consists of two layers: a visible layer that contains the input that we know, and a hidden layer that contains the latent variables. Compared to the Boltzmann machine, the restriction in RBMs is applied to the connectivity of neurons. RBMs should build a bipartite graph: each visible neuron is connected to all hidden neurons and vice versa, but there is no connection between any two units in the same layer. Moreover, the bias unit is connected to all of the visible and hidden neurons. RBMs can be stacked to form DNNs. They are also the building block of deep belief networks.
The training data is assigned to visible units. The training procedure can use gradient descent algorithms (commonly with contrastive divergence to approximate the gradient) to optimize the weights of the network. The objective of training an RBM is to maximize the product of all probabilities of the visible units. The functionality of an RBM is similar to that of AEs: it uses forward feeding to compute the latent variables, which are in turn used to reconstruct the input using backward feeding. The structure of an RBM is shown in Figure 11.

Fig. 12. Structure of a deep belief network. The dashed arrows show the feature extraction path and solid arrows show the generative path.

8) Deep Belief Network (DBNs):
DBNs are a type of generative ANNs that consist of a visible layer (corresponding to the inputs) and several hidden layers (corresponding to latent variables). They can extract a hierarchical representation of the training data as well as reconstruct their input data. By adding a classifier layer like softmax, a DBN can be used for prediction tasks.
The training of a DBN is performed layer by layer: each layer is treated as an RBM trained on top of the previously trained layer. This mechanism makes a DBN an efficient and fast algorithm in DL [29]. For a given hidden layer in a DBN, the hidden layer of the previous RBM acts as the input layer. Figure 12 shows the structure of a typical DBN.

9) Ladder Networks:
Ladder networks were proposed in 2015 by Valpola et al. [30] to support unsupervised learning. Later, they were extended to work in semi-supervised settings [31] and have shown state-of-the-art performance for several tasks, such as handwritten digit recognition and image classification. The architecture of a ladder network consists of two encoders and one decoder. The encoders act as the supervised part of the network and the decoder performs unsupervised learning. One of the encoders, called the clean encoder, produces the normal computations, while the other encoder, called the corrupted encoder, adds Gaussian noise to all layers. Using a denoising function, the decoder can reconstruct the representations at each layer given the corresponding corrupted data. The difference between the reconstructed and clean data at each layer is used for computing the denoising cost of that layer. On the encoder side, the cost function uses the difference between the corrupted output of encoder layers and the corresponding clean outputs. The training objective is to minimize the sum of the costs in the supervised part and the unsupervised network. Figure 13 shows the structure of a ladder network.

Fig. 13. Ladder network structure with two layers.

B. Fast and Real-time DL Architectures
The literature on fast and real-time analytics using DL models over data streams is still in its infancy. An initial work in this area was done by Liang et al. [32], who extended extreme learning machine (ELM) networks by applying an online sequential learning algorithm to single hidden layer feed-forward networks. Their framework, called OS-ELM, learns the training data one-by-one as well as chunk-by-chunk, and only newly arrived data go through the training process. This architecture is the base for the real-time manufacturing execution system proposed in [33]. In that work, OS-ELM has been used for shop floor object localization using RFID technology. Zou et al. [34] have also reported using this architecture for an indoor localization algorithm based on WiFi fingerprinting. In their algorithm, the OS-ELM model copes well with dynamic environmental changes while still showing good accuracy.
For convolutional networks, the architecture proposed by Ren et al., called Faster R-CNN [35] (based on Fast R-CNN [36]), aims to detect objects in images in real-time. Object detection in images needs more computations than image classification tasks, and hence consumes more energy, since the system has a large number of potential object suggestions that need to be evaluated. The proposed architecture is based on applying region proposal algorithms in fully convolutional networks that perform object bounds prediction and objectness score computation at each position at the same time. Their evaluation of the proposed object detection architecture indicates that the run time of the system is between 5 and 17 frames per second (fps). Mao et al. [37] also used Fast R-CNN for embedded platforms, reporting a run time of 1.85 fps on an embedded CPU+GPU platform, which has been shown to be energy-efficient with close-to-real-time performance. However, for image processing tasks, we can consider an approach to be truly real-time when it can process and analyze 30 fps or better. Redmon et al. [38] developed YOLO, which has reached a performance of 45 fps. A smaller version of it, Fast YOLO, achieves 155 fps. Both are suitable for smart cameras.

C. Joint DL with Other Approaches
DL architectures have also been used jointly with other machine learning approaches to make them more efficient. DL models can support thousands or even billions of parameters, and their nonlinear function approximation is a strong motivation to use them in other machine learning approaches that need such functions. Moreover, the automatic feature extraction in deep models is another motivating reason to exploit these models jointly with other approaches. The following subsections summarize such approaches that are suitable for IoT scenarios.

1) Deep Reinforcement Learning:
Deep Reinforcement Learning (DRL) [39] is a combination of reinforcement learning (RL) with DNNs. It aims to create software agents that can learn by themselves to establish successful policies for gaining maximum long-term rewards. In this approach, RL finds the best policy of actions over the set of states in an environment from a DNN model. The need for a DNN in an RL model becomes evident when the underlying environment can be represented by a large number of states. In such a situation, traditional RL is not efficient enough. Instead, a DL model can be used to approximate the action values in order to estimate the quality of an action in a given state. Systems that use DRL in their context are in their infancy, but have already shown very promising results. In the field of IoT, the work presented in [40] uses DRL in a semi-supervised setting for localization in smart campus environments. Figure 14 shows a sample result of such a method. The left sub-figure shows how a DNN model helps to gain more rewards in a semi-supervised setting, and the right sub-figure shows how these rewards translate into accuracy.
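The action-value approximation described above is commonly trained toward the Q-learning target y = r + γ · max_a′ Q(s′, a′), with no bootstrapping at terminal states. The sketch below shows that target and one gradient step of a value approximator. Two simplifications keep it short and dependency-free. First, a linear model (class `LinearQ`, a hypothetical name) stands in for the DNN. Second, the transitions are random placeholder data rather than an IoT environment.

```python
import numpy as np


def td_targets(rewards, q_next, done, gamma=0.99):
    """Q-learning targets y = r + gamma * max_a' Q(s', a'); no bootstrap at terminal states."""
    rewards = np.asarray(rewards, dtype=float)
    q_next = np.asarray(q_next, dtype=float)
    done = np.asarray(done, dtype=bool)
    return rewards + gamma * q_next.max(axis=1) * (~done)


class LinearQ:
    """Hypothetical stand-in for the DNN: Q(s, .) = W s + b, one output per action."""

    def __init__(self, n_state, n_actions, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_actions, n_state))
        self.b = np.zeros(n_actions)
        self.lr = lr

    def q_values(self, states):
        return states @ self.W.T + self.b  # shape (batch, n_actions)

    def td_error(self, states, actions, targets):
        q = self.q_values(states)
        return q[np.arange(len(actions)), actions] - targets

    def update(self, states, actions, targets):
        """One gradient step on 0.5 * mean squared TD error, with targets held fixed."""
        err = self.td_error(states, actions, targets)
        grad_W = np.zeros_like(self.W)
        grad_b = np.zeros_like(self.b)
        np.add.at(grad_W, actions, err[:, None] * states)  # only taken actions get gradient
        np.add.at(grad_b, actions, err)
        self.W -= self.lr * grad_W / len(actions)
        self.b -= self.lr * grad_b / len(actions)


print(td_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5))  # [2. 0.]

# Placeholder batch of 8 transitions (s, a, r, s', done).
rng = np.random.default_rng(1)
qnet = LinearQ(n_state=3, n_actions=2)
s, s_next = rng.normal(size=(8, 3)), rng.normal(size=(8, 3))
a = rng.integers(0, 2, size=8)
r = rng.normal(size=8)
done = np.zeros(8, dtype=bool)
y = td_targets(r, qnet.q_values(s_next), done, gamma=0.9)
before = np.mean(qnet.td_error(s, a, y) ** 2)
qnet.update(s, a, y)
after = np.mean(qnet.td_error(s, a, y) ** 2)
```

In a DRL system, `LinearQ` would be replaced by a deep network. The same target-and-regress loop would then be repeated over many batches of experience collected by the agent.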
TABLE I
SUMMARY OF DEEP LEARNING MODELS.

| Model | Category | Learning model | Typical input data | Characteristics |
|---|---|---|---|---|
| AE | Generative | Unsupervised | Various | Suitable for feature extraction, dimensionality reduction; same number of input and output units; the output reconstructs input data; works with unlabeled data |
| RNN | Discriminative | Supervised | Serial, time-series | Processes sequences of data through internal memory; useful in IoT applications with time-dependent data |
| RBM | Generative | Unsupervised, Supervised | Various | Suitable for feature extraction, dimensionality reduction, and classification; expensive training procedure |
| DBN | Generative | Unsupervised, Supervised | Various | Suitable for hierarchical feature discovery; greedy training of the network layer by layer |
| LSTM | Discriminative | Supervised | Serial, time-series, long time dependent data | Good performance with data of long time lag; access to memory cell is protected by gates |
| CNN | Discriminative | Supervised | 2-D (image, sound, etc.) | Convolution layers take the biggest part of computations; fewer connections compared to DNNs; needs a large training dataset for visual tasks |
| VAE | Generative | Semi-supervised | Various | A class of autoencoders; suitable for scarcity of labeled data |
| GAN | Hybrid | Semi-supervised | Various | Suitable for noisy data; composed of two networks: one generator and one discriminator |
| Ladder Net | Hybrid | Semi-supervised | Various | Suitable for noisy data; composed of three networks: two encoders and one decoder |

Fig. 14. Deep reinforcement learning (supervised and semi-supervised): Obtaining rewards (left) and their corresponding accuracy measurement (right) [40].

2) Transfer Learning with Deep Models:
Transfer learning falls in the area of domain adaptation and multi-task learning. It involves the adaptation and improvement of learning in a new domain by transferring the knowledge representation that has been learned from data of a related domain [41]. Transfer learning is an interesting potential solution for many IoT applications where gathering training data is not an easy task. For example, consider training a localization system through Bluetooth Low Energy (BLE) or WiFi fingerprinting using smart phones. The RSSI values measured at the same time and location differ across platforms (e.g., iOS and Android). If we have a trained model for one platform, the model can be transferred to the other platform without re-collecting another set of training data for the new platform.
DL models are well matched to transfer learning due to their ability to learn both low-level and abstract representations from input data. Specifically, stacked denoising AEs [41] and other variations of AEs [42] have been shown to perform very well in this area. Transfer learning with DNNs is still an ongoing and active research area in the AI community, and we have not seen reported real-world applications in IoT.
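The cross-platform scenario above can be sketched as follows. The lower layers learned on one platform are reused and frozen, and only the output layer is retrained on a small set of fingerprints from the new platform. In this minimal sketch the "pretrained" layer is a random placeholder, the RSSI data are synthetic, and all names are hypothetical. It illustrates the mechanics only and is not the stacked denoising AE approach cited above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_aps, n_hidden = 6, 32

# Placeholder for lower layers already trained on the source platform
# (e.g., Android RSSI fingerprints). Random weights stand in for them here.
W_src = rng.normal(scale=0.1, size=(n_hidden, n_aps))
b_src = rng.normal(scale=0.1, size=n_hidden)


def features(rssi):
    """Frozen, reused hidden layer: these weights are never updated below."""
    return np.maximum(rssi @ W_src.T + b_src, 0.0)  # ReLU


def with_bias(f):
    return np.hstack([f, np.ones((f.shape[0], 1))])


def fit_head(rssi, positions, l2=1e-2):
    """Retrain only the output layer (closed-form ridge regression)."""
    X = with_bias(features(rssi))
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ positions)


def predict(head, rssi):
    return with_bias(features(rssi)) @ head


# Hypothetical target-platform data (e.g., iOS): 40 labeled fingerprints from a
# log-distance path-loss model with an assumed -5 dB device offset plus noise.
aps = rng.uniform(0, 10, size=(n_aps, 2))      # access point locations
pos = rng.uniform(0, 10, size=(40, 2))         # true (x, y) positions
dist = np.linalg.norm(pos[:, None, :] - aps[None, :, :], axis=2)
rssi = -40 - 20 * np.log10(dist + 1) - 5 + rng.normal(scale=2.0, size=dist.shape)

W_before = W_src.copy()
head = fit_head(rssi, pos)
mse = np.mean((predict(head, rssi) - pos) ** 2)
mse_mean = np.mean((pos - pos.mean(axis=0)) ** 2)
print(f"retrained head MSE: {mse:.2f}, predict-the-mean MSE: {mse_mean:.2f}")
```

Only the 33 × 2 output-layer parameters are estimated from the new platform's data. This is how transfer can reduce the amount of new training data required. Fine-tuning some of the reused layers with gradient descent is a common alternative to the closed-form fit used here for brevity.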
3) Online Learning Algorithms Joint with DL:
As the stream of data generated from IoT applications goes through cloud platforms for analysis, online machine learning algorithms become more important, because the trained model needs to be updated as new data arrives incrementally. This contrasts with what current technologies support, namely batch learning techniques. In batch learning, the whole training data set must be available for training and, thereafter, the trained model cannot evolve with new data. Several research works report applying online learning techniques to various DL models, including stacked denoising AEs [43], sum-product networks [44], and RBMs [45].

D. Frameworks
The rapid growth of interest in using DL architectures in different domains has been supported by the introduction of several DL frameworks in recent years. Each framework has its own strengths based on its supported DL architectures, optimization algorithms, and ease of development and deployment [46]. Several of these frameworks have been used widely in research for efficient training of DNNs. In this section, we review some of these frameworks.
H2O: H2O is a machine learning framework that provides interfaces for R, Python, Scala, Java, JSON, and CoffeeScript/JavaScript [47]. H2O can be run in different modes, including standalone mode, on Hadoop, or in a Spark cluster. In addition to common machine learning algorithms, H2O includes an implementation of a DL algorithm based on feed-forward neural networks that can be trained by Stochastic Gradient Descent (SGD) with backpropagation. H2O’s DL AE is based on the standard deep (multi-layer) neural net architecture, where the entire network is learned together, instead of being stacked layer-by-layer.
Tensorflow: Initially developed for the Google Brain project, Tensorflow is an open source library for machine learning systems using various kinds of DNNs [48]. It is used by many Google products, including Google Search, Google Maps and Street View, Google Translate, and YouTube. Tensorflow uses graph representations to build neural network models. Developers can also take advantage of TensorBoard, a package to visualize neural network models and observe the learning process, including parameter updates. Keras (https://keras.io/) also provides a high level of programming abstraction for Tensorflow.
Torch: Torch is an open source framework for machine learning containing a wide range of DL algorithms for easy development of DNN models [49]. It is built on the Lua programming language to be lightweight and fast for training DL algorithms. It is used by several companies and research labs like Google, Facebook, and Twitter. It supports developing machine learning models for both CPUs and GPUs, and provides powerful parallelization packages for training DNNs.
Theano: Theano is an open source Python-based framework for efficient machine learning algorithms, which supports compiling for CPUs and GPUs [50]. It uses the CUDA library to optimize the complicated code that needs to run on GPUs. It also allows parallelism on CPUs. Theano uses graph representations for symbolic mathematical expressions. Through this representation, Theano supports symbolic differentiation of mathematical expressions. Several wrappers, including Pylearn2, Keras, and Lasagne, provide an easier programming experience on top of Theano [51].
Caffe: Caffe [52] is an open source framework for DL algorithms and a collection of reference models. It is based on C++, supports CUDA for GPU computations, and provides interfaces for Python and Matlab. Caffe separates model representation from its implementation. This is made possible by defining models through configurations rather than hard-coding them in the source code. Switching between platforms (e.g., CPU to GPU or mobile devices) is easy and requires only changing a flag. Its speed on GPU is reported to be 1 ms/image for prediction and 4 ms/image for training.
Neon: Neon (http://neon.nervanasys.com) is another open source DL framework based on Python with high performance for modern DNNs, such as AlexNet [21], VGG, and GoogleNet. It supports developing several commonly used models, such as CNNs, RNNs, LSTMs, and AEs, on both CPUs and GPUs. The list is being extended, as GANs for semi-supervised learning using DL models have also been implemented. It also supports easy changing of the hardware platform back-ends.
Bahrampour et al. [46] have provided a comparative study of four of the aforementioned tools, namely Caffe, Neon, Theano, and Torch. Although the performance of each tool varies in different scenarios, Torch and Theano showed the overall best performance in most of the scenarios. Another benchmark is provided in [53], comparing the running performance of Caffe, TensorFlow, Torch, CNTK, and MXNet. Table II summarizes and compares different DL frameworks.

E. Lessons Learned
In this section, we reviewed several common DL architectures that can serve in the analytics component of various IoT applications. Most of these architectures work with various types of input data generated by IoT applications. However, to get better performance for serial or time-series data, RNNs and their variations are recommended. For the cases where input data is more than one-dimensional, variations of CNNs work better. Table I summarizes DL architectures.
A few attempts toward making DL architectures fast and real-time responsive were also discussed. This avenue needs more exploration and research to be applicable in many time-sensitive IoT applications. Emerging machine learning architectures and techniques that both benefit from DL and address the specific IoT application requirements were also highlighted. Indeed, DRL can support the autonomy of IoT applications, transfer learning can compensate for the lack of training data sets, and online learning matches the need for stream analysis of IoT data.
We also reviewed several common and powerful frameworks for the development of DL models. For IoT applications, training times, run times, and dynamic update of
the trained models are determining factors for a reliable and efficient analytic module. Most of the current frameworks follow the “define-and-run” pattern instead of the “define-by-run” pattern [62]. The former does not allow dynamic updates of the model, while the latter supports such modifications. Chainer [62] is a framework that follows the latter pattern and can handle dynamic changes of the model.

TABLE II
PROPERTIES OF FRAMEWORKS FOR DEVELOPING DEEP LEARNING (AS OF SEPTEMBER 2017).

| Frameworks | Core Language | Interface | Pros | Cons | Used in IoT Application |
|---|---|---|---|---|---|
| H2O | Java | R, Python, Scala, REST API | Wide range of interfaces | Limited number of models are supported; not flexible | [54] |
| Tensorflow | C++ | Python, Java, C, C++, Go | Fast on LSTM training; supports visualizing networks | Slower training compared to other Python-based frameworks | [55] |
| Theano | Python | Python | Supports various models; fast on LSTM training on GPU | Many low-level APIs | [56] |
| Torch | Lua | C, C++ | Supports various models; good documentation; helpful error debugging messages | Learning a new language | [55], [57] |
| Caffe | C++ | Python, Matlab | Provides a collection of reference models; easy platform switching; very good at convolutional networks | Not very good for recurrent networks | [58]–[60] |
| Neon | Python | Python | Fast training time; easy platform switching; supports modern architectures like GAN | Not supporting CPU multi-threading | [61] |
| Chainer [62] | Python | Python | Supports modern architectures; easier to implement complex architectures; dynamic change of model | Slower forward computation in some scenarios | [63] |
| Deeplearning4j | Java | Python, Scala, Clojure | Distributed training; imports models from major frameworks (e.g., TensorFlow, Caffe, Torch and Theano); visualization tools | Longer training time compared to other tools | [64], [65] |

IV. DL APPLICATIONS IN IOT
DL methods have shown promising, state-of-the-art results in several areas, such as signal processing, natural language processing, and image recognition. This trend is also growing in IoT verticals. Some neural network models work better in particular domains. For example, convolutional networks provide better performance in applications related to vision, while AEs perform very well with anomaly detection, data denoising, and dimensionality reduction for data visualization. It is important to link each application domain to the kind of neural network model that best fits it.
In this section, we review successful applications of DL in IoT domains. Based on our observation, many IoT-related applications utilize vision and image classification as their base intelligent service (e.g., traffic sign recognition or plant disease detection, which we discuss in Section IV-B). Other services, such as human pose detection, are utilized for smart home applications or intelligent car assistance. We identify several kinds of these services as foundational services on which other IoT applications can be built. The common property of these services is that they should be treated in a fast analytic mode instead of piling up their data for later analytics. Indeed, each domain may have specific services beyond these foundational services. Figure 15 shows the foundational services and the IoT applications on top of them.
In the following subsections, we first review foundational services of IoT that use DL as their intelligence engine. We then highlight the IoT applications and domains where a combination of foundational services as well as specific ones may be utilized.

A. Foundational Services
1) Image Recognition:
A large portion of IoT applications addresses scenarios in which the input data for DL is in the form of videos or images. Ubiquitous mobile devices equipped with high-resolution cameras make it possible for everyone to generate images and videos everywhere. Moreover, intelligent video cameras are used in many places like smart homes, campuses, and manufacturing facilities for different applications. Image recognition/classification and object detection are among the fundamental usages of such devices.
One issue with the IoT-related systems that have addressed image recognition is the use of specific source datasets
for evaluation of their performance. Most of these systems employ the available common image datasets, such as MNIST, VGG, etc. Though good for comparison with other approaches, those datasets do not show the specific characteristics of IoT systems. For example, the input for the task of vehicle detection in smart cars would not always be a clear image, and there are cases where the input image is taken at night, or in rainy or foggy weather. These cases are not covered by the available datasets, and hence the models trained on these datasets are not comprehensive enough.

Fig. 15. IoT applications and the foundational services.

2) Speech/Voice Recognition:
With the massive proliferation of smart mobile devices and wearables, automatic speech recognition is becoming a more natural and convenient way for people to interact with their devices [66]. Also, the small size of today's mobile devices and wearables makes touch screens and keyboards less practical as a means of input and interaction with these devices. However, the main concern for providing speech/voice recognition functionality on resource-constrained devices is its energy-intensiveness, especially when the data is processed through neural networks. In a typical speech recognition neural network model, voice data is represented as the raw input to the network. The data is processed through the hidden layers, and the likelihood of the voice data to a particular speech sound is presented at the output layer.
Price et al. [67] have reported that they have built a special-purpose low-power DL chip for automatic speech recognition. The new specialized chip consumes a tiny amount of power, between 0.2 and 10 milliwatts, which is 100 times less than the power needed to run a speech recognition tool on current mobile phones. In the new chip, DNNs for speech recognition have been implemented. To save energy, three levels of voice activity recognition are designed with three separate neural networks, each having a different level of complexity. The lowest complexity network, which consumes the least energy, detects voice activity by monitoring the noise in the environment. If this network identifies a voice, the chip runs the next complexity level recognition network. Its task is acoustic modeling to identify whether the voice looks like speech. If the output of this network is a high likelihood, then the third network, which has the highest energy consumption, is triggered to identify individual words.

3) Indoor Localization:
Location-aware services, such as indoor navigation and location-aware marketing in retail stores, are becoming prevalent in indoor environments. Indoor localization may also have applications in other sectors of IoT, such as in smart homes, smart campuses, or hospitals. The input data generated from such applications usually comes from different technologies, such as vision, visible light communication (VLC), infrared, ultrasound, WiFi, RFID, ultrawide band, and Bluetooth. Most of the WiFi- or Bluetooth-based approaches in the literature use mobile phones to receive signals from fixed transmitters (i.e., access points or iBeacons). The received signal measurements are called fingerprints. Among these fingerprinting approaches, several attempts reported the use of DL models to predict the location [68]–[70].
DL has been used successfully to locate indoor positions with high accuracy. In a system called DeepFi [68], a DL method over fingerprinting WiFi channel state information data has been utilized to identify user positions. This system consists of offline training and online localization phases. In the offline training phase, DL is exploited to train all the weights based on the previously stored channel state information fingerprints. Other works [69], [70] report using variations of DL models in conjunction with other learning methods to extract features and estimate positions. These experiments assert that the number of hidden layers and units in DL models has a direct effect on the localization accuracy. In [71], a CNN is used for indoor localization by fusion of both magnetic and visual sensing data. Moreover, a CNN has been trained in [72] to determine the indoor positions of users by analyzing an image from their surrounding scene.
Lu et al. have also used LSTM networks for localizing soccer robots [73]. In this application, data collected from several sensors, namely an Inertial Navigation System (INS) and vision perceptions, are analyzed to predict the position of the robot. The authors reported improved accuracy and efficiency compared to two baseline methods, namely standard Extended
Kalman Filtering (EKF) and static Particle Filtering.

4) Physiological and Psychological State Detection:
IoT combined with DL techniques has also been employed to detect various physiological and psychological states of humans, such as pose, activity, and emotions. Many IoT applications incorporate a module for human pose estimation or activity recognition to deliver their services, e.g., smart homes, smart cars, entertainment (e.g., XBox), education, rehabilitation and health support, sports, and industrial manufacturing. For example, convenient applications in smart homes are built based on the analysis of occupants’ poses. The cameras transfer the video of the occupant to a DNN to find out the pose of the person and take the most appropriate action accordingly. Toshev et al. [74] report a system employing a CNN model to achieve this goal. This sort of service can also be used in education to monitor the attention of students, and in retail stores to predict the shopping behavior of customers [75].
Ordonez et al. [76] have proposed a DL framework that combines the strengths of CNN and LSTM neural networks for human activity recognition from wearable sensor data (accelerometer, gyroscope, etc.). Their model consists of four convolutional layers with rectified linear units (ReLUs) followed by two LSTM layers and a softmax layer. They showed that this combination outperformed a baseline model based only on convolutional layers by 4% on average. The work of Tao et al. [77] also used an LSTM architecture for human activity recognition based on mobile phone sensor data. Li et al. [78] also report using raw data from passive RFID tags to detect medical activities in a trauma room (e.g., blood pressure measurement, mouth exam, cardiac lead placement, etc.) based on a deep CNN.
In [79], a combined model of CNN and RNN was proposed for gesture recognition in video frames. This model showed better results than models without such a combination, which underlines the importance of the recurrent component for this task. In [80], Fragkiadaki et al. proposed a DNN model called Encoder-Recurrent-Decoder (ERD) for human body pose recognition and motion prediction in videos and motion capture data sets. The proposed model consisted of an RNN with an encoder before the recurrent layers and a decoder after them. This architecture was shown to outperform Conditional Restricted Boltzmann Machines (CRBMs) for this application.
Beyond physical movements, emotion estimation of humans from video frames has also been investigated in [81] using a model that consists of a CNN, DBN, and AE. Furthermore, the work in [82] used mobile inertial sensor data for motion detection. It confirmed that human motion

Indeed, the validity of the functionality of the systems depends on protecting their machine learning tools and processes from attackers.
False Data Injection (FDI) is a common type of attack on data-driven systems. In [83], He et al. proposed a Conditional DBN to extract FDI attack features from the historical data of smart grids, and used these features for attack detection in real-time. The work in [84] is also related to anomaly detection in vehicle networks.
Smart phones, as major contributors to IoT data and applications, are also under serious threat from hacker attacks. Consequently, protecting these devices from a variety of security issues is necessary from the IoT perspective, beyond the users’ own concerns. Yuan et al. [85] proposed a DL framework to identify malware in Android apps. Their architecture is based on a DBN, with which they reported an accuracy of 96.5% in detecting malware apps.
The security and privacy preservation of deep machine learning approaches are the most important factors for the acceptance of these methods in IoT sectors. Shokri et al. [86] proposed a method to address the privacy preservation issues in DL models when they are subject to distributed learning. Their approach was able to preserve both the privacy of participants’ training data and the accuracy of the models at the same time. The core of their approach is based on the fact that stochastic gradient descent optimization algorithms, used in many DL architectures, can be performed in a parallel and asynchronous way. Individual participants can thus independently train the model on their own data and share a portion of their model parameters with other participants. Abadi et al. [87] also proposed a method for privacy guarantees in DL models using a differentially private stochastic gradient descent algorithm.

B. Applications
1) Smart Homes:
The concept of smart homes involves a broad range of applications based on IoT. These applications can enhance homes’ energy usage and efficiency, as well as the convenience, productivity, and quality of life of their occupants. Nowadays, home appliances can connect to the Internet and provide intelligent services. For example, in a collaborative project, Microsoft and Liebherr are applying Cortana DL to the information gathered from inside the refrigerator [88]. These analytics and predictions can help the household gain better control over its home supplies and expenses. In conjunction with other external data, they can also be used for monitoring and predicting health trends.
Over one third of the generated electricity in the U.S. is
patterns can be used as a source of user identification and consumed by the residential sector [89], with HVAC and
authentication. The employed model in this system is a lighting devices consisting the largest source of such con-
combination of convolutional layers and clockwork RNN. sumption in buildings. This demand is expected to grow in
a slower pace by smart management of energy as well as the
5) Security and Privacy: efficiency improvements in appliances. Hence, the ability to
Security and privacy is a major concern in all IoT domains control and improve energy efficiency and predict the future
and applications. Smart homes, ITS, Industry, smart grid, and need is a must for smart home systems. In the smart home
many other sectors consider security as a critical requirement. applications, electricity load prediction are the most common
applications that employ different DL networks to tackle this task. Manic et al. [89] performed a comparative analysis of load forecasting for home energy consumption using three DL architectures: LSTM, LSTM Sequence-to-Sequence (S2S), and CNN. Their results show that LSTM S2S predicts future usage better than the other architectures, followed by CNN and then LSTM. They also evaluated a conventional ANN on the same dataset, and all of the aforementioned models outperformed the ANN model.

Feng et al. [90] report the use of RBMs and DBNs for fall detection in a home care environment. Normal postures in such an environment are standing, sitting, bending, and lying. Lying on the floor longer than a threshold is considered a fallen posture. Their evaluation shows that RBM outperforms DBN in terms of classification accuracy. The lack of large datasets and the fact that detection is performed offline are the limitations of their method.

2) Smart City:
Smart city services span several IoT domains, such as transportation, energy, agriculture, etc. However, this area is particularly interesting from a machine learning perspective, as the heterogeneous data coming from different domains lead to big data, which can yield high-quality output when analyzed using DL models.

Toshiba has recently developed a DL testbed jointly with Dell Technologies, and used this testbed in a Smart Community Center in Kawasaki, Japan, to evaluate the data collected in the Center [91]. The aim of running the testbed is to measure the effectiveness of using DL architectures in IoT ecosystems, and to identify best practices for service improvement, including increasing machines' availability, optimizing monitoring sensors, and lowering maintenance expenses. The big data that feed the testbed were gathered from building management, air conditioning, and building security.

One important issue for smart cities is predicting crowd movement patterns and their use in public transportation. Song et al. [92] developed a system based on DNN models to achieve this goal at the city level. Their system is built upon a four-layer LSTM neural network that learns from human mobility data (GPS data) together with the corresponding transportation transition modes (e.g., stay, walk, bicycle, car, train). They treated the prediction of people's mobility and of their transportation mode as two separate tasks instead of joining all these features together. Consequently, their learning system is based on a multi-task deep LSTM architecture that jointly learns from the two sets of features. The choice of LSTM was driven by the spatio-temporal nature of human mobility patterns. The authors assert that their multi-task deep LSTM approach achieves better performance than both shallow LSTMs having only a single LSTM layer and deep LSTMs without multi-tasking.

Liang et al. [93] presented a real-time crowd density prediction system for transportation stations that leverages mobile phone users' telecommunication data, known as call detail records (CDRs). CDR data are gathered when a user takes a telecommunication action (i.e., call, SMS, MMS, and Internet access) on the phone, and usually include the user ID, time, location, and the telecommunication action of the user. They built their system based on an RNN model for metro stations, and reported more accurate predictions compared to nonlinear autoregressive neural network models.

Waste management and garbage classification is another related task for smart cities. A straightforward way to automate this task is vision-based classification using deep CNNs, as has been done in [58].

Amato et al. [94] developed a decentralized system to identify occupied and empty spots in parking lots using smart cameras and deep CNNs. The authors deployed a small CNN architecture on smart cameras equipped with a Raspberry Pi 2. These embedded devices in smart cameras can thus run the CNN on each device to classify images of individual parking spaces as occupied or empty. The cameras then send only the classification output to a central server. Valipour et al. [95] also developed a system for detecting parking spots using CNN, which has shown satisfactory results compared to SVM baselines. Table III summarizes the aforementioned attempts.

TABLE III
TYPICAL IOT-BASED SERVICES IN SMART CITY
Service | Reference | Input data | DL architecture
Crowd density/transportation prediction | [92] | GPS/transition mode | LSTM
Crowd density/transportation prediction | [93] | Telecommunication data/CDR | RNN
Waste management | [58] | Garbage images | CNN
Parking lot management | [94], [95] | Images of parking spaces | CNN

3) Energy:
The two-way communication between energy consumers and the smart grid is a source of IoT big data. In this context, smart meters play the role of data generation and acquisition for fine-grained measurement of energy consumption. Energy providers are interested in learning local energy consumption patterns, predicting needs, and making appropriate decisions based on real-time analytics. Mocanu et al. [96] developed a variant of RBM to identify and predict buildings' energy flexibility in real time. The advantage of this model, beyond its good performance and accuracy, is that flexibility identification can be performed concurrently with flexibility prediction. In [97], two variations of RBMs are used to forecast energy consumption over short-term intervals in residential houses. The models are a Conditional RBM (CRBM) and a Factored Conditional RBM (FCRBM). Their results indicate that FCRBM performs better than CRBM, RNN, and ANN. Moreover, when the forecasting horizon is extended, FCRBM and CRBM show more accurate predictions than the RBM and ANN.

On the smart grid side, forecasting the power from solar, wind, or other natural sustainable sources of energy is an active research field. DL is increasingly used in many applications in this domain. For example, in [98], Gensler et al. investigate the performance of several DL models, such
as DBNs, AEs, and LSTMs, as well as a multilayer perceptron (MLP), for predicting the solar power of 21 photovoltaic plants. For solar power prediction, a main element of the input is a numerical weather forecast value for a given time horizon. In their evaluation, the combination of AEs and LSTMs (Auto-LSTM) was shown to produce the best results compared to the other models, followed by DBN. Auto-LSTM obtains a good prediction score because it is able to extract features from raw data, which is not the case for ANN and MLP. In [63], an online forecasting system based on LSTM is proposed to predict the solar flare power 24 hours ahead.

4) Intelligent Transportation Systems:
Data from Intelligent Transportation Systems (ITS) are another source of big data that is becoming more ubiquitous every day. Ma et al. [56] presented a transportation network analysis system based on DL. They employed RBM and RNN architectures as their models in a parallel computing environment, with GPS data from participating taxis as the input of the models. The accuracy of their system in predicting traffic congestion evolution over one hour of aggregated data is reported to be as high as 88%, with the prediction computed in less than 6 minutes. The work in [99] also investigated short-term traffic flow prediction. The authors used LSTM as their learning model and reported better accuracy for LSTM compared to other methods, including SVM, simple feed-forward neural networks, and stacked AEs. For the different intervals (15, 30, 45, and 60 min), LSTM showed the lowest mean absolute percentage error (MAPE). However, for short intervals of 15 minutes, the error rate of SVM is only slightly higher than that of the LSTM model. This result can be explained by the fact that the small number of data points in short intervals does not allow the LSTM model to form much stronger discrimination boundaries than the SVM model. In another study [84], ITS data are exposed to an intrusion detection system based on DNN to improve the security of in-vehicle network communications.

ITS also motivate the development of methods for traffic sign detection and recognition. Applications such as autonomous driving, driver assistance systems, and mobile mapping need such mechanisms to provide reliable services. Cireşan et al. [100] presented a traffic sign recognition system based on DNNs of convolutional and max-pooling layers. They introduced a multi-column DNN architecture that includes several columns of separate DNNs, and reported increased accuracy with this approach. The input is preprocessed by several different preprocessors, and a random number of columns receive the preprocessed input to proceed with training. The final prediction output is the average of all the DNNs' outputs. Their results show that this method, with a recognition rate of 99.46%, recognized traffic signs 0.62% more accurately than humans on the same task.

In order to be applicable in real scenarios, these analytics need to be performed in real time. Lim et al. [101] proposed a real-time traffic sign detection system based on a CNN integrated with a general-purpose GPU. They reported an F1 measure of at least 0.89 on data with illumination changes. To obtain a faster inference engine, they used a CNN with two convolutional layers.

Furthermore, self-driving cars use DNNs to perform many tasks, such as detecting pedestrians, traffic signs, obstacles, etc. Several startups use DL in their self-driving cars to perform different tasks when driving in the streets [102].

5) Healthcare and Wellbeing:
IoT combined with DL has also been employed in providing healthcare and wellbeing solutions for individuals and communities. For instance, developing mobile-app-based solutions that accurately measure dietary intake is a line of research that can help control the health and wellbeing of individuals. Liu et al. [103], [59] developed a system to recognize food images and their relevant information, such as types and portion sizes. Their image recognition algorithm is based on CNNs and achieved competitive results compared to the baseline systems.

DL for classification and analysis of medical images is a hot topic in the healthcare domain. For example, Pereira et al. [104] applied the idea of recognizing handwritten images with CNNs to help identify Parkinson's disease in its early stages. Their model learns features from the signals of a smart pen that uses sensors to measure handwriting dynamics during the individual's exam. Muhammad et al. [105] propose a voice pathology detection system using IoT and cloud frameworks, in which patients' voice signals are captured through sensor devices and sent to a cloud server for analytics. They used an extreme learning machine trained on voice signals to diagnose the pathology. In [106], DL was employed for detection of cardiovascular diseases from mammograms. In their study, Wang et al. used breast arterial calcification (BAC) revealed in mammograms as a sign of coronary artery disease. They developed a CNN with twelve layers to identify the existence of BAC in a patient. Their results show that the accuracy of their DL model is as good as that of human experts. Although this work was done offline, it shows the potential of developing or extending mammogram devices in IoT contexts for online and early detection of such diseases.

Researchers have also used time-series medical data in conjunction with RNN-based models for early diagnosis and prediction of diseases. Lipton et al. [107] investigated the performance of LSTM networks in analyzing and recognizing patterns in multivariate time series of medical measurements in intensive care units (ICUs). The input data in their system consist of sensor data of vital signs as well as lab test results. Their results show that an LSTM model trained on raw time-series data outperforms an MLP network. A survey of DL in health informatics is provided in [108].

6) Agriculture:
Producing healthy crops and developing efficient ways of growing plants is a requirement for a healthy society and a sustainable environment. Disease recognition in plants using DNNs is a direction that has been shown to be a viable solution. In a study reported by Sladojevic et al. [60], the
authors developed a plant disease recognition system based on the classification of leaf images. They used a deep convolutional network model implemented using the Caffe framework. In this model, diseased leaves in 13 categories can be distinguished from healthy ones with an accuracy of about 96%. Such a recognition model can be exploited as a smart mobile application for farmers to identify fruit, vegetable, or plant diseases based on leaf images captured by their mobile devices. It can also allow them to select remedies or pesticides in conjunction with complementary data.

DL has also been used in remote sensing for land and crop detection and classification [109]–[111]. The direction established in these works enabled the automated monitoring and management of agricultural lands at large scales. In most of these works, deep convolutional networks are used to learn from images of the land or crops. In [109], it is reported that using a CNN yielded an accuracy of 85% in detecting major crops, including wheat, sunflower, soybeans, and maize, while outperforming other approaches such as MLP and random forest (RF).

Furthermore, DL has been reported to be utilized for prediction and detection tasks in automatic farming. For example, [112] used a DL model based on deep CNNs for obstacle detection in agricultural fields, which enables autonomous machines to operate safely in them. The proposed system was able to detect a standardized object with an accuracy between 90.8% and 99.9%, depending on the field (e.g., row crops or grass mowing).

Moreover, fruit detection and determining the stage of the fruit (unripe or ripe) are critical for automated harvesting. In [113], Sa et al. used a variation of CNN, called Region-based CNN, for image analysis of fruits. The input image of the system comes in two modes: one containing RGB colors and the other near-infrared. The information from these images is combined in the model, which achieved improved detection compared to pixel-based training models.

7) Education:
IoT and DL contribute to the efficiency of education systems, from kindergarten to higher education. Mobile devices can gather learners' data, and deep analytical methods can be used for predicting and interpreting learners' progress and achievements. Augmented reality technology combined with wearables and mobile devices is also a potential application area for DL methods in education, to motivate students, make lessons and studies more interesting, and make educational methods more efficient [114], [115]. Moreover, DL can be used as a personalized recommendation module [116] to recommend more relevant content to the educator. The applications of DL in other domains, such as natural language translation and text summarization, would be helpful for smart education when it comes to online learning on mobile devices.

Furthermore, the advent of Massive Open Online Courses (MOOCs) and their popularity among students have led to a huge amount of data being generated from learners' behavior in such courses. MOOC analysis can help identify struggling students in early sessions of a course, and provide sufficient support and attention from instructors to those students to achieve a better performance. Yang et al. [117] proposed a method for predicting student grades in MOOCs. They use clickstream data collected from lecture videos while students are watching the video and interacting with it. Clickstream data are fed to a time-series DNN that learns from both prior performance and clickstream data. In addition, Piech et al. applied RNN and LSTM networks to model the prediction of students' answers to exercises and quizzes, based on their past activities and interactions in MOOCs [118]. Results showed improvement over Bayesian Knowledge Tracing (BKT) methods, which employ a Hidden Markov Model (HMM) for updating probabilities of single concepts. Mehmood et al. [54] also used DNNs for a personalized ubiquitous e-teaching and e-learning framework based on IoT technologies, aiming at the development and delivery of educational content in smart cities. Their proposed framework is built on top of an IoT infrastructure (e.g., smart phone sensors, smart watch sensors, virtual reality technologies) connecting the users in order to optimize the teaching and learning processes. They used a DNN for human activity recognition to deliver adaptive educational content to the students.

Classroom occupancy monitoring is another application, investigated by Conti et al. [119]. In this work, the authors propose two methods, head detection and density estimation, both based on CNN architectures, for counting students in a classroom. The algorithms have been deployed on an off-the-shelf embedded mobile ARM platform. Their algorithm receives images taken from the cameras in three classrooms at a rate of three pictures every 10 minutes. They report that the root-mean-square (RMS) error of their algorithms is at most 8.55.

8) Industry:
For the industry sector, IoT and cyber-physical systems (CPS) are the core elements for advancing manufacturing technologies toward smart manufacturing (a.k.a. Industry 4.0). Providing high-accuracy intelligent systems is critical in such applications, as it directly leads to increased efficiency and productivity in assembly/product lines, as well as decreased maintenance expenses and operation costs. Therefore, DL can play a key role in this field. Indeed, a wide range of applications in industry (such as visual inspection of product lines, object detection and tracking, controlling robots, fault diagnosis, etc.) can benefit from the introduction of DL models.

In [55], visual inspection is investigated using CNN architectures, including AlexNet and GoogLeNet, over different platforms (Caffe, TensorFlow, and Torch). In this work, several images of produced vehicles in the assembly line, along with their annotations, are submitted to a DL system. It was found that the best performance is achieved using TensorFlow, with an accuracy of 94%. Moreover, TensorFlow was the fastest framework in terms of training time, reaching its peak accuracy in a shorter time, followed by Torch and then Caffe.

Shao et al. [120] used DNNs for feature extraction in a fault diagnosis (also referred to as fault detection and classification (FDC)) system for rotating devices. Models using a denoising auto-encoder (DAE) and a contractive auto-encoder (CAE) were
developed. The learned features from these models were both refined using a method called locality preserving projection (LPP) and fed to a softmax classifier for fault diagnosis. Based on their experiments on fault diagnosis of rotor and locomotive bearing devices, the proposed approach is reported to outperform CNN and shallow learning methods.

In another study reported in [121], a DBN model was proposed in conjunction with an IoT deployment and cloud platform to support the detection of defect types in cars' headlight modules in a vehicle manufacturing setting. Their results confirmed the superior performance of the DBN model over two baseline methods, using SVM and radial basis function (RBF), in terms of error rate on the test dataset. However, the reported error rate on their training dataset for the DBN model is comparable to that of the SVM model.

For the problem of fault detection and classification (FDC) in noisy settings, [122] employed stacked denoising AEs (SdA) both to reduce the noise of sensory data caused by mechanical and electrical disturbances and to perform fault classification. Their system was applied to fault detection in wafer samples of a photolithography process. Results show that SdA leads to 14% higher accuracy in noisy situations compared to several baseline methods, including K-Nearest Neighbors and SVM. Yan et al. [123] have also used SdA together with extreme learning machines for anomaly detection in the behavior of a gas turbine combustion system. Based on their results, using the features learned by SdA leads to better classification accuracy than using hand-crafted features in their system.

9) Government:
Governments can gain great potential advantages through the enhanced and intelligent connectivity that comes from the convergence of IoT and DL. Indeed, a wide variety of tasks that pertain to governments or city authorities require precise analysis and prediction. For instance, the recognition and prediction of natural disasters (landslides, hurricanes, forest fires, etc.) and environmental monitoring are of high importance for governments to take appropriate actions. Liu et al. [124] proposed feeding optical remote sensing images to a deep AE network and softmax classifiers to predict geological landslides. An accuracy of 97.4% was reported for the proposed method, outperforming SVM and ANN models. In another study [125], an LSTM network is used for the prediction of earthquakes. The authors used historical data from the US Geological Survey website for training. Their system was shown to achieve accuracies of 63% and 74% with 1-D and 2-D input data, respectively. In another study by Liu et al. [61], a CNN architecture is used for detection of extreme climate events, such as tropical cyclones, atmospheric rivers, and weather fronts. Training data in their system included image patterns of climate events. The authors developed their system in the Neon framework and achieved accuracies of 89% to 99%.

In addition, damage detection in city infrastructure, such as roads, water pipelines, etc., is another area where IoT and DL can provide benefits to governments. In [126], the problem of road damage detection was addressed using DNNs that obtain their data through crowd-sourcing, which can be enabled by IoT devices. Citizens can report the damage to a platform through a mobile app. However, these citizens have no expert knowledge to accurately assess the status of road damage, which may lead to uncertain and/or wrong assessments. To eliminate these instances, the app can determine the status of the road damage by analyzing the image of the scene. The analysis is performed by a deep CNN that is trained on citizen reports as well as road manager inspection results. Since the training phase is beyond the capability of mobile phones, the DL model is created on a server and trained every day. An Android application can then download the latest model from the server upon each launch, and identify the status of road damage reported through images. Evaluations showed a damage classification accuracy of 81.4% with 1 second of analysis on the mobile devices.

10) Sport and Entertainment:
Sports analytics has been evolving rapidly in recent years and plays an important role in bringing a competitive advantage to a team or player. Professional sport teams nowadays have dedicated departments or employees for their analytics [127]. Analytics and predictions in this field can be used to track the players' behavior, performance, score capturing, etc. DL is new to this area, and only a few works have used DNNs in different sports.

In [128], a DL method has been proposed for building an intelligent basketball arena. This system makes use of SVM to choose the best camera for real-time broadcasting from among the available cameras around the court. The authors also fed basketball energy images to a CNN to separate shooting-and-scoring clips from non-scoring ones, hence providing accurate online score reporting and interesting highlight clips. This system was shown to achieve an accuracy of 94.59% in capturing score clips with 45 ms of processing time per frame.

In another work by Wang et al. [129], an RNN has been used for classification of offensive basketball plays in NBA games. The authors used video clips of the games from the SportVU³ dataset. This dataset provides videos at a rate of 25 frames per second to detect players' unique IDs, their locations on the court, and the position of the ball. Their model is shown to achieve top-1 and top-3 accuracies of 66% and 80%, respectively. Similarly, [130] used an RNN with LSTM units over the same dataset to predict the success rates of three-point shots, and reported better classification accuracy compared to a gradient boosted machine (GBM) and a generalized linear model (GLM).

³ http://go.stats.com/sportvu

Kautz et al. [131] investigated players' activity recognition in volleyball. Wearable sensor data and a CNN were employed to achieve this task, and a classification accuracy of 83.2% in identifying players' activities was observed.

Group activity recognition is another interesting direction for sport teams. Ibrahim et al. [132] investigated this option in a volleyball team using a hierarchical LSTM model. In this work, a single LSTM model was built to derive the activities
of each player, and a top-level LSTM model was designed to aggregate the individual models to identify the overall behavior of the team. A CNN model was utilized to extract features from video frames and feed them to the individual LSTM models. Compared to several baseline models, the proposed hierarchical model obtained better classification results.
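The two-level structure described above can be sketched in plain Python. This is a minimal illustration with randomly initialized, untrained weights. One person-level LSTM is shared across all players and runs over each player's per-frame features. A group-level LSTM then runs over the players' hidden states, pooled frame by frame. Several details are assumptions for illustration, not taken from [132]: the element-wise max pooling across players, the layer sizes, the synthetic inputs that stand in for CNN features, and the hypothetical names `LSTMCell` and `group_activity_features`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight row per hidden unit for each gate: input (i), forget (f),
        # output (o) and candidate (g); the last entry of each row is a bias.
        self.w = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(n + 1)]
                   for _ in range(hidden_size)]
            for gate in "ifog"
        }

    def _affine(self, gate, z):
        # W [x; h_prev] + b for every hidden unit of the given gate.
        return [sum(row[k] * z[k] for k in range(len(z))) + row[-1]
                for row in self.w[gate]]

    def step(self, x, h, c):
        z = list(x) + list(h)
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        g = [math.tanh(v) for v in self._affine("g", z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Return the hidden state after every time step."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def group_activity_features(player_sequences, person_lstm, group_lstm):
    """player_sequences[p][t]: feature vector of player p at frame t
    (CNN features in the surveyed work; plain lists in this sketch)."""
    # Level 1: the same person-level LSTM is applied to every player.
    per_player = [person_lstm.run(seq) for seq in player_sequences]
    num_frames = len(player_sequences[0])
    pooled = []
    for t in range(num_frames):
        # Assumed aggregation: element-wise max over players at frame t.
        pooled.append([max(hs[t][k] for hs in per_player)
                       for k in range(person_lstm.hidden_size)])
    # Level 2: the group-level LSTM summarizes the team over time; its last
    # hidden state would feed a team-activity classifier (e.g., softmax).
    return group_lstm.run(pooled)[-1]


rng = random.Random(0)
person = LSTMCell(input_size=4, hidden_size=3, rng=rng)
group = LSTMCell(input_size=3, hidden_size=2, rng=rng)
# Synthetic data: 6 players x 5 frames x 4 features.
players = [[[rng.random() for _ in range(4)] for _ in range(5)]
           for _ in range(6)]
team_state = group_activity_features(players, person, group)
print(len(team_state))  # 2
```

Sharing one `person` cell across all players corresponds to the "single LSTM model" for individual players described above. A real system would learn both levels' weights from labeled clips, and this sketch leaves training out entirely.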

11) Retail:
Due to the proliferation of mobile devices, online shopping has increased greatly. A recent shift toward product image retrieval through visual search techniques has been noticed [133]. CNNs have been used for this kind of visual search in the clothing and fashion market, to find items in online shops that are identical or similar to what one has seen in a movie [134] or on the street [135].

Moreover, shopping needs to be made convenient for visually impaired people. A combination of IoT technologies, including smart carts, integrated with DL methods can be a solution to this problem. In [136], a visual grocery assistance system that includes smart glasses, gloves, and shopping carts was designed to help visually impaired people in shopping. This system also used a CNN to detect items in the aisles.

Moreover, check-out counters in retail stores are usually bottlenecks where people queue up to pay for their purchases. The development of smart carts can enable real-time self check-out, and enhancing such a system with prediction capabilities can offer an item that a customer may need based on his/her past shopping.

Furthermore, recommending items to shoppers is a popular IoT application in retail that uses different technologies, such as BLE signals or visual cameras. The latter approach can be realized by identifying the shop items or shoppers' actions (e.g., reaching to a shelf, retracting from a shelf, etc.) [137] and providing a list of related items for the detected action.

To analyze customer interest in merchandise, Liu et al. [75] proposed a customer pose and orientation estimation system based on a DNN consisting of a CNN and an RNN. The input data come from surveillance cameras. The CNN is used to extract image features. The image features and the last predicted orientation features are then fed to an RNN to obtain the output pose and orientation.

C. Lessons Learned
In this section, we have identified five classes of IoT services as the foundational services that can be used in a wide range of IoT applications. We discussed how DL has been used to achieve these services. Moreover, we went through a wide range of IoT domains to find out how they exploit DL to deliver intelligent services. Table IV shows the works that utilized foundational services in IoT domains.

Many IoT domains and applications have greatly benefited from image recognition. The interest is expected to grow faster as the high-resolution cameras embedded in smart phones make image and video data easier to generate. The usage of other fundamental applications, especially physiological and psychological detection as well as localization, can be seen in different fields. However, the utilization of security and privacy services is shown to be limited. This is a gap in developing intelligent IoT applications, where the potential activities of hackers and attackers are ignored. Also, voice recognition with DL has not been used widely in IoT applications belonging to several domains, such as smart homes, education, ITS, and industry. There are works that use voice recognition with traditional machine learning approaches. Voice recognition has shown remarkable advancement with DL. One reason for the scarcity of this technique in IoT applications is the lack of comprehensive training datasets for each domain, as large training datasets are needed to train voice recognition DNNs.

Foundational services need fast data analytics to be efficient in their context. Despite several works in this direction, IoT fast data analytics based on DL leaves much room for the development of algorithms and architectures.

Table V summarizes the research in each domain and the DL models used. Figure 16 also depicts the frequency of the different models that have been used in the surveyed research works. About 43% of the papers have used CNN in building their proposed systems, while DBNs are used less than the other models (about 7%). RNNs and LSTMs together, as time-series models, have been used in 30% of the works. The table also emphasizes the great impact of works related to image recognition on IoT applications. Moreover, one third of the IoT applications are related to time-series or serial data, for which employing RNNs is a helpful approach.

Fig. 16. The percentage of surveyed papers that have used DL models

V. DL ON IOT DEVICES
Prior to the era of IoT, most research on DL targeted the improvement of its models and algorithms to operate efficiently when the scale of the problem grows to big data, by trying to deploy efficient models on cloud platforms. The emergence of IoT then opened up a totally different direction, as the scale of the problems shrank down to resource-constrained devices and to the need for real-time analytics.

Smart objects need to support some sort of lightweight intelligence. Due to DL's successful results in speech and video applications, which are among the fundamental services
and common uses of IoT, adapting its models and approaches for deployment on resource-constrained devices has become a crucial point of study. So far, DL methods can hardly be used on IoT and resource-constrained devices. They consume a large portion of the available resources, such as processing power, battery energy, and memory. In some cases, the available resources are not even sufficient for running a DL algorithm [57]. Fortunately, it has recently been shown that many of the parameters stored in DNNs may be redundant [138]. It is also sometimes unnecessary to use a large number of hidden layers to get high accuracy [139]. Consequently, efficiently removing these parameters and/or layers can considerably reduce the complexity of these DNNs without significant degradation of the output [138], [139]. In the remainder of this section, we discuss methods and technologies to achieve these results, and illustrate their applications in different domains.

TABLE IV
USAGE OF FOUNDATIONAL SERVICES IN IOT DOMAINS.
Rows: IoT domains. Columns (IoT foundational services): Image Recognition | Voice Recognition | Physiological & Psychological Detection | Localization | Security & Privacy
Smart Home [90]
Smart City [58], [94], [95] [92], [93]
Energy [83]
ITS [100], [101] [84]
Healthcare [59], [103], [104], [106] [105]
Agriculture [60], [109]–[113]
Education [119] [54]
Industry [55], [121] [33]
Government [61], [124], [126]
Sport [128]–[130] [131], [132] [73]
Retail [133]–[136] [75] [137]

TABLE V
THE COMMON USE OF DIFFERENT DNN MODELS IN IOT DOMAINS.
Rows: domain. Columns (usage of DNNs): AE | CNN | DBN | LSTM | RBM | RNN
Image Recognition [124] [58]–[60], [94], [95], [100], [101], [104], [106], [109], [112], [119], [128], [132], [134] [121] [130] [129], [130]
Physiological & [74], [76], [78]
[80], [81] [81] [76], [77] [79], [80], [82]
Psychological Detection [79], [81], [82]
Localization [69], [70] [71], [72] [73] [68]
Privacy and Security [86] [83], [85]
Smart home [89] [90] [89] [90]
Smart city [58], [94], [95] [92] [93]
Energy [98] [98] [98] [63] [96] [97] [97]
ITS [100], [101] [84] [99] [56] [56]
Healthcare [59], [103], [104], [106] [107]
Agriculture [60], [109]–[113]
Education [119] [118] [117], [118]
Industry [120], [122], [123] [55] [121]
Government [124] [61], [126] [125]
Sport [128], [131], [132] [130], [132] [129]
Retail [133]–[136] [137]

A. Methods and Technologies
DL models may consist of millions or even billions of parameters, which require sophisticated computing and large storage resources. In this section, we discuss several state-of-the-art approaches that bring DL models to embedded and resource-constrained IoT devices.

1) Network Compression:
One way of adopting DNNs in resource-constrained devices is network compression, in which a dense network is converted to a sparse network. This approach helps in reducing the storage and computational requirements of DNNs. The main limitation of this approach is that it is not general enough to support all kinds of networks. It is only applicable to specific network models that can exhibit such sparsity.
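The following Python sketch shows what converting a dense network into a sparse one involves. It prunes one fully connected layer by weight magnitude and stores the surviving weights in compressed sparse row (CSR) form, so that storage and the matrix-vector product touch only the nonzero connections. The weight values and the 0.1 threshold are hypothetical. Full pruning methods such as [141] also retrain the network and remove neurons left without connections; those steps are omitted here. This illustrates the general idea rather than any cited implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def prune_by_magnitude(weights: Matrix, threshold: float) -> Matrix:
    """Zero out every connection whose absolute weight is below `threshold`."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def to_csr(weights: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Keep only nonzero weights: their values, column indices, and row offsets."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in weights:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))  # end offset of this row inside `values`
    return values, col_idx, row_ptr


def csr_matvec(values: List[float], col_idx: List[int],
               row_ptr: List[int], x: List[float]) -> List[float]:
    """Sparse matrix-vector product that skips pruned (zero) connections."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Hypothetical 3x4 weight matrix of a single fully connected layer.
    dense = [
        [0.90, -0.01, 0.00, 0.40],
        [0.02, -0.70, 0.03, 0.00],
        [0.00, 0.05, 0.60, -0.80],
    ]
    sparse = prune_by_magnitude(dense, threshold=0.1)
    values, col_idx, row_ptr = to_csr(sparse)
    print("kept", len(values), "of", sum(len(r) for r in dense), "weights")
    print(csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0, 1.0]))
```

With the 0.1 threshold, 5 of the 12 weights survive. For this toy matrix, however, CSR holds 14 numbers (5 values, 5 column indices, 4 row offsets) against 12 dense entries. Sparse storage pays off only when most weights are pruned, which is why the approach suits networks that exhibit strong sparsity.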
Another interesting study on adopting compressed DL models on IoT devices is the one performed by Lane et al. [57]. In this study, the authors measure how well embedded, mobile, and wearable devices can bear the cost of running DL algorithms. These measurements covered the running time, energy consumption, and memory footprint. The study focused on the behavior of CNNs and DNNs on three hardware platforms used in IoT, mobile, and wearable applications. These are the Snapdragon 800, used in some models of smart phones and tablets; the Intel Edison, used in wearable and form-factor-sensitive IoT; and the Nvidia Tegra K1, employed in smart phones as well as IoT-enabled vehicles.
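Of the three factors measured, memory footprint can be roughly estimated before deployment from the parameter count alone. The minimal sketch below assumes 32-bit floating-point parameters (4 bytes each) and two hypothetical layer shapes. One is a fully connected layer on a flattened 32×32 RGB input; the other is a 3×3 convolution with 64 filters. It ignores activation buffers and framework overhead, so it gives only a lower bound on the memory a device actually needs.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected (feed-forward) layer."""
    return n_in * n_out + n_out


def conv2d_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a 2-D convolutional layer; independent of image size."""
    return k_h * k_w * c_in * c_out + c_out


def footprint_kib(n_params: int, bytes_per_param: int = 4) -> float:
    """Parameter storage in KiB (4 bytes per float32 parameter by default)."""
    return n_params * bytes_per_param / 1024


if __name__ == "__main__":
    fc = dense_params(32 * 32 * 3, 512)  # hypothetical: flattened 32x32 RGB input -> 512 units
    conv = conv2d_params(3, 3, 3, 64)    # hypothetical: 3x3 kernels, 3 -> 64 channels
    print(f"fully connected: {fc} params, {footprint_kib(fc):.1f} KiB")
    print(f"convolutional:   {conv} params, {footprint_kib(conv):.1f} KiB")
```

The fully connected layer needs about 6 MiB for its parameters, while the convolutional layer needs 7 KiB, because a convolution reuses the same small kernels at every image position. Halving the precision (`bytes_per_param=2`) halves both figures.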
Torch was used for developing and training the DNNs, and AlexNet [21] was the dominant model used on these platforms. The energy measurements indicated that all the platforms were able to run the compressed models, including the Intel Edison, which is the weakest one. In terms of execution time for CNNs, it has been shown that the later convolutional layers tend to consume less time as their dimensions decrease. Moreover, feed-forward (fully connected) layers are known to be much faster than the convolutional layers in CNNs. Consequently, a good approach for improving CNN models on resource-constrained devices is to replace convolutional layers with feed-forward layers whenever possible. In addition, the choice of activation function in DNNs can have a great effect on time efficiency. For instance, several tests have shown that ReLU is the most time-efficient, followed by Tanh and then Sigmoid. However, the overall runtime reduction from such a choice is not significant (less than 7%) compared to the execution time of the layers (at least 25%). In terms of memory usage, CNNs use less storage than DNNs because convolutional layers store fewer parameters than their fully connected counterparts in DNNs.

As previously stated, reducing the number of parameters in DNNs by pruning redundant and less important ones is another important approach to make DNNs implementable on resource-constrained devices. One of the first works on this approach is Optimal Brain Damage [140], published in 1989. At the core of this method, the second-order derivatives of the parameters are used to compute their importance, and unimportant parameters are pruned from the network. The method presented in [141] also works by pruning redundant and unnecessary connections and neurons, and it additionally uses weight-sharing mechanisms. Weight sharing replaces each weight with an n-bit index into a shared table that has 2ⁿ possible values. The steps to prune the network, as described by Han et al. in [141], consist of:
• Train the network to find the connections with high weights.
• Prune the unimportant connections whose weights are below a threshold.
• After pruning, some neurons may be left with neither input nor output connections. The pruning process identifies these neurons and removes them, together with all their remaining connections.
• Retrain the network to fine-tune the weights of the updated model. The weights should be carried over from the previously trained steps instead of being re-initialized; otherwise, the performance will degrade to some extent.

Fig. 17. The overall concept of pruning a DNN.

The authors evaluated this approach on four vision models, namely AlexNet, VGG-16, LeNet-300-100, and LeNet-5. The models were compressed at least 9 times for AlexNet and 13 times for VGG-16, while their accuracy was almost preserved. One limitation of this approach is that it cannot be used for other types of DNN models. Moreover, the resulting compressed networks are not efficient enough on all hardware platforms and CPU architectures. They therefore need new kinds of accelerators that can handle dynamic activation sparsity and weight sharing. Figure 17 illustrates the concept of pruning a DNN.

In [142], an inference engine called EIE was designed with a special hardware architecture that uses SRAMs instead of DRAMs, and was shown to work well with compressed network models. In this architecture, customized sparse matrix-vector multiplication and weight sharing are handled without losing the efficiency of the network. The engine consists of a scalable array of processing elements (PEs). Each PE keeps a part of the network in an SRAM and performs the corresponding computations. Most of the energy used by neural networks is consumed by memory accesses. As a result, the energy usage with this accelerator is reported to be 120 times lower than that of the corresponding original network.

In HashedNets [143], the connection weights of the neural network are randomly grouped into hash buckets using a hash function. All connections that fall into the same bucket are represented by a single parameter. Backpropagation is used to fine-tune the parameters during training. Testing results show that the accuracy of this hash-based compressed model outperforms all the other baseline compression methods.

The work in [144] by Courbariaux et al. proposed to binarize network weights and neurons in both the inference phase and the entire training phase, in order to reduce the memory footprint and the number of memory accesses. The network can also perform most of its arithmetic operations through bit-wise operations, leading to decreased power consumption. This approach was tested on the MNIST, CIFAR-10, and SVHN datasets using the Torch7 and Theano frameworks, and the results were found to be promising.

2) Approximate Computing:
Approximate computing is another approach to both implement machine learning tools on IoT devices and contribute to the energy saving of their hosting devices [145],
[146]. The validity of this approach arises from the fact that, in many IoT applications, machine learning outputs (e.g., predictions) need not be exact. They only need to fall within an acceptable range that provides the desired quality. Indeed, these approaches need to define quality thresholds that the output must not cross. Integrating DL models with approximate computing can lead to more efficient DL models for resource-constrained devices. Venkataramani et al. [145] proposed extending approximate computing to neural networks, converting a neural network into an approximate neural network. In their approach, the authors extend backpropagation to identify the neurons with the least effect on output accuracy. The approximate NN is then formed by substituting the less important neurons in the original network with their approximate counterparts. Approximate neurons are built with an approximate design technique called precision scaling. Instead of using a typical fixed number of bits (16-bit or 32-bit format) to represent computations, this technique uses a varying number of bits (4 to 10 bits). After the approximate network is formed, the precisions of the inputs and weights of neurons are adjusted to reach an optimal trade-off between accuracy and energy. Other works have also reported applying approximate computing with precision scaling to CNNs [146] and DBNs [152].

3) Accelerators:
Designing specific hardware and circuits is another active research direction, aiming to optimize the energy efficiency [148] and memory footprint [149] of DL models on IoT devices. In [153], several approaches for improving the intelligence of IoT devices are identified. These include designing accelerators for DNNs and using post-CMOS technologies such as spintronics, which exploits the spin of electrons [154]. This latter technology suggests a direction toward the development of hybrid devices that can store data, perform computations, and communicate within the same material technology.

The works in [147] and [148] have reported on developing accelerators for DNNs and CNNs, respectively. Beyond hardware accelerators, the work in [150] proposed the use of a software accelerator for the inference phase of DL models on mobile devices. It employs two resource control algorithms at run-time. One compresses layers, and the other decomposes deep architecture models across the available processors. This software accelerator can be a complementary solution to hardware accelerator designs.

4) Tinymotes:
In addition to all prior solutions, developing tiny processors (micromotes) with strong DL capabilities is on the rise [155] [151]. Such processors are designed within the range of one cubic millimeter and can be operated by batteries. They consume only about 300 microwatts while performing on-board analysis using deep network accelerators. With this technology, many time-critical IoT applications can perform decision-making on the device instead of sending data to high-performance computers and waiting for their response. For applications where data security and privacy are the main concerns, this integration of hardware and DL alleviates these concerns to some extent, as no or only limited data needs to be sent to the cloud for analysis. Moons et al. [156] also developed a tiny, power-efficient processor for CNNs, with a total active area of 1.2×2 mm² and a power consumption from 25 to 288 mW.

TABLE VI
METHODS AND TECHNOLOGIES TO BRING DL ON IOT DEVICES
Method / Technology | Reference | Pros | Cons
Network Compression | [141] [143] [144] | Reduce storage and computation | Not general for all DL models; need specific hardware; the pruning process adds overhead to training
Approximate Computing | [145], [146] | Makes fast DL models; save energy | Not suitable for precise systems
Accelerators | [67] [142] [147] [148] [149] [150] | Integrates DL model with the hardware; efficient computations | Does not work with the traditional hardware platforms
Tinymote with DL | [151] | Good for time-critical IoT apps; energy-efficient; provides more security and privacy for data | Special-purpose networks

B. Applications
There already exist mobile apps that employ DNNs to perform their tasks, such as using a CNN to identify garbage in images [58]. However, resource consumption in these apps is still very high. Indeed, [58] reports about 5.6 seconds to return a prediction, while consuming 83% of the CPU and 67 MB of memory.

Amato et al. [94] ran a CNN on Raspberry Pi boards incorporated in smart cameras to find empty parking slots. Ravi et al. [157] have reported the development of a fitness app for mobile devices that uses DL to classify human activities. The DL model is trained on a standard machine and then transferred to the mobile platform for activity recognition. However, the input of the DL model is mixed with several engineered features to improve the accuracy. As the authors describe, one potential reason for the poor performance is the small number of layers in DNN models designed for resource-constrained devices. In addition, the performance would not be satisfactory if the training data does not represent the entire ecosystem well.

Nguyen et al. [158] proposed a conceptual software-hardware framework to support IoT smart applications. Their framework consists of a cognitive engine and a smart connectivity component. The cognitive engine, which provides
cognitive functionality to smart objects, utilizes both DL algorithms and game-theoretic decision analytics. To be suitable for IoT, these algorithms must be deployed on low-power application-specific processors. The smart connectivity component integrates with cognitive radio transceivers and baseband processors to provide flexible and reliable connections to IoT smart objects.

C. Lessons Learned
In this section, we discussed the need to move toward supporting DL on embedded and resource-constrained IoT devices. The conflicting characteristics of IoT devices and DL techniques make this direction challenging, since IoT devices can rarely host DL models due to their resource constraints. To tackle these challenges, several methods have been introduced in the recent literature, including:
• DNN compression
• Approximate computing for DL
• Accelerators
• Tinymotes with DL.

Network compression involves identifying unimportant connections and neurons in a DNN through several rounds of training. This is a promising approach for getting close to real-time analytics on IoT devices. However, more investigation is needed to handle several challenges, such as:
• It is not clear whether network compression approaches are suitable for data streaming, especially when the DL model is dynamic and may evolve over time.
• Compression methods for time-series architectures, such as RNNs and LSTMs, have not been well investigated, and it remains to be seen whether the existing compression methods are applicable to these DL architectures.
• The trade-off between the compression rate and the accuracy of a DNN needs to be specified, as more compression leads to degraded accuracy.

More recently, approximate computing approaches have also been utilized to make DL models simpler and more energy-efficient, in order to operate them on resource-constrained devices. Similar to network compression techniques, these methods also take advantage of insignificant neurons. However, instead of manipulating the network structure, they preserve the structure but change the computation representations through bit-length reduction. For that reason, they seem applicable to a variety of DL architectures and can even cover the dynamic evolution of network models during run-time. Keeping a balance between accuracy and energy usage is their common goal. Nonetheless, more work is needed to determine which of these approaches is superior for embedding DL models in IoT devices.

Moreover, we discussed the emergence of special, small form-factor hardware designed to run DL models efficiently on embedded and resource-constrained devices. These architectures can be utilized in wearable, mobile, and IoT devices, due to their reduced resource demands and their applicability to time-sensitive IoT applications. However, clear challenges remain: their generality to support any kind of DNN, and their interoperability and compatibility with existing hardware platforms.

Table VI summarizes the methods and technologies utilized in the recent literature to host DL analytics on IoT devices, along with their pros and cons.

We also reviewed some applications that have implemented DL on resource-constrained devices. Due to the aforementioned challenges, there are not many well-developed applications in this category. However, once these challenges and barriers are resolved, we will see the rise of many IoT applications whose core DL model is embedded into the sensors, actuators, and IoT smart objects.

VI. FOG AND CLOUD-CENTRIC DL FOR IOT
Cloud computing is considered a promising solution for IoT big data analytics. However, it may not be ideal for IoT data with security or legal/policy restrictions, or with time constraints. For example, regulations may forbid transferring data to cloud centers hosted outside the national territory. On the other hand, the high-level abstraction of data for some analytics purposes should be acquired by aggregating several sources of IoT data. In these cases, deploying analytic solutions on individual IoT nodes is insufficient.

Instead of relying only on the cloud, the idea of bringing computing and analytics closer to the end-users/devices has recently been proposed under the name of fog computing. With fog-based analytics, we can benefit from the advantages of cloud computing while reducing or avoiding its drawbacks, such as network latency and security risks. It has been shown that hosting data analytics on fog computing nodes can improve the overall performance, because it avoids transmitting large amounts of raw data to distant cloud nodes [159]. Real-time analytics also becomes possible to some extent, since the fog is hosted locally, close to the source of data. Smart application gateways are the core elements in this new fog technology. They perform some of the tasks currently done by cloud computing, such as data aggregation, classification, integration, and interpretation, thus facilitating the use of local IoT computing resources.

The work in [160] proposed an intelligent IoT gateway that supports mechanisms by which end users can control the application protocols in order to optimize performance. The intelligent gateway primarily supports the interoperation of different types of both IoT and resource-rich devices, so that they can be treated similarly. A lightweight analytic tool is embedded in the proposed gateway to increase performance at the application level. Equipping IoT gateways and edge nodes with efficient DL algorithms can localize many complex analytical tasks that are currently performed in the cloud. Table VII summarizes several products that have incorporated DL in their intelligent core and can serve IoT domains in the fog or cloud.

In the following subsection, we review several state-of-the-art enabling technologies that facilitate deep learning on fog and cloud platforms.
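The data-aggregation role of smart application gateways, described above, can be illustrated with a minimal sketch. It assumes a hypothetical gateway that receives a stream of scalar readings from one sensor. The gateway forwards only fixed-size window summaries (count, mean, standard deviation, peak) instead of the raw values. The window size, the summary fields, and the `WindowSummary` type are illustrative assumptions, not part of any cited system. A gateway with more computing resources could instead run a lightweight DL model on each window and forward only its output.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Iterable, List


@dataclass
class WindowSummary:
    """Compact record forwarded upstream in place of `count` raw readings."""
    sensor_id: str
    count: int
    avg: float
    std: float
    peak: float


def summarize_windows(sensor_id: str, readings: Iterable[float],
                      window: int) -> List[WindowSummary]:
    """Aggregate a raw stream into non-overlapping windows on the gateway.

    A trailing partial window (fewer than `window` readings) is dropped.
    """
    buf: List[float] = []
    out: List[WindowSummary] = []
    for r in readings:
        buf.append(r)
        if len(buf) == window:
            out.append(WindowSummary(sensor_id, len(buf), mean(buf), pstdev(buf), max(buf)))
            buf = []
    return out


if __name__ == "__main__":
    raw = [float(i % 60) for i in range(600)]  # hypothetical: 10 minutes of 1 Hz readings
    summaries = summarize_windows("temp-01", raw, window=60)
    sent = len(summaries) * 3                  # avg, std, peak per window
    print(f"{len(raw)} raw values -> {len(summaries)} summaries ({sent} values sent)")
```

Here 600 raw readings shrink to 10 summaries, or 30 forwarded values. The trade-off is that upstream analytics no longer see the raw samples, which matters for tasks that need them.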
TABLE VII
SOME PRODUCTS THAT USE DEEP LEARNING AND SERVE IOT DOMAINS ON THE FOG OR CLOUD.
Product | Description | Application | Platform
Amazon Alexa | Intelligent personal assistant (IPA) | Smart home | Fog
Microsoft Cortana | IPA | Smart Car, XBox | Fog
Google Assistant | IPA | Smart Car, Smart home | Fog
IBM Watson | Cognitive framework | IoT domains | Cloud

A. Enabling Technologies and Platforms
Despite the introduction of DL analytics on fog infrastructure, cloud computing remains the only viable solution for analytics in many IoT applications that cannot be handled by fog computing. For example, complex tasks such as video analysis require large and complex models with a lot of computing resources. Thus, designing scalable, high-performance, cloud-centric DNN models and algorithms that can perform analytics on massive IoT data is still an important research area. Coates et al. [161] proposed a large-scale system based on a cluster of GPU servers. The system can train neural networks with 1 billion parameters on 3 machines in a few days, and can be scaled to train networks with 11 billion parameters on 16 machines.

Project Adam [162] is another attempt to develop a scalable and efficient DL model. The system is based on distributed DL, where the computation and communication of the whole system are optimized for high scalability and efficiency. The evaluation of this system on a cluster of 120 machines shows that training a large DNN with 2 billion connections achieves two times higher accuracy than a baseline system, while using 30 times fewer machines.

Google's Tensor Processing Unit (TPU) [163] is a specialized co-processor for DNNs in Google's data centers. It was deployed in 2015 to accelerate the inference phase of DNNs written with the TensorFlow framework. Representative DNNs accounting for 95% of the inference workload in Google's data centers were examined. Among them, CNNs constitute only about 5% of the workload, while MLPs and LSTMs cover the other 90%. Performance evaluation showed that, on average, the TPU outperforms its contemporary GPUs and CPUs. It executes operations 15 to 30 times faster and delivers 30 to 80 times higher performance per watt (TeraOps/second per watt).

Beyond the infrastructural advancements to host scalable DL models on cloud platforms, mechanisms and approaches are needed that make DL models accessible through APIs, so that they can be easily integrated into IoT applications. This aspect has not been investigated much, and only a few products are available, such as Amazon's AWS DL AMIs⁴, Google Cloud ML⁵, and IBM Watson⁶. This creates opportunities for cloud providers to offer "DL models as a service" as a new sub-category of Software as a Service (SaaS). However, this imposes several challenges on cloud providers, since DL tasks are computationally intensive and may starve other cloud services. Moreover, due to the data thirstiness of DL models, data transfers to the cloud may become a bottleneck. Figure 18 presents a general stack for delivering DL models as a service on the cloud. Different providers may use their own customized intelligence stacks [164] [165].

⁴ https://aws.amazon.com/amazon-ai/amis/
⁵ https://cloud.google.com/products/machine-learning/
⁶ https://www.ibm.com/watson/

B. Challenges
When DL analytics come to fog nodes, several challenges need to be addressed, including:
• DL service discovery: Fog nodes are densely distributed in geographical regions, and each node may have specific analytics capabilities (e.g., one node runs CNN models for image detection, another node runs RNNs for time-series data prediction, etc.). Thus, devices need to identify the sources of appropriate analytic providers through some sort of extended service discovery protocol for DL analytics.
• DL model and task distribution: Partitioning the execution of DL models and tasks among the fog nodes, and optimally distributing the data stream among the available nodes, are critical for time-sensitive applications [166]. Aggregating the final results from the computing nodes and returning the action with the least latency are the other side of the coin.
• Design factors: Since fog computing environments are in their infancy and are expected to evolve, it is worthwhile to investigate how the design factors of the fog environment (e.g., architectures, resource management, etc.) and the deployment of DL models in this environment can impact the quality of analytic services. Conversely, it would also be interesting to see how far these design factors can be tailored or extended to improve the operation and quality of DL analytics.
• Mobile edge: Mobile edge computing environments are ubiquitous and contribute to IoT analytics. Edge-assisted DL analytics should therefore be designed with the dynamicity of such environments in mind, since mobile devices may join and leave the system. Also, the energy management of mobile edge devices should be accurate when analytic tasks are delegated to them.

A few works have reported the integration of DL on fog nodes in IoT ecosystems. For example, Qaisar et al. [167] proposed a proof of concept for deploying CNN models on fog nodes for machine health prognosis. In their work, a thorough search among fog nodes is performed to find free nodes to which analytic tasks can be delegated. Also, Li et al. [168] proposed a system that leverages the collaboration of mobile and edge devices running CNN models for object recognition.

C. Lessons Learned
In this section, we highlighted the role of cloud and fog computing in delivering DL analytics to IoT applications, together with their enabling technologies, platforms, and challenges. The great success of cloud computing in support of DL is backed by the advancement and employment of optimized processors for neural networks as well as scalable distributed DL
algorithms. Deploying DL models on fog platforms for IoT applications, such as smart homes and smart grids, would draw the attention of end users due to their ease of access and fast response time. Nevertheless, cloud-based DL analytics would be of great importance for long-term and complex data analytics that exceed the capabilities of fog computing. Some smart city applications, the government sector, and nation-wide IoT deployments need to utilize cloud-based DL infrastructures.

Currently, the integration of DL analytics into IoT applications is limited to RESTful APIs, which are based on the HTTP protocol. Several other application protocols are extensively used in IoT applications, such as Message Queue Telemetry Transport (MQTT), Constrained Application Protocol (CoAP), Extensible Messaging and Presence Protocol (XMPP), and Advanced Message Queuing Protocol (AMQP). DL analytic interfaces need to become compatible with these protocols. This would eliminate the need for message conversion proxies, which impose extra overhead on the analytics response time.

We identified several challenges related to the deployment and usage of DL models in support of analytics on fog nodes. DL service discovery is a necessary requirement because fog nodes are densely deployed, which makes brute-force search for available services inefficient. Currently used service discovery protocols in IoT applications, such as multicast DNS (mDNS) or DNS Service Discovery (DNS-SD) [1], need to be extended to support DL service discovery (e.g., to declare the type of analytics, DL model, input shape, etc.). Other requirements that need to be addressed are the efficient distribution of DL models and tasks, the distribution of data streams over fog nodes, and the aggregation of the results.

Fig. 18. A general stack of DL models as a service in the cloud platforms.

VII. IOT CHALLENGES FOR DEEP LEARNING, AND FUTURE DIRECTIONS
In this section, we first review several challenges that matter, from the machine learning point of view, for implementing and developing IoT analytics. Then we point out research directions that can fill the existing gaps for IoT analytics based on DL approaches.

A. Challenges
1) Lack of Large IoT Dataset:
The lack of large real-world datasets for IoT applications is a main hurdle for incorporating DL models in IoT, as DL needs more data to achieve higher accuracy. Moreover, more data prevents overfitting of the models. This shortage is a barrier to the deployment and acceptance of DL-based IoT analytics, because such systems must be empirically validated and shown to be promising in the real world. Restricted access to copyrighted datasets and privacy considerations are further burdens, more common in domains with human data such as healthcare and education. A portfolio of appropriate datasets would also be of great help for developers and researchers. A general list of useful datasets has been compiled in Wikipedia [169]. For the convenience of researchers working on machine learning applications in IoT, Table VIII presents a collection of common datasets suitable for DL.

2) Preprocessing:
Preparing raw data in an appropriate representation to be fed into DL models is another challenge for IoT applications. Most DL approaches need some sort of preprocessing to yield good results. For example, image processing with CNNs works better when the input data at the pixel level is normalized, scaled into a specific range, or transformed into a standard representation [21] [39]. For IoT applications, preprocessing is more complex. The system deals with data from different sources, which may have various formats and distributions and may contain missing values.

3) Secure and Privacy Preserving Deep Learning:
Ensuring data security and privacy is a main concern in many IoT applications, since IoT big data will be transferred through the Internet for analytics and can thus be observed around the world. Anonymization is used in many applications, but these techniques can be hacked and the anonymized data re-identified. Moreover, DL training models are also subject to malicious attacks, such as False Data Injection or adversarial sample inputs. Such attacks may jeopardize many functional or non-functional requirements of IoT systems (e.g., availability, reliability, validity, trustworthiness, etc.). Indeed, DL models learn features from the raw data, and can therefore learn from any invalid data fed to them. In this case, DL models must be enhanced with some mechanism to discover abnormal or invalid data. A data-monitoring DL model accompanying the main model could serve this purpose. Papernot et al. [170] have investigated the vulnerability of DNNs in adversarial settings
26

TABLE VIII
C OMMON DATA SETS FOR D EEP L EARNING IN I OT.

Dataset Name Domain Provider Notes Address/Link


High-resolution climate
Agriculture,
CGIAR dataset CCAFS datasets for a variety http://www.ccafs-climate.org/
Climate
of fields including agricultural
Recordings of 115 subjects’ http://archive.ics.uci.edu/ml/
Educational
University activities through a logging datasets/Educational+Process+
Process Education
of Genova application while learning Mining+%28EPM%29%3A+
Mining
with an educational simulator A+Learning+Analytics+Data+Set
Energy related data set
Commercial
Energy, from a commercial building
Building IIITD http://combed.github.io/
Smart Building where data is sampled
Energy Dataset
more than once a minute.
Individual
EDF R&D, One-minute sampling rate http://archive.ics.uci.edu/ml/
household Energy,
Clamart, over a period of almost datasets/Individual+household+
electric power Smart home
France 4 years electric+power+consumption
consumption
AMPds contains electricity,
water, and natural gas
Energy,
AMPds dataset S. Makonin measurements at one minute http://ampds.org/
Smart home
intervals for 2 years of
monitoring
Power demand from five
houses. In each house both
UK Domestic
Energy, Kelly and the whole-house mains http://www.doc.ic.ac.uk/
Appliance-Level
Smart Home Knottenbelt power demand as well as ∼dk3810/data/
Electricity
power demand from individual
appliances are recorded.
PhysioBank Archive of over 80 https://physionet.org/physiobank
Healthcare PhysioNet
databases physiological datasets. /database/
A collection of voice
Universität http://www.stimmdatenbank.
Saarbruecken recordings from more than
Healthcare des coli.uni-saarland.de/
Voice Database 2000 persons for pathological
Saarlandes help en.php4
voice detection.
An RGB-D dataset and
CMP at
evaluation methodology for
Czech
T-LESS Industry detection and 6D pose http://cmp.felk.cvut.cz/t-less/
Technical
estimation of texture-less
University
objects
CityPulse
CityPulse Dataset Road Traffic Data, Pollution http://iot.ee.surrey.ac.uk:8080
Smart City EU FP7
Collection Data, Weather, Parking /datasets.html
project
Open Data Weather, Air quality,
Telecom
Institute - node Smart City Electricity, http://theodi.fbk.eu/openbigdata/
Italia
Trento Telecommunication
A broad range of categories
City of http://datosabiertos.malaga.eu
Málaga datasets Smart City such as energy, ITS,
Malaga /dataset
weather, Industry, Sport, etc.
Recordings of 8 gas sensors
Gas sensors for Univ. of http://archive.ics.uci.edu/ml
under three conditions
home activity Smart home California /datasets/Gas+sensors+for+
including background, wine
monitoring San Diego home+activity+monitoring
and banana presentations.
Several public datasets related
CASAS datasets Washington to Activities of Daily Living
http://ailab.wsu.edu/casas/
for activities of Smart home State (ADL) performance in a two-
datasets.html
daily living University story home, an apartment,
and an office settings.
Human activity recognition
ARAS Human Bogazici datasets collected from two https://www.cmpe.boun.edu.tr
Smart home
Activity Dataset University real houses with multiple /aras/
residents during two months.
Motion sensor data of
Mitsubishi
residual traces from a
Smart home, Electric
MERLSense Data network of over 200 sensors http://www.merl.com/wmd
building Research
for two years, containing
Labs
over 50 million records.
Video of basketball and
SportVU Sport Stats LLC soccer games captured from http://go.stats.com/sportvu
6 cameras.
Includes a wide range of
physical activities (warm up, http://orestibanos.com/
RealDisp Sport O. Banos
cool down and fitness datasets.htm
exercises).
27

TABLE VIII – Continued from previous page.


Dataset Name Domain Provider Notes Address/Link
Prediction Trajectories performed by
http://www.geolink.pt/
Taxi Service Challenge, all the 442 taxis
Transportation ecmlpkdd2015-challenge/
Trajectory ECML running in the city
dataset.html
PKDD 2015 of Porto, in Portugal.
A GPS trajectory by a https://www.microsoft.com
GeoLife GPS
Transportation Microsoft sequence of time-stamped /en-us/download/details.aspx?
Trajectories
points id=52367
https://www.microsoft.com/
T-Drive trajectory Contains a one-week
Transportation Microsoft en-us/research/publication/
data trajectories of 10,357 taxis
t-drive-trajectory-data-sample/
Bus traces from the
Chicago Bus Chicago Transport Authority http://www.ibr.cs.tu-bs.de/
Transportation M. Doering
Traces data for 18 days with a rate users/mdoering/bustraces/
between 20 and 40 seconds.
About 20 million Uber
Uber trip FiveThirty- https://github.com/fivethirtyeight/
Transportation pickups in New York City
data Eight uber-tlc-foil-response
during 12 months.
Three datasets: Korean
daytime, Korean nighttime, https://figshare.com/articles
Traffic Sign
Transportation K. Lim and German daytime /Traffic Sign Recognition
Recognition
traffic signs based on Testsets/4597795
Vienna traffic rules.
End-To-End DAVIS http://sensors.ini.uzh.ch/
DDD17 Transportation J. Binas
Driving Dataset. databases.html

where an adversary tries to provide some inputs that lead to the input data is not coming from a trustworthy source. Data
an incorrect output classification and hence corrupting the validation and trustworthiness should be checked at each level
integrity of the classification. Developing further techniques of big data analytics, especially when we are dealing with
to defend and prevent the effect of this sort of attacks on DL online streams of input data to an analytic engine [171].
models is necessary for reliable IoT applications. Moreover, the variability of IoT big data (variation in the
data flow rates) rises challenges for online analytics. In case
4) Challenges of 6V’s: of immense streams of data, DL techniques, and in particular
Despite the recent advancement in DL for big data, there are the online ones, handle them. Data sampling techniques would
still significant challenges that need to be addressed to mature be beneficial in these scenarios.
this technology. Each characteristic of IoT big data imposes Finally, a main challenge for business managers to adopt
a challenge for DL techniques. In the following we highlight big data is that it is not clear for them how to use big
these challenges. data analytics to get value out of it and improve their
The massive volume of data poses a great challenge for DL, business [172]. Beyond that, the analytic engine may produce
especially for time and structure complexity. The voluminous abstractions that are not important for the stakeholders, or are
number of input data, their broad number of attributes, and not clear enough for them.
their high degree of classification result in a very complex
DL model and affect running time performance. Running DL 5) Deep learning for IoT Devices:
on distributed frameworks or clusters of CPUs with parallel Developing DL on IoT devices poses a new challenge for IoT
processing is a viable solution that has been developed [6]. device designers, to consider the requirements of handling
The high volume of IoT big data also brings another challenge, DNNs in resource-constrained devices. These requirements
namely the noisy and unlabeled data. Even though DL is very are expected to grow as the datasets sizes are growing every
good at tolerating noisy data and learning from unlabeled data, day, and new algorithms arise to be part of the solutions for
it is not clear to what extent DL models can be accurate in DL in IoT.
the presence of such abnormal data.
The variety of IoT data formats that come from various 6) Deep Learning Limitations:
sources pops up the challenge of managing conflicts between Despite showing impressive results in many applications, DL
different data sources. In case of no conflict in data sources, models still have several limitations. Nguyen et al. [173]
DL has the ability to effectively work on heterogeneous data. reported about the false confidence of DDN for predicting
The high velocity of IoT big data, i.e., the high rate of data images that are unrecognizable by humans. By producing
generation, also brings the challenge of high speed processing fooling examples that are totally unrecognizable by humans,
and analysis of data. Online learning is a solution for high the DNN classifies them as familiar objects.
velocity and has been proposed for DNNs. However, more The other limitation is the focus of DL models on clas-
research is needed to augment DL with online learning and sification, while many IoT applications (e.g., electricity load
sequential learning techniques. forecasting, temperature forecasting) need a kind of regression
The veracity of IoT big data also presents challenges for at their analytics core. Few works tried to enrich DNNs with
DL analytics. The IoT big data analytics will not be useful if regression capabilities, such as the work in [174] proposing
28

the ensemble of DBN and Support Vector Regression (SVR)


for regression tasks. However, more investigation is required
to clear many aspects of regression with DL.
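The DL service discovery requirement for fog nodes described earlier in this section (declaring the type of analytics, the DL model, and the input shape) maps naturally onto DNS-SD, where per-instance key/value metadata travels in a TXT record. The following is a minimal sketch, assuming a hypothetical service type `_dl-analytics._tcp` and hypothetical keys `analytics`, `model`, and `shape`. It only shows the TXT record data format of RFC 6763 (length-prefixed `key=value` strings) plus a client-side filter; it is not an mDNS responder and not an existing DL-discovery extension.

```python
# Hypothetical DNS-SD service type for DL analytics on fog nodes (illustrative only).
SERVICE_TYPE = "_dl-analytics._tcp.local."


def encode_txt(attrs: dict[str, str]) -> bytes:
    """Encode key=value pairs as DNS-SD TXT record data (RFC 6763, Section 6)."""
    out = bytearray()
    for key, value in attrs.items():
        entry = f"{key}={value}".encode("utf-8")
        if len(entry) > 255:  # each TXT string is prefixed by a single length byte
            raise ValueError(f"TXT entry for {key!r} exceeds 255 bytes")
        out.append(len(entry))
        out += entry
    return bytes(out)


def decode_txt(data: bytes) -> dict[str, str]:
    """Decode TXT record data; keys are case-insensitive and the first occurrence wins."""
    attrs: dict[str, str] = {}
    i = 0
    while i < len(data):
        length = data[i]
        entry = data[i + 1 : i + 1 + length].decode("utf-8")
        i += 1 + length
        key, _, value = entry.partition("=")
        attrs.setdefault(key.lower(), value)
    return attrs


def find_services(adverts: dict[str, bytes], analytics: str, shape: str) -> list[str]:
    """Return instance names whose metadata matches the requested analytics and input shape."""
    matches = []
    for name, txt in adverts.items():
        meta = decode_txt(txt)
        if meta.get("analytics") == analytics and meta.get("shape") == shape:
            matches.append(name)
    return sorted(matches)
```

A client that has collected TXT records from nearby fog nodes could then call, for example, `find_services(adverts, "anomaly", "60x3")` to select only the nodes hosting a compatible model instead of probing every node.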
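The preprocessing issue raised above (IoT data from sources with different ranges and distributions, plus missing values) can be made concrete with a minimal sketch that imputes each missing reading with its feature's mean and then min-max scales every feature into [0, 1]. Mean imputation and min-max scaling are only one simple choice among many; in a real pipeline the statistics would be fitted on training data and reused at inference time, which this sketch does not do.

```python
from typing import Optional


def impute_and_scale(rows: list[list[Optional[float]]]) -> list[list[float]]:
    """Mean-impute missing values (None) per feature, then min-max scale each feature to [0, 1]."""
    n_features = len(rows[0])
    result = [list(row) for row in rows]
    for j in range(n_features):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"feature {j} has no observed values")
        mean = sum(observed) / len(observed)
        lo, hi = min(observed), max(observed)
        span = hi - lo
        for row in result:
            value = mean if row[j] is None else row[j]
            # A constant feature carries no information; map it to 0.0.
            row[j] = 0.0 if span == 0 else (value - lo) / span
    return result
```

For example, temperature (°C) and pressure (hPa) readings with gaps, `[[20.0, 1000.0], [None, 1010.0], [30.0, None]]`, become `[[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]`, so both features enter the model on the same scale.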
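To illustrate what an adversarial sample input looks like, the sketch below applies the fast gradient sign method (FGSM) of Goodfellow et al. This is a simpler attack than the one studied by Papernot et al. [170]. The target is a hypothetical two-feature logistic-regression classifier standing in for a DNN. For binary cross-entropy with a logistic model, the gradient of the loss with respect to the input is (p - y) * w, so FGSM moves every feature by eps in the direction of that gradient's sign.

```python
import math


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Probability of the positive class under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(w: list[float], b: float, x: list[float], y: int, eps: float) -> list[float]:
    """Fast gradient sign method: x_adv = x + eps * sign(d loss / d x)."""
    p = predict(w, b, x)
    # For binary cross-entropy with a logistic model, d loss / d x = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    # (g > 0) - (g < 0) evaluates to the sign of g as -1, 0, or 1.
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
```

With `w = [2.0, -1.0]`, `b = 0.0`, and a positive sample `x = [1.0, 0.5]`, a perturbation of at most 0.6 per feature (`fgsm(w, b, x, y=1, eps=0.6)` gives `[0.4, 1.1]`) moves the predicted probability from above 0.5 to below it. Defenses such as adversarial training or input validation aim to close this gap.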
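The data-monitoring and validation idea discussed above, screening abnormal or untrustworthy inputs before they reach the main model, can be sketched under simplifying assumptions as a running z-score gate. This is a statistical stand-in, not the data-monitoring DL model the text envisions. It keeps Welford's online mean and variance for a single sensor stream, rejects readings more than `threshold` standard deviations from the mean, and keeps rejected readings out of its statistics. The threshold and warm-up length are illustrative defaults.

```python
import math


class StreamValidator:
    """Rejects readings whose z-score against running statistics exceeds a threshold."""

    def __init__(self, threshold: float = 4.0, warmup: int = 10) -> None:
        self.threshold = threshold
        self.warmup = max(2, warmup)  # need at least two samples for a sample variance
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's algorithm)

    def _update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def check(self, x: float) -> bool:
        """Return True if x is accepted; only accepted readings update the statistics."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                return False  # likely invalid or injected; do not contaminate the statistics
        self._update(x)
        return True
```

Only readings for which `check` returns `True` would be forwarded to the analytics model. A constant stream (zero variance) accepts every value in this sketch, which a production validator would need to handle explicitly.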
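As one example of the data sampling techniques suggested above for highly variable stream rates, classic reservoir sampling (Algorithm R) keeps a uniform random sample of fixed size k from a stream of unknown length in O(k) memory. A downstream DL model can then train or score on a bounded subset regardless of the arrival rate. The sketch below is a generic illustration, not a method proposed in the cited works.

```python
import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> list[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    reservoir: list[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Passing an explicit `random.Random` instance keeps the sample reproducible in tests; if the stream is shorter than k, every item is returned.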
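Two of the points above can be illustrated together: online learning for high-velocity streams, and regression rather than classification at the analytics core. The sketch below uses a plain linear model updated by one stochastic gradient step per arriving sample. It is only a minimal stand-in for the online updates a DNN regressor would perform, not the DBN and SVR ensemble of [174]. It stores no past samples, and its squared-error loss yields a continuous prediction instead of a class label.

```python
import random


class OnlineLinearRegressor:
    """Linear regressor trained by one SGD step per arriving sample (no replay buffer)."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: list[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def partial_fit(self, x: list[float], y: float) -> float:
        """Update on a single (x, y) pair using the gradient of 0.5 * (prediction - y) ** 2."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err


if __name__ == "__main__":
    rng = random.Random(1)
    model = OnlineLinearRegressor(n_features=1)
    for _ in range(5000):  # hypothetical sensor stream following y = 2x + 1
        x = rng.random()
        model.partial_fit([x], 2.0 * x + 1.0)
    print(model.w, model.b)  # approaches [2.0] and 1.0
```

Because each update touches only the current sample, memory stays constant however fast the stream arrives; the learning rate trades adaptation speed against stability.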

B. Future Directions
1) IoT Mobile Data:
A remarkable share of IoT data comes from mobile devices. Investigating efficient ways to use mobile big data in conjunction with DL approaches is a way to deliver better services in IoT domains, especially in smart city scenarios. In [175], the capabilities of DL models in mobile big data analytics were investigated using a distributed learning framework that executes an iterative MapReduce task on several parallel Spark workers.

2) Integrating Contextual Information:
The situation of the environment cannot be understood from IoT sensor data alone. Therefore, IoT data needs to be fused with other sources of data, namely context information that complements the understanding of the environment [10]. This integration can also help fast data analytics and quick reasoning, because it bounds the search space of the reasoning engine. For example, a smart camera capable of face pose recognition can perform its job in various contexts, such as security gates in smart homes or government buildings, or in smart cars for driving assistance. In all these situations, complementary contextual information (e.g., time of day, daily habits) helps the system reason about the best action to take based on the detected pose of the person.

3) Online Resource Provisioning for IoT Analytics:
Deploying fast DL-based data analytics on the fog and cloud requires online provisioning of fog or cloud resources to host the data streams. Due to the streaming nature of IoT data, knowing the volume of the data sequence in advance is not feasible. In this regard, we need a new class of algorithms that work on the current stream of data and do not rely on prior knowledge of the data stream. A DL mechanism and an online auctioning algorithm are proposed in [65] and [176], respectively, to support online provisioning of fog and cloud resources for IoT applications.

4) Semi-supervised Analytic Frameworks:
Most analytic algorithms are supervised, and thus need a large amount of labeled training data that is either unavailable or expensive to prepare. According to IDC's report [177], it is estimated that by 2012 only about 3% of all data in the digital universe had been annotated, which implies a scarcity of training datasets for DL. A combination of advanced machine learning algorithms designed for semi-supervised settings fits smart city systems well, since a small labeled training dataset can be used while the learning agent improves its accuracy using a large amount of unlabeled data [4]. Figure 19 illustrates the role of semi-supervised learning in improving the output accuracy of deep reinforcement learning [40] in indoor localization experiments. In those experiments, only 15% of the data was labeled, but the results were strengthened by utilizing unlabeled data in the algorithm.

Fig. 19. Deep reinforcement learning with only labeled data (supervised) vs. with labeled and unlabeled data (semi-supervised). At each epoch, the semi-supervised model outperforms the supervised model both in terms of total received rewards and closeness to the target.

5) Dependable and Reliable IoT Analytics:
As we rely more on CPS and IoT at large scale, mechanisms that protect the system against malicious attacks as well as failures become more crucial [178]. DL approaches can be applied in this direction by analyzing the huge amount of log traces of CPS and IoT systems in order to identify and predict weak points of the system where attacks may occur or where functionality is defective. This will help the system prevent or recover from faults and consequently increase the dependability of CPS and IoT systems.

6) Self-organizing Communication Networks:
With a huge number of IoT devices, configuring and maintaining their underlying physical M2M communications and networking becomes harder. Although the large number of network nodes and their relationships is a challenge for traditional machine learning approaches, it gives DL architectures the opportunity to prove their competency in this area by providing a range of self-services such as self-configuration, self-optimization, self-healing, and self-load balancing. Valente et al. [179] have provided a survey of traditional machine learning approaches for self-organizing cellular networks.

7) Emerging IoT Applications:
Unmanned aerial vehicles: The use of unmanned aerial vehicles (UAVs) is a promising application that can improve service delivery in hard-to-reach regions or in critical situations. UAVs have also been used for many real-time image analysis tasks such as surveillance, search-and-rescue operations, and infrastructure inspection [180]. These devices face several challenges for their adoption, including routing, energy saving, avoiding private regions, and obstacle avoidance [181]. DL can have a great impact in this domain on the prediction and decision-making tasks needed to get the best out of UAVs.
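The iterative MapReduce style of distributed learning described for mobile big data in [175] can be sketched without Spark. In each iteration, a map step computes partial gradient sums on every data partition (one per worker), a reduce step adds them, and the driver applies one full-batch gradient step. The model here is a hypothetical one-feature linear regressor rather than a DL model, and Python's built-in `map` and `functools.reduce` merely stand in for the corresponding Spark operations.

```python
from functools import reduce

Partition = list[tuple[float, float]]  # (x, y) samples held by one worker
Partial = tuple[float, float, int]  # (sum of dL/dw, sum of dL/db, sample count)


def partial_gradients(partition: Partition, w: float, b: float) -> Partial:
    """Map step: summed squared-error gradients and sample count on one partition."""
    gw = gb = 0.0
    for x, y in partition:
        err = w * x + b - y
        gw += err * x
        gb += err
    return gw, gb, len(partition)


def add(a: Partial, c: Partial) -> Partial:
    """Reduce step: combine the partial results of two workers."""
    return a[0] + c[0], a[1] + c[1], a[2] + c[2]


def train(partitions: list[Partition], iterations: int = 500, lr: float = 0.5) -> tuple[float, float]:
    w = b = 0.0
    for _ in range(iterations):
        # The lazy map is consumed by reduce before w and b change in this iteration.
        gw, gb, n = reduce(add, map(lambda p: partial_gradients(p, w, b), partitions))
        w -= lr * gw / n  # driver applies the averaged gradient
        b -= lr * gb / n
    return w, b
```

Only the small `(gw, gb, n)` tuples cross worker boundaries, which is what makes this pattern attractive when the raw mobile data is large and already partitioned.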
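Fusing sensor data with context, as discussed under Integrating Contextual Information, often starts by encoding the context as extra input features. A minimal sketch, assuming time of day and a weekend flag as the context (both hypothetical choices): the hour is mapped onto a circle with sine and cosine so that 23:00 and 01:00 end up close together, and the result is concatenated to the sensor feature vector before it is fed to the model.

```python
import math


def encode_hour(hour: float) -> list[float]:
    """Map time of day onto the unit circle so that 23:00 and 01:00 are neighbours."""
    angle = 2.0 * math.pi * (hour % 24) / 24.0
    return [math.sin(angle), math.cos(angle)]


def with_context(sensor_features: list[float], hour: float, is_weekend: bool) -> list[float]:
    """Concatenate sensor features with hypothetical context features (time of day, weekend flag)."""
    return list(sensor_features) + encode_hour(hour) + [1.0 if is_weekend else 0.0]
```

A raw hour value would place 23 and 0 at opposite ends of the scale; the cyclic encoding avoids that discontinuity at the cost of one extra feature.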
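For online resource provisioning, the key constraint above is that the stream volume is not known in advance. The sketch below shows a deliberately simple, hypothetical policy that respects this constraint. It smooths the observed arrival rate with an exponentially weighted moving average and sizes the fog-node pool from the current estimate only. It is neither the DL mechanism of [65] nor the auction of [176]; `capacity_per_node`, `alpha`, and `headroom` are illustrative parameters.

```python
import math
from typing import Optional


class OnlineProvisioner:
    """Sizes a fog-node pool from an EWMA of observed arrival rates, with no prior workload model."""

    def __init__(self, capacity_per_node: float, alpha: float = 0.3,
                 headroom: float = 1.2, min_nodes: int = 1) -> None:
        self.capacity_per_node = capacity_per_node  # samples/s one node can analyse (assumed)
        self.alpha = alpha  # weight given to the newest observation
        self.headroom = headroom  # over-provisioning factor for bursts
        self.min_nodes = min_nodes
        self.rate: Optional[float] = None

    def observe(self, arrivals_per_s: float) -> int:
        """Fold in the latest measured arrival rate and return the node count to provision."""
        if self.rate is None:
            self.rate = arrivals_per_s
        else:
            self.rate = self.alpha * arrivals_per_s + (1 - self.alpha) * self.rate
        needed = math.ceil(self.rate * self.headroom / self.capacity_per_node)
        return max(self.min_nodes, needed)
```

A smaller `alpha` reacts more slowly to bursts but avoids thrashing, which matters when starting or stopping a fog node is costly.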
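The semi-supervised setting described under Semi-supervised Analytic Frameworks (few labeled samples, many unlabeled ones) can be illustrated with self-training. This is a much simpler technique than the semi-supervised deep reinforcement learning of [40]. In this sketch, a nearest-centroid classifier is fitted on the labeled points. Unlabeled points whose nearest centroid is closer than the runner-up by at least `margin` receive pseudo-labels, and the centroids are refitted until no more points are adopted. The classifier and the margin rule are assumptions chosen for brevity.

```python
import math

Point = tuple[float, ...]


def fit_centroids(points: list[Point], labels: list[str]) -> dict[str, list[float]]:
    """Nearest-centroid 'training': average the points of each class."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def self_train(labeled: list[Point], labels: list[str], unlabeled: list[Point],
               margin: float = 1.0, max_rounds: int = 10) -> dict[str, list[float]]:
    """Self-training: adopt confident pseudo-labels, refit, and repeat until nothing changes."""
    X, Y, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        cents = fit_centroids(X, Y)
        remaining = []
        for p in pool:
            ranked = sorted((math.dist(p, c), y) for y, c in cents.items())
            if len(ranked) > 1 and ranked[1][0] - ranked[0][0] >= margin:
                X.append(p)
                Y.append(ranked[0][1])  # confident: accept the pseudo-label
            else:
                remaining.append(p)
        if len(remaining) == len(pool):
            break  # no new pseudo-labels in this round
        pool = remaining
    return fit_centroids(X, Y)
```

With one labeled point per cluster, for example `(1.0, 0.0)` as "a" and `(9.0, 10.0)` as "b", the unlabeled points around (0, 0) and (10, 10) pull the centroids to the true cluster centres, which the labeled points alone could not do.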

UAVs can also be seen as on-the-fly analytics platforms that could potentially provide temporary fog computing analytic services as well as distributed analytics.
Virtual/Augmented Reality: Virtual/augmented reality is another application area that can benefit from both IoT and DL. DL can be used in this field to provide services such as object tracking [182], activity recognition, image classification, and object recognition [183], to name a few. Augmented reality can greatly affect several domains such as education, museums, and smart cars.

VIII. CONCLUSION
DL and IoT have drawn the attention of researchers and commercial verticals in recent years, as these two technology trends have proven to have a positive effect on our lives, cities, and world. IoT and DL constitute a data producer-consumer chain, in which IoT generates raw data that is analyzed by DL models, and DL models produce high-level abstractions and insights that are fed back to the IoT systems for fine-tuning and improvement of services.
In this survey, we reviewed the characteristics of IoT data and its challenges for DL methods. Specifically, we highlighted IoT fast and streaming data as well as IoT big data as the two main categories of IoT data generation, along with their requirements for analytics. We also presented several main DL architectures used in the context of IoT applications, followed by several open source frameworks for developing DL architectures. Reviewing different applications in various IoT sectors that have utilized DL was another part of this survey, in which we identified five foundational services along with eleven application domains. By distinguishing foundational services as well as IoT vertical applications, and reviewing their DL approaches and use cases, the authors provided a basis for other researchers to understand the principal components of IoT smart services and apply the relevant techniques to their problems. The new paradigm of implementing DL on IoT devices was surveyed, and several approaches to achieve it were introduced. DL based on fog and cloud infrastructures to support IoT applications was another part of this survey. We also identified the challenges and future research directions on the path of DL for IoT applications.

LIST OF ACRONYMS
AE Auto-encoder
AI Artificial Intelligence
AMQP Advanced Message Queuing Protocol
ANN Artificial Neural Network
BAC Breast Arterial Calcification
BLE Bluetooth Low Energy
BPTT Backpropagation Through Time
CAE Contractive Auto-encoder
CDR Call Detail Record
CNN Convolutional Neural Network
CoAP Constrained Application Protocol
CPS Cyber-physical System
CRBM Conditional Restricted Boltzmann Machine
DAE Denoising Auto-encoder
DBN Deep Belief Network
DL Deep Learning
DNN Deep Neural Network
DNS-SD DNS Service Discovery
DRL Deep Reinforcement Learning
FDC Fault Detection and Classification
FDI False Data Injection
GAN Generative Adversarial Network
GBM Gradient Boosted Machine
GLM Generalized Linear Model
HMM Hidden Markov Model
HVAC Heating, Ventilation and Air Conditioning
INS Inertial Navigation System
IoT Internet of Things
IPA Intelligent Personal Assistant
ITS Intelligent Transportation System
LSTM Long Short-term Memory
MAPE Mean Absolute Percentage Error
mDNS multicast DNS
MLP Multi-layer Perceptron
MOOC Massive Open Online Courses
MQTT Message Queue Telemetry Transport
RBM Restricted Boltzmann Machine
ReLU Rectified Linear Unit
RNN Recurrent Neural Network
SaaS Software as a Service
SdA Stacked denoising Autoencoder
SGD Stochastic Gradient Descent
SVM Support Vector Machine
SVR Support Vector Regression
TPU Tensor Processing Unit
UAV Unmanned Aerial Vehicle
VAE Variational Auto-encoder
VLC Visible Light Communication
WSN Wireless Sensor Network
XMPP Extensible Messaging and Presence Protocol

REFERENCES

[1] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A survey on enabling technologies, protocols, and applications,” IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.

[2] J. Manyika, M. Chui, J. Bughin, R. Dobbs, P. Bisson, and A. Marrs, Disruptive technologies: Advances that will transform life, business, and the global economy. McKinsey Global Institute San Francisco, CA, 2013, vol. 180.
[3] K. Panetta. (2016) Gartner’s top 10 strategic technology trends for 2017. [Online]. Available: http://www.gartner.com/smarterwithgartner/gartners-top-10-technology-trends-2017/
[4] M. Mohammadi and A. Al-Fuqaha, “Enabling cognitive smart cities using big data and machine learning: Approaches and challenges,” IEEE Communications Magazine, vol. PP, no. 99, pp. 1–8, 2017.
[5] M. Chen, S. Mao, Y. Zhang, and V. C. Leung, Big data: related technologies, challenges and future prospects. Springer, 2014.
[6] X.-W. Chen and X. Lin, “Big data deep learning: challenges and perspectives,” IEEE Access, vol. 2, pp. 514–525, 2014.
[7] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica, “Fast and interactive analytics over hadoop data with spark,” USENIX Login, vol. 37, no. 4, pp. 45–51, 2012.
[8] C. Engle, A. Lupher, R. Xin, M. Zaharia, M. J. Franklin, S. Shenker, and I. Stoica, “Shark: fast data analysis using coarse-grained distributed memory,” in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 2012, pp. 689–692.
[9] C.-W. Tsai, C.-F. Lai, M.-C. Chiang, L. T. Yang et al., “Data mining for internet of things: A survey,” IEEE Communications Surveys and Tutorials, vol. 16, no. 1, pp. 77–97, 2014.
[10] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, “Context aware computing for the internet of things: A survey,” IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 414–454, 2014.
[11] M. A. Alsheikh, S. Lin, D. Niyato, and H.-P. Tan, “Machine learning in wireless sensor networks: Algorithms, strategies, and applications,” IEEE Communications Surveys & Tutorials, vol. 16, no. 4, pp. 1996–2018, 2014.
[12] Z. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “State-of-the-art deep learning: Evolving machine intelligence toward tomorrow’s intelligent network traffic control systems,” IEEE Communications Surveys & Tutorials, vol. PP, no. 99, 2017.
[13] B. Li, Y. Diao, and P. Shenoy, “Supporting scalable analytics with latency constraints,” Proceedings of the VLDB Endowment, vol. 8, no. 11, pp. 1166–1177, 2015.
[14] M. Hilbert, “Big data for development: a review of promises and challenges,” Development Policy Review, vol. 34, no. 1, pp. 135–174, 2016.
[15] W. Fan and A. Bifet, “Mining big data: current status, and forecast to the future,” ACM SIGKDD Explorations Newsletter, vol. 14, no. 2, pp. 1–5, 2013.
[16] H. Hu, Y. Wen, T.-S. Chua, and X. Li, “Toward scalable systems for big data analytics: A technology tutorial,” IEEE Access, vol. 2, pp. 652–687, 2014.
[17] Y. Demchenko, P. Grosso, C. De Laat, and P. Membrey, “Addressing big data issues in scientific data infrastructure,” in Collaboration Technologies and Systems (CTS), 2013 International Conference on. IEEE, 2013, pp. 48–55.
[18] M. Strohbach, H. Ziekow, V. Gazis, and N. Akiva, “Towards a big data analytics framework for iot and smart city applications,” in Modeling and processing for next-generation big-data technologies. Springer, 2015, pp. 257–282.
[19] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
[20] L. Deng, “A tutorial survey of architectures, algorithms, and applications for deep learning,” APSIPA Transactions on Signal and Information Processing, vol. 3, pp. 1–29, 2014.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
[22] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct deep recurrent neural networks,” arXiv preprint arXiv:1312.6026v5 [cs.NE], 2013.
[23] M. Hermans and B. Schrauwen, “Training and analysing deep recurrent neural networks,” in Advances in neural information processing systems, 2013, pp. 190–198.
[24] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[25] P. Baldi, “Autoencoders, unsupervised learning, and deep architectures,” ICML unsupervised and transfer learning, vol. 27, pp. 37–50, 2012.
[26] C. Doersch, “Tutorial on variational autoencoders,” arXiv preprint arXiv:1606.05908v2 [stat.ML], 2016.
[27] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, “Semi-supervised learning with deep generative models,” in Advances in Neural Information Processing Systems, 2014, pp. 3581–3589.
[28] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[29] Y. Bengio et al., “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[30] H. Valpola, “From neural pca to deep unsupervised learning,” Advances in Independent Component Analysis and Learning Machines, pp. 143–171, 2015.
[31] A. Rasmus, M. Berglund, M. Honkala, H. Valpola, and T. Raiko, “Semi-supervised learning with ladder networks,” in Advances in Neural Information Processing Systems, 2015, pp. 3546–3554.
[32] N.-Y. Liang, G.-B. Huang, P. Saratchandran, and N. Sundararajan, “A fast and accurate online sequential learning algorithm for feedforward networks,” IEEE Transactions on neural networks, vol. 17, no. 6, pp. 1411–1423, 2006.
[33] Z. Yang, P. Zhang, and L. Chen, “RFID-enabled indoor positioning method for a real-time manufacturing execution system using OS-ELM,” Neurocomputing, vol. 174, pp. 121–133, 2016.
[34] H. Zou, H. Jiang, X. Lu, and L. Xie, “An online sequential extreme learning machine approach to wifi based indoor positioning,” in Internet of Things (WF-IoT), 2014 IEEE World Forum on. IEEE, 2014, pp. 111–116.
[35] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[36] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448.
[37] H. Mao, S. Yao, T. Tang, B. Li, J. Yao, and Y. Wang, “Towards real-time object detection on embedded systems,” IEEE Transactions on Emerging Topics in Computing, vol. PP, no. 99, pp. 1–15, 2017.
[38] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[39] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602v1 [cs.LG], 2013.
[40] M. Mohammadi, A. Al-Fuqaha, M. Guizani, and J.-S. Oh, “Semi-supervised deep reinforcement learning in support of IoT and smart city services,” IEEE Internet of Things Journal, vol. PP, no. 99, pp. 1–12, 2017.
[41] Y. Bengio et al., “Deep learning of representations for unsupervised and transfer learning,” ICML Unsupervised and Transfer Learning, vol. 27, pp. 17–36, 2012.
[42] J. Deng, R. Xia, Z. Zhang, Y. Liu, and B. Schuller, “Introducing shared-hidden-layer autoencoders for transfer learning and their application in acoustic emotion recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 4818–4822.
[43] P. Wu, S. C. Hoi, H. Xia, P. Zhao, D. Wang, and C. Miao, “Online multimodal deep similarity learning with application to image retrieval,” in Proceedings of the 21st ACM international conference on Multimedia. ACM, 2013, pp. 153–162.
[44] P. Jaini, A. Rashwan, H. Zhao, Y. Liu, E. Banijamali, Z. Chen, and P. Poupart, “Online algorithms for sum-product networks with continuous variables,” in Proceedings of the Eighth International Conference on Probabilistic Graphical Models, 2016, pp. 228–239.
[45] G. Chen, R. Xu, and S. N. Srihari, “Sequential labeling with online deep learning: Exploring model initialization,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2016, pp. 772–788.
[46] S. Bahrampour, N. Ramakrishnan, L. Schott, and M. Shah, “Comparative study of deep learning software frameworks,” arXiv preprint arXiv:1511.06435v3 [cs.LG], 2016.
[47] A. Candel, V. Parmar, E. LeDell, and A. Arora, “Deep learning with h2o,” 2015.
[48] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467v2 [cs.DC], 2016.

[49] R. Collobert, K. Kavukcuoglu, and C. Farabet, “Torch7: A matlab-like environment for machine learning,” in BigLearn, NIPS Workshop, no. EPFL-CONF-192376, 2011.
[50] F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. Goodfellow, A. Bergeron, N. Bouchard, D. Warde-Farley, and Y. Bengio, “Theano: new features and speed improvements,” arXiv preprint arXiv:1211.5590v1 [cs.SC], 2012.
[51] S. Raschka and V. Mirjalili, Python Machine Learning, 2nd ed. Birmingham, UK: Packt Publishing, 2017.
[52] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014, pp. 675–678.
[53] S. Shi, Q. Wang, P. Xu, and X. Chu, “Benchmarking state-of-the-art deep learning software tools,” arXiv preprint arXiv:1608.07249v7 [cs.DC], 2016.
[54] R. Mehmood, F. Alam, N. N. Albogami, I. Katib, A. Albeshri, and S. M. Altowaijri, “Utilearn: A personalised ubiquitous teaching and learning system for smart societies,” IEEE Access, vol. 5, pp. 2615–2635, 2017.
[55] A. Luckow, M. Cook, N. Ashcraft, E. Weill, E. Djerekarov, and B. Vorster, “Deep learning in the automotive industry: Applications and tools,” in Big Data (Big Data), 2016 IEEE International Conference on. IEEE, 2016, pp. 3759–3768.
[56] X. Ma, H. Yu, Y. Wang, and Y. Wang, “Large-scale transportation network congestion evolution prediction using deep learning theory,” PloS one, vol. 10, no. 3, p. e0119044, 2015.
[57] N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, and F. Kawsar, “An early resource characterization of deep learning on wearables, smartphones and internet-of-things devices,” in Proceedings of the 2015 International Workshop on Internet of Things towards Applications. ACM, 2015, pp. 7–12.
[58] G. Mittal, K. B. Yagnik, M. Garg, and N. C. Krishnan, “Spotgarbage: smartphone app to detect garbage using deep learning,” in Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 2016, pp. 940–945.
[59] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, Y. Ma, S. Chen, and P. Hou, “A new deep learning-based food recognition system for dietary assessment on an edge computing service infrastructure,” IEEE Transactions on Services Computing, 2017.
[60] S. Sladojevic, M. Arsenovic, A. Anderla, D. Culibrk, and D. Stefanovic, “Deep neural networks based recognition of plant diseases by leaf image classification,” Computational Intelligence and Neuroscience, vol. 2016, 2016.
[61] Y. Liu, E. Racah, J. Correa, A. Khosrowshahi, D. Lavers, K. Kunkel, M. Wehner, and W. Collins, “Application of deep convolutional neural networks for detecting extreme weather in climate datasets,” Int’l Conf. on Advances in Big Data Analytics, 2016.
[62] S. Tokui, K. Oono, S. Hido, and J. Clayton, “Chainer: a next-generation open source framework for deep learning,” in Proceedings of workshop on machine learning systems (LearningSys) in the twenty-ninth annual conference on neural information processing systems (NIPS), 2015.
[69] Y. Gu, Y. Chen, J. Liu, and X. Jiang, “Semi-supervised deep extreme learning machine for wi-fi based localization,” Neurocomputing, vol. 166, pp. 282–293, 2015.
[70] W. Zhang, K. Liu, W. Zhang, Y. Zhang, and J. Gu, “Deep neural networks for wireless localization in indoor and outdoor environments,” Neurocomputing, vol. 194, pp. 279–287, 2016.
[71] Z. Liu, L. Zhang, Q. Liu, Y. Yin, L. Cheng, and R. Zimmermann, “Fusion of magnetic and visual sensors for indoor localization: Infrastructure-free and more effective,” IEEE Transactions on Multimedia, vol. 19, no. 4, pp. 874–888, 2017.
[72] M. Becker, “Indoor positioning solely based on user’s sight,” in International Conference on Information Science and Applications. Springer, 2017, pp. 76–83.
[73] W. Lu, J. Zhang, X. Zhao, J. Wang, and J. Dang, “Multimodal sensory fusion for soccer robot self-localization based on long short-term memory recurrent neural network,” Journal of Ambient Intelligence and Humanized Computing, pp. 1–9, 2017.
[74] A. Toshev and C. Szegedy, “Deeppose: Human pose estimation via deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1653–1660.
[75] J. Liu, Y. Gu, and S. Kamijo, “Joint customer pose and orientation estimation using deep neural network from surveillance camera,” in Multimedia (ISM), 2016 IEEE International Symposium on. IEEE, 2016, pp. 216–221.
[76] F. J. Ordóñez and D. Roggen, “Deep convolutional and lstm recurrent neural networks for multimodal wearable activity recognition,” Sensors, vol. 16, no. 1, p. 115, 2016.
[77] D. Tao, Y. Wen, and R. Hong, “Multi-column bi-directional long short-term memory for mobile devices-based human activity recognition,” IEEE Internet of Things Journal, 2016.
[78] X. Li, Y. Zhang, I. Marsic, A. Sarcevic, and R. S. Burd, “Deep learning for rfid-based activity recognition,” in Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems. ACM, 2016, pp. 164–175.
[79] L. Pigou, A. Van Den Oord, S. Dieleman, M. Van Herreweghe, and J. Dambre, “Beyond temporal pooling: Recurrence and temporal convolutions for gesture recognition in video,” International Journal of Computer Vision, pp. 1–10, 2015.
[80] K. Fragkiadaki, S. Levine, P. Felsen, and J. Malik, “Recurrent network models for human dynamics,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4346–4354.
[81] S. E. Kahou, X. Bouthillier, P. Lamblin, C. Gulcehre, V. Michalski, K. Konda, S. Jean, P. Froumenty, Y. Dauphin, N. Boulanger-Lewandowski et al., “Emonets: Multimodal deep learning approaches for emotion recognition in video,” Journal on Multimodal User Interfaces, vol. 10, no. 2, pp. 99–111, 2016.
[82] N. Neverova, C. Wolf, G. Lacey, L. Fridman, D. Chandra, B. Barbello, and G. Taylor, “Learning human identity from motion patterns,” IEEE Access, vol. 4, pp. 1810–1820, 2016.
[83] Y. He, G. J. Mendis, and J. Wei, “Real-time detection of false data injection attacks in smart grid: A deep learning-based intelligent mechanism,” IEEE Transactions on Smart Grid, 2017.
[84] M.-J. Kang and J.-W. Kang, “Intrusion detection system using deep
neural network for in-vehicle network security,” PloS one, vol. 11,
[63] Y. Hada-Muranushi, T. Muranushi, A. Asai, D. Okanohara, R. Ray-
no. 6, p. e0155781, 2016.
mond, G. Watanabe, S. Nemoto, and K. Shibata, “A deep-learning
[85] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue, “Droid-sec: deep learning in
approach for operation of an automated realtime flare forecast,” SPACE
android malware detection,” in ACM SIGCOMM Computer Communi-
WEATHER, 2016.
cation Review, vol. 44, no. 4. ACM, 2014, pp. 371–372.
[64] J. A. C. Soto, M. Jentsch, D. Preuveneers, and E. Ilie-Zudor, “Ceml: [86] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in
Mixing and moving complex event processing and machine learning Proceedings of the 22nd ACM SIGSAC conference on computer and
to the edge of the network for iot applications,” in Proceedings of the communications security. ACM, 2015, pp. 1310–1321.
6th International Conference on the Internet of Things. ACM, 2016, [87] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov,
pp. 103–110. K. Talwar, and L. Zhang, “Deep learning with differential privacy,”
[65] M. Borkowski, S. Schulte, and C. Hochreiner, “Predicting cloud re- in Proceedings of the 2016 ACM SIGSAC Conference on Computer
source utilization,” in Proceedings of the 9th International Conference and Communications Security. ACM, 2016, pp. 308–318.
on Utility and Cloud Computing. ACM, 2016, pp. 37–42. [88] T. J. Hazen. (2016) Microsoft and liebherr collaborating on
[66] H. Larry. (2017) Voice control everywhere: Low-power special- new generation of smart refrigerators. [Online]. Available: http:
purpose chip could make speech recognition ubiquitous //blogs.technet.microsoft.com/machinelearning/2016/09/02/
in electronics. [Online]. Available: http://news.mit.edu/2017/ [89] M. Manic, K. Amarasinghe, J. J. Rodriguez-Andina, and C. Rieger,
low-power-chip-speech-recognition-electronics-0213 “Intelligent buildings of the future: Cyberaware, deep learning powered,
[67] M. Price, J. Glass, and A. Chandrakasan, “A scalable speech recognizer and human interacting,” IEEE Industrial Electronics Magazine, vol. 10,
with deep-neural-network acoustic models and voice-activated power no. 4, pp. 32–49, 2016.
gating,” in Proceedings of the IEEE ISSCC2017, 2017. [90] P. Feng, M. Yu, S. M. Naqvi, and J. A. Chambers, “Deep learning for
[68] X. Wang, L. Gao, S. Mao, and S. Pandey, “Deepfi: Deep learning posture analysis in fall detection,” in Digital Signal Processing (DSP),
for indoor fingerprinting using channel state information,” in 2015 2014 19th International Conference on. IEEE, 2014, pp. 12–17.
IEEE Wireless Communications and Networking Conference (WCNC). [91] Toshiba:Press-Release. (2016) Toshiba and dell technologies’ deep
IEEE, 2015, pp. 1666–1671. learning testbed for iot is first approved by industrial internet
32

Consortium. [Online]. Available: https://www.toshiba.co.jp/about/press/2016_10/pr1702.htm
[92] X. Song, H. Kanasugi, and R. Shibasaki, “DeepTransport: Prediction and simulation of human mobility and transportation mode at a citywide level,” IJCAI, 2016.
[93] V. C. Liang, R. T. Ma, W. S. Ng, L. Wang, M. Winslett, H. Wu, S. Ying, and Z. Zhang, “Mercury: Metro density prediction with recurrent neural network on streaming CDR data,” in Data Engineering (ICDE), 2016 IEEE 32nd International Conference on. IEEE, 2016, pp. 1374–1377.
[94] G. Amato, F. Carrara, F. Falchi, C. Gennaro, C. Meghini, and C. Vairo, “Deep learning for decentralized parking lot occupancy detection,” Expert Systems with Applications, 2017.
[95] S. Valipour, M. Siam, E. Stroulia, and M. Jagersand, “Parking-stall vacancy indicator system, based on deep convolutional neural networks,” in 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT), 2016, pp. 655–660.
[96] D. C. Mocanu, E. Mocanu, P. H. Nguyen, M. Gibescu, and A. Liotta, “Big IoT data mining for real-time energy disaggregation in buildings,” in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2016, pp. 9–12.
[97] E. Mocanu, P. H. Nguyen, M. Gibescu, and W. L. Kling, “Deep learning for estimating building energy consumption,” Sustainable Energy, Grids and Networks, vol. 6, pp. 91–99, 2016.
[98] A. Gensler, J. Henze, B. Sick, and N. Raabe, “Deep learning for solar power forecasting: an approach using autoencoder and LSTM neural networks,” in Systems, Man, and Cybernetics (SMC), 2016 IEEE International Conference on. IEEE, 2016, pp. 2858–2865.
[99] Y. Tian and L. Pan, “Predicting short-term traffic flow by long short-term memory recurrent neural network,” in Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on. IEEE, 2015, pp. 153–158.
[100] D. Cireşan, U. Meier, J. Masci, and J. Schmidhuber, “Multi-column deep neural network for traffic sign classification,” Neural Networks, vol. 32, pp. 333–338, 2012.
[101] K. Lim, Y. Hong, Y. Choi, and H. Byun, “Real-time traffic sign recognition based on a general purpose GPU and deep-learning,” PLoS ONE, vol. 12, no. 3, p. e0173317, 2017.
[102] E. Ackerman and A. Pagano, “Deep-learning first: Drive.ai’s path to autonomous driving,” IEEE Spectrum, 2017.
[103] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, and Y. Ma, “DeepFood: Deep learning-based food image recognition for computer-aided dietary assessment,” in International Conference on Smart Homes and Health Telematics. Springer, 2016, pp. 37–48.
[104] C. R. Pereira, D. R. Pereira, J. P. Papa, G. H. Rosa, and X.-S. Yang, “Convolutional neural networks applied for Parkinson’s disease identification,” in Machine Learning for Health Informatics. Springer, 2016, pp. 377–390.
[105] G. Muhammad, S. M. M. Rahman, A. Alelaiwi, and A. Alamri, “Smart health solution integrating IoT and cloud: A case study of voice pathology monitoring,” IEEE Communications Magazine, vol. 55, no. 1, pp. 69–73, 2017.
[106] J. Wang, H. Ding, F. Azamian, B. Zhou, C. Iribarren, S. Molloi, and P. Baldi, “Detecting cardiovascular disease from mammograms with deep learning,” IEEE Transactions on Medical Imaging, 2017.
[107] Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzell, “Learning to diagnose with LSTM recurrent neural networks,” in ICLR 2016, 2016.
[108] D. Ravì, C. Wong, F. Deligianni, M. Berthelot, J. Andreu-Perez, B. Lo, and G.-Z. Yang, “Deep learning for health informatics,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 4–21, 2017.
[109] N. Kussul, M. Lavreniuk, S. Skakun, and A. Shelestov, “Deep learning classification of land cover and crop types using remote sensing data,” IEEE Geoscience and Remote Sensing Letters, 2017.
[110] K. Kuwata and R. Shibasaki, “Estimating crop yields with deep learning and remotely sensed data,” in Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International. IEEE, 2015, pp. 858–861.
[111] G. J. Scott, M. R. England, W. A. Starms, R. A. Marcum, and C. H. Davis, “Training deep convolutional neural networks for land-cover classification of high-resolution imagery,” IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 4, pp. 549–553, 2017.
[112] K. A. Steen, P. Christiansen, H. Karstoft, and R. N. Jørgensen, “Using deep learning to challenge safety standard for highly autonomous machines in agriculture,” Journal of Imaging, vol. 2, no. 1, p. 6, 2016.
[113] I. Sa, Z. Ge, F. Dayoub, B. Upcroft, T. Perez, and C. McCool, “DeepFruits: A fruit detection system using deep neural networks,” Sensors, vol. 16, no. 8, p. 1222, 2016.
[114] M. B. Ibáñez, Á. Di Serio, D. Villarán, and C. D. Kloos, “Experimenting with electromagnetism using augmented reality: Impact on flow student experience and educational effectiveness,” Computers & Education, vol. 71, pp. 1–13, 2014.
[115] L.-f. Kwok, “A vision for the development of i-campus,” Smart Learning Environments, vol. 2, pp. 1–12, 2015.
[116] H. Wang, N. Wang, and D.-Y. Yeung, “Collaborative deep learning for recommender systems,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 1235–1244.
[117] T.-Y. Yang, C. G. Brinton, C. Joe-Wong, and M. Chiang, “Behavior-based grade prediction for MOOCs via time series neural networks,” IEEE Journal of Selected Topics in Signal Processing, 2017.
[118] C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl-Dickstein, “Deep knowledge tracing,” in Advances in Neural Information Processing Systems, 2015, pp. 505–513.
[119] F. Conti, A. Pullini, and L. Benini, “Brain-inspired classroom occupancy monitoring on a low-power mobile platform,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 610–615.
[120] H. Shao, H. Jiang, F. Wang, and H. Zhao, “An enhancement deep feature fusion method for rotating machinery fault diagnosis,” Knowledge-Based Systems, vol. 119, pp. 200–220, 2017.
[121] H. Lee, “Framework and development of fault detection classification using IoT device and cloud environment,” Journal of Manufacturing Systems, 2017.
[122] H. Lee, Y. Kim, and C. O. Kim, “A deep learning model for robust wafer fault monitoring with sensor measurement noise,” IEEE Transactions on Semiconductor Manufacturing, vol. 30, no. 1, pp. 23–31, 2017.
[123] W. Yan and L. Yu, “On accurate and reliable anomaly detection for gas turbine combustors: A deep learning approach,” in Proceedings of the Annual Conference of the Prognostics and Health Management Society, 2015.
[124] Y. Liu and L. Wu, “Geological disaster recognition on optical remote sensing images using deep learning,” Procedia Computer Science, vol. 91, pp. 566–575, 2016.
[125] Q. Wang, Y. Guo, L. Yu, and P. Li, “Earthquake prediction based on spatio-temporal data mining: An LSTM network approach,” IEEE Transactions on Emerging Topics in Computing, 2017.
[126] H. Maeda, Y. Sekimoto, and T. Seto, “Lightweight road manager: smartphone-based automatic determination of road damage status by deep neural network,” in Proceedings of the 5th ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems. ACM, 2016, pp. 37–45.
[127] L. Steinberg. (2015) Forbes - Changing the game: The rise of sports analytics. [Online]. Available: https://www.forbes.com/sites/leighsteinberg/2015/08/18/changing-the-game-the-rise-of-sports-analytics
[128] W. Liu, J. Liu, X. Gu, K. Liu, X. Dai, and H. Ma, “Deep learning based intelligent basketball arena with energy image,” in International Conference on Multimedia Modeling. Springer, 2017, pp. 601–613.
[129] K.-C. Wang and R. Zemel, “Classifying NBA offensive plays using neural networks,” in Proc. MIT Sloan Sports Analytics Conf., 2016.
[130] R. Shah and R. Romijnders, “Applying deep learning to basketball trajectories,” in ACM KDD’16, 2016.
[131] T. Kautz, B. H. Groh, J. Hannink, U. Jensen, H. Strubberg, and B. M. Eskofier, “Activity recognition in beach volleyball using a deep convolutional neural network,” Data Mining and Knowledge Discovery, pp. 1–28, 2017.
[132] M. S. Ibrahim, S. Muralidharan, Z. Deng, A. Vahdat, and G. Mori, “A hierarchical deep temporal model for group activity recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1971–1980.
[133] S. Bell and K. Bala, “Learning visual similarity for product design with convolutional neural networks,” ACM Transactions on Graphics (TOG), vol. 34, no. 4, p. 98, 2015.
[134] L. Xiao and X. Yichao, “Exact clothing retrieval approach based on deep neural network,” in Information Technology, Networking, Electronic and Automation Control Conference, IEEE. IEEE, 2016, pp. 396–400.
[135] M. Hadi Kiapour, X. Han, S. Lazebnik, A. C. Berg, and T. L. Berg, “Where to buy it: Matching street clothing photos in online shops,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3343–3351.
[136] S. Advani, P. Zientara, N. Shukla, I. Okafor, K. Irick, J. Sampson, S. Datta, and V. Narayanan, “A multitask grocery assist system for the
visually impaired: Smart glasses, gloves, and shopping carts provide auditory and tactile feedback,” IEEE Consumer Electronics Magazine, vol. 6, no. 1, pp. 73–81, 2017.
[137] B. Singh, T. K. Marks, M. Jones, O. Tuzel, and M. Shao, “A multi-stream bi-directional recurrent neural network for fine-grained action detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1961–1970.
[138] M. Denil, B. Shakibi, L. Dinh, N. de Freitas et al., “Predicting parameters in deep learning,” in Advances in Neural Information Processing Systems, 2013, pp. 2148–2156.
[139] J. Ba and R. Caruana, “Do deep nets really need to be deep?” in Advances in Neural Information Processing Systems, 2014, pp. 2654–2662.
[140] Y. LeCun, J. S. Denker, S. A. Solla, R. E. Howard, and L. D. Jackel, “Optimal brain damage,” in NIPS, vol. 2, 1989, pp. 598–605.
[141] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in Advances in Neural Information Processing Systems, 2015, pp. 1135–1143.
[142] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, “EIE: efficient inference engine on compressed deep neural network,” arXiv preprint arXiv:1602.01528v2 [cs.CV], 2016.
[143] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, “Compressing neural networks with the hashing trick,” in Proceedings of the 32nd International Conference on Machine Learning, vol. 37. JMLR: W&CP, 2015.
[144] M. Courbariaux and Y. Bengio, “Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1,” arXiv preprint arXiv:1602.02830v3 [cs.LG], 2016.
[145] S. Venkataramani, A. Ranjan, K. Roy, and A. Raghunathan, “AxNN: energy-efficient neuromorphic systems using approximate computing,” in Proceedings of the 2014 International Symposium on Low Power Electronics and Design. ACM, 2014, pp. 27–32.
[146] B. Moons, B. De Brabandere, L. Van Gool, and M. Verhelst, “Energy-efficient convnets through approximate computing,” in Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on. IEEE, 2016, pp. 1–8.
[147] S. G. Ramasubramanian, R. Venkatesan, M. Sharad, K. Roy, and A. Raghunathan, “Spindle: Spintronic deep learning engine for large-scale neuromorphic computing,” in Proceedings of the 2014 International Symposium on Low Power Electronics and Design. ACM, 2014, pp. 15–20.
[148] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, 2017.
[149] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” in ACM SIGPLAN Notices, vol. 49, no. 4. ACM, 2014, pp. 269–284.
[150] N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, L. Jiao, L. Qendro, and F. Kawsar, “DeepX: A software accelerator for low-power deep learning inference on mobile devices,” in Information Processing in Sensor Networks (IPSN), 2016 15th ACM/IEEE International Conference on. IEEE, 2016, pp. 1–12.
[151] S. Bang, J. Wang, Z. Li, C. Gao, Y. Kim, Q. Dong, Y.-P. Chen, L. Fick, X. Sun, R. Dreslinski et al., “14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence,” in Solid-State Circuits Conference (ISSCC), 2017 IEEE International. IEEE, 2017, pp. 250–251.
[152] X. Xu, S. Das, and K. Kreutz-Delgado, “ApproxDBN: Approximate Computing for Discriminative Deep Belief Networks,” arXiv preprint arXiv:1704.03993v3 [cs.NE], 2017.
[153] S. Venkataramani, K. Roy, and A. Raghunathan, “Efficient embedded learning for IoT devices,” in 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2016, pp. 308–311.
[154] D. D. Awschalom and M. E. Flatté, “Challenges for semiconductor spintronics,” Nature Physics, vol. 3, no. 3, pp. 153–159, 2007.
[155] K. Bourzac, “Speck-size computers: Now with deep learning [news],” IEEE Spectrum, vol. 54, no. 4, pp. 13–15, 2017.
[156] B. Moons and M. Verhelst, “A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale convnets,” in VLSI Circuits (VLSI-Circuits), 2016 IEEE Symposium on. IEEE, 2016, pp. 1–2.
[157] D. Ravì, C. Wong, B. Lo, and G.-Z. Yang, “A deep learning approach to on-node sensor data analytics for mobile or wearable devices,” IEEE Journal of Biomedical and Health Informatics, 2016.
[158] N. Nguyen-Thanh, L. Yang, D. H. Nguyen, C. Jabbour, B. Murmann et al., “Cognitive computation and communication: A complement solution to cloud for IoT,” in Advanced Technologies for Communications (ATC), 2016 International Conference on. IEEE, 2016, pp. 222–230.
[159] B. Tang, Z. Chen, G. Hefferman, S. Pei, W. Tao, H. He, and Q. Yang, “Incorporating intelligence in fog computing for big data analysis in smart cities,” IEEE Transactions on Industrial Informatics, 2017.
[160] A. Al-Fuqaha, A. Khreishah, M. Guizani, A. Rayes, and M. Mohammadi, “Toward better horizontal integration among IoT services,” IEEE Communications Magazine, vol. 53, no. 9, pp. 72–79, 2015.
[161] A. Coates, B. Huval, T. Wang, D. J. Wu, A. Y. Ng, and B. Catanzaro, “Deep learning with COTS HPC systems,” in Proceedings of the 30th International Conference on Machine Learning, vol. 28. JMLR: W&CP, 2013.
[162] T. M. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman, “Project Adam: Building an efficient and scalable deep learning training system,” in 11th USENIX Symposium on Operating Systems Design and Implementation, vol. 14, 2014, pp. 571–582.
[163] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers et al., “In-datacenter performance analysis of a tensor processing unit,” in 44th International Symposium on Computer Architecture (ISCA), 2017.
[164] K. Daniel, “Lessons learned from deploying deep learning at scale,” O’Reilly Artificial Intelligence conference, 2016. [Online]. Available: http://conferences.oreilly.com/artificial-intelligence/ai-ny-2016/public/schedule/detail/54098
[165] N. Hemsoth, “GPU Platforms Set to Lengthen Deep Learning Reach,” The Next Platform, 2015. [Online]. Available: http://www.nextplatform.com/2015/12/07/gpu-platforms-emerge-for-longer-deep-learning-reach/
[166] Y. Simmhan and S. Perera, “Big data analytics platforms for real-time applications in IoT,” in Big Data Analytics. Springer, 2016, pp. 115–135.
[167] S. B. Qaisar and M. Usman, “Fog networking for machine health prognosis: A deep learning perspective,” in International Conference on Computational Science and Its Applications. Springer, 2017, pp. 212–219.
[168] D. Li, T. Salonidis, N. V. Desai, and M. C. Chuah, “DeepCham: Collaborative edge-mediated adaptive deep learning for mobile object recognition,” in Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 2016, pp. 64–76.
[169] Wikipedia. (2017) List of datasets for machine learning research. [Online]. Available: https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research
[170] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Security and Privacy (EuroS&P), 2016 IEEE European Symposium on. IEEE, 2016, pp. 372–387.
[171] M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic, “Deep learning applications and challenges in big data analytics,” Journal of Big Data, vol. 2, no. 1, p. 1, 2015.
[172] S. LaValle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, “Big data, analytics and the path from insights to value,” MIT Sloan Management Review, vol. 52, no. 2, p. 21, 2011.
[173] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 427–436.
[174] X. Qiu, L. Zhang, Y. Ren, P. N. Suganthan, and G. Amaratunga, “Ensemble deep learning for regression and time series forecasting,” in Computational Intelligence in Ensemble Learning (CIEL), 2014 IEEE Symposium on. IEEE, 2014, pp. 1–6.
[175] M. A. Alsheikh, D. Niyato, S. Lin, H.-P. Tan, and Z. Han, “Mobile big data analytics using deep learning and Apache Spark,” IEEE Network, vol. 30, no. 3, pp. 22–29, 2016.
[176] A. Gharaibeh, A. Khreishah, M. Mohammadi, A. Al-Fuqaha, I. Khalil, and A. Rayes, “Online auction of cloud resources in support of the internet of things,” IEEE Internet of Things Journal, vol. PP, no. 99, pp. 1–14, 2017.
[177] J. Gantz and D. Reinsel, “The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the Far East,” IDC iView: IDC Analyze the future, vol. 2007, no. 2012, pp. 1–16, 2012.
[178] National Science Foundation. (2017) Cyber-Physical Systems (CPS) - Program Solicitation (NSF 17-529). [Online]. Available: https://www.nsf.gov/pubs/2017/nsf17529/nsf17529.pdf
[179] P. Valente Klaine, M. A. Imran, O. Onireti, and R. D. Souza, “A survey of machine learning techniques applied to self organizing cellular networks,” IEEE Communications Surveys and Tutorials, vol. PP, 2017.
[180] J. Lee, J. Wang, D. Crandall, S. Šabanović, and G. Fox, “Real-time, cloud-based object detection for unmanned aerial vehicles,” in Robotic Computing (IRC), IEEE International Conference on. IEEE, 2017, pp. 36–43.
[181] L. Tai, S. Li, and M. Liu, “A deep-network solution towards model-less obstacle avoidance,” in Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016, pp. 2759–2764.
[182] O. Akgul, H. I. Penekli, and Y. Genc, “Applying deep learning in augmented reality tracking,” in Signal-Image Technology & Internet-Based Systems (SITIS), 2016 12th International Conference on. IEEE, 2016, pp. 47–54.
[183] R. E. Sutanto, L. Pribadi, and S. Lee, “3D integral imaging based augmented reality with deep learning implemented by faster R-CNN,” in International Conference on Mobile and Wireless Technology. Springer, 2017, pp. 241–247.

Mehdi Mohammadi (S’14) is a Ph.D. candidate in the Department of Computer Science, Western Michigan University (WMU), Kalamazoo, MI, USA. He received his B.S. degree in Computer Engineering from Kharazmi University, Tehran, Iran, in 2003 and his M.S. degree in Computer Engineering (Software) from Sheikhbahaee University, Isfahan, Iran, in 2010. His research interests include Internet of Things, IoT data analytics, Cloud Computing, and Machine Learning. He has served as a reviewer for several journals, including IEEE Communications Magazine, IEEE Communications Letters, Wiley’s Security and Wireless Communication Networks journal, and Wiley’s Wireless Communications and Mobile Computing journal. He has held a Graduate Doctoral Assistantship with WMU Libraries Information Technology since 2013. He was the recipient of six travel grants from the National Science Foundation (NSF).

Ala Al-Fuqaha (S’00-M’04-SM’09) received his M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Missouri-Columbia and the University of Missouri-Kansas City, in 1999 and 2004, respectively. Currently, he is a Professor and director of the NEST Research Lab at the Computer Science Department of Western Michigan University. His research interests include smart services in support of the Internet of Things, Wireless Vehicular Networks (VANETs), cooperation and spectrum access etiquettes in cognitive radio networks, management and planning of software defined networks (SDN), intelligent services for the blind and the visually impaired, QoS routing in optical and wireless networks, and performance analysis and evaluation of high-speed computer and telecommunication networks. He is a senior member of the IEEE and has served as a Technical Program Committee member and reviewer for many international conferences and journals.

Sameh Sorour (S’98, M’11, SM’16) is an Assistant Professor at the Department of Electrical and Computer Engineering, University of Idaho. He received his B.Sc. and M.Sc. degrees in Electrical Engineering from Alexandria University, Egypt, in 2002 and 2006, respectively. In 2011, he obtained his Ph.D. degree in Electrical and Computer Engineering from the University of Toronto, Canada. After two postdoctoral fellowships at the University of Toronto and King Abdullah University of Science and Technology (KAUST), he joined King Fahd University of Petroleum and Minerals (KFUPM) in 2013 before moving to the University of Idaho in 2016. His research interests lie in the broad area of advanced communications/networking/computing/learning technologies for smart city applications, including cyber-physical systems, Internet of Things (IoT) and IoT-enabled systems, cloud and fog networking, network coding, device-to-device networking, autonomous driving and autonomous systems, intelligent transportation systems, and mathematical modelling and optimization for smart systems.

Mohsen Guizani (S’85, M’89, SM’99, F’09) received the B.S. (with distinction) and M.S. degrees in electrical engineering, and the M.S. and Ph.D. degrees in computer engineering, from Syracuse University, Syracuse, NY, USA, in 1984, 1986, 1987, and 1990, respectively. He is currently a Professor and the ECE Department Chair at the University of Idaho, USA. Previously, he served as the Associate Vice President of Graduate Studies, Qatar University, Chair of the Computer Science Department, Western Michigan University, and Chair of the Computer Science Department, University of West Florida. He also served in academic positions at the University of Missouri-Kansas City, University of Colorado-Boulder, Syracuse University, and Kuwait University. His research interests include wireless communications and mobile computing, computer networks, mobile cloud computing, security, and smart grid. He currently serves on the editorial boards of several international technical journals and is the Founder and Editor-in-Chief of the Wireless Communications and Mobile Computing journal (Wiley). He is the author of nine books and more than 450 publications in refereed journals and conferences. He guest edited a number of special issues in IEEE journals and magazines. He also served as a member, Chair, and General Chair of a number of international conferences. He received the teaching award multiple times from different institutions as well as the Best Research Award from three institutions. He was the Chair of the IEEE Communications Society Wireless Technical Committee and the Chair of the TAOS Technical Committee. He served as the IEEE Computer Society Distinguished Speaker from 2003 to 2005. He is a Fellow of IEEE and a Senior Member of ACM.
