

Deep Learning in Mobile and Wireless Networking: A Survey

Chaoyun Zhang, Paul Patras, Senior Member, IEEE, and Hamed Haddadi

Abstract—The rapid uptake of mobile devices and the rising popularity of mobile applications and services pose unprecedented demands on mobile and wireless networking infrastructure. Upcoming 5G systems are evolving to support exploding mobile traffic volumes, real-time extraction of fine-grained analytics, and agile management of network resources, so as to maximize user experience. Fulfilling these tasks is challenging, as mobile environments are increasingly complex, heterogeneous, and evolving. One potential solution is to resort to advanced machine learning techniques, in order to help manage the rise in data volumes and algorithm-driven applications. The recent success of deep learning underpins new and powerful tools that tackle problems in this space. In this paper, we bridge the gap between deep learning and mobile and wireless networking research, by presenting a comprehensive survey of the crossovers between the two areas. We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking. We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. Subsequently, we provide an encyclopedic review of mobile and wireless networking research based on deep learning, which we categorize by different domains. Drawing from our experience, we discuss how to tailor deep learning to mobile environments. We complete this survey by pinpointing current challenges and open future directions for research.

Index Terms—Deep learning, machine learning, mobile networking, wireless networking, mobile big data, 5G systems, network management.

Manuscript received March 12, 2018; revised September 16, 2018 and January 29, 2019; accepted March 8, 2019. Date of publication March 13, 2019; date of current version August 20, 2019. (Corresponding author: Chaoyun Zhang.) C. Zhang and P. Patras are with the Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, U.K. (e-mail: chaoyun.zhang@ed.ac.uk; paul.patras@ed.ac.uk). H. Haddadi is with the Dyson School of Design Engineering, Imperial College London, London SW7 2AZ, U.K. (e-mail: h.haddadi@imperial.ac.uk). Digital Object Identifier 10.1109/COMST.2019.2904897

I. INTRODUCTION

INTERNET connected mobile devices are penetrating every aspect of individuals' life, work, and entertainment. The increasing number of smartphones and the emergence of ever-more diverse applications trigger a surge in mobile data traffic. Indeed, the latest industry forecasts indicate that the annual worldwide IP traffic consumption will reach 3.3 zettabytes (10^15 MB) by 2021, with smartphone traffic exceeding PC traffic by the same year [1]. Given the shift in user preference towards wireless connectivity, current mobile infrastructure faces great capacity demands. In response to this increasing demand, early efforts propose to agilely provision resources [2] and tackle mobility management distributively [3]. In the long run, however, Internet Service Providers (ISPs) must develop intelligent heterogeneous architectures and tools that can spawn the 5th generation of mobile systems (5G) and gradually meet more stringent end-user application requirements [4], [5].

The growing diversity and complexity of mobile network architectures has made monitoring and managing the multitude of network elements intractable. Therefore, embedding versatile machine intelligence into future mobile networks is drawing unparalleled research interest [6], [7]. This trend is reflected in machine learning (ML) based solutions to problems ranging from radio access technology (RAT) selection [8] to malware detection [9], as well as the development of networked systems that support machine learning practices (e.g., [10] and [11]). ML enables systematic mining of valuable information from traffic data, automatically uncovering correlations that would otherwise be too complex for human experts to extract [12]. As the flagship of machine learning, deep learning has achieved remarkable performance in areas such as computer vision [13] and natural language processing (NLP) [14]. Networking researchers are also beginning to recognize the power and importance of deep learning, and are exploring its potential to solve problems specific to the mobile networking domain [15], [16].

Embedding deep learning into the 5G mobile and wireless networks is well justified. In particular, data generated by mobile environments are increasingly heterogeneous, as these are usually collected from various sources, have different formats, and exhibit complex correlations [17]. As a consequence, a range of specific problems become too difficult or impractical for traditional machine learning tools (e.g., shallow neural networks). This is because (i) their performance does not improve if provided with more data [18] and (ii) they cannot handle highly dimensional state/action spaces in control problems [19]. In contrast, big data fuels the performance of deep learning, as it eliminates the need for domain expertise and instead employs hierarchical feature extraction. In essence this means information can be distilled efficiently and increasingly abstract correlations can be obtained from the data, while reducing the pre-processing effort. Graphics Processing Unit (GPU)-based parallel computing further enables deep learning to make inferences within milliseconds. This facilitates network analysis and management with high accuracy and in a timely manner, overcoming the run-time limitations of traditional mathematical
techniques (e.g., convex optimization, game theory, meta-heuristics).

TABLE I. List of Abbreviations in Alphabetical Order.
Despite growing interest in deep learning in the mobile
networking domain, existing contributions are scattered across
different research areas and a comprehensive survey is lacking.
This article fills this gap between deep learning and mobile
and wireless networking, by presenting an up-to-date survey of
research that lies at the intersection between these two fields.
Beyond reviewing the most relevant literature, we discuss
the key pros and cons of various deep learning architectures,
and outline deep learning model selection strategies, in view
of solving mobile networking problems. We further investi-
gate methods that tailor deep learning to individual mobile
networking tasks, to achieve the best performance in com-
plex environments. We wrap up this paper by pinpointing
future research directions and important problems that remain
unsolved and are worth pursuing with deep neural networks.
Our ultimate goal is to provide a definitive guide for networking
researchers and practitioners, who intend to employ deep
learning to solve problems of interest.
Survey Organization: We structure this article in a top-down
manner, as shown in Figure 1. We begin by discussing work
that gives a high-level overview of deep learning, future mobile
networks, and networking applications built using deep learn-
ing, which help define the scope and contributions of this paper
(Section II). Since deep learning techniques are relatively new
in the mobile networking community, we provide a basic deep
learning background in Section III, highlighting immediate
advantages in addressing mobile networking problems. There
exist many factors that enable implementing deep learning
for mobile networking applications (including dedicated deep
learning libraries, optimization algorithms, etc.). We discuss
these enablers in Section IV, aiming to help mobile network
researchers and engineers in choosing the right software and
hardware platforms for their deep learning deployments.
In Section V, we introduce and compare state-of-the-art
deep learning models and provide guidelines for model selec-
tion toward solving networking problems. In Section VI
we review recent deep learning applications to mobile and
wireless networking, which we group by different scenarios
ranging from mobile traffic analytics to security, and emerg-
ing applications. We then discuss how to tailor deep learning
models to mobile networking problems (Section VII) and con-
clude this article with a brief discussion of open challenges,
with a view to future research directions (Section VIII).1

II. RELATED HIGH-LEVEL ARTICLES AND THE SCOPE OF THIS SURVEY
Mobile networking and deep learning problems have been
researched mostly independently. Only recently have crossovers
between the two areas emerged. Several notable works
paint a comprehensive picture of the deep learning and/or
mobile networking research landscape. We categorize these
works into (i) pure overviews of deep learning techniques,
(ii) reviews of analyses and management techniques in modern
mobile networks, and (iii) reviews of works at the intersection between deep learning and computer networking. We summa-
rize these earlier efforts in Table II and in this section discuss the most representative publications in each class.

1 We list the abbreviations used throughout this paper in Table I.


Fig. 1. Diagrammatic view of the organization of this survey.

A. Overviews of Deep Learning and Its Applications

The era of big data is triggering wide interest in deep learning across different research disciplines [28]–[31] and a growing number of surveys and tutorials are emerging (e.g., [23] and [24]). LeCun et al. [20] give a milestone overview of deep learning, introduce several popular models, and look ahead at the potential of deep neural networks. Schmidhuber [21] undertakes an encyclopedic survey of deep learning, likely the most comprehensive thus far, covering the evolution, methods, applications, and open research issues. Liu et al. [22] summarize the underlying principles of several deep learning models, and review deep learning developments in selected applications, such as speech processing, pattern recognition, and computer vision.

Arulkumaran et al. [26] present several architectures and core algorithms for deep reinforcement learning, including deep Q-networks, trust region policy optimization, and asynchronous advantage actor-critic. Their survey highlights the remarkable performance of deep neural networks in different control problems (e.g., video gaming, Go board game play, etc.). Similarly, deep reinforcement learning has also been surveyed in [77], where Li sheds more light on applications. Zhang et al. survey developments in deep learning for recommender systems [32], which have potential to play an important role in mobile advertising. As deep learning becomes increasingly popular, Goodfellow et al. [18] provide a comprehensive tutorial of deep learning in a book that covers prerequisite knowledge, underlying principles, and popular applications.

B. Surveys on Future Mobile Networks

The emerging 5G mobile networks incorporate a host of new techniques to overcome the performance limitations of current deployments and meet new application requirements. Progress to date in this space has been summarized through surveys, tutorials, and magazine papers
(e.g., [4], [5], [38], [39], and [47]). Andrews et al. [38] highlight the differences between 5G and prior mobile network architectures, conduct a comprehensive review of 5G techniques, and discuss research challenges facing future developments. Agiwal et al. [4] review new architectures for 5G networks, survey emerging wireless technologies,

and point out research problems that remain unsolved. Gupta and Jha [5] also review existing work on 5G cellular network architectures, subsequently proposing a framework that incorporates networking ingredients such as Device-to-Device (D2D) communication, small cells, cloud computing, and the IoT.

TABLE II. Summary of existing surveys, magazine papers, and books related to deep learning and mobile networking. The symbol ✓ indicates a publication is in the scope of a domain; ✗ marks papers that do not directly cover that area, but from which readers may retrieve some related insights. Publications related to both deep learning and mobile networks are shaded. (Table III continues this summary.)

Intelligent mobile networking is becoming a popular research area, and related work has been reviewed in [7], [34], [37], [54], and [56]–[59]. Jiang et al. [7] discuss the potential of applying machine learning to 5G network applications, including massive MIMO and smart grids. This work further identifies several research gaps between ML and 5G that remain unexplored. Li et al. [58] discuss opportunities and challenges of incorporating artificial intelligence (AI) into future network architectures and highlight the significance of AI in the 5G era. Klaine et al. [57] present several successful ML practices in Self-Organizing Networks (SONs), discuss the pros and cons of different algorithms, and identify future research directions in this area. Potential also exists to apply AI and exploit big data for energy efficiency purposes [53]. Chen et al. [52] survey traffic offloading approaches in wireless networks, and propose a novel reinforcement learning based solution. This opens a new research direction toward embedding machine learning for greening cellular networks.

C. Deep Learning Driven Networking Applications

A growing number of papers survey recent works that bring deep learning into the computer networking domain. Alsheikh et al. [17] identify benefits and challenges of using big data for mobile analytics and propose a Spark based deep learning framework for this purpose. Wang and Jones [63] discuss evaluation criteria, data streaming and deep learning practices for network intrusion detection, pointing out research challenges inherent to such applications. Zheng et al. [6] put forward a big data-driven mobile network optimization framework for 5G networks, to enhance QoE performance. More recently, Fadlullah et al. [66] deliver a survey on the progress of deep learning in a broad range of areas, highlighting its potential application to network traffic control systems. Their work also highlights several unsolved research issues worthy of future study.

Ahad et al. [68] introduce techniques, applications, and guidelines on applying neural networks to wireless networking problems. Although it identifies several limitations of neural networks, this article focuses largely on old neural network models, ignoring recent progress in deep learning and successful applications in current mobile networks. Lane and Georgiev [74] investigate the suitability and benefits of employing deep learning in mobile sensing, and emphasize the potential for accurate inference on mobile devices. Ota et al. report novel deep learning applications in mobile multimedia. Their survey covers state-of-the-art deep learning practices in mobile health and wellbeing, mobile security, mobile ambient intelligence, language translation, and speech recognition. Mohammadi et al. [67] survey recent deep learning techniques for Internet of Things (IoT) data analytics. They comprehensively overview existing efforts that incorporate deep learning into the IoT domain and shed light on current research challenges and future directions. Mao et al. [69] focus on deep learning in wireless networking. Their work surveys state-of-the-art deep learning applications in wireless networks, and discusses research challenges to be solved in the future.

D. Our Scope

The objective of this survey is to provide a comprehensive view of state-of-the-art deep learning practices in the mobile networking area. By this we aim to answer the following key questions:
1) Why is deep learning promising for solving mobile networking problems?
2) What are the cutting-edge deep learning models relevant to mobile and wireless networking?
3) What are the most recent successful deep learning applications in the mobile networking domain?
4) How can researchers tailor deep learning to specific mobile networking problems?
5) Which are the most important and promising directions worthy of further study?
The research papers and books mentioned previously only partially answer these questions. This article goes beyond these previous works and specifically focuses on the crossovers between deep learning and mobile networking. We cover a
range of neural network (NN) structures that are increasingly important and have not been explicitly discussed in earlier tutorials, e.g., [78]. This includes auto-encoders and Generative Adversarial Networks. Unlike such existing tutorials, we also review open-source libraries for deploying and training neural networks, a range of optimization algorithms, and the parallelization of neural network models and training across large numbers of mobile devices. We also review applications not looked at in other related surveys, including traffic/user analytics, security and privacy, mobile health, etc.

While our main scope remains the mobile networking domain, for completeness we also discuss deep learning applications to wireless networks, and identify emerging application domains intimately connected to these areas. We differentiate between mobile networking, which refers to scenarios where devices are portable, battery powered, potentially wearable, and routinely connected to cellular infrastructure, and wireless networking, where devices are mostly fixed, and part of a distributed infrastructure (including WLANs and WSNs), and serve a single application. Overall, our paper distinguishes itself from earlier surveys from the following perspectives:
(i) We particularly focus on deep learning applications for mobile network analysis and management, instead of broadly discussing deep learning methods (as, e.g., in [20] and [21]) or centering on a single application domain, e.g., mobile big data analysis with a specific platform [17].
(ii) We discuss cutting-edge deep learning techniques from the perspective of mobile networks (e.g., [79] and [80]), focusing on their applicability to this area, whilst giving less attention to conventional deep learning models that may be out-of-date.
(iii) We analyze similarities between existing non-networking problems and those specific to mobile networks; based on this analysis we provide insights into both best deep learning architecture selection strategies and adaptation approaches, so as to exploit the characteristics of mobile networks for analysis and management tasks.
To the best of our knowledge, this is the first time that mobile network analysis and management are jointly reviewed from a deep learning angle. We also provide for the first time insights into how to tailor deep learning to mobile networking problems.

III. DEEP LEARNING 101

We begin with a brief introduction to deep learning, highlighting the basic principles behind computation techniques in this field, as well as key advantages that lead to their success. Deep learning is essentially a sub-branch of ML, which enables an algorithm to make predictions, classifications, or decisions based on data, without being explicitly programmed. Classic examples include linear regression, the k-nearest neighbors classifier, and Q-learning. In contrast to traditional ML tools that rely heavily on features defined by domain experts, deep learning algorithms hierarchically extract knowledge from raw data through multiple layers of nonlinear processing units, in order to make predictions or take actions according to some target objective. The most well-known deep learning models are neural networks (NNs), but only NNs that have a sufficient number of hidden layers (usually more than one) can be regarded as 'deep' models. Besides deep NNs, other architectures have multiple layers, such as deep Gaussian processes [81], neural processes [82], and deep random forests [83], and can also be regarded as deep learning structures. The major benefit of deep learning over traditional ML is thus the automatic feature extraction, by which expensive hand-crafted feature engineering can be circumvented. We illustrate the relation between deep learning, machine learning, and artificial intelligence (AI) at a high level in Fig. 2.

Fig. 2. Venn diagram of the relation between deep learning, machine learning, and AI. This survey particularly focuses on deep learning applications in mobile and wireless networks.

In general, AI is a computation paradigm that endows machines with intelligence, aiming to teach them how to work, react, and learn like humans. Many techniques fall under this broad umbrella, including machine learning, expert systems, and evolutionary algorithms. Among these, machine learning enables artificial processes to absorb knowledge from data and make decisions without being explicitly programmed. Machine learning algorithms are typically categorized into supervised, unsupervised, and reinforcement learning. Deep learning is a family of machine learning techniques that mimic biological nervous systems and perform representation learning through multi-layer transformations, extending across all three learning paradigms mentioned before. As deep learning has a growing number of applications in mobile and wireless networking, the crossovers between these domains form the scope of this manuscript.

A. The Evolution of Deep Learning

The discipline traces its origins 75 years back, when threshold logic was employed to produce a computational model for neural networks [84]. However, it was only in the late 1980s that neural networks (NNs) gained interest, as Rumelhart et al. [85] showed that multi-layer NNs could be trained effectively by back-propagating errors. LeCun and Bengio subsequently proposed the now popular Convolutional Neural Network (CNN) architecture [86], but progress stalled due to computing power limitations of systems available at that time. Following the recent success of GPUs, CNNs have
been employed to dramatically reduce the error rate in the Large Scale Visual Recognition Challenge (LSVRC) [87]. This has drawn unprecedented interest in deep learning and breakthroughs continue to appear in a wide range of computer science areas.

TABLE IV. Summary of the benefits of applying deep learning to solve problems in mobile and wireless networks.

Fig. 3. Illustration of the learning and inference processes of a 4-layer CNN. w(·) denote weights of each hidden layer, σ(·) is an activation function, λ refers to the learning rate, ∗(·) denotes the convolution operation and L(w) is the loss function to be optimized.

B. Fundamental Principles of Deep Learning

The key aim of deep neural networks is to approximate complex functions through a composition of simple and predefined operations of units (or neurons). Such an objective function can be of almost any type, such as a mapping between images and their class labels (classification), computing future stock prices based on historical values (regression), or even deciding the next optimal chess move given the current status on the board (control). The operations performed are usually defined by a weighted combination of a specific group of hidden units with a non-linear activation function, depending on the structure of the model. Such operations, along with the output units, are named "layers". The neural network architecture resembles the perception process in a brain, where a specific set of units are activated given the current environment, influencing the output of the neural network model.

C. Forward and Backward Propagation

In mathematical terms, the architecture of deep neural networks is usually differentiable, therefore the weights (or parameters) of the model can be learned by minimizing a loss function using gradient descent methods through back-propagation, following the fundamental chain rule [85]. We illustrate the principles of the learning and inference processes of a deep neural network in Fig. 3, where we use a two-dimensional (2D) Convolutional Neural Network (CNN) as an example.

Forward Propagation: The figure shows a CNN with 5 layers, i.e., an input layer (grey), 3 hidden layers (blue) and an output layer (orange). In forward propagation, a 2D input x (e.g., images) is first processed by a convolutional layer, which performs the following convolutional operation:

h1 = σ(w1 ∗ x).   (1)

Here h1 is the output of the first hidden layer, w1 is the convolutional filter and σ(·) is the activation function, which aims at improving the non-linearity and representability of the model. The output h1 is subsequently provided as input to and processed by the following two convolutional layers, which eventually produce a final output y. This could be, for instance, a vector of probabilities for different possible patterns (shapes) discovered in the (image) input. To train the CNN appropriately, one uses a loss function L(w) to measure the distance between the output y and the ground truth y∗. The purpose of training is to find the best weights w, so as to minimize the loss function L(w). This can be achieved through back propagation via gradient descent.

Backward Propagation: During backward propagation, one computes the gradient of the loss function L(w) over the weight of the last hidden layer, and updates the weight by computing:

w4 = w4 − λ · dL(w)/dw4.   (2)

Here λ denotes the learning rate, which controls the step size of moving in the direction indicated by the gradient. The same operation is performed for each weight, following the chain rule. The process is repeated and eventually the gradient descent will lead to a set of weights w that minimizes L(w).

For other NN structures, the training and inference processes are similar. To help less expert readers, we detail the principles and computational details of various deep learning techniques in Section V.
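To make these two phases concrete, the following is a minimal NumPy sketch of forward and backward propagation for a small fully-connected network. It is an illustrative toy under simplifying assumptions (dense rather than convolutional layers, a single hidden layer, and a squared-error loss), not the exact CNN of Fig. 3.

```python
# Minimal forward/backward propagation sketch (illustrative toy, not
# the CNN of Fig. 3): one hidden layer, sigmoid activation, squared
# error loss, and plain gradient descent as in Eq. (2).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # 4 training samples, 8 features
y_star = rng.normal(size=(4, 1))     # ground truth y*

w1 = rng.normal(scale=0.1, size=(8, 16))   # hidden-layer weights
w2 = rng.normal(scale=0.1, size=(16, 1))   # output-layer weights
lam = 0.05                                  # learning rate (lambda)

def sigma(z):                               # activation function
    return 1.0 / (1.0 + np.exp(-z))

for step in range(200):
    # Forward propagation, cf. Eq. (1): h1 = sigma(w1 applied to x).
    h1 = sigma(x @ w1)
    y = h1 @ w2
    loss = 0.5 * np.mean((y - y_star) ** 2)  # L(w)

    # Backward propagation: chain rule, then w <- w - lam * dL/dw.
    dy = (y - y_star) / x.shape[0]           # dL/dy
    dw2 = h1.T @ dy                          # dL/dw2
    dh1 = dy @ w2.T                          # dL/dh1
    dw1 = x.T @ (dh1 * h1 * (1.0 - h1))      # dL/dw1, sigma' = s(1 - s)
    w2 -= lam * dw2
    w1 -= lam * dw1

print(f"loss after training: {loss:.4f}")
```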


D. Advantages of Deep Learning in Mobile and Wireless Networking

We recognize several benefits of employing deep learning to address network engineering problems, as summarized in Table IV. Specifically:
1) It is widely acknowledged that, while vital to the performance of traditional ML algorithms, feature engineering is costly [88]. A key advantage of deep learning is that it can automatically extract high-level features from data that has complex structure and inner correlations. The learning process does not need to be designed by a human, which tremendously simplifies prior feature handcrafting [20]. The importance of this is amplified in the context of mobile networks, as mobile data is usually generated by heterogeneous sources, is often noisy, and exhibits non-trivial spatial/temporal patterns [17], whose labeling would otherwise require outstanding human effort.
2) Secondly, deep learning is capable of handling large amounts of data. Mobile networks generate high volumes of different types of data at a fast pace. Training traditional ML algorithms (e.g., the Support Vector Machine (SVM) [89] and the Gaussian Process (GP) [90]) sometimes requires storing all the data in memory, which is computationally infeasible under big data scenarios. Furthermore, the performance of ML does not grow significantly with large volumes of data and plateaus relatively fast [18]. In contrast, the Stochastic Gradient Descent (SGD) employed to train NNs only requires sub-sets of data at each training step, which guarantees deep learning's scalability with big data. Deep neural networks further benefit as training with big data prevents model over-fitting.
3) Traditional supervised learning is only effective when sufficient labeled data is available. However, most current mobile systems generate unlabeled or semi-labeled data [17]. Deep learning provides a variety of methods that allow exploiting unlabeled data to learn useful patterns in an unsupervised manner, e.g., the Restricted Boltzmann Machine (RBM) [91] and the Generative Adversarial Network (GAN) [92]. Applications include clustering [93], data distribution approximation [92], un/semi-supervised learning [94], [95], and one/zero-shot learning [96], [97], among others.
4) Compressive representations learned by deep neural networks can be shared across different tasks, while this is limited or difficult to achieve in other ML paradigms (e.g., linear regression, random forest, etc.). Therefore, a single model can be trained to fulfill multiple objectives, without requiring complete model retraining for different tasks. We argue that this is essential for mobile network engineering, as it reduces the computational and memory requirements of mobile systems when performing multi-task learning applications [98].
5) Deep learning is effective in handling geometric mobile data [99], while this is a conundrum for other ML approaches. Geometric data refers to multivariate data represented by coordinates, topology, metrics and order [100]. Mobile data, such as mobile user location and network connectivity, can be naturally represented by point clouds and graphs, which have important geometric properties. These data can be effectively modelled by dedicated deep learning architectures, such as PointNet++ [101] and Graph CNN [102]. Employing these architectures has great potential to revolutionize geometric mobile data analysis [103].

E. Limitations of Deep Learning in Mobile and Wireless Networking

Although deep learning has unique advantages when addressing mobile network problems, it also has several shortcomings, which partially restrict its applicability in this domain. Specifically:
1) In general, deep learning (including deep reinforcement learning) is vulnerable to adversarial examples [104], [105]. These refer to artifact inputs that are intentionally designed by an attacker to fool machine learning models into making mistakes [104]. While it is difficult to distinguish such samples from genuine ones, they can trigger mis-adjustments of a model with high likelihood. We illustrate an example of such an adversarial attack in Fig. 4, and a minimal code sketch of one such attack follows at the end of this list. Deep learning models, and especially CNNs, are vulnerable to these types of attacks. This may also affect the applicability of deep learning in mobile systems. For instance, hackers may exploit this vulnerability and construct cyber attacks that subvert deep learning based detectors [106]. Constructing deep models that are robust to adversarial examples is imperative, but remains challenging.

Fig. 4. An example of an adversarial attack on deep learning.

2) Deep learning algorithms are largely black boxes and have low interpretability. Their major breakthroughs are in terms of accuracy, as they significantly improve the performance of many tasks in different areas. However, although deep learning enables creating "machines" that have high accuracy in specific tasks, we still have limited knowledge as to why NNs make certain decisions. This limits the applicability of deep learning, e.g., in network economics. Therefore, businesses would rather continue to employ statistical methods that have high interpretability, whilst sacrificing on accuracy. Researchers have recognized this problem and are investing continuous efforts to address this limitation of deep learning (e.g., [107]–[109]).
3) Deep learning is heavily reliant on data, which sometimes can be more important than the model itself. Deep models can further benefit from training data augmentation [110]. This is indeed an opportunity for mobile networking, as networks generate tremendous amounts of data. However, data collection may be costly and raise privacy concerns, therefore it may be difficult to obtain sufficient information for model training. In such scenarios, the benefits of employing deep learning may be outweighed by the costs.

4) Deep learning can be computationally demanding. Advanced parallel computing (e.g., GPUs, high-performance chips) fostered the development and popularity of deep learning, yet deep learning also relies heavily on such hardware. Deep NNs usually require complex structures to obtain satisfactory accuracy performance. However, when deploying NNs on embedded and mobile devices, energy and capability constraints have to be considered. Very deep NNs may not be suitable for such scenarios and this would inevitably compromise accuracy. Solutions are being developed to mitigate this problem, and we dive deeper into these in Sections IV and VII.
5) Deep neural networks usually have many hyper-parameters, and finding their optimal configuration can be difficult. For a single convolutional layer, we need to configure hyper-parameters for at least the number, shape, stride, and dilation of filters, as well as for the residual connections. The number of such hyper-parameters grows exponentially with the depth of the model and can highly influence its performance. Finding a good set of hyper-parameters can be similar to looking for a needle in a haystack. The AutoML platform2 provides a first solution to this problem, by employing progressive neural architecture search [111]. This task, however, remains costly.

TABLE V. Summary of tools and techniques that enable deploying deep learning in mobile systems.

Fig. 5. Hierarchical view of deep learning enablers. Parallel computing and hardware in fog computing lay foundations for deep learning. Distributed machine learning systems can build upon them, to support large-scale deployment of deep learning. Deep learning libraries run at the software level, to enable fast deep learning implementation. Higher-level optimizers are used to train the NN, to fulfill specific objectives.
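As anticipated in limitation 1) above, the sketch below shows how an adversarial example can be crafted against a toy classifier. We use the fast gradient sign method as one classic crafting technique; the logistic model and the perturbation budget eps are illustrative assumptions made for this example, and this is not the specific attack depicted in Fig. 4.

```python
# Minimal adversarial example sketch against a toy logistic classifier,
# using the fast gradient sign method (FGSM). Illustrative assumptions
# only; not the attack shown in Fig. 4.
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=16)          # weights of a "trained" binary classifier
b = 0.1
x = rng.normal(size=16)          # a genuine input, true label y = 1
y = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(v):
    return sigmoid(w @ v + b)

# Gradient of the cross-entropy loss w.r.t. the INPUT (not the weights):
# for logistic regression, dL/dx = (p - y) * w.
p = predict(x)
grad_x = (p - y) * w

# FGSM: take a small step that maximally increases the loss under an
# L-infinity constraint, i.e., follow the sign of the input gradient.
eps = 0.25
x_adv = x + eps * np.sign(grad_x)

print(f"prediction on genuine input:     {predict(x):.3f}")
print(f"prediction on adversarial input: {predict(x_adv):.3f}")
```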
To circumvent some of the aforementioned problems and allow for effective deployment in mobile networks, deep learning requires certain system and software support. We review and discuss such enablers in the next section.

IV. ENABLING DEEP LEARNING IN MOBILE NETWORKING

5G systems seek to provide high throughput and ultra-low latency communication services, to improve users' QoE [4]. Implementing deep learning to build intelligence into 5G systems, so as to meet these objectives, is expensive. This is because powerful hardware and software are required to support training and inference in complex settings. Fortunately, several tools are emerging, which make deep learning in mobile networks tangible; namely, (i) advanced parallel computing, (ii) distributed machine learning systems, (iii) dedicated deep learning libraries, (iv) fast optimization algorithms, and (v) fog computing. These tools can be seen as forming a hierarchical structure, as illustrated in Fig. 5; synergies between them exist that make networking problems amenable to deep learning based solutions. By employing these tools, once training is completed, inferences can be made within millisecond timescales, as already reported by a number of papers for a range of tasks (e.g., [112]–[114]). We summarize these advances in Table V and review them in what follows.

2 AutoML – training high-quality custom machine learning models with minimum effort and machine learning expertise. https://cloud.google.com/automl/.


A. Advanced Parallel Computing

Compared to traditional machine learning models, deep neural networks have significantly larger parameter spaces, intermediate outputs, and numbers of gradient values. Each of these needs to be updated during every training step, requiring powerful computation resources. The training and inference processes involve huge amounts of matrix multiplications and other operations, though they can be massively parallelized. Traditional Central Processing Units (CPUs) have a limited number of cores, thus they only support restricted computing parallelism. Employing CPUs for deep learning implementations is highly inefficient and will not satisfy the low-latency requirements of mobile systems.

Engineers address these issues by exploiting the power of GPUs. GPUs were originally designed for high performance video games and graphical rendering, but new techniques such as the Compute Unified Device Architecture (CUDA) [116] and the CUDA Deep Neural Network library (cuDNN) [117] developed by NVIDIA add flexibility to this type of hardware, allowing users to customize their usage for specific purposes. GPUs usually incorporate thousands of cores and perform exceptionally well at the fast matrix multiplications required for training neural networks. This provides higher memory bandwidth than CPUs and dramatically speeds up the learning process. Recent advanced Tensor Processing Units (TPUs) developed by Google even demonstrate 15-30× higher processing speeds and 30-80× higher performance-per-watt, as compared to CPUs and GPUs [115].

Diffractive neural networks (D²NNs), which completely rely on light communication, were recently introduced in [132] to enable zero-consumption and zero-delay deep learning. The D²NN is composed of several transmissive layers, where points on these layers act as neurons in a NN. The structure is trained to optimize the transmission/reflection coefficients, which are equivalent to weights in a NN. Once trained, transmissive layers are materialized via 3D printing and can subsequently be used for inference.

There are also a number of toolboxes that can assist the computational optimization of deep learning on the server side. Spring and Shrivastava [133] introduce a hashing based technique that substantially reduces the computation requirements of deep network implementations. Mirhoseini et al. employ a reinforcement learning scheme to enable machines to learn the optimal operation placement over mixed hardware for deep neural networks. Their solution achieves up to 20% faster computation than placements designed by human experts [134].

Importantly, these systems are easy to deploy, therefore mobile network engineers do not need to rebuild mobile servers from scratch to support deep learning computing. This makes implementing deep learning in mobile systems feasible and accelerates the processing of mobile data streams.

B. Distributed Machine Learning Systems

Mobile data is collected from heterogeneous sources (e.g., mobile devices, network probes, etc.) and stored in multiple distributed data centers. With the increase of data volumes, it is impractical to move all mobile data to a central data center to run deep learning applications [10]. Running network-wide deep learning algorithms would therefore require distributed machine learning systems that support different interfaces (e.g., operating systems, programming languages, libraries), so as to enable training and evaluation of deep models across geographically distributed servers simultaneously, with high efficiency and low overhead.

Deploying deep learning in a distributed fashion will inevitably introduce several system-level problems, which require satisfying the following properties:
Consistency – Guaranteeing that model parameters and computational processes are consistent across all machines.
Fault tolerance – Effectively dealing with equipment breakdowns in large-scale distributed machine learning systems.
Communication – Optimizing communication between nodes in a cluster, so as to avoid congestion.
Storage – Designing efficient storage mechanisms tailored to different environments (e.g., distributed clusters, single machines, GPUs), given I/O and data processing diversity.
Resource management – Assigning workloads and ensuring that nodes work in a well-coordinated fashion.
Programming model – Designing programming interfaces to support multiple programming languages.

There exist several distributed machine learning systems that facilitate deep learning in mobile networking applications. Kraska et al. [129] introduce a distributed system named MLbase, which enables users to intelligently specify, select, optimize, and parallelize ML algorithms. Their system helps non-experts deploy a wide range of ML methods, allowing optimization and running of ML applications across different servers. Hsieh et al. [10] develop a geography-distributed ML system called Gaia, which breaks the throughput bottleneck by employing an advanced communication mechanism over Wide Area Networks, while preserving the accuracy of ML algorithms. Their proposal supports versatile ML interfaces (e.g., TensorFlow, Caffe), without requiring significant changes to the ML algorithm itself. This system enables deployments of complex deep learning applications over large-scale mobile networks.
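To make the data-parallel pattern underlying such systems concrete, the following single-process NumPy sketch simulates synchronous distributed training: each simulated worker computes a gradient on its own data shard, and a central node averages these gradients before updating the shared model. This is a deliberately simplified illustration of the general idea, not the actual mechanism implemented by MLbase or Gaia.

```python
# Minimal sketch of synchronous data-parallel training: workers compute
# local gradients on their own shards; a "parameter server" averages
# them before each shared update. Runs in one process for illustration.
import numpy as np

rng = np.random.default_rng(2)
w = np.zeros(8)                                           # shared model
shards = [rng.normal(size=(100, 8)) for _ in range(4)]    # 4 workers
targets = [x @ np.arange(8.0) for x in shards]            # synthetic labels

def local_gradient(w, x, y):
    # Least-squares gradient on one worker's data shard.
    return 2.0 * x.T @ (x @ w - y) / len(x)

lr = 0.1
for step in range(100):
    grads = [local_gradient(w, x, y) for x, y in zip(shards, targets)]
    w -= lr * np.mean(grads, axis=0)     # server-side averaged update

print(w.round(2))   # approaches the true weights [0, 1, 2, ..., 7]
```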


Xing et al. [135] develop a large-scale machine learning platform to support big data applications. Their architecture achieves efficient model and data parallelization, enabling parameter state synchronization with low communication cost. Xiao et al. [11] propose a distributed graph engine for ML named TUX2, to support data layout optimization across machines and reduce cross-machine communication. They demonstrate remarkable performance in terms of runtime and convergence on a large dataset with up to 64 billion edges. Chilimbi et al. [130] build a distributed, efficient, and scalable system named "Adam"3 tailored to the training of deep models. Their architecture demonstrates impressive performance in terms of throughput, delay, and fault tolerance. Another dedicated distributed deep learning system called GeePS is developed by Cui et al. [131]. Their framework allows data parallelization on distributed GPUs, and demonstrates higher training throughput and a faster convergence rate. More recently, Moritz et al. [136] designed a dedicated distributed framework named Ray to underpin reinforcement learning applications. Their framework is supported by a dynamic task execution engine, which incorporates actor and task-parallel abstractions. They further introduce a bottom-up distributed scheduling strategy and a dedicated state storage scheme, to improve scalability and fault tolerance.

3 Note that this is distinct from the Adam optimizer discussed in Section IV-D.

TABLE VI. Summary and comparison of mainstream deep learning libraries.

C. Dedicated Deep Learning Libraries

Building a deep learning model from scratch can prove complicated for engineers, as this requires defining forwarding behaviors and gradient propagation operations at each layer, in addition to CUDA coding for GPU parallelization. With the growing popularity of deep learning, several dedicated libraries simplify this process. Most of these toolboxes work with multiple programming languages, and are built with GPU acceleration and automatic differentiation support. This eliminates the need for hand-crafted definitions of gradient propagation. We summarize these libraries below, and give a comparison among them in Table VI.

TensorFlow4 is a machine learning library developed by Google [118]. It enables deploying computation graphs on CPUs, GPUs, and even mobile devices [137], allowing ML implementation on both single and distributed architectures. This permits fast implementation of deep NNs on both cloud and fog services. Although originally designed for ML and deep neural network applications, TensorFlow is also suitable for other data-driven research purposes. It provides TensorBoard,5 a sophisticated visualization tool, to help users understand model structures and data flows, and perform debugging. Detailed documentation and tutorials for Python exist, while other programming languages such as C, Java, and Go are also supported. Currently it is the most popular deep learning library. Building upon TensorFlow, several dedicated deep learning toolboxes were released to provide higher-level programming interfaces, including Keras,6 Luminoth7 and TensorLayer [138].

Theano is a Python library that allows users to efficiently define, optimize, and evaluate numerical computations involving multi-dimensional data [119]. It provides both GPU and CPU modes, which enables users to tailor their programs to individual machines. Learning Theano is however difficult, and building NNs with it involves substantial compiling time. Though Theano has a large user base and a support community, and at some stage was one of the most popular deep learning tools, its popularity is decreasing rapidly, as its core ideas and attributes are absorbed by TensorFlow.

Caffe(2) is a dedicated deep learning framework developed by Berkeley AI Research [120]; the latest version, Caffe2,8 was recently released by Facebook. Inheriting all the advantages of the old version, Caffe2 has become a very flexible framework that enables users to build their models efficiently. It also allows training neural networks on multiple GPUs within distributed systems, and supports deep learning implementations on mobile operating systems, such as iOS and Android. Therefore, it has the potential to play an important role in future mobile edge computing.

4 TensorFlow, https://www.tensorflow.org/.
5 TensorBoard – A visualization tool for TensorFlow, https://www.tensorflow.org/guide/summaries_and_tensorboard.
6 Keras deep learning library, https://github.com/fchollet/keras.
7 Luminoth deep learning library for computer vision, https://github.com/tryolabs/luminoth.
8 Caffe2, https://caffe2.ai/.

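To give a flavor of the level of abstraction these libraries provide, the snippet below defines and trains a small network using the Keras interface on top of TensorFlow; gradient propagation is derived automatically, and the user only selects a loss and an optimizer (cf. Section IV-D). The layer sizes, loss, and synthetic data are arbitrary choices made for illustration.

```python
# Minimal Keras (TensorFlow) sketch: a small fully-connected network,
# with automatic differentiation hidden behind compile()/fit().
# Layer sizes, loss, and data are illustrative assumptions.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(8,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])

# The user only picks a loss and an optimizer; gradients are derived
# automatically by the framework.
model.compile(optimizer="adam", loss="mse")

x = np.random.normal(size=(256, 8)).astype("float32")
y = np.random.normal(size=(256, 1)).astype("float32")
model.fit(x, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(x[:1]))
```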

(Py)Torch is a scientific computing framework with wide support for machine learning models and algorithms [121]. It was originally developed in the Lua language, but developers later released an improved Python version [139]. In essence, PyTorch is a lightweight toolbox that can run on embedded systems such as smart phones, although it lacks comprehensive documentation. Since building NNs in PyTorch is straightforward, the popularity of this library is growing rapidly. It also offers rich pretrained models and modules that are easy to reuse and combine. PyTorch is now officially maintained by Facebook and is mainly employed for research purposes.

MXNET is a flexible and scalable deep learning library that provides interfaces for multiple languages (e.g., C++, Python, MATLAB, R, etc.) [140]. It supports different levels of machine learning models, from logistic regression to GANs. MXNET provides fast numerical computation for both single-machine and distributed ecosystems. It wraps workflows commonly used in deep learning into high-level functions, such that standard neural networks can be easily constructed without substantial coding effort. However, learning how to work with this toolbox in a short time frame is difficult, hence the number of users who prefer this library is relatively small. MXNET is the official deep learning framework at Amazon.

Although less popular, there are other excellent deep learning libraries, such as CNTK,9 Deeplearning4j,10 Blocks,11 Gluon,12 and Lasagne,13 which can also be employed in mobile systems. Selecting among these varies according to specific applications. For AI beginners who intend to employ deep learning in the networking domain, PyTorch is a good candidate, as it is easy to build neural networks in this environment and the library is well optimized for GPUs. On the other hand, for those who pursue advanced operations and large-scale implementations, TensorFlow might be a better choice, as it is well-established, under good maintenance, and has stood the test of many Google industrial projects.

9 MS Cognitive Toolkit, https://www.microsoft.com/en-us/cognitive-toolkit/.
10 Deeplearning4j, http://deeplearning4j.org.
11 Blocks, a Theano framework for building and training neural networks, https://github.com/mila-udem/blocks.
12 Gluon, a deep learning library, https://gluon.mxnet.io/.
13 Lasagne, https://github.com/Lasagne.

D. Fast Optimization Algorithms

The objective functions to be optimized in deep learning are usually complex, as they involve sums over extremely large numbers of data-wise likelihood functions. As the depth of the model increases, such functions usually exhibit high non-convexity with multiple local minima, critical points, and saddle points. In this case, conventional Stochastic Gradient Descent (SGD) algorithms [141] are slow in terms of convergence, which restricts their applicability to latency-constrained mobile systems. To overcome this problem and stabilize the optimization process, many algorithms evolve the traditional SGD, allowing NN models to be trained faster for mobile applications. We summarize the key principles behind these optimizers and compare them in Table VII. We delve into the details of their operation next.

Fixed Learning Rate SGD Algorithms: Sutskever et al. [126] introduce a variant of the SGD optimizer with Nesterov's momentum, which evaluates gradients after the current velocity is applied. Their method demonstrates faster convergence rates when optimizing convex functions. Another approach is Adagrad, which adapts the learning rate to model parameters according to their update frequency. This is suitable for handling sparse data and significantly outperforms SGD in terms of robustness [127]. Adadelta improves the traditional Adagrad algorithm, enabling it to converge faster, and does not rely on a global learning rate [142]. RMSprop is a popular SGD based method introduced by G. Hinton. RMSprop divides the learning rate by an exponentially smoothed average of gradients, and does not require one to set the learning rate for each training step [141].

Adaptive Learning Rate SGD Algorithms: Kingma and Ba [128] propose an adaptive learning rate optimizer named Adam, which incorporates momentum via the first-order moment of the gradient. This algorithm is fast in terms of convergence, highly robust to model structures, and is considered the first choice if one cannot decide what algorithm to use. By incorporating Nesterov momentum into Adam, Nadam applies stronger constraints to the gradients, which enables faster convergence [143].

Other Optimizers: Andrychowicz et al. [144] suggest that the optimization process can even be learned dynamically. They pose gradient descent as a trainable learning problem, which demonstrates good generalization ability in neural network training. Wen et al. [147] propose a training algorithm tailored to distributed systems. They quantize float gradient values to {−1, 0, +1} during training, which theoretically requires 20 times less gradient communication between nodes. The authors prove that such a gradient approximation mechanism allows the objective function to converge to optima with probability 1, and in their experiments only a 2% accuracy loss is observed on average when training GoogLeNet [145]. Zhou et al. [146] employ a differentially private mechanism to compare training and validation gradients, so as to reuse samples and keep them fresh. This can dramatically reduce overfitting during training.
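For reference, the textbook update rules of three of the optimizers reviewed above can be written in a few lines of NumPy, as sketched below. The hyper-parameter values shown are common defaults and are used purely for illustration; production implementations in the libraries of Section IV-C differ in detail.

```python
# Textbook update rules for SGD with momentum, RMSprop, and Adam,
# written as plain functions over (parameter, gradient, state) values.
# Hyper-parameter defaults are common choices, shown for illustration.
import numpy as np

def sgd_momentum(w, g, v, lr=0.01, mu=0.9):
    v = mu * v - lr * g                       # velocity accumulates gradients
    return w + v, v

def rmsprop(w, g, s, lr=0.001, rho=0.9, eps=1e-8):
    s = rho * s + (1 - rho) * g**2            # smoothed squared gradients
    return w - lr * g / (np.sqrt(s) + eps), s

def adam(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g                 # first moment (momentum)
    v = b2 * v + (1 - b2) * g**2              # second moment
    m_hat = m / (1 - b1**t)                   # bias correction
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Example: one Adam step on a scalar parameter with gradient 0.5.
w_new, m, v = adam(np.array(1.0), np.array(0.5), 0.0, 0.0, t=1)
print(w_new)
```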

E. Fog Computing

The fog computing paradigm presents a new opportunity to implement deep learning in mobile systems. Fog computing refers to a set of techniques that permit deploying applications or data storage at the edge of networks [148], e.g., on individual mobile devices. This reduces the communications overhead, offloads data traffic, reduces user-side latency, and lightens the server-side computational burden [149], [150]. A formal definition of fog computing is given in [151], where it is interpreted as 'a huge number of heterogeneous (wireless and sometimes autonomous) ubiquitous and decentralized devices [that] communicate and potentially cooperate among them and with the network to perform storage and processing tasks without the intervention of third parties.' More concretely, this can refer to smart phones, wearable devices and vehicles which store, analyze and exchange data, to offload the burden from the cloud and perform delay-sensitive tasks [152], [153]. Since fog computing involves deployment at the edge, participating devices usually have limited computing resources and battery power. Therefore, special hardware and software are required for deep learning implementations, as we explain next.

TABLE VII. Summary and comparison of different optimization algorithms.

Hardware: There exist several efforts that attempt to shift deep learning computing from the cloud side to mobile devices [154]. For example, Gokhale et al. [122] develop a mobile coprocessor named neural network neXt (nn-X), which accelerates the execution of deep neural networks on mobile devices, while retaining low energy consumption. Bang et al. [155] introduce a low-power and programmable deep learning processor to deploy mobile intelligence on edge devices. Their hardware consumes only 288 μW, yet achieves 374 GOPS/W efficiency. A Neurosynaptic Chip called TrueNorth is proposed by IBM [156]. Their solution seeks to support computationally intensive applications on embedded battery-powered mobile devices. Qualcomm introduces a Snapdragon Neural Processing Engine to enable deep learning computational optimization tailored to mobile devices.14 Their hardware allows developers to execute neural network models on Snapdragon 820 boards to serve a variety of applications. In close collaboration with Google, Movidius15 develops an embedded neural network computing framework that allows user-customized deep learning deployments at the edge of mobile networks. Their products can achieve satisfying runtime efficiency, while operating with ultra-low power requirements. It further supports different
TABLE VIII
C OMPARISON OF M OBILE D EEP L EARNING P LATFORM

frameworks, such as TensorFlow and Caffe, providing users neural networks. The more experienced can continue read-
with flexibility in choosing among toolkits. More recently, ing with Section VI. We illustrate and summarize the most
Huawei officially announced the Kirin 970 as a mobile AI salient architectures that we present in Fig. 6 and Table IX,
computing system on chip.16 Their innovative framework respectively.
incorporates dedicated Neural Processing Units (NPUs), which
dramatically accelerates neural network computing, enabling A. Multilayer Perceptron
classification of 2,000 images per second on mobile devices. The Multilayer Perceptrons (MLPs) is the initial Artificial
Software: Beyond these hardware advances, there are also Neural Network (ANN) design, which consists of at least three
software platforms that seek to optimize deep learning on layers of operations [174]. Units in each layer are densely
mobile devices (e.g., [157]). We compare and summarize all connected, hence require to configure a substantial number of
these platforms in Table VIII.17 In addition to the mobile ver- weights. We show an MLP with two hidden layers in Fig. 6(a).
sion of TensorFlow and Caffe, Tencent released a lightweight, Note that usually only MLPs containing more than one hidden
high-performance neural network inference framework tai- layer are regarded as deep learning structures.
lored to mobile platforms, which relies on CPU computing.18 Given an input vector x, a standard MLP layer performs the
This toolbox performs better than all known CPU-based open following operation:
source frameworks in terms of inference speed. Apple has
developed “Core ML”, a private ML framework to facilitate y = σ(W · x + b). (3)
mobile deep learning implementation on iOS 11.19 This lowers Here y denotes the output of the layer, W are the weights
the entry barrier for developers wishing to deploy ML models and b the biases. σ(·) is an activation function, which aims
on Apple equipment. Yao et al. develop a deep learning frame- at improving the non-linearity of the model. Commonly used
work called DeepSense dedicated to mobile sensing related activation function are the sigmoid,
data processing, which provides a general machine learning
1
toolbox that accommodates a wide range of edge applica- sigmoid(x) = ,
tions. It has moderate energy consumption and low latency, 1 + e −x
thus being amenable to deployment on smartphones. the Rectified Linear Unit (ReLU) [175],
The techniques and toolboxes mentioned above make the ReLU(x) = max(x, 0),
deployment of deep learning practices in mobile network
applications feasible. In what follows, we briefly introduce tanh,
several representative deep learning architectures and discuss e x − e −x
their applicability to mobile networking problems. tanh(x) = x ,
e + e −x
and the Scaled Exponential Linear Units (SELUs) [176],

V. D EEP L EARNING : S TATE - OF - THE -A RT x, if x > 0;
SELU(x) = λ
Revisiting Fig. 2, machine learning methods can be natu- αe x − α, if x ≤ 0,
rally categorized into three classes, namely supervised learn- where the parameters λ = 1.0507 and α = 1.6733 are fre-
ing, unsupervised learning, and reinforcement learning. Deep quently used. In addition, the softmax function is typically
learning architectures have achieved remarkable performance employed in the last layer when performing classification:
in all these areas. In this section, we introduce the key prin- e xi
ciples underpinning several deep learning models and discuss softmax(xi ) = k ,
their largely unexplored potential to solve mobile networking e xk
j =0
problems. Technical details of classical models are provided where k is the number of labels involved in classification. Until
to readers who seek to obtain a deeper understanding of recently, sigmoid and tanh have been the activation func-
tions most widely used. However, they suffer from a known
16 Huawei announces the Kirin 970 – new flagship SoC with AI capabilities gradient vanishing problem, which hinders gradient propaga-
http://www.androidauthority.com/huawei-announces-kirin-970-797788/. tion through layers. Therefore these functions are increasingly
17 Adapted from https://mp.weixin.qq.com/s/3gTp1kqkiGwdq5olrpOvKw.
more often replaced by ReLU or SELU. SELU enables to nor-
18 ncnn is a high-performance neural network inference framework opti-
mized for the mobile platform, https://github.com/Tencent/ncnn.
malize the output of each layer, which dramatically accelerates
19 Core ML: Integrate machine learning models into your app, the training convergence, and can be viewed as a replacement
https://developer.apple.com/documentation/coreml. of Batch Normalization [177].

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
2238 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 3, THIRD QUARTER 2019

(a) (b)

(c) (d)

(e) (f)

(g) (h)

Fig. 6. Typical structure and operation principles of MLP, RBM, AE, CNN, RNN, LSTM, GAN, and DRL. (a) Structure of an MLP with 2 hidden layers
(blue circles). (b) Graphical model and training process of an RBM. v and h denote visible and hidden variables, respectively. (c) Operating principle of an
auto-encoder, which seeks to reconstruct the input from the hidden layer. (d) Operating principle of a convolutional layer. (e) Recurrent layer – x1:t is the
input sequence, indexed by time t, st denotes the state vector and ht the hidden outputs. (f) The inner structure of an LSTM layer. (g) Underlying principle
of a generative adversarial network (GAN). (h) Typical deep reinforcement learning architecture. The agent is a neural network model that approximates the
required function.

The MLP can be employed for supervised, unsupervised, popularity is decreasing because it entails high complexity
and even reinforcement learning purposes. Although this struc- (fully-connected structure), modest performance, and low con-
ture was the most popular neural network in the past, its vergence efficiency. MLPs are mostly used as a baseline or

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: DEEP LEARNING IN MOBILE AND WIRELESS NETWORKING: SURVEY 2239

TABLE IX
S UMMARY OF D IFFERENT D EEP L EARNING A RCHITECTURES . GAN AND DRL A RE S HADED , S INCE T HEY A RE B UILT U PON OTHER M ODELS

integrated into more complex architectures (e.g., the final layer where h, v are the hidden and visible units respectively, and
in CNNs used for classification). Building an MLP is straight- W are weights and a, b are biases. The visible units are con-
forward, and it can be employed, e.g., to assist with feature ditional independent to the hidden units, and vice versa. A
extraction in models built for specific objectives in mobile typical structure of an RBM is shown in Fig. 6(b). In general,
network applications. The advanced Adaptive learning of neu- input data are assigned to visible units v. Hidden units h are
ral Network (AdaNet) enables MLPs to dynamically train their invisible and they fully connect to all v through weights W,
structures to adapt to the input [158]. This new architecture can which is similar to a standard feed forward neural network.
be potentially explored for analyzing continuously changing However, unlike in MLPs where only the input vector can
mobile environments. affect the hidden units, with RBMs the state of v can affect
the state of h, and vice versa.
B. Boltzmann Machine RBMs can be effectively trained using the contrastive
divergence algorithm [178] through multiple steps of Gibbs
Restricted Boltzmann Machines (RBMs) [91] were origi-
sampling [179]. We illustrate the structure and the training pro-
nally designed for unsupervised learning purposes. They are
cess of an RBM in Fig. 6(b). RBM-based models are usually
essentially a type of energy-based undirected graphical mod-
employed to initialize the weights of a neural network in more
els, and include a visible layer and a hidden layer, and where
recent applications. The pre-trained model can be subsequently
each unit can only assume binary values (i.e., 0 and 1). The
fine-tuned for supervised learning purposes using a standard
probabilities of these values are given by:
back-propagation algorithm. A stack of RBMs is called a Deep
  1 Belief Network (DBN) [159], which performs layer-wise train-
P hj = 1|v =
1 + e −W·v+bj ing and achieves superior performance as compared to MLPs
  1 in many applications, including time series forecasting [180],
P vj = 1|h = , ratio matching [181], and speech recognition [182]. Such
1 + e −W ·h+aj
T

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
2240 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 3, THIRD QUARTER 2019

structures can be even extended to a convolutional architecture, RGB representation of images). A convolutional layer employs
to learn hierarchical spatial representations [160]. multiple filters shared across different locations, to “scan” the
inputs and produce output maps. In general, if the inputs and
C. Auto-Encoders outputs have M and N filters respectively, the convolutional
layer will require M × N filters to perform the convolution
Auto-Encoders (AEs) are also designed for unsupervised
operation.
learning and attempt to copy inputs to outputs. The underlying
CNNs improve traditional MLPs by leveraging three impor-
principle of an AE is shown in Fig. 6(c). AEs are frequently
tant ideas, namely, (i) sparse interactions, (ii) parameter
used to learn compact representation of data for dimension
sharing, and (iii) equivariant representations [18]. This reduces
reduction [183]. Extended versions can be further employed to
the number of model parameters significantly and maintains
initialize the weights of a deep architecture, e.g., the Denoising
the affine invariance (i.e., recognition results are robust to the
Auto-Encoder (DAE) [161]), and generate virtual examples
affine transformation of objects). Specifically, The sparse inter-
from a target data distribution, e.g., Variational Auto-Encoders
actions imply that the weight kernel has smaller size than the
(VAEs) [162].
input. It performs moving filtering to produce outputs (with
A VAE typically comprises two neural networks – an
roughly the same size as the inputs) for the current layer.
encoder and a decoder. The input of the encoder is a data
Parameter sharing refers to employing the same kernel to scan
point x (e.g., images) and its functionality is to encode this
the whole input map. This significantly reduces the number of
input into a latent representation space z. Let fΘ (z|x) be an
parameters needed, which mitigates the risk of over-fitting.
encoder parameterized by Θ and z is sampled from a Gaussian
Equivariant representations indicate that convolution opera-
distribution, the objective of the encoder is to output the mean
tions are invariant in terms of translation, scale, and shape.
and variance of the Gaussian distribution. Similarly, denot-
This is particularly useful for image processing, since essen-
ing gΩ (x|z) the decoder parameterized by Ω, this accepts the
tial features may show up at different locations in the image,
latent representation z as input, and outputs the parameter of
with various affine patterns.
the distribution of x. The objective of the VAE is to minimize
Owing to the properties mentioned above, CNNs
the reconstruction error of the data and the Kullback-Leibler
achieve remarkable performance in imaging applications.
(KL) divergence between p(z) and fΘ (z|x). Once trained, the
Krizhevsky et al. [87] exploit a CNN to classify images on
VAE can generate new data point samples by (i) drawing
the ImageNet dataset [192]. Their method reduces the top-5
latent variables zi ∼ p(z) and (ii) drawing a new data point
error by 39.7% and revolutionizes the imaging classification
xi ∼ p(x|z).
field. GoogLeNet [145] and ResNet [163] significantly
AEs can be employed to address network security problems,
increase the depth of CNN structures, and propose incep-
as several research papers confirm their effectiveness in detect-
tion and residual learning techniques to address problems
ing anomalies under different circumstances [184]–[186],
such as over-fitting and gradient vanishing introduced by
which we will further discuss in Section VI-H. The structures
“depth”. Their structure is further improved by the Dense
of RBMs and AEs are based upon MLPs, CNNs or RNNs.
Convolutional Network (DenseNet) [165], which reuses
Their goals are similar, while their learning processes are dif-
feature maps from each layer, thereby achieving significant
ferent. Both can be exploited to extract patterns from unlabeled
accuracy improvements over other CNN based models, while
mobile data, which may be subsequently employed for various
requiring fewer layers. CNNs have also been extended to
supervised learning tasks, e.g., routing [187], mobile activity
video applications. Ji et al. propose 3D convolutional neural
recognition [188], [189], periocular verification [190] and base
networks for video activity recognition [164], demonstrating
station user number prediction [191].
superior accuracy as compared to 2D CNN. More recent
research focuses on learning the shape of convolutional
D. Convolutional Neural Network kernels [193]–[195]. These dynamic architectures allow to
Instead of employing full connections between layers, automatically focus on important regions in input maps. Such
Convolutional Neural Networks (CNNs or ConvNets) employ properties are particularly important in analyzing large-scale
a set of locally connected kernels (filters) to capture correla- mobile environments exhibiting clustering behaviors (e.g.,
tions between different data regions. Mathematically, for each surge of mobile traffic associated with a popular event).
location p y of the output y, the standard convolution performs Given the high similarity between image and spatial mobile
the following operation: data (e.g., mobile traffic snapshots, users’ mobility, etc.),
     CNN-based models have huge potential for network-wide
y py = w(p G ) · x p y + p G , (4)
mobile data analysis. This is a promising future direction that
p G ∈G
we further discuss in Section VIII.
where p G denotes all positions in the receptive field G of
the convolutional filter W, effectively representing the recep-
tive range of each neuron to inputs in a convolutional layer. E. Recurrent Neural Network
Here the weights W are shared across different locations of Recurrent Neural Networks (RNNs) are designed for
the input map. We illustrate the operation of one 2D convolu- modeling sequential data, where sequential correlations exist
tional layer in Fig. 6(d). Specifically, the inputs of a 2D CNN between samples. At each time step, they produce output via
layer are multiple 2D matrices with different channels (e.g., the recurrent connections between hidden units [18], as shown in

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: DEEP LEARNING IN MOBILE AND WIRELESS NETWORKING: SURVEY 2241

Fig. 6(e). Given a sequence of inputs x = {x1 , x2 , . . . , xT }, a Algorithm 1 Typical GAN Training Algorithm
standard RNN performs the following operations: 1: Inputs:
Batch size m.
st = σs (Wx xt + Ws st−1 + bs ) The number of steps for the discriminator K.
ht = σh (Wh st + bh ), Learning rate λ and an optimizer Opt(·)
Noise vector z ∼ pg (z ).
where st represents the state of the network at time t and it
Target data set x ∼ pdata (x ).
constructs a memory unit for the network. Its values are com- 2: Initialise:
puted by a function of the input xt and previous state st−1 . ht Generative and discriminative models, G and
is the output of the network at time t. In natural language pro- D , parameterized by ΘG and ΘD .
cessing applications, this usually represents a language vector 3: while ΘG and ΘD have not converged do
and becomes the input at t + 1 after being processed by an 4: for k = 1 to K do
embedding layer. The weights Wx , Wh and biases bs , bh are 5: Sample m-element noise vector {z (1) , · · · , z (m) }
shared across different temporal locations. This reduces the from the noise prior pg (z )
model complexity and the degree of over-fitting. 6: Sample m data points {x (1) , · · · , x (m) } from the
The RNN is trained via a Backpropagation Through Time target data distribution pdata (x )
(BPTT) algorithm. However, gradient vanishing and exploding 7: gD ← ΔΘD [ m 1 m log D (x (i) )+
1 m log(1 − D (G (z (i) )))].
i=1
problems are frequently reported in traditional RNNs, which +m i=1
make them particularly hard to train [196]. The Long Short- 8: ΘD ← ΘD + λ · Opt(ΘD , gD ).
Term Memory (LSTM) mitigates these issues by introducing 9: end for
a set of “gates” [166], which has been proven successful in 10: Sample m-element noise vector {z (1) , · · · , z (m) }
many applications (e.g., speech recognition [197], text cate- from the noise prior pg (z )
gorization [198], and wearable activity recognition [113]). A
11: gG ← m 1 m log(1 − D (G (z (i) )))
standard LSTM performs the following operations: i=1
12: ΘG ← ΘG − λ · Opt(ΘG , gG ).
it = σ(Wxi Xt + Whi Ht−1 + bi ), 13: end while
 
ft = σ Wxf Xt + Whf Ht−1 + bf ,
Ct = ft  Ct−1 + it  tanh(Wxc Xt + Whc Ht−1 + bc ),
rather than the output of G [92]. Both of G and D are nor-
ot = σ(Wxo Xt + Who Ht−1 + bo ), mally neural networks. The training procedure for G aims to
ht = ot  tanh(Ct ). maximize the probability of D making a mistake. The overall
objective is solving the following minimax problem [92]:
Here, ‘’ denotes the Hadamard product, Ct denotes the cell
outputs, ht are the hidden states, it , ft , and ot are input gates, min max Ex ∼Pr (x ) [log D (x )] + Ez ∼Pn (z ) [log(1 − D (G (z )))].
forget gates, and output gates, respectively. These gates mit- G D
igate the gradient issues and significantly improve the RNN. Algorithm 1 shows the typical routine used to train a simple
We illustrated the structure of an LSTM in Fig. 6(f). GAN. Both the generators and the discriminator are trained
Sutskever et al. [167] introduce attention mechanisms to iteratively while fixing the other one. Finally G can produce
RNNs, which achieves outstanding accuracy in tokenized data close to a target distribution (the same with training exam-
predictions. Xingjian et al. [168] substitute the dense matrix ples), if the model converges. We show the overall structure of
multiplication in LSTMs with convolution operations, design- a GAN in Fig. 6(g). In practice, the generator G takes a noise
ing a Convolutional Long Short-Term Memory (ConvLSTM). vector z as input, and generates an output G (z ) that follows
Their proposal reduces the complexity of traditional LSTM the target distribution. D will try to discriminate whether G (z )
and demonstrates significantly lower prediction errors in is a real sample or an artifact [199]. This effectively constructs
precipitation nowcasting (i.e., forecasting the volume of a dynamic game, for which a Nash Equilibrium is reached if
precipitation). both G and D become optimal, and G can produce lifelike data
Mobile networks produce massive sequential data from var- that D can no longer discriminate, i.e., D (G (z )) = 0.5, ∀z .
ious sources, such as data traffic flows, and the evolution of The training process of traditional GANs is highly sensitive
mobile network subscribers’ trajectories and application laten- to model structures, learning rates, and other hyper-parameters.
cies. Exploring the RNN family is promising to enhance the Researchers are usually required to employ numerous ad
analysis of time series data in mobile networks. hoc ‘tricks’ to achieve convergence and improve the fidelity
of data generated. There exist several solutions for mitigat-
F. Generative Adversarial Network ing this problem, e.g., Wasserstein Generative Adversarial
The Generative Adversarial Network (GAN) is a framework Network (WGAN) [80], Loss-Sensitive Generative Adversarial
that trains generative models using the following adversarial Network (LS-GAN) [169] and BigGAN [170], but research on
process. It simultaneously trains two models: a generative one the theory of GANs remains shallow. Recent work confirms
G that seeks to approximate the target data distribution from that GANs can promote the performance of some supervised
training data, and a discriminative model D that estimates the tasks (e.g., super-resolution [200], object detection [201], and
probability that a sample comes from the real training data face completion [202]) by minimizing the divergence between

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
2242 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 3, THIRD QUARTER 2019

inferred and real data distributions. Exploiting the unsuper- Many mobile networking problems can be formulated as
vised learning abilities of GANs is promising in terms of Markov Decision Processes (MDPs), where reinforcement
generating synthetic mobile data for simulations, or assisting learning can play an important role (e.g., base station on-
specific supervised tasks in mobile network applications. This off switching strategies [209], routing [210], and adaptive
becomes more important in tasks where appropriate datasets tracking control [211]). Some of these problems nevertheless
are lacking, given that operators are generally reluctant to involve high-dimensional inputs, which limits the applica-
share their network data. bility of traditional reinforcement learning algorithms. DRL
techniques broaden the ability of traditional reinforcement
learning algorithms to handle high dimensionality, in scenar-
G. Deep Reinforcement Learning ios previously considered intractable. Employing DRL is thus
Deep Reinforcement Learning (DRL) refers to a set of meth- promising to address network management and control prob-
ods that approximate value functions (deep Q learning) or lems under complex, changeable, and heterogeneous mobile
policy functions (policy gradient method) through deep neu- environments. We further discuss this potential in Section VIII.
ral networks. An agent (neural network) continuously interacts
with an environment and receives reward signals as feedback.
The agent selects an action at each step, which will change VI. D EEP L EARNING D RIVEN M OBILE
the state of the environment. The training goal of the neu- AND W IRELESS N ETWORKS
ral network is to optimize its parameters, such that it can Deep learning has a wide range of applications in mobile
select actions that potentially lead to the best future return. and wireless networks. In what follows, we present the
We illustrate this principle in Fig. 6(h). DRL is well-suited to most important research contributions across different mobile
problems that have a huge number of possible states (i.e., envi- networking areas and compare their design and principles. In
ronments are high-dimensional). Representative DRL methods particular, we first discuss a key prerequisite, that of mobile
include Deep Q-Networks (DQNs) [19], deep policy gradient big data, then organize the review of relevant works into nine
methods [171], Asynchronous Advantage Actor-Critic [79], subsections, focusing on specific domains where deep learning
Rainbow [172] and Distributed Proximal Policy Optimization has made advances. Specifically,
(DPPO) [173]. These perform remarkably in AI gaming (e.g., 1) Deep Learning Driven Network-Level Mobile Data
Gym20 ), robotics, and autonomous driving [204]–[207], and Analysis focuses on deep learning applications built on
have made inspiring deep learning breakthroughs recently. mobile big data collected within the network, including
In particular, the DQN [19] is first proposed by DeepMind network prediction, traffic classification, and Call Detail
to play Atari video games. However, traditional DQN requires Record (CDR) mining.
several important adjustments to work well. The A3C [79] 2) Deep Learning Driven App-Level Mobile Data Analysis
employs an actor-critic mechanism, where the actor selects shifts the attention towards mobile data analytics on edge
the action given the state of the environment, and the critic devices.
estimates the value given the state and the action, then 3) Deep Learning Driven User Mobility Analysis sheds
delivers feedback to the actor. The A3C deploys different light on the benefits of employing deep neural networks
actors and critics on different threads of a CPU to break to understand the movement patterns of mobile users,
the dependency of data. This significantly improves train- either at group or individual levels.
ing convergence, enabling fast training of DRL agents on 4) Deep Learning Driven User Localization reviews lit-
CPUs. Rainbow [172] combines different variants of DQNs, erature that employ deep neural networks to localize
and discovers that these are complementary to some extent. users in indoor or outdoor environments, based on dif-
This insight improved performance in many Atari games. ferent signals received from mobile devices or wireless
To solve the step size problem in policy gradients methods, channels.
Schulman et al. [173] propose a Distributed Proximal Policy 5) Deep Learning Driven Wireless Sensor Networks dis-
Optimization (DPPO) method to constrain the update step of cusses important work on deep learning applications in
new policies, and implement this on multi-threaded CPUs in a WSNs from four different perspectives, namely central-
distributed manner. Based on this method, an agent developed ized vs. decentralized sensing, WSN data analysis, WSN
by OpenAI defeated a human expert in Dota2 team in a 5v5 localization and other applications.
match.21 Recent DRL method also conquers a more complex 6) Deep Learning Driven Network Control investigate the
real-time multi-agent game StarCraft II.22 In [208], DeepMind usage of deep reinforcement learning and deep imitation
develops an game agent based on supervised learning and learning on network optimization, routing, scheduling,
DRL named AlphaStar, beating one of the world’s strongest resource allocation, and radio control.
professional StarCraft players by 5-0. 7) Deep Learning Driven Network Security presents work
that leverages deep learning to improve network security,
20 Gym is a toolkit for developing and comparing reinforcement learning which we cluster by focus as infrastructure, software,
algorithms. It supports teaching agents everything from walking to playing and privacy related.
games like Pong or Pinball. In combination with the NS3 simulator Gym
becomes applicable to networking research [203] https://gym.openai.com/. 8) Deep Learning Driven Signal Processing scrutinizes
21 Dota2 is a popular multiplayer online battle arena video game. physical layer aspects that benefit from deep learning
22 StarCraft II is a popular multi-agent real-time strategy game. and reviews relevant work on signal processing.

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: DEEP LEARNING IN MOBILE AND WIRELESS NETWORKING: SURVEY 2243

TABLE X
T HE TAXONOMY OF M OBILE B IG DATA

Fig. 7. Classification of the literature reviewed in Section VI. holistic review of deep learning driven mobile data analysis
research.
Yazti and Krishnaswamy [467] propose to categorize mobile
9) Emerging Deep Learning Driven Mobile Network data into two groups, namely network-level data and app-level
Application warps up this section, presenting other data. The key difference between them is that in the former
interesting deep learning applications in mobile data is usually collected by the edge mobile devices, while
networking. in the latter obtained throughout network infrastructure. We
For each domain, we summarize work broadly in tabular form, summarize these two types of data and their information com-
providing readers with a general picture of individual topics. prised in Table X. Before delving into mobile data analytics,
Most important works in each domain are discussed in more we illustrate the typical data collection process in Figure 9.
details in text. Lessons learned are also discussed at the end Network-level mobile data generated by the networking
of each subsection. We give a diagramatic view of the topics infrastructure not only deliver a global view of mobile network
dealt with by the literature reviewed in this section in Fig. 7. performance (e.g., throughput, end-to-end delay, jitter, etc.),
but also log individual session times, communication types,
sender and receiver information, through Call Detail Records
A. Mobile Big Data as a Prerequisite (CDRs). Network-level data usually exhibit significant spatio-
The development of mobile technology (e.g., smartphones, temporal variations resulting from users’ behaviors [468],
augmented reality, etc.) are forcing mobile operators to evolve which can be utilized for network diagnosis and manage-
mobile network infrastructures. As a consequence, both the ment, user mobility analysis and public transportation plan-
cloud and edge side of mobile networks are becoming increas- ning [218]. Some network-level data (e.g., mobile traffic
ingly sophisticated to cater for users who produce and con- snapshots) can be viewed as pictures taken by ‘panoramic
sume huge amounts of mobile data daily. These data can cameras’, which provide a city-scale sensing system for urban
be either generated by the sensors of mobile devices that sensing.
record individual user behaviors, or from the mobile network On the other hand, App-level data is directly recorded by
infrastructure, which reflects dynamics in urban environments. sensors or mobile applications installed in various mobile
Appropriately mining these data can benefit multidisciplinary devices. These data are frequently collected through crowd-
research fields and the industry in areas such mobile network sourcing schemes from heterogeneous sources, such as
management, social analysis, public transportation, personal Global Positioning Systems (GPS), mobile cameras and video
services provision, and so on [36]. Network operators, how- recorders, and portable medical monitors. Mobile devices act
ever, could become overwhelmed when managing and ana- as sensor hubs, which are responsible for data gathering and
lyzing massive amounts of heterogeneous mobile data [466]. preprocessing, and subsequently distributing such data to spe-
Deep learning is probably the most powerful methodology cific locations, as required [36]. We show a typical app-level
that can overcoming this burden. We begin therefore by intro- data processing system in Fig. 8. App-level mobile data is gen-
ducing characteristics of mobile big data, then present a erated and collected by a Software Development Kit (SDK)

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
2244 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 3, THIRD QUARTER 2019

Fig. 8. Typical pipeline of an app-level mobile data processing system.

installed on mobile devices. After pre-processing and load- Analytical Processing – OLAP33 ), and data warehousing (e.g.,
balancing (e.g., Nginx23 ), Such data is subsequently processed Hive34 ). Among these, the algorithms container is the core of
by real-time collection and computing services (e.g., Storm,24 the entire system as it connects to front-end access and fog
Kafka,25 HBase,26 Redis,27 etc.) as required. Further offline computing, real-time collection and computing, and offline
storage and computing with mobile data can be performed computing and analysis modules, while it links directly to
with various tools, such as Hadoop Distribute File System mobile applications, such as mobile healthcare, pattern recog-
(HDFS),28 Python, Mahout,29 Pig,30 or Oozie.31 The raw data nition, and advertising platforms. Deep learning logic can be
and analysis results will be further transferred to databases placed within the algorithms container.
(e.g., MySQL32 ) Business Intelligence – BI (e.g., Online App-level data may directly or indirectly reflect users’
behaviors, such as mobility, preferences, and social links [61].
Analyzing app-level data from individuals can help recon-
23 Nginx is an HTTP and reverse proxy server, a mail proxy server, and a
structing one’s personality and preferences, which can be
generic TCP/UDP proxy server, https://nginx.org/en/. used in recommender systems and users targeted advertising.
24 Storm is a free and open-source distributed real-time computation system,
http://storm.apache.org/. Some of these data comprise explicit information about indi-
25 Kafka is used for building real-time data pipelines and streaming apps, viduals’ identities. Inappropriate sharing and use can raise
https://kafka.apache.org/. significant privacy issues. Therefore, extracting useful pat-
26 Apache HBase is the Hadoop database, a distributed, scalable, big data
terns from multi-modal sensing devices without compromising
store, https://hbase.apache.org/.
27 Redis is an open source, in-memory data structure store, used as a user’s privacy remains a challenging endeavor.
database, cache and message broker, https://redis.io/. Compared to traditional data analysis techniques, deep
28 The Hadoop Distributed File System (HDFS) is a dis-
learning embraces several unique features to address the
tributed file system designed to run on commodity hardware, aforementioned challenges [17]. Namely:
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html.
29 Apache Mahout is a distributed linear algebra framework, 1) Deep learning achieves remarkable performance in
https://mahout.apache.org/. various data analysis tasks, on both structured and
30 Apache Pig is a high-level platform for creating programs that run on
Apache Hadoop, https://pig.apache.org/.
31 Oozie is a workflow scheduler system to manage Apache Hadoop jobs, 33 OLAP is an approach to answer multi-dimensional analytical queries
http://oozie.apache.org/. swiftly in computing, and is part of the broader category of business
32 MySQL is the open source database, https://www.oracle.com/ intelligence.
technetwork/database/mysql/index.html. 34 The Apache Hive is a data warehouse software, https://hive.apache.org/.

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: DEEP LEARNING IN MOBILE AND WIRELESS NETWORKING: SURVEY 2245

metadata, network performance indicators and call detail


records (CDRs) (see Table XI). The recent remarkable success
of deep learning ignites global interests in exploiting this
methodology for mobile network-level data analysis, so as to
optimize mobile networks configurations, thereby improving
end-uses’ QoE. These work can be categorized into four
types: network state prediction, network traffic classification,
CDR mining and radio analysis. In what follows, we review
work in these directions, which we first summarize and
compare in Table XI.
Network State Prediction refers to inferring mobile network
traffic or performance indicators, given historical measure-
ments or related data. Pierucci and Micheli [212] investigate
the relationship between key objective metrics and QoE. They
employ MLPs to predict users’ QoE in mobile communica-
tions, based on average user throughput, number of active
users in a cells, average data volume per user, and chan-
nel quality indicators, demonstrating high prediction accuracy.
Network traffic forecasting is another field where deep learning
is gaining importance. By leveraging sparse coding and max-
pooling, Gwon and Kung [213] develop a semi-supervised
deep learning model to classify received frame/packet patterns
and infer the original properties of flows in a WiFi network.
Their proposal demonstrates superior performance over tradi-
tional ML techniques. Nie et al. [214] investigate the traffic
demand patterns in wireless mesh network. They design a
DBN along with Gaussian models to precisely estimate traffic
distributions.
In addition to the above, several researchers employ deep
learning to forecast mobile traffic at city scale, by considering
spatio-temporal correlations of geographic mobile traffic mea-
surements. We illustrate the underlying principle in Fig. 10.
Fig. 9. Illustration of the mobile data collection process in cellular, WiFi Wang et al.[216] propose to use an AE-based architecture and
and wireless sensor networks. BSC: Base Station Controller; RNC: Radio LSTMs to model spatial and temporal correlations of mobile
Network Controller.
traffic distribution, respectively. In particular, the authors use
a global and multiple local stacked AEs for spatial fea-
unstructured data. Some types of mobile data can be
ture extraction, dimension reduction and training parallelism.
represented as image-like (e.g., [218]) or sequential
Compressed representations extracted are subsequently pro-
data [226].
cessed by LSTMs, to perform final forecasting. Experiments
2) Deep learning performs remarkably well in feature
with a real-world dataset demonstrate superior performance
extraction from raw data. This saves tremendous effort
over SVM and the Autoregressive Integrated Moving Average
of hand-crafted feature engineering, which allows spend-
(ARIMA) model. The work in [217] extends mobile traf-
ing more time on model design and less on sorting
fic forecasting to long time frames. The authors combine
through the data itself.
ConvLSTMs and 3D CNNs to construct spatio-temporal
3) Deep learning offers excellent tools (e.g., RBM, AE,
neural networks that capture the complex spatio-temporal
GAN) for handing unlabeled data, which is common in
features at city scale. They further introduce a fine-tuning
mobile network logs.
scheme and lightweight approach to blend predictions with
4) Multi-modal deep learning allows to learn features over
historical means, which significantly extends the length of
multiple modalities [469], which makes it powerful
reliable prediction steps. Deep learning was also employed
in modeling with data collected from heterogeneous
in [78], [219], [235], and [470], where the authors employ
sensors and data sources.
CNNs and LSTMs to perform mobile traffic forecasting. By
These advantages make deep learning as a powerful tool for
effectively extracting spatio-temporal features, their proposals
mobile data analysis.
gain significantly higher accuracy than traditional approaches,
such as ARIMA. Wang et al. [103] represent spatio-temporal
B. Deep Learning Driven Network-Level Mobile Data dependencies in mobile traffic using graphs, and learn such
Analysis dependencies using Graph Neural Networks. Beyond the
Network-level mobile data refers broadly to logs recorded accurate inference achieved in their study, this work also
by Internet service providers, including infrastructure demonstrates potential for precise social events inference.

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
2246 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 3, THIRD QUARTER 2019

TABLE XI
A S UMMARY OF W ORK ON N ETWORK -L EVEL M OBILE DATA A NALYSIS

More recently, Zhang et al. [218] propose an original coarse-grained counterparts obtained by probing, thereby
Mobile Traffic Super-Resolution (MTSR) technique to infer reducing traffic measurement overheads. We illustrate the prin-
network-wide fine-grained mobile traffic consumption given ciple of MTSR in Fig. 11. Inspired by image super-resolution

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: DEEP LEARNING IN MOBILE AND WIRELESS NETWORKING: SURVEY 2247

recently, Aceto et al. [471] employ MLPs, CNNs, and LSTMs


to perform encrypted mobile traffic classification, arguing that
deep NNs can automatically extract complex features present
in mobile traffic. As reflected by their results, deep learning
based solutions obtain superior accuracy over RFs in classify-
ing Android, IOS and Facebook traffic. CNNs have also been
used to identify malware traffic, where work in [225] regards
Fig. 10. The underlying principle of city-scale mobile traffic forecasting. The
traffic data as images and unusual patterns that malware traf-
deep learning predictor takes as input a sequence of mobile traffic measure- fic exhibit are classified by representation learning. Similar
ments in a region (snapshots t − s to t), and forecasts how much mobile traffic work on mobile malware detection will be further discussed
will be consumed in the same areas in the future t + 1 to t + n instances.
in Section VI-H.
CDR Mining involves extracting knowledge from specific
instances of telecommunication transactions such as phone
number, cell ID, session start/end time, traffic consumption,
etc. Using deep learning to mine useful information from
CDR data can serve a variety of functions. For example,
Liang et al. [226] propose Mercury to estimate metro den-
sity from streaming CDR data, using RNNs. They take the
trajectory of a mobile phone user as a sequence of locations;
RNN-based models work well in handling such sequential
data. Likewise, Felbo et al. [227] use CDR data to study
demographics. They employ a CNN to predict the age and
gender of mobile users, demonstrating the superior accu-
racy of these structures over other ML tools. More recently,
Chen et al. [228] compare different ML models to predict
tourists’ next locations of visit by analyzing CDR data. Their
experiments suggest that RNN-based predictors significantly
outperform traditional ML methods, including Naive Bayes,
SVM, RF, and MLP.
Lessons Learned: Network-level mobile data, such as
Fig. 11. Illustration of the image super-resolution (SR) principle (above) and mobile traffic, usually involves essential spatio-temporal corre-
the mobile traffic super-resolution (MTSR) technique (below). Figure adapted
from [218]. lations. These correlations can be effectively learned by CNNs
and RNNs, as they are specialized in modeling spatial and tem-
poral data (e.g., images, traffic series). An important observa-
techniques, they design a dedicated CNN with multiple skip tion is that large-scale mobile network traffic can be processed
connections between layers, named deep zipper network, along as sequential snapshots, as suggested in [217] and [218], which
with a Generative Adversarial Network (GAN) to perform resemble images and videos. Therefore, potential exists to
precise MTSR and improve the fidelity of inferred traffic exploit image processing techniques for network-level analy-
snapshots. Experiments with a real-world dataset show that sis. Techniques previously used for imaging usually, however,
this architecture can improve the granularity of mobile traffic cannot be directly employed with mobile data. Efforts must
measurements over a city by up to 100×, while significantly be made to adapt them to the particularities of the mobile
outperforming other interpolation techniques. networking domain. We expand on this future research direc-
Traffic Classification is aimed at identifying specific appli- tion in Section VIII-B.
cations or protocols among the traffic in networks. Wang [222] On the other hand, although deep learning brings precision
recognizes the powerful feature learning ability of deep neu- in network-level mobile data analysis, making causal inference
ral networks and uses a deep AE to identify protocols in remains challenging, due to limited model interpretability. For
a TCP flow dataset, achieving excellent precision and recall example, a NN may predict there will be a traffic surge in a
rates. Work in [223] proposes to use a 1D CNN for encrypted certain region in the near future, but it is hard to explain why
traffic classification. The authors suggest that this struc- this will happen and what triggers such a surge. Additional
ture works well for modeling sequential data and has lower efforts are required to enable explanation and confident deci-
complexity, thus being promising in addressing the traffic clas- sion making. At this stage, the community should rather use
sification problem. Similarly, Lotfollahi et al. [224] present deep learning algorithms as intelligent assistants that can make
Deep Packet, which is based on a CNN, for encrypted traf- accurate inferences and reduce human effort, instead of relying
fic classification. Their framework reduces the amount of exclusively on these.
hand-crafted feature engineering and achieves great accu-
racy. An improved stacked AE is employed in [234], where C. Deep Learning Driven App-Level Mobile Data Analysis
Li et al. incorporate Bayesian methods into AEs to enhance Triggered by the increasing popularity of Internet of Things
the inference accuracy in network traffic classification. More (IoT), current mobile devices bundle increasing numbers of

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
2248 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 3, THIRD QUARTER 2019

Fig. 12. Illustration of two deployment approaches for app-level mobile data analysis, namely cloud-based (left) and edge-based (right). The cloud-based
approach makes inference on clouds and send results to edge devices. On the contrary, the edge-based approach deploys models on edge devices which can
make local inference.

applications and sensors that can collect massive amounts of latency. In contrast, in the edge-based computing scenario
app-level mobile data [472]. Employing artificial intelligence pre-trained models are offloaded from the cloud to individual
to extract useful information from these data can extend the mobile devices, such that they can make inferences locally. As
capability of devices [75], [473], [474], thus greatly benefit- illustrated in the right part of Fig. 12, this scenario typically
ing users themselves, mobile operators, and indirectly device consists of the following: (i) servers use offline datasets to per-
manufacturers. Analysis of mobile data therefore becomes train a model; (ii) the pre-trained model is offloaded to edge
an important and popular research direction in the mobile devices; (iii) mobile devices perform inferences locally using
networking domain. Nonetheless, mobile devices usually oper- the model; (iv) cloud servers accept data from local devices;
ate in noisy, uncertain and unstable environments, where their (v) the model is updated using these data whenever necessary.
users move fast and change their location and activity con- While this scenario requires less interactions with the cloud,
texts frequently. As a result, app-level mobile data analysis its applicability is limited by the computing and battery capa-
becomes difficult for traditional machine learning tools, which bilities of edge hardware. Therefore, it can only support tasks
performs relatively poorly. Advanced deep learning practices that require light computations.
provide a powerful solution for app-level data mining, as Many researchers employ deep learning for app-level
they demonstrate better precision and higher robustness in IoT mobile data analysis. We group the works reviewed according
applications [475]. to their application domains, namely mobile healthcare, mobile
There exist two approaches to app-level mobile data anal- pattern recognition, and mobile Natural Language Processing
ysis, namely (i) cloud-based computing and (ii) edge-based (NLP) and Automatic Speech Recognition (ASR). Table XII
computing. We illustrate the difference between these scenar- gives a high-level summary of existing research efforts and we
ios in Fig. 12. As shown in the left part of the figure, the cloud- discuss representative work next.
based computing treats mobile devices as data collectors and Mobile Health: There is an increasing variety of wearable
messengers that constantly send data to cloud servers, via local health monitoring devices being introduced to the market. By
points of access with limited data preprocessing capabilities. incorporating medical sensors, these devices can capture the
This scenario typically includes the following steps: (i) users physical conditions of their carriers and provide real-time feed-
query on/interact with local mobile devices; (ii) queries are back (e.g., heart rate, blood pressure, breath status etc.), or
transmitted to severs in the cloud; (iii) servers gather the data trigger alarms to remind users of taking medical actions [476].
received for model training and inference; (iv) query results Liu and Du [237] design a deep learning-driven MobiEar
are subsequently sent back to each device, or stored and ana- to aid deaf people’s awareness of emergencies. Their proposal
lyzed without further dissemination, depending on specific accepts acoustic signals as input, allowing users to register dif-
application requirements. The drawback of this scenario is ferent acoustic events of interest. MobiEar operates efficiently
that constantly sending and receiving messages to/from servers on smart phones and only requires infrequent communications
over the Internet introduces overhead and may result in severe with servers for updates. Likewise, Sicong et al. [238] develop
a UbiEar, which is operated on the Android platform to
34 Human profile source: https://lekeart.deviantart.com/art/male-body- assist hard-to-hear sufferers in recognizing acoustic events,
profile-251793336. without requiring location information. Their design adopts

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: DEEP LEARNING IN MOBILE AND WIRELESS NETWORKING: SURVEY 2249

TABLE XII
A S UMMARY OF W ORKS ON A PP -L EVEL M OBILE DATA A NALYSIS

a lightweight CNN architecture for inference acceleration Hosseini et al. [243] design an edge computing system
and demonstrates comparable accuracy over traditional CNN models.

for health monitoring and treatment. They use CNNs to extract features from mobile sensor data, which plays an important role in their epileptogenicity localization application. Stamate et al. [244] develop a mobile Android app called cloudUPDRS to manage Parkinson's symptoms. In their work, MLPs are employed to determine the acceptance of data collected by smart phones, to maintain high-quality data samples. The proposed method outperforms other ML methods such as GPs and RFs. Quisel et al. [245] suggest that deep learning can be effectively used for mobile health data analysis. They exploit CNNs and RNNs to classify lifestyle and environmental traits of volunteers. Their models demonstrate superior prediction accuracy over RFs and logistic regression, over six datasets.

As deep learning performs remarkably in medical data analysis [477], we expect more and more deep learning powered health care devices will emerge to improve physical monitoring and illness diagnosis.

Mobile Pattern Recognition: Recent advanced mobile devices offer people a portable intelligent assistant, which fosters a diverse set of applications that can classify surrounding objects (e.g., [247]–[249], and [252]) or users' behaviors (e.g., [113], [254], [257], [263], [264], [478], and [479]) based on patterns observed in the output of the mobile camera or other sensors. We review and compare recent works on mobile pattern recognition in this part.

Object classification in pictures taken by mobile devices is drawing increasing research interest. Li et al. [247] develop DeepCham as a mobile object recognition framework. Their architecture involves a crowd-sourcing labeling process, which aims to reduce the hand-labeling effort, and a collaborative training instance generation pipeline that is built for deployment on mobile devices. Evaluations of the prototype system suggest that this framework is efficient and effective in terms of training and inference. Tobías et al. [248] investigate the applicability of employing CNN schemes on mobile devices for object recognition tasks. They conduct experiments on three different model deployment scenarios, i.e., on GPU, CPU, and respectively on mobile devices, with two benchmark datasets. The results obtained suggest that deep learning models can be efficiently embedded in mobile devices to perform real-time inference.

Mobile classifiers can also assist Virtual Reality (VR) applications. A CNN framework is proposed in [252] for facial expression recognition when users are wearing head-mounted displays in the VR environment. Rao et al. [253] incorporate a deep learning object detector into a mobile augmented reality (AR) system. Their system achieves outstanding performance in detecting and enhancing geographic objects in outdoor environments. Further work focusing on mobile AR applications is introduced in [480], where Ran et al. characterize the trade-offs between accuracy, latency, and energy efficiency of object detection.

Activity recognition is another interesting area that relies on data collected by mobile motion sensors [479], [481]. This refers to the ability to classify, based on data collected via, e.g., video capture, accelerometer readings, or motion (Passive Infra-Red, PIR) sensing, specific actions and activities that a human subject performs. Data collected will be delivered to servers for model training and the model will be subsequently deployed for domain-specific tasks.

Essential features of sensor data can be automatically extracted by neural networks. The first work in this space that is based on deep learning employs a CNN to capture local dependencies and preserve scale invariance in motion sensor data [254]. The authors evaluate their proposal on 3 offline datasets, demonstrating that their proposal yields higher accuracy over statistical methods and Principal Components Analysis (PCA). Almaslukh et al. [255] employ a deep AE to perform human activity recognition by analyzing an offline smart phone dataset gathered from accelerometers and gyroscope sensors. Li et al. [256] consider different scenarios for activity recognition. In their implementation, Radio Frequency Identification (RFID) data is directly sent to a CNN model for recognizing human activities. While their mechanism achieves high accuracy in different applications, experiments suggest that the RFID-based method does not work well with metal objects or liquid containers.

Reference [257] exploits an RBM to predict human activities, given 7 types of sensor data collected by a smart watch. Experiments on prototype devices show that this approach can efficiently fulfill the recognition objective under tolerable power requirements. Ordóñez and Roggen [113] architect an advanced ConvLSTM to fuse data gathered from multiple sensors and perform activity recognition. By leveraging CNN and LSTM structures, ConvLSTMs can automatically compress spatio-temporal sensor data into low-dimensional representations, without heavy data post-processing effort.
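To give a concrete flavour of such hybrid CNN+LSTM designs, the sketch below (our own illustrative PyTorch code, not the exact architecture of [113]; the sensor channel count, window length, class count and layer sizes are all assumptions) applies 1D convolutions to raw multi-sensor windows and an LSTM over the resulting feature sequence:

import torch
import torch.nn as nn

class CnnLstmHAR(nn.Module):
    """Illustrative CNN+LSTM activity recognizer for wearable sensor windows."""
    def __init__(self, n_channels=6, n_classes=5, hidden=64):
        super().__init__()
        # 1D convolutions extract local, scale-robust features per time step.
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        # The LSTM models temporal dependencies across the feature sequence.
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):          # x: (batch, channels, time)
        f = self.conv(x)           # (batch, 64, time)
        f = f.transpose(1, 2)      # (batch, time, 64) for the LSTM
        _, (h, _) = self.lstm(f)   # h: (1, batch, hidden)
        return self.head(h[-1])    # class logits

model = CnnLstmHAR()
window = torch.randn(8, 6, 128)    # 8 windows, 6 sensor axes, 128 samples each
print(model(window).shape)         # torch.Size([8, 5])

The convolutional front-end makes the model tolerant to small temporal shifts in the raw signals, while the recurrent layer captures the longer-range structure of an activity.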
Wang et al. [259] exploit Google Soli to architect a mobile user-machine interaction platform. By analyzing radio frequency signals captured by millimeter-wave radars, their architecture is able to recognize 11 types of gestures with high accuracy. Their models are trained on the server side, and inferences are performed locally on mobile devices. More recently, Zhao et al. [289] design a 4D CNN framework (3D for the spatial dimension + 1D for the temporal dimension) to reconstruct human skeletons using radio frequency signals. This novel approach resembles a virtual "X-ray", enabling accurate estimation of human poses, without requiring an actual camera.

Mobile NLP and ASR: Recent remarkable achievements obtained by deep learning in Natural Language Processing (NLP) and Automatic Speech Recognition (ASR) are also embraced by applications for mobile devices.

Powered by deep learning, the intelligent personal assistant Siri, developed by Apple, employs a deep mixture density network [482] to fix typical robotic voice issues and synthesize more human-like voice [279]. An Android app released by Google supports mobile personalized speech recognition [280]; this quantizes the parameters in LSTM model compression, allowing the app to run on low-power mobile phones. Likewise, Prabhavalkar et al. [281] propose a mathematical RNN compression technique that reduces an LSTM acoustic model size by two thirds, while only compromising negligible accuracy. This allows building both memory- and energy-efficient ASR applications on mobile devices.
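One widely used recipe in this vein is post-training quantization. The sketch below shows how an LSTM-based model can be dynamically quantized to int8 weights with PyTorch; this is a generic illustration under assumed dimensions, not the specific compression schemes of [280] or [281]:

import torch
import torch.nn as nn

class TinyAcousticModel(nn.Module):
    """A toy LSTM 'acoustic model'; real ASR models are much larger."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=40, hidden_size=256,
                            num_layers=2, batch_first=True)
        self.fc = nn.Linear(256, 41)   # e.g., 40 phone classes + blank
    def forward(self, x):
        y, _ = self.lstm(x)
        return self.fc(y)

model = TinyAcousticModel().eval()
# Post-training dynamic quantization: weights are stored as int8 and
# activations are quantized on the fly, shrinking the model roughly 4x.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8)
x = torch.randn(1, 100, 40)            # 1 utterance, 100 frames, 40 features
print(quantized(x).shape)              # torch.Size([1, 100, 41])

Because no retraining is required, this style of compression is attractive when only a trained floating-point model is available for deployment on handsets.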


Yoshioka et al. [282] present a framework that incorporates a network-in-network architecture into a CNN model, which allows performing ASR with mobile multi-microphone devices used in noisy environments. Mobile ASR can also accelerate text input on mobile devices, with Ruan et al.'s [283] study showing that, with the help of ASR, the input rates of English and Mandarin are 3.0 and 2.8 times faster than standard typing on keyboards. More recently, the applicability of deep learning to multi-task audio sensing is investigated in [98], where Georgiev et al. propose and evaluate a novel deep learning modelling and optimization framework tailored to embedded audio sensing tasks. To this end, they selectively share compressed representations between different tasks, which reduces training and data storage overhead, without significantly compromising the accuracy of individual tasks. The authors evaluate their framework on a memory-constrained smartphone performing four audio tasks (i.e., speaker identification, emotion recognition, stress detection, and ambient scene analysis). Experiments suggest this proposal can achieve high efficiency in terms of energy, runtime and memory, while maintaining excellent accuracy.
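The core idea of sharing a compressed representation between tasks can be sketched as follows (our own illustrative code in the spirit of [98]; the feature dimension, task set and layer sizes are assumptions):

import torch
import torch.nn as nn

class SharedAudioNet(nn.Module):
    """Shared encoder with per-task heads for multi-task audio sensing."""
    def __init__(self, n_feats=40, task_outputs=None):
        super().__init__()
        task_outputs = task_outputs or {"speaker": 50, "emotion": 4}
        self.encoder = nn.Sequential(
            nn.Linear(n_feats, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),   # compressed shared representation
        )
        # One lightweight output layer per task; only these are task-specific.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(32, n_out) for name, n_out in task_outputs.items()})

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))

net = SharedAudioNet()
frame = torch.randn(16, 40)                  # a mini-batch of audio features
print(net(frame, "speaker").shape)           # torch.Size([16, 50])
print(net(frame, "emotion").shape)           # torch.Size([16, 4])

Since the encoder parameters are stored once and reused across all tasks, both the memory footprint and the per-task training cost drop compared to maintaining separate full models.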
Other Applications: Deep learning also plays an important role in other applications that involve app-level data analysis. For instance, Ignatov et al. [284] show that deep learning can enhance the quality of pictures taken by mobile phones. By employing a CNN, they successfully improve the quality of images obtained by different mobile devices, to a digital single-lens reflex camera level. Lu et al. focus on video post-processing under wireless networks [285], where their framework exploits a customized AlexNet to answer queries about detected objects. This framework further involves an optimizer, which instructs mobile devices to offload videos, in order to reduce query response time.

Another interesting application is presented in [286], where Lee et al. show that deep learning can help smartwatch users reduce distraction by eliminating unnecessary notifications. Specifically, the authors use an 11-layer MLP to predict the importance of a notification. Fang et al. [288] exploit an MLP to extract features from high-dimensional and heterogeneous sensor data, including accelerometer, magnetometer, and gyroscope measurements. Their architecture achieves 95% accuracy in recognizing human transportation modes, i.e., still, walking, running, biking, and on vehicle.
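A minimal version of such a transportation-mode classifier is easy to express; the sketch below is our own illustration, with the feature dimension (summary statistics of the three sensors) assumed rather than taken from [288]:

import torch
import torch.nn as nn

# Hypothetical feature vector: statistics of accelerometer, magnetometer
# and gyroscope readings over a time window; the dimension is an assumption.
mlp = nn.Sequential(
    nn.Linear(24, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 5),                  # still / walk / run / bike / vehicle
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)

features = torch.randn(32, 24)         # a mini-batch of sensor feature vectors
labels = torch.randint(0, 5, (32,))    # ground-truth transportation modes
loss = criterion(mlp(features), labels)
loss.backward()
optimizer.step()

The appeal of an MLP here is that, once trained, inference amounts to a few small matrix multiplications, which comfortably fits the power budget of a wearable or phone.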
Lessons Learned: App-level data is heterogeneous and generated from distributed mobile devices, and there is a trend to offload the inference process to these devices. However, due to computational and battery power limitations, models employed in the edge-based scenario are constrained to lightweight architectures, which are less suitable for complex tasks. Therefore, the trade-off between model complexity and accuracy should be carefully considered [67]. Numerous efforts were made towards tailoring deep learning to mobile devices, in order to make algorithms faster and less energy-consuming on embedded equipment. For example, model compression, pruning, and quantization are commonly used for this purpose. Mobile device manufacturers are also developing new software and hardware to support deep learning based applications. We will discuss this work in more detail in Section VII. At the same time, app-level data usually contains important user information and processing this poses significant privacy concerns. Although there have been efforts that commit to preserving user privacy, as we discuss in Section VI-H, research efforts in this direction are new, especially in terms of protecting user information in distributed training. We expect more efforts in this direction in the future.

Fig. 13. Illustration of mobility analysis paradigms at individual (left) and group (right) levels.

D. Deep Learning Driven Mobility Analysis

Understanding movement patterns of groups of human beings and individuals is becoming crucial for epidemiology, urban planning, public service provisioning, and mobile network resource management [483]. Deep learning is gaining increasing attention in this area, both from a group and individual level perspective (see Fig. 13). In this subsection, we thus discuss research using deep learning in this space, which we summarize in Table XIII.

Since deep learning is able to capture spatial dependencies in sequential data, it is becoming a powerful tool for mobility analysis. The applicability of deep learning to trajectory prediction is studied in [484]. By sharing representations learned by an RNN and a Gated Recurrent Unit (GRU), the framework can perform multi-task learning on both social network and mobile trajectory modeling. Specifically, the authors first use deep learning to reconstruct social network representations of users, subsequently employing RNN and GRU models to learn patterns of mobile trajectories with different time granularity. Importantly, these two components jointly share the representations learned, which tightens the overall architecture and enables efficient implementation. Ouyang et al. argue that mobility data are normally high-dimensional, which may be problematic for traditional ML models. Therefore, they build upon deep learning advances and propose an online learning scheme to train a hierarchical CNN architecture, allowing model parallelization for data stream processing [294]. By analyzing usage records, their framework "DeepSpace" predicts individuals' trajectories with much higher accuracy as compared to naive CNNs, as shown with experiments on a real-world dataset. Tkačík and Kordík [307] design a Neural Turing Machine [485] to predict trajectories of individuals using mobile phone data. The Neural Turing Machine embraces two major components: a memory module to store the historical trajectories, and a controller to manage the "read" and "write" operations over the memory. Experiments show that their architecture achieves superior generalization over stacked RNN and LSTM, while also delivering more


TABLE XIII
A Summary of Work on Deep Learning Driven Mobility Analysis

precise trajectory prediction than the n-grams and k nearest neighbor methods.

Instead of focusing on individual trajectories, Song et al. [296] shed light on mobility analysis at a larger scale. In their work, LSTM networks are exploited to jointly model the city-wide movement patterns of a large group of people and vehicles. Their multi-task architecture demonstrates superior prediction accuracy over a standard LSTM. City-wide mobility patterns are also researched in [297], where Zhang et al. architect deep spatio-temporal residual networks to forecast the movements of crowds. In order to capture the unique characteristics of spatio-temporal correlations associated with human mobility, their framework abandons RNN-based models and constructs three ResNets to extract nearby and distant spatial dependencies within a city. This scheme learns temporal features and fuses the representations extracted by all models for the final prediction. By incorporating external events information, their proposal achieves the highest accuracy among all deep learning and non-deep learning methods studied. An RNN is also employed in [309], where Jiang et al. perform short-term urban mobility forecasting on a huge dataset collected from a real-world deployment. Their model delivers superior accuracy over the n-gram and Markovian approaches.

Lin et al. [229] consider generating human movement chains from cellular data, to support transportation planning. In particular, they first employ an input-output Hidden Markov Model (HMM) to label activity profiles for CDR data pre-processing. Subsequently, an LSTM is designed for activity chain generation, given the labeled activity sequences. They further synthesize urban mobility plans using the generative model and the simulation results reveal reasonable fit accuracy.
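At their core, many of the recurrent approaches above reduce to predicting the next visited region from a sequence of past regions. The following sketch illustrates this pattern (our own code, not any specific cited system; the number of regions, sequence length and layer sizes are assumptions):

import torch
import torch.nn as nn

class NextRegionLSTM(nn.Module):
    """Predict a user's next region from a sequence of visited region IDs."""
    def __init__(self, n_regions=400, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_regions, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_regions)

    def forward(self, seq):                # seq: (batch, steps) of region indices
        h, _ = self.lstm(self.embed(seq))
        return self.out(h[:, -1])          # logits over candidate next regions

model = NextRegionLSTM()
trajectories = torch.randint(0, 400, (16, 24))  # 16 users, 24 past locations each
print(model(trajectories).shape)                # torch.Size([16, 400])

Discretizing a city into a region vocabulary turns mobility forecasting into a sequence classification problem, which is precisely where recurrent architectures excel.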


Jiang et al. [311] design a 24-h mobility prediction system based on RNN models. They employ dynamic Regions of Interest (ROIs) for each hour, discovered through divide-and-merge mining from a raw trajectory database, which leads to high prediction accuracy. Feng et al. incorporate attention mechanisms into an RNN [312], to capture the complicated sequential transitions of human mobility. By combining the heterogeneous transition regularity and multi-level periodicity, their model delivers up to 10% accuracy improvement compared to state-of-the-art forecasting models.

Yayeh et al. [301] employ an MLP to predict the mobility of mobile devices in mobile ad-hoc networks, given previously observed pause time, speed, and movement direction. Simulations conducted using the random waypoint mobility model show that their proposal achieves high prediction accuracy. An MLP is also adopted in [308], where Kim and Song model the relationship between human mobility and personality, and achieve high prediction accuracy. Yao et al. [304] discover groups of similar trajectories to facilitate higher-level mobility driven applications using RNNs. Particularly, a sequence-to-sequence AE is adopted to learn fixed-length representations of mobile users' trajectories. Experiments show that their method can effectively capture spatio-temporal patterns in both real and synthetic datasets. Shao et al. [300] design a sophisticated pedometer using a CNN. By reducing false negative steps caused by periodic movements, their proposal significantly improves the robustness of the pedometer.

Chen et al. [302] combine GPS records and traffic accident data to understand the correlation between human mobility and traffic accidents. To this end, they design a stacked denoising AE to learn a compact representation of the human mobility, and subsequently use that to predict the traffic accident risk. Their proposal can deliver accurate, real-time predictions across large regions. GPS records are also used in other mobility-driven applications. Song et al. [303] employ DBNs to predict and simulate human emergency behavior and mobility in natural disasters, learning from GPS records of 1.6 million users. Their proposal yields accurate predictions in different disaster scenarios such as earthquakes, tsunamis, and nuclear accidents. GPS data is also utilized in [305], where Liu et al. study the potential of employing deep learning for urban traffic prediction using mobility data.

Lessons Learned: Mobility analysis is concerned with the movement trajectories of individual users or large groups of users. The data of interest are essentially time series, but have an additional spatial dimension. Mobility data is usually subject to stochasticity, loss, and noise; therefore precise modelling is not straightforward. As deep learning is able to perform automatic feature extraction, it becomes a strong candidate for human mobility modelling. Among deep learning architectures, CNNs and RNNs are the most successful in such applications (e.g., [229] and [294]–[297]), as they can effectively exploit spatial and temporal correlations.

E. Deep Learning Driven User Localization

Location-based services and applications (e.g., mobile AR, GPS) demand precise individual positioning technology [486].

Fig. 14. An illustration of device-based (left) and device-free (right) indoor localization systems.

As a result, research on user localization is evolving rapidly and numerous techniques are emerging [487]. In general, user localization methods can be categorized as device-based and device-free [488]. We illustrate the two different paradigms in Fig. 14. Specifically, in the first category, specific devices carried by users become prerequisites for fulfilling the applications' localization function. These approaches rely on signals from the device to identify the location. Conversely, approaches that require no device pertain to the device-free category. Instead, these employ special equipment to monitor signal changes, in order to localize the entities of interest. Deep learning can enable high localization accuracy with both paradigms. We summarize the most notable contributions in Table XIV and delve into the details of these works next.

To overcome the variability and coarse-granularity limitations of signal strength based methods, Wang et al. [313] propose a deep learning driven fingerprinting system named "DeepFi" to perform indoor localization based on Channel State Information (CSI). Their toolbox yields much higher accuracy as compared to traditional methods, including FIFS [489], Horus [490], and Maximum Likelihood [491]. The same group of authors extend their work in [274], [275], [314], and [315], where they update the localization system, such that it can work with calibrated phase information of CSI [274], [275], [325]. They further use more sophisticated CNN [314], [333] and bi-modal structures [315] to improve the accuracy.

Nowicki and Wietrzykowski [316] propose a localization framework that significantly reduces the effort of system tuning or filtering and obtains satisfactory prediction performance. Wang et al. [318] suggest that the objective of indoor localization can be achieved without the help of mobile devices. They employ an AE to learn useful patterns from WiFi signals. By automatic feature extraction, they produce a predictor that can fulfill multiple tasks simultaneously, including indoor localization, activity, and gesture recognition. A similar work is presented in [328], where Zhou et al. employ an MLP structure to perform device-free indoor localization using CSI. Kumar et al. [322] use deep learning to address the problem of indoor vehicle localization. They employ CNNs to analyze visual signal and localize vehicles in a car park. This can help


TABLE XIV
Leveraging Deep Learning in User Localization

driver assistance systems operate in underground environments where the system has limited vision ability.

Xiao et al. [334] achieve low cost indoor localization with Bluetooth technology. The authors design a denoising AE to extract fingerprint features from the received signal strength of Bluetooth Low Energy beacons and subsequently project these to the exact position in 3D space. Experiments conducted in a conference room demonstrate that the proposed framework can perform precise positioning in both vertical and horizontal dimensions in real-time. Niitsoo et al. [332] employ a CNN to perform localization given raw channel impulse response data. Their framework is robust to multipath propagation environments and more precise than signal processing based approaches. A CNN is also adopted in [331], where the authors work with received signal strength series and achieve 100% prediction accuracy in terms of building and


floor identification. The work in [330] combines deep learning with linear discriminant analysis for feature reduction, achieving low positioning errors in multi-building environments. Zhang et al. [329] combine pervasive magnetic field and WiFi fingerprinting for indoor localization using an MLP. Experiments show that adding magnetic field information to the input of the model can improve the prediction accuracy, compared to solutions based solely on WiFi fingerprinting.

Fig. 15. EZ-Sleep setup in a subject's bedroom. Figure adopted from [335].

Hsu et al. [335] use deep learning to provide Radio Frequency-based user localization, sleep monitoring, and insomnia analysis in multi-user home scenarios where individual sleep monitoring devices might not be available. They use a CNN classifier with a 14-layer residual network model for sleep monitoring, in addition to Hidden Markov Models, to accurately track when the user enters or leaves the bed. By deploying sleep sensors called EZ-Sleep in 8 homes (see Fig. 15), collecting data for 100 nights of sleep over a month, and cross-validating this using an electroencephalography-based sleep monitor, the authors demonstrate that the performance of their solution is comparable to that of individual, electroencephalography-based devices.

Most mobile devices can only produce unlabeled position data, therefore unsupervised and semi-supervised learning become essential. Mohammadi et al. address this problem by leveraging DRL and VAE. In particular, their framework envisions a virtual agent in indoor environments [319], which can constantly receive state information during training, including signal strength indicators, the current agent location, and the real (labeled data) and inferred (via a VAE) distance to the target. The agent can virtually move in eight directions at each time step. Each time it takes an action, the agent receives a reward signal, identifying whether it moves in a correct direction. By employing deep Q learning, the agent can finally localize a user accurately, given both labeled and unlabeled data.

Beyond indoor localization, there also exist several research works that apply deep learning in outdoor scenarios. For example, Zheng and Weng [323] introduce a lightweight developmental network for outdoor navigation applications on mobile devices. Compared to CNNs, their architecture requires 100 times fewer weights to be updated, while maintaining decent accuracy. This enables efficient outdoor navigation on mobile devices. Work in [112] studies localization under both indoor and outdoor environments. The authors use an AE to pre-train a four-layer MLP, in order to avoid hand-crafted feature engineering. The MLP is subsequently used to estimate the coarse position of targets. The authors further introduce an HMM to fine-tune the predictions based on temporal properties of the data. This improves the accuracy estimation in both indoor and outdoor positioning with Wi-Fi signals. More recently, Shokry et al. [327] propose DeepLoc, a deep learning-based outdoor localization system using crowdsensed geo-tagged received signal strength information. By using an MLP to learn the correlation between cellular signal and users' locations, their framework can deliver median localization accuracy within 18.8 m in urban areas and within 15.7 m in rural areas on Android devices, while requiring modest energy budgets.
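Many of the fingerprinting systems above share a common skeleton: learn a compact representation of the radio measurements, then map it to a surveyed reference location. The sketch below is a simplified illustration of this idea (not the actual DeepFi implementation [313]; the number of access points and reference locations are assumptions):

import torch
import torch.nn as nn

class FingerprintLocalizer(nn.Module):
    """AE-style feature extractor plus a classifier over reference locations."""
    def __init__(self, n_aps=100, n_locations=50):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_aps, 64), nn.ReLU(),
                                     nn.Linear(64, 16), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                                     nn.Linear(64, n_aps))
        self.classifier = nn.Linear(16, n_locations)

    def forward(self, rss):
        z = self.encoder(rss)
        return self.decoder(z), self.classifier(z)

model = FingerprintLocalizer()
rss = torch.randn(8, 100)              # received signal strength vectors
recon, logits = model(rss)
# Training mixes a reconstruction loss (unsupervised pre-training on
# abundant unlabeled scans) with cross-entropy on surveyed reference points.
loss = nn.functional.mse_loss(recon, rss) + \
       nn.functional.cross_entropy(logits, torch.randint(0, 50, (8,)))

The unsupervised reconstruction term lets the model exploit the large volume of unlabeled measurements that mobile devices naturally produce, which matters because site surveys are expensive.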
Lessons Learned: Localization relies on sensorial output, signal strength, or CSI. These data usually have complex features, therefore large amounts of data are required for learning [316]. As deep learning can extract features in an unsupervised manner, it has become a strong candidate for localization tasks. On the other hand, it can be observed that positioning accuracy and system robustness can be improved by fusing multiple types of signals when providing these as the input (e.g., [329]). Using deep learning to automatically extract features and correlate information from different sources for localization purposes is becoming a trend.

Fig. 16. An example framework for WSN data collection and (centralized and decentralized) analysis.

F. Deep Learning Driven Wireless Sensor Networks

Wireless Sensor Networks (WSNs) consist of a set of unique or heterogeneous sensors that are distributed over geographical regions. These sensors collaboratively monitor physical or environmental status (e.g., temperature, pressure, motion, pollution, etc.) and transmit the data collected to centralized servers through wireless channels (see top circle in Fig. 9 for an illustration). A WSN typically involves three key core tasks, namely sensing, communication and analysis. Deep learning is becoming increasingly popular also for WSN applications [348]. In what follows, we review works adopting deep learning in this domain, covering different angles, namely: centralized vs. decentralized analysis paradigms, WSN data analysis per se, WSN localization, and other applications. Note that the contributions of these works are distinct from mobile


TABLE XV
A Summary of Work on Deep Learning Driven WSNs

data analysis discussed in Sections VI-B and VI-C, as in this subsection we only focus on WSN applications. We begin by summarizing the most important works in Table XV.

Centralized vs Decentralized Analysis Approaches: There exist two data processing scenarios in WSNs, namely centralized and decentralized. The former simply takes sensors as


data collectors, which are only responsible for gathering data and sending these to a central location for processing. The latter assumes sensors have some computational ability and the main server offloads part of the jobs to the edge, with each sensor performing data processing individually. We show an example framework for WSN data collection and analysis in Fig. 16, where sensor data is collected via various nodes in a field of interest. Such data is delivered to a sink node, which aggregates and optionally further processes this. Work in [345] focuses on the centralized approach and the authors apply a 3-layer MLP to reduce data redundancy while maintaining essential points for data aggregation. These data are sent to a central server for analysis. In contrast, Li et al. [346] propose to distribute data mining to individual sensors. They partition a deep neural network into different layers and offload layer operations to sensor nodes. Simulations conducted suggest that, by pre-processing with NNs, their framework obtains high fault detection accuracy, while reducing power consumption at the central server.

WSN Localization: Localization is also an important and challenging task in WSNs. Chuang and Jiang [337] exploit neural networks to localize sensor nodes in WSNs. To adapt deep learning models to the specific network topology, they employ an online training scheme and correlated topology-trained data, enabling efficient model implementations and accurate location estimation. Based on this, Bernas and Płaczek [338] architect an ensemble system that involves multiple MLPs for location estimation in different regions of interest. In this scenario, node locations inferred by multiple MLPs are fused by a fusion algorithm, which improves the localization accuracy, particularly benefiting sensor nodes that are around the boundaries of regions. A comprehensive comparison of different training algorithms that apply MLP-based node localization is presented in [339]. Experiments suggest that the Bayesian regularization algorithm in general yields the best performance. Dong et al. [340] consider an underwater node localization scenario. Since acoustic signals are subject to loss caused by absorption, scattering, noise, and interference, underwater localization is not straightforward. By adopting a deep neural network, their framework successfully addresses the aforementioned challenges and achieves higher inference accuracy as compared to SVM and generalized least square methods.

Phoemphon et al. [350] combine a fuzzy logic system and an ELM via a particle swarm optimization technique to achieve robust range-free location estimation for sensor nodes. In particular, the fuzzy logic system is employed for adjusting the weights of traditional centroids, while the ELM is used for optimizing the localization precision. Their method achieves superior accuracy over other soft computing-based approaches. Similarly, Banihashemian et al. [351] employ the particle swarm optimization technique combined with MLPs to perform range-free WSN localization, which achieves low localization error. Kang et al. [353] shed light on water leakage detection and localization in water distribution systems. They represent the water pipeline network as a graph and assume leakage events occur at vertices. They combine a CNN with an SVM to perform detection and localization on a wireless sensor network testbed, achieving 99.3% leakage detection accuracy and a localization error of less than 3 meters.
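The MLP-based node localization approaches above essentially regress node coordinates from received signal strength measurements. A minimal sketch of this formulation follows (our own illustration; the anchor count and field size are assumptions, not parameters from the cited works):

import torch
import torch.nn as nn

# Hypothetical setup: each node hears 8 anchor nodes and regresses
# its own (x, y) coordinates from the corresponding RSSI vector.
locator = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),                  # estimated (x, y) position
)
rssi = torch.randn(64, 8)              # training batch of RSSI vectors
positions = torch.rand(64, 2) * 100    # ground truth in a 100 x 100 m field
loss = nn.functional.mse_loss(locator(rssi), positions)
loss.backward()

Because the mapping from signal strength to position is highly non-linear (attenuation, obstacles, multipath), a learned regressor of this kind often outperforms simple geometric triangulation.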
WSN Data Analysis: Deep learning has also been exploited for identification of smoldering and flaming combustion phases in forests. Yan et al. [341] embed a set of sensors into a forest to monitor CO2, smoke, and temperature. They suggest that various burning scenarios will emit different gases, which can be taken into account when classifying smoldering and flaming combustion. Wang et al. [342] consider deep learning to correct inaccurate measurements of air temperature. They discover a close relationship between solar radiation and actual air temperature, which can be effectively learned by neural networks. Sun et al. [352] employ a Wavelet neural network based solution to evaluate radio link quality in WSNs on smart grids. Their proposal is more precise than traditional approaches and can provide end-to-end reliability guarantees to smart grid applications.

Missing data or de-synchronization are common in WSN data collection. These may lead to serious problems in analysis due to inconsistency. Lee et al. [343] address this problem by plugging a query refinement component into deep learning based WSN analysis systems. They employ exponential smoothing to infer missing data, thereby maintaining the integrity of data for deep learning analysis without significantly compromising accuracy. To enhance the intelligence of WSNs, Li and Serpen [344] embed an artificial neural network into a WSN, allowing it to agilely react to potential changes following deployment in the field. To this end, they employ a minimum weakly-connected dominating set to represent the WSN topology, and subsequently use a Hopfield recurrent neural network as a static optimizer, to adapt the network infrastructure to potential changes as necessary. This work represents an important step towards embedding machine intelligence in WSNs.

Other Applications: The benefits of deep learning have also been demonstrated in other WSN applications. The work in [349] focuses on reducing energy consumption while maintaining security in wireless multimedia sensor networks. A stacked AE is employed to categorize images in the form of continuous pieces, and subsequently send the data over the network. This enables faster data transfer rates and lower energy consumption. Mehmood et al. [354] employ MLPs to achieve robust routing in WSNs, so as to facilitate pollution monitoring. Their proposal uses the NN to provide an efficiency threshold value and switches nodes that consume less energy than this threshold, thereby improving energy efficiency. Alsheikh et al. [355] introduce an algorithm for WSNs that uses AEs to minimize the energy expenditure. Their architecture exploits spatio-temporal correlations to reduce the dimensions of raw data and provides reconstruction error bound guarantees.
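The energy-saving principle behind AE-based compression is simple: transmit only the low-dimensional code and reconstruct at the sink. The sketch below illustrates it under assumed dimensions (our own code, not the exact architecture of [355]):

import torch
import torch.nn as nn

class SensorAE(nn.Module):
    """Compress correlated readings at the sensor; reconstruct at the sink."""
    def __init__(self, n_inputs=30, code=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, 12), nn.ReLU(),
                                     nn.Linear(12, code))
        self.decoder = nn.Sequential(nn.Linear(code, 12), nn.ReLU(),
                                     nn.Linear(12, n_inputs))
    def forward(self, x):
        return self.decoder(self.encoder(x))

ae = SensorAE()
readings = torch.randn(128, 30)        # windows of correlated measurements
recon = ae(readings)
error = nn.functional.mse_loss(recon, readings)
# Only the 4-dimensional code would be sent over the radio, cutting the
# payload (and hence the transmission energy) by a factor of ~7.5, at the
# cost of a bounded reconstruction error.

Since radio transmission typically dominates a sensor node's energy budget, even a modest compression ratio translates into a substantial lifetime extension.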
Wang et al. [357] design a dedicated projection-recovery neural network to blindly calibrate sensor measurements in an online manner. Their proposal can automatically extract features from sensor data and exploit spatial and temporal correlations among information from all sensors, to achieve high accuracy. This is the first effort that adopts deep learning for WSN data calibration. Jia et al. [358] shed light on ammonia monitoring using deep learning. In their design, an

LSTM is employed to predict the sensors’ electrical resistance


during a very short heating pulse, without waiting for settling
in an equilibrium state. This dramatically reduces the energy
consumption of sensors in the waiting process. Experiments
with 38 prototype sensors and a home-built gas flow system
show that the proposed LSTM can deliver precise prediction
of equilibrium state resistance under different ammonia con-
centrations, cutting down the overall energy consumption by
approximately 99.6%.
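In essence, this is a sequence-to-scalar regression: an RNN reads a short transient and predicts the steady-state value. A minimal sketch of that formulation is shown below (our own illustration; the pulse length and hidden size are assumptions, not parameters from [358]):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
readout = nn.Linear(32, 1)

pulse = torch.randn(16, 50, 1)         # 50 early samples of the heating transient
h, _ = lstm(pulse)
steady_state = readout(h[:, -1])       # predicted equilibrium resistance
target = torch.randn(16, 1)            # measured equilibrium values (training)
loss = nn.functional.mse_loss(steady_state, target)

Predicting the equilibrium point from the first few samples is what allows the sensor heater to be switched off early, which is where the energy saving comes from.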
Lessons learned: The centralized and decentralized WSN
data analysis paradigms resemble the cloud and fog computing
philosophies in other areas. Decentralized methods exploit the
computing ability of sensor nodes and perform light processing
and analysis locally. This offloads the burden on the cloud
and significantly reduces the data transmission overheads and
storage requirements. However, at the moment, the centralized
approach dominates the WSN data analysis landscape. As deep
learning implementation on embedded devices becomes more
accessible, in the future we expect to witness a growth in the
popularity of the decentralized schemes.
On the other hand, looking at Table XV, it is interesting
to see that the majority of deep learning practices in WSNs
employ MLP models. Since MLP is straightforward to archi-
tect and performs reasonably well, it remains a good candidate
for WSN applications. However, since most sensor data col-
lected is sequential, we expect RNN-based models will play
a more important role in this area.

Fig. 17. Principles of three control approaches applied in mobile and wireless network control, namely reinforcement learning (above), imitation learning (middle), and analysis-based control (below).

G. Deep Learning Driven Network Control

In this part, we turn our attention to mobile network control problems. Due to its powerful function approximation mechanism, deep learning has made remarkable breakthroughs in improving traditional reinforcement learning [26] and imitation learning [492]. These advances have the potential to solve mobile network control problems which are complex and previously considered intractable [493], [494]. Recall that in reinforcement learning, an agent continuously interacts with the environment to learn the best action. With constant exploration and exploitation, the agent learns to maximize its expected return. Imitation learning follows a different learning paradigm called "learning by demonstration". This learning paradigm relies on a 'teacher' who tells the agent what action should be executed under certain observations during the training. After sufficient demonstrations, the agent learns a policy that imitates the behavior of the teacher and can operate standalone without supervision. For instance, an agent is trained to mimic human behaviour (e.g., in applications such as game play, self-driving vehicles, or robotics), instead of learning by interacting with the environment, as in the case of pure reinforcement learning. This is because in such applications, making mistakes can have fatal consequences [27].

Beyond these two approaches, analysis-based control is gaining traction in mobile networking. Specifically, this scheme uses ML models for network data analysis, and subsequently exploits the results to aid network control. Unlike reinforcement/imitation learning, analysis-based control does not directly output actions. Instead, it extracts useful information and delivers this to an agent, to execute the actions. We illustrate the principles of the three control paradigms in Fig. 17. We review works proposed so far in this space next, and summarize these efforts in Table XVI.

Network Optimization refers to the management of network resources and functions in a given environment, with the goal of improving the network performance. Deep learning has recently achieved several successful results in this area. For example, Liu et al. [359] exploit a DBN to discover the correlations between multi-commodity flow demand information and link usage in wireless networks. Based on the predictions made, they remove the links that are unlikely to be scheduled, so as to reduce the size of data for the demand constrained energy minimization. Their method reduces runtime by up to 50%, without compromising optimality. Subramanian and Banerjee [360] propose to use deep learning to predict the health condition of heterogeneous devices in machine to machine communications. The results obtained are subsequently exploited for optimizing health aware policy change decisions.

He et al. [361], [362] employ deep reinforcement learning to address caching and interference alignment problems in wireless networks. In particular, they treat time-varying channels as finite-state Markov channels and apply deep Q networks


TABLE XVI
A Summary of Work on Deep Learning Driven Network Control

to learn the best user selection policy. This novel framework demonstrates significantly higher sum rate and energy efficiency over existing approaches. Chen et al. [366] shed light on automatic traffic optimization using a deep reinforcement learning approach. Specifically, they architect a two-level DRL framework, which imitates the Peripheral and Central


Nervous Systems in animals, to address scalability problems at datacenter scale. In their design, multiple peripheral systems are deployed on all end-hosts, so as to make decisions locally for short traffic flows. A central system is further employed to decide on the optimization of long traffic flows, which are more tolerant to longer delay. Experiments in a testbed with 32 servers suggest that the proposed design reduces the traffic optimization turn-around time and flow completion time significantly, compared to existing approaches.

Routing: Deep learning can also improve the efficiency of routing rules. Lee [367] exploits a 3-layer deep neural network to classify node degree, given detailed information of the routing nodes. The classification results along with temporary routes are exploited for subsequent virtual route generation using the Viterbi algorithm. Mao et al. [187] employ a DBN to decide the next routing node and construct a software defined router. By considering Open Shortest Path First as the optimal routing strategy, their method achieves up to 95% accuracy, while reducing significantly the overhead and delay, and achieving higher throughput with a signaling interval of 240 milliseconds. In follow up work, the authors use tensors to represent hidden layers, weights and biases in DBNs, which further improves the routing performance [396].

A similar outcome is obtained in [295], where Yang et al. employ Hopfield neural networks for routing, achieving better usability and survivability in mobile ad hoc network application scenarios. Geyer and Carle [397] represent the network using graphs, and design a dedicated Graph-Query NN to address the distributed routing problem. This novel architecture takes graphs as input and uses message passing between nodes in the graph, allowing it to operate with various network topologies. Pham et al. [404] shed light on routing protocols in knowledge-defined networking, using a Deep Deterministic Policy Gradient algorithm based on reinforcement learning. Their agent takes traffic conditions as input and incorporates QoS into the reward function. Simulations show that their framework can effectively learn the correlations between traffic flows, which leads to better routing configurations.
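The message passing at the heart of graph-based routing models can be written in a few lines. The sketch below is a generic, dependency-free illustration of one such update (not the Graph-Query NN of [397]; the topology, feature semantics and weights are assumptions):

import torch

def message_passing(node_feats, adjacency, w_self, w_neigh):
    """One round of message passing: nodes aggregate neighbour features."""
    neigh_sum = adjacency @ node_feats            # sum of neighbour features
    return torch.relu(node_feats @ w_self + neigh_sum @ w_neigh)

n, d = 6, 8                                       # 6 routers, 8-dim features
adjacency = (torch.rand(n, n) < 0.4).float()      # random directed topology
adjacency.fill_diagonal_(0)
feats = torch.randn(n, d)                         # e.g., queue lengths, link loads
w_self, w_neigh = torch.randn(d, d), torch.randn(d, d)
for _ in range(3):                                # 3 rounds propagate 3-hop info
    feats = message_passing(feats, adjacency, w_self, w_neigh)
scores = feats @ torch.randn(d, 1)                # per-node next-hop scores

Because the learned weights are shared across all nodes and edges, the same trained model can be applied to topologies of different sizes, which is precisely what makes graph-based designs attractive for routing.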
Scheduling: There are several studies that investigate scheduling with deep learning. Zhang et al. [369] introduce a deep Q learning-powered hybrid dynamic voltage and frequency scaling scheduling mechanism, to reduce the energy consumption in real-time systems (e.g., Wi-Fi, IoT, video applications). In their proposal, an AE is employed to approximate the Q function and the framework performs experience replay [496] to stabilize the training process and accelerate convergence. Simulations demonstrate that this method reduces by 4.2% the energy consumption of a traditional Q learning based method. Similarly, the work in [370] uses deep Q learning for scheduling in roadside communications networks. In particular, interactions with vehicular environments, including the sequence of actions, observations, and reward signals, are formulated as an MDP. By approximating the Q value function, the agent learns a scheduling policy that achieves lower latency and busy time, and longer battery life, compared to traditional scheduling methods.
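The deep Q learning loop with experience replay [496] used by these schedulers follows a standard pattern, sketched below in generic form (our own illustrative code, with an assumed 4-dimensional state and 3 discrete actions; the environment interaction is omitted):

import random
from collections import deque
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)          # experience replay buffer
gamma = 0.99                           # discount factor

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    # Sampling old transitions at random breaks temporal correlations,
    # which stabilizes training.
    s, a, r, s2 = zip(*random.sample(list(replay), batch_size))
    s, s2 = torch.stack(s), torch.stack(s2)
    a, r = torch.tensor(a), torch.tensor(r, dtype=torch.float32)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * q_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Interaction loop (simulator omitted): append (state, action, reward,
# next_state) tuples to `replay`, then call train_step() repeatedly.

In a scheduling setting, the state would encode, e.g., queue occupancy and channel quality, the actions would be scheduling decisions, and the reward would reflect latency or energy objectives.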
More recently, Chinchali et al. [371] present a policy gradient based scheduler to optimize the cellular network traffic flow. Specifically, they cast the scheduling problem as an MDP and employ RF to predict network throughput, which is subsequently used as a component of a reward function. Evaluations with a realistic network simulator demonstrate that this proposal can dynamically adapt to traffic variations, which enables mobile networks to carry 14.7% more data traffic, while outperforming heuristic schedulers by more than 2×. Wei et al. [372] address user scheduling and content caching simultaneously. In particular, they train a DRL agent, consisting of an actor for deciding which base station should serve certain content, and whether to save the content. A critic is further employed to estimate the value function and deliver feedback to the actor. Simulations over a cluster of base stations show that the agent can yield low transmission delay. Li et al. [393] shed light on resource allocation in a multi-user mobile computing scenario. They employ a deep Q learning framework to jointly optimize the offloading decision and computational resource allocation, so as to minimize the sum cost of delay and energy consumption of all user equipment. Simulations show that their proposal can reduce the total cost of the system, as compared to fully-local, fully-offloading, and naive Q-learning approaches.

Resource Allocation: Sun et al. use a deep neural network to approximate the mapping between the input and output of the Weighted Minimum Mean Square Error resource allocation algorithm [497], in interference-limited wireless network environments [373]. By effective imitation learning, the neural network approximation achieves performance close to that of its teacher. Deep learning has also been applied to cloud radio access networks, with Xu et al. [374] employing deep Q learning to determine the on/off modes of remote radio heads, given the current mode and user demand. Comparisons with single base station association and fully coordinated association methods suggest that the proposed DRL controller allows the system to satisfy user demand while requiring significantly less energy.

Ferreira et al. [375] employ deep State-Action-Reward-State-Action (SARSA) to address resource allocation management in cognitive communications. By forecasting the effects of radio parameters, this framework avoids wasted trials of poor parameters, which reduces the computational resources required. Mennes et al. [394] employ MLPs to precisely forecast free slots in a Multiple Frequencies Time Division Multiple Access (MF-TDMA) network, thereby achieving efficient scheduling. The authors conduct simulations with a network deployed in a 100×100 room, showing that their solution can effectively reduce collisions by half. Zhou et al. [395] adopt LSTMs to predict traffic load at base stations in ultra dense networks. Based on the predictions, their method changes the resource allocation policy to avoid congestion, which leads to lower packet loss rates, and higher throughput and mean opinion scores.

Radio Control: Naparstek and Cohen [378] address the dynamic spectrum access problem in multichannel wireless network environments using deep reinforcement learning. In this setting, they incorporate an LSTM into a deep Q network, to maintain and memorize historical observations, allowing


the architecture to perform precise state estimation, given partial observations. The training process is distributed to each user, which enables effective training parallelization and the learning of good policies for individual users. Experiments demonstrate that this framework achieves double the channel throughput, when compared to a benchmark method. Yu et al. apply deep reinforcement learning to address challenges in wireless multiple access control [389], recognizing that in such tasks DRL agents are fast in terms of convergence and robust against non-optimal parameter settings. Li et al. investigate power control for spectrum sharing in cognitive radios using DRL. In their design, a DQN agent is built to adjust the transmit power of a cognitive radio system, such that the overall signal-to-interference-plus-noise ratio is maximized.

The work in [379] sheds light on the radio control and signal detection problems. In particular, the authors introduce a radio signal search environment based on the Gym Reinforcement Learning platform. Their agent exhibits a steady learning process and is able to learn a radio signal search policy. Rutagemwa et al. [381] employ an RNN to perform traffic prediction, which can subsequently aid the dynamic spectrum assignment in mobile networks. With accurate traffic forecasting, their proposal improves the performance of spectrum sharing in dynamic wireless environments, as it attains near-optimal spectrum assignments. Liu et al. [403] approach the anti-jamming communications problem in dynamic and unknown environments with a DRL agent. Their system is based on a DQN with CNN, where the agent takes raw spectrum information as input and requires limited prior knowledge about the environment, in order to improve the overall throughput of the network in such adversarial circumstances.

Luong et al. incorporate the blockchain technique into cognitive radio networking [398], employing a double DQN agent to maximize the number of successful transaction transmissions for secondary users, while minimizing the channel cost and transaction fees. Simulations show that the DQN method significantly outperforms naive Q learning in terms of successful transactions, channel cost, and learning speed. DRL can further attack problems in the satellite communications domain. Ferreira et al. [495] fuse multi-objective reinforcement learning [405] with deep neural networks to select among multiple radio transmitter settings while attempting to achieve multiple conflicting goals, in a dynamically changing satellite communications channel. Specifically, two sets of NNs are employed to execute exploration and exploitation separately. This builds an ensembling system, which makes the framework more robust to the changing environment. Simulations demonstrate that their system can nearly optimize six different objectives (i.e., bit error rate, throughput, bandwidth, spectral efficiency, additional power consumption, and power efficiency), with only small performance errors compared to ideal solutions.

Other Applications: Deep learning is playing an important role in other network control problems as well. Mao et al. [383] develop the Pensieve system that generates adaptive video bit rate algorithms using deep reinforcement learning. Specifically, Pensieve employs a state-of-the-art deep reinforcement learning algorithm, A3C, which takes the bandwidth, bit rate and buffer size as input, and selects the bit rate that leads to the best expected return. The model is trained offline and deployed on an adaptive bit rate server, demonstrating that the system outperforms the best existing scheme by 12%-25% in terms of QoE. Liu et al. [391] apply deep Q learning to reduce the energy consumption in cellular networks. They train an agent to dynamically switch on/off base stations based on traffic consumption in areas of interest. An action-wise experience replay mechanism is further designed for balancing different traffic behaviours. Experiments show that their proposal can significantly reduce the energy consumed by base stations, outperforming naive table-based Q learning approaches. A control mechanism for unmanned aerial vehicles using DQN is proposed in [401], where multiple objectives are targeted: maximizing energy efficiency, communications coverage, fairness and connectivity. The authors conduct extensive simulations in a virtual playground, showing that their agent is able to learn the dynamics of the environment, achieving superior performance over random and greedy control baselines.

Kim and Kim [386] link deep learning with the load balancing problem in IoT. The authors suggest that DBNs can effectively analyze network load and process structural configuration, thereby achieving efficient load balancing in IoT. Challita et al. [387] employ a deep reinforcement learning algorithm based on echo state networks to perform path planning for a cluster of unmanned aerial vehicles. Their proposal yields lower delay than a heuristic baseline. Xu et al. [390] employ a DRL agent to learn from network dynamics how to control traffic flow. They advocate that DRL is suitable for this problem, as it performs remarkably well in handling dynamic environments and sophisticated state spaces. Simulations conducted over three network topologies confirm this viewpoint, as the DRL agent significantly reduces the delay, while providing throughput comparable to that of traditional approaches. Zhu et al. employ the A3C algorithm to address the caching problem in mobile edge computing. Their method obtains superior cache hit ratios and traffic offloading performance over three baseline caching methods. Several open challenges are also pointed out, which are worthy of future pursuit. The edge caching problem is also addressed in [402], where He et al. architect a DQN agent to perform dynamic orchestration of networking, caching, and computing. Their method facilitates high revenue for mobile virtual network operators.
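To make the policy-based approach concrete, the sketch below shows a single REINFORCE update for a toy bitrate selector (our own illustration, far simpler than Pensieve's A3C [383]; the state features, bitrate levels and reward are assumptions):

import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

state = torch.tensor([[2.5, 1.2, 0.6]])      # throughput, buffer, last bitrate
probs = torch.softmax(policy(state), dim=-1)
dist = torch.distributions.Categorical(probs)
action = dist.sample()                        # pick one of 4 bitrate levels
reward = torch.tensor(1.8)                    # e.g., QoE: quality - stall penalty
# REINFORCE: increase the log-probability of actions in proportion
# to the reward they obtained.
loss = -(dist.log_prob(action) * reward).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

Actor-critic methods such as A3C refine this estimator by subtracting a learned value baseline and by running many such interaction loops in parallel, which reduces gradient variance and speeds up training.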
Lessons learned: There exist three approaches to network control using deep learning, i.e., reinforcement learning, imitation learning, and analysis-based control. Reinforcement learning requires interacting with the environment, trying different actions and obtaining feedback in order to improve. The agent will make mistakes during training, and usually needs a large number of steps to become smart. Therefore, most works do not train the agent on the real infrastructure, as making mistakes usually can have serious consequences for the network. Instead, a simulator that mimics the real network environments is built and the agent is trained offline using that. This imposes high fidelity requirements on the simulator,


as the agent cannot work appropriately in an environment that is different from the one used for training. On the other hand, although DRL performs remarkably well in many applications, a considerable amount of time and computing resources are required to train a usable agent. This should be considered in real-life implementations.

In contrast, the imitation learning mechanism "learns by demonstration". It requires a teacher that provides labels telling the agent what it should do under certain circumstances. In the networking context, this mechanism is usually employed to reduce the computational time [187]. Specifically, in some network applications (e.g., routing), computing the optimal solution is time-consuming, which cannot satisfy the delay constraints of mobile networks. To mitigate this, one can generate a large dataset offline, and use an NN agent to learn the optimal actions.

Analysis-based control, on the other hand, is suitable for problems where decisions cannot be based solely on the state of the network environment. One can use an NN to extract additional information (e.g., traffic forecasts), which subsequently aids decisions. For example, dynamic spectrum assignment can benefit from analysis-based control.

H. Deep Learning Driven Network Security

With the increasing popularity of wireless connectivity, protecting users, network equipment and data from malicious attacks, unauthorized access and information leakage becomes crucial. Cyber security systems guard mobile devices and users through firewalls, anti-virus software, and Intrusion Detection Systems (IDS) [498]. The firewall is an access security gateway that allows or blocks the uplink and downlink network traffic, based on pre-defined rules. Anti-virus software detects and removes computer viruses, worms, Trojans and malware. IDSs identify unauthorized and malicious activities, or rule violations in information systems. Each performs its own functions to protect network communication, central servers and edge devices.

Modern cyber security systems benefit increasingly from deep learning [500], since it can enable the system to (i) automatically learn signatures and patterns from experience and generalize to future intrusions (supervised learning); or (ii) identify patterns that clearly differ from regular behavior (unsupervised learning). This dramatically reduces the effort of pre-defining rules for discriminating intrusions. Beyond protecting networks from attacks, deep learning can also be used for attack purposes, bringing huge potential to steal or crack user passwords or information. In this subsection, we review deep learning driven network security from three perspectives, namely infrastructure, software, and user privacy. Specifically, infrastructure level security work focuses on detecting anomalies that occur in the physical network, and software level work is centred on identifying malware and botnets in mobile networks. From the user privacy perspective, we discuss methods to protect against private information leakage, using deep learning. To our knowledge, no other reviews summarize these efforts. We summarize these works in Table XVII.

Infrastructure level security: We mostly focus on anomaly detection at the infrastructure level, i.e., identifying network events (e.g., attacks, unexpected access and use of data) that do not conform to expected behaviors. Many researchers exploit the outstanding unsupervised learning ability of AEs [406]. For example, Thing investigates features of attacks and threats that exist in IEEE 802.11 networks [186]. The author employs a stacked AE to categorize network traffic into 5 types (i.e., legitimate, flooding, injection and impersonation traffic), achieving 98.67% overall accuracy. The AE is also exploited in [407], where Aminanto and Kim use an MLP and stacked AE for feature selection and extraction, demonstrating remarkable performance. Similarly, Feng et al. [408] use AEs to detect abnormal spectrum usage in wireless communications. Their experiments suggest that the detection accuracy can significantly benefit from the depth of AEs.
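The common AE-based anomaly detection recipe behind these works is to train on normal traffic only and flag inputs that reconstruct poorly. A minimal sketch follows (our own illustration, not any cited system; the feature dimension and threshold are assumptions):

import torch
import torch.nn as nn

ae = nn.Sequential(
    nn.Linear(20, 8), nn.ReLU(),       # 20 per-flow features (assumed)
    nn.Linear(8, 20),
)
# 1) Train on normal traffic only, minimizing reconstruction error.
normal = torch.randn(256, 20)          # stand-in for benign flow features
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(100):
    loss = nn.functional.mse_loss(ae(normal), normal)
    opt.zero_grad()
    loss.backward()
    opt.step()
# 2) At test time, flows whose reconstruction error exceeds a threshold
#    calibrated on held-out normal data are flagged as anomalous.
test = torch.randn(5, 20)
err = ((ae(test) - test) ** 2).mean(dim=1)
alerts = err > 0.5                     # threshold value is an assumption

Because the model never needs labeled attack samples, this approach can in principle flag previously unseen attack types, which is its main advantage over signature-based detection.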
deep learning [500], since it can enable the system to environments. The true data labels are embedded into the
(i) automatically learn signatures and patterns from experi- decoder layers to assist final classification. Evaluations on the
ence and generalize to future intrusions (supervised learning); well-known NSL-KDD dataset [501] demonstrate that their
or (ii) identify patterns that are clearly differed from regular model achieves remarkable accuracy in identifying denial of
behavior (unsupervised learning). This dramatically reduces service, probing, remote to user and user to root attacks, out-
the effort of pre-defined rules for discriminating intrusions. performing traditional ML methods by 0.18 in terms of F1
Beyond protecting networks from attacks, deep learning can score. Hamedani et al. [413] employ MLPs to detect malicious
also be used for attack purposes, bringing huge potential to attacks in delayed feedback networks. The proposal achieves
steal or crack user passwords or information. In this subsec- more than 99% accuracy over 10,000 simulations.
tion, we review deep learning driven network security from Software level security: Nowadays, mobile devices are
three perspectives, namely infrastructure, software, and user carrying considerable amount of private information. This
privacy. Specifically, infrastructure level security work focuses information can be stolen and exploited by malicious apps
on detecting anomalies that occur in the physical network and installed on smartphones for ill-conceived purposes [502].
software level work is centred on identifying malware and bot- Deep learning is being exploited for analyzing and detecting
nets in mobile networks. From the user privacy perspective, we such threats.
discuss methods to protect from how to protect against private Yuan et al. [416] use both labeled and unlabeled mobile
information leakage, using deep learning. To our knowledge, apps to train an RBM. By learning from 300 samples,
no other reviews summarize these efforts. We summarize these their model can classify Android malware with remark-
works in Table XVII. able accuracy, outperforming traditional ML tools by up to
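To make the AE-based detection principle shared by several of the works above concrete, the following is a minimal sketch (not any specific authors' implementation) of reconstruction-error based anomaly detection: an AE is trained only on normal traffic feature vectors, and at test time inputs whose reconstruction error exceeds a threshold are flagged as anomalous. All dimensions and the threshold rule are illustrative.

```python
import torch
import torch.nn as nn

# Illustrative sketch of AE-based anomaly detection: train on normal
# traffic only, flag inputs that the AE fails to reconstruct well.
torch.manual_seed(0)

class AE(nn.Module):
    def __init__(self, n_features=41, n_hidden=8):  # 41 mimics NSL-KDD-style features
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, n_hidden))
        self.decoder = nn.Sequential(nn.Linear(n_hidden, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))
    def forward(self, x):
        return self.decoder(self.encoder(x))

normal_traffic = torch.rand(1024, 41)      # placeholder for normalized flow features
model, loss_fn = AE(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(50):                    # train to reconstruct normal traffic
    opt.zero_grad()
    loss = loss_fn(model(normal_traffic), normal_traffic)
    loss.backward()
    opt.step()

with torch.no_grad():                      # per-sample reconstruction error
    err = ((model(normal_traffic) - normal_traffic) ** 2).mean(dim=1)
    threshold = err.mean() + 3 * err.std() # simple 3-sigma rule (illustrative)

def is_anomalous(x):
    with torch.no_grad():
        e = ((model(x) - x) ** 2).mean(dim=1)
    return e > threshold
```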


Software level security: Nowadays, mobile devices carry a considerable amount of private information. This information can be stolen and exploited by malicious apps installed on smartphones for ill-conceived purposes [502]. Deep learning is being exploited for analyzing and detecting such threats.

Yuan et al. [416] use both labeled and unlabeled mobile apps to train an RBM. By learning from 300 samples, their model can classify Android malware with remarkable accuracy, outperforming traditional ML tools by up to 19%. Their follow-up research in [417], named Droiddetector, further improves the detection accuracy by 2%. Similarly, Su et al. [418] analyze essential features of Android apps, namely requested permissions, used permissions, sensitive application programming interface calls, actions and app components. They employ DBNs to extract features of malware and an SVM for classification, achieving high accuracy and only requiring 6 seconds per inference instance.


Hou et al. [419] attack the malware detection problem from a different perspective. Their research points out that signature-based detection is insufficient to deal with sophisticated Android malware. To address this problem, they propose the Component Traversal, which can automatically execute code routines to construct weighted directed graphs. By employing a stacked AE for graph analysis, their framework Deep4MalDroid can accurately detect Android malware that intentionally repackages and obfuscates itself to bypass signatures and hinder attempts to analyze its inner operations. This work is followed by that of Martinelli et al. [420], who exploit CNNs to discover the relationship between app types and syscall traces extracted from real mobile devices. The CNN has also been used in [422], where McLaughlin et al. draw inspiration from NLP and treat the disassembled byte-code of an app as a text for analysis. Their experiments demonstrate that CNNs can effectively learn to detect sequences of opcodes that are indicative of malware. Chen et al. [423] incorporate location information into the detection framework and exploit an RBM for feature extraction and classification. Their proposal improves the performance of other ML methods.

Botnets are another important threat to mobile networks. A botnet is effectively a network that consists of machines compromised by bots. These machines are usually under the control of a botmaster, who takes advantage of the bots to harm public services and systems [503]. Detecting botnets is challenging and is now becoming a pressing task in cyber security. Deep learning is playing an important role in this area. For example, Oulehla et al. [424] propose to employ neural networks to extract features from mobile botnet behaviors. They design a parallel detection framework for identifying both client-server and hybrid botnets, and demonstrate encouraging performance. Torres et al. [425] investigate the common behavior patterns that botnets exhibit across their life cycle, using LSTMs. They employ both under-sampling and over-sampling to address the class imbalance between botnet and normal traffic in the dataset, which is common in anomaly detection problems. Similar issues are also studied in [426] and [427], where the authors use standard MLPs to perform mobile and peer-to-peer botnet detection respectively, achieving high overall accuracy.

User privacy level: Preserving user privacy during the training and evaluation of a deep neural network is another important research issue [504]. Initial research is conducted in [428], where Shokri and Shmatikov enable user participation in the training and evaluation of a neural network, without sharing their input data. This allows individuals' privacy to be preserved while benefiting all users, as they collaboratively improve the model performance. Their framework is revisited and improved in [429], where another group of researchers employ additively homomorphic encryption to address the information leakage problem ignored in [428], without compromising model accuracy. This significantly boosts the security of the system. More recently, Wang et al. [499] propose a framework called ARDEN to preserve users' privacy while reducing communication overhead in mobile-cloud deep learning applications. ARDEN partitions an NN across the cloud and mobile devices, with heavy computation being conducted on the cloud and mobile devices performing only simple data transformation and perturbation, using a differentially private mechanism. This simultaneously guarantees user privacy, improves inference accuracy, and reduces resource consumption.

Ossia et al. [430] focus on privacy-preserving mobile analytics using deep learning. They design a client-server framework based on the Siamese architecture [505], which accommodates a feature extractor on mobile devices and, correspondingly, a classifier in the cloud. By offloading feature extraction from the cloud, their system offers strong privacy guarantees. An innovative work in [431] implies that deep neural networks can be trained with differential privacy. The authors introduce a differentially private SGD to avoid disclosing private information about the training data. Experiments on two publicly-available image recognition datasets demonstrate that their algorithm is able to maintain user privacy, with a manageable cost in terms of complexity, efficiency, and performance. This approach is also useful for edge-based privacy filtering techniques such as Distributed One-class Learning [506].
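The core mechanism of the differentially private SGD in [431] is per-sample gradient clipping followed by calibrated Gaussian noise addition. Below is a minimal, illustrative sketch of one such update step (a naive per-sample loop rather than an optimized implementation; the clip norm, noise multiplier and model are arbitrary placeholders):

```python
import torch
import torch.nn as nn

# Illustrative DP-SGD step: clip each per-sample gradient, then add
# Gaussian noise to the aggregate before applying the update.
model = nn.Linear(20, 2)                 # toy model; any nn.Module works
loss_fn = nn.CrossEntropyLoss()
clip_norm, noise_multiplier, lr = 1.0, 1.1, 0.05

def dp_sgd_step(xb, yb):
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):             # per-sample gradients (naive loop)
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s += g * scale               # accumulate clipped gradients
    with torch.no_grad():
        for p, s in zip(model.parameters(), summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p -= lr * (s + noise) / len(xb)   # noisy average gradient step

xb, yb = torch.randn(32, 20), torch.randint(0, 2, (32,))
dp_sgd_step(xb, yb)
```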
Servia-Rodriguez et al. [433] consider training deep neural networks on distributed devices without violating privacy constraints. Specifically, the authors retrain an initial model locally, tailored to individual users. This avoids transferring personal data to untrusted entities, hence user privacy is guaranteed. Osia et al. focus on protecting users' personal data from the inference perspective. In particular, they break the entire deep neural network into a feature extractor (on the client side) and an analyzer (on the cloud side) to minimize the exposure of sensitive information. Through local processing of raw input data, sensitive personal information is transformed into abstract features, which avoids direct disclosure to the cloud. Experiments on gender classification and emotion detection suggest that this framework can effectively preserve user privacy, while maintaining remarkable inference accuracy.

Deep learning has also been exploited for cyber attacks, including attempts to compromise private user information and guess passwords. Hitaj et al. [434] suggest that learning a deep model collaboratively is not reliable. By training a GAN, their attacker is able to affect the learning process and lure the victims into disclosing private information, by injecting fake training samples. Their GAN even successfully breaks the differentially private collaborative learning in [431]. The authors further investigate the use of GANs for password guessing. In [507], they design PassGAN, which learns the distribution of a set of leaked passwords. Once trained on a dataset, PassGAN is able to match over 46% of passwords in a different testing set, without user intervention or cryptography knowledge. This novel technique has the potential to revolutionize current password guessing algorithms.
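For readers unfamiliar with the adversarial mechanism underlying such attacks, the sketch below shows a minimal GAN training step (generic MLPs over fixed-length feature vectors, not the architecture of PassGAN or [434]; all dimensions and data are illustrative):

```python
import torch
import torch.nn as nn

# Minimal GAN training step: the generator G maps noise to samples,
# the discriminator D learns to separate real from generated data.
dim, z_dim = 16, 8
G = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, dim))
D = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(64, dim) + 2.0        # stand-in for real training samples

for step in range(200):
    # 1) update D: real samples -> label 1, generated samples -> label 0
    fake = G(torch.randn(64, z_dim)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) update G: try to make D label generated samples as real
    fake = G(torch.randn(64, z_dim))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```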


Greydanus [435] breaks a decryption rule using an LSTM network. They treat decryption as a sequence-to-sequence translation task, and train a framework with a large number of Enigma pairs. The proposed LSTM demonstrates remarkable performance in learning polyalphabetic ciphers. Maghrebi et al. [436] exploit various deep learning models (i.e., MLP, AE, CNN, LSTM) to construct a precise profiling system and perform side channel key recovery attacks. Surprisingly, deep learning based methods demonstrate overwhelming performance over other template machine learning attacks, in terms of efficiency in breaking both unprotected and protected Advanced Encryption Standard implementations. Ning et al. [438] demonstrate that an attacker can use a CNN to infer with over 84% accuracy what apps run on a smartphone and how they are used, based on magnetometer or orientation data. The accuracy can increase to 98% if motion sensor information is also taken into account, which jeopardizes user privacy. To mitigate this issue, the authors propose to inject Gaussian noise into the magnetometer and orientation data, which reduces the inference accuracy down to 15%, thereby effectively mitigating the risk of privacy leakage.

Lessons learned: Most deep learning based solutions focus on existing network attacks, yet new attacks emerge every day. As these new attacks may have different features and appear to behave 'normally', old NN models may not easily detect them. Therefore, an effective deep learning technique should be able to (i) rapidly transfer the knowledge of old attacks to detect newer ones; and (ii) constantly absorb the features of newcomers and update the underlying model. Transfer learning and lifelong learning are strong candidates to address these problems, as we will discuss in Section VII-C. Research in this direction remains shallow, hence we expect more efforts in the future.

Another issue to which attention should be paid is the fact that NNs are vulnerable to adversarial attacks. This has been briefly discussed in Section III-E. Although formal reports on this matter are lacking, hackers may exploit weaknesses in NN models and training procedures to perform attacks that subvert deep learning based cyber-defense systems. This is an important potential pitfall that should be considered in real implementations.

I. Deep Learning Driven Signal Processing

Deep learning is also gaining increasing attention in signal processing, in applications including Multi-Input Multi-Output (MIMO) and modulation. MIMO has become a fundamental technique in current wireless communications, both in cellular and WiFi networks. By incorporating deep learning, MIMO performance is intelligently optimized based on environment conditions. Modulation recognition is also evolving to be more accurate, by taking advantage of deep learning. We give an overview of relevant work in this area in Table XVIII.

TABLE XVIII: A Summary of Deep Learning Driven Signal Processing

MIMO Systems: Samuel et al. suggest that deep neural networks can be a good estimator of transmitted vectors in a MIMO channel. By unfolding a projected gradient descent method, they design an MLP-based detection network to perform binary MIMO detection [448]. The Detection Network can be implemented on multiple channels after a single training. Simulations demonstrate that the proposed architecture achieves near-optimal accuracy, while requiring light computation without prior knowledge of the Signal-to-Noise Ratio (SNR). Yan et al. [449] employ deep learning to solve a similar problem from a different perspective. By considering the characteristic invariance of signals, they exploit an AE as a feature extractor, and subsequently use an Extreme Learning Machine (ELM) to classify signal sources in a MIMO orthogonal frequency division multiplexing (OFDM) system. Their proposal achieves higher detection accuracy than several traditional methods, while maintaining similar complexity.


Fig. 18. A communications system over an additive white Gaussian noise channel represented as an autoencoder.

Vieira et al. [324] show that massive MIMO channel measurements in cellular networks can be utilized for fingerprint-based inference of user positions. Specifically, they design CNNs with weight regularization to exploit the sparsity and information invariance of channel fingerprints, thereby achieving precise position inference. CNNs have also been employed for MIMO channel estimation. Neumann et al. [447] exploit the structure of the MIMO channel model to design a lightweight, approximated maximum likelihood estimator for a specific channel model. Their methods outperform traditional estimators in terms of computation cost and reduce the number of hyper-parameters to be tuned. A similar idea is implemented in [454], where Ye et al. employ an MLP to perform channel estimation and signal detection in OFDM systems.

Wijaya et al. [380], [382] consider applying deep learning to a different scenario. The authors propose to use non-iterative neural networks to perform transmit power control at base stations, thereby preventing degradation of network performance due to inter-cell interference. The neural network is trained to estimate the optimal transmit power at every packet transmission, selecting that with the highest activation probability. Simulations demonstrate that the proposed framework significantly outperforms the belief propagation algorithm that is routinely used for transmit power control in MIMO systems, while attaining a lower computational cost.

More recently, O'Shea et al. [439] bring deep learning to physical layer design. They incorporate an unsupervised deep AE into a single-user end-to-end MIMO system, to optimize representations and the encoding/decoding processes, for transmissions over a Rayleigh fading channel. We illustrate the adopted AE-based framework in Fig. 18. This design incorporates a transmitter consisting of an MLP followed by a normalization layer, which ensures that physical constraints on the signal are met. After transfer through an additive white Gaussian noise channel, a receiver employs another MLP to decode messages and select the one with the highest probability of occurrence. The system can be trained with an SGD algorithm in an end-to-end manner. Experimental results show that the AE system outperforms the Space Time Block Code approach in terms of SNR by approximately 15 dB.
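The idea behind Fig. 18 can be prototyped in a few lines. The sketch below is our illustrative reconstruction (not the authors' code): an autoencoder maps one of M messages onto n channel uses, passes the power-normalized encoding through a simulated AWGN channel, and decodes with a softmax over messages. The message set size, number of channel uses and training SNR are arbitrary choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# End-to-end communications autoencoder over an AWGN channel (sketch).
M, n, k = 16, 7, 4                 # M = 2**k messages over n real channel uses
encoder = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, n))
decoder = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, M))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)
ebno = 10 ** (7.0 / 10)            # train at 7 dB Eb/N0 (arbitrary)
noise_std = (2 * (k / n) * ebno) ** -0.5

for step in range(2000):
    msgs = torch.randint(0, M, (256,))
    x = F.one_hot(msgs, M).float()
    tx = encoder(x)
    tx = tx / tx.norm(dim=1, keepdim=True) * n ** 0.5  # average power constraint
    rx = tx + noise_std * torch.randn_like(tx)         # AWGN channel
    loss = F.cross_entropy(decoder(rx), msgs)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():              # block error rate on fresh messages
    msgs = torch.randint(0, M, (10000,))
    tx = encoder(F.one_hot(msgs, M).float())
    tx = tx / tx.norm(dim=1, keepdim=True) * n ** 0.5
    rx = tx + noise_std * torch.randn_like(tx)
    bler = (decoder(rx).argmax(dim=1) != msgs).float().mean()
```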
Borgerding et al. [440] propose to use deep learning to recover a sparse signal from noisy linear measurements in MIMO environments. The proposed scheme is evaluated on compressive random access and massive-MIMO channel estimation, where it achieves better accuracy than traditional algorithms and CNNs.

Modulation: West and O'Shea [443] compare the modulation recognition accuracy of different deep learning architectures, including traditional CNN, ResNet, Inception CNN, and LSTM. Their experiments suggest that the LSTM is the best candidate for modulation recognition, since it achieves the highest accuracy. Due to its superior performance, an LSTM is also employed for a similar task in [442]. O'Shea et al. then focus on tailoring deep learning architectures to radio properties. Their prior work is improved in [444], where they architect a novel deep radio transformer network for precise modulation recognition. Specifically, they introduce radio-domain specific parametric transformations into a spatial transformer network, which assists in the normalization of the received signal, thereby achieving superior performance. This framework also demonstrates automatic synchronization abilities, which reduces the dependency on traditional expert systems and expensive signal analytic processes. O'Shea and Hoydis [450] introduce several novel deep learning applications for the network physical layer. They demonstrate a proof-of-concept where they employ a CNN for modulation classification and obtain satisfying accuracy.

Other Signal Processing Applications: Deep learning is also adopted for radio signal analysis. O'Shea et al. [452] employ an LSTM to replace sequence translation routines between radio transmitter and receiver. Although their framework works well in ideal environments, its performance drops significantly when realistic channel effects are introduced. Later, the authors consider a different scenario in [453], where they exploit a regularized AE to enable reliable communications over an impaired channel. They further incorporate a radio transformer network for signal reconstruction at the decoder side, thereby achieving receiver synchronization. Simulations demonstrate that this approach is reliable and can be efficiently implemented.
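As a concrete illustration of CNN-based modulation classification of the kind demonstrated in [450], the sketch below classifies raw I/Q sample windows (shaped 2×128, as in common open radio datasets) into a handful of modulation classes; the architecture, class set and dimensions are illustrative only:

```python
import torch
import torch.nn as nn

# Illustrative CNN for modulation recognition over raw I/Q windows.
num_classes = 5                     # e.g., BPSK, QPSK, 8PSK, QAM16, GFSK

model = nn.Sequential(
    nn.Conv1d(2, 64, kernel_size=8, padding=4),   # 2 input channels: I and Q
    nn.ReLU(),
    nn.MaxPool1d(2),
    nn.Conv1d(64, 32, kernel_size=8, padding=4),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),                      # pool over time
    nn.Flatten(),
    nn.Linear(32, num_classes),
)

iq = torch.randn(16, 2, 128)        # a batch of 16 windows of 128 I/Q samples
logits = model(iq)                  # (16, num_classes); train with cross-entropy
loss = nn.functional.cross_entropy(logits, torch.randint(0, num_classes, (16,)))
```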


Liang et al. [455] exploit noise correlations to decode channels using a deep learning approach. Specifically, they use a CNN to reduce channel noise estimation errors by learning the noise correlation. Experiments suggest that their framework can significantly improve the decoding performance. The decoding performance of MLPs, CNNs and RNNs is compared in [456]. By conducting experiments in different settings, the obtained results suggest the RNN achieves the best decoding performance, nonetheless yielding the highest computational overhead. Liao et al. [458] employ MLPs to perform accurate Rayleigh fading channel prediction. The authors further equip their proposal with a sparse channel sample construction method to save system resources without compromising precision. Deep learning can further aid visible light communication. Huang and Lin [460] employ a deep learning based system for error correction in optical communications. Specifically, an AE is used in their work to perform dimension reduction on the light-emitting diode (LED) visible light downlink, thereby maximizing the channel bandwidth. The proposal follows the theory in [450], where O'Shea and Hoydis demonstrate that deep learning driven signal processing systems can perform as well as traditional encoding and/or modulation systems.

Deep learning has been further adopted for solving millimeter wave beamforming. Alkhateeb et al. [446] propose a millimeter wave communication system that utilizes MLPs to predict beamforming vectors from signals received from distributed base stations. By substituting a genie-aided solution with deep learning, their framework reduces the coordination overhead, enabling wide-coverage and low-latency beamforming. Similarly, Gante et al. employ CNNs to infer the position of a device, given the received millimeter wave radiation. Their preliminary simulations show that the CNN-based system can achieve small estimation errors in a realistic outdoor scenario, significantly outperforming existing prediction approaches.

Lessons learned: Deep learning is beginning to play an important role in signal processing applications and the performance demonstrated by early prototypes is remarkable. This is because deep learning can prove advantageous with regards to performance, complexity, and generalization capabilities. At this stage, research in this area is however incipient. We can only expect that deep learning will become increasingly popular in this area.

J. Emerging Deep Learning Applications in Mobile Networks

In this part, we review work that builds upon deep learning in other mobile networking areas, which are beyond the scope of the subjects discussed thus far. These emerging applications open several new research directions, as we discuss next. A summary of these works is given in Table XIX.

TABLE XIX: A Summary of Emerging Deep Learning Driven Mobile Network Applications

Network Data Monetization: Gonzalez et al. employ unsupervised deep learning to generate real-time accurate user profiles [461], using an on-network machine learning platform called Net2Vec [509]. Specifically, they analyze user browsing data in real time and generate user profiles using product categories. The profiles can subsequently be associated with the products that are of interest to the users and employed for online advertising.

IoT In-Network Computation: Instead of regarding IoT nodes as producers of data or end consumers of processed information, Kaminski et al. [462] embed neural networks into an IoT deployment and allow the nodes to collaboratively process the data generated. This enables low-latency communication, while offloading data storage and processing from the cloud. In particular, the authors map each hidden unit of a pre-trained neural network to a node in the IoT network, and investigate the optimal projection that leads to the minimum communication overhead. Their framework achieves functionality similar to in-network computation in WSNs and opens a new research direction in fog computing.

Mobile Crowdsensing: Xiao et al. investigate vulnerabilities facing crowdsensing in the mobile network context. They argue that there exist malicious mobile users who intentionally provide false sensing data to servers, to save costs and preserve their privacy, which in turn can make mobile crowdsensing systems vulnerable [463]. The authors model the server-users system as a Stackelberg game, where the server plays the role of a leader that is responsible for evaluating the sensing effort of individuals, by analyzing the accuracy of each sensing report. Users are paid based on the evaluation of their efforts, hence cheating users will be punished with zero reward. To design an optimal payment policy, the server employs a deep Q network, which derives knowledge from past sensing reports, without requiring specific sensing models. Simulations demonstrate superior performance in terms of sensing quality, resilience to attacks, and server utility, as compared to traditional Q learning based and random payment strategies.
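The following sketch shows the standard deep Q-learning update at the heart of such schemes (a generic DQN step with experience replay and an illustrative state/action space, not the payment policy model of [463]):

```python
import random
from collections import deque
import torch
import torch.nn as nn

# Generic deep Q-network update step with experience replay.
state_dim, n_actions, gamma = 8, 4, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                           nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())   # periodically synced copy
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10000)   # stores (state, action, reward, next_state, done)

def train_step(batch_size=32):
    batch = random.sample(list(replay), batch_size)
    s, a, r, s2, d = (torch.stack(x) if torch.is_tensor(x[0]) else torch.tensor(x)
                      for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)   # Q(s, a)
    with torch.no_grad():                                      # bootstrap target
        target = r + gamma * target_net(s2).max(dim=1).values * (1 - d.float())
    loss = nn.functional.smooth_l1_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()

# Fill the replay buffer with random transitions so the step can run.
for _ in range(100):
    replay.append((torch.randn(state_dim), random.randrange(n_actions),
                   random.random(), torch.randn(state_dim), False))
train_step()
```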


Mobile Blockchain: Substantial computing resource requirements and energy consumption limit the applicability of blockchain in mobile network environments. To mitigate this problem, Luong et al. shed light on resource management in mobile blockchain networks based on optimal auction in [464]. They design an MLP to first conduct monotone transformations of the miners' bids, and subsequently output the allocation scheme and conditional payment rules for each miner. By running experiments with different settings, the results suggest the proposed deep learning based framework can deliver much higher profit to the edge computing service provider than the second-price auction baseline.

Internet of Vehicles (IoV): Gulati et al. [465] extend the success of deep learning to IoV. The authors design a deep learning-based content centric data dissemination approach that comprises three steps, namely (i) performing energy estimation on selected vehicles that are capable of data dissemination; (ii) employing a Wiener process model to identify stable and reliable connections between vehicles; and (iii) using a CNN to predict the social relationship among vehicles. Experiments unveil that the volume of data disseminated is positively related to social score, energy levels, and number of vehicles, while the speed of vehicles has a negative impact on the connection probability.

Lessons learned: The adoption of deep learning in the mobile and wireless networking domain is exciting and undoubtedly many advances are yet to appear. However, as discussed in Section III-E, deep learning solutions are not universal and may not be suitable for every problem. One should rather regard deep learning as a powerful tool that can assist with fast and accurate inference, and facilitate the automation of some processes previously requiring human intervention. Nevertheless, deep learning algorithms will make mistakes, and their decisions might not be easy to interpret. In tasks that require high interpretability and low fault-tolerance, deep learning still has a long way to go, which also holds for the majority of ML algorithms.

VII. TAILORING DEEP LEARNING TO MOBILE NETWORKS

Although deep learning performs remarkably in many mobile networking areas, the No Free Lunch (NFL) theorem indicates that there is no single model that can work universally well in all problems [510]. This implies that for any specific mobile and wireless networking problem, we may need to adapt different deep learning architectures to achieve the best performance. In this section, we look at how to tailor deep learning to mobile networking applications from three perspectives, namely, mobile devices and systems, distributed data centers, and changing mobile network environments.

A. Tailoring Deep Learning to Mobile Devices and Systems

The ultra-low latency requirements of future 5G networks demand runtime efficiency from all operations performed by mobile systems. This also applies to deep learning driven applications. However, current mobile devices have limited hardware capabilities, which means that implementing complex deep learning architectures on such equipment may be computationally infeasible, unless appropriate model tuning is performed. To address this issue, ongoing research improves existing deep learning architectures [511], such that the inference process does not violate latency or energy constraints [512], [513], nor raise any privacy concern [514]. We outline these works in Table XX and discuss their key contributions next.

TABLE XX: A Summary of Works on Improving Deep Learning for Mobile Devices and Systems

Iandola et al. [515] design a compact architecture for embedded systems named SqueezeNet, which has similar accuracy to that of AlexNet [87], a classical CNN, yet 50 times fewer parameters. SqueezeNet is also based on CNNs, but its significantly smaller model size (i) allows more efficient training on distributed systems; (ii) reduces the transmission overhead when updating the model at the client side; and (iii) facilitates deployment on resource-limited embedded devices. Howard et al. [516] extend this work and introduce an efficient family of streamlined CNNs called MobileNet, which uses depth-wise separable convolution operations to drastically reduce the number of computations required and the model size. This new design can run with low latency and can satisfy the requirements of mobile and embedded vision applications. The authors further introduce two hyper-parameters, a width multiplier and a resolution multiplier, which can help strike an appropriate trade-off between accuracy and efficiency. The ShuffleNet proposed by Zhang et al. [517] improves the accuracy of MobileNet by employing point-wise group convolution and channel shuffle, while retaining similar model complexity. In particular, the authors discover that more groups of convolution operations can reduce the computation requirements.
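The depth-wise separable convolution at the core of MobileNet factorizes a standard convolution into a per-channel spatial convolution followed by a 1×1 point-wise convolution. A minimal PyTorch rendering of this block (with illustrative hyper-parameters) is:

```python
import torch
import torch.nn as nn

# Depth-wise separable convolution: per-channel spatial filtering
# (groups=in_ch) followed by a 1x1 point-wise channel mixer.
class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)
    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

block = DepthwiseSeparableConv(32, 64)
y = block(torch.randn(1, 32, 56, 56))      # -> (1, 64, 56, 56)

# Parameter comparison against a standard 3x3 convolution:
standard = nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False)
n_sep = sum(p.numel() for p in block.parameters())
n_std = sum(p.numel() for p in standard.parameters())  # roughly 7x larger here
```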
Zhang et al. [518] focus on reducing the number of parameters of structures with fully-connected layers for mobile multimedia feature learning. This is achieved by applying Tucker decomposition to weight sub-tensors in the model, while maintaining decent reconstruction capability. Tucker decomposition has also been employed in [523], where Huynh et al. seek to approximate a model with fewer parameters, in order to save memory.
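To see why such factorizations shrink models, consider the two-dimensional special case: a fully-connected layer's weight matrix can be replaced by the product of two thin factors obtained from a truncated SVD (Tucker decomposition generalizes this idea to higher-order tensors). A sketch, with an arbitrary layer size and target rank:

```python
import numpy as np

# Low-rank compression of a fully-connected layer via truncated SVD.
# A 512x1024 weight matrix (~524k parameters) is approximated by two
# factors of rank r, costing r*(512+1024) parameters instead.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))

r = 64                                   # target rank (illustrative)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]                     # 512 x r
B = Vt[:r, :]                            # r x 1024

W_approx = A @ B                         # used as W ~ A @ B at inference
compression = W.size / (A.size + B.size) # ~5.3x fewer parameters
rel_err = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
# Note: this random matrix has a flat spectrum, so rel_err is large;
# real trained weights typically have decaying spectra and compress far better.
```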
Mobile optimizations are further studied for RNN models. Cao et al. [519] use a mobile toolbox called RenderScript (https://developer.android.com/guide/topics/renderscript/compute.html) to parallelize specific data structures and enable mobile GPUs to perform computational acceleration. Their proposal reduces the latency when running RNN models on Android smartphones. Chen et al. [520] shed light on implementing CNNs on iOS mobile devices. In particular, they reduce the model execution latency, through space exploration for data re-usability and kernel redundancy removal. The former alleviates the high bandwidth requirements of convolutional layers, while the latter reduces the memory and computational requirements, with negligible performance degradation.

Rallapalli et al. [521] investigate offloading very deep CNNs from clouds to edge devices, by employing memory optimization on both mobile CPUs and GPUs. Their framework enables running deep CNNs with large memory requirements at high speed in mobile object detection applications.


Lane et al. develop a software accelerator, DeepX, to assist deep learning implementations on mobile devices. The proposed approach exploits two inference-time resource control algorithms, i.e., runtime layer compression and deep architecture decomposition [522]. The runtime layer compression technique controls the memory and computation runtime during the inference phase, by extending model compression principles. This is important for mobile devices, since offloading the inference process to edge devices is more practical with current hardware platforms. Further, the deep architecture decomposition designs "decomposition plans" that seek to optimally allocate data and model operations to local and remote processors. By combining these two, DeepX enables maximizing energy and runtime efficiency, under given computation and memory constraints. Yao et al. [535] design a framework called FastDeepIoT, which first learns the execution time of NN models on target devices, and subsequently conducts model compression to reduce the runtime without compromising the inference accuracy. Through this process, up to 78% of execution time and 69% of energy consumption is reduced, compared to state-of-the-art compression algorithms.

More recently, Fang et al. [531] design a framework called NestDNN, to provide flexible resource-accuracy trade-offs on mobile devices. To this end, NestDNN first adopts a model pruning and recovery scheme, which translates deep NNs into single compact multi-capacity models. With this approach, inference accuracy gains of up to 4.22% are achieved across six mobile vision applications, at a 2.0x faster video frame processing rate and with 1.7x lower energy consumption. Xu et al. [532] accelerate deep learning inference for mobile vision from the caching perspective. In particular, the proposed framework, called DeepCache, stores recent input frames as cache keys and recent feature maps for individual CNN layers as cache values. The authors further employ reusable region lookup and reusable region propagation, to enable a region matcher to run only once per input video frame and load cached feature maps at all layers inside the CNN. This reduces the inference time by 18% and energy consumption by 20% on average. Liu et al. [533] develop a usage-driven framework named AdaDeep, to select a combination of compression techniques for a specific deep NN on mobile platforms. By using a deep Q learning optimizer, their proposal can achieve appropriate trade-offs between accuracy, latency, storage and energy consumption.

Beyond these works, researchers also successfully adapt deep learning architectures through other designs and sophisticated optimizations, such as parameter quantization [524], [529], model slimming [536], sparsification and separation [525], representation and memory sharing [98], [530], convolution operation optimization [526], pruning [527], cloud assistance [528] and compiler optimization [534]. These techniques will be of great significance when embedding deep neural networks into mobile systems.
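Among these, quantization is perhaps the easiest to experiment with. The sketch below applies PyTorch's built-in post-training dynamic quantization, which stores linear-layer weights as 8-bit integers and typically shrinks the serialized model close to 4x (exact savings depend on the model; the architecture here is a toy placeholder):

```python
import io
import torch
import torch.nn as nn

# Post-training dynamic quantization of the linear layers of a model.
model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(),
                      nn.Linear(512, 256), nn.ReLU(),
                      nn.Linear(256, 10))

qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)    # weights -> int8

def serialized_size(m):
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.tell()

print(serialized_size(model), serialized_size(qmodel))  # qmodel is ~4x smaller
out = qmodel(torch.randn(1, 1024))            # inference works as before
```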


B. Tailoring Deep Learning to Distributed Data Containers

Mobile systems generate and consume massive volumes of mobile data every day. This may involve similar content, but distributed around the world. Moving such data to centralized servers to perform model training and evaluation inevitably introduces communication and storage overheads, which does not scale. However, neglecting characteristics embedded in mobile data, which are associated with local culture, human mobility, geographical topology, etc., during model training can compromise the robustness of the model and implicitly the performance of the mobile network applications that build on such models. The solution is to offload model execution to distributed data centers or edge devices, to guarantee good performance, whilst alleviating the burden on the cloud.

As such, one of the challenges facing parallelism, in the context of mobile networking, is that of training neural networks on a large number of mobile devices that are battery powered, have limited computational capabilities and, in particular, lack GPUs. The key goal of this paradigm is to train with a large number of mobile CPUs at least as effectively as with GPUs. The speed of training remains important, but becomes a secondary goal.

Generally, there are two routes to addressing this problem, namely, (i) decomposing the model itself, to train (or make inference with) its components individually; or (ii) scaling the training process to perform model updates at different locations associated with data containers. Both schemes allow one to train a single model without requiring all data to be centralized. We illustrate the principles of these two approaches in Fig. 19 and summarize the existing work in Table XXI.

Fig. 19. The underlying principles of model parallelism (left) and training parallelism (right).

TABLE XXI: A Summary of Work on Model and Training Parallelism for Mobile Systems and Devices

Model Parallelism: Large-scale distributed deep learning is first studied in [127], where Dean et al. develop a framework named DistBelief, which enables training complex neural networks on thousands of machines. In their framework, the full model is partitioned into smaller components and distributed over various machines. Only nodes with edges (e.g., connections between layers) that cross boundaries between machines are required to communicate for parameter updates and inference. This system further involves a parameter server, which enables each model replica to obtain the latest parameters during training. Experiments demonstrate that the proposed framework can be trained significantly faster on a CPU cluster than on a single GPU, while achieving state-of-the-art classification performance on ImageNet [192].

Teerapittayanon et al. [537] propose deep neural networks tailored to distributed systems, which include cloud servers, fog layers and geographically distributed devices. The authors scale the overall neural network architecture and distribute its components hierarchically from cloud to end devices. The model exploits local aggregators and binary weights, to reduce computational, storage, and communication overheads, while maintaining decent accuracy. Experiments on a multi-view multi-camera dataset demonstrate that this proposal can perform efficient cloud-based training and local inference. Importantly, without violating latency constraints, the deep neural network obtains essential benefits associated with distributed systems, such as fault tolerance and privacy.

Coninck et al. [114] consider distributing deep learning over IoT for classification applications. Specifically, they deploy a small neural network to local devices, to perform coarse classification, which enables fast response, while filtered data is sent to central servers. If the local model fails to classify, the larger neural network in the cloud is activated to perform fine-grained classification. The overall architecture maintains good accuracy, while significantly reducing the latency typically introduced by large model inference.

Decentralized methods can also be applied to deep reinforcement learning. Omidshafiei et al. [538] consider a multi-agent system with partial observability and limited communication, which is common in mobile systems. They combine a set of sophisticated methods and algorithms, including hysteretic learners, a deep recurrent Q network, concurrent experience replay trajectories and distillation, to enable multi-agent coordination using a single joint policy under a set of decentralized partially observable MDPs. Their framework can potentially play an important role in addressing control problems in distributed mobile systems.

Training Parallelism is also essential for mobile systems, as mobile data usually comes asynchronously from different sources. Training models effectively while maintaining consistency, fast convergence, and accuracy remains challenging [545].


A practical method to address this problem is to perform asynchronous SGD. The basic idea is to enable the server that maintains a model to accept delayed information (e.g., data, gradient updates) from workers. At each update iteration, the server only needs to wait for a smaller number of workers. This is essential for training a deep neural network over distributed machines in mobile systems. Asynchronous SGD is first studied in [539], where Recht et al. propose a lock-free parallel SGD named HOGWILD, which demonstrates significantly faster convergence over locking counterparts. The Downpour SGD in [127] improves the robustness of the training process when worker nodes break down, as each model replica requests the latest version of the parameters. Hence a small number of machine failures will not have a significant impact on the training process. A similar idea has been employed in [540], where Goyal et al. investigate the usage of a set of techniques (i.e., learning rate adjustment, warm-up, batch normalization), which offer important insights into training large-scale deep neural networks on distributed systems. Eventually, their framework can train a network on ImageNet within 1 hour, which is impressive in comparison with traditional algorithms.

Zhang et al. [541] argue that most asynchronous SGD algorithms suffer from slow convergence, due to the inherent variance of stochastic gradients. They propose an improved SGD with variance reduction to speed up the convergence. Their algorithm outperforms other asynchronous SGD approaches in terms of convergence, when training deep neural networks on the Google Cloud Computing Platform. The asynchronous method has also been applied to deep reinforcement learning. Mnih et al. [79] create multiple environments, which allows agents to perform asynchronous updates to the main structure. The new A3C algorithm breaks the sequential dependency and speeds up the training of the traditional Actor-Critic algorithm significantly. Hardy et al. [542] further study distributed deep learning over cloud and edge devices. In particular, they propose a training algorithm, AdaComp, which allows worker updates of the target model to be compressed. This significantly reduces the communication overhead between cloud and edge, while retaining good fault tolerance.

Federated learning is an emerging parallelism approach that enables mobile devices to collaboratively learn a shared model, while retaining all training data on individual devices [543], [546]. Beyond offloading the training data from central servers, this approach performs model updates with a Secure Aggregation protocol [544], which decrypts the average updates only if enough users have participated, without inspecting individual updates. Based on this idea, Google recently built a prototype system using federated learning in the domain of mobile devices [547]. This fulfills the objective of "bringing the code to the data, instead of the data to the code", which protects individuals' privacy.
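The aggregation step at the heart of federated learning is simple to express. The sketch below implements one round of federated averaging (FedAvg-style weighting by local sample counts) over a handful of simulated clients; it omits the secure aggregation, client sampling and communication machinery that real deployments require, and all data and model sizes are illustrative:

```python
import copy
import torch
import torch.nn as nn

# One round of federated averaging over simulated clients.
def make_model():
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

global_model = make_model()
client_data = [(torch.randn(n, 10), torch.randint(0, 2, (n,)))
               for n in (50, 80, 120)]        # three clients, varying data sizes

def local_update(model, x, y, epochs=5):
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for _ in range(epochs):                   # local training on private data
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()
    return model.state_dict(), len(x)

updates = []
for x, y in client_data:
    local = copy.deepcopy(global_model)       # each client starts from global model
    updates.append(local_update(local, x, y))

total = sum(n for _, n in updates)
avg = {k: sum(sd[k] * (n / total) for sd, n in updates)
       for k in updates[0][0]}                # weighted average of parameters
global_model.load_state_dict(avg)
```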


C. Tailoring Deep Learning to Changing Mobile Network Environments

Mobile network environments often exhibit changing patterns over time. For instance, the spatial distributions of mobile data traffic over a region may vary significantly between different times of the day [548]. Applying a deep learning model in changing mobile environments requires lifelong learning ability to continuously absorb new features, without forgetting old but essential patterns. Moreover, new smartphone-targeted viruses are spreading fast via mobile networks and may severely jeopardize users' privacy and business profits. These pose unprecedented challenges to current anomaly detection systems and anti-virus software, as such tools must react to new threats in a timely manner, using limited information. To this end, the model should have transfer learning ability, which can enable the fast transfer of knowledge from pre-trained models to different jobs or datasets. This will allow models to work well with limited threat samples (one-shot learning) or limited metadata descriptions of new threats (zero-shot learning). Therefore, both lifelong learning and transfer learning are essential for applications in ever-changing mobile network environments. We illustrate these two learning paradigms in Fig. 20 and review essential research in this subsection.

Fig. 20. The underlying principles of deep lifelong learning (left) and deep transfer learning (right). Lifelong learning retains the knowledge learned, while transfer learning exploits labeled data of one domain to learn in a new target domain.

Deep Lifelong Learning mimics human behaviors and seeks to build a machine that can continuously adapt to new environments, while retaining as much knowledge as possible from previous learning experience [549]. There exist several research efforts that adapt traditional deep learning to lifelong learning. For example, Lee et al. [550] propose a dual-memory deep learning architecture for lifelong learning of everyday human behaviors over non-stationary data streams. To enable the pre-trained model to retain old knowledge while training with new data, their architecture includes two memory buffers, namely a deep memory and a fast memory. The deep memory is composed of several deep networks, which are built when the amount of data from an unseen distribution accumulates and reaches a threshold. The fast memory component is a small neural network, which is updated immediately when coming across a new data sample. These two memory modules allow continuous learning to be performed without forgetting old knowledge. Experiments on a non-stationary image data stream prove the effectiveness of this model, as it significantly outperforms other online deep learning algorithms. The memory mechanism has also been applied in [551]. In particular, the authors introduce a differentiable neural computer, which allows neural networks to dynamically read from and write to an external memory module. This enables lifelong lookup and forgetting of knowledge from external sources, as humans do.

Parisi et al. consider a different lifelong learning scenario in [552]. They abandon the memory modules of [550] and design a self-organizing architecture with recurrent neurons for processing time-varying patterns. A variant of the Growing When Required network is employed in each layer, to predict neural activation sequences from the previous network layer. This allows learning time-varying correlations between inputs and labels, without requiring a predefined number of classes. Importantly, the framework is robust, as it has tolerance to missing and corrupted sample labels, which are common in mobile data.

Another interesting deep lifelong learning architecture is presented in [553], where Tessler et al. build a DQN agent that can retain learned skills in playing the famous computer game Minecraft. The overall framework includes a pre-trained model, the Deep Skill Network, which is trained a-priori on various sub-tasks of the game. When learning a new task, the old knowledge is maintained by incorporating reusable skills through a Deep Skill module, which consists of a Deep Skill Network array and a multi-skill distillation network. These allow the agent to selectively transfer knowledge to solve a new task. Experiments demonstrate that their proposal significantly outperforms traditional double DQNs in terms of accuracy and convergence. This technique has potential to be employed in solving mobile networking problems, as it can continuously acquire new knowledge.

Deep Transfer Learning: Unlike lifelong learning, transfer learning only seeks to use knowledge from a specific domain to aid learning in a target domain. Applying transfer learning can accelerate the new learning process, as the new task does not need to be learned from scratch. This is essential to mobile network environments, as they require agile responses to new network patterns and threats. A number of important applications emerge in the computer network domain [57], such as Web mining [554], caching [555] and base station sleep strategies [209].

There exist two extreme transfer learning paradigms, namely one-shot learning and zero-shot learning. One-shot learning refers to a learning method that gains as much information as possible about a category from only one or a handful of samples, given a pre-trained model [556]. On the other hand, zero-shot learning does not require any sample from a category [557]. It aims at learning a new distribution given a meta description of the new category and its correlations with existing training data. Though research towards deep one-shot learning [96], [558] and deep zero-shot learning [559], [560] is in its infancy, both paradigms are very promising in detecting new threats or traffic patterns in mobile networks.
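In its simplest form, deep transfer learning amounts to freezing the feature-extraction layers of a pre-trained network and retraining only a new task-specific head on the target domain. The sketch below illustrates this with a generic model (the "pre-trained" weights here are random placeholders; in practice they would be loaded from a model trained on a source domain):

```python
import torch
import torch.nn as nn

# Transfer learning by freezing a pre-trained feature extractor and
# training a new head on a small target-domain dataset.
feature_extractor = nn.Sequential(       # pretend these weights are pre-trained
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU())
for p in feature_extractor.parameters():
    p.requires_grad = False              # keep source-domain knowledge intact

head = nn.Linear(128, 3)                 # new head for a 3-class target task
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

x_target = torch.randn(30, 64)           # only a handful of target samples
y_target = torch.randint(0, 3, (30,))

for epoch in range(100):                 # only the head's parameters are updated
    feats = feature_extractor(x_target)
    loss = nn.functional.cross_entropy(head(feats), y_target)
    opt.zero_grad(); loss.backward(); opt.step()
```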


VIII. FUTURE RESEARCH PERSPECTIVES

As deep learning is achieving increasingly promising results in the mobile networking domain, several important research issues remain to be addressed in the future. We conclude our survey by discussing these challenges and pinpointing key mobile networking research problems that could be tackled with novel deep learning tools.

A. Serving Deep Learning With Massive High-Quality Data

Deep neural networks rely on massive and high-quality data to achieve good performance. When training a large and complex architecture, data volume and quality are very important, as deeper models usually have a huge set of parameters to be learned and configured. This issue remains true in mobile network applications. Unfortunately, unlike in other research areas such as computer vision and NLP, high-quality and large-scale labeled datasets are still lacking for mobile network applications, because service providers and operators keep the data they collect confidential and are reluctant to release datasets. While this makes sense from a user privacy standpoint, to some extent it restricts the development of deep learning mechanisms for problems in the mobile networking domain. Moreover, mobile data collected by sensors and network equipment are frequently subject to loss, redundancy, mislabeling and class imbalance, and thus cannot be directly employed for training purposes.

To build intelligent 5G mobile network architectures, efficient and mature streamlining platforms for mobile data processing are in demand. This requires a considerable amount of research effort for data collection, transmission, cleaning, clustering, transformation, and anonymization. Deep learning applications in the mobile network area can only advance if researchers and industry stakeholders release more datasets, with a view to benefiting a wide range of communities.

B. Deep Learning for Spatio-Temporal Mobile Data Mining

Accurate analysis of mobile traffic data over a geographical region is becoming increasingly essential for event localization, network resource allocation, context-based advertising and urban planning [548]. However, due to the mobility of smartphone users, the spatio-temporal distribution of mobile traffic [561] and application popularity [562] are difficult to understand (see the example city-scale traffic snapshot in Fig. 21).

Fig. 21. Example of a 3D mobile traffic surface (left) and 2D projection (right) in Milan, Italy. Figures adapted from [218] using data from [563].

Recent research suggests that data collected by mobile sensors (e.g., mobile traffic) over a city can be regarded as pictures taken by panoramic cameras, which provide a city-scale sensing system for urban surveillance [564]. These traffic sensing images enclose information associated with the movements of individuals [468].

From both the spatial and temporal dimension perspectives, we recognize that mobile traffic data have important similarities with videos or speech, an analogy made recently also in [218] and exemplified in Fig. 22.

Fig. 22. Analogies between mobile traffic data consumption in a city (left) and other types of data (right).

Specifically, both videos and the large-scale evolution of mobile traffic are composed of sequences of "frames". Moreover, if we zoom into a small coverage area to measure long-term traffic consumption, we can observe that a single traffic consumption series looks similar to a natural language sequence. These observations suggest that, to some extent, well-established tools for computer vision (e.g., CNN) or NLP (e.g., RNN, LSTM) are promising candidates for mobile traffic analysis.

Beyond these similarities, we observe several properties of mobile traffic that make it unique in comparison with images or language sequences. Namely,
1) The values of neighboring 'pixels' in fine-grained traffic snapshots are not significantly different in general, while this happens quite often at the edges of natural images.
2) Single mobile traffic series usually exhibit some periodicity (both daily and weekly), yet this is not a feature seen among video pixels.
3) Due to user mobility, traffic consumption is more likely to stay in the same cell or shift to neighboring cells in the near future, which is less likely to be seen in videos.

Such spatio-temporal correlations in mobile traffic can be exploited as prior knowledge for model design. We recognize several unique advantages of employing deep learning for mobile traffic data mining:
1) CNN structures work well in imaging applications, and can thus also serve mobile traffic analysis tasks, given the analogies mentioned before.
2) LSTMs capture temporal correlations well in time series data such as natural language; hence this structure can also be adapted to traffic forecasting problems.
3) GPU computing enables fast training of NNs and, together with parallelization techniques, can support low-latency mobile traffic analysis via deep learning tools.
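As a concrete illustration of how these tools combine, the sketch below wires a CNN frame encoder to an LSTM for one-step-ahead forecasting of city-grid traffic snapshots (all tensor shapes, the grid size and the architecture are illustrative choices, not a model proposed in the surveyed literature):

```python
import torch
import torch.nn as nn

# CNN + LSTM sketch: encode each traffic snapshot spatially, model the
# sequence temporally, and predict the next snapshot of the grid.
class TrafficForecaster(nn.Module):
    def __init__(self, grid=32, hidden=256):
        super().__init__()
        self.grid = grid
        self.encoder = nn.Sequential(                 # spatial features per frame
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten())    # -> 16*8*8 = 1024
        self.lstm = nn.LSTM(1024, hidden, batch_first=True)
        self.head = nn.Linear(hidden, grid * grid)    # next-frame prediction

    def forward(self, x):                             # x: (batch, time, grid, grid)
        b, t, g, _ = x.shape
        feats = self.encoder(x.reshape(b * t, 1, g, g)).reshape(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1]).reshape(b, g, g) # predict frame t+1

model = TrafficForecaster()
frames = torch.rand(4, 12, 32, 32)    # 4 sequences of 12 hourly snapshots
pred = model(frames)                  # -> (4, 32, 32)
loss = nn.functional.mse_loss(pred, torch.rand(4, 32, 32))
```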


2) LSTMs capture well temporal correlations in time series costly and requires domain-specific knowledge. To facilitate
data such as natural language; hence this structure can the analysis of raw mobile network data, unsupervised learn-
also be adapted to traffic forecasting problems. ing becomes essential in extracting insights from unlabeled
3) GPU computing enables fast training of NNs and data [571], so as to optimize the mobile network functionality
together with parallelization techniques can support to improve QoE.
low-latency mobile traffic analysis via deep learning The potential of a range of unsupervised deep learning
tools. tools including AE, RBM and GAN remains to be fur-
In essence, we expect deep learning tools tailored to ther explored. In general, these models require light feature
mobile networking, will overcome the limitation of tradi- engineering and are thus promising for learning from het-
tional regression and interpolation tools such as Exponential erogeneous and unstructured mobile data. For instance, deep
Smoothing [565], Autoregressive Integrated Moving Average AEs work well for unsupervised anomaly detection [572].
model [566], or unifrom interpolation, which are commonly Though less popular, RBMs can perform layer-wise unsu-
used in operational networks. pervised pre-training, which can accelerate the overall model
C. Deep Learning for Geometric Mobile Data Mining

As discussed in Section III-D, certain mobile data has important geometric properties. For instance, the locations of mobile users or base stations, along with the data they carry, can be viewed as point clouds in a 2D plane. If the temporal dimension is also added, this leads to a 3D point cloud representation, with either fixed or changing locations. In addition, the connectivity of mobile devices, routers, base stations, gateways, and so on naturally forms a directed graph, where entities are represented as vertices, the links between them can be seen as edges, and data flows may give direction to these edges. We show examples of geometric mobile data and their potential representations in Fig. 23: at the top of the figure, a group of mobile users is represented as a point cloud; at the bottom, mobile network entities (e.g., base stations, gateways, users) are regarded as a graph, following the same rationale. Due to the inherent complexity of such representations, traditional ML tools usually struggle to interpret geometric data and make reliable inferences.
In contrast, a variety of deep learning toolboxes for modelling geometric data exist, albeit not yet widely employed in mobile networking. For instance, PointNet [567] and its follow-up PointNet++ [101] are the first solutions that employ deep learning for 3D point cloud applications, including classification and segmentation [568]. We recognize that similar ideas can be applied to geometric mobile data analysis, such as clustering of mobile users or base stations, or user trajectory prediction. Further, deep learning for graph-structured data is also evolving rapidly [569], triggered by research on Graph CNNs [102], which bring the convolution concept to graph-structured data; their applicability can be further extended to the temporal domain [570]. One possible application is the prediction of future traffic demand at the level of individual base stations, as sketched below. We expect that such novel architectures will play an increasingly important role in network graph analysis and in applications such as anomaly detection over a mobile network graph.
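As a sketch of the base station use case above, a two-layer graph convolutional network in the spirit of [102] can propagate recent traffic measurements along the network graph before predicting per-station demand. The adjacency construction, normalization and sizes below are illustrative assumptions, not an implementation from the surveyed works.

```python
# Hedged sketch: Graph CNN forecasting next-step traffic demand per base
# station. Nodes are base stations, node features are recent traffic counts.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One Kipf-Welling style layer: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):                # a_hat: (N, N), h: (N, in_dim)
        return torch.relu(self.linear(a_hat @ h))

class TrafficGCN(nn.Module):
    def __init__(self, history=12, hidden=64):
        super().__init__()
        self.gc1 = GraphConv(history, hidden)
        self.gc2 = GraphConv(hidden, hidden)
        self.readout = nn.Linear(hidden, 1)     # next-step demand per station

    def forward(self, a_hat, traffic):          # traffic: (N, history)
        h = self.gc2(a_hat, self.gc1(a_hat, traffic))
        return self.readout(h).squeeze(-1)      # (N,)

def normalize(adj):
    """Symmetric normalization: A_hat = D^-1/2 (A + I) D^-1/2."""
    a = adj + torch.eye(adj.size(0))
    d = a.sum(1).pow(-0.5)
    return d.unsqueeze(1) * a * d.unsqueeze(0)

# Usage: 50 base stations, random symmetric connectivity, 12 past readings.
adj = (torch.rand(50, 50) > 0.9).float()
adj = ((adj + adj.t()) > 0).float()
forecast = TrafficGCN()(normalize(adj), torch.randn(50, 12))   # (50,)
```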
D. Deep Unsupervised Learning in Mobile Networks

We observe that current deep learning practice in mobile networks largely employs supervised learning and reinforcement learning. However, as mobile networks generate considerable amounts of unlabeled data every day, data labeling is costly and requires domain-specific knowledge. To facilitate the analysis of raw mobile network data, unsupervised learning becomes essential in extracting insights from unlabeled data [571], so as to optimize mobile network functionality and improve QoE.

The potential of a range of unsupervised deep learning tools, including AEs, RBMs and GANs, remains to be further explored. In general, these models require light feature engineering and are thus promising for learning from heterogeneous and unstructured mobile data. For instance, deep AEs work well for unsupervised anomaly detection [572], as the sketch below illustrates. Though less popular, RBMs can perform layer-wise unsupervised pre-training, which can accelerate the overall model training process. GANs are good at imitating data distributions, and could thus be employed to mimic real mobile network environments. Recent research reveals that GANs can even protect communications by crafting custom cryptography to avoid eavesdropping [573]. All these tools require further research to fulfill their full potential in the mobile networking domain.
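The following toy sketch illustrates the AE-based anomaly detection idea (an assumption-laden example of ours, not the method of [572]): an AE is trained to reconstruct unlabeled "normal" traffic records, and new records whose reconstruction error is far above what was seen during training are flagged as anomalous.

```python
# Hedged sketch of unsupervised anomaly detection with a deep autoencoder.
# Feature dimensionality and the 3-sigma threshold rule are assumptions.
import torch
import torch.nn as nn

class TrafficAE(nn.Module):
    def __init__(self, n_features=20, bottleneck=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, bottleneck))
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 16), nn.ReLU(), nn.Linear(16, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TrafficAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
normal = torch.randn(1000, 20)                 # placeholder "normal" records
for _ in range(200):                           # minimize reconstruction error
    loss = nn.functional.mse_loss(model(normal), normal)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():                          # flag high-error records
    err = ((model(normal) - normal) ** 2).mean(dim=1)
    threshold = err.mean() + 3 * err.std()
    new_records = torch.randn(10, 20)
    anomalous = ((model(new_records) - new_records) ** 2).mean(dim=1) > threshold
```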
E. Deep Reinforcement Learning for Mobile Network Control

Many mobile network control problems have been solved by constrained optimization, dynamic programming and game theory approaches. Unfortunately, these methods either make strong assumptions about the objective functions (e.g., function convexity) or the data distribution (e.g., Gaussian or Poisson distributed), or suffer from high time and space complexity. As mobile networks become increasingly complex, such assumptions sometimes become unrealistic. The objective functions are further complicated by increasingly large sets of variables, which pose severe computational and memory challenges to existing mathematical approaches.

In contrast, deep reinforcement learning does not make strong assumptions about the target system. It employs function approximation, which explicitly addresses the problem of large state-action spaces, enabling reinforcement learning to scale to network control problems that were previously considered hard. Inspired by remarkable achievements in Atari [19] and Go [574] games, a number of researchers have begun to explore DRL to solve complex network control problems, as we discussed in Section VI-G. However, these works only scratch the surface, and the potential of DRL to tackle mobile network control problems remains largely unexplored. For instance, just as DeepMind trained a DRL agent to reduce Google's data center cooling bill,36 DRL could be exploited to extract rich features from cellular networks and enable intelligent base station on/off switching, to reduce the infrastructure's energy footprint; a minimal sketch of this idea follows. Such exciting applications make us believe that advances in DRL that are yet to appear can revolutionize the autonomous control of future mobile networks.
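As a toy illustration of this idea (the environment dynamics, reward weighting and sizes below are our assumptions, not a system from the literature), a small deep Q-network can learn switching decisions that trade the energy cost of powered-on cells against demand left unserved by powered-off ones:

```python
# Hedged sketch: DQN-style agent for base station on/off switching.
import random
import torch
import torch.nn as nn

N_STATIONS = 4                               # state: traffic load per station
N_ACTIONS = 2 * N_STATIONS                   # switch any one station on or off

q_net = nn.Sequential(nn.Linear(N_STATIONS, 64), nn.ReLU(),
                      nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def step(active, action):
    """Toy environment: apply the toggle, draw new demand, compute reward."""
    active = active.clone()
    active[action % N_STATIONS] = float(action < N_STATIONS)
    next_load = torch.rand(N_STATIONS)                 # next traffic demand
    energy = active.sum()                              # cost of powered cells
    unserved = (next_load * (1 - active)).sum()        # demand at "off" cells
    return next_load, active, -energy - 10.0 * unserved

active, load = torch.ones(N_STATIONS), torch.rand(N_STATIONS)
for t in range(1000):
    eps = max(0.05, 1.0 - t / 500)                     # epsilon-greedy policy
    if random.random() < eps:
        action = random.randrange(N_ACTIONS)
    else:
        with torch.no_grad():
            action = q_net(load).argmax().item()
    next_load, active, reward = step(active, action)
    with torch.no_grad():                              # one-step TD target
        target = reward + 0.9 * q_net(next_load).max()
    loss = (q_net(load)[action] - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
    load = next_load
```

A production-grade agent would add experience replay and a target network, as in [19], but even this reward shape captures the energy/QoE trade-off.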
36 DeepMind AI Reduces Google Data Center Cooling Bill by 40%: https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/


Fig. 23. Examples of mobile data with geometric properties (left), their geometric representations (middle) and their candidate models for analysis (right). PointNet++ could be used to infer user trajectories when fed with point cloud representations of user locations (above); a Graph CNN may be employed to forecast future mobile traffic demand at base station level (below).

F. Summary

Deep learning is playing an increasingly important role in the mobile and wireless networking domain. In this paper, we provided a comprehensive survey of recent work that lies at the intersection between deep learning and mobile networking. We summarized both basic concepts and advanced principles of various deep learning models, then reviewed work specific to mobile networks across different application scenarios. We discussed how to tailor deep learning models to general mobile networking applications, an aspect overlooked by previous surveys. We concluded by pinpointing several open research issues and promising directions, which may lead to valuable future research results. Our hope is that this article will become a definitive guide to researchers and practitioners interested in applying machine intelligence to complex problems in mobile network environments.

ACKNOWLEDGMENT

The authors would like to thank Zongzuo Wang for sharing valuable insights on deep learning, which helped improve the quality of this paper. They also thank the anonymous reviewers and editors, whose detailed and thoughtful feedback helped them give this survey more depth and a broader scope.

REFERENCES

[1] Cisco Visual Networking Index: Forecast and Methodology, 2016–2021, Cisco, San Jose, CA, USA, Jun. 2017.
[2] N. Wang, E. Hossain, and V. K. Bhargava, “Backhauling 5G small cells: A radio resource management perspective,” IEEE Wireless Commun., vol. 22, no. 5, pp. 41–49, Oct. 2015.
[3] F. Giust, L. Cominardi, and C. J. Bernardos, “Distributed mobility management for future 5G networks: Overview and analysis of existing approaches,” IEEE Commun. Mag., vol. 53, no. 1, pp. 142–149, Jan. 2015.
[4] M. Agiwal, A. Roy, and N. Saxena, “Next generation 5G wireless networks: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 1617–1655, 3rd Quart., 2016.
[5] A. Gupta and R. K. Jha, “A survey of 5G network: Architecture and emerging technologies,” IEEE Access, vol. 3, pp. 1206–1232, 2015.
[6] K. Zheng et al., “Big data-driven optimization for mobile networks toward 5G,” IEEE Netw., vol. 30, no. 1, pp. 44–51, Jan./Feb. 2016.
[7] C. Jiang et al., “Machine learning paradigms for next-generation wireless networks,” IEEE Wireless Commun., vol. 24, no. 2, pp. 98–105, Apr. 2017.
[8] D. D. Nguyen, H. X. Nguyen, and L. B. White, “Reinforcement learning with network-assisted feedback for heterogeneous RAT selection,” IEEE Trans. Wireless Commun., vol. 16, no. 9, pp. 6062–6076, Sep. 2017.
[9] F. A. Narudin, A. Feizollah, N. B. Anuar, and A. Gani, “Evaluation of machine learning classifiers for mobile malware detection,” Soft Comput., vol. 20, no. 1, pp. 343–357, 2016.
[10] K. Hsieh et al., “Gaia: Geo-distributed machine learning approaching LAN speeds,” in Proc. USENIX Symp. Netw. Syst. Design Implement. (NSDI), 2017, pp. 629–647.
[11] W. Xiao et al., “Tux2: Distributed graph computation for machine learning,” in Proc. USENIX Symp. Netw. Syst. Design Implement. (NSDI), 2017, pp. 669–682.
[12] M. Paolini and S. Fili, Mastering Analytics: How to Benefit From Big Data and Network Complexity: An Analyst Report, RCR Wireless News, London, U.K., 2017.
[13] C. Zhang, P. Zhou, C. Li, and L. Liu, “A convolutional neural network for leaves recognition using data augmentation,” in Proc. IEEE Int. Conf. Pervasive Intell. Comput. (PICOM), Liverpool, U.K., 2015, pp. 2143–2150.
[14] R. Socher, Y. Bengio, and C. D. Manning, “Deep learning for NLP (without magic),” in Proc. Tuts. Abstracts ACL, 2012, p. 5.
[15] (2017). IEEE Network Special Issue: Exploring Deep Learning for Efficient and Reliable Mobile Sensing. Accessed: Jul. 14, 2017. [Online]. Available: http://www.comsoc.org/netmag/cfp/exploring-deep-learning-efficient-and-reliable-mobile-sensing
[16] M. Wang, Y. Cui, X. Wang, S. Xiao, and J. Jiang, “Machine learning for networking: Workflow, advances and opportunities,” IEEE Netw., vol. 32, no. 2, pp. 92–99, Mar./Apr. 2018.
[17] M. A. Alsheikh, D. Niyato, S. Lin, H.-P. Tan, and Z. Han, “Mobile big data analytics using deep learning and Apache Spark,” IEEE Netw., vol. 30, no. 3, pp. 22–29, May/Jun. 2016.
[18] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[19] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[20] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[21] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Netw., vol. 61, pp. 85–117, Jan. 2015.
[22] W. Liu et al., “A survey of deep neural network architectures and their applications,” Neurocomputing, vol. 234, pp. 11–26, Apr. 2017.
[23] L. Deng and D. Yu, “Deep learning: Methods and applications,” Found. Trends Signal Process., vol. 7, nos. 3–4, pp. 197–387, 2014.


[24] L. Deng, “A tutorial survey of architectures, algorithms, and applications for deep learning,” APSIPA Trans. Signal Inf. Process., vol. 3, pp. 1–29, Jan. 2014.
[25] S. Pouyanfar et al., “A survey on deep learning: Algorithms, techniques, and applications,” ACM Comput. Surveys, vol. 52, no. 5, pp. 1–92, 2018.
[26] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Process. Mag., vol. 34, no. 6, pp. 26–38, Nov. 2017.
[27] A. Hussein, M. M. Gaber, E. Elyan, and C. Jayne, “Imitation learning: A survey of learning methods,” ACM Comput. Surveys, vol. 50, no. 2, pp. 1–21, 2017.
[28] X.-W. Chen and X. Lin, “Big data deep learning: Challenges and perspectives,” IEEE Access, vol. 2, pp. 514–525, 2014.
[29] M. M. Najafabadi et al., “Deep learning applications and challenges in big data analytics,” J. Big Data, vol. 2, no. 1, p. 1, 2015.
[30] N. F. Hordri, A. Samar, S. S. Yuhaniz, and S. M. Shamsuddin, “A systematic literature review on features of deep learning in big data analytics,” Int. J. Adv. Soft Comput. Appl., vol. 9, no. 1, pp. 32–49, 2017.
[31] M. Gheisari, G. Wang, and M. Z. A. Bhuiyan, “A survey on deep learning in big data,” in Proc. IEEE Int. Conf. Comput. Sci. Eng. (CSE) Embedded Ubiquitous Comput. (EUC), vol. 2, Guangzhou, China, 2017, pp. 173–180.
[32] S. Zhang, L. Yao, A. Sun, and Y. Tay, “Deep learning based recommender system: A survey and new perspectives,” ACM Comput. Surveys, vol. 52, no. 1, pp. 1–5, 2019.
[33] S. Yu, M. Liu, W. Dou, X. Liu, and S. Zhou, “Networking for big data: A survey,” IEEE Commun. Surveys Tuts., vol. 19, no. 1, pp. 531–549, 1st Quart., 2017.
[34] M. A. Alsheikh, S. Lin, D. Niyato, and H.-P. Tan, “Machine learning in wireless sensor networks: Algorithms, strategies, and applications,” IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 1996–2018, 4th Quart., 2014.
[35] C.-W. Tsai, C.-F. Lai, M.-C. Chiang, and L. T. Yang, “Data mining for Internet of Things: A survey,” IEEE Commun. Surveys Tuts., vol. 16, no. 1, pp. 77–97, 1st Quart., 2014.
[36] X. Cheng, L. Fang, X. Hong, and L. Yang, “Exploiting mobile big data: Sources, features, and applications,” IEEE Netw., vol. 31, no. 1, pp. 72–79, Jan./Feb. 2017.
[37] M. Bkassiny, Y. Li, and S. K. Jayaweera, “A survey on machine-learning techniques in cognitive radios,” IEEE Commun. Surveys Tuts., vol. 15, no. 3, pp. 1136–1159, 3rd Quart., 2013.
[38] J. G. Andrews et al., “What will 5G be?” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
[39] N. Panwar, S. Sharma, and A. K. Singh, “A survey on 5G: The next generation of mobile communication,” Phys. Commun., vol. 18, pp. 64–84, Mar. 2016.
[40] O. Elijah, C. Y. Leow, T. A. Rahman, S. Nunoo, and S. Z. Iliya, “A comprehensive survey of pilot contamination in massive MIMO—5G system,” IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 905–923, 2nd Quart., 2016.
[41] S. Buzzi et al., “A survey of energy-efficient techniques for 5G networks and challenges ahead,” IEEE J. Sel. Areas Commun., vol. 34, no. 4, pp. 697–709, Apr. 2016.
[42] M. Peng, Y. Li, Z. Zhao, and C. Wang, “System architecture and key technologies for 5G heterogeneous cloud radio access networks,” IEEE Netw., vol. 29, no. 2, pp. 6–14, Mar./Apr. 2015.
[43] Y. Niu, Y. Li, D. Jin, L. Su, and A. V. Vasilakos, “A survey of millimeter wave communications (mmWave) for 5G: Opportunities and challenges,” Wireless Netw., vol. 21, no. 8, pp. 2657–2676, 2015.
[44] X. Foukas, G. Patounas, A. Elmokashfi, and M. K. Marina, “Network slicing in 5G: Survey and challenges,” IEEE Commun. Mag., vol. 55, no. 5, pp. 94–100, May 2017.
[45] T. Taleb et al., “On multi-access edge computing: A survey of the emerging 5G network edge cloud architecture and orchestration,” IEEE Commun. Surveys Tuts., vol. 19, no. 3, pp. 1657–1681, 3rd Quart., 2017.
[46] P. Mach and Z. Becvar, “Mobile edge computing: A survey on architecture and computation offloading,” IEEE Commun. Surveys Tuts., vol. 19, no. 3, pp. 1628–1656, 3rd Quart., 2017.
[47] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2322–2358, 4th Quart., 2017.
[48] Y. Wang et al., “A data-driven architecture for personalized QoE management in 5G wireless networks,” IEEE Wireless Commun., vol. 24, no. 1, pp. 102–110, Feb. 2017.
[49] Q. Han, S. Liang, and H. Zhang, “Mobile cloud sensing, big data, and 5G networks make an intelligent and smart world,” IEEE Netw., vol. 29, no. 2, pp. 40–45, Mar./Apr. 2015.
[50] S. Singh, N. Saxena, A. Roy, and H. S. Kim, “A survey on 5G network technologies from social perspective,” IETE Tech. Rev., vol. 34, no. 1, pp. 30–39, 2017.
[51] M. Chen, J. Yang, Y. Hao, S. Mao, and K. Hwang, “A 5G cognitive system for healthcare,” Big Data Cogn. Comput., vol. 1, no. 1, pp. 1–15, 2017.
[52] X. Chen, J. Wu, Y. Cai, H. Zhang, and T. Chen, “Energy-efficiency oriented traffic offloading in wireless networks: A brief survey and a learning approach for heterogeneous cellular networks,” IEEE J. Sel. Areas Commun., vol. 33, no. 4, pp. 627–640, Apr. 2015.
[53] J. Wu, S. Guo, J. Li, and D. Zeng, “Big data meet green challenges: Big data toward green applications,” IEEE Syst. J., vol. 10, no. 3, pp. 888–900, Sep. 2016.
[54] T. S. Buda et al., “Can machine learning aid in delivering new use cases and scenarios in 5G?” in Proc. IEEE/IFIP Netw. Oper. Manag. Symp. (NOMS), Istanbul, Turkey, 2016, pp. 1279–1284.
[55] A. Imran, A. Zoha, and A. Abu-Dayya, “Challenges in 5G: How to empower SON with big data for enabling 5G,” IEEE Netw., vol. 28, no. 6, pp. 27–33, Nov./Dec. 2014.
[56] B. Keshavamurthy and M. Ashraf, “Conceptual design of proactive SONs based on the big data framework for 5G cellular networks: A novel machine learning perspective facilitating a shift in the SON paradigm,” in Proc. IEEE Int. Conf. Syst. Model. Adv. Res. Trends (SMART), Moradabad, India, 2016, pp. 298–304.
[57] P. V. Klaine, M. A. Imran, O. Onireti, and R. D. Souza, “A survey of machine learning techniques applied to self-organizing cellular networks,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2392–2431, 4th Quart., 2017.
[58] R. Li et al., “Intelligent 5G: When cellular networks meet artificial intelligence,” IEEE Wireless Commun., vol. 24, no. 5, pp. 175–183, Oct. 2017.
[59] N. Bui et al., “A survey of anticipatory mobile networking: Context-based classification, prediction methodologies, and optimization techniques,” IEEE Commun. Surveys Tuts., vol. 19, no. 3, pp. 1790–1821, 3rd Quart., 2017.
[60] R. Atat et al., “Big data meet cyber-physical systems: A panoramic survey,” IEEE Access, vol. 6, pp. 73603–73636, 2018.
[61] X. Cheng, L. Fang, L. Yang, and S. Cui, “Mobile big data: The fuel for data-driven wireless,” IEEE Internet Things J., vol. 4, no. 5, pp. 1489–1516, Oct. 2017.
[62] P. Kasnesis, C. Patrikakis, and I. Venieris, “Changing the game of mobile data analysis with deep learning,” IT Prof., to be published.
[63] L. Wang and R. Jones, “Big data analytics for network intrusion detection: A survey,” Int. J. Netw. Commun., vol. 7, no. 1, pp. 24–31, 2017.
[64] N. Kato et al., “The deep learning vision for heterogeneous network traffic control: Proposal, challenges, and future perspective,” IEEE Wireless Commun., vol. 24, no. 3, pp. 146–153, Jun. 2017.
[65] M. Zorzi, A. Zanella, A. Testolin, M. D. F. D. Grazia, and M. Zorzi, “Cognition-based networks: A new perspective on network optimization using learning and distributed intelligence,” IEEE Access, vol. 3, pp. 1512–1530, 2015.
[66] Z. M. Fadlullah et al., “State-of-the-art deep learning: Evolving machine intelligence toward tomorrow’s intelligent network traffic control systems,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2432–2455, 4th Quart., 2017.
[67] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep learning for IoT big data and streaming analytics: A survey,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2923–2960, 4th Quart., 2018.
[68] N. Ahad, J. Qadir, and N. Ahsan, “Neural networks in wireless networks: Techniques, applications and guidelines,” J. Netw. Comput. Appl., vol. 68, pp. 1–27, Jun. 2016.
[69] Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelligent wireless networks: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2595–2621, 4th Quart., 2018.
[70] N. C. Luong et al., “Applications of deep reinforcement learning in communications and networking: A survey,” arXiv preprint arXiv:1810.07862, 2018.
[71] X. Zhou, M. Sun, Y. G. Li, and B.-H. F. Juang, “Intelligent wireless communications enabled by cognitive radio and machine learning,” China Commun., vol. 15, no. 12, pp. 16–48, Dec. 2018.


[72] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Machine learning for wireless networks with artificial intelligence: A tutorial on neural networks,” arXiv preprint arXiv:1710.02913, 2017.
[73] A. Gharaibeh et al., “Smart cities: A survey on data management, security, and enabling technologies,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2456–2501, 4th Quart., 2017.
[74] N. D. Lane and P. Georgiev, “Can deep learning revolutionize mobile sensing?” in Proc. 16th ACM Int. Workshop Mobile Comput. Syst. Appl., Santa Fe, NM, USA, 2015, pp. 117–122.
[75] K. Ota, M. S. Dao, V. Mezaris, and F. G. B. De Natale, “Deep learning for mobile multimedia: A survey,” ACM Trans. Multimedia Comput. Commun. Appl. (TOMM), vol. 13, no. 3S, p. 34, 2017.
[76] P. Mishra, V. Varadharajan, U. Tupakula, and E. S. Pilli, “A detailed investigation and analysis of using machine learning techniques for intrusion detection,” IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 686–728, 1st Quart., 2019.
[77] Y. Li, “Deep reinforcement learning: An overview,” arXiv preprint arXiv:1701.07274, 2017.
[78] L. Chen et al., “Deep mobile traffic forecast and complementary base station clustering for C-RAN optimization,” J. Netw. Comput. Appl., vol. 121, pp. 59–69, Nov. 2018.
[79] V. Mnih et al., “Asynchronous methods for deep reinforcement learning,” in Proc. Int. Conf. Mach. Learn. (ICML), New York, NY, USA, 2016, pp. 1928–1937.
[80] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 214–223.
[81] A. Damianou and N. Lawrence, “Deep Gaussian processes,” in Proc. Artif. Intell. Stat., Stellenbosch, South Africa, 2013, pp. 207–215.
[82] M. Garnelo et al., “Neural processes,” arXiv preprint arXiv:1807.01622, 2018.
[83] Z.-H. Zhou and J. Feng, “Deep forest: Towards an alternative to deep neural networks,” in Proc. 26th Int. Joint Conf. Artif. Intell., Nanjing, China, 2017, pp. 3553–3559.
[84] W. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” Bull. Math. Biophys., vol. 52, nos. 1–2, pp. 99–115, 1990.
[85] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.
[86] Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time series,” in The Handbook of Brain Theory and Neural Networks, vol. 3361. Cambridge, MA, USA: MIT Press, 1995.
[87] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[88] P. Domingos, “A few useful things to know about machine learning,” Commun. ACM, vol. 55, no. 10, pp. 78–87, 2012.
[89] I. W. Tsang, J. T. Kwok, and P.-M. Cheung, “Core vector machines: Fast SVM training on very large data sets,” J. Mach. Learn. Res., vol. 6, pp. 363–392, 2005.
[90] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning, vol. 1. Cambridge, MA, USA: MIT Press, 2006.
[91] N. Le Roux and Y. Bengio, “Representational power of restricted Boltzmann machines and deep belief networks,” Neural Comput., vol. 20, no. 6, pp. 1631–1649, Jun. 2008.
[92] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
[93] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Boston, MA, USA, 2015, pp. 815–823.
[94] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, “Semi-supervised learning with deep generative models,” in Proc. Adv. Neural Inf. Process. Syst., Montreal, QC, Canada, 2014, pp. 3581–3589.
[95] R. Stewart and S. Ermon, “Label-free supervision of neural networks with physics and domain knowledge,” in Proc. Nat. Conf. Artif. Intell. (AAAI), 2017, pp. 2576–2582.
[96] D. J. Rezende, S. Mohamed, I. Danihelka, K. Gregor, and D. Wierstra, “One-shot generalization in deep generative models,” in Proc. Int. Conf. Mach. Learn. (ICML), 2016, pp. 1521–1529.
[97] R. Socher, M. Ganjoo, C. D. Manning, and A. Y. Ng, “Zero-shot learning through cross-modal transfer,” in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 935–943.
[98] P. Georgiev, S. Bhattacharya, N. D. Lane, and C. Mascolo, “Low-resource multi-task audio sensing for mobile and embedded devices via shared deep neural network representations,” in Proc. ACM Interact. Mobile Wearable Ubiquitous Technol. (IMWUT), vol. 1, 2017, p. 50.
[99] F. Monti et al., “Geometric deep learning on graphs and manifolds using mixture model CNNs,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 1, 2017, p. 3.
[100] B. Le Roux and H. Rouanet, Geometric Data Analysis: From Correspondence Analysis to Structured Data Analysis. Dordrecht, The Netherlands: Springer, 2004.
[101] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “PointNet++: Deep hierarchical feature learning on point sets in a metric space,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5099–5108.
[102] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. Int. Conf. Learn. Representations (ICLR), 2017, pp. 1–14.
[103] X. Wang, Z. Zhou, Z. Yang, Y. Liu, and C. Peng, “Spatio-temporal analysis and prediction of cellular traffic in metropolis,” IEEE Trans. Mobile Comput., to be published.
[104] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 427–436.
[105] V. Behzadan and A. Munir, “Vulnerability of deep reinforcement learning to policy induction attacks,” in Proc. Int. Conf. Mach. Learn. Data Min. Pattern Recognit., 2017, pp. 262–275.
[106] P. Madani and N. Vlajic, “Robustness of deep autoencoder in intrusion detection under adversarial contamination,” in Proc. 5th ACM Annu. Symp. Bootcamp Hot Topics Sci. Security, 2018, p. 1.
[107] D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba, “Network dissection: Quantifying interpretability of deep visual representations,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 3319–3327.
[108] M. Wu et al., “Beyond sparsity: Tree regularization of deep models for interpretability,” in Proc. 32nd AAAI Conf. Artif. Intell. (AAAI), 2018, pp. 1670–1678.
[109] S. Chakraborty et al., “Interpretability of deep learning models: A survey of results,” in Proc. IEEE Smart World Congr. Workshop DAIS, 2017, pp. 1–6.
[110] L. Perez and J. Wang, “The effectiveness of data augmentation in image classification using deep learning,” arXiv preprint arXiv:1712.04621, 2017.
[111] C. Liu et al., “Progressive neural architecture search,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 1–16.
[112] W. Zhang, K. Liu, W. Zhang, Y. Zhang, and J. Gu, “Deep neural networks for wireless localization in indoor and outdoor environments,” Neurocomputing, vol. 194, pp. 279–287, Jun. 2016.
[113] F. J. Ordóñez and D. Roggen, “Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition,” Sensors, vol. 16, no. 1, p. 115, 2016.
[114] E. D. Coninck et al., “Distributed neural networks for Internet of Things: The big-little approach,” in Proc. 2nd Int. Summit Internet Things IoT Infrastructures, Rome, Italy, Oct. 2016, pp. 484–492.
[115] N. P. Jouppi et al., “In-datacenter performance analysis of a tensor processing unit,” in Proc. ACM/IEEE 44th Annu. Int. Symp. Comput. Archit. (ISCA), 2017, pp. 1–12.
[116] J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue, vol. 6, no. 2, pp. 40–53, 2008.
[117] S. Chetlur et al., “cuDNN: Efficient primitives for deep learning,” arXiv preprint arXiv:1410.0759, 2014.
[118] M. Abadi et al., “TensorFlow: A system for large-scale machine learning,” in Proc. USENIX Symp. Oper. Syst. Design Implement. (OSDI), vol. 16, 2016, pp. 265–283.
[119] Theano Development Team, “Theano: A Python framework for fast computation of mathematical expressions,” arXiv e-prints abs/1605.02688, May 2016.
[120] Y. Jia et al., “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
[121] R. Collobert, K. Kavukcuoglu, and C. Farabet, “Torch7: A MATLAB-like environment for machine learning,” in Proc. BigLearn NIPS Workshop, 2011, pp. 1–6.
[122] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, “A 240 G-Ops/s mobile coprocessor for deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2014, pp. 682–687.


[123] (2017). NCNN—A High-Performance Neural Network Inference Framework Optimized for the Mobile Platform. Accessed: Jul. 25, 2017. [Online]. Available: https://github.com/Tencent/ncnn
[124] (2017). Huawei Announces the Kirin 970–New Flagship SoC With AI Capabilities. Accessed: Sep. 1, 2017. [Online]. Available: http://www.androidauthority.com/huawei-announces-kirin-970-797788/
[125] (2017). Core ML: Integrate Machine Learning Models Into Your App. Accessed: Jul. 25, 2017. [Online]. Available: https://developer.apple.com/documentation/coreml
[126] I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton, “On the importance of initialization and momentum in deep learning,” in Proc. Int. Conf. Mach. Learn. (ICML), vol. 28, 2013, pp. 1139–1147.
[127] J. Dean et al., “Large scale distributed deep networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1223–1231.
[128] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learn. Representations (ICLR), 2015, pp. 1–15.
[129] T. Kraska et al., “MLbase: A distributed machine-learning system,” in Proc. CIDR, vol. 1, 2013, pp. 1–7.
[130] T. M. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman, “Project ADAM: Building an efficient and scalable deep learning training system,” in Proc. USENIX Symp. Oper. Syst. Design Implement. (OSDI), vol. 14, 2014, pp. 571–582.
[131] H. Cui, H. Zhang, G. R. Ganger, P. B. Gibbons, and E. P. Xing, “GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server,” in Proc. 11th ACM Eur. Conf. Comput. Syst., 2016, p. 4.
[132] X. Lin et al., “All-optical machine learning using diffractive deep neural networks,” Science, vol. 361, no. 6406, pp. 1004–1008, 2018.
[133] R. Spring and A. Shrivastava, “Scalable and sustainable deep learning via randomized hashing,” in Proc. ACM SIGKDD Conf. Knowl. Disc. Data Min., 2017, pp. 445–454.
[134] A. Mirhoseini et al., “Device placement optimization with reinforcement learning,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 1–11.
[135] E. P. Xing et al., “Petuum: A new platform for distributed machine learning on big data,” IEEE Trans. Big Data, vol. 1, no. 2, pp. 49–67, Jun. 2015.
[136] P. Moritz et al., “Ray: A distributed framework for emerging AI applications,” in Proc. 13th USENIX Symp. Oper. Syst. Design Implement. (OSDI), 2018, pp. 561–577.
[137] M. Alzantot, Y. Wang, Z. Ren, and M. B. Srivastava, “RSTensorFlow: GPU enabled TensorFlow for deep learning on commodity Android devices,” in Proc. 1st ACM Int. Workshop Deep Learn. Mobile Syst. Appl., 2017, pp. 7–12.
[138] H. Dong et al., “TensorLayer: A versatile library for efficient deep learning development,” in Proc. ACM Multimedia Conf. (MM), 2017, pp. 1201–1204.
[139] A. Paszke et al., “Automatic differentiation in PyTorch,” 2017.
[140] T. Chen et al., “MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems,” arXiv preprint arXiv:1512.01274, 2015.
[141] S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv preprint arXiv:1609.04747, 2016.
[142] M. D. Zeiler, “ADADELTA: An adaptive learning rate method,” arXiv preprint arXiv:1212.5701, 2012.
[143] T. Dozat, “Incorporating Nesterov momentum into Adam,” in Proc. Workshop Track (ICLR), 2016, pp. 1–4.
[144] M. Andrychowicz et al., “Learning to learn by gradient descent by gradient descent,” in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 3981–3989.
[145] C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 1–9.
[146] Y. Zhou, S. Chen, and A. Banerjee, “Stable gradient descent,” in Proc. Conf. Uncertainty Artif. Intell., 2018, pp. 1–10.
[147] W. Wen et al., “TernGrad: Ternary gradients to reduce communication in distributed deep learning,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1–11.
[148] F. Bonomi, R. Milito, P. Natarajan, and J. Zhu, “Fog computing: A platform for Internet of Things and analytics,” in Proc. Big Data Internet Things: A Roadmap Smart Environ., 2014, pp. 169–186.
[149] J. Mao, X. Chen, K. W. Nixon, C. Krieger, and Y. Chen, “MoDNN: Local distributed mobile computing system for deep neural network,” in Proc. IEEE Design Autom. Test Europe Conf. Exhibit. (DATE), 2017, pp. 1396–1401.
[150] M. Mukherjee, L. Shu, and D. Wang, “Survey of fog computing: Fundamental, network applications, and research challenges,” IEEE Commun. Surveys Tuts., vol. 20, no. 3, pp. 1826–1857, 3rd Quart., 2018.
[151] L. M. Vaquero and L. Rodero-Merino, “Finding your way in the fog: Towards a comprehensive definition of fog computing,” ACM SIGCOMM Comput. Commun. Rev., vol. 44, no. 5, pp. 27–32, 2014.
[152] M. Aazam, S. Zeadally, and K. A. Harras, “Offloading in fog computing for IoT: Review, enabling technologies, and research opportunities,” Future Gener. Comput. Syst., vol. 87, pp. 278–289, Oct. 2018.
[153] R. Buyya et al., “A manifesto for future generation cloud computing: Research directions for the next decade,” ACM Comput. Surveys, vol. 51, no. 5, pp. 1–105, 2018.
[154] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proc. IEEE, vol. 105, no. 12, pp. 2295–2329, Dec. 2017.
[155] S. Bang et al., “14.7 A 288μW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence,” in Proc. IEEE Int. Conf. Solid-State Circuits (ISSCC), 2017, pp. 250–251.
[156] F. Akopyan, “Design and tool flow of IBM’s TrueNorth: An ultra-low power programmable neurosynaptic chip with 1 million neurons,” in Proc. ACM Int. Symp. Phys. Design, 2016, pp. 59–60.
[157] S. S. L. Oskouei, H. Golestani, M. Hashemi, and S. Ghiasi, “CNNdroid: GPU-accelerated execution of trained deep convolutional neural networks on Android,” in Proc. ACM Multimedia Conf., 2016, pp. 1201–1205.
[158] C. Cortes, X. Gonzalvo, V. Kuznetsov, M. Mohri, and S. Yang, “AdaNet: Adaptive structural learning of artificial neural networks,” in Proc. Int. Conf. Mach. Learn. (ICML), 2017, pp. 874–883.
[159] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.
[160] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations,” in Proc. 26th ACM Annu. Int. Conf. Mach. Learn., 2009, pp. 609–616.
[161] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” J. Mach. Learn. Res., vol. 11, pp. 3371–3408, Dec. 2010.
[162] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in Proc. Int. Conf. Learn. Representations (ICLR), 2014.
[163] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[164] S. Ji, W. Xu, M. Yang, and K. Yu, “3D convolutional neural networks for human action recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 221–231, Jan. 2013.
[165] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2261–2269.
[166] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” in Proc. 9th Int. Conf. Artif. Neural Netw. (ICANN), 1999, pp. 850–855.
[167] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 3104–3112.
[168] S. Xingjian et al., “Convolutional LSTM network: A machine learning approach for precipitation nowcasting,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 802–810.
[169] G.-J. Qi, “Loss-sensitive generative adversarial networks on Lipschitz densities,” IEEE Trans. Pattern Anal. Mach. Intell., 2018, pp. 1–34.
[170] A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” arXiv preprint arXiv:1809.11096, 2018.
[171] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[172] M. Hessel et al., “Rainbow: Combining improvements in deep reinforcement learning,” in Proc. 32nd AAAI Conf. Artif. Intell. (AAAI), 2018, pp. 3215–3222.
[173] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
[174] R. Collobert and S. Bengio, “Links between perceptrons, MLPs and SVMs,” in Proc. 21st ACM Int. Conf. Mach. Learn., Banff, AB, Canada, 2004, p. 23.
[175] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proc. 14th Int. Conf. Artif. Intell. Stat., 2011, pp. 315–323.


[176] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, “Self-normalizing neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 971–980.
[177] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. Int. Conf. Mach. Learn., 2015, pp. 448–456.
[178] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Comput., vol. 14, no. 8, pp. 1771–1800, 2002.
[179] G. Casella and E. I. George, “Explaining the Gibbs sampler,” Amer. Stat., vol. 46, no. 3, pp. 167–174, 1992.
[180] T. Kuremoto, M. Obayashi, K. Kobayashi, T. Hirata, and S. Mabu, “Forecast chaotic time series data by DBNs,” in Proc. 7th IEEE Int. Congr. Image Signal Process. (CISP), Dalian, China, 2014, pp. 1130–1135.
[181] Y. Dauphin and Y. Bengio, “Stochastic ratio matching of RBMs for sparse high-dimensional inputs,” in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 1340–1348.
[182] T. N. Sainath et al., “Making deep belief networks effective for large vocabulary continuous speech recognition,” in Proc. IEEE Workshop Autom. Speech Recognit. Understanding (ASRU), Waikoloa, HI, USA, 2011, pp. 30–35.
[183] Y. Bengio, “Learning deep architectures for AI,” Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009.
[184] M. Sakurada and T. Yairi, “Anomaly detection using autoencoders with nonlinear dimensionality reduction,” in Proc. ACM Workshop Mach. Learn. Sensory Data Anal. (MLSDA), 2014, p. 4.
[185] V. L. Cao, M. Nicolau, and J. McDermott, “A hybrid autoencoder and density estimation model for anomaly detection,” in Proc. Int. Conf. Parallel Probl. Solving Nat., 2016, pp. 717–726.
[186] V. L. L. Thing, “IEEE 802.11 network anomaly detection and attack classification: A deep learning approach,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), San Francisco, CA, USA, 2017, pp. 1–6.
[187] B. Mao et al., “Routing or computing? The paradigm shift towards intelligent computer network packet transmission based on deep learning,” IEEE Trans. Comput., vol. 66, no. 11, pp. 1946–1960, Nov. 2017.
[188] V. Radu et al., “Towards multimodal deep learning for activity recognition on mobile devices,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. Adjunct, 2016, pp. 185–188.
[189] V. Radu et al., “Multimodal deep learning for activity and context recognition,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 1, no. 4, p. 157, 2018.
[190] R. Raghavendra and C. Busch, “Learning deeply coupled autoencoders for smartphone based robust periocular verification,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Phoenix, AZ, USA, 2016, pp. 325–329.
[191] J. Li, J. Wang, and Z. Xiong, “Wavelet-based stacked denoising autoencoders for cell phone base station user number prediction,” in Proc. IEEE Int. Conf. Internet Things (iThings) IEEE Green Comput. Commun. (GreenCom) IEEE Cyber Phys. Soc. Comput. (CPSCom) IEEE Smart Data (SmartData), Chengdu, China, 2016, pp. 833–838.
[192] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[193] Y. Jeon and J. Kim, “Active convolution: Learning the shape of convolution for image classification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Honolulu, HI, USA, 2017, pp. 1846–1854.
[194] J. Dai et al., “Deformable convolutional networks,” in Proc. IEEE Int. Conf. Comput. Vis., Venice, Italy, 2017, pp. 764–773.
[195] X. Zhu, H. Hu, S. Lin, and J. Dai, “Deformable ConvNets v2: More deformable, better results,” arXiv preprint arXiv:1811.11168, 2018.
[196] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 157–166, Mar. 1994.
[197] A. Graves, N. Jaitly, and A.-R. Mohamed, “Hybrid speech recognition with deep bidirectional LSTM,” in Proc. IEEE Workshop Autom. Speech Recognit. Understand. (ASRU), Olomouc, Czechia, 2013, pp. 273–278.
[198] R. Johnson and T. Zhang, “Supervised and semi-supervised text categorization using LSTM for region embeddings,” in Proc. Int. Conf. Mach. Learn. (ICML), New York, NY, USA, 2016, pp. 526–534.
[199] I. Goodfellow, “NIPS 2016 tutorial: Generative adversarial networks,” arXiv preprint arXiv:1701.00160, 2016.
[200] C. Ledig et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Honolulu, HI, USA, 2017, pp. 105–114.
[201] J. Li et al., “Perceptual generative adversarial networks for small object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Honolulu, HI, USA, 2017, pp. 1951–1959.
[202] Y. Li, S. Liu, J. Yang, and M.-H. Yang, “Generative face completion,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Honolulu, HI, USA, 2017, pp. 5892–5900.
[203] P. Gawłowicz and A. Zubow, “ns3-gym: Extending OpenAI Gym for networking research,” arXiv preprint arXiv:1810.03943, 2018.
[204] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep Q-learning with model-based acceleration,” in Proc. Int. Conf. Mach. Learn., New York, NY, USA, 2016, pp. 2829–2838.
[205] M. Moravčík et al., “DeepStack: Expert-level artificial intelligence in heads-up no-limit poker,” Science, vol. 356, no. 6337, pp. 508–513, 2017.
[206] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection,” Int. J. Robot. Res., vol. 37, nos. 4–5, pp. 421–436, 2018.
[207] A. El Sallab, M. Abdou, E. Perot, and S. Yogamani, “Deep reinforcement learning framework for autonomous driving,” Electron. Imag., vol. 19, pp. 70–76, Jan. 2017.
[208] DeepMind. (2019). AlphaStar: Mastering the Real-Time Strategy Game StarCraft II. Accessed: Mar. 1, 2019. [Online]. Available: https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/
[209] R. Li, Z. Zhao, X. Chen, J. Palicot, and H. Zhang, “TACT: A transfer actor–critic learning framework for energy saving in cellular radio access networks,” IEEE Trans. Wireless Commun., vol. 13, no. 4, pp. 2000–2011, Apr. 2014.
[210] H. A. A. Al-Rawi, M. A. Ng, and K.-L. A. Yau, “Application of reinforcement learning to routing in distributed wireless networks: A review,” Artif. Intell. Rev., vol. 43, no. 3, pp. 381–416, 2015.
[211] Y.-J. Liu, L. Tang, S. Tong, C. L. P. Chen, and D.-J. Li, “Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 1, pp. 165–176, Jan. 2015.
[212] L. Pierucci and D. Micheli, “A neural network for quality of experience estimation in mobile communications,” IEEE MultiMedia, vol. 23, no. 4, pp. 42–49, Oct./Dec. 2016.
[213] Y. L. Gwon and H. T. Kung, “Inferring origin flow patterns in Wi-Fi with deep learning,” in Proc. 11th IEEE Int. Conf. Auton. Comput. (ICAC), 2014, pp. 73–83.
[214] L. Nie, D. Jiang, S. Yu, and H. Song, “Network traffic prediction based on deep belief network in wireless mesh backbone networks,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), San Francisco, CA, USA, 2017, pp. 1–5.
[215] V. Moyo et al., “The generalization ability of artificial neural networks in forecasting TCP/IP traffic trends: How much does the size of learning rate matter?” Int. J. Comput. Sci. Appl., vol. 4, no. 1, pp. 9–17, 2015.
[216] J. Wang et al., “Spatiotemporal modeling and prediction in cellular networks: A big data enabled deep learning approach,” in Proc. 36th Annu. IEEE Int. Conf. Comput. Commun. (INFOCOM), Atlanta, GA, USA, 2017, pp. 1–9.
[217] C. Zhang and P. Patras, “Long-term mobile traffic forecasting using deep spatio-temporal neural networks,” in Proc. 18th ACM Int. Symp. Mobile Ad Hoc Netw. Comput., Los Angeles, CA, USA, 2018, pp. 231–240.
[218] C. Zhang, X. Ouyang, and P. Patras, “ZipNet-GAN: Inferring fine-grained mobile traffic patterns via a generative adversarial neural network,” in Proc. 13th ACM Conf. Emerg. Netw. Exp. Technol., Incheon, South Korea, 2017, pp. 363–375.
[219] C.-W. Huang, C.-T. Chiang, and Q. Li, “A study of deep learning networks on mobile traffic forecasting,” in Proc. 28th IEEE Annu. Int. Symp. Pers. Indoor Mobile Radio Commun. (PIMRC), Montreal, QC, Canada, 2017, pp. 1–6.
[220] C. Zhang, H. Zhang, D. Yuan, and M. Zhang, “Citywide cellular traffic prediction based on densely connected convolutional neural networks,” IEEE Commun. Lett., vol. 22, no. 8, pp. 1656–1659, Aug. 2018.
[221] S. Navabi, C. Wang, O. Y. Bursalioglu, and H. Papadopoulos, “Predicting wireless channel features using neural networks,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–6.


[222] Z. Wang, “The applications of deep learning on traffic identification,” in Proc. BlackHat USA, 2015, pp. 21–26.
[223] W. Wang, M. Zhu, J. Wang, X. Zeng, and Z. Yang, “End-to-end encrypted traffic classification with one-dimensional convolution neural networks,” in Proc. IEEE Int. Conf. Intell. Security Informat., Beijing, China, 2017, pp. 43–48.
[224] M. Lotfollahi, R. S. H. Zade, M. J. Siavoshani, and M. Saberian, “Deep packet: A novel approach for encrypted traffic classification using deep learning,” arXiv preprint arXiv:1709.02656, 2017.
[225] W. Wang, M. Zhu, X. Zeng, X. Ye, and Y. Sheng, “Malware traffic classification using convolutional neural network for representation learning,” in Proc. IEEE Int. Conf. Inf. Netw. (ICOIN), Da Nang, Vietnam, 2017, pp. 712–717.
[226] V. C. Liang et al., “Mercury: Metro density prediction with recurrent neural network on streaming CDR data,” in Proc. IEEE 32nd Int. Conf. Data Eng. (ICDE), Helsinki, Finland, 2016, pp. 1374–1377.
[227] B. Felbo, P. Sundsøy, A. S. Pentland, S. Lehmann, and Y.-A. de Montjoye, “Using deep learning to predict demographics from mobile phone metadata,” in Proc. Workshop Track Int. Conf. Learn. Represent. (ICLR), 2016.
[228] N. C. Chen, W. Xie, R. E. Welsch, K. Larson, and J. Xie, “Comprehensive predictions of tourists’ next visit location based on call detail records using machine learning and deep learning methods,” in Proc. IEEE Int. Congr. Big Data (BigData Congr.), Honolulu, HI, USA, 2017, pp. 1–6.
[229] M. Yin, S. Feygin, M. Sheehan, J.-F. Paiement, and A. Pozdnoukhov, “Deep generative models of urban mobility,” IEEE Trans. Intell. Transp. Syst., to be published.
[230] C. Xu, K. Chang, K.-C. Chua, M. Hu, and Z. Gao, “Large-scale Wi-Fi hotspot classification via deep learning,” in Proc. 26th Int. Conf. World Wide Web Companion, 2017, pp. 857–858.
[231] Q. Meng, K. Wang, B. Liu, T. Miyazaki, and X. He, “QoE-based big data analysis with deep learning in pervasive edge environment,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–6.
[232] L. Fang, X. Cheng, H. Wang, and L. Yang, “Mobile demand forecasting via deep graph-sequence spatiotemporal modeling in cellular networks,” IEEE Internet Things J., vol. 5, no. 4, pp. 3091–3101, Aug. 2018.
[233] C. Luo, J. Ji, Q. Wang, X. Chen, and P. Li, “Channel state information prediction for 5G wireless communications: A deep learning approach,” IEEE Trans. Netw. Sci. Eng., to be published.
[234] P. Li et al., “An improved stacked auto-encoder for network traffic flow classification,” IEEE Netw., vol. 32, no. 6, pp. 22–27, Nov./Dec. 2018.
[235] J. Feng, X. Chen, R. Gao, M. Zeng, and Y. Li, “DeepTP: An end-to-end neural network for mobile cellular traffic prediction,” IEEE Netw., vol. 32, no. 6, pp. 108–115, Nov./Dec. 2018.
[236] H. Zhu, Y. Cao, W. Wang, T. Jiang, and S. Jin, “Deep reinforcement learning for mobile edge caching: Review, new features, and open issues,” IEEE Netw., vol. 32, no. 6, pp. 50–57, Nov./Dec. 2018.
[237] S. Liu and J. Du, “Poster: MobiEar - building an environment-independent acoustic sensing platform for the deaf using deep learning,” in Proc. 14th ACM Annu. Int. Conf. Mobile Syst. Appl. Services Companion, Singapore, 2016, p. 50.
[238] L. Sicong et al., “UbiEar: Bringing location-independent sound awareness to the hard-of-hearing people with smartphones,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 1, no. 2, 2017, Art. no. 17.
[239] V. Jindal, “Integrating mobile and cloud for PPG signal selection to monitor heart rate during intensive physical exercise,” in Proc. ACM Int. Workshop Mobile Softw. Eng. Syst., Austin, TX, USA, 2016, pp. 36–37.
[240] E. Kim, M. Corte-Real, and Z. Baloch, “A deep semantic mobile application for thyroid cytopathology,” in Proc. Med. Imag. PACS Imag. Informat. Next Gener. Innov., vol. 9789, San Diego, CA, USA, 2016, Art. no. 97890A.
[241] A. Sathyanarayana et al., “Sleep quality prediction from wearable data using deep learning,” JMIR mHealth uHealth, vol. 4, no. 4, 2016, Art. no. e125.
[242] H. Li and M. Trocan, “Personal health indicators by deep learning of smart phone sensor data,” in Proc. 3rd IEEE Int. Conf. Cybern. (CYBCONF), Exeter, U.K., 2017, pp. 1–5.
[243] M.-P. Hosseini, T. X. Tran, D. Pompili, K. Elisevich, and H. Soltanian-Zadeh, “Deep learning with edge computing for localization of epileptogenicity using multimodal rs-fMRI and EEG big data,” in Proc. IEEE Int. Conf. Auton. Comput. (ICAC), Columbus, OH, USA, 2017, pp. 83–92.
[244] C. Stamate et al., “Deep learning Parkinson’s from smartphone data,” in Proc. IEEE Int. Conf. Pervasive Comput. Commun. (PerCom), Kona, HI, USA, 2017, pp. 31–40.
[245] T. Quisel, L. Foschini, A. Signorini, and D. C. Kale, “Collecting and analyzing millions of mHealth data streams,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2017, pp. 1971–1980.
[246] U. M. Khan, Z. Kabir, S. A. Hassan, and S. H. Ahmed, “A deep learning framework using passive WiFi sensing for respiration monitoring,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), Singapore, 2017, pp. 1–6.
[247] D. Li, T. Salonidis, N. V. Desai, and M. C. Chuah, “DeepCham: Collaborative edge-mediated adaptive deep learning for mobile object recognition,” in Proc. IEEE/ACM Symp. Edge Comput. (SEC), Washington, DC, USA, 2016, pp. 64–76.
[248] L. Tobías, A. Ducournau, F. Rousseau, G. Mercier, and R. Fablet, “Convolutional neural networks for object recognition on mobile devices: A case study,” in Proc. 23rd IEEE Int. Conf. Pattern Recognit. (ICPR), Cancún, Mexico, 2016, pp. 3530–3535.
[249] P. Pouladzadeh and S. Shirmohammadi, “Mobile multi-food recognition using deep learning,” ACM Trans. Mobile Comput. Commun. Appl., vol. 13, no. 3s, Aug. 2017, Art. no. 36.
[250] R. Tanno, K. Okamoto, and K. Yanai, “DeepFoodCam: A DCNN-based real-time mobile food recognition system,” in Proc. 2nd ACM Int. Workshop Multimedia Assisted Dietary Manag., Amsterdam, The Netherlands, 2016, p. 89.
[251] P. Kuhad, A. Yassine, and S. Shimohammadi, “Using distance estimation and deep learning to simplify calibration in food calorie measurement,” in Proc. IEEE Int. Conf. Comput. Intell. Virtual Environ. Meas. Syst. Appl. (CIVEMSA), Shenzhen, China, 2015, pp. 1–6.
[252] T. Teng and X. Yang, “Facial expressions recognition based on convolutional neural networks for mobile virtual reality,” in Proc. 15th ACM SIGGRAPH Conf. Virtual Real. Continuum Appl. Ind., vol. 1, Zhuhai, China, 2016, pp. 475–478.
[253] J. Rao, Y. Qiao, F. Ren, J. Wang, and Q. Du, “A mobile outdoor augmented reality method combining deep learning object detection and spatial relationships for geovisualization,” Sensors, vol. 17, no. 9, 2017, Art. no. E1951.
[254] M. Zeng et al., “Convolutional neural networks for human activity recognition using mobile sensors,” in Proc. 6th IEEE Int. Conf. Mobile Comput. Appl. Services (MobiCASE), Austin, TX, USA, 2014, pp. 197–205.
[255] B. Almaslukh, J. AlMuhtadi, and A. Artoli, “An effective deep autoencoder approach for online smartphone-based human activity recognition,” Int. J. Comput. Sci. Netw. Security, vol. 17, no. 4, pp. 160–165, 2017.
[256] X. Li, Y. Zhang, I. Marsic, A. Sarcevic, and R. S. Burd, “Deep learning for RFID-based activity recognition,” in Proc. 4th ACM Conf. Embedded Netw. Sensor Syst. (CD-ROM), Stanford, CA, USA, 2016, pp. 164–175.
[257] S. Bhattacharya and N. D. Lane, “From smart to deep: Robust activity recognition on smartwatches using deep learning,” in Proc. IEEE Int. Conf. Pervasive Comput. Commun. Workshops (PerCom Workshops), Sydney, NSW, Australia, 2016, pp. 1–6.
[258] A. Antoniou and P. Angelov, “A general purpose intelligent surveillance system for mobile devices using deep learning,” in Proc. IEEE Int. Joint Conf. Neural Netw. (IJCNN), Vancouver, BC, Canada, 2016, pp. 2879–2886.
[259] S. Wang, J. Song, J. Lien, I. Poupyrev, and O. Hilliges, “Interacting with Soli: Exploring fine-grained dynamic gesture recognition in the radio-frequency spectrum,” in Proc. 29th ACM Annu. Symp. User Interface Softw. Technol., Tokyo, Japan, 2016, pp. 851–860.
[260] Y. Gao et al., “iHear food: Eating detection using commodity Bluetooth headsets,” in Proc. IEEE 1st Int. Conf. Connected Health Appl. Syst. Eng. Technol. (CHASE), Washington, DC, USA, 2016, pp. 163–172.
[261] J. Zhu, A. Pande, P. Mohapatra, and J. J. Han, “Using deep learning for energy expenditure estimation with wearable sensors,” in Proc. 17th IEEE Int. Conf. E-health Netw. Appl. Services (HealthCom), Boston, MA, USA, 2015, pp. 501–506.
[262] P. Sundsøy, J. Bjelland, B.-A. Reme, A. M. Iqbal, and E. Jahani, “Deep learning applied to mobile phone data for individual income classification,” in Proc. ICAITA, 2016, pp. 96–99.
[263] Y. Chen and Y. Xue, “A deep learning approach to human activity recognition based on single accelerometer,” in Proc. IEEE Int. Conf. Syst. Man Cybern. (SMC), 2015, pp. 1488–1492.


[264] S. Ha and S. Choi, “Convolutional neural networks for human activity recognition using multiple accelerometer and gyroscope sensors,” in Proc. IEEE Int. Joint Conf. Neural Netw. (IJCNN), Vancouver, BC, Canada, 2016, pp. 381–388.
[265] M. Edel and E. Köppe, “Binarized-BLSTM-RNN based human activity recognition,” in Proc. IEEE Int. Conf. Indoor Position. Indoor Navig. (IPIN), 2016, pp. 1–7.
[266] S. Xue et al., “AppDNA: App behavior profiling via graph-based deep learning,” in Proc. IEEE Conf. Comput. Commun., Honolulu, HI, USA, 2018, pp. 1475–1483.
[267] H. Liu et al., “Finding the stars in the fireworks: Deep understanding of motion sensor fingerprint,” in Proc. IEEE Conf. Comput. Commun., Honolulu, HI, USA, 2018, pp. 126–134.
[268] T. Okita and S. Inoue, “Recognition of multiple overlapping activities using compositional CNN-LSTM model,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. ACM Int. Symp. Wearable Comput., 2017, pp. 165–168.
[269] G. Mittal, K. B. Yagnik, M. Garg, and N. C. Krishnan, “SpotGarbage: Smartphone app to detect garbage using deep learning,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., Heidelberg, Germany, 2016, pp. 940–945.
[270] L. Seidenari et al., “Deep artwork detection and retrieval for automatic context-aware audio guides,” ACM Trans. Mobile Comput. Commun. Appl., vol. 13, no. 3s, 2017, Art. no. 35.
[271] X. Zeng, K. Cao, and M. Zhang, “MobileDeepPill: A small-footprint mobile deep learning system for recognizing unconstrained pill images,” in Proc. 15th ACM Annu. Int. Conf. Mobile Syst. Appl. Services, 2017, pp. 56–67.
[272] H. Zou et al., “DeepSense: Device-free human activity recognition via autoencoder long-term recurrent convolutional network,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–6.
[273] X. Zeng, “Mobile sensing through deep learning,” in Proc. Workshop MobiSys Ph.D. Forum, 2017, pp. 5–6.
[274] X. Wang, L. Gao, and S. Mao, “PhaseFi: Phase fingerprinting for indoor localization with a deep learning approach,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), San Diego, CA, USA, 2015, pp. 1–6.
[275] X. Wang, L. Gao, and S. Mao, “CSI phase fingerprinting for indoor localization with a deep learning approach,” IEEE Internet Things J., vol. 3, no. 6, pp. 1113–1123, Dec. 2016.
[276] C. Feng, S. Arshad, R. Yu, and Y. Liu, “Evaluation and improvement of activity detection systems with recurrent neural network,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–6.
[277] B. Cao et al., “DeepMood: Modeling mobile phone typing dynamics for mood detection,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2017, pp. 747–755.
[278] X. Ran, H. Chen, X. Zhu, Z. Liu, and J. Chen, “DeepDecision: A mobile deep learning framework for edge video analytics,” in Proc. IEEE Int. Conf. Comput. Commun., Honolulu, HI, USA, 2018, pp. 1421–1429.
[279] Siri Team. (2017). Deep Learning for Siri’s Voice: On-Device Deep Mixture Density Networks for Hybrid Unit Selection Synthesis. Accessed: Sep. 16, 2017. [Online]. Available:
[286] J. Lee, J. Kwon, and H. Kim, “Reducing distraction of smartwatch users with deep learning,” in Proc. 18th ACM Int. Conf. Human–Comput. Interact. Mobile Devices Services Adjunct, 2016, pp. 948–953.
[287] T. H. Vu, L. Dung, and J.-C. Wang, “Transportation mode detection on mobile devices using recurrent nets,” in Proc. ACM Multimedia Conf., 2016, pp. 392–396.
[288] S.-H. Fang, Y.-X. Fei, Z. Xu, and Y. Tsao, “Learning transportation modes from smartphone sensors based on deep neural network,” IEEE Sensors J., vol. 17, no. 18, pp. 6111–6118, Sep. 2017.
[289] M. Zhao et al., “RF-based 3D skeletons,” in Proc. ACM Conf. ACM Special Interest Group Data Commun. (SIGCOMM), 2018, pp. 267–281.
[290] K. Katevas, I. Leontiadis, M. Pielot, and J. Serrà, “Practical processing of mobile sensor data for continual deep learning predictions,” in Proc. 1st ACM Int. Workshop Deep Learn. Mobile Syst. Appl., 2017, pp. 19–24.
[291] S. Yao, S. Hu, Y. Zhao, A. Zhang, and T. Abdelzaher, “DeepSense: A unified deep learning framework for time-series mobile sensing data processing,” in Proc. 26th Int. Conf. World Wide Web, 2017, pp. 351–360.
[292] K. Ohara, T. Maekawa, and Y. Matsushita, “Detecting state changes of indoor everyday objects using Wi-Fi channel state information,” Proc. ACM Interact. Mobile Wearable Ubiquitous Technol., vol. 1, no. 3, p. 88, 2017.
[293] W. Liu, H. Ma, H. Qi, D. Zhao, and Z. Chen, “Deep learning hashing for mobile visual search,” EURASIP J. Image Video Process., vol. 2017, p. 17, Dec. 2017.
[294] X. Ouyang, C. Zhang, P. Zhou, and H. Jiang, “DeepSpace: An online deep learning framework for mobile big data to understand human mobility patterns,” arXiv preprint arXiv:1610.07009, 2016.
[295] H. Yang, Z. Li, and Z. Liu, “Neural networks for MANET AODV: An optimization approach,” Clust. Comput., vol. 20, no. 4, pp. 3369–3377, 2017.
[296] X. Song, H. Kanasugi, and R. Shibasaki, “DeepTransport: Prediction and simulation of human mobility and transportation mode at a city-wide level,” in Proc. Int. Joint Conf. Artif. Intell., New York, NY, USA, 2016, pp. 2618–2624.
[297] J. Zhang, Y. Zheng, and D. Qi, “Deep spatio-temporal residual networks for citywide crowd flows prediction,” in Proc. Nat. Conf. Artif. Intell. (AAAI), 2017, pp. 1655–1661.
[298] J. V. Subramanian and M. A. K. Sadiq, “Implementation of artificial neural network for mobile movement prediction,” Indian J. Sci. Technol., vol. 7, no. 6, pp. 858–863, 2014.
[299] L. S. Ezema and C. I. Ani, “Artificial neural network approach to mobile location estimation in GSM network,” Int. J. Electron. Telecommun., vol. 63, no. 1, pp. 39–44, 2017.
[300] W. Shao et al., “DePedo: Anti periodic negative-step movement pedometer with deep convolutional neural networks,” in Proc. IEEE Int. Conf. Commun. (ICC), 2018, pp. 1–6.
[301] Y. Yayeh et al., “Mobility prediction in mobile ad-hoc network using deep learning,” in Proc. IEEE Int. Conf. Appl. Syst. Invention (ICASI), 2018, pp. 1203–1206.
[302] Q. Chen, X. Song, H. Yamada, and R. Shibasaki, “Learning deep
https://machinelearning.apple.com/2017/08/06/siri-voices.html representation from big and heterogeneous data for traffic accident
[280] I. McGraw et al., “Personalized speech recognition on mobile devices,” inference,” in Proc. Nat. Conf. Artif. Intell. (AAAI), 2016, pp. 338–344.
in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), [303] X. Song et al., “DeepMob: Learning deep knowledge of human emer-
Shanghai, China, 2016, pp. 5955–5959. gency behavior and mobility from big and heterogeneous data,” ACM
[281] R. Prabhavalkar, O. Alsharif, A. Bruguier, and L. McGraw, “On Trans. Inf. Syst., vol. 35, no. 4, p. 41, 2017.
the compression of recurrent neural networks with an application to [304] D. Yao, C. Zhang, Z. Zhu, J. Huang, and J. Bi, “Trajectory clustering
LVCSR acoustic modeling for embedded speech recognition,” in Proc. via deep representation learning,” in Proc. IEEE Int. Joint Conf. Neural
IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), Shanghai, Netw. (IJCNN), 2017, pp. 3880–3887.
China, 2016, pp. 5970–5974. [305] Z. Liu, Z. Li, K. Wu, and M. Li, “Urban traffic prediction from mobil-
[282] T. Yoshioka et al., “The NTT CHiME-3 system: Advances in speech ity data using deep learning,” IEEE Netw., vol. 32, no. 4, pp. 40–46,
enhancement and recognition for mobile multi-microphone devices,” in Jul./Aug. 2018.
Proc. IEEE Workshop Autom. Speech Recognit. Understand. (ASRU), [306] D. S. Wickramasuriya, C. A. Perumalla, K. Davaslioglu, and
Scottsdale, AZ, USA, 2015, pp. 436–443. R. D. Gitlin, “Base station prediction and proactive mobility manage-
[283] S. Ruan, J. O. Wobbrock, K. Liou, A. Ng, and J. Landay, “Speech is ment in virtual cells using recurrent neural networks,” in Proc. IEEE
3x faster than typing for English and mandarin text entry on mobile Wireless Microw. Technol. Conf. (WAMICON), 2017, pp. 1–6.
devices,” arXiv preprint arXiv:1608.07323, 2016. [307] J. Tkačík and P. Kordík, “Neural Turing machine for sequential learning
[284] A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, and L. V. Gool, of human mobility patterns,” in Proc. IEEE Int. Joint Conf. Neural
“DSLR-quality photos on mobile devices with deep convolutional Netw. (IJCNN), Vancouver, BC, Canada, 2016, pp. 2790–2797.
networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, [308] D. Y. Kim and H. Y. Song, “Method of predicting human mobility
2017, pp. 3297–3305. patterns using deep learning,” Neurocomputing, vol. 280, pp. 56–64,
[285] Z. Lu, N. Felemban, K. Chan, and T. La Porta, “Demo abstract: Mar. 2018.
On-demand information retrieval from videos using deep learning in [309] R. Jiang et al., “DeepUrbanMomentum: An online deep-learning
wireless networks,” in Proc. IEEE/ACM 2nd Int. Conf. Internet Things system for short-term urban mobility prediction,” in Proc. Nat. Conf.
Design Implement. (IoTDI), 2017, pp. 279–280. Artif. Intell. (AAAI), 2018, pp. 784–791.

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
2282 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 3, THIRD QUARTER 2019

[310] C. Wang, Z. Zhao, Q. Sun, and H. Zhang, “Deep learning-based intel- [332] A. Niitsoo, T. Edelhäußer, and C. Mutschler, “Convolutional neural
ligent dual connectivity for mobility management in dense network,” networks for position estimation in TDoA-based locating systems,” in
in Proc. IEEE 88th Veh. Technol. Conf., 2018, p. 5. Proc. IEEE Int. Conf. Indoor Position. Indoor Navig. (IPIN), 2018,
[311] R. Jiang et al., “Deep ROI-based modeling for urban human mobil- pp. 1–8.
ity prediction,” in Proc. ACM Interact. Mobile Wearable Ubiquitous [333] X. Wang, X. Wang, and S. Mao, “Deep convolutional neural networks
Technol. (IMWUT), vol. 2, no. 1, 2018, p. 14. for indoor localization with CSI images,” IEEE Trans. Netw. Sci. Eng.,
[312] J. Feng et al., “DeepMove: Predicting human mobility with attentional to be published.
recurrent networks,” in Proc. World Wide Web Conf. World Wide Web, [334] C. Xiao, D. Yang, Z. Chen, and G. Tan, “3-D BLE indoor local-
2018, pp. 1459–1468. ization based on denoising autoencoder,” IEEE Access, vol. 5,
[313] X. Wang, L. Gao, S. Mao, and S. Pandey, “DeepFi: Deep learning pp. 12751–12760, 2017.
for indoor fingerprinting using channel state information,” in Proc. [335] C.-Y. Hsu et al., “Zero-effort in-home sleep and insomnia monitor-
IEEE Wireless Commun. Netw. Conf. (WCNC), New Orleans, LA, USA, ing using radio signals,” in Proc. ACM Interact. Mobile Wearable
2015, pp. 1666–1671. Ubiquitous Technol. (IMWUT), vol. 1, no. 3, Sep. 2017, Art. no. 59.
[314] X. Wang, X. Wang, and S. Mao, “CiFi: Deep convolutional neural [336] W. Guan et al., “High-precision approach to localization scheme
networks for indoor localization with 5 GHz Wi-Fi,” in Proc. IEEE of visible light communication based on artificial neural networks
Int. Conf. Commun. (ICC), Paris, France, 2017, pp. 1–6. and modified genetic algorithms,” Opt. Eng., vol. 56, no. 10, 2017,
[315] X. Wang, L. Gao, and S. Mao, “BiLoc: Bi-modal deep learning for Art. no. 106103.
indoor localization with commodity 5GHz WiFi,” IEEE Access, vol. 5, [337] P.-J. Chuang and Y.-J. Jiang, “Effective neural network-based node
pp. 4209–4220, 2017. localisation scheme for wireless sensor networks,” IET Wireless Sensor
[316] M. Nowicki and J. Wietrzykowski, “Low-effort place recognition with Syst., vol. 4, no. 2, pp. 97–103, Jun. 2014.
WiFi fingerprints using deep learning,” in Proc. Int. Conf. Autom., 2017, [338] M. Bernas and B. Płaczek, “Fully connected neural networks ensem-
pp. 575–584. ble with signal strength clustering for indoor localization in wireless
[317] X. Zhang, J. Wang, Q. Gao, X. Ma, and H. Wang, “Device-free wire- sensor networks,” Int. J. Distrib. Sensor Netw., vol. 11, no. 12, 2015,
less localization and activity recognition with deep learning,” in Proc. Art. no. 403242.
IEEE Int. Conf. Pervasive Comput. Commun. Workshops (PerCom [339] A. Payal, C. S. Rai, and B. V. R. Reddy, “Analysis of some feedforward
Workshops), 2016, pp. 1–5. artificial neural network training algorithms for developing localiza-
[318] J. Wang, X. Zhang, Q. Gao, H. Yue, and H. Wang, “Device-free wire- tion framework in wireless sensor networks,” Wireless Pers. Commun.,
less localization and activity recognition: A deep learning approach,” vol. 82, no. 4, pp. 2519–2536, 2015.
IEEE Trans. Veh. Technol., vol. 66, no. 7, pp. 6258–6267, Jul. 2017. [340] Y. Dong, Z. Li, R. Wang, and K. Zhang, “Range-based localization in
[319] M. Mohammadi, A. Al-Fuqaha, M. Guizani, and J.-S. Oh, “Semi- underwater wireless sensor networks using deep neural network,” in
supervised deep reinforcement learning in support of IoT and smart Proc. IPSN, 2017, pp. 321–322.
city services,” IEEE Internet Things J., vol. 5, no. 2, pp. 624–635, [341] X. Yan et al., “Real-time identification of smoldering and flaming
Apr. 2018. combustion phases in forest using a wireless sensor network-based
[320] N. Anzum, S. F. Afroze, and A. Rahman, “Zone-based indoor local- multi-sensor system and artificial neural network,” Sensors, vol. 16,
ization using neural networks: A view from a real testbed,” in Proc. no. 8, p. 1228, 2016.
IEEE Int. Conf. Commun. (ICC), 2018, pp. 1–7. [342] B. Wang, X. Gu, L. Ma, and S. Yan, “Temperature error correc-
[321] X. Wang, Z. Yu, and S. Mao, “DeepML: Deep LSTM for indoor local- tion based on BP neural network in meteorological wireless sensor
ization with smartphone magnetic and light sensors,” in Proc. IEEE Int. network,” Int. J. Sensor Netw., vol. 23, no. 4, pp. 265–278, 2017.
Conf. Commun. (ICC), 2018, pp. 1–6. [343] K.-S. Lee, S.-R. Lee, Y. Kim, and C.-G. Lee, “Deep learning–based
[322] A. K. T. R. Kumar, B. Schäufele, D. Becker, O. Sawade, and real-time query processing for wireless sensor network,” Int. J. Distrib.
I. Radusch, “Indoor localization of vehicles using deep learning,” in Sensor Netw., vol. 13, no. 5, pp. 1–10, 2017.
Proc. 17th IEEE Int. Symp. World Wireless Mobile Multimedia Netw. [344] J. Li and G. Serpen, “Adaptive and intelligent wireless sensor networks
(WoWMoM), 2016, pp. 1–6. through neural networks: An illustration for infrastructure adaptation
[323] Z. Zhengj and J. Weng, “Mobile device based outdoor navigation through Hopfield network,” Appl. Intell., vol. 45, no. 2, pp. 343–362,
with on-line learning neural network: A comparison with convolutional 2016.
neural network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. [345] F. Khorasani and H. R. Naji, “Energy efficient data aggregation in
Workshops, 2016, pp. 11–18. wireless sensor networks using neural networks,” Int. J. Sensor Netw.,
[324] J. Vieira, E. Leitinger, M. Sarajlic, X. Li, and F. Tufvesson, “Deep vol. 24, no. 1, pp. 26–42, 2017.
convolutional neural networks for massive MIMO fingerprint-based [346] C. Li, X. Xie, Y. Huang, H. Wang, and C. Niu, “Distributed data
positioning,” in Proc. 28th IEEE Annu. Int. Symp. Pers. Indoor Mobile mining based on deep neural network for wireless sensor network,”
Radio Commun., Montreal, QC, Canada, 2017, pp. 1–6. Int. J. Distrib. Sensor Netw., vol. 11, no. 7, 2015, Art. no. 157453.
[325] X. Wang, L. Gao, S. Mao, and S. Pandey, “CSI-based fingerprinting [347] T. Luo and S. G. Nagarajany, “Distributed anomaly detection using
for indoor localization: A deep learning approach,” IEEE Trans. Veh. autoencoder neural networks in WSN for IoT,” in Proc. IEEE Int. Conf.
Technol., vol. 66, no. 1, pp. 763–776, Jan. 2017. Commun. (ICC), 2018, pp. 1–6.
[326] H. Chen, Y. Zhang, W. Li, X. Tao, and P. Zhang, “ConFi: Convolutional [348] D. P. Kumar, T. Amgoth, and C. S. R. Annavarapu, “Machine learning
neural networks based indoor Wi-Fi localization using channel state algorithms for wireless sensor networks: A survey,” Inf. Fusion, vol. 49,
information,” IEEE Access, vol. 5, pp. 18066–18074, 2017. pp. 1–25, Sep. 2019.
[327] A. Shokry, M. Torki, and M. Youssef, “DeepLoc: A ubiquitous [349] N. Heydari and B. Minaei-Bidgoli, “Reduce energy consumption and
accurate and low-overhead outdoor cellular localization system,” in send secure data wireless multimedia sensor networks using a combi-
Proc. 26th ACM SIGSPATIAL Int. Conf. Adv. Geograph. Inf. Syst., nation of techniques for multi-layer watermark and deep learning,” Int.
2018, pp. 339–348. J. Comput. Sci. Netw. Security, vol. 17, no. 2, pp. 98–105, 2017.
[328] R. Zhou, M. Hao, X. Lu, M. Tang, and Y. Fu, “Device-free localization [350] S. Phoemphon, C. So-In, and D. T. Niyato, “A hybrid model using
based on CSI fingerprints and deep neural networks,” in Proc. 15th fuzzy logic and an extreme learning machine with vector particle
Annu. IEEE Int. Conf. Sens. Commun. Netw. (SECON), 2018, swarm optimization for wireless sensor network localization,” Appl.
pp. 1–9. Soft Comput., vol. 65, pp. 101–120, Apr. 2018.
[329] W. Zhang, R. Sengupta, J. Fodero, and X. Li, “DeepPositioning: [351] S. S. Banihashemian, F. Adibnia, and M. A. Sarram, “A new range-
Intelligent fusion of pervasive magnetic field and WiFi finger- free and storage-efficient localization algorithm using neural networks
printing for smartphone indoor localization via deep learning,” in in wireless sensor networks,” Wireless Pers. Commun., vol. 98, no. 1,
Proc. 16th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), 2017, pp. 1547–1568, 2018.
pp. 7–13. [352] W. Sun et al., “WNN-LQE: Wavelet-neural-network-based link
[330] A. Adege, H.-P. Lin, G. Tarekegn, and S.-S. Jeng, “Applying deep quality estimation for smart grid WSNs,” IEEE Access, vol. 5,
neural network (DNN) for robust indoor localization in multi-building pp. 12788–12797, 2017.
environment,” Appl. Sci., vol. 8, no. 7, p. 1062, 2018. [353] J. Kang, Y.-J. Park, J. Lee, S.-H. Wang, and D.-S. Eom, “Novel leak-
[331] M. Ibrahim, M. Torki, and M. ElNainay, “CNN based indoor localiza- age detection by ensemble CNN-SVM and graph-based localization in
tion using RSS time-series,” in Proc. IEEE Symp. Comput. Commun. water distribution systems,” IEEE Trans. Ind. Electron., vol. 65, no. 5,
(ISCC), 2018, pp. 1044–1049. pp. 4279–4289, May 2018.

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: DEEP LEARNING IN MOBILE AND WIRELESS NETWORKING: SURVEY 2283

[354] A. Mehmood, Z. Lv, J. Lloret, and M. M. Umar, “ELDC: An artificial [376] H. Ye and G. Y. Li, “Deep reinforcement learning for resource allo-
neural network based energy-efficient and robust routing scheme for cation in V2V communications,” in Proc. IEEE Int. Conf. Commun.
pollution monitoring in WSNs,” IEEE Trans. Emerg. Topics Comput., (ICC), 2018, pp. 1–6.
to be published. [377] U. Challita, L. Dong, and W. Saad, “Proactive resource management
[355] M. A. Alsheikh, S. Lin, D. Niyato, and H.-P. Tan, “Rate-distortion for LTE in unlicensed spectrum: A deep learning perspective,” IEEE
balanced data compression for wireless sensor networks,” IEEE Sensors Trans. Wireless Commun., vol. 17, no. 7, pp. 4674–4689, Jul. 2018.
J., vol. 16, no. 12, pp. 5072–5083, Jun. 2016. [378] O. Naparstek and K. Cohen, “Deep multi-user reinforcement learning
[356] A. E. Assaf, S. Zaidi, S. Affes, and N. Kandil, “Robust ANNs-based for dynamic spectrum access in multichannel wireless networks,” in
WSN localization in the presence of anisotropic signal attenuation,” Proc. IEEE Glob. Commun. Conf., 2017, pp. 1–7.
IEEE Wireless Commun. Lett., vol. 5, no. 5, pp. 504–507, Oct. 2016. [379] T. J. O’Shea and T. C. Clancy, “Deep reinforcement learning radio
[357] Y. Wang et al., “A deep learning approach for blind drift calibration control and signal detection with KeRLym, a Gym RL agent,” arXiv
of sensor networks,” IEEE Sensors J., vol. 17, no. 13, pp. 4158–4171, preprint arXiv:1605.09221, 2016.
Jul. 2017. [380] M. A. Wijaya, K. Fukawa, and H. Suzuki, “Intercell-interference can-
[358] Z. Jia et al., “Continuous low-power ammonia monitoring using cellation and neural network transmit power optimization for MIMO
long short-term memory neural networks,” in Proc. 16th ACM Conf. channels,” in Proc. IEEE 82nd Veh. Technol. Conf. (VTC Fall), Boston,
Embedded Netw. Sensor Syst., 2018, pp. 224–236. MA, USA, 2015, pp. 1–5.
[359] L. Liu, Y. Cheng, L. Cai, S. Zhou, and Z. Niu, “Deep learning based [381] H. Rutagemwa, A. Ghasemi, and S. Liu, “Dynamic spectrum assign-
optimization in wireless network,” in Proc. IEEE Int. Conf. Commun. ment for land mobile radio with deep recurrent neural networks,” in
(ICC), Paris, France, 2017, pp. 1–6. Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), 2018,
[360] S. Subramanian and A. Banerjee, “Poster: Deep learning enabled M2M pp. 1–6.
gateway for network optimization,” in Proc. 14th ACM Annu. Int. Conf. [382] M. A. Wijaya, K. Fukawa, and H. Suzuki, “Neural network based trans-
Mobile Syst. Appl. Services Companion, 2016, p. 144. mit power control and interference cancellation for MIMO small cell
[361] Y. He, C. Liang, F. R. Yu, N. Zhao, and H. Yin, “Optimization of networks,” IEICE Trans. Commun., vol. E99-B, no. 5, pp. 1157–1169,
cache-enabled opportunistic interference alignment wireless networks: 2016.
A big data deep reinforcement learning approach,” in Proc. IEEE Int. [383] H. Mao, R. Netravali, and M. Alizadeh, “Neural adaptive video stream-
Conf. Commun. (ICC), Paris, France, 2017, pp. 1–6. ing with pensieve,” in Proc. Conf. ACM Special Interest Group Data
Commun. (SIGCOMM), 2017, pp. 197–210.
[362] Y. He et al., “Deep reinforcement learning-based optimization for
cache-enabled opportunistic interference alignment wireless networks,” [384] T. Oda, R. Obukata, M. Ikeda, L. Barolli, and M. Takizawa, “Design
IEEE Trans. Veh. Technol., vol. 66, no. 11, pp. 10433–10445, and implementation of a simulation system based on deep Q-network
Nov. 2017. for mobile actor node control in wireless sensor and actor networks,” in
Proc. 31st IEEE Int. Conf. Adv. Inf. Netw. Appl. Workshops (WAINA),
[363] F. B. Mismar and B. L. Evans, “Deep reinforcement learning for
2017, pp. 195–200.
improving downlink mmWave communication performance,” arXiv
[385] T. Oda et al., “Performance evaluation of a deep Q-network based
preprint arXiv:1707.02329, 2017.
simulation system for actor node mobility control in wireless sensor and
[364] Z. Wang, L. Li, Y. Xu, H. Tian, and S. Cui, “Handover optimization via
actor networks considering three-dimensional environment,” in Proc.
asynchronous multi-user deep reinforcement learning,” in Proc. IEEE
Int. Conf. Intell. Netw. Collaborative Syst., 2017, pp. 41–52.
Int. Conf. Commun. (ICC), 2018, pp. 1–6.
[386] H.-Y. Kim and J.-M. Kim, “A load balancing scheme based on deep-
[365] Z. Chen and D. B. Smith, “Heterogeneous machine-type communi- learning in IoT,” Clust. Comput., vol. 20, no. 1, pp. 873–878, 2017.
cations in cellular networks: Random access optimization by deep
[387] U. Challita, W. Saad, and C. Bettstetter, “Deep reinforcement learning
reinforcement learning,” in Proc. IEEE Int. Conf. Commun. (ICC),
for interference-aware path planning of cellular-connected UAVs,” in
2018, pp. 1–6.
Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, 2018,
[366] L. Chen, J. Lingys, K. Chen, and F. Liu, “AuTO: Scaling pp. 1–7.
deep reinforcement learning for datacenter-scale automatic traffic [388] C. Luo, J. Ji, Q. Wang, L. Yu, and P. Li, “Online power control for 5G
optimization,” in Proc. ACM Conf. Special Interest Group Data wireless communications: A deep Q-network approach,” in Proc. IEEE
Commun. (SIGCOMM), 2018, pp. 191–205. Int. Conf. Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–6.
[367] Y. Lee, “Classification of node degree based on deep learning and [389] Y. Yu, T. Wang, and S. C. Liew, “Deep-reinforcement learning multiple
routing method applied for virtual route assignment,” Ad Hoc Netw., access for heterogeneous wireless networks,” in Proc. IEEE Int. Conf.
vol. 58, pp. 70–85, Apr. 2017. Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–7.
[368] F. Tang et al., “On removing routing protocol from future wireless [390] Z. Xu et al., “Experience-driven networking: A deep reinforcement
networks: A real-time deep learning approach for intelligent traf- learning based approach,” in Proc. IEEE Int. Conf. Comput. Commun.,
fic control,” IEEE Wireless Commun., vol. 25, no. 1, pp. 154–160, Honolulu, HI, USA, 2018, pp. 1871–1879.
Feb. 2018. [391] J. Liu, B. Krishnamachari, S. Zhou, and Z. Niu, “DeepNap: Data-driven
[369] Q. Zhang, M. Lin, L. T. Yang, Z. Chen, and P. Li, “Energy- base station sleeping operations through deep reinforcement learning,”
efficient scheduling for real-time systems based on deep Q-learning IEEE Internet Things J., vol. 5, no. 6, pp. 4273–4282, Dec. 2018.
model,” IEEE Trans. Sustain. Comput., vol. 4, no. 1, pp. 132–141, [392] Z. Zhao et al., “Deep reinforcement learning for network slicing,” arXiv
Jan./Mar. 2019. preprint arXiv:1805.06591, 2018.
[370] R. Atallah, C. Assi, and M. Khabbaz, “Deep reinforcement learning- [393] J. Li, H. Gao, T. Lv, and Y. Lu, “Deep reinforcement learning based
based scheduling for roadside communication networks,” in Proc. 15th computation offloading and resource allocation for MEC,” in Proc.
IEEE Int. Symp. Model. Optim. Mobile Ad Hoc Wireless Netw. (WiOpt), IEEE Wireless Commun. Netw. Conf. (WCNC), Barcelona, Spain, 2018,
Paris, France, 2017, pp. 1–8. pp. 1–6.
[371] S. Chinchali et al., “Cellular network traffic scheduling with deep [394] R. Mennes, M. Camelo, M. Claeys, and S. Latré, “A neural-network-
reinforcement learning,” in Proc. Nat. Conf. Artif. Intell. (AAAI), 2018. based MF-TDMA MAC scheduler for collaborative wireless networks,”
[372] Y. Wei, Z. Zhang, F. R. Yu, and Z. Han, “Joint user scheduling and in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Barcelona,
content caching strategy for mobile edge networks using deep rein- Spain, 2018, pp. 1–6.
forcement learning,” in Proc. IEEE Int. Conf. Commun. Workshops [395] Y. Zhou, Z. M. Fadlullah, B. Mao, and N. Kato, “A deep-learning-based
(ICC Workshops), 2018, pp. 1–6. radio resource assignment technique for 5G ultra dense networks,”
[373] H. Sun et al., “Learning to optimize: Training deep neural networks IEEE Netw., vol. 32, no. 6, pp. 28–34, Nov./Dec. 2018.
for wireless resource management,” in Proc. 18th IEEE Int. Workshop [396] B. Mao et al., “A tensor based deep learning technique for intelligent
Signal Process. Adv. Wireless Commun. (SPAWC), 2017, pp. 1–6. packet routing,” in Proc. IEEE Glob. Commun. Conf., Singapore, 2017,
[374] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, “A deep reinforce- pp. 1–6.
ment learning based framework for power-efficient resource allocation [397] F. Geyer and G. Carle, “Learning and generating distributed routing
in cloud RANs,” in Proc. IEEE Int. Conf. Commun. (ICC), 2017, protocols using graph-based deep learning,” in Proc. ACM Workshop
pp. 1–6. Big Data Anal. Mach. Learn. Data Commun. Netw., 2018, pp. 40–45.
[375] P. V. R. Ferreira et al., “Multi-objective reinforcement learning-based [398] N. C. Luong et al., “Joint transaction transmission and channel
deep neural networks for cognitive space communications,” in Proc. selection in cognitive radio based blockchain networks: A deep rein-
Cogn. Commun. Aerosp. Appl. Workshop (CCAA), 2017, pp. 1–8. forcement learning approach,” arXiv preprint arXiv:1810.10139, 2018.

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
2284 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 3, THIRD QUARTER 2019

[399] X. Li et al., “Intelligent power control for spectrum sharing in cognitive [422] N. McLaughlin et al., “Deep Android malware detection,” in Proc. 7th
radios: A deep reinforcement learning approach,” IEEE Access, vol. 6, ACM Conf. Data Appl. Security Privacy, Scottsdale, AZ, USA, 2017,
pp. 25463–25473, 2018. pp. 301–308.
[400] W. Lee, M. Kim, and D.-H. Cho, “Deep learning based transmit power [423] Y. Chen, Y. Zhang, and S. Maharjan, “Deep learning for secure mobile
control in underlaid device-to-device communication,” IEEE Syst. J., edge computing,” arXiv preprint arXiv:1709.08025, 2017.
to be published. [424] M. Oulehla, Z. K. Oplatková, and D. Malanik, “Detection of mobile
[401] C. H. Liu, Z. Chen, J. Tang, J. Xu, and C. Piao, “Energy-efficient UAV botnets using neural networks,” in Proc. IEEE Future Technol. Conf.
control for effective and fair communication coverage: A deep rein- (FTC), San Francisco, CA, USA, 2016, pp. 1324–1326.
forcement learning approach,” IEEE J. Sel. Areas Commun., vol. 36, [425] P. Torres, C. Catania, S. Garcia, and C. G. Garino, “An analysis of
no. 9, pp. 2059–2070, Sep. 2018. recurrent neural networks for Botnet detection behavior,” in Proc. IEEE
[402] Y. He, F. R. Yu, N. Zhao, V. C. M. Leung, and H. Yin, “Software- Biennial Congr. Argentina (ARGENCON), 2016, pp. 1–6.
defined networks with mobile edge computing and caching for smart [426] M. Eslahi et al., “Mobile Botnet detection model based on retrospective
cities: A big data deep reinforcement learning approach,” IEEE pattern recognition,” Int. J. Security Appl., vol. 10, no. 9, pp. 39–54,
Commun. Mag., vol. 55, no. 12, pp. 31–37, Dec. 2017. 2016.
[403] X. Liu, Y. Xu, L. Jia, Q. Wu, and A. Anpalagan, “Anti-jamming [427] M. Alauthaman, N. Aslam, L. Zhang, R. Alasem, and M. A. Hossain,
communications using spectrum waterfall: A deep reinforcement learn- “A P2P Botnet detection scheme based on decision tree and adaptive
ing approach,” IEEE Commun. Lett., vol. 22, no. 5, pp. 998–1001, multilayer neural networks,” Neural Comput. Appl., vol. 29, no. 11,
May 2018. pp. 991–1004, 2018.
[404] Q. T. A. Pham, Y. Hadjadj-Aoul, and A. Outtagarts, “Deep rein-
[428] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in
forcement learning based QoS-aware routing in knowledge-defined
Proc. 22nd ACM SIGSAC Conf. Comput. Commun. Security, 2015,
networking,” in Proc. Qshine EAI Int. Conf. Heterogeneous Netw. Qual.
pp. 1310–1321.
Rel. Security Robustness, 2018, pp. 1–13.
[405] P. Ferreira et al., “Multi-objective reinforcement learning for cogni- [429] L. T. Phong, Y. Aono, T. Hayashi, L. Wang, and S. Moriai, “Privacy-
tive radio–based satellite communications,” in Proc. 34th AIAA Int. preserving deep learning: Revisited and enhanced,” in Proc. Int. Conf.
Commun. Satellite Syst. Conf., 2016, p. 5726. Appl. Tech. Inf. Security, 2017, pp. 100–110.
[406] M. Yousefi-Azar, V. Varadharajan, L. Hamey, and U. Tupakula, [430] S. A. Ossia et al., “A hybrid deep learning architecture for privacy-
“Autoencoder-based feature learning for cyber security applications,” preserving mobile analytics,” arXiv preprint arXiv:1703.02952, 2017.
in Proc. IEEE Int. Joint Conf. Neural Netw. (IJCNN), Anchorage, AK, [431] M. Abadi et al., “Deep learning with differential privacy,” in Proc. ACM
USA, 2017, pp. 3854–3861. SIGSAC Conf. Comput. Commun. Security, Vienna, Austria, 2016,
[407] M. E. Aminanto and K. Kim, “Detecting impersonation attack in WiFi pp. 308–318.
networks using deep learning approach,” in Proc. Int. Workshop Inf. [432] S. A. Osia, A. S. Shamsabadi, A. Taheri, H. R. Rabiee, and H. Haddadi,
Security Appl., 2016, pp. 136–147. “Private and scalable personal data analytics using a hybrid edge-
[408] Q. Feng, Z. Dou, C. Li, and G. Si, “Anomaly detection of spectrum cloud deep learning,” IEEE Comput. Mag., vol. 51, no. 5, pp. 42–49,
in wireless communication via deep autoencoder,” in Proc. Int. Conf. May 2018.
Comput. Sci. Appl., 2016, pp. 259–265. [433] S. Servia-Rodriguez, L. Wang, J. R. Zhao, R. Mortier, and H. Haddadi,
[409] M. A. Khan, S. Khan, B. Shams, and J. Lloret, “Distributed flood attack “Personal model training under privacy constraints,” in Proc. 3rd
detection mechanism using artificial neural network in wireless mesh ACM/IEEE Int. Conf. Internet Things Design Implement., Apr 2018.
networks,” Security Commun. Netw., vol. 9, no. 15, pp. 2715–2729, [434] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the
2016. GAN: Information leakage from collaborative deep learning,” in Proc.
[410] A. A. Diro and N. Chilamkurti, “Distributed attack detection scheme ACM SIGSAC Conf. Comput. Commun. Security, Dallas, TX, USA,
using deep learning approach for Internet of Things,” Future Gener. 2017, pp. 603–618.
Comput. Syst., vol. 82, pp. 761–768, May 2018. [435] S. Greydanus, “Learning the enigma with recurrent neural networks,”
[411] A. Saied, R. E. Overill, and T. Radzik, “Detection of known arXiv preprint arXiv:1708.07576, 2017.
and unknown DDoS attacks using artificial neural networks,” [436] H. Maghrebi, T. Portigliatti, and E. Prouff, “Breaking cryptographic
Neurocomputing, vol. 172, pp. 385–393, Jan. 2016. implementations using deep learning techniques,” in Proc. Int. Conf.
[412] M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret, Security Privacy Appl. Cryptography Eng., 2016, pp. 3–26.
“Conditional variational autoencoder for prediction and feature recov- [437] Y. Liu et al., “GENPass: A general deep learning model for password
ery applied to intrusion detection in IoT,” Sensors, vol. 17, no. 9, guessing with PCFG rules and adversarial generation,” in Proc. IEEE
p. 1967, 2017. Int. Conf. Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–6.
[413] K. Hamedani, L. Liu, R. Atat, J. Wu, and Y. Yi, “Reservoir computing [438] R. Ning, C. Wang, C. Xin, J. Li, and H. Wu, “DeepMag: Sniffing
meets smart grids: Attack detection using delayed feedback networks,” mobile apps in magnetic field through deep convolutional neural
IEEE Trans. Ind. Informat., vol. 14, no. 2, pp. 734–743, Feb. 2018. networks,” in Proc. IEEE Int. Conf. Pervasive Comput. Commun.
[414] R. Das, A. Gadre, S. Zhang, S. Kumar, and J. M. F. Moura, “A deep (PerCom), 2018, pp. 1–10.
learning approach to IoT authentication,” in Proc. IEEE Int. Conf.
[439] T. J. O’Shea, T. Erpek, and T. C. Clancy, “Deep learning based MIMO
Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–6.
communications,” arXiv preprint arXiv:1707.07980, 2017.
[415] P. Jiang, H. Wu, C. Wang, and C. Xin, “Virtual MAC spoofing detec-
tion through deep learning,” in Proc. IEEE Int. Conf. Commun. (ICC), [440] M. Borgerding, P. Schniter, and S. Rangan, “AMP-inspired deep
Kansas City, MO, USA, 2018, pp. 1–6. networks for sparse linear inverse problems,” IEEE Trans. Signal
[416] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue, “Droid-Sec: Deep learning in Process., vol. 65, no. 16, pp. 4293–4308, Aug. 2017.
Android malware detection,” in ACM SIGCOMM Comput. Commun. [441] T. Fujihashi, T. Koike-Akino, T. Watanabe, and P. V. Orlik, “Nonlinear
Rev., vol. 44, no. 4, pp. 371–372, 2014. equalization with deep learning for multi-purpose visual MIMO com-
[417] Z. Yuan, Y. Lu, and Y. Xue, “Droiddetector: Android malware charac- munications,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City,
terization and detection using deep learning,” Tsinghua Sci. Technol., MO, USA, 2018, pp. 1–6.
vol. 21, no. 1, pp. 114–123, Feb. 2016. [442] S. Rajendran, W. Meert, D. Giustiniano, V. Lenders, and S. Pollin,
[418] X. Su, D. Zhang, W. Li, and K. Zhao, “A deep learning approach “Deep learning models for wireless signal classification with distributed
to Android malware feature learning and detection,” in Proc. IEEE low-cost spectrum sensors,” IEEE Trans. Cogn. Commun. Netw., vol. 4,
Trustcom/BigDataSE/ISPA, Tianjin, China, 2016, pp. 244–251. no. 3, pp. 433–445, Sep. 2018.
[419] S. Hou, A. Saas, L. Chen, and Y. Ye, “Deep4MalDroid: A deep learn- [443] N. E. West and T. O’Shea, “Deep architectures for modulation recog-
ing framework for Android malware detection based on Linux kernel nition,” in Proc. IEEE Int. Symp. Dyn. Spectr. Access Netw. (DySPAN),
system call graphs,” in Proc. IEEE/WIC/ACM Int. Conf. Web Intell. Piscataway, NJ, USA, 2017, pp. 1–6.
Workshops (WIW), Omaha, NE, USA, 2016, pp. 104–111. [444] T. J. O’Shea, L. Pemula, D. Batra, and T. C. Clancy, “Radio trans-
[420] F. Martinelli, F. Marulli, and F. Mercaldo, “Evaluating convolutional former networks: Attention models for learning to synchronize in
neural network for effective mobile malware detection,” Procedia wireless systems,” in Proc. 50th Asilomar Conf. Signals Syst. Comput.,
Comput. Sci., vol. 112, pp. 2372–2381, 2017. Pacific Grove, CA, USA, 2016, pp. 662–666.
[421] K. K. Nguyen et al., “Cyberattack detection in mobile cloud computing: [445] J. Gante, G. Falcão, and L. Sousa, “Beamformed fingerprint
A deep learning approach,” in Proc. IEEE Wireless Commun. Netw. learning for accurate millimeter wave positioning,” arXiv preprint
Conf. (WCNC), Barcelona, Spain, 2018, pp. 1–6. arXiv:1804.04112, 2018.

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: DEEP LEARNING IN MOBILE AND WIRELESS NETWORKING: SURVEY 2285

[446] A. Alkhateeb et al., “Deep learning coordinated beamforming for [470] I. Alawe, A. Ksentini, Y. Hadjadj-Aoul, and P. Bertin, “Improving traf-
highly-mobile millimeter wave systems,” IEEE Access, vol. 6, fic forecasting for 5G core network scalability: A machine learning
pp. 37328–37348, 2018. approach,” IEEE Netw., vol. 32, no. 6, pp. 42–49, Nov./Dec. 2018.
[447] D. Neumann, T. Wiese, and W. Utschick, “Deep channel estimation,” [471] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, “Mobile encrypted
in Proc. 21st Int. ITG Workshop Smart Antennas, Berlin, Germany, traffic classification using deep learning,” in Proc. 2nd IEEE Netw.
2017, pp. 1–6. Traffic Meas. Anal. Conf., Vienna, Austria, 2018, pp. 1–8.
[448] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” in [472] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and
Proc. IEEE 18th Int. Workshop Signal Process. Adv. Wireless Commun. M. Ayyash, “Internet of Things: A survey on enabling technologies,
(SPAWC), Sapporo, Japan, 2017, pp. 1–5. protocols, and applications,” IEEE Commun. Surveys Tuts., vol. 17,
[449] X. Yan et al., “Signal detection of MIMO-OFDM system based on auto no. 4, pp. 2347–2376, 4th Quart., 2015.
encoder and extreme learning machine,” in Proc. IEEE Int. Joint Conf. [473] S. Seneviratne et al., “A survey of wearable devices and chal-
Neural Netw. (IJCNN), Anchorage, AK, USA, 2017, pp. 1602–1606. lenges,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2573–2620,
[450] T. O’Shea and J. Hoydis, “An introduction to deep learning for the 4th Quart., 2017.
physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, [474] H. Li, K. Ota, and M. Dong, “Learning IoT in edge: Deep learning
pp. 563–575, Dec. 2017. for the Internet of Things with edge computing,” IEEE Netw., vol. 32,
[451] J. Jagannath et al., “Artificial neural network based automatic modula- no. 1, pp. 96–101, Jan./Feb. 2018.
tion classification over a software defined radio testbed,” in Proc. IEEE [475] N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, and F. Kawsar,
Int. Conf. Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–6. “An early resource characterization of deep learning on wearables,
[452] T. J. O’Shea, S. Hitefield, and J. Corgan, “End-to-end radio traffic smartphones and Internet-of-Things devices,” in Proc. ACM Int.
sequence recognition with recurrent neural networks,” in Proc. IEEE Workshop Internet Things Towards Appl., Seoul, South Korea, 2015,
Glob. Conf. Signal Inf. Process. (GlobalSIP), Washington, DC, USA, pp. 7–12.
2016, pp. 277–281. [476] D. Ravì et al., “Deep learning for health informatics,” IEEE J. Biomed.
[453] T. J. O’Shea, K. Karra, and T. C. Clancy, “Learning to communicate: Health Inform., vol. 21, no. 1, pp. 4–21, Jan. 2017.
Channel auto-encoders, domain specific regularizers, and attention,” in [477] R. Miotto, F. Wang, S. Wang, X. Jiang, and J. T. Dudley, “Deep learn-
Proc. IEEE Int. Symp. Signal Process. Inf. Technol. (ISSPIT), Limassol, ing for healthcare: Review, opportunities and challenges,” Briefings
Cyprus, 2016, pp. 223–228. Bioinformat., vol. 19, no. 6, pp. 1236–1246, 2017.
[454] H. Ye, G. Y. Li, and B.-H. Juang, “Power of deep learning for chan- [478] C. A. Ronao and S.-B. Cho, “Human activity recognition with smart-
nel estimation and signal detection in OFDM systems,” IEEE Wireless phone sensors using deep learning neural networks,” Expert Syst. Appl.,
Commun. Lett., vol. 7, no. 1, pp. 114–117, Feb. 2018. vol. 59, pp. 235–244, Oct. 2016.
[455] F. Liang, C. Shen, and F. Wu, “Exploiting noise correlation for channel [479] J. Wang, Y. Chen, S. Hao, X. Peng, and L. Hu, “Deep learning for
decoding with convolutional neural networks,” in Proc. IEEE Int. Conf. sensor-based activity recognition: A survey,” Pattern Recognit. Lett.,
Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–6. vol. 119, pp. 3–11, Mar. 2019.
[456] W. Lyu, Z. Zhang, C. Jiao, K. Qin, and H. Zhang, “Performance eval- [480] X. Ran, H. Chen, Z. Liu, and J. Chen, “Delivering deep learning to
uation of channel decoding with deep neural networks,” in Proc. IEEE mobile devices via offloading,” in Proc. ACM Workshop Virtual Reality
Int. Conf. Commun. (ICC), Kansas City, MO, USA, 2018, pp. 1–6. Augmented Reality Netw., Los Angeles, CA, USA, 2017, pp. 42–47.
[457] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, “Deep learning
[481] V. V. Vyas, K. H. Walse, and R. V. Dharaskar, “A survey on human
based communication over the air,” IEEE J. Sel. Topics Signal Process.,
activity recognition using smartphone,” Int. J. Adv. Res. Comput. Sci.
vol. 12, no. 1, pp. 132–143, Feb. 2018.
Manag. Stud., vol. 5, no. 3, pp. 118–125, 2017.
[458] R.-F. Liao et al., “The Rayleigh fading channel prediction via deep
[482] H. Zen and A. Senior, “Deep mixture density networks for acoustic
learning,” Wireless Commun. Mobile Comput., vol. 2018, Jul. 2018,
modeling in statistical parametric speech synthesis,” in Proc. IEEE Int.
Art. no. 6497340.
Conf. Acoust. Speech Signal Process. (ICASSP), Florence, Italy, 2014,
[459] H. Hongji, Y. Jie, S. Yiwei, H. Hao, and G. Guan, “Deep learning
pp. 3844–3848.
for super-resolution channel estimation and DOA estimation based
massive MIMO system,” IEEE Trans. Veh. Technol., vol. 67, no. 9, [483] K. Zhao, S. Tarkoma, S. Liu, and H. Vo, “Urban human mobility data
pp. 8549–8560, Sep. 2018. mining: An overview,” in Proc. IEEE Int. Conf. Big Data (Big Data),
Washington, DC, USA, 2016, pp. 1911–1920.
[460] S. Huang and H. Lin, “Fully optical spacecraft communications:
Implementing an omnidirectional PV-cell receiver and 8 Mb/s LED vis- [484] C. Yang, M. Sun, W. X. Zhao, Z. Liu, and E. Y. Chang, “A neu-
ible light downlink with deep learning error correction,” IEEE Aerosp. ral network approach to jointly modeling social networks and mobile
Electron. Syst. Mag., vol. 33, no. 4, pp. 16–22, Apr. 2018. trajectories,” ACM Trans. Inf. Syst., vol. 35, no. 4, 2017, Art. no. 36.
[461] R. Gonzalez, A. Garcia-Duran, F. Manco, M. Niepert, and P. Vallina, [485] A. Graves, G. Wayne, and I. Danihelka, “Neural Turing machines,”
“Network data monetization using Net2Vec,” in Proc. ACM SIGCOMM arXiv preprint arXiv:1410.5401, 2014.
Posters Demos, Los Angeles, CA, USA, 2017, pp. 37–39. [486] S. Xia, Y. Liu, G. Yuan, M. Zhu, and Z. Wang, “Indoor fingerprint posi-
[462] N. Kaminski et al., “A neural-network-based realization of in-network tioning based on Wi-Fi: An overview,” ISPRS Int. J. Geo Inf., vol. 6,
computation for the Internet of Things,” in Proc. IEEE Int. Conf. no. 5, p. 135, 2017.
Commun. (ICC), Paris, France, 2017, pp. 1–6. [487] P. Davidson and R. Piché, “A survey of selected indoor positioning
[463] L. Xiao, Y. Li, G. Han, H. Dai, and H. V. Poor, “A secure mobile methods for smartphones,” IEEE Commun. Surveys Tuts., vol. 19, no. 2,
crowdsensing game with deep reinforcement learning,” IEEE Trans. pp. 1347–1370, 2nd Quart., 2017.
Inf. Forensics Security, vol. 13, no. 1, pp. 35–47, Jan. 2018. [488] J. Xiao, Z. Zhou, Y. Yi, and L. M. Ni, “A survey on wireless indoor
[464] N. C. Luong, Z. Xiong, P. Wang, and D. Niyato, “Optimal auction for localization from the device perspective,” ACM Comput. Surveys,
edge computing resource management in mobile blockchain networks: vol. 49, no. 2, 2016, Art. no. 25.
A deep learning approach,” in Proc. IEEE Int. Conf. Commun. (ICC), [489] J. Xiao, K. Wu, Y. Yi, and L. M. Ni, “FIFS: Fine-grained indoor fin-
Kansas City, MO, USA, 2018, pp. 1–6. gerprinting system,” in Proc. 21st Int. Conf. Comput. Commun. Netw.
[465] A. Gulati, G. S. Aujla, R. Chaudhary, N. Kumar, and M. S. Obaidat, (ICCCN), Munich, Germany, 2012, pp. 1–7.
“Deep learning-based content centric data dissemination scheme for [490] M. Youssef and A. Agrawala, “The Horus WLAN location determina-
Internet of Vehicles,” in Proc. IEEE Int. Conf. Commun. (ICC), tion system,” in Proc. 3rd ACM Int. Conf. Mobile Syst. Appl. Services,
Kansas City, MO, USA, 2018, pp. 1–6. Seattle, WA, USA, 2005, pp. 205–218.
[466] E. Ahmed et al., “Recent advances and challenges in mobile big data,” [491] M. Brunato and R. Battiti, “Statistical learning theory for location
IEEE Commun. Mag., vol. 56, no. 2, pp. 102–108, Feb. 2018. fingerprinting in wireless LANs,” Comput. Netw., vol. 47, no. 6,
[467] D. Z. Yazti and S. Krishnaswamy, “Mobile big data analytics: Research, pp. 825–845, 2005.
practice, and opportunities,” in Proc. 15th IEEE Int. Conf. Mobile Data [492] J. Ho and S. Ermon, “Generative adversarial imitation learning,”
Manag. (MDM), vol. 1. Brisbane, QLD, Australia, 2014, pp. 1–2. in Proc. Adv. Neural Inf. Process. Syst., Barcelona, Spain, 2016,
[468] D. Naboulsi, M. Fiore, S. Ribot, and R. Stanica, “Large-scale mobile pp. 4565–4573.
traffic analysis: A survey,” IEEE Commun. Surveys Tuts., vol. 18, no. 1, [493] M. Zorzi, A. Zanella, A. Testolin, M. De Filippo De Grazia, and
pp. 124–161, 1st Quart., 2016. M. Zorzi, “COBANETS: A new paradigm for cognitive communi-
[469] J. Ngiam et al., “Multimodal deep learning,” in Proc. 28th Int. Conf. cations systems,” in Proc. IEEE Int. Conf. Comput. Netw. Commun.
Mach. Learn. (ICML), Bellevue, WA, USA, 2011, pp. 689–696. (ICNC), 2016, pp. 1–7.

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
2286 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 3, THIRD QUARTER 2019

[494] M. Roopaei, P. Rad, and M. Jamshidi, “Deep learning control for [519] Q. Cao, N. Balasubramanian, and A. Balasubramanian, “MobiRNN:
complex and large scale cloud systems,” Intell. Autom. Soft Comput., Efficient recurrent neural network execution on mobile GPU,” in
vol. 23, no. 3, pp. 1–3, 2017. Proc. 1st ACM Int. Workshop Deep Learn. Mobile Syst. Appl.,
[495] P. V. R. Ferreira et al., “Multiobjective reinforcement learning for cog- Niagara Falls, NY, USA, 2017, pp. 1–6.
nitive satellite communications using deep neural network ensembles,” [520] C.-F. Chen, G. G. Lee, V. Sritapan, and C.-Y. Lin, “Deep convolutional
IEEE J. Sel. Areas Commun., vol. 36, no. 5, pp. 1030–1041, May 2018. neural network on iOS mobile devices,” in Proc. IEEE Int. Workshop
[496] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience Signal Process. Syst. (SiPS), Dallas, TX, USA, 2016, pp. 130–135.
replay,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2016. [521] S. Rallapalli et al., “Are very deep neural networks feasible on mobile
[497] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, “An iteratively weighted devices,” IEEE Trans. Circuits Syst. Video Technol., to be published.
MMSE approach to distributed sum-utility maximization for a MIMO [522] N. D. Lane et al., “DeepX: A software accelerator for low-power
interfering broadcast channel,” IEEE Trans. Signal Process., vol. 59, deep learning inference on mobile devices,” in Proc. 15th ACM/IEEE
no. 9, pp. 4331–4340, Sep. 2011. Int. Conf. Inf. Process. Sensor Netw. (IPSN), Vienna, Austria, 2016,
[498] A. L. Buczak and E. Guven, “A survey of data mining and pp. 1–12.
machine learning methods for cyber security intrusion detection,” IEEE [523] L. N. Huynh, R. K. Balan, and Y. Lee, “Demo: DeepMon: Building
Commun. Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., mobile GPU deep learning models for continuous vision applications,”
2016. in Proc. 15th ACM Annu. Int. Conf. Mobile Syst. Appl. Services, 2017,
[499] J. Wang et al., “Not just privacy: Improving performance of private p. 186.
deep learning in mobile cloud,” in Proc. 24th ACM SIGKDD Int. Conf. [524] J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, “Quantized
Knowl. Disc. Data Min., London, U.K., 2018, pp. 2407–2416. convolutional neural networks for mobile devices,” in Proc. IEEE
[500] D. Kwon et al., “A survey of deep learning-based network anomaly Conf. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, 2016,
detection,” Clust. Comput., pp. 1–13, Aug. 2017. pp. 4820–4828.
[501] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed [525] S. Bhattacharya and N. D. Lane, “Sparsification and separation of
analysis of the KDD CUP 99 data set,” in Proc. IEEE Symp. Comput. deep learning layers for constrained resource inference on wearables,”
Intell. Security Defense Appl., Ottawa, ON, Canada, 2009, pp. 1–6. in Proc. 14th ACM Conf. Embedded Netw. Sensor Syst. CD-ROM,
[502] K. Tam, A. Feizollah, N. B. Anuar, R. Salleh, and L. Cavallaro, “The Stanford, CA, USA, 2016, pp. 176–189.
evolution of Android malware and Android analysis techniques,” ACM [526] M. Cho and D. Brand, “MEC: Memory-efficient convolution for deep
Comput. Surveys, vol. 49, no. 4, 2017, Art. no. 76. neural network,” in Proc. Int. Conf. Mach. Learn. (ICML), Sydney,
[503] R. A. Rodríguez-Gómez, G. Maciá-Fernández, and P. García-Teodoro, NSW, Australia, 2017, pp. 815–824.
“Survey and taxonomy of Botnet research through life-cycle,” ACM [527] J. Guo and M. Potkonjak, “Pruning filters and classes: Towards on-
Comput. Surveys, vol. 45, no. 4, 2013, Art. no. 45. device customization of convolutional neural networks,” in Proc. 1st
[504] M. Liu et al., “A collaborative privacy-preserving deep learning system ACM Int. Workshop Deep Learn. Mobile Syst. Appl., Niagara Falls,
in distributed mobile environment,” in Proc. IEEE Int. Conf. Comput. NY, USA, 2017, pp. 13–17.
Sci. Comput. Intell. (CSCI), Las Vegas, NV, USA, 2016, pp. 192–197.
[528] S. Li et al., “FitCNN: A cloud-assisted lightweight convolutional neural
[505] S. Chopra, R. Hadsell, and Y. LeCun, “Learning a similarity metric
network framework for mobile devices,” in Proc. 23rd IEEE Int. Conf.
discriminatively, with application to face verification,” in Proc. IEEE
Embedded Real Time Comput. Syst. Appl. (RTCSA), Hsinchu, Taiwan,
Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 1. San Diego, CA,
2017, pp. 1–6.
USA, 2005, pp. 539–546.
[529] H. Zen, Y. Agiomyrgiannakis, N. Egberts, F. Henderson, and
[506] A. S. Shamsabadi, H. Haddadi, and A. Cavallaro, “Distributed one-
P. Szczepaniak, “Fast, compact, and high quality LSTM-RNN based
class learning,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Athens,
statistical parametric speech synthesizers for mobile devices,” arXiv
Greece, 2018, pp. 4123–4127.
preprint arXiv:1606.06061, 2016.
[507] B. Hitaj, P. Gasti, G. Ateniese, and F. Perez-Cruz, “PassGAN:
[530] G. Falcao, L. A. Alexandre, J. Marques, X. Fraz ao, and J. Maria, “On
A deep learning approach for password guessing,” arXiv preprint
the evaluation of energy-efficient deep learning using stacked autoen-
arXiv:1709.00440, 2017.
coders on mobile GPUs,” in Proc. 25th IEEE Euromicro Int. Conf.
[508] H. Ye, G. Y. Li, B.-H. F. Juang, and K. Sivanesan, “Channel agnos-
Parallel Distrib. Netw. Based Process., St. Petersburg, Russia, 2017,
tic end-to-end learning based communication systems with conditional
pp. 270–273.
GAN,” arXiv preprint arXiv:1807.00447, 2018.
[509] R. Gonzalez et al., “Net2Vec: Deep learning for the network,” in Proc. [531] B. Fang, X. Zeng, and M. Zhang, “NestDNN: Resource-aware multi-
ACM Workshop Big Data Anal. Mach. Learn. Data Commun. Netw., tenant on-device deep learning for continuous mobile vision,” in
2017, pp. 13–18. Proc. 24th ACM Annu. Int. Conf. Mobile Comput. Netw., New Delhi,
[510] D. H. Wolpert and W. G. Macready, “No free lunch theorems for India, 2018, pp. 115–127.
optimization,” IEEE Trans. Evol. Comput., vol. 1, no. 1, pp. 67–82, [532] M. Xu, M. Zhu, Y. Liu, F. X. Lin, and X. Liu, “DeepCache: Principled
Apr. 1997. cache for mobile deep vision,” in Proc. 24th ACM Annu. Int. Conf.
[511] Y. Cheng, D. Wang, P. Zhou, and T. Zhang, “Model compression and Mobile Comput. Netw., New Delhi, India, 2018, pp. 129–144.
acceleration for deep neural networks: The principles, progress, and [533] S. Liu et al., “On-demand deep model compression for mobile devices:
challenges,” IEEE Signal Process. Mag., vol. 35, no. 1, pp. 126–136, A usage-driven model selection framework,” in Proc. 16th ACM
Jan. 2018. Annu. Int. Conf. Mobile Syst. Appl. Services, Munich, Germany, 2018,
[512] N. D. Lane et al., “Squeezing deep learning into mobile and embedded pp. 389–400.
devices,” IEEE Pervasive Comput., vol. 16, no. 3, pp. 82–88, 2017. [534] T. Chen et al., “TVM: An automated end-to-end optimizing compiler
[513] J. Tang, D. Sun, S. Liu, and J.-L. Gaudiot, “Enabling deep learning on for deep learning,” in Proc. 13th USENIX Symp. Oper. Syst. Design
IoT devices,” Computer, vol. 50, no. 10, pp. 92–96, 2017. Implement. (OSDI), Carlsbad, CA, USA, 2018, pp. 578–594.
[514] J. Wang et al., “Deep learning towards mobile applications,” in [535] S. Yao et al., “FastDeepIoT: Towards understanding and optimizing
Proc. 38th IEEE Int. Conf. Distrib. Comput. Syst. (ICDCS), 2018, neural network execution time on mobile and embedded devices,” in
pp. 1385–1393. Proc. 16th ACM Conf. Embedded Netw. Sensor Syst., Shenzhen, China,
[515] F. N. Iandola et al., “SqueezeNet: AlexNet-level accuracy with 50x 2018, pp. 278–291.
fewer parameters and <0.5MB model size,” in Proc. Int. Conf. Learn. [536] D. Li, X. Wang, and D. Kong, “DeepRebirth: Accelerating deep neural
Represent. (ICLR), 2017. network execution on mobile devices,” in Proc. Nat. Conf. Artif. Intell.
[516] A. G. Howard et al., “MobileNets: Efficient convolutional neu- (AAAI), New Orleans, LA, USA, 2018, pp. 2322–2330.
ral networks for mobile vision applications,” arXiv preprint [537] S. Teerapittayanon, B. McDanel, and H. T. Kung, “Distributed deep
arXiv:1704.04861, 2017. neural networks over the cloud, the edge and end devices,” in Proc. 37th
[517] X. Zhang, X. Zhou, M. Lin, and J. Sun, “ShuffleNet: An extremely IEEE Int. Conf. Distrib. Comput. Syst. (ICDCS), Atlanta, GA, USA,
efficient convolutional neural network for mobile devices,” in Proc. 2017, pp. 328–339.
IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, [538] S. Omidshafiei et al., “Deep decentralized multi-task multi-agent rein-
UT, USA, Jun. 2018, pp. 6848–6856. forcement learning under partial observability,” in Proc. Int. Conf.
[518] Q. Zhang, L. T. Yang, X. Liu, Z. Chen, and P. Li, “A tucker deep Mach. Learn. (ICML), Sydney, NSW, Australia, 2017, pp. 2681–2690.
computation model for mobile multimedia feature learning,” ACM [539] B. Recht, C. Ré, S. J. Wright, and F. Niu, “Hogwild: A lock-free
Trans. Multimedia Comput. Commun. Appl., vol. 13, no. 3s, Aug. 2017, approach to parallelizing stochastic gradient descent,” in Proc. Adv.
Art. no. 39. Neural Inf. Process. Syst., Granada, Spain, 2011, pp. 693–701.

Authorized licensed use limited to: Universitaetsbibliothek der RWTH Aachen. Downloaded on December 10,2020 at 15:02:10 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: DEEP LEARNING IN MOBILE AND WIRELESS NETWORKING: SURVEY 2287

Chaoyun Zhang received the B.Sc. degree from the School of Electronic Information and Communications, Huazhong University of Science and Technology, China, and the M.Sc. degree in artificial intelligence from the University of Edinburgh, with a focus on machine learning, where he is currently pursuing the Ph.D. degree with the School of Informatics. His current research interests include the application of deep learning to problems in computer networking, including traffic analysis, resource allocation, and network control.

Paul Patras (M'11–SM'18) received the M.Sc. and Ph.D. degrees from the Universidad Carlos III de Madrid in 2008 and 2011, respectively. He is a Lecturer (Assistant Professor) and a Chancellor's Fellow with the School of Informatics, University of Edinburgh, where he leads the Internet of Things Research Programme. His research interests include performance optimization in wireless and mobile networks, applied machine learning, mobile traffic analytics, security and privacy, prototyping, and testbeds.

Hamed Haddadi is a Senior Lecturer (Associate Professor) and the Deputy Director of Research with the Dyson School of Design Engineering, and an Academic Fellow with the Data Science Institute, Faculty of Engineering, Imperial College London. He is interested in user-centered systems, human–data interaction, applied machine learning, and data security and privacy. He enjoys designing and building systems that enable better use of our digital footprint, while respecting users' privacy.