Building Deep Learning Applications with Apache MXNet and Gluon
Cyrus Vahid <cyrusmv@amazon.com>, Principal Evangelist, AI Labs – MXNet
Aug 2018
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Background
Deductive Reasoning

P | Q | P ∧ Q | P ∨ Q | P → Q
T | T |   T   |   T   |   T
T | F |   F   |   T   |   F
F | T |   F   |   T   |   T
F | F |   F   |   F   |   T

• P = T ∧ Q = T ∴ P ∧ Q = T
• P ∧ Q ∴ P → Q; ∼P ∴ P → Q
• Modus ponens:
  P → Q
  P
  _________
  ∴ Q
Rule Based Programming
Plausible Reasoning
Programming with Data
Understand your data → algorithmically discover hidden patterns → generalize the solution into an algorithm → apply the solution to unseen patterns → make predictions.
Fundamentals
Biological & Artificial Neuron
Source: http://cs231n.github.io/neural-networks-1/
Perceptron
Inputs $I_1$, $I_2$ and bias $B$ feed the output $O$ through weights $w_1$, $w_2$, $w_3$:
$f(x_i, w_i) = \Phi\big(b + \sum_i w_i \cdot x_i\big)$
$\Phi(x) = \begin{cases} 1 & \text{if } x \ge 0.5 \\ 0 & \text{if } x < 0.5 \end{cases}$
Perceptron
With $w_1 = w_2 = 1$ and a bias contribution of $-1.5$:
$I_1 = I_2 = B = 1$: $z = 1 \cdot 1 + 1 \cdot 1 - 1.5 = 0.5 \therefore \Phi(z) = 1$
$I_2 = 0$; $I_1 = B = 1$: $z = 1 \cdot 1 + 0 \cdot 1 - 1.5 = -0.5 \therefore \Phi(z) = 0$
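To make the arithmetic above concrete, here is a minimal sketch of this perceptron in plain Python/NumPy (the function and variable names are mine, not from the slides):

import numpy as np

def perceptron(x, w, b):
    # Slide formula: Phi(b + sum_i(w_i * x_i)) with a 0.5 threshold
    z = b + np.dot(w, x)
    return 1 if z >= 0.5 else 0

# AND gate from the worked example: w1 = w2 = 1, bias contribution -1.5
w = np.array([1.0, 1.0])
b = -1.5
for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print((x1, x2), '->', perceptron(np.array([x1, x2]), w, b))
# Only (1, 1) fires: z = 0.5 >= 0.5; every other input gives z < 0.5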
Non-Linearity

P | Q | P ∧ Q | P ⊕ Q
T | T |   T   |   F
T | F |   F   |   T
F | T |   F   |   T
F | F |   F   |   F

(Plots: AND is linearly separable, a single line divides the classes; XOR is not.)
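A single perceptron cannot draw the line XOR needs, but stacking linearly separable gates solves it, since XOR = AND(OR, NAND). A sketch reusing the perceptron function above; the construction is mine, not from the slides:

def xor(x1, x2):
    x = np.array([x1, x2])
    h_or = perceptron(x, np.array([1.0, 1.0]), 0.0)      # OR: fires if any input is 1
    h_nand = perceptron(x, np.array([-1.0, -1.0]), 2.0)  # NAND: fires unless both are 1
    return perceptron(np.array([h_or, h_nand]), np.array([1.0, 1.0]), -1.5)  # AND of the two

# xor(1, 1) == 0, xor(1, 0) == 1, xor(0, 1) == 1, xor(0, 0) == 0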
Deep Learning
Input layer → hidden layers → output. A non-linearity is added to the output of each hidden layer to transform the output into a continuous range.
The “Learning” in Deep Learning
(Diagram: an input X is fed forward through weights such as 0.4 and 0.3 to produce a prediction X1. When X1 != X, backpropagation (gradient descent) nudges each weight, 0.4 ± δ, 0.3 ± δ, producing new weights.)
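One step of this loop can be written with MXNet's autograd. A minimal sketch, assuming a single input and squared-error loss (all numbers are illustrative):

import mxnet as mx
from mxnet import nd, autograd

w = nd.array([0.4, 0.3])      # the weights sketched on the slide
w.attach_grad()               # ask autograd to track gradients for w
x, label, lr = nd.array([1.0, 1.0]), nd.array([1.0]), 0.1

with autograd.record():       # forward pass: prediction X1
    pred = nd.dot(w, x)
    loss = (pred - label) ** 2
loss.backward()               # backpropagation: compute dloss/dw
w[:] = w - lr * w.grad        # new weights: 0.4 +/- delta, 0.3 +/- delta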
Activation Function (Φ)
Inputs: Preprocessing, Batches, Epochs
Preprocessing
§ Random separation of data into training, validation, and test sets
§ Necessary for measuring the accuracy of the model
Batch
§ The amount of data propagated through the network at each iteration
§ Enables faster optimization through shorter iteration cycles
Epoch
§ One complete pass through all the training data
§ Optimization runs for multiple epochs to reduce the error rate
(A gluon sketch of these terms follows below.)
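In gluon these terms map directly onto the data utilities. A sketch, assuming the train/validation/test split is done beforehand; X and y here are placeholder NDArrays:

from mxnet import gluon, nd

X = nd.random.uniform(shape=(1000, 784))   # placeholder features (e.g. flattened MNIST)
y = nd.zeros((1000,))                      # placeholder labels

train_set = gluon.data.ArrayDataset(X, y)
train_data = gluon.data.DataLoader(train_set, batch_size=64, shuffle=True)

for epoch in range(10):                    # one epoch = a full pass over the training data
    for data, label in train_data:         # one batch per iteration
        pass                               # forward, loss, backward, update go here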
Inputs: Encoding MNIST Data
Source: https://www.tensorflow.org/get_started/mnist/beginners
Inputs: Encoding Pictures into Data
A 7 × 7 × 3 matrix (width × height × color channels).
Classification with the Softmax Function
Softmax converts the output layer into probabilities – necessary for classification.
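The function itself is $\sigma(z)_j = e^{z_j} / \sum_k e^{z_k}$. A numerically stable sketch in NumPy:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtracting max(z) avoids overflow; result is unchanged
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # [0.659 0.242 0.099], sums to 1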
Loss Function
• An objective function that quantifies how successful the model was in its predictions
• A measure of the difference between a neural net's prediction and the actual value, that is, the error
• Typically we use cross-entropy loss, which adjusts the plain loss calculation to mitigate learning slowdown
• Backpropagation is performed to calculate the error contribution of each neuron after processing one batch
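For a one-hot label $y$ and predicted probabilities $p$, cross entropy is $-\sum_j y_j \log p_j$. A quick sketch (gluon ships a combined version as gluon.loss.SoftmaxCrossEntropyLoss):

import numpy as np

def cross_entropy(pred, label):
    return -np.sum(label * np.log(pred + 1e-12))  # epsilon guards against log(0)

cross_entropy(np.array([0.9, 0.05, 0.05]), np.array([1, 0, 0]))  # ~0.105: confident and right
cross_entropy(np.array([0.05, 0.9, 0.05]), np.array([1, 0, 0]))  # ~3.0: confident and wrong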
Gradient Descent
Iteratively update the parameters to drive the objective function toward its optimum.
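The update rule is $w \leftarrow w - \eta \nabla_w L$, with learning rate $\eta$. A tiny worked example on $f(w) = w^2$, whose minimum is at $w = 0$:

w, lr = 3.0, 0.1
for _ in range(50):
    grad = 2 * w       # df/dw for f(w) = w**2
    w -= lr * grad     # step against the gradient
print(w)               # ~4e-5: essentially at the minimum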
Weight Initialization
https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
Stochastic Gradient Descent
Gradient Descent: a single parameter-update iteration runs through ALL of the training data.
Stochastic Gradient Descent: a single parameter-update iteration runs through one BATCH of the training data.
Optimizers
http://imgur.com/a/Hqolp
Learning Rates
• Learning Rate: a real number that decides how far to move down in the direction of steepest gradient
• Online Learning: weights are updated at each step (slow to learn)
• Batch Learning: weights are updated after all training data is processed (hard to optimize)
• Mini-Batch: a combination of both; we break the training set into smaller batches and update the weights after each mini-batch
Training and Validation Data
When accuracy is evaluated only on the training set, we face overfitting; the best model is the one that minimizes validation error, not training error.
Dropout
Srivastava, Nitish, et al. “Dropout: a simple way to prevent neural networks from overfitting.” JMLR, 2014.
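In gluon, dropout is just a layer; it is active only during training (e.g. inside autograd.record()) and is a no-op at inference. A minimal sketch, with layer sizes chosen for illustration:

from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dropout(0.5))   # randomly zeroes 50% of activations during training
net.add(nn.Dense(10))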
MXNet
Computational Dependency/Graph
• $u = x \cdot y$
• $v = a \cdot b$
• $z = k \cdot u + v$
(Graph: the two products are independent and can run in parallel; the final sum depends on both.)
Computational Dependency/Graph

import mxnet as mx

net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64)
net = mx.sym.Activation(net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26)
net = mx.sym.SoftmaxOutput(net, name='softmax')
mx.viz.plot_network(net)  # renders the dependency graph
Scaling with MXNet
(Chart: throughput vs. number of GPUs (1, 2, 4, 8, 16, 32, 64, 128, 256) for Inception v3, ResNet, and AlexNet against the ideal line; roughly 88% scaling efficiency.)
Imperative vs Symbolic Programming

Imperative:
• Execution flow is the same as the flow of the code
• Flexible but inefficient
• Memory: 4 * 10 * 8 = 320 bytes
• Interim values are available
• No operation folding
• Familiar coding paradigm

Symbolic:
• Abstract functions are defined and compiled first; data binding happens next
• Efficient
• Memory: 2 * 10 * 8 = 160 bytes
• Interim values are not available
• Operation folding: multiple operations fold into one, so we run one op on the GPU instead of many; possible because we have access to the whole computation graph
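The contrast in MXNet code, roughly sketched (Symbol.eval is used here for brevity; the bind/executor APIs work as well):

import mxnet as mx

# Imperative: every line executes immediately; the interim value b is inspectable
a = mx.nd.ones((10,))
b = a * 2
c = b + 1                  # runs now

# Symbolic: declare the whole graph first, bind data and run later
x = mx.sym.Variable('x')
y = (x * 2) + 1            # nothing executes yet; *2 and +1 can be folded into one op
out = y.eval(ctx=mx.cpu(), x=mx.nd.ones((10,)))[0]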
Gluon
Evolution of DL Frameworks
Advantages of the Gluon API

Simple, Easy-to-Understand Code
§ Neural networks can be defined using simple, clear, concise code
§ Plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers

Flexible, Imperative Structure
§ Eliminates the rigidity of neural network model definition and brings the model together with the training algorithm
§ Intuitive, easy-to-debug, familiar code

Dynamic Graphs
§ Neural networks can change in shape or size during the training process to address advanced use cases where the size of the data fed is variable
§ An important area of innovation in Natural Language Processing (NLP)

High Performance
§ No sacrifice with respect to training speed
§ When it is time to move from prototyping to production, easily cache neural networks for high performance and a reduced memory footprint
Code
https://github.com/cyrusmvahid/GluonBootcamp/tree/master/labs/fancy_mnist
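For reference, a minimal gluon network and one training step; the shapes and hyperparameters here are illustrative, not taken from the linked lab:

import mxnet as mx
from mxnet import gluon, autograd, nd
from mxnet.gluon import nn

net = nn.HybridSequential()            # written imperatively, cacheable as a graph
with net.name_scope():
    net.add(nn.Dense(64, activation='relu'))
    net.add(nn.Dense(10))              # e.g. 10 MNIST classes
net.initialize(mx.init.Xavier())
net.hybridize()                        # cache the graph for near-symbolic performance

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

data, label = nd.random.uniform(shape=(32, 784)), nd.zeros((32,))  # dummy batch
with autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()
trainer.step(batch_size=32)            # one SGD update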
What’s New
• GluonCV, a deep learning toolkit for computer vision
• Features:
  • training scripts that reproduce SOTA results reported in the latest papers,
  • a large set of pre-trained models,
  • carefully designed APIs and easy-to-understand implementations,
  • community support.
What’s New
• GluonNLP, a deep learning toolkit for natural language processing
• Features:
  • Training scripts to reproduce SOTA results reported in research papers.
  • Pre-trained models for common NLP tasks.
  • Carefully designed APIs that greatly reduce the implementation complexity.
  • Community support.
What’s New
• MXNet backend for Keras: Keras is a high-level neural networks API, written in Python and capable of running on top of Apache MXNet, TensorFlow, CNTK, and Theano.
• Performance: the MXNet backend provides a scalable and fast backend for new projects and existing code, so it can improve the performance of existing models with minimal effort. For benchmarks, see https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark
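Switching an existing Keras project over is a configuration change rather than a code change: set the backend to mxnet in ~/.keras/keras.json or via the KERAS_BACKEND environment variable (see the keras-apache-mxnet README for the exact settings). For example:

import os
os.environ['KERAS_BACKEND'] = 'mxnet'  # must be set before importing keras
import keras                           # existing model code runs unchanged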
References
• MXNet: http://mxnet.incubator.apache.org/
• Gluon 60-minute crash course: https://gluon-crash-course.mxnet.io/
• Deep learning book based on Gluon: https://gluon.mxnet.io/
• GluonCV: https://gluon-cv.mxnet.io/
• GluonNLP: https://gluon-nlp.mxnet.io/
• Keras-MXNet: https://github.com/awslabs/keras-apache-mxnet
Thank you!
cyrusmv@amazon.com
