Building Deep Learning Applications with Apache MXNet and Gluon
Cyrus Vahid <cyrusmv@amazon.com>, Principal Evangelist, AI Labs – MXNet
Aug 2018
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Background
Deductive Reasoning

P | Q | P ∧ Q | P ∨ Q | P → Q
T | T |   T   |   T   |   T
T | F |   F   |   T   |   F
F | T |   F   |   T   |   T
F | F |   F   |   F   |   T

• P = T ∧ Q = T ∴ P ∧ Q = T
• P ∧ Q ∴ P → Q; ∼P ∴ P → Q
• Modus ponens:
  P → Q
  P
  _________
  ∴ Q
Rule Based Programming
Plausible Reasoning
Programming with Data
Understand your data → algorithmically discover hidden patterns → generalize the solution into an algorithm → apply the solution to unseen patterns → make predictions.
Fundamentals
Biological & Artificial Neuron
Source: http://cs231n.github.io/neural-networks-1/
Perceptron
Inputs $I_1$, $I_2$ and bias $B$ feed the output $O$ through weights $w_1$, $w_2$, $w_3$:
$f(x_i, w_i) = \Phi\big(b + \sum_i w_i \cdot x_i\big)$
$\Phi(x) = \begin{cases} 1 & \text{if } x \ge 0.5 \\ 0 & \text{if } x < 0.5 \end{cases}$
Perceptron
With $w_1 = w_2 = 1$ and a bias contribution of $-1.5$:
$I_1 = I_2 = B = 1$: $z = 1 \cdot 1 + 1 \cdot 1 - 1.5 = 0.5 \therefore \Phi(z) = 1$
$I_2 = 0$; $I_1 = B = 1$: $z = 1 \cdot 1 + 0 \cdot 1 - 1.5 = -0.5 \therefore \Phi(z) = 0$
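To make the arithmetic above concrete, here is a minimal sketch of this perceptron in plain Python/NumPy (the function and variable names are mine, not from the slides):

import numpy as np

def perceptron(x, w, b):
    # Slide formula: Phi(b + sum_i(w_i * x_i)) with a 0.5 threshold
    z = b + np.dot(w, x)
    return 1 if z >= 0.5 else 0

# AND gate from the worked example: w1 = w2 = 1, bias contribution -1.5
w = np.array([1.0, 1.0])
b = -1.5
for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print((x1, x2), '->', perceptron(np.array([x1, x2]), w, b))
# Only (1, 1) fires: z = 0.5 >= 0.5; every other input gives z < 0.5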
Non-Linearity

P | Q | P ∧ Q | P ⊕ Q
T | T |   T   |   F
T | F |   F   |   T
F | T |   F   |   T
F | F |   F   |   F

(Plots: AND is linearly separable, a single line divides the classes; XOR is not.)
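A single perceptron cannot draw the line XOR needs, but stacking linearly separable gates solves it, since XOR = AND(OR, NAND). A sketch reusing the perceptron function above; the construction is mine, not from the slides:

def xor(x1, x2):
    x = np.array([x1, x2])
    h_or = perceptron(x, np.array([1.0, 1.0]), 0.0)      # OR: fires if any input is 1
    h_nand = perceptron(x, np.array([-1.0, -1.0]), 2.0)  # NAND: fires unless both are 1
    return perceptron(np.array([h_or, h_nand]), np.array([1.0, 1.0]), -1.5)  # AND of the two

# xor(1, 1) == 0, xor(1, 0) == 1, xor(0, 1) == 1, xor(0, 0) == 0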
Deep Learning
Input layer → hidden layers → output. A non-linearity is added to the output of each hidden layer to transform the output into a continuous range.
The “Learning” in Deep Learning
(Diagram: an input X is fed forward through weights such as 0.4 and 0.3 to produce a prediction X1. When X1 != X, backpropagation (gradient descent) nudges each weight, 0.4 ± δ, 0.3 ± δ, producing new weights.)
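One step of this loop can be written with MXNet's autograd. A minimal sketch, assuming a single input and squared-error loss (all numbers are illustrative):

import mxnet as mx
from mxnet import nd, autograd

w = nd.array([0.4, 0.3])      # the weights sketched on the slide
w.attach_grad()               # ask autograd to track gradients for w
x, label, lr = nd.array([1.0, 1.0]), nd.array([1.0]), 0.1

with autograd.record():       # forward pass: prediction X1
    pred = nd.dot(w, x)
    loss = (pred - label) ** 2
loss.backward()               # backpropagation: compute dloss/dw
w[:] = w - lr * w.grad        # new weights: 0.4 +/- delta, 0.3 +/- delta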
Activation Function (Φ)
Inputs: Preprocessing, Batches, Epochs
Preprocessing
§ Random separation of data into training, validation, and test sets
§ Necessary for measuring the accuracy of the model
Batch
§ The amount of data propagated through the network at each iteration
§ Enables faster optimization through shorter iteration cycles
Epoch
§ One complete pass through all the training data
§ Optimization runs for multiple epochs to reduce the error rate
(A gluon sketch of these terms follows below.)
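In gluon these terms map directly onto the data utilities. A sketch, assuming the train/validation/test split is done beforehand; X and y here are placeholder NDArrays:

from mxnet import gluon, nd

X = nd.random.uniform(shape=(1000, 784))   # placeholder features (e.g. flattened MNIST)
y = nd.zeros((1000,))                      # placeholder labels

train_set = gluon.data.ArrayDataset(X, y)
train_data = gluon.data.DataLoader(train_set, batch_size=64, shuffle=True)

for epoch in range(10):                    # one epoch = a full pass over the training data
    for data, label in train_data:         # one batch per iteration
        pass                               # forward, loss, backward, update go here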
Inputs: Encoding MNIST Data
Source: https://www.tensorflow.org/get_started/mnist/beginners
Inputs: Encoding Pictures into Data
A 7 × 7 × 3 matrix (width × height × color channels).
Classification with the Softmax Function
Softmax converts the output layer into probabilities – necessary for classification.
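The function itself is $\sigma(z)_j = e^{z_j} / \sum_k e^{z_k}$. A numerically stable sketch in NumPy:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtracting max(z) avoids overflow; result is unchanged
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # [0.659 0.242 0.099], sums to 1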
Loss Function
• An objective function that quantifies how successful the model was in its predictions
• A measure of the difference between a neural net's prediction and the actual value, that is, the error
• Typically we use cross-entropy loss, which adjusts the plain loss calculation to mitigate learning slowdown
• Backpropagation is performed to calculate the error contribution of each neuron after processing one batch
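For a one-hot label $y$ and predicted probabilities $p$, cross entropy is $-\sum_j y_j \log p_j$. A quick sketch (gluon ships a combined version as gluon.loss.SoftmaxCrossEntropyLoss):

import numpy as np

def cross_entropy(pred, label):
    return -np.sum(label * np.log(pred + 1e-12))  # epsilon guards against log(0)

cross_entropy(np.array([0.9, 0.05, 0.05]), np.array([1, 0, 0]))  # ~0.105: confident and right
cross_entropy(np.array([0.05, 0.9, 0.05]), np.array([1, 0, 0]))  # ~3.0: confident and wrong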
Gradient Descent
Iteratively update the parameters to drive the objective function toward its optimum.
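The update rule is $w \leftarrow w - \eta \nabla_w L$, with learning rate $\eta$. A tiny worked example on $f(w) = w^2$, whose minimum is at $w = 0$:

w, lr = 3.0, 0.1
for _ in range(50):
    grad = 2 * w       # df/dw for f(w) = w**2
    w -= lr * grad     # step against the gradient
print(w)               # ~4e-5: essentially at the minimum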
Weight Initialization
https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
Stochastic Gradient Descent
Gradient Descent: a single parameter-update iteration runs through ALL of the training data.
Stochastic Gradient Descent: a single parameter-update iteration runs through one BATCH of the training data.
Optimizers
http://imgur.com/a/Hqolp
Learning Rates
• Learning Rate: a real number that decides how far to move down in the direction of steepest gradient
• Online Learning: weights are updated at each step (slow to learn)
• Batch Learning: weights are updated after all training data is processed (hard to optimize)
• Mini-Batch: a combination of both; we break the training set into smaller batches and update the weights after each mini-batch
Training and Validation Data
When accuracy is evaluated only on the training set, we face overfitting; the best model is the one that minimizes validation error, not training error.
Dropout
Srivastava, Nitish, et al. “Dropout: a simple way to prevent neural networks from overfitting.” JMLR, 2014.
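In gluon, dropout is just a layer; it is active only during training (e.g. inside autograd.record()) and is a no-op at inference. A minimal sketch, with layer sizes chosen for illustration:

from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dropout(0.5))   # randomly zeroes 50% of activations during training
net.add(nn.Dense(10))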
MXNet
Computational Dependency/Graph
• $u = x \cdot y$
• $v = a \cdot b$
• $z = k \cdot u + v$
(Graph: the two products are independent and can run in parallel; the final sum depends on both.)
Computational Dependency/Graph

import mxnet as mx

net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64)
net = mx.sym.Activation(net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26)
net = mx.sym.SoftmaxOutput(net, name='softmax')
mx.viz.plot_network(net)  # renders the dependency graph
Scaling with MXNet
(Chart: throughput vs. number of GPUs (1, 2, 4, 8, 16, 32, 64, 128, 256) for Inception v3, ResNet, and AlexNet against the ideal line; roughly 88% scaling efficiency.)
Imperative vs Symbolic Programming

Imperative:
• Execution flow is the same as the flow of the code
• Flexible but inefficient
• Memory: 4 * 10 * 8 = 320 bytes
• Interim values are available
• No operation folding
• Familiar coding paradigm

Symbolic:
• Abstract functions are defined and compiled first; data binding happens next
• Efficient
• Memory: 2 * 10 * 8 = 160 bytes
• Interim values are not available
• Operation folding: multiple operations fold into one, so we run one op on the GPU instead of many; possible because we have access to the whole computation graph
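The contrast in MXNet code, roughly sketched (Symbol.eval is used here for brevity; the bind/executor APIs work as well):

import mxnet as mx

# Imperative: every line executes immediately; the interim value b is inspectable
a = mx.nd.ones((10,))
b = a * 2
c = b + 1                  # runs now

# Symbolic: declare the whole graph first, bind data and run later
x = mx.sym.Variable('x')
y = (x * 2) + 1            # nothing executes yet; *2 and +1 can be folded into one op
out = y.eval(ctx=mx.cpu(), x=mx.nd.ones((10,)))[0]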
Gluon
Evolution of DL Frameworks
Advantages of the Gluon API

Simple, Easy-to-Understand Code
§ Neural networks can be defined using simple, clear, concise code
§ Plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers

Flexible, Imperative Structure
§ Eliminates the rigidity of neural network model definition and brings the model together with the training algorithm
§ Intuitive, easy-to-debug, familiar code

Dynamic Graphs
§ Neural networks can change in shape or size during the training process to address advanced use cases where the size of the data fed is variable
§ An important area of innovation in Natural Language Processing (NLP)

High Performance
§ No sacrifice with respect to training speed
§ When it is time to move from prototyping to production, easily cache neural networks for high performance and a reduced memory footprint
Code
https://github.com/cyrusmvahid/GluonBootcamp/tree/master/labs/fancy_mnist
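For reference, a minimal gluon network and one training step; the shapes and hyperparameters here are illustrative, not taken from the linked lab:

import mxnet as mx
from mxnet import gluon, autograd, nd
from mxnet.gluon import nn

net = nn.HybridSequential()            # written imperatively, cacheable as a graph
with net.name_scope():
    net.add(nn.Dense(64, activation='relu'))
    net.add(nn.Dense(10))              # e.g. 10 MNIST classes
net.initialize(mx.init.Xavier())
net.hybridize()                        # cache the graph for near-symbolic performance

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

data, label = nd.random.uniform(shape=(32, 784)), nd.zeros((32,))  # dummy batch
with autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()
trainer.step(batch_size=32)            # one SGD update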
What’s New
• GluonCV, a deep learning toolkit for computer vision
• Features:
  • training scripts that reproduce SOTA results reported in the latest papers,
  • a large set of pre-trained models,
  • carefully designed APIs and easy-to-understand implementations,
  • community support.
What’s New
• GluonNLP, a deep learning toolkit for natural language processing
• Features:
  • Training scripts to reproduce SOTA results reported in research papers.
  • Pre-trained models for common NLP tasks.
  • Carefully designed APIs that greatly reduce the implementation complexity.
  • Community support.
What’s New
• MXNet backend for Keras: Keras is a high-level neural networks API, written in Python and capable of running on top of Apache MXNet, TensorFlow, CNTK, and Theano.
• Performance: the MXNet backend provides a scalable and fast backend for new projects and existing code, so it can improve the performance of existing models with minimal effort. For benchmarks, see https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark
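Switching an existing Keras project over is a configuration change rather than a code change: set the backend to mxnet in ~/.keras/keras.json or via the KERAS_BACKEND environment variable (see the keras-apache-mxnet README for the exact settings). For example:

import os
os.environ['KERAS_BACKEND'] = 'mxnet'  # must be set before importing keras
import keras                           # existing model code runs unchanged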
References
• MXNet: http://mxnet.incubator.apache.org/
• Gluon 60-minute crash course: https://gluon-crash-course.mxnet.io/
• Deep learning book based on Gluon: https://gluon.mxnet.io/
• GluonCV: https://gluon-cv.mxnet.io/
• GluonNLP: https://gluon-nlp.mxnet.io/
• Keras-MXNet: https://github.com/awslabs/keras-apache-mxnet
Thank you!
cyrusmv@amazon.com
