Deep Learning with TensorFlow Understanding Tensors, Computation Graphs, Images, Text Viswanath Puttagunta Technical Program Manager, Linaro
Vish Puttagunta, Technical Program Manager, Linaro. Statistics/Signal Processing in Images, Audio, RF. Data Analysis and Machine Learning on ARM SoCs. Leading collaboration in the ARM™ Ecosystem: Enterprise, Mobile, Home, Networking, IoT. Built on Open Source
Goals Basics behind Neural Networks Picturing Neural Networks. Understand the operations. Representing Neural Networks using Tensors and Computation Graphs Vision (MNIST), Text (Word2Vec) TensorFlow Gives you good tools to build a Neural Network Model. Some basic models… but Google's secret sauce still inside Google :)
The Basics Linear Regression Weight / Bias Cost Function Gradient Descent to optimize a cost function Logistic Regression Neuron Activation
Linear Regression Objective: Fit a line/curve to minimize a cost function. The line has a "Weight / Slope" and a "Bias / y-intercept"
Cost Function, Gradient Descent Sources: https://www.youtube.com/watch?v=SqA6TujbmWw&list=PLE6Wd9FR--Ecf_5nCbnSQMHqORpiChfJf&index=16 https://youtu.be/WnqQrPNYz5Q?list=PLaXDtXvwY-oDvedS3f4HW0b4KxqpJ_imw&t=284 ● Cost Function, J(𝞑) ○ Weight / Slope: 𝞑1 ○ Bias / y-intercept: 𝞑2 ● Gradient Descent ○ Go down to the lowest point of the cost function in small steps BIG IDEA ● Start with some Weights & Biases ● Define a Cost Function ● Optimize the Cost Function by doing some sort of Gradient Descent ● Lots of help from TensorFlow.
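The BIG IDEA above can be written out concretely. Below is a minimal sketch in plain NumPy (the toy data, learning rate, and step count are made up for illustration), assuming the 1-D linear model from the previous slide, y ≈ 𝞑1·x + 𝞑2, and a mean-squared-error cost J(𝞑):

    import numpy as np

    # Toy data roughly on the line y = 2x + 1
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

    theta1, theta2 = 0.0, 0.0   # weight/slope, bias/y-intercept
    lr = 0.01                   # learning rate: the size of each "small step"

    for step in range(2000):
        y_hat = theta1 * x + theta2        # current line
        err = y_hat - y
        cost = np.mean(err ** 2)           # cost function J(theta)
        # Gradient of J with respect to each parameter
        grad1 = np.mean(2 * err * x)
        grad2 = np.mean(2 * err)
        # Step downhill in the direction that reduces the cost
        theta1 -= lr * grad1
        theta2 -= lr * grad2

    print(theta1, theta2)   # should land near 2 and 1

TensorFlow does the same thing for much bigger models: you define the cost, and it computes the gradients and applies the steps for you.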
Weight, Bias in 0,1 Dimensions
Neuron and Activation Function BIG IDEAS ● Only one of the neurons will fire at a time (its value shoots above 0.5) ● Which neuron fires depends on "x", the Weights and the Biases ● The rest of the neurons will not fire (values much less than 0.5) ● Outcomes are between 0 and 1 → Probabilities (all outputs add to one) → Good for classification
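A minimal sketch of the "outcomes add up to one" idea, using a softmax over hypothetical raw neuron outputs (the numbers are made up for illustration):

    import numpy as np

    def softmax(z):
        z = z - np.max(z)          # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    # Hypothetical raw neuron outputs (evidence) for 3 classes
    evidence = np.array([2.0, 0.5, -1.0])
    probs = softmax(evidence)

    print(probs)        # ~[0.79, 0.18, 0.04]: one neuron "fires", the rest stay low
    print(probs.sum())  # 1.0 -> usable as class probabilities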
Same picture: Less mess, more classes
Simple Neural Network for MNIST Source: https://www.tensorflow.org/versions/r0.7/tutorials/mnist/beginners/index.html
Simple Neural Network for MNIST Weights actually Mean something!!
Weights/Bias → Evidence → Probability BIG IDEAS ● After proper Gradient Descent, the Weights generalize well ● The correlation operation corresponds to "Evidence" ● Then do softmax(evidence) and you get Probability(x ∈ specific class) Source: https://www.tensorflow.org/versions/r0.7/tutorials/mnist/beginners/index.html
MNIST Simple: Operations / Dimensions
● Input image (x): 28x28 matrix → 1x784; each element: 0 or 1
● Weights (W0, W1, …, W9): each a 28x28 matrix → 1x784; each element: float in (-1, +1)
● Eg: W1Ⓧx (Correlation): Multiply-Accumulate, i.e. matmul(W1, xᵀ); single float in roughly (-784.0, +784.0)
● Biases (B0, B1, …, B9): single float in (-1, +1)
● Evidence, e.g. W1Ⓧx + B1: single float in roughly (-784.0, +784.0)
● Probability: single float in (0, 1)
● Real output (Y), Predicted output (Ŷ): 1x10 matrix; each element 0 or 1
MNIST Simple: In TensorFlow
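The TensorFlow slides are code screenshots in the original deck. Below is a minimal sketch of the same softmax-regression model, written against the pre-2.x graph-style API used in the linked r0.7 tutorial (variable names are mine):

    import tensorflow as tf

    # x: a batch of flattened 28x28 images (1x784 each, as on the dimensions slide)
    x = tf.placeholder(tf.float32, [None, 784])

    # One weight column and one bias per digit class (10 classes)
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))

    # evidence = xW + b; softmax turns evidence into class probabilities
    y_hat = tf.nn.softmax(tf.matmul(x, W) + b)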
MNIST Simple: Back-Propagation
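Continuing the sketch above: a hedged version of the cost function and the training step, where TensorFlow handles the back-propagation for you (exact helper names vary slightly between TensorFlow releases):

    # y: the "real output" one-hot labels (1x10 per image)
    y = tf.placeholder(tf.float32, [None, 10])

    # Cross-entropy cost between the predicted and real distributions
    cross_entropy = -tf.reduce_mean(
        tf.reduce_sum(y * tf.log(y_hat), reduction_indices=[1]))

    # One line: TensorFlow back-propagates and applies gradient descent
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())  # tf.global_variables_initializer() in later releases
        # for each batch of images/labels from the MNIST input pipeline in the tutorial:
        #     sess.run(train_step, feed_dict={x: batch_images, y: batch_labels})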
MNIST using a Convolutional Neural Network
Convolution: Signal/Image processing Source
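The convolution slides are images in the original deck. Here is a small NumPy sketch of the sliding multiply-accumulate they illustrate (strictly a cross-correlation, i.e. no kernel flip, which is also what tf.nn.conv2d computes; the example kernel is made up):

    import numpy as np

    def conv2d_valid(image, kernel):
        """Slide the kernel over the image; each output pixel is a multiply-accumulate."""
        kh, kw = kernel.shape
        oh = image.shape[0] - kh + 1
        ow = image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
        return out

    image = np.random.rand(28, 28)                 # e.g. an MNIST-sized image
    edge_kernel = np.array([[1., 0., -1.],
                            [1., 0., -1.],
                            [1., 0., -1.]])        # simple vertical-edge detector
    print(conv2d_valid(image, edge_kernel).shape)  # (26, 26)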
Weight, Bias in 0,1 Dimensions
MNIST: Using CNN
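The CNN slides are code screenshots in the original deck. Below is a hedged sketch of one convolution + pooling layer in the same pre-2.x graph API, in the spirit of the TensorFlow "deep MNIST" tutorial (filter sizes and variable names are assumptions, not taken from the slides):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 784])
    x_image = tf.reshape(x, [-1, 28, 28, 1])      # back to 2-D so convolution makes sense

    # 32 filters, each 5x5, over 1 input channel
    W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
    b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))

    # Convolve, add bias, pass through a ReLU neuron, then 2x2 max-pool: 28x28 -> 14x14
    h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image, W_conv1,
                                      strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
    h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1], padding='SAME')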
Text Analytics Word2Vec: thinking of words as vectors Sequence-to-Sequence models, Recurrent Neural Networks (LSTM): understanding sequences of words, e.g. "Don't like" is closer to "hate"
Words as vectors (Embeddings) word list [human, cow] → int list [0, 1] → embeddings [(2,2), (4,0)]
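A minimal sketch of that mapping (the 2-D vectors are the illustrative ones from the slide; real embeddings are learned and much wider):

    import numpy as np

    vocab = ['human', 'cow']                          # word list
    word_to_id = {w: i for i, w in enumerate(vocab)}  # -> int list [0, 1]

    # Embedding table: one row of floats per word (2-D only for illustration)
    embeddings = np.array([[2.0, 2.0],   # 'human'
                           [4.0, 0.0]])  # 'cow'

    ids = [word_to_id[w] for w in ['cow', 'human', 'cow']]
    vectors = embeddings[ids]            # a sequence of words becomes a sequence of vectors
    print(vectors)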
Sequence of words as vectors!
Projections/Dimensionality Reduction (PCA/TSNE)
source: https://www.youtube.com/watch?v=VINCQghQRuM
word2vec: using Neural Network
word2vec: using Neural Network Note: we are only trying to understand the relationships between words and the best way to vectorize them; we are not really concerned with the sequence of words here (that comes later with RNNs)
word2vec: using NN (softmax classifier) Realize that wv could easily be w50000 if there are 50,000 words in the vocabulary!
word2vec: using NN (softmax classifier pitfall) Too many computations for every set of inputs (e.g. classifying across 50,000 classes if your vocabulary size is 50,000)
word2vec: using Noise Classifier Big Idea: instead of trying to move every word vector by a little bit in each step, move only a few random words (~16) at a time; if you have lots of sentences, you should be good
word2vec: Compute Graph
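A hedged sketch of that compute graph, following the structure of the word2vec_basic.py example linked on the next slide: an embedding lookup plus the NCE "noise" loss with ~16 sampled words. Sizes reuse the 50,000-word example from the earlier slides; the keyword names follow the 1.x API and may differ in other versions:

    import tensorflow as tf

    vocabulary_size = 50000    # the 50,000-word vocabulary from the earlier slides
    embedding_size = 128
    num_sampled = 16           # "move a few random words at a time"

    train_inputs = tf.placeholder(tf.int32, shape=[None])       # centre word ids
    train_labels = tf.placeholder(tf.int32, shape=[None, 1])    # context word ids

    # Word id -> dense vector
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)

    # NCE: score the true context word against a handful of random "noise" words
    nce_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size], stddev=0.05))
    nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

    loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                                         labels=train_labels, inputs=embed,
                                         num_sampled=num_sampled,
                                         num_classes=vocabulary_size))
    optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)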
word2vec Code Walk Through https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/word2vec/word2vec_basic.py
Summary TensorFlow was simple to install (Ubuntu 14.04):
  sudo apt-get install python-pip python-dev
  sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl
Tutorials are 'relatively' easy to follow. Good pointers to papers and blogs. Would like to hear Google's perspective on …
References
● Convolutional Neural Networks: https://youtu.be/n6hpQwq7Inw
● TensorFlow Tutorials: https://www.tensorflow.org/versions/r0.7/tutorials/index.html
● Khan Academy (Linear Algebra): https://www.khanacademy.org/
● LSTM, Text Analytics (Alec Radford, indico): https://www.youtube.com/watch?v=VINCQghQRuM
● Current Trends in Deep Learning (Andrew Ng, Baidu): https://www.youtube.com/watch?v=O0VN0pGgBZM