Location: QuantUniversity Meetup, January 19th 2017, Boston MA
Deep Learning: An Introduction, Part II
2016 Copyright QuantUniversity LLC.
Presented by: Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
sri@quantuniversity.com
Slides and code will be available at: http://www.analyticscertificate.com/DeepLearning
- Analytics advisory services
- Custom training programs
- Architecture assessments, advice and audits
Sri Krishnamurthy, Founder and CEO
• Founder of QuantUniversity LLC. and www.analyticscertificate.com
• Advisory and consultancy for financial analytics
• Prior experience at MathWorks, Citigroup and Endeca, and with 25+ financial services and energy customers
• Regular columnist for Wilmott Magazine
• Author of the forthcoming book “Financial Modeling: A Case Study Approach”, published by Wiley
• Chartered Financial Analyst and Certified Analytics Professional
• Teaches analytics in the Babson College MBA program and at Northeastern University, Boston
Quantitative Analytics and Big Data Analytics Onboarding
• Trained more than 500 students in quantitative methods, data science and big data technologies using MATLAB, Python and R
• Launched the Analytics Certificate Program in September
  ▫ New cohort in March 2017
• Coming soon: Deep Learning and Cognitive Computing Certificate!
Events of Interest
• February 2017
  ▫ Apache Spark Lecture – Feb 3rd
  ▫ Deep Learning Workshop – Boston – March 27-28
  ▫ Anomaly Detection Workshop – Boston – April 24-25
• March 2017
  ▫ Deep Learning Workshop – New York (Date TBD)
Recap
• Neural Networks 101
• Multi-Layer Perceptron
• Convolutional Neural Networks
Agenda for today
• Autoencoders
• Recurrent Neural Networks
  ▫ LSTM
Machine Learning
• Unsupervised algorithms (a sketch follows below)
  ▫ Given a dataset with variables 𝑥𝑖, build a model that captures the similarities in different observations and assigns them to different buckets => clustering, etc.
  ▫ Create a transformed representation of the original data => PCA
  ▫ Example: Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
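A minimal sketch of both unsupervised ideas in Python (the random data and the choice of scikit-learn estimators are illustrative assumptions, not part of the deck):

```python
# Unsupervised learning sketch: clustering assigns observations to buckets,
# PCA builds a transformed representation of the original data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.rand(100, 4)  # 100 observations, 4 variables x_i (hypothetical data)

labels = KMeans(n_clusters=3).fit_predict(X)       # bucket assignment per observation
X_reduced = PCA(n_components=2).fit_transform(X)   # 2-D transformed representation
```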
Machine Learning
• Supervised algorithms (a sketch follows below)
  ▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a given data set, i.e. learn a model y = F(x1, x2, x3, …)
  ▫ If y is numeric => prediction
  ▫ If y is categorical => classification
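A minimal sketch of both supervised cases (again with hypothetical data and illustrative scikit-learn estimators):

```python
# Supervised learning sketch: a regressor when y is numeric (prediction),
# a classifier when y is categorical (classification).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.random.rand(100, 3)                  # variables x1, x2, x3
y_numeric = X @ np.array([1.0, -2.0, 0.5])  # numeric target => prediction
y_class = (y_numeric > 0).astype(int)       # categorical target => classification

LinearRegression().fit(X, y_numeric)
LogisticRegression().fit(X, y_class)
```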
Motivation: Autoencoders1
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
https://blog.google/products/google-plus/saving-you-bandwidth-through-machine-learning/
Autoencoder
• Goal is to have the reconstruction 𝑥̂ approximate the input x
• Interesting applications such as:
  ▫ Data compression
  ▫ Visualization
  ▫ Pre-training neural networks
Demo in Keras1 (a sketch follows below)
1. https://blog.keras.io/building-autoencoders-in-keras.html
2. https://keras.io/models/model/
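Below is a minimal sketch in the spirit of the cited Keras blog post: a single fully connected encoding layer compressing a 784-dimensional input to a 32-dimensional code, and a decoder reconstructing the input. The layer sizes, optimizer, and training settings here are illustrative assumptions, not the exact demo code.

```python
# Minimal dense autoencoder, following the pattern of the Keras blog post.
# 784-dim input (e.g. flattened 28x28 MNIST digits), 32-dim code.
from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))
encoded = Dense(32, activation='relu')(input_img)    # compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)  # reconstruction of the input

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Trained to reproduce its own input (x_train assumed given):
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
#                 validation_data=(x_test, x_test))
```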
Autoencoders1 (a sketch of this procedure follows below)
• Pretraining step: train a sequence of shallow autoencoders, greedily one layer at a time, using unsupervised data
• Fine-tuning step 1: train the last layer using supervised data
• Fine-tuning step 2: use backpropagation to fine-tune the entire network using supervised data
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
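A compact sketch of this greedy layer-wise procedure, assuming x_unsup is an unlabeled array, (x_sup, y_sup) is a smaller labeled set, and all layer sizes, epochs, and optimizer choices are illustrative:

```python
# Greedy layer-wise pretraining sketch (x_unsup, x_sup, y_sup assumed given).
from keras.layers import Input, Dense
from keras.models import Model, Sequential

# Pretraining: first shallow autoencoder, trained on raw unsupervised data
inp = Input(shape=(784,))
h1 = Dense(128, activation='relu')(inp)
ae1 = Model(inp, Dense(784, activation='sigmoid')(h1))
ae1.compile(optimizer='adam', loss='mse')
ae1.fit(x_unsup, x_unsup, epochs=10, verbose=0)
codes = Model(inp, h1).predict(x_unsup)  # first-layer codes

# Second shallow autoencoder, trained on the first layer's codes
inp2 = Input(shape=(128,))
h2 = Dense(64, activation='relu')(inp2)
ae2 = Model(inp2, Dense(128, activation='sigmoid')(h2))
ae2.compile(optimizer='adam', loss='mse')
ae2.fit(codes, codes, epochs=10, verbose=0)

# Stack the pretrained encoders and add a supervised output layer
stack = Sequential([
    Dense(128, activation='relu', input_dim=784,
          weights=ae1.layers[1].get_weights()),
    Dense(64, activation='relu', weights=ae2.layers[1].get_weights()),
    Dense(10, activation='softmax'),
])

# Fine-tuning step 1: train only the new supervised layer
for layer in stack.layers[:-1]:
    layer.trainable = False
stack.compile(optimizer='adam', loss='categorical_crossentropy')
stack.fit(x_sup, y_sup, epochs=5, verbose=0)

# Fine-tuning step 2: unfreeze and backpropagate through the whole network
for layer in stack.layers:
    layer.trainable = True
stack.compile(optimizer='adam', loss='categorical_crossentropy')
stack.fit(x_sup, y_sup, epochs=5, verbose=0)
```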
Supervised learning: cross-sectional
▫ Observations are independent
▫ Given X1, …, Xi, predict Y
▫ Example: CNNs
Supervised learning: sequential
▫ Observations are sequentially ordered
▫ Given O1, …, OT, predict OT+1
▫ Example sequence of labeled observations: 1 Normal, 2 Normal, 3 Abnormal, 4 Normal, 5 Abnormal
Time series modeling in Keras using MLPs
• Given: X1, X2, X3, …, XN
• Convert the univariate time series dataset to a cross-sectional dataset (with lookback 1, each value becomes the feature used to predict the next one; a sketch of this conversion follows below)

Series: X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15

X    Y
X1   X2
X2   X3
X3   X4
X4   X5
X5   X6
X6   X7
X7   X8
X8   X9
X9   X10
X10  X11
X11  X12
X12  X13
X13  X14
X14  X15
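This conversion can be written as a small helper; the function name and signature here are illustrative, in the spirit of the common Keras time-series tutorials:

```python
import numpy as np

def create_dataset(series, lookback=1):
    """Turn a univariate series into (X, y) rows: lookback values -> next value."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X), np.array(y)

# e.g. [1, 2, 3, 4, 5] with lookback=1 gives X=[[1],[2],[3],[4]], y=[2,3,4,5]
```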
Sample data
• Monthly data
• Computational Intelligence in Forecasting (CIF) competition
• Source: http://irafm.osu.cz/cif/main.php?c=Static&page=download
(Plot: line chart of the monthly series, roughly 106 observations, values ranging up to about 1,800.)
Keras
• Keras is a high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation.
• Allows for easy and fast prototyping (through total modularity, minimalism, and extensibility)
• Supports both convolutional networks and recurrent networks, as well as combinations of the two
• Supports arbitrary connectivity schemes (including multi-input and multi-output training)
• Runs seamlessly on CPU and GPU
Multi-Layer Perceptron (a sketch follows below)
• Use 72 observations for training and 36 for testing
• Lookback of 1 and 10
• The longer the lookback, the larger the network
• Network: hidden layer of size 8, output layer of size 1
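A minimal sketch of such an MLP in Keras, assuming X_train and y_train come from the create_dataset conversion above. The layer sizes follow the slide (hidden layer of 8 units, single output); the activation, optimizer, and epoch count are illustrative assumptions:

```python
from keras.models import Sequential
from keras.layers import Dense

lookback = 1
model = Sequential()
model.add(Dense(8, input_dim=lookback, activation='relu'))  # hidden layer, size 8
model.add(Dense(1))                                         # linear output, size 1
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=200, batch_size=2, verbose=0)
```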
Demo
• Lookback = 1: Train Score: 1972.20 MSE (44.41 RMSE), Test Score: 3001.77 MSE (54.79 RMSE)
• Lookback = 10: Train Score: 2631.49 MSE (51.30 RMSE), Test Score: 4166.64 MSE (64.55 RMSE)
Recurrent Neural Networks1
• An RNN has 3 types of parameters:
  ▫ W – hidden weights
  ▫ U – hidden-to-hidden weights
  ▫ V – hidden-to-label weights
• All of W, U, V are shared across time steps (see the recurrence below)
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
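In the standard formulation these parameters enter the recurrence as follows (a sketch; the exact notation in the cited tutorial may differ slightly):

```latex
h_t = \sigma\left(W x_t + U h_{t-1} + b\right)            % hidden state from input and previous hidden state
\hat{y}_t = \mathrm{softmax}\left(V h_t + c\right)        % label prediction from the hidden state
```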
Where can Recurrent Neural Networks be used?1
1. Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification)
2. Sequence output (e.g. image captioning takes an image and outputs a sentence of words)
3. Sequence input (e.g. sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment)
4. Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French)
5. Synced sequence input and output (e.g. video classification, where we wish to label each frame of the video)
1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Sample applications
• Andrej Karpathy’s article
  ▫ http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Handwriting generation demo
  ▫ http://www.cs.toronto.edu/~graves/handwriting.html
Recurrent Neural Networks
• A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.1
• Backpropagation (computing the gradient with respect to all parameters of the network), the process used to propagate errors and update weights, needs to be modified for RNNs because of the loops in the network.
1. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Back Propagation Through Time (BPTT)1
• BPTT begins by unfolding the recurrent neural network through time.
• Training then proceeds in a manner similar to training a feed-forward neural network with backpropagation, except that the training patterns are visited in sequential order.
1. https://en.wikipedia.org/wiki/Backpropagation_through_time
Addressing the problem of vanishing/exploding gradients
• Backpropagation through time (BPTT) for RNNs is difficult due to the vanishing/exploding gradient problem: the gradient becomes extremely small or extremely large toward the earliest time steps of the unrolled network.
• This is addressed by LSTM RNNs: instead of plain neurons, LSTMs use memory cells1 (see the cell equations below)
1. http://deeplearning.net/tutorial/lstm.html
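For reference, the standard LSTM memory-cell equations (a sketch of the common formulation; the cited tutorial’s notation may differ in detail). The gates control what is written to, kept in, and read from the cell state, which is what keeps the gradient from vanishing:

```latex
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)                                  % input gate
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)                                  % forget gate
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)                                  % output gate
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)     % cell state
h_t = o_t \odot \tanh(c_t)                                                 % hidden state
```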
Demo – IMDB Dataset
• Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative)
• Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers)
• For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data
• The 2011 paper (see below) reported approximately 88% accuracy
• See:
  ▫ https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
  ▫ http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
  ▫ http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf
Network (a sketch follows below)
• The most frequent 5,000 words are chosen and mapped to 32-length vectors
• Sequences are restricted to 500 words: longer sequences are cut off, shorter ones are padded
• LSTM layer with 100 output dimensions
• Accuracy: 84.08%
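A minimal sketch of the network described above, following the pattern of the linked Keras example and tutorial (vocabulary of 5,000, sequences padded/truncated to 500, a 32-dimensional embedding, and an LSTM with 100 units); the epoch and batch settings are illustrative assumptions:

```python
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

top_words, max_len = 5000, 500
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
X_train = pad_sequences(X_train, maxlen=max_len)  # cut off / pad to 500 words
X_test = pad_sequences(X_test, maxlen=max_len)

model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_len))  # 32-length word vectors
model.add(LSTM(100))                                       # 100 output dimensions
model.add(Dense(1, activation='sigmoid'))                  # positive/negative sentiment
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, batch_size=64,
          validation_data=(X_test, y_test))
```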
Using RNNs for the CIF forecasting problem (a sketch follows below)
• Use 72 observations for training and 36 for testing
• Lookback of 1
(Plot: the monthly CIF series, roughly 105 observations, values ranging up to about 1,800.)
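A sketch of an LSTM regression setup for this problem, assuming the create_dataset helper from earlier; Keras LSTMs expect 3-D input of shape [samples, timesteps, features], and the 4-unit LSTM layer and training settings are illustrative assumptions:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

lookback = 1
# X from create_dataset has shape [samples, lookback];
# reshape to [samples, timesteps, features] for the LSTM layer.
X_train_r = np.reshape(X_train, (X_train.shape[0], lookback, 1))

model = Sequential()
model.add(LSTM(4, input_shape=(lookback, 1)))  # small LSTM; size is an assumption
model.add(Dense(1))                            # one-step-ahead forecast
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train_r, y_train, epochs=100, batch_size=1, verbose=0)
```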
Result
• Lookback = 1: Train Score: 50.54 RMSE, Test Score: 65.34 RMSE
• Lookback = 10: Train Score: 41.65 RMSE, Test Score: 90.68 RMSE
• Approach using Microsoft’s Cognitive Toolkit
  ▫ https://gallery.cortanaintelligence.com/Tutorial/Forecasting-Short-Time-Series-with-LSTM-Neural-Networks-2
  ▫ https://www.microsoft.com/en-us/research/product/cognitive-toolkit/model-gallery/
Q&A
Thank you! Members & Sponsors!
Sri Krishnamurthy, CFA, CAP
Founder and CEO, QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
