Location: QuantUniversity Meetup, January 19th 2017, Boston MA
Deep Learning: An Introduction, Part II
2016 Copyright QuantUniversity LLC.
Presented by: Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
sri@quantuniversity.com
Slides and code will be available at: http://www.analyticscertificate.com/DeepLearning
- Analytics advisory services
- Custom training programs
- Architecture assessments, advice and audits
Sri Krishnamurthy, Founder and CEO
• Founder of QuantUniversity LLC. and www.analyticscertificate.com
• Advisory and consultancy for financial analytics
• Prior experience at MathWorks, Citigroup and Endeca, and with 25+ financial services and energy customers
• Regular columnist for Wilmott Magazine
• Author of the forthcoming book “Financial Modeling: A Case Study Approach”, published by Wiley
• Chartered Financial Analyst and Certified Analytics Professional
• Teaches analytics in the Babson College MBA program and at Northeastern University, Boston
Quantitative Analytics and Big Data Analytics Onboarding
• Trained more than 500 students in quantitative methods, data science and big data technologies using MATLAB, Python and R
• Launched the Analytics Certificate Program in September
  ▫ New cohort in March 2017
• Coming soon: Deep Learning and Cognitive Computing Certificate!
Events of Interest
• February 2017
  ▫ Apache Spark Lecture – Feb 3rd
  ▫ Deep Learning Workshop – Boston – March 27-28
  ▫ Anomaly Detection Workshop – Boston – April 24-25
• March 2017
  ▫ Deep Learning Workshop – New York (Date TBD)
Recap
• Neural Networks 101
• Multi-Layer Perceptron
• Convolutional Neural Networks
Agenda for today
• Autoencoders
• Recurrent Neural Networks
  ▫ LSTM
Machine Learning
• Unsupervised algorithms (a sketch follows below)
  ▫ Given a dataset with variables 𝑥𝑖, build a model that captures the similarities in different observations and assigns them to different buckets => clustering, etc.
  ▫ Create a transformed representation of the original data => PCA
  ▫ Example: Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
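A minimal sketch of both unsupervised ideas in Python (the random data and the choice of scikit-learn estimators are illustrative assumptions, not part of the deck):

```python
# Unsupervised learning sketch: clustering assigns observations to buckets,
# PCA builds a transformed representation of the original data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.rand(100, 4)  # 100 observations, 4 variables x_i (hypothetical data)

labels = KMeans(n_clusters=3).fit_predict(X)       # bucket assignment per observation
X_reduced = PCA(n_components=2).fit_transform(X)   # 2-D transformed representation
```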
Machine Learning
• Supervised algorithms (a sketch follows below)
  ▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a given data set, i.e. learn a model y = F(x1, x2, x3, …)
  ▫ If y is numeric => prediction
  ▫ If y is categorical => classification
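A minimal sketch of both supervised cases (again with hypothetical data and illustrative scikit-learn estimators):

```python
# Supervised learning sketch: a regressor when y is numeric (prediction),
# a classifier when y is categorical (classification).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.random.rand(100, 3)                  # variables x1, x2, x3
y_numeric = X @ np.array([1.0, -2.0, 0.5])  # numeric target => prediction
y_class = (y_numeric > 0).astype(int)       # categorical target => classification

LinearRegression().fit(X, y_numeric)
LogisticRegression().fit(X, y_class)
```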
Motivation: Autoencoders1
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
https://blog.google/products/google-plus/saving-you-bandwidth-through-machine-learning/
Autoencoder
• Goal is to have the reconstruction 𝑥̂ approximate the input x
• Interesting applications such as:
  ▫ Data compression
  ▫ Visualization
  ▫ Pre-training neural networks
Demo in Keras1 (a sketch follows below)
1. https://blog.keras.io/building-autoencoders-in-keras.html
2. https://keras.io/models/model/
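Below is a minimal sketch in the spirit of the cited Keras blog post: a single fully connected encoding layer compressing a 784-dimensional input to a 32-dimensional code, and a decoder reconstructing the input. The layer sizes, optimizer, and training settings here are illustrative assumptions, not the exact demo code.

```python
# Minimal dense autoencoder, following the pattern of the Keras blog post.
# 784-dim input (e.g. flattened 28x28 MNIST digits), 32-dim code.
from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))
encoded = Dense(32, activation='relu')(input_img)    # compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)  # reconstruction of the input

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Trained to reproduce its own input (x_train assumed given):
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
#                 validation_data=(x_test, x_test))
```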
Autoencoders1 (a sketch of this procedure follows below)
• Pretraining step: train a sequence of shallow autoencoders, greedily one layer at a time, using unsupervised data
• Fine-tuning step 1: train the last layer using supervised data
• Fine-tuning step 2: use backpropagation to fine-tune the entire network using supervised data
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
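A compact sketch of this greedy layer-wise procedure, assuming x_unsup is an unlabeled array, (x_sup, y_sup) is a smaller labeled set, and all layer sizes, epochs, and optimizer choices are illustrative:

```python
# Greedy layer-wise pretraining sketch (x_unsup, x_sup, y_sup assumed given).
from keras.layers import Input, Dense
from keras.models import Model, Sequential

# Pretraining: first shallow autoencoder, trained on raw unsupervised data
inp = Input(shape=(784,))
h1 = Dense(128, activation='relu')(inp)
ae1 = Model(inp, Dense(784, activation='sigmoid')(h1))
ae1.compile(optimizer='adam', loss='mse')
ae1.fit(x_unsup, x_unsup, epochs=10, verbose=0)
codes = Model(inp, h1).predict(x_unsup)  # first-layer codes

# Second shallow autoencoder, trained on the first layer's codes
inp2 = Input(shape=(128,))
h2 = Dense(64, activation='relu')(inp2)
ae2 = Model(inp2, Dense(128, activation='sigmoid')(h2))
ae2.compile(optimizer='adam', loss='mse')
ae2.fit(codes, codes, epochs=10, verbose=0)

# Stack the pretrained encoders and add a supervised output layer
stack = Sequential([
    Dense(128, activation='relu', input_dim=784,
          weights=ae1.layers[1].get_weights()),
    Dense(64, activation='relu', weights=ae2.layers[1].get_weights()),
    Dense(10, activation='softmax'),
])

# Fine-tuning step 1: train only the new supervised layer
for layer in stack.layers[:-1]:
    layer.trainable = False
stack.compile(optimizer='adam', loss='categorical_crossentropy')
stack.fit(x_sup, y_sup, epochs=5, verbose=0)

# Fine-tuning step 2: unfreeze and backpropagate through the whole network
for layer in stack.layers:
    layer.trainable = True
stack.compile(optimizer='adam', loss='categorical_crossentropy')
stack.fit(x_sup, y_sup, epochs=5, verbose=0)
```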
Supervised learning: cross-sectional
▫ Observations are independent
▫ Given X1, …, Xi, predict Y
▫ Example: CNNs
Supervised learning: sequential
▫ Observations are sequentially ordered
▫ Given O1, …, OT, predict OT+1
▫ Example sequence of labeled observations: 1 Normal, 2 Normal, 3 Abnormal, 4 Normal, 5 Abnormal
Time series modeling in Keras using MLPs
• Given: X1, X2, X3, …, XN
• Convert the univariate time series dataset to a cross-sectional dataset (with lookback 1, each value becomes the feature used to predict the next one; a sketch of this conversion follows below)

Series: X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15

X    Y
X1   X2
X2   X3
X3   X4
X4   X5
X5   X6
X6   X7
X7   X8
X8   X9
X9   X10
X10  X11
X11  X12
X12  X13
X13  X14
X14  X15
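This conversion can be written as a small helper; the function name and signature here are illustrative, in the spirit of the common Keras time-series tutorials:

```python
import numpy as np

def create_dataset(series, lookback=1):
    """Turn a univariate series into (X, y) rows: lookback values -> next value."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X), np.array(y)

# e.g. [1, 2, 3, 4, 5] with lookback=1 gives X=[[1],[2],[3],[4]], y=[2,3,4,5]
```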
Sample data
• Monthly data
• Computational Intelligence in Forecasting (CIF) competition
• Source: http://irafm.osu.cz/cif/main.php?c=Static&page=download
(Plot: line chart of the monthly series, roughly 106 observations, values ranging up to about 1,800.)
Keras
• Keras is a high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation.
• Allows for easy and fast prototyping (through total modularity, minimalism, and extensibility)
• Supports both convolutional networks and recurrent networks, as well as combinations of the two
• Supports arbitrary connectivity schemes (including multi-input and multi-output training)
• Runs seamlessly on CPU and GPU
Multi-Layer Perceptron (a sketch follows below)
• Use 72 observations for training and 36 for testing
• Lookback of 1 and 10
• The longer the lookback, the larger the network
• Network: hidden layer of size 8, output layer of size 1
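A minimal sketch of such an MLP in Keras, assuming X_train and y_train come from the create_dataset conversion above. The layer sizes follow the slide (hidden layer of 8 units, single output); the activation, optimizer, and epoch count are illustrative assumptions:

```python
from keras.models import Sequential
from keras.layers import Dense

lookback = 1
model = Sequential()
model.add(Dense(8, input_dim=lookback, activation='relu'))  # hidden layer, size 8
model.add(Dense(1))                                         # linear output, size 1
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=200, batch_size=2, verbose=0)
```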
Demo
• Lookback = 1: Train Score: 1972.20 MSE (44.41 RMSE), Test Score: 3001.77 MSE (54.79 RMSE)
• Lookback = 10: Train Score: 2631.49 MSE (51.30 RMSE), Test Score: 4166.64 MSE (64.55 RMSE)
Recurrent Neural Networks1
• An RNN has 3 types of parameters:
  ▫ W – hidden weights
  ▫ U – hidden-to-hidden weights
  ▫ V – hidden-to-label weights
• All of W, U, V are shared across time steps (see the recurrence below)
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
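In the standard formulation these parameters enter the recurrence as follows (a sketch; the exact notation in the cited tutorial may differ slightly):

```latex
h_t = \sigma\left(W x_t + U h_{t-1} + b\right)            % hidden state from input and previous hidden state
\hat{y}_t = \mathrm{softmax}\left(V h_t + c\right)        % label prediction from the hidden state
```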
Where can Recurrent Neural Networks be used?1
1. Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification)
2. Sequence output (e.g. image captioning takes an image and outputs a sentence of words)
3. Sequence input (e.g. sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment)
4. Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French)
5. Synced sequence input and output (e.g. video classification, where we wish to label each frame of the video)
1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Sample applications
• Andrej Karpathy’s article
  ▫ http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Handwriting generation demo
  ▫ http://www.cs.toronto.edu/~graves/handwriting.html
Recurrent Neural Networks
• A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.1
• Backpropagation (computing the gradient with respect to all parameters of the network), the process used to propagate errors and update weights, needs to be modified for RNNs because of the loops in the network.
1. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Back Propagation Through Time (BPTT)1
• BPTT begins by unfolding the recurrent neural network through time.
• Training then proceeds in a manner similar to training a feed-forward neural network with backpropagation, except that the training patterns are visited in sequential order.
1. https://en.wikipedia.org/wiki/Backpropagation_through_time
Addressing the problem of vanishing/exploding gradients
• Backpropagation through time (BPTT) for RNNs is difficult due to the vanishing/exploding gradient problem: the gradient becomes extremely small or extremely large toward the earliest time steps of the unrolled network.
• This is addressed by LSTM RNNs: instead of plain neurons, LSTMs use memory cells1 (see the cell equations below)
1. http://deeplearning.net/tutorial/lstm.html
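For reference, the standard LSTM memory-cell equations (a sketch of the common formulation; the cited tutorial’s notation may differ in detail). The gates control what is written to, kept in, and read from the cell state, which is what keeps the gradient from vanishing:

```latex
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)                                  % input gate
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)                                  % forget gate
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)                                  % output gate
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)     % cell state
h_t = o_t \odot \tanh(c_t)                                                 % hidden state
```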
Demo – IMDB Dataset
• Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative)
• Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers)
• For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data
• The 2011 paper (see below) reported approximately 88% accuracy
• See:
  ▫ https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
  ▫ http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
  ▫ http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf
Network (a sketch follows below)
• The most frequent 5,000 words are chosen and mapped to 32-length vectors
• Sequences are restricted to 500 words: longer sequences are cut off, shorter ones are padded
• LSTM layer with 100 output dimensions
• Accuracy: 84.08%
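A minimal sketch of the network described above, following the pattern of the linked Keras example and tutorial (vocabulary of 5,000, sequences padded/truncated to 500, a 32-dimensional embedding, and an LSTM with 100 units); the epoch and batch settings are illustrative assumptions:

```python
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

top_words, max_len = 5000, 500
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
X_train = pad_sequences(X_train, maxlen=max_len)  # cut off / pad to 500 words
X_test = pad_sequences(X_test, maxlen=max_len)

model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_len))  # 32-length word vectors
model.add(LSTM(100))                                       # 100 output dimensions
model.add(Dense(1, activation='sigmoid'))                  # positive/negative sentiment
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, batch_size=64,
          validation_data=(X_test, y_test))
```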
Using RNNs for the CIF forecasting problem (a sketch follows below)
• Use 72 observations for training and 36 for testing
• Lookback of 1
(Plot: the monthly CIF series, roughly 105 observations, values ranging up to about 1,800.)
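A sketch of an LSTM regression setup for this problem, assuming the create_dataset helper from earlier; Keras LSTMs expect 3-D input of shape [samples, timesteps, features], and the 4-unit LSTM layer and training settings are illustrative assumptions:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

lookback = 1
# X from create_dataset has shape [samples, lookback];
# reshape to [samples, timesteps, features] for the LSTM layer.
X_train_r = np.reshape(X_train, (X_train.shape[0], lookback, 1))

model = Sequential()
model.add(LSTM(4, input_shape=(lookback, 1)))  # small LSTM; size is an assumption
model.add(Dense(1))                            # one-step-ahead forecast
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train_r, y_train, epochs=100, batch_size=1, verbose=0)
```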
Result
• Lookback = 1: Train Score: 50.54 RMSE, Test Score: 65.34 RMSE
• Lookback = 10: Train Score: 41.65 RMSE, Test Score: 90.68 RMSE
• Approach using Microsoft’s Cognitive Toolkit
  ▫ https://gallery.cortanaintelligence.com/Tutorial/Forecasting-Short-Time-Series-with-LSTM-Neural-Networks-2
  ▫ https://www.microsoft.com/en-us/research/product/cognitive-toolkit/model-gallery/
Q&A
Thank you! Members & Sponsors!
Sri Krishnamurthy, CFA, CAP
Founder and CEO, QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
