Handwritten Recognition using Deep Learning with R
Poo Kuan Hoong
August 17, 2016

Google DeepMind AlphaGo

Introduction

In the past 10 years, machine learning and Artificial Intelligence (AI) have shown tremendous progress. The recent success can be attributed to:
- Explosion of data
- Cheap computing cost - CPUs and GPUs
- Improvement of machine learning models
Much of the current excitement concerns a subfield of machine learning called "deep learning".

Human Brain

Neural Networks

Deep Learning is primarily about neural networks, where a network is an interconnected web of nodes and edges. Neural nets were designed to perform complex tasks, such as placing objects into categories based on a few attributes. Neural nets are highly structured networks with three kinds of layers - an input layer, an output layer, and so-called hidden layers, which refer to any layers between the input and the output layers. Each node (also called a neuron) in the hidden and output layers has a classifier.

Neural Network Layers

Neural Network: Forward Propagation

The input neurons first receive the data features of the object. After processing the data, they send their output to the first hidden layer. The hidden layer processes this output and sends the results to the next hidden layer. This continues until the data reaches the final output layer, where the output value determines the object's classification. This entire process is known as Forward Propagation, or forward prop.

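To make forward prop concrete, here is a minimal sketch in base R. The network shape, weights, and inputs are made-up values for illustration only, not part of the demo.

# Forward propagation through one hidden layer (illustrative values)
sigmoid <- function(z) 1 / (1 + exp(-z))

x  <- c(0.5, 0.8)                    # input features for one object
W1 <- matrix(runif(6, -1, 1), 3, 2)  # weights: 3 hidden neurons x 2 inputs
b1 <- rep(0, 3)                      # hidden-layer biases
W2 <- matrix(runif(3, -1, 1), 1, 3)  # weights: 1 output neuron x 3 hidden
b2 <- 0                              # output bias

h   <- sigmoid(W1 %*% x + b1)        # hidden-layer activations
out <- sigmoid(W2 %*% h + b2)        # output value used for classification
out
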
Neural Network: Backward Propagation

To train a neural network over a large set of labelled data, you must continuously compute the difference between the network's predicted output and the actual output. This difference is called the cost, and the process for training a net is known as backpropagation, or backprop. During backprop, weights and biases are tweaked slightly until the lowest possible cost is achieved. An important aspect of this process is the gradient, which is a measure of how much the cost changes with respect to a change in a weight or bias value.

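To see the gradient at work, here is a minimal sketch in base R for a single neuron with a squared-error cost; the example values and learning rate are illustrative assumptions.

# One-neuron backprop: repeatedly tweak w and b along the gradient
sigmoid <- function(z) 1 / (1 + exp(-z))

x <- 1.5; y <- 1      # one labelled training example (input, target)
w <- 0.2; b <- 0      # initial weight and bias
lr <- 0.1             # learning rate (illustrative)

for (i in 1:100) {
  pred <- sigmoid(w * x + b)                  # forward pass
  grad <- 2 * (pred - y) * pred * (1 - pred)  # d(cost)/d(pre-activation)
  w <- w - lr * grad * x                      # tweak weight
  b <- b - lr * grad                          # tweak bias
}
c(prediction = pred, cost = (pred - y)^2)     # cost shrinks toward zero
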
The 1990s view of what was wrong with back-propagation

- It required a lot of labelled training data, yet almost all data is unlabeled.
- The learning time did not scale well: it was very slow in networks with multiple hidden layers.
- It got stuck at local optima. These were often surprisingly good, but there was no good theory.

Deep Learning

Deep learning refers to artificial neural networks that are composed of many layers. It's a growing trend in Machine Learning due to some favorable results in applications where the target function is very complex and the datasets are large.

Deep Learning: Benefits

Robust
- No need to design the features ahead of time - features are automatically learned to be optimal for the task at hand.
- Robustness to natural variations in the data is automatically learned.

Generalizable
- The same neural net approach can be used for many different applications and data types.

Scalable
- Performance improves with more data; the method is massively parallelizable.

Deep Learning: Weaknesses

- Deep Learning requires a large dataset, and hence a long training period.
- In terms of cost, machine learning methods like SVMs and other tree ensembles are very easily deployed even by relative machine learning novices and can usually get you reasonably good results.
- Deep learning methods tend to learn everything. It's better to encode prior knowledge about the structure of images (or audio or text).
- The learned features are often difficult to understand. Many vision features are also not really human-understandable (e.g., concatenations/combinations of different features).
- It requires a good understanding of how to model multiple modalities with traditional tools.

Deep Learning: Applications

H2O Library

H2O is an open source, distributed, Java machine learning library.
- Ease of Use via Web Interface
- R, Python, Scala, Spark & Hadoop Interfaces
- Distributed Algorithms Scale to Big Data
The package can be downloaded from http://www.h2o.ai/download/h2o/r

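As a quick sketch, the package can also be installed directly from CRAN and loaded in R:

# Install and load the H2O R package
install.packages("h2o")
library(h2o)
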
H2O R Package on CRAN

H2O booklets

H2O reference booklets can be downloaded from https://github.com/h2oai/h2o-3/tree/master/h2o-docs/src/booklets/v2_2015/PDFs/online

MNIST Handwritten Dataset

The MNIST database consists of handwritten digits. The training set has 60,000 examples, and the test set has 10,000 examples. The MNIST database is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. For this demo, the Kaggle pre-processed training and testing datasets were used. The training dataset, train.csv, has 42,000 rows and 785 columns (one label column plus 784 pixel values for each 28x28 image).

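A minimal sketch of inspecting the Kaggle file in R, assuming train.csv has been downloaded into the working directory:

# Load the Kaggle MNIST training file and check its shape
train <- read.csv("train.csv")
dim(train)   # expect 42000 rows x 785 columns (label + 784 pixels)

# Render the first digit as a 28x28 greyscale image
pixels <- matrix(unlist(train[1, -1]), 28, 28, byrow = TRUE)
image(t(pixels)[, 28:1], col = grey.colors(255),
      main = paste("label:", train[1, 1]))
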
Demo

The source code can be accessed from https://github.com/kuanhoong/myRUG_DeepLearning

Create training and testing datasets

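A minimal sketch of one way to do this, continuing from the train data frame above; the 80/20 random split is an assumption, and the talk's exact code is in the linked repository.

# Randomly split the 42,000 labelled rows into training and testing sets
set.seed(123)
idx <- sample(nrow(train), 0.8 * nrow(train))
train_set <- train[idx, ]   # 80% for training
test_set  <- train[-idx, ]  # 20% held out for testing
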
Start H2O Cluster from R and load data into H2O

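A minimal sketch using the h2o package; nthreads = -1 asks H2O to use all available cores.

# Start a local H2O cluster from R
library(h2o)
localH2O <- h2o.init(nthreads = -1)

# Move the R data frames into the H2O cluster
train_h2o <- as.h2o(train_set)
test_h2o  <- as.h2o(test_set)

# The digit label must be a factor so H2O treats this as classification
train_h2o[, 1] <- as.factor(train_h2o[, 1])
test_h2o[, 1]  <- as.factor(test_h2o[, 1])
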
Deep Learning in R: Train & Test

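A minimal sketch with h2o.deeplearning; the layer sizes, activation, and epoch count are illustrative assumptions, not necessarily the settings used in the talk.

# Train a multi-layer network on the pixel columns
model <- h2o.deeplearning(
  x = 2:785,                          # predictor columns (pixels)
  y = 1,                              # response column (digit label)
  training_frame = train_h2o,
  activation = "RectifierWithDropout",
  hidden = c(128, 128),               # two hidden layers (illustrative)
  epochs = 10
)

# Score the held-out test set
pred <- h2o.predict(model, test_h2o)
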
Result

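A minimal sketch of evaluating the model on the held-out test frame, continuing from the objects above:

# Confusion matrix on the test data
perf <- h2o.performance(model, newdata = test_h2o)
h2o.confusionMatrix(perf)

# Overall accuracy: fraction of digits predicted correctly
mean(as.vector(pred$predict) == as.vector(test_h2o[, 1]))
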
Lastly…