Python for Image Understanding: Deep Learning with Convolutional Neural Nets
The document discusses deep learning and its significance in image understanding through convolutional neural networks, highlighting its ability to learn features automatically as opposed to manually designed features. It outlines the deep learning process, including data preprocessing, model architecture selection, training, optimization, and debugging techniques, while emphasizing the importance of regularization methods like dropout and batch normalization. Furthermore, it provides resources for implementing deep learning in Python and references various research studies related to the field.
Introduction to deep learning with convolutional neural networks by Roelof Pieters, a PhD candidate and data science consultant.
Overview and definition of deep learning, emphasizing its algorithmic nature and learning capabilities across various levels of abstraction.
Deep learning eliminates handcrafted features, using learned features adaptable to visual and linguistic information, structured as hierarchical models.
Historical advancements in audio and image recognition via deep learning (e.g., ImageNet classification by Krizhevsky et al. in 2012).
Explaining the inner workings of deep learning models, including activation functions and optimization strategies.
Available Python libraries for deep learning, including Theano, Caffe, Keras, and Lasagne.
Training a neural network involves preprocessing data, selecting architecture, optimizing, and regularization.
Steps for effective data preprocessing, including normalization and PCA, essential for improving model performance.
Selecting the appropriate neural network architecture, such as a DBN (deep belief network), CNN (convolutional neural network), or RNN (recurrent neural network), for specific tasks.
Framework for training neural networks focusing on data processing, architecture selection, training, and optimization.
Visualizing and interpreting loss curves, accuracy, and weights to debug and improve training processes.
Strategies for optimizing hyperparameters and employing techniques like dropout and data augmentation to reduce overfitting.
Use of ensemble techniques to enhance model predictions by aggregating outputs of multiple models.
Discussion of saliency maps and the vulnerability of convnets to adversarial "fooling" images, plus resources for further learning.
Python for Image Understanding: Deep Learning with Convolutional Neural Nets. Roelof Pieters, PhD candidate at KTH & Data Science consultant at Graph Technologies. @graphific, London 2015, roelof@graph-technologies.com
A Definition: "Deep learning is a set of algorithms in machine learning that attempt to learn in multiple levels, corresponding to different levels of abstraction." (much debated definition)
A typology. Deep learning is:
• a host of statistical machine learning techniques
• enables the automatic learning of feature hierarchies
• generally based on artificial neural networks
Old vs new school? Manually designed features are often over-specified, incomplete, and take a long time to design and validate. Learned features are easy to adapt and fast to learn. Deep learning provides a very flexible, (possibly?) universal, learnable framework for representing world, visual and linguistic information. Deep learning can learn unsupervised (from raw text/audio/images/whatever content) and supervised (with specific labels like positive/negative). Summary by Richard Socher.
Deep Learning with Python: Python has a wide range of deep-learning-related libraries available, from low level to high level:
• Theano: efficient GPU-powered math (deeplearning.net/software/theano)
• Caffe: computer-vision oriented DL framework, model zoo, prototxt model definitions; pythonification ongoing! (caffe.berkeleyvision.org)
• Pylearn2: wrapper for Theano, YAML, experiment-oriented (deeplearning.net/software/pylearn2)
• Keras: Theano wrapper, models in Python code, abstracts Theano away (keras.io)
• Lasagne: Theano extension, models in Python code, Theano not hidden (lasagne.readthedocs.org/en/latest)
We will use Lasagne in our examples.
DrawCNN: visualizing the units' connections. Agrawal et al., Analyzing the performance of multilayer neural networks for object recognition. ECCV, 2014. Szegedy et al., Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013. Zeiler, M. et al., Visualizing and Understanding Convolutional Networks. ECCV, 2014.
Training a (deep) Neural Network: 1. Preprocess the data 2. Choose architecture 3. Train 4. Optimize/Regularize 5. Tips/Tricks
Training a (deep) Neural Network: 1. Preprocess the data 2. Choose architecture 3. Train (Code Finally!) 4. Optimize/Regularize 5. Tips/Tricks
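Step 1 in code: a minimal NumPy sketch of the usual preprocessing (zero-centering, variance normalization, and optional PCA whitening). X is a hypothetical data matrix of shape (n_samples, n_features); the statistics computed on training data should be reused on test data.

    import numpy as np

    # X: (n_samples, n_features) training data, e.g. flattened images (hypothetical)
    X = X.astype(np.float64)
    X -= X.mean(axis=0)                  # zero-center every feature
    X /= X.std(axis=0) + 1e-8            # normalize to unit variance

    # optional: PCA whitening
    cov = np.dot(X.T, X) / X.shape[0]    # covariance matrix of the centered data
    U, S, _ = np.linalg.svd(cov)         # eigenvectors U, eigenvalues S
    X_rot = np.dot(X, U)                 # decorrelate (rotate into eigenbasis)
    X_white = X_rot / np.sqrt(S + 1e-5)  # whiten: unit variance per component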
Training a (deep) Neural Network (…): layer definitions, layer parameters
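A minimal sketch of such layer definitions in Lasagne; the input shape (32x32 RGB), filter counts, and 10-way output below are hypothetical, chosen only for illustration:

    import theano.tensor as T
    import lasagne
    from lasagne.layers import (InputLayer, Conv2DLayer, MaxPool2DLayer,
                                DenseLayer, DropoutLayer)
    from lasagne.nonlinearities import rectify, softmax

    # layer definitions: stack layers by passing each as input to the next
    input_var = T.tensor4('inputs')
    network = InputLayer(shape=(None, 3, 32, 32), input_var=input_var)
    # layer parameters: num_filters, filter_size, pool_size, p, num_units, ...
    network = Conv2DLayer(network, num_filters=32, filter_size=(3, 3),
                          nonlinearity=rectify)
    network = MaxPool2DLayer(network, pool_size=(2, 2))
    network = Conv2DLayer(network, num_filters=64, filter_size=(3, 3),
                          nonlinearity=rectify)
    network = MaxPool2DLayer(network, pool_size=(2, 2))
    network = DropoutLayer(network, p=0.5)
    network = DenseLayer(network, num_units=256, nonlinearity=rectify)
    network = DenseLayer(network, num_units=10, nonlinearity=softmax)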
Data Augmentation (realtime data augmentation at Kaggle's #1 National Data Science Bowl ≋ Deep Sea ≋ team) http://benanne.github.io/2015/03/17/plankton.html
• rotation: random with angle between 0° and 360° (uniform)
• translation: random with shift between -10 and 10 pixels (uniform)
• rescaling: random with scale factor between 1/1.6 and 1.6 (log-uniform)
• flipping: yes or no (bernoulli)
• shearing: random with angle between -20° and 20° (uniform)
• stretching: random with stretch factor between 1/1.3 and 1.3 (log-uniform)
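One way such a recipe might be implemented with scikit-image; the parameter ranges follow the slide, and folding rescaling and stretching into per-axis scale factors is an assumption of this sketch:

    import numpy as np
    from skimage.transform import AffineTransform, warp

    def augment(image, rng=np.random):
        # draw the random parameters listed above
        rotation = rng.uniform(0, 2 * np.pi)                        # 0..360 degrees
        translation = rng.uniform(-10, 10, size=2)                  # pixel shift
        scale = np.exp(rng.uniform(np.log(1 / 1.6), np.log(1.6)))   # log-uniform
        stretch = np.exp(rng.uniform(np.log(1 / 1.3), np.log(1.3))) # log-uniform
        shear = np.deg2rad(rng.uniform(-20, 20))                    # shearing angle

        tform = AffineTransform(scale=(scale * stretch, scale / stretch),
                                rotation=rotation, shear=shear,
                                translation=translation)
        out = warp(image, tform.inverse, mode='reflect')
        if rng.rand() > 0.5:                                        # flip (bernoulli)
            out = out[:, ::-1]
        return out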
Batch Normalization as regularization
• Normalize the activations in each layer within a minibatch
• Learn a per-layer scale (gamma) and shift (beta) as parameters, so the network can restore the original representation if needed
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. S. Ioffe and C. Szegedy (2015)
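A minimal NumPy sketch of the batch-normalization forward pass at training time; gamma and beta are the learned per-feature scale and shift, and at test time the minibatch statistics are replaced by running averages:

    import numpy as np

    def batch_norm_forward(x, gamma, beta, eps=1e-5):
        # x: (batch_size, n_features) activations of one layer for a minibatch
        mu = x.mean(axis=0)                    # per-feature minibatch mean
        var = x.var(axis=0)                    # per-feature minibatch variance
        x_hat = (x - mu) / np.sqrt(var + eps)  # normalize the activations
        return gamma * x_hat + beta            # learned scale and shift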
Training a (deep) Neural Network: 1. Preprocess the data 2. Choose architecture 3. Train 4. Optimize/Regularize (Debug) 5. Further Tips & Tricks to improve Model Accuracy
Other "Tricks":
• Ensembles
• Finetuning a pre-trained/earlier-trained net
• Sticking extracted layer features in another classifier (e.g. an SVM), as sketched below
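A sketch of the last trick with scikit-learn: extract activations from a hidden layer of the hypothetical Lasagne net defined earlier and train a linear SVM on them. The names penultimate, X_train, y_train, X_test, and y_test are assumptions of this sketch:

    import theano
    import lasagne
    from sklearn.svm import LinearSVC

    # penultimate: the last hidden DenseLayer of the trained net (hypothetical);
    # network and input_var come from the layer-definition sketch earlier
    feats = lasagne.layers.get_output(penultimate, deterministic=True)
    extract = theano.function([input_var], feats)

    svm = LinearSVC(C=1.0)                       # linear SVM on deep features
    svm.fit(extract(X_train), y_train)
    print(svm.score(extract(X_test), y_test))    # accuracy on held-out data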
Ensembles
• majority vote for hard predictions (i.e. classes)
• average vote for soft predictions (continuous scale)
• make sure classifiers are uncorrelated
• cross-validate ensemble weights (by grid search, or rank average)
• stacking
• blending
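A minimal sketch of the two voting schemes (array names are hypothetical):

    import numpy as np

    def average_vote(preds_soft):
        # preds_soft: list of (n_samples, n_classes) probability arrays
        return np.mean(preds_soft, axis=0).argmax(axis=1)

    def majority_vote(preds_hard, n_classes):
        # preds_hard: list of (n_samples,) predicted-class arrays
        votes = np.stack(preds_hard)             # (n_models, n_samples)
        counts = np.apply_along_axis(
            lambda v: np.bincount(v, minlength=n_classes), 0, votes)
        return counts.argmax(axis=0)             # most-voted class per sample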
Ensembles (10 similar nets with varying hyperparameters on the same tiny-imagenet dataset): avg of single nets: 0.3647; predict by mean of all: 0.4244; leave out model 9: 0.4259
Saliency Maps. K. Simonyan, A. Vedaldi, A. Zisserman, "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps", ICLR Workshop 2014. First we predict on a pixel level.
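A sketch of such a saliency map computed with Theano on the hypothetical Lasagne net from earlier. Following Simonyan et al., the saliency of a pixel is the magnitude of the gradient of the winning class score with respect to that pixel, maximized over color channels:

    import theano
    import theano.tensor as T
    import lasagne

    # network, input_var: the trained net from the layer-definition sketch
    probs = lasagne.layers.get_output(network, deterministic=True)
    score = probs[0, T.argmax(probs[0])]          # score of the predicted class
    saliency = T.grad(score, input_var)           # gradient w.r.t. input pixels
    saliency_fn = theano.function([input_var],
                                  T.abs_(saliency).max(axis=1))
    # saliency_fn(batch) returns one (height, width) map per input image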
Fooling ConvNets. Szegedy, Christian, et al., "Intriguing properties of neural networks." arXiv preprint, 2013. Nguyen, Anh, Jason Yosinski, and Jeff Clune, "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images." arXiv preprint, 2014. Then we do our "magic".
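A hedged sketch of the gradient-ascent fooling idea behind these papers: repeatedly nudge the pixels in the direction that increases the score of a wrong target class until the net confidently misclassifies. image and wrong_class are hypothetical inputs; the step size and iteration count are arbitrary:

    import theano
    import theano.tensor as T
    import lasagne

    # network, input_var: the trained net from the layer-definition sketch
    probs = lasagne.layers.get_output(network, deterministic=True)
    target = T.iscalar('target')                  # index of the wrong class
    grad_fn = theano.function([input_var, target],
                              T.grad(probs[0, target], input_var))

    adversarial = image.copy()                    # image: (1, 3, 32, 32) float32
    for _ in range(100):                          # small gradient-ascent steps
        adversarial += 0.1 * grad_fn(adversarial, wrong_class)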