Convolutional Neural Networks for Computer Vision Applications
Alex Conway (alex@numberboost.com)
JHB Deep Learning Hackathon Info :)
Hands up!
Check out the Deep Learning Indaba videos & practicals! http://www.deeplearningindaba.com/videos.html http://www.deeplearningindaba.com/practicals.html
Image Classification
http://yann.lecun.com/exdb/mnist/
https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
(99.25% test accuracy in 192 seconds and 70 lines of code)
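A condensed sketch in the spirit of the referenced Keras MNIST example (the layer sizes here are illustrative choices, not copied line-for-line from that script):

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# load and normalise the MNIST digits
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(10, activation="softmax"),   # 10 digit classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=12, validation_data=(x_test, y_test))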
Object detection
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
https://www.youtube.com/watch?v=VOC3huqHrss Object detection
Image Captioning & Visual Attention
https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning
Image Q&A
https://arxiv.org/pdf/1612.00837.pdf
Video Q&A
https://www.youtube.com/watch?v=UeheTiBJ0Io
Pix2Pix
https://affinelayer.com/pix2pix/
https://github.com/affinelayer/pix2pix-tensorflow
Pix2Pix
https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66
Pix2Pix: remastering classic films (original input vs. pix2pix output vs. remastered)
https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503
Style Transfer
https://github.com/junyanz/CycleGAN
Talk outline:
1. What is a neural network?
2. What is a convolutional neural network?
3. How to use a convolutional neural network
4. More advanced methods
5. Practical tips
6. Hackathon Challenge Info
Big Shout Outs
Jeremy Howard & Rachel Thomas http://course.fast.ai
Andrej Karpathy http://cs231n.github.io
François Chollet (Keras lead dev) https://keras.io/
1. What is a neural network?
What is a single neuron?
• 3 inputs [x1, x2, x3]
• 3 weights [w1, w2, w3]
• Element-wise multiply and sum
• Apply activation function f
• Often add a bias term too (a weight on a constant input of 1) – not shown above
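In code, a single neuron is just a weighted sum plus a bias, passed through an activation function (a minimal numpy sketch; the input and weight values are made up):

import numpy as np

x = np.array([0.5, -1.0, 2.0])     # 3 inputs [x1, x2, x3]
w = np.array([0.1, 0.4, -0.3])     # 3 weights [w1, w2, w3]
b = 0.2                            # bias term

z = np.dot(w, x) + b               # element-wise multiply and sum, plus bias
output = 1.0 / (1.0 + np.exp(-z))  # apply activation function f (sigmoid here)
print(output)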
What is an Activation Function?
Sigmoid, Tanh, ReLU
Nonlinearities … “squashing functions” … that transform a neuron’s output
NB: sigmoid output is in [0, 1]
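The three activations named on the slide, written out as numpy functions (a minimal sketch):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes output into [0, 1]

def tanh(z):
    return np.tanh(z)                 # squashes output into [-1, 1]

def relu(z):
    return np.maximum(0.0, z)         # zero for negative inputs, identity otherwise

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z))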
What is a Deep Neural Network?
Inputs → hidden layer 1 → hidden layer 2 → hidden layer 3 → outputs
The outputs of one layer are the inputs into the next layer
How does a neural network learn?
• You need labelled examples: “training data”
• Initially, the network makes random predictions (weights are initialized randomly)
• For each training data point, we calculate the error between the network’s predictions and the ground-truth labels (the “loss function”, e.g. mean squared error)
• Using a method called “backpropagation” (really just the chain rule from calculus 1), we use the error to update the weights (using “gradient descent”) so that the error is a little bit smaller next time – the network learns from past errors
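A toy illustration of that learn-from-errors loop for a single weight (this is just gradient descent on a squared error, not the full backpropagation algorithm; the numbers are made up):

import numpy as np

np.random.seed(0)
w = np.random.randn()              # weight starts off random
x, y_true = 2.0, 10.0              # one labelled training example
learning_rate = 0.05

for step in range(50):
    y_pred = w * x                 # network's prediction
    error = y_pred - y_true        # compare to the ground-truth label
    loss = error ** 2              # squared-error loss for this example
    grad = 2 * error * x           # chain rule: d(loss)/d(w)
    w -= learning_rate * grad      # gradient descent update
print(w)                           # approaches 5.0, since 5.0 * 2.0 == 10.0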
http://playground.tensorflow.org
What is a Neural Network? For much more detail, see:
1. Michael Nielsen’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html
2. Andrej Karpathy’s CS231n notes http://cs231n.github.io
2. What is a convolutional neural network?
What is a Convolutional Neural Network?
“Like a simple neural network, but with special types of layers that work well on images”
(the math works on numbers, so first: an image is just numbers)
• A pixel has 3 colour channels (R, G, B)
• Each pixel intensity is a number in [0, 255]
• An image has width w and height h
• Therefore an image is w x h x 3 numbers
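You can see this directly in code (a minimal sketch using Pillow; "cat.jpg" is a placeholder filename for any RGB image):

import numpy as np
from PIL import Image

img = np.array(Image.open("cat.jpg"))   # placeholder path to any RGB image
print(img.shape)                        # (height, width, 3) colour channels
print(img.min(), img.max())             # pixel intensities in [0, 255]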
Example Architecture
This is VGGNet – don’t panic, we’ll break it down piece by piece
Convolutions
http://setosa.io/ev/image-kernels/
Convolutions
http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
New Layer Type: Convolutional Layer
• A 2-D weighted sum: the kernel is multiplied element-wise with each patch of pixels and the results are summed
• We slide the kernel over all pixels of the image (with some handling of the borders)
• The kernel starts off with random values and the network updates (learns) the kernel values (using backpropagation) to try to minimize the loss
• Kernel weights are shared across the whole image (parameter sharing)
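A naive numpy version of what a single kernel does as it slides over an image (no padding or stride handling; real frameworks do this far more efficiently, and in a CNN the kernel values are learned rather than hand-made):

import numpy as np

def conv2d(image, kernel):
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]    # pixel patch under the kernel
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply and sum
    return out

image = np.random.rand(8, 8)                 # toy 8x8 grayscale image
edge_kernel = np.array([[-1.0, 0.0, 1.0],    # a hand-made 3x3 kernel; in a CNN these
                        [-1.0, 0.0, 1.0],    # values start random and are learned
                        [-1.0, 0.0, 1.0]])   # by backpropagation
print(conv2d(image, edge_kernel).shape)      # (6, 6) activation map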
Many Kernels = Many “Activation Maps” = Volume
http://cs231n.github.io/convolutional-networks/
Convolutions
https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py
Great video: https://www.youtube.com/watch?v=AgkfIQ4IGaM
New Layer Type: Max Pooling
New Layer Type: Max Pooling
• Reduces dimensionality from one layer to the next
• By replacing each NxN sub-area with its max value
• Makes the network “look” at larger areas of the image at a time, e.g. instead of identifying fur, identify a cat
• Reduces computational load
• Reduces overfitting, since losing information helps the network generalize
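A minimal numpy sketch of 2x2 max pooling (it assumes the input dimensions are divisible by 2):

import numpy as np

def max_pool_2x2(activation_map):
    h, w = activation_map.shape
    # each 2x2 sub-area is replaced by its max value
    return activation_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(a))             # 4x4 -> 2x2: dimensionality reduced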
Softmax
• Converts scores ∈ ℝ to probabilities ∈ [0, 1]
• Then predict the class with the highest probability
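Softmax in numpy (a minimal sketch; the max is subtracted before exponentiating for numerical stability, and the scores are made up):

import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))   # shift for numerical stability
    return exps / np.sum(exps)               # probabilities that sum to 1

scores = np.array([2.0, 1.0, 0.1])           # raw class scores from the last layer
probs = softmax(scores)
print(probs, probs.argmax())                 # predict the class with highest probability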
Bringing it all together
Convolution + max pooling + fully connected + softmax
We need labelled training data!
ImageNet
http://image-net.org/explore
3. How to use a convolutional neural network
Using a Pre-Trained ImageNet-Winning CNN
• We’ve been looking at “VGGNet”
• From the Oxford Visual Geometry Group (VGG)
• The runner-up in ILSVRC 2014
• The network is 16 layers (deep!)
• Its main contribution was showing that the depth of the network is a critical component of good performance
• Only 3x3 convolutions (stride 1, pad 1) and 2x2 max pooling
• Easy to fine-tune
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
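Loading the pre-trained VGG16 and classifying an image takes only a few lines in Keras (a sketch; "elephant.jpg" is a placeholder filename, and the ImageNet weights are downloaded on first use):

import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image

model = VGG16(weights="imagenet")                        # pre-trained on ImageNet

img = image.load_img("elephant.jpg", target_size=(224, 224))   # placeholder path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
print(decode_predictions(model.predict(x), top=3))       # top-3 ImageNet classes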
Example: Classifying Product Images
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Fine-tuning A CNN To Solve A New Problem
• Cut off the last layer of a pre-trained ImageNet-winning CNN
• Keep the learned network but replace the final layer
• It can then learn to predict different classes
• Fine-tuning is re-training the new final layer
Before
After
Fine-tuning A CNN To Solve A New Problem
• Fix the weights in the convolutional layers (set trainable=False)
• Remove the final dense layer that predicts the 1000 ImageNet classes
• Replace it with a new dense layer that predicts the 9 product categories (see the sketch below)
Gets 88% accuracy in classifying products into categories
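A Keras sketch of that recipe (the 256-unit layer size and input shape are illustrative choices, and the training data setup / model.fit call on the labelled product images is assumed, not shown):

from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False                       # fix weights in the convolutional layers

x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)              # new dense layer (size is a choice)
predictions = Dense(9, activation="softmax")(x)   # 9 product categories instead of 1000

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(...) on the new labelled product images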
Visual Similarity
• Chop off the last 2 VGG layers
• Use the dense layer with 4096 activations
• Compute nearest neighbours in the space of these activations
https://memeburn.com/2017/06/spree-image-search/
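A sketch of the idea: treat the 4096-dimensional "fc2" activations of VGG16 as an embedding and find nearest neighbours in that space (the random batch stands in for real preprocessed product images):

import numpy as np
from keras.applications.vgg16 import VGG16
from keras.models import Model
from sklearn.neighbors import NearestNeighbors

vgg = VGG16(weights="imagenet")
# keep everything up to the second-to-last dense layer ("fc2", 4096 activations)
embedder = Model(inputs=vgg.input, outputs=vgg.get_layer("fc2").output)

images = np.random.rand(10, 224, 224, 3)           # placeholder for preprocessed images
features = embedder.predict(images)                # (n, 4096) embeddings

nn = NearestNeighbors(n_neighbors=5).fit(features)
distances, indices = nn.kneighbors(features[:1])   # most visually similar to the first image
print(indices)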
57 https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
4. More advanced topics
Lots of Computer Vision Tasks
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Semantic Segmentation
Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
Noh et al., “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf Object detection
Object detection
https://www.youtube.com/watch?v=VOC3huqHrss Object detection
Style Transfer
Loss = content_loss + style_loss
Content loss: compares convolutional feature maps from a pre-trained network (content image vs. generated image)
Style loss: compares Gram matrices of the style image’s convolutional feature maps (style image vs. generated image)
http://blog.romanofoti.com/style_transfer/
https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py
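The Gram matrix piece in numpy terms: flatten the feature maps so there is one column per channel, then multiply by the transpose to get channel-to-channel correlations (a minimal sketch; the random arrays stand in for real conv-layer activations):

import numpy as np

def gram_matrix(feature_maps):
    # feature_maps: (height, width, channels) activations from one conv layer
    h, w, c = feature_maps.shape
    flat = feature_maps.reshape(h * w, c)      # one column per channel
    return np.dot(flat.T, flat)                # (channels, channels) correlations

style_features = np.random.rand(32, 32, 64)        # placeholder conv activations
generated_features = np.random.rand(32, 32, 64)
style_loss = np.sum((gram_matrix(style_features) - gram_matrix(generated_features)) ** 2)
print(style_loss)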
5. Practical tips
Computer Vision Pipelines
http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-winners-interview-2nd-place-felix-lau/
https://flyyufelix.github.io/2017/04/16/kaggle-nature-conservancy.html
Practical Tips
• Use a GPU – AWS p2 instances (use spot instances!)
• When overfitting (validation_accuracy <<< training_accuracy):
  – Increase dropout
  – Early stopping
• When underfitting (low training_accuracy):
  1. Add more data
  2. Use data augmentation (see the sketch below this list):
     – Flipping / stretching / rotating
     – Slightly change hues / brightness
  3. Use a more complex model:
     – ResNet / Inception / DenseNet
     – Ensemble (average <<< weighted average with learned weights)
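A sketch of data augmentation with Keras ImageDataGenerator, covering the flips, rotations and colour shifts mentioned above (the parameter values are illustrative, and x_train / y_train / model are assumed to exist):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    horizontal_flip=True,        # flipping
    rotation_range=20,           # rotating (degrees)
    zoom_range=0.15,             # stretching / zooming
    width_shift_range=0.1,
    height_shift_range=0.1,
    channel_shift_range=10.0,    # slight hue / brightness changes
)

# assumed: x_train, y_train are image arrays and labels, model is compiled
# model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
#                     steps_per_epoch=len(x_train) // 32, epochs=20)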
Hackathon Challenge Info
POTHOLES!
Hackathon Details
• Challenge: detect potholes in images
• Binary classification problem
• Given ±5000 training images with labels
• Given the number of potholes + bounding box annotations
• Given convolutional output for each image (in train & test)
• Submit predictions on held-out test images
• MOST ACCURATE MODEL WINS!
Hackathon Details
• You can form teams of up to 4 people
• You need to present your approach (2 min max)
• Prizes for the best presentation / most innovative approach (even if not the most accurate)
• Don’t worry if you don’t know any of this stuff – it’s a great opportunity to learn!
• Hackathons are fun!!!!
Why this is an important problem
Potholes cause road deaths: “The Automobile Association (AA) said that if there was proper maintenance of our roads, then there could be an immediate decrease in about 5% of road deaths … South Africa has more than 700,000 accidents occurring annually.”
http://www.roadcover.co.za/potholes-how-they-worsen-our-roads/
35,000 lives!!
2 levels:
• Alert the road authorities where potholes are, so they can be fixed quickly
• Real-time pothole avoidance (much harder!)
Big thanks to Dr. Steve Kroon for suggesting and helping with the dataset
See you tomorrow! Same venue (THE DIZ, 111 Smit Street), 9am – 8:30pm
QUESTIONS? Email me :) Alex Conway alex@numberboost.com
