
Deep Learning with Keras and Tensorflow

Requirements
1.1 Introduction - Deep Learning and ANN
1.1.1 Perceptron and Adaline
1.1.2 MLP and MNIST
2.1 Introduction - Theano
2.2 Introduction - Tensorflow
2.3 Introduction to Keras
2.3.1 Keras Backend
3.0 MNIST Dataset
3.1 Hidden Layer Representation and Embeddings
4.1 Convolutional Neural Networks
4.2 MNIST CNN
4.3 CIFAR10 CNN
4.4 Deep Convolutional Neural Networks
5.1 HyperParameter Tuning
5.3 Transfer Learning & Fine-Tuning
5.3.1 Keras and TF Integration
6.1 AutoEncoders and Embeddings
6.2 NLP and Deep Learning
7.1 RNN and LSTM
7.2 LSTM for Sentence Generation
8.1 Custom Layer
8.2 Multi-Modal Networks
Conclusions
Deep Learning with Keras and Tensorflow

Author: Valerio Maggio
PostDoc Data Scientist @ FBK/MPBA
Contacts: @leriomaggio, +ValerioMaggio, valeriomaggio, vmaggio_at_fbk_dot_eu

git clone [Link]
Part I: Introduction

Intro to Artificial Neural Networks
    Perceptron and MLP
    naive pure-Python implementation
    feed forward, sgd, backprop
Introduction to Deep Learning Frameworks
    Intro to Theano
    Intro to Tensorflow
    Intro to Keras
        Overview and main features
        Overview of the core layers
        Multi-Layer Perceptron and Fully Connected
        Examples with [Link] and Dense
        Keras Backend

Part II: Supervised Learning

Fully Connected Networks and Embeddings
    Intro to MNIST Dataset
    Hidden Layer Representation and Embeddings
Convolutional Neural Networks
    meaning of convolutional filters
    examples from ImageNet
    Visualising ConvNets
Advanced CNN
    Dropout
    MaxPooling
    Batch Normalisation
HandsOn: MNIST Dataset
    FC and MNIST
    CNN and MNIST
Deep Convolutional Neural Networks with Keras (ref: [Link])
    VGG16
    VGG19
    ResNet50
Transfer Learning and FineTuning
Hyperparameters Optimisation

Part III: Unsupervised Learning

AutoEncoders and Embeddings
    AutoEncoders and MNIST
    word2vec and doc2vec (gensim) with [Link]
    word2vec and CNN

Part IV: Recurrent Neural Networks

Recurrent Neural Network in Keras
    SimpleRNN, LSTM, GRU
LSTM for Sentence Generation

Part V: Additional Materials

Custom Layers in Keras
Multi-modal Network Topologies with Keras
Requirements
This tutorial requires the following packages:

Python version 3.5
    Python 3.4 should be fine as well
    Python 2.7 would likely work too, but who knows? :P
numpy version 1.10 or later: [Link]
scipy version 0.16 or later: [Link]
matplotlib version 1.4 or later: [Link]
pandas version 0.16 or later: [Link]
scikit-learn version 0.15 or later: [Link]
keras version 2.0 or later: [Link]
tensorflow version 1.0 or later: [Link]
ipython/jupyter version 4.0 or later, with notebook support

(Optional but recommended):

pyyaml
hdf5 and h5py (required if you use the model saving/loading functions in keras)
NVIDIA cuDNN if you have NVIDIA GPUs on your machines ([Link])

The easiest way to get (most of) these is to use an all-in-one installer such as Anaconda from Continuum, which is available for multiple architectures.
Python Version
I'm currently running this tutorial with Python 3 on Anaconda:

!python --version

Python 3.5.2

Setting the Environment
In this repository, files to re-create the virtual environment with conda are provided for Linux and OSX systems, namely [Link] and [Link], respectively.
To re-create the virtual environment (on Linux, for example):

conda env create -f [Link]

For OSX, just change the filename accordingly.

Notes about Installing Theano with GPU support

NOTE: Read this section only if, after pip-installing theano, it raises an error when enabling GPU support!

Since version 0.9, Theano has included libgpuarray in the stable release (it was previously only available in the development version).
The goal of libgpuarray is (from the documentation) to provide a common GPU ndarray (n-dimensional array) that can be reused by all projects, that is as future-proof as possible, while keeping it easy to use for simple needs / quick tests.
Here are some (hopefully useful) tips I came up with to properly install and configure theano on (Ubuntu) Linux with GPU support:

1) [If you're using Anaconda] conda install theano pygpu should be just fine!
Sometimes it is suggested to install pygpu using the conda-forge channel:

conda install -c conda-forge pygpu

2) [Works with both Anaconda Python and the official CPython]
Install libgpuarray from source, following the step-by-step instructions for the libgpuarray user library.
Then, install pygpu from source (in the same source folder):

python setup.py build && python setup.py install

pip install theano

After Theano is installed:

echo "[global]
device = cuda
floatX = float32

[lib]
cnmem = 1.0" > ~/.theanorc

Installing Tensorflow
To date, tensorflow comes in two different packages, namely tensorflow and tensorflow-gpu, depending on whether you want to install the framework with CPU-only or GPU support, respectively.
For this reason, tensorflow has not been included in the conda envs and has to be installed separately.

Tensorflow for CPU only:

pip install tensorflow

Tensorflow with GPU support:

pip install tensorflow-gpu

Note: NVIDIA Drivers and CuDNN must be installed and configured beforehand. Please refer to the official Tensorflow documentation for further details.

Important Note:
All the code provided+ in this tutorial can run even if tensorflow is not installed, using theano as the (default) backend.
This is exactly the power of Keras!
Therefore, installing tensorflow is not strictly required!

+: Apart from the 1.2 Introduction to Tensorflow tutorial, of course.

Configure Keras with tensorflow

By default, Keras is configured with theano as its backend.

If you want to use tensorflow instead, these are the simple steps to follow:

1) Create the keras.json file (if it does not exist):

touch $HOME/.keras/keras.json

2) Copy the following content into the file:

{
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "floatx": "float32",
    "image_data_format": "channels_last"
}

3) Verify it is properly configured:

!cat ~/.keras/keras.json

{
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "floatx": "float32",
    "image_data_format": "channels_last"
}
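If Keras imports correctly, you can also query the active configuration programmatically; a quick sanity check (the exact output depends on your keras.json above):

from keras import backend as K

print(K.backend())             # e.g. 'tensorflow' or 'theano'
print(K.image_data_format())   # 'channels_last' with the config above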
Test if everything is up & running

1. Check import

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

import keras

Using TensorFlow backend.


2. Check installed Versions

import numpy
print('numpy:', numpy.__version__)

import scipy
print('scipy:', scipy.__version__)

import matplotlib
print('matplotlib:', matplotlib.__version__)

import IPython
print('iPython:', IPython.__version__)

import sklearn
print('scikit-learn:', sklearn.__version__)

numpy: 1.11.1
scipy: 0.18.0
matplotlib: 1.5.2
iPython: 5.1.0
scikit-learn: 0.18

import keras
print('keras: ', keras.__version__)

# optional
import theano
print('Theano: ', theano.__version__)
import tensorflow as tf
print('Tensorflow: ', tf.__version__)

keras: 2.0.2
Theano: 0.9.0
Tensorflow: 1.0.1
If everything worked up to this point, you're ready to start!
Introduction to Deep Learning
Deep learning allows computational models that are composed
of multiple processing layers to learn representations of data
with multiple levels of abstraction.
These methods have dramatically improved the state-of-the-art
in speech recognition, visual object recognition, object detection
and many other domains such as drug discovery and
genomics.
Deep learning is one of the leading tools in data analysis these days, and Keras is one of the most common frameworks for deep learning.
This tutorial provides an introduction to deep learning using Keras, with practical code examples.
This Section will cover:
Getting a conceptual understanding of multi-layer neural
networks
Training neural networks for image classification
Implementing the powerful backpropagation algorithm
Debugging neural network implementations
Building Blocks: Artificial Neural Networks (ANN)

In machine learning and cognitive science, an artificial neural network (ANN) is a network inspired by biological neural networks, used to estimate or approximate functions that can depend on a large number of inputs that are generally unknown.
An ANN is built from nodes (neurons) stacked in layers between the feature vector and the target vector.
A node in a neural network is built from weights and an activation function.
An early version of an ANN built from a single node was called the Perceptron.

The Perceptron is an algorithm for supervised learning of binary classifiers: functions that can decide whether an input (represented by a vector of numbers) belongs to one class or another.
Much like in logistic regression, the weights in a neural net are multiplied by the input vector, summed up, and fed into the activation function.
A Perceptron network can be designed to have multiple layers, leading to the Multi-Layer Perceptron (aka MLP); a tiny single-node sketch of this computation is shown below.
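As an illustration of the weighted-sum-plus-activation computation just described, here is a minimal sketch for a single node (the numbers are made up; this is not the tutorial's implementation, which follows later):

import numpy as np

x = np.array([1.0, -2.0, 0.5])     # input vector
w = np.array([0.4, 0.2, -0.6])     # weights (picked by hand for illustration)
b = 0.1                            # bias

z = np.dot(w, x) + b               # weighted sum (net input)
y_hat = 1 if z >= 0.0 else 0       # Heaviside step, as in the Perceptron
print(z, y_hat)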
Single Layer Neural Network

(Source: Python Machine Learning, S. Raschka)

Weights Update Rule

We use a gradient descent optimization algorithm to learn the weight coefficients of the model.

In every epoch (pass over the training set), we update the weight vector $w$ using the following update rule:

$$ w = w + \Delta w, \quad \text{where } \Delta w = - \eta \nabla J(w) $$

In other words, we compute the gradient based on the whole training set and update the weights of the model by taking a step in the opposite direction of the gradient $\nabla J(w)$.
In order to find the optimal weights of the model, we optimize an objective function, e.g. the Sum of Squared Errors (SSE) cost function $J(w)$.
Furthermore, we multiply the gradient by a factor, the learning rate $\eta$, which we choose carefully to balance the speed of learning against the risk of overshooting the global minimum of the cost function.

Gradient Descent
In gradient descent optimization, we update all the weights simultaneously after each epoch, and we define the partial derivative for each weight $w_j$ in the weight vector $w$ as follows:

$$ \frac{\partial}{\partial w_j} J(w) = \sum_{i} ( y^{(i)} - a^{(i)} ) x^{(i)}_j $$

Note: the superscript $(i)$ refers to the i-th sample, and the subscript $j$ refers to the j-th dimension/feature.
Here $y^{(i)}$ is the target class label of a particular sample $x^{(i)}$, and $a^{(i)}$ is the activation of the neuron (which is a linear function in the special case of the Perceptron).
We define the activation function $\phi(\cdot)$ as follows:

$$ \phi(z) = z = a = \sum_{j} w_j x_j = \mathbf{w}^T \mathbf{x} $$
Binary Classification
While we used the activation $\phi(z)$ to compute the gradient update, we may use a threshold function (Heaviside function) to squash the continuous-valued output into binary class labels for prediction:

$$ \hat{y} = \begin{cases} 1 & \text{if } \phi(z) \geq 0 \\ 0 & \text{otherwise} \end{cases} $$
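Putting the update rule and the threshold together, here is a minimal sketch of one batch gradient-descent step on toy data (illustrative only, bias term omitted; the full Adaline implementation appears later in this notebook):

import numpy as np

X = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0]])   # 3 samples, 2 features
y = np.array([1, 0, 1])
w = np.zeros(X.shape[1])
eta = 0.1                                 # learning rate

a = X.dot(w)                              # linear activation phi(z) = w^T x
errors = y - a
w += eta * X.T.dot(errors)                # Delta w = -eta * grad J(w)
y_hat = np.where(X.dot(w) >= 0.0, 1, 0)   # Heaviside threshold for prediction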
Building Neural Nets from scratch

Idea:
We will build the neural networks from first principles. We will create a very simple model and understand how it works. We will also be implementing the backpropagation algorithm.
Please note that this code is not optimized and not to be used in production.
This is for instructive purposes - for us to understand how an ANN works.
Libraries like theano have highly optimized code.

Perceptron and Adaline Models

Take a look at this notebook: Perceptron and Adaline
If you want a sneak peek at an alternate (production-ready) implementation of the Perceptron, try the usage sketch below:

from sklearn.linear_model import Perceptron
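A minimal usage sketch (not from the tutorial) of the scikit-learn Perceptron, which follows the usual fit/predict API; the feature selection and parameters below are illustrative only, and note that n_iter was renamed max_iter in later scikit-learn releases:

from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X, y = iris.data[:100, [0, 2]], iris.target[:100]   # two classes, two features
clf = Perceptron(n_iter=10, eta0=0.1, random_state=1).fit(X, y)
print(clf.score(X, y))    # in-sample accuracy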


Introducing the multi-layer neural network architecture

(Source: Python Machine Learning, S. Raschka)

Now we will see how to connect multiple single neurons to a multi-layer feedforward neural network; this special type of network is also called a multi-layer perceptron (MLP).
The figure shows the concept of an MLP consisting of three layers: one input layer, one hidden layer, and one output layer.
The units in the hidden layer are fully connected to the input layer, and the output layer is fully connected to the hidden layer.
If such a network has more than one hidden layer, we also call it a deep artificial neural network.

Notation
We denote the ith activation unit in the lth layer as $a_i^{(l)}$, and the activation units $a_0^{(1)}$ and $a_0^{(2)}$ are the bias units, which we set equal to $1$.

The activation of the units in the input layer is just the input plus the bias unit:

$$ \mathbf{a}^{(1)} = [a_0^{(1)}, a_1^{(1)}, \ldots, a_m^{(1)}]^T = [1, x_1^{(i)}, \ldots, x_m^{(i)}]^T $$

Note: $x_j^{(i)}$ refers to the jth feature/dimension of the ith sample.

Notes on the Notation (usually) Adopted

The terminology around the indices (subscripts and superscripts) may look a little bit confusing at first.

You may wonder why we write $w_{j,k}^{(l)}$ and not $w_{k,j}^{(l)}$ to refer to the weight coefficient that connects the kth unit in layer $l$ to the jth unit in layer $l+1$.

What may seem a little bit quirky at first will make much more sense later when we vectorize the neural network representation.

For example, we will summarize the weights that connect the input and hidden layer by a matrix

$$ W^{(1)} \in \mathbb{R}^{h \times [m+1]} $$

where $h$ is the number of hidden units and $m + 1$ is the number of input units plus the bias unit; a small shape-checking sketch follows below.
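A tiny shape-checking sketch of this convention (hypothetical sizes, not part of the tutorial code):

import numpy as np

m, h, t = 4, 3, 2                                 # input features, hidden units, output units
W1 = np.random.uniform(-1, 1, size=(h, m + 1))    # input  -> hidden (includes bias column)
W2 = np.random.uniform(-1, 1, size=(t, h + 1))    # hidden -> output (includes bias column)
print(W1.shape, W2.shape)                         # (3, 5) (2, 4)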

(Source: Python Machine Learning, S. Raschka)


Forward Propagation
Starting at the input layer, we forward propagate the patterns of the training data through the network to generate an output.
Based on the network's output, we calculate the error that we want to minimize using a cost function that we will describe later.
We backpropagate the error, find its derivative with respect to each weight in the network, and update the model.

Sigmoid Activation

(Source: Python Machine Learning, S. Raschka)

Backward Propagation
The weights of each neuron are learned by gradient descent, where each neuron's error is derived with respect to its weights.

(Source: Python Machine Learning, S. Raschka)

Optimization is done for each layer with respect to the previous layer, in a technique known as backpropagation. A minimal vectorized sketch of one forward and backward pass follows below; the tutorial's own loop-based implementation comes right after it.
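To make these three steps concrete, here is a minimal vectorized sketch of one forward and backward pass for a single-hidden-layer network with sigmoid activations and an SSE cost (biases omitted; this is a simplification, not the class implemented below):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
X = np.random.randn(5, 3)                        # 5 samples, 3 features
y = np.random.randint(0, 2, size=(5, 1)).astype(float)
W1 = 0.1 * np.random.randn(3, 4)                 # input  -> hidden
W2 = 0.1 * np.random.randn(4, 1)                 # hidden -> output
eta = 0.5

# forward propagation
a1 = sigmoid(X.dot(W1))
a2 = sigmoid(a1.dot(W2))

# backward propagation (SSE cost; sigmoid derivative is a * (1 - a))
delta2 = (a2 - y) * a2 * (1 - a2)
delta1 = delta2.dot(W2.T) * a1 * (1 - a1)

# gradient-descent weight update
W2 -= eta * a1.T.dot(delta2)
W1 -= eta * X.T.dot(delta1)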

(The following code is inspired by these terrific notebooks)

# Import the required packages
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import scipy

# Display plots in notebook
%matplotlib inline
# Define plot's default figure size
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

# read the dataset
train = pd.read_csv("../data/intro_to_ann.csv")

X, y = np.array(train.iloc[:, 0:2]), np.array(train.iloc[:, 2])

X.shape

(500, 2)

y.shape

(500,)

# Let's plot the dataset and see how it is
plt.scatter(X[:, 0], X[:, 1], s=40, c=y, cmap=plt.cm.Spectral)  # colormap choice is arbitrary

<matplotlib.collections.PathCollection at 0x10e9329b0>
Start Building our MLP building blocks
Note: this process will eventually result in our own Neural Network class.

A look at the details

import random
random.seed(123)

# calculate a random number where: a <= rand < b
def rand(a, b):
    return (b - a) * random.random() + a

Function to generate a random number, given two numbers.

Where will it be used?: when we initialize the neural networks, the weights have to be randomly assigned.

# Make a matrix
def makeMatrix(I, J, fill=0.0):
    return np.zeros([I, J])

Define our activation function. Let's use the sigmoid function:

# our sigmoid function
def sigmoid(x):
    # return math.tanh(x)   # (an alternative activation)
    return 1 / (1 + np.exp(-x))

Derivative of our activation function.

Note: we need this when we run the backpropagation algorithm.

# derivative of our sigmoid function, in terms of the output (i.e. y)
def dsigmoid(y):
    return y - y**2

Our neural network class

When we first create a neural network architecture, we need to know the number of inputs, the number of hidden nodes, and the number of outputs.
The weights have to be randomly initialized.

class MLP:
    def __init__(self, ni, nh, no):
        # number of input, hidden, and output nodes
        self.ni = ni + 1  # +1 for bias node
        self.nh = nh
        self.no = no

        # activations for nodes
        self.ai = [1.0]*self.ni
        self.ah = [1.0]*self.nh
        self.ao = [1.0]*self.no

        # create weights
        self.wi = makeMatrix(self.ni, self.nh)
        self.wo = makeMatrix(self.nh, self.no)

        # set them to random values
        self.wi = np.random.uniform(-0.2, 0.2, size=self.wi.shape)
        self.wo = np.random.uniform(-2.0, 2.0, size=self.wo.shape)

        # last change in weights for momentum
        self.ci = makeMatrix(self.ni, self.nh)
        self.co = makeMatrix(self.nh, self.no)

Activation Function

def activate(self, inputs):

    if len(inputs) != self.ni - 1:
        print(inputs)
        raise ValueError('wrong number of inputs')

    # input activations
    for i in range(self.ni - 1):
        self.ai[i] = inputs[i]

    # hidden activations
    for j in range(self.nh):
        sum_h = 0.0
        for i in range(self.ni):
            sum_h += self.ai[i] * self.wi[i][j]
        self.ah[j] = sigmoid(sum_h)

    # output activations
    for k in range(self.no):
        sum_o = 0.0
        for j in range(self.nh):
            sum_o += self.ah[j] * self.wo[j][k]
        self.ao[k] = sigmoid(sum_o)

    return self.ao[:]

BackPropagation

def backPropagate(self, targets, N, M):

    if len(targets) != self.no:
        print(targets)
        raise ValueError('wrong number of target values')

    # calculate error terms for output
    output_deltas = np.zeros(self.no)
    for k in range(self.no):
        error = targets[k] - self.ao[k]
        output_deltas[k] = dsigmoid(self.ao[k]) * error

    # calculate error terms for hidden
    hidden_deltas = np.zeros(self.nh)
    for j in range(self.nh):
        error = 0.0
        for k in range(self.no):
            error += output_deltas[k] * self.wo[j][k]
        hidden_deltas[j] = dsigmoid(self.ah[j]) * error

    # update output weights
    for j in range(self.nh):
        for k in range(self.no):
            change = output_deltas[k] * self.ah[j]
            self.wo[j][k] += N*change + M*self.co[j][k]
            self.co[j][k] = change

    # update input weights
    for i in range(self.ni):
        for j in range(self.nh):
            change = hidden_deltas[j] * self.ai[i]
            self.wi[i][j] += N*change + M*self.ci[i][j]
            self.ci[i][j] = change

    # calculate error
    error = 0.0
    for k in range(len(targets)):
        error += 0.5 * (targets[k] - self.ao[k])**2
    return error

# Putting it all together

class MLP:
    def __init__(self, ni, nh, no):
        # number of input, hidden, and output nodes
        self.ni = ni + 1  # +1 for bias node
        self.nh = nh
        self.no = no

        # activations for nodes
        self.ai = [1.0]*self.ni
        self.ah = [1.0]*self.nh
        self.ao = [1.0]*self.no

        # create weights
        self.wi = makeMatrix(self.ni, self.nh)
        self.wo = makeMatrix(self.nh, self.no)

        # set them to random values
        for i in range(self.ni):
            for j in range(self.nh):
                self.wi[i][j] = rand(-0.2, 0.2)
        for j in range(self.nh):
            for k in range(self.no):
                self.wo[j][k] = rand(-2.0, 2.0)

        # last change in weights for momentum
        self.ci = makeMatrix(self.ni, self.nh)
        self.co = makeMatrix(self.nh, self.no)

    def backPropagate(self, targets, N, M):

        if len(targets) != self.no:
            print(targets)
            raise ValueError('wrong number of target values')

        # calculate error terms for output
        output_deltas = np.zeros(self.no)
        for k in range(self.no):
            error = targets[k] - self.ao[k]
            output_deltas[k] = dsigmoid(self.ao[k]) * error

        # calculate error terms for hidden
        hidden_deltas = np.zeros(self.nh)
        for j in range(self.nh):
            error = 0.0
            for k in range(self.no):
                error += output_deltas[k] * self.wo[j][k]
            hidden_deltas[j] = dsigmoid(self.ah[j]) * error

        # update output weights
        for j in range(self.nh):
            for k in range(self.no):
                change = output_deltas[k] * self.ah[j]
                self.wo[j][k] += N*change + M*self.co[j][k]
                self.co[j][k] = change

        # update input weights
        for i in range(self.ni):
            for j in range(self.nh):
                change = hidden_deltas[j] * self.ai[i]
                self.wi[i][j] += N*change + M*self.ci[i][j]
                self.ci[i][j] = change

        # calculate error
        error = 0.0
        for k in range(len(targets)):
            error += 0.5 * (targets[k] - self.ao[k])**2
        return error

    def test(self, patterns):
        self.predict = np.empty([len(patterns), self.no])
        for i, p in enumerate(patterns):
            self.predict[i] = self.activate(p)
            # self.predict[i] = self.activate(p[0])

    def activate(self, inputs):

        if len(inputs) != self.ni - 1:
            print(inputs)
            raise ValueError('wrong number of inputs')

        # input activations
        for i in range(self.ni - 1):
            self.ai[i] = inputs[i]

        # hidden activations
        for j in range(self.nh):
            sum_h = 0.0
            for i in range(self.ni):
                sum_h += self.ai[i] * self.wi[i][j]
            self.ah[j] = sigmoid(sum_h)

        # output activations
        for k in range(self.no):
            sum_o = 0.0
            for j in range(self.nh):
                sum_o += self.ah[j] * self.wo[j][k]
            self.ao[k] = sigmoid(sum_o)
        return self.ao[:]

    def train(self, patterns, iterations=1000, N=0.5, M=0.1):
        # N: learning rate
        # M: momentum factor
        patterns = list(patterns)
        for i in range(iterations):
            error = 0.0
            for p in patterns:
                inputs = p[0]
                targets = p[1]
                self.activate(inputs)
                error += self.backPropagate([targets], N, M)
            if i % 5 == 0:
                print('error in iteration %d : %-.5f' % (i, error))
            print('Final training error: %-.5f' % error)

Running the model on our dataset

# create a network with two inputs, one hidden, and one output node
ann = MLP(2, 1, 1)

%timeit -n 1 -r 1 ann.train(zip(X, y), iterations=2)

error in iteration 0 : 53.62995
Final training error: 53.62995
Final training error: 47.35136
1 loop, best of 1: 36.7 ms per loop

Predicting on the training dataset and measuring in-sample accuracy

%timeit -n 1 -r 1 ann.test(X)

1 loop, best of 1: 11.8 ms per loop

prediction = pd.DataFrame(data=np.array([y, np.ravel(ann.predict)]).T,
                          columns=["actual", "prediction"])
prediction.head()

   actual  prediction
0     1.0    0.491100
1     1.0    0.495469
2     0.0    0.097362
3     0.0    0.400006
4     1.0    0.489664

np.min(ann.predict)

0.076553078113180129

Let's visualize and observe the results

# Helper function to plot a decision boundary.
# This generates the contour plot to show the decision boundary visually
def plot_decision_boundary(nn_model):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    nn_model.test(np.c_[xx.ravel(), yy.ravel()])
    Z = nn_model.predict
    Z[Z >= 0.5] = 1
    Z[Z < 0.5] = 0
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], s=40, c=y, cmap=plt.cm.Spectral)

plot_decision_boundary(ann)
plt.title("Our initial model")

<matplotlib.text.Text at 0x110bafd68>

Exercise:
Create a neural network with 10 hidden nodes using the code above.
What's the impact on accuracy?

# Put your code here
# (or load the solution if you want to cheat :-)

# %load ../solutions/sol_111.py

Exercise:
Train the neural network for more epochs.
What's the impact on accuracy?

# Put your code here

# %load ../solutions/sol_112.py
Addendum
There is an additional notebook in the repo, i.e. MLP and MNIST, for a more complete (but still naive) implementation of SGD and MLP applied to the MNIST dataset.
Another terrific reference to start with is the online book [Link]. Highly recommended!
Perceptron and Adaline
(excerpt from Python Machine Learning Essentials, Supplementary Materials)

Sections
Implementing a perceptron learning algorithm in Python
Training a perceptron model on the Iris dataset
Adaptive linear neurons and the convergence of learning
Implementing an adaptive linear neuron in Python

# Display plots in notebook
%matplotlib inline
# Define plot's default figure size
import matplotlib

Implementing a perceptron learning algorithm in Python
[back to top]

import numpy as np

class Perceptron(object):
    """Perceptron classifier.

    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    errors_ : list
        Number of misclassifications in every epoch.

    """
    def __init__(self, eta=0.01, n_iter=10):
        self.eta = eta
        self.n_iter = n_iter

    def fit(self, X, y):
        """Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object

        """
        self.w_ = np.zeros(1 + X.shape[1])
        self.errors_ = []

        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.net_input(X) >= 0.0, 1, -1)

Training a perceptron model on the Iris dataset
[back to top]

Reading-in the Iris data

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target
data = np.hstack((X, y[:, np.newaxis]))

labels = iris.target_names
features = iris.feature_names

df = pd.DataFrame(data, columns=iris.feature_names + ['label'])
df.label = df.label.map({k: v for k, v in enumerate(labels)})
df.tail()

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      label
145                6.7               3.0                5.2               2.3  virginica
146                6.3               2.5                5.0               1.9  virginica
147                6.5               3.0                5.2               2.0  virginica
148                6.2               3.4                5.4               2.3  virginica
149                5.9               3.0                5.1               1.8  virginica

Plotting the Iris data

import numpy as np
import matplotlib.pyplot as plt

# select setosa and versicolor
y = df.iloc[0:100, 4].values
y = np.where(y == 'setosa', -1, 1)

# extract sepal length and petal length
X = df.iloc[0:100, [0, 2]].values

# plot data
plt.scatter(X[:50, 0], X[:50, 1],
            color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1],
            color='blue', marker='x', label='versicolor')

plt.xlabel('petal length [cm]')
plt.ylabel('sepal length [cm]')
plt.legend(loc='upper left')
plt.show()

Training the perceptron model

ppn = Perceptron(eta=0.1, n_iter=10)

ppn.fit(X, y)

plt.plot(range(1, len(ppn.errors_) + 1),
         ppn.errors_, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Number of misclassifications')

plt.tight_layout()
plt.show()
A function for plotting decision regions

from matplotlib.colors import ListedColormap

def plot_decision_regions(X, y, classifier, resolution=0.02):

    # setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # plot class samples
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, c=cmap(idx),
                    marker=markers[idx], label=cl)

plot_decision_regions(X, y, classifier=ppn)
plt.xlabel('sepal length [cm]')
plt.ylabel('petal length [cm]')
plt.legend(loc='upper left')

plt.tight_layout()
plt.show()
Adaptive linear neurons and the convergence of learning
[back to top]

Implementing an adaptive linear neuron in Python

class AdalineGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    errors_ : list
        Number of misclassifications in every epoch.

    """
    def __init__(self, eta=0.01, n_iter=50):
        self.eta = eta
        self.n_iter = n_iter

    def fit(self, X, y):
        """ Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object

        """
        self.w_ = np.zeros(1 + X.shape[1])
        self.cost_ = []

        for i in range(self.n_iter):
            output = self.net_input(X)
            errors = (y - output)
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost_.append(cost)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation"""
        return self.net_input(X)

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, -1)

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(8, 4))

ada1 = AdalineGD(n_iter=10, eta=0.01).fit(X, y)
ax[0].plot(range(1, len(ada1.cost_) + 1),
           np.log10(ada1.cost_), marker='o')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('log(Sum-squared-error)')
ax[0].set_title('Adaline - Learning rate 0.01')

ada2 = AdalineGD(n_iter=10, eta=0.0001).fit(X, y)
ax[1].plot(range(1, len(ada2.cost_) + 1),
           ada2.cost_, marker='o')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Sum-squared-error')
ax[1].set_title('Adaline - Learning rate 0.0001')

plt.tight_layout()
plt.show()
Standardizing features and re-training Adaline

# standardize features
X_std = np.copy(X)
X_std[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X_std[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

ada = AdalineGD(n_iter=15, eta=0.01)
ada.fit(X_std, y)

plot_decision_regions(X_std, y, classifier=ada)
plt.title('Adaline - Gradient Descent')
plt.xlabel('sepal length [standardized]')
plt.ylabel('petal length [standardized]')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()

plt.plot(range(1, len(ada.cost_) + 1), ada.cost_, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Sum-squared-error')

plt.tight_layout()
plt.show()
Large scale machine learning and stochastic gradient descent
[back to top]

from numpy.random import seed

class AdalineSGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    errors_ : list
        Number of misclassifications in every epoch.
    shuffle : bool (default: True)
        Shuffles training data every epoch if True to prevent cycles.
    random_state : int (default: None)
        Set random state for shuffling and initializing the weights.

    """
    def __init__(self, eta=0.01, n_iter=10, shuffle=True, random_state=None):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.shuffle = shuffle
        if random_state:
            seed(random_state)

    def fit(self, X, y):
        """ Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object

        """
        self._initialize_weights(X.shape[1])
        self.cost_ = []
        for i in range(self.n_iter):
            if self.shuffle:
                X, y = self._shuffle(X, y)
            cost = []
            for xi, target in zip(X, y):
                cost.append(self._update_weights(xi, target))
            avg_cost = sum(cost) / len(y)
            self.cost_.append(avg_cost)
        return self

    def partial_fit(self, X, y):
        """Fit training data without reinitializing the weights"""
        if not self.w_initialized:
            self._initialize_weights(X.shape[1])
        if y.ravel().shape[0] > 1:
            for xi, target in zip(X, y):
                self._update_weights(xi, target)
        else:
            self._update_weights(X, y)
        return self

    def _shuffle(self, X, y):
        """Shuffle training data"""
        r = np.random.permutation(len(y))
        return X[r], y[r]

    def _initialize_weights(self, m):
        """Initialize weights to zeros"""
        self.w_ = np.zeros(1 + m)
        self.w_initialized = True

    def _update_weights(self, xi, target):
        """Apply Adaline learning rule to update the weights"""
        output = self.net_input(xi)
        error = (target - output)
        self.w_[1:] += self.eta * xi.dot(error)
        self.w_[0] += self.eta * error
        cost = 0.5 * error**2
        return cost

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation"""
        return self.net_input(X)

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, -1)

ada = AdalineSGD(n_iter=15, eta=0.01, random_state=1)
ada.fit(X_std, y)

plot_decision_regions(X_std, y, classifier=ada)
plt.title('Adaline - Stochastic Gradient Descent')
plt.xlabel('sepal length [standardized]')
plt.ylabel('petal length [standardized]')
plt.legend(loc='upper left')

plt.tight_layout()
plt.show()

plt.plot(range(1, len(ada.cost_) + 1), ada.cost_, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Average Cost')

plt.tight_layout()
plt.show()

ada.partial_fit(X_std[0, :], y[0])

<__main__.AdalineSGD at 0x112023cf8>
MLP and MNIST
(excerpt from Python Machine Learning Essentials, Supplementary Materials)

Sections
Classifying handwritten digits
Obtaining the MNIST dataset
Implementing a multi-layer perceptron
Training an artificial neural network
Debugging neural networks with gradient checking

Classifying handwritten digits

Obtaining the MNIST dataset
[back to top]
The MNIST dataset is publicly available at [Link] and consists of the following four parts:
Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB unzipped, 60,000 samples)
Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB unzipped, 60,000 labels)
Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB unzipped, 10,000 samples)
Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB unzipped, 10,000 labels)
In this section, we will only be working with a subset of MNIST; thus, we only need to download the training set images and training set labels. After downloading the files, I recommend unzipping them using the Unix/Linux gzip tool from the terminal for efficiency, e.g., using the command

gzip *ubyte.gz -d

in your local MNIST download directory, or using your favorite unzipping tool if you are working on a machine running Microsoft Windows. The images are stored in byte format, and using the following function we will read them into NumPy arrays that we will use to train our MLP.

Get MNIST Dataset

Note: the following commands will work on Linux/Unix (e.g. Mac OSX) platforms.

!mkdir -p ../data/mnist

!curl [Link] --output ../data/mnist/train-images-idx3-ubyte.gz

!curl [Link] --output ../data/mnist/train-labels-idx1-ubyte.gz

!curl [Link] --output ../data/mnist/t10k-images-idx3-ubyte.gz

!curl [Link] --output ../data/mnist/t10k-labels-idx1-ubyte.gz

Load MNIST Data

import os
import struct
import numpy as np

def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte' % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte' % kind)

    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II', lbpath.read(8))
        labels = np.fromfile(lbpath, dtype=np.uint8)

    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack(">IIII", imgpath.read(16))
        images = np.fromfile(imgpath,
                             dtype=np.uint8).reshape(len(labels), 784)

    return images, labels

X_train, y_train = load_mnist('data/mnist', kind='train')
print('Rows: %d, columns: %d' % (X_train.shape[0], X_train.shape[1]))

Rows: 60000, columns: 784

X_test, y_test = load_mnist('data/mnist', kind='t10k')
print('Rows: %d, columns: %d' % (X_test.shape[0], X_test.shape[1]))

Rows: 10000, columns: 784

Visualize the first digit of each class:

import matplotlib.pyplot as plt
%matplotlib inline

fig, ax = plt.subplots(nrows=2, ncols=5, sharex=True, sharey=True,)
ax = ax.flatten()
for i in range(10):
    img = X_train[y_train == i][0].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys', interpolation='nearest')

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
# plt.savefig('./figures/mnist_all.png', dpi=300)
plt.show()

Visualize 25 different versions of "7":

fig, ax = plt.subplots(nrows=5, ncols=5, sharex=True, sharey=True,)
ax = ax.flatten()
for i in range(25):
    img = X_train[y_train == 7][i].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys', interpolation='nearest')

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
# plt.savefig('./figures/mnist_7.png', dpi=300)
plt.show()
Uncomment the following lines to optionally save the data in CSV format. However, note that those CSV files will take up a substantial amount of storage space:
train_img.csv: 1.1 GB (gigabytes)
train_labels.csv: 1.4 MB (megabytes)
test_img.csv: 187.0 MB
test_labels.csv: 144 KB (kilobytes)

# np.savetxt('train_img.csv', X_train, fmt='%i', delimiter=',')
# np.savetxt('train_labels.csv', y_train, fmt='%i', delimiter=',')
X_train = np.genfromtxt('train_img.csv', dtype=int, delimiter=',')
y_train = np.genfromtxt('train_labels.csv', dtype=int, delimiter=',')

# np.savetxt('test_img.csv', X_test, fmt='%i', delimiter=',')
# np.savetxt('test_labels.csv', y_test, fmt='%i', delimiter=',')
X_test = np.genfromtxt('test_img.csv', dtype=int, delimiter=',')
y_test = np.genfromtxt('test_labels.csv', dtype=int, delimiter=',')
Implementing a multi-layer perceptron
[back to top]

import numpy as np
from scipy.special import expit
import sys

class NeuralNetMLP(object):
""" Feedforward neural network / Multi-layer
perceptron classifier.

Parameters
------------
n_output : int
Number of output units, should be equal to
the
number of unique class labels.

n_features : int
Number of features (dimensions) in the
target dataset.
Should be equal to the number of columns in
the X array.

n_hidden : int (default: 30)


Number of hidden units.

l1 : float (default: 0.0)


Lambda value for L1-regularization.
No regularization if l1=0.0 (default)

l2 : float (default: 0.0)


Lambda value for L2-regularization.
No regularization if l2=0.0 (default)

epochs : int (default: 500)


Number of passes over the training set.

eta : float (default: 0.001)


Learning rate.

alpha : float (default: 0.0)


Momentum constant. Factor multiplied with
the
gradient of the previous epoch t-1 to
improve
learning speed
w(t) := w(t) - (grad(t) + alpha*grad(t-1))

decrease_const : float (default: 0.0)


Decrease constant. Shrinks the learning rate
after each epoch via eta / (1 +
epoch*decrease_const)

shuffle : bool (default: False)


Shuffles training data every epoch if True
to prevent circles.

minibatches : int (default: 1)


Divides training data into k minibatches for
efficiency.
Normal gradient descent learning if k=1
(default).

random_state : int (default: None)


Set random state for shuffling and
initializing the weights.

Attributes
-----------
cost_ : list
Sum of squared errors after each epoch.

"""
    def __init__(self, n_output, n_features, n_hidden=30,
                 l1=0.0, l2=0.0, epochs=500, eta=0.001,
                 alpha=0.0, decrease_const=0.0, shuffle=True,
                 minibatches=1, random_state=None):

        np.random.seed(random_state)
        self.n_output = n_output
        self.n_features = n_features
        self.n_hidden = n_hidden
        self.w1, self.w2 = self._initialize_weights()
        self.l1 = l1
        self.l2 = l2
        self.epochs = epochs
        self.eta = eta
        self.alpha = alpha
        self.decrease_const = decrease_const
        self.shuffle = shuffle
        self.minibatches = minibatches

def _encode_labels(self, y, k):


"""Encode labels into one-hot
representation

Parameters
------------
y : array, shape = [n_samples]
Target values.
Returns
-----------
onehot : array, shape = (n_labels,
n_samples)

"""
        onehot = np.zeros((k, y.shape[0]))
for idx, val in enumerate(y):
onehot[val, idx] = 1.0
return onehot

    def _initialize_weights(self):
        """Initialize weights with small random numbers."""
        w1 = np.random.uniform(-1.0, 1.0,
                               size=self.n_hidden*(self.n_features + 1))
        w1 = w1.reshape(self.n_hidden, self.n_features + 1)
        w2 = np.random.uniform(-1.0, 1.0,
                               size=self.n_output*(self.n_hidden + 1))
        w2 = w2.reshape(self.n_output, self.n_hidden + 1)
        return w1, w2

    def _sigmoid(self, z):
        """Compute logistic function (sigmoid)

        Uses scipy.special.expit to avoid overflow
        error for very small input values z.

        """
        # return 1.0 / (1.0 + np.exp(-z))
        return expit(z)

def _sigmoid_gradient(self, z):


"""Compute gradient of the logistic
function"""
sg = self._sigmoid(z)
return sg * (1 - sg)

    def _add_bias_unit(self, X, how='column'):
        """Add bias unit (column or row of 1s) to array at index 0"""
        if how == 'column':
            X_new = np.ones((X.shape[0], X.shape[1] + 1))
            X_new[:, 1:] = X
        elif how == 'row':
            X_new = np.ones((X.shape[0] + 1, X.shape[1]))
            X_new[1:, :] = X
        else:
            raise AttributeError('`how` must be `column` or `row`')
        return X_new

def _feedforward(self, X, w1, w2):


"""Compute feedforward step

Parameters
-----------
X : array, shape = [n_samples, n_features]
Input layer with original features.

w1 : array, shape = [n_hidden_units,


n_features]
Weight matrix for input layer -> hidden
layer.

w2 : array, shape = [n_output_units,


n_hidden_units]
Weight matrix for hidden layer -> output
layer.
Returns
----------
a1 : array, shape = [n_samples,
n_features+1]
Input values with bias unit.

z2 : array, shape = [n_hidden, n_samples]


Net input of hidden layer.

a2 : array, shape = [n_hidden+1,


n_samples]
Activation of hidden layer.

z3 : array, shape = [n_output_units,


n_samples]
Net input of output layer.

        a3 : array, shape = [n_output_units, n_samples]
            Activation of output layer.

        """
        a1 = self._add_bias_unit(X, how='column')
        z2 = w1.dot(a1.T)
        a2 = self._sigmoid(z2)
        a2 = self._add_bias_unit(a2, how='row')
        z3 = w2.dot(a2)
        a3 = self._sigmoid(z3)
        return a1, z2, a2, z3, a3

    def _L2_reg(self, lambda_, w1, w2):
        """Compute L2-regularization cost"""
        return (lambda_/2.0) * (np.sum(w1[:, 1:] ** 2) + np.sum(w2[:, 1:] ** 2))

    def _L1_reg(self, lambda_, w1, w2):
        """Compute L1-regularization cost"""
        return (lambda_/2.0) * (np.abs(w1[:, 1:]).sum() + np.abs(w2[:, 1:]).sum())

def _get_cost(self, y_enc, output, w1, w2):


"""Compute cost function.

y_enc : array, shape = (n_labels,


n_samples)
one-hot encoded class labels.

output : array, shape = [n_output_units,


n_samples]
Activation of the output layer
(feedforward)

w1 : array, shape = [n_hidden_units,


n_features]
Weight matrix for input layer -> hidden
layer.

w2 : array, shape = [n_output_units,


n_hidden_units]
Weight matrix for hidden layer -> output
layer.

        Returns
        ---------
        cost : float
            Regularized cost.

        """
        term1 = -y_enc * (np.log(output))
        term2 = (1 - y_enc) * np.log(1 - output)
        cost = np.sum(term1 - term2)
        L1_term = self._L1_reg(self.l1, w1, w2)
        L2_term = self._L2_reg(self.l2, w1, w2)
        cost = cost + L1_term + L2_term
        return cost

def _get_gradient(self, a1, a2, a3, z2, y_enc,


w1, w2):
""" Compute gradient step using
backpropagation.

Parameters
------------
a1 : array, shape = [n_samples,
n_features+1]
Input values with bias unit.

a2 : array, shape = [n_hidden+1,


n_samples]
Activation of hidden layer.

a3 : array, shape = [n_output_units,


n_samples]
Activation of output layer.

z2 : array, shape = [n_hidden, n_samples]


Net input of hidden layer.

y_enc : array, shape = (n_labels,


n_samples)
one-hot encoded class labels.

w1 : array, shape = [n_hidden_units,


n_features]
Weight matrix for input layer -> hidden
layer.

w2 : array, shape = [n_output_units,


n_hidden_units]
Weight matrix for hidden layer -> output
layer.

Returns
---------

grad1 : array, shape = [n_hidden_units,


n_features]
Gradient of the weight matrix w1.

        grad2 : array, shape = [n_output_units, n_hidden_units]
            Gradient of the weight matrix w2.

        """
        # backpropagation
        sigma3 = a3 - y_enc
        z2 = self._add_bias_unit(z2, how='row')
        sigma2 = w2.T.dot(sigma3) * self._sigmoid_gradient(z2)
        sigma2 = sigma2[1:, :]
        grad1 = sigma2.dot(a1)
        grad2 = sigma3.dot(a2.T)

        # regularize
        grad1[:, 1:] += (w1[:, 1:] * (self.l1 + self.l2))
        grad2[:, 1:] += (w2[:, 1:] * (self.l1 + self.l2))

        return grad1, grad2

def predict(self, X):


"""Predict class labels

Parameters
-----------
X : array, shape = [n_samples, n_features]
Input layer with original features.

        Returns:
        ----------
        y_pred : array, shape = [n_samples]
            Predicted class labels.

        """
        if len(X.shape) != 2:
            raise AttributeError('X must be a [n_samples, n_features] array.\n'
                                 'Use X[:,None] for 1-feature classification,'
                                 '\nor X[[i]] for 1-sample classification')

        a1, z2, a2, z3, a3 = self._feedforward(X, self.w1, self.w2)
        y_pred = np.argmax(z3, axis=0)
        return y_pred

def fit(self, X, y, print_progress=False):


""" Learn weights from training data.

Parameters
-----------
X : array, shape = [n_samples, n_features]
Input layer with original features.

y : array, shape = [n_samples]


Target class labels.

print_progress : bool (default: False)


Prints progress as the number of epochs
to stderr.

        Returns:
        ----------
        self

        """
        self.cost_ = []
        X_data, y_data = X.copy(), y.copy()
        y_enc = self._encode_labels(y, self.n_output)

        delta_w1_prev = np.zeros(self.w1.shape)
        delta_w2_prev = np.zeros(self.w2.shape)

        for i in range(self.epochs):

            # adaptive learning rate
            self.eta /= (1 + self.decrease_const*i)

            if print_progress:
                sys.stderr.write('\rEpoch: %d/%d' % (i+1, self.epochs))
                sys.stderr.flush()

            if self.shuffle:
                idx = np.random.permutation(y_data.shape[0])
                X_data, y_data = X_data[idx], y_data[idx]

            mini = np.array_split(range(y_data.shape[0]), self.minibatches)
            for idx in mini:

                # feedforward
                a1, z2, a2, z3, a3 = self._feedforward(X[idx], self.w1, self.w2)
                cost = self._get_cost(y_enc=y_enc[:, idx],
                                      output=a3,
                                      w1=self.w1,
                                      w2=self.w2)
                self.cost_.append(cost)

                # compute gradient via backpropagation
                grad1, grad2 = self._get_gradient(a1=a1, a2=a2,
                                                  a3=a3, z2=z2,
                                                  y_enc=y_enc[:, idx],
                                                  w1=self.w1,
                                                  w2=self.w2)

                delta_w1, delta_w2 = self.eta * grad1, self.eta * grad2
                self.w1 -= (delta_w1 + (self.alpha * delta_w1_prev))
                self.w2 -= (delta_w2 + (self.alpha * delta_w2_prev))
                delta_w1_prev, delta_w2_prev = delta_w1, delta_w2

        return self
Training an artificial neural network
[back to top]

nn = NeuralNetMLP(n_output=10,
                  n_features=X_train.shape[1],
                  n_hidden=50,
                  l2=0.1,
                  l1=0.0,
                  epochs=1000,
                  eta=0.001,
                  alpha=0.001,
                  decrease_const=0.00001,
                  minibatches=50,
                  random_state=1)

nn.fit(X_train, y_train, print_progress=True)

Epoch: 1000/1000

<__main__.NeuralNetMLP at 0x109d527b8>

%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(range(len(nn.cost_)), nn.cost_)
plt.ylim([0, 2000])
plt.ylabel('Cost')
plt.xlabel('Epochs * 50')
plt.tight_layout()
# plt.savefig('./figures/cost.png', dpi=300)
plt.show()

batches = np.array_split(range(len(nn.cost_)), 1000)
cost_ary = np.array(nn.cost_)
cost_avgs = [np.mean(cost_ary[i]) for i in batches]

plt.plot(range(len(cost_avgs)), cost_avgs, color='red')
plt.ylim([0, 2000])
plt.ylabel('Cost')
plt.xlabel('Epochs')
plt.tight_layout()
plt.savefig('./figures/cost2.png', dpi=300)
plt.show()

y_train_pred = nn.predict(X_train)
acc = np.sum(y_train == y_train_pred, axis=0) / X_train.shape[0]
print('Training accuracy: %.2f%%' % (acc * 100))

Training accuracy: 97.74%

y_test_pred = nn.predict(X_test)
acc = np.sum(y_test == y_test_pred, axis=0) / X_test.shape[0]
print('Test accuracy: %.2f%%' % (acc * 100))

Test accuracy: 96.18%

miscl_img = X_test[y_test != y_test_pred][:25]
correct_lab = y_test[y_test != y_test_pred][:25]
miscl_lab = y_test_pred[y_test != y_test_pred][:25]

fig, ax = plt.subplots(nrows=5, ncols=5, sharex=True, sharey=True,)
ax = ax.flatten()
for i in range(25):
    img = miscl_img[i].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys', interpolation='nearest')
    ax[i].set_title('%d) t: %d p: %d' % (i+1, correct_lab[i], miscl_lab[i]))

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
# plt.savefig('./figures/mnist_miscl.png', dpi=300)
plt.show()
Debugging neural networks with
gradient checking
[back to top]
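The idea of gradient checking: compare each backpropagated gradient with a centered finite-difference estimate, (J(w + eps) - J(w - eps)) / (2 * eps). Here is a standalone sketch of that estimate for an arbitrary function (illustrative only; the class below applies the same idea to w1 and w2):

import numpy as np

def numerical_gradient(f, w, epsilon=1e-5):
    """Centered finite-difference estimate of df/dw, element by element."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e.flat[i] = epsilon
        grad.flat[i] = (f(w + e) - f(w - e)) / (2 * epsilon)
    return grad

# example: f(w) = sum(w**2) has gradient 2*w
w = np.array([1.0, -2.0, 0.5])
print(numerical_gradient(lambda v: np.sum(v**2), w))   # ~ [ 2. -4.  1.]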

import numpy as np
from scipy.special import expit
import sys

class MLPGradientCheck(object):
""" Feedforward neural network / Multi-layer
perceptron classifier.

Parameters
------------
n_output : int
Number of output units, should be equal to
the
number of unique class labels.

n_features : int
Number of features (dimensions) in the
target dataset.
Should be equal to the number of columns in
the X array.

n_hidden : int (default: 30)


Number of hidden units.

l1 : float (default: 0.0)


Lambda value for L1-regularization.
No regularization if l1=0.0 (default)
l2 : float (default: 0.0)
Lambda value for L2-regularization.
No regularization if l2=0.0 (default)

epochs : int (default: 500)


Number of passes over the training set.

eta : float (default: 0.001)


Learning rate.

alpha : float (default: 0.0)


Momentum constant. Factor multiplied with
the
gradient of the previous epoch t-1 to
improve
learning speed
w(t) := w(t) - (grad(t) + alpha*grad(t-1))

decrease_const : float (default: 0.0)


Decrease constant. Shrinks the learning rate
after each epoch via eta / (1 +
epoch*decrease_const)

shuffle : bool (default: False)


Shuffles training data every epoch if True
to prevent circles.

minibatches : int (default: 1)


Divides training data into k minibatches for
efficiency.
Normal gradient descent learning if k=1
(default).

random_state : int (default: None)


Set random state for shuffling and
initializing the weights.
Attributes
-----------
cost_ : list
Sum of squared errors after each epoch.

"""
    def __init__(self, n_output, n_features, n_hidden=30,
                 l1=0.0, l2=0.0, epochs=500, eta=0.001,
                 alpha=0.0, decrease_const=0.0, shuffle=True,
                 minibatches=1, random_state=None):

        np.random.seed(random_state)
        self.n_output = n_output
        self.n_features = n_features
        self.n_hidden = n_hidden
        self.w1, self.w2 = self._initialize_weights()
        self.l1 = l1
        self.l2 = l2
        self.epochs = epochs
        self.eta = eta
        self.alpha = alpha
        self.decrease_const = decrease_const
        self.shuffle = shuffle
        self.minibatches = minibatches

def _encode_labels(self, y, k):


"""Encode labels into one-hot
representation

Parameters
------------
y : array, shape = [n_samples]
Target values.

Returns
-----------
onehot : array, shape = (n_labels,
n_samples)

"""
        onehot = np.zeros((k, y.shape[0]))
for idx, val in enumerate(y):
onehot[val, idx] = 1.0
return onehot

    def _initialize_weights(self):
        """Initialize weights with small random numbers."""
        w1 = np.random.uniform(-1.0, 1.0,
                               size=self.n_hidden*(self.n_features + 1))
        w1 = w1.reshape(self.n_hidden, self.n_features + 1)
        w2 = np.random.uniform(-1.0, 1.0,
                               size=self.n_output*(self.n_hidden + 1))
        w2 = w2.reshape(self.n_output, self.n_hidden + 1)
        return w1, w2

    def _sigmoid(self, z):
        """Compute logistic function (sigmoid)

        Uses scipy.special.expit to avoid overflow
        error for very small input values z.

        """
        # return 1.0 / (1.0 + np.exp(-z))
        return expit(z)

    def _sigmoid_gradient(self, z):
        """Compute gradient of the logistic function"""
        sg = self._sigmoid(z)
        return sg * (1 - sg)

    def _add_bias_unit(self, X, how='column'):
        """Add bias unit (column or row of 1s) to array at index 0"""
        if how == 'column':
            X_new = np.ones((X.shape[0], X.shape[1] + 1))
            X_new[:, 1:] = X
        elif how == 'row':
            X_new = np.ones((X.shape[0] + 1, X.shape[1]))
            X_new[1:, :] = X
        else:
            raise AttributeError('`how` must be `column` or `row`')
        return X_new

def _feedforward(self, X, w1, w2):


"""Compute feedforward step

Parameters
-----------
X : array, shape = [n_samples, n_features]
Input layer with original features.

w1 : array, shape = [n_hidden_units,


n_features]
Weight matrix for input layer -> hidden
layer.

w2 : array, shape = [n_output_units,


n_hidden_units]
Weight matrix for hidden layer -> output
layer.

Returns
----------
a1 : array, shape = [n_samples,
n_features+1]
Input values with bias unit.

z2 : array, shape = [n_hidden, n_samples]


Net input of hidden layer.

a2 : array, shape = [n_hidden+1,


n_samples]
Activation of hidden layer.

z3 : array, shape = [n_output_units,


n_samples]
Net input of output layer.

        a3 : array, shape = [n_output_units, n_samples]
            Activation of output layer.

        """
        a1 = self._add_bias_unit(X, how='column')
        z2 = w1.dot(a1.T)
        a2 = self._sigmoid(z2)
        a2 = self._add_bias_unit(a2, how='row')
        z3 = w2.dot(a2)
        a3 = self._sigmoid(z3)
        return a1, z2, a2, z3, a3

    def _L2_reg(self, lambda_, w1, w2):
        """Compute L2-regularization cost"""
        return (lambda_/2.0) * (np.sum(w1[:, 1:] ** 2) + np.sum(w2[:, 1:] ** 2))

    def _L1_reg(self, lambda_, w1, w2):
        """Compute L1-regularization cost"""
        return (lambda_/2.0) * (np.abs(w1[:, 1:]).sum() + np.abs(w2[:, 1:]).sum())

def _get_cost(self, y_enc, output, w1, w2):


"""Compute cost function.

y_enc : array, shape = (n_labels,


n_samples)
one-hot encoded class labels.

output : array, shape = [n_output_units,


n_samples]
Activation of the output layer
(feedforward)

w1 : array, shape = [n_hidden_units,


n_features]
Weight matrix for input layer -> hidden
layer.

w2 : array, shape = [n_output_units,


n_hidden_units]
Weight matrix for hidden layer -> output
layer.

        Returns
        ---------
        cost : float
            Regularized cost.

        """
        term1 = -y_enc * (np.log(output))
        term2 = (1 - y_enc) * np.log(1 - output)
        cost = np.sum(term1 - term2)
        L1_term = self._L1_reg(self.l1, w1, w2)
        L2_term = self._L2_reg(self.l2, w1, w2)
        cost = cost + L1_term + L2_term
        return cost

def _get_gradient(self, a1, a2, a3, z2, y_enc,


w1, w2):
""" Compute gradient step using
backpropagation.

Parameters
------------
a1 : array, shape = [n_samples,
n_features+1]
Input values with bias unit.

a2 : array, shape = [n_hidden+1,


n_samples]
Activation of hidden layer.

a3 : array, shape = [n_output_units,


n_samples]
Activation of output layer.

z2 : array, shape = [n_hidden, n_samples]


Net input of hidden layer.

y_enc : array, shape = (n_labels,


n_samples)
one-hot encoded class labels.

w1 : array, shape = [n_hidden_units,


n_features]
Weight matrix for input layer -> hidden
layer.

w2 : array, shape = [n_output_units,


n_hidden_units]
Weight matrix for hidden layer -> output
layer.

Returns
---------

grad1 : array, shape = [n_hidden_units,


n_features]
Gradient of the weight matrix w1.

        grad2 : array, shape = [n_output_units, n_hidden_units]
            Gradient of the weight matrix w2.

        """
        # backpropagation
        sigma3 = a3 - y_enc
        z2 = self._add_bias_unit(z2, how='row')
        sigma2 = w2.T.dot(sigma3) * self._sigmoid_gradient(z2)
        sigma2 = sigma2[1:, :]
        grad1 = sigma2.dot(a1)
        grad2 = sigma3.dot(a2.T)

        # regularize
        grad1[:, 1:] += (w1[:, 1:] * (self.l1 + self.l2))
        grad2[:, 1:] += (w2[:, 1:] * (self.l1 + self.l2))

        return grad1, grad2

    def _gradient_checking(self, X, y_enc, w1, w2, epsilon, grad1, grad2):
        """ Apply gradient checking (for debugging only)

        Returns
        ---------
        relative_error : float
            Relative error between the numerically approximated
            gradients and the backpropagated gradients.

        """
        num_grad1 = np.zeros(np.shape(w1))
        epsilon_ary1 = np.zeros(np.shape(w1))
        for i in range(w1.shape[0]):
            for j in range(w1.shape[1]):
                epsilon_ary1[i, j] = epsilon
                a1, z2, a2, z3, a3 = self._feedforward(X, w1 - epsilon_ary1, w2)
                cost1 = self._get_cost(y_enc, a3, w1 - epsilon_ary1, w2)
                a1, z2, a2, z3, a3 = self._feedforward(X, w1 + epsilon_ary1, w2)
                cost2 = self._get_cost(y_enc, a3, w1 + epsilon_ary1, w2)
                num_grad1[i, j] = (cost2 - cost1) / (2 * epsilon)
                epsilon_ary1[i, j] = 0

        num_grad2 = np.zeros(np.shape(w2))
        epsilon_ary2 = np.zeros(np.shape(w2))
        for i in range(w2.shape[0]):
            for j in range(w2.shape[1]):
                epsilon_ary2[i, j] = epsilon
                a1, z2, a2, z3, a3 = self._feedforward(X, w1, w2 - epsilon_ary2)
                cost1 = self._get_cost(y_enc, a3, w1, w2 - epsilon_ary2)
                a1, z2, a2, z3, a3 = self._feedforward(X, w1, w2 + epsilon_ary2)
                cost2 = self._get_cost(y_enc, a3, w1, w2 + epsilon_ary2)
                num_grad2[i, j] = (cost2 - cost1) / (2 * epsilon)
                epsilon_ary2[i, j] = 0

        num_grad = np.hstack((num_grad1.flatten(), num_grad2.flatten()))
        grad = np.hstack((grad1.flatten(), grad2.flatten()))
        norm1 = np.linalg.norm(num_grad - grad)
        norm2 = np.linalg.norm(num_grad)
        norm3 = np.linalg.norm(grad)
        relative_error = norm1 / (norm2 + norm3)
        return relative_error

def predict(self, X):


"""Predict class labels

Parameters
-----------
X : array, shape = [n_samples, n_features]
Input layer with original features.

        Returns:
        ----------
        y_pred : array, shape = [n_samples]
            Predicted class labels.

        """
        if len(X.shape) != 2:
            raise AttributeError('X must be a [n_samples, n_features] array.\n'
                                 'Use X[:,None] for 1-feature classification,'
                                 '\nor X[[i]] for 1-sample classification')

        a1, z2, a2, z3, a3 = self._feedforward(X, self.w1, self.w2)
        y_pred = np.argmax(z3, axis=0)
        return y_pred

def fit(self, X, y, print_progress=False):


""" Learn weights from training data.

Parameters
-----------
X : array, shape = [n_samples, n_features]
Input layer with original features.

y : array, shape = [n_samples]


Target class labels.

print_progress : bool (default: False)


Prints progress as the number of epochs
to stderr.

        Returns:
        ----------
        self

        """
        self.cost_ = []
        X_data, y_data = X.copy(), y.copy()
        y_enc = self._encode_labels(y, self.n_output)

        delta_w1_prev = np.zeros(self.w1.shape)
        delta_w2_prev = np.zeros(self.w2.shape)

        for i in range(self.epochs):

            # adaptive learning rate
            self.eta /= (1 + self.decrease_const*i)

            if print_progress:
                sys.stderr.write('\rEpoch: %d/%d' % (i+1, self.epochs))
                sys.stderr.flush()

            if self.shuffle:
                idx = np.random.permutation(y_data.shape[0])
                X_data, y_data = X_data[idx], y_data[idx]

            mini = np.array_split(range(y_data.shape[0]), self.minibatches)
            for idx in mini:

                # feedforward
                a1, z2, a2, z3, a3 = self._feedforward(X[idx], self.w1, self.w2)
                cost = self._get_cost(y_enc=y_enc[:, idx],
                                      output=a3,
                                      w1=self.w1,
                                      w2=self.w2)
                self.cost_.append(cost)

                # compute gradient via backpropagation
                grad1, grad2 = self._get_gradient(a1=a1, a2=a2,
                                                  a3=a3, z2=z2,
                                                  y_enc=y_enc[:, idx],
                                                  w1=self.w1,
                                                  w2=self.w2)

                ## start gradient checking
                grad_diff = self._gradient_checking(X=X[idx], y_enc=y_enc[:, idx],
                                                    w1=self.w1, w2=self.w2,
                                                    epsilon=1e-5,
                                                    grad1=grad1, grad2=grad2)

                if grad_diff <= 1e-7:
                    print('Ok: %s' % grad_diff)
                elif grad_diff <= 1e-4:
                    print('Warning: %s' % grad_diff)
                else:
                    print('PROBLEM: %s' % grad_diff)

                # update weights; [alpha * delta_w_prev] for momentum learning
                delta_w1, delta_w2 = self.eta * grad1, self.eta * grad2
                self.w1 -= (delta_w1 + (self.alpha * delta_w1_prev))
                self.w2 -= (delta_w2 + (self.alpha * delta_w2_prev))
                delta_w1_prev, delta_w2_prev = delta_w1, delta_w2

        return self
nn_check = MLPGradientCheck(n_output=10,

n_features=X_train.shape[1],
n_hidden=10,
l2=0.0,
l1=0.0,
epochs=10,
eta=0.001,
alpha=0.0,
decrease_const=0.0,
minibatches=1,
random_state=1)

nn_check.fit(X_train[:5], y_train[:5],
print_progress=False)

Ok: 2.56712936241e-10
Ok: 2.94603251069e-10
Ok: 2.37615620231e-10
Ok: 2.43469423226e-10
Ok: 3.37872073158e-10
Ok: 3.63466384861e-10
Ok: 2.22472120785e-10
Ok: 2.33163708438e-10
Ok: 3.44653686551e-10
Ok: 2.17161707211e-10
<__main__.MLPGradientCheck at 0x10a13ab70>
Theano
A language in a language
Dealing with weight matrices and gradients can be tricky and
sometimes non-trivial. Theano is a great framework for handling
vectors, matrices and high-dimensional tensor algebra. Most of
this tutorial will refer to Theano; however, TensorFlow is another
great framework capable of providing an incredible abstraction
for complex algebra. More on TensorFlow in the next chapters.

import theano
import theano.tensor as T
Symbolic variables
Theano has its own variables and functions, defined as follows:

x = T.scalar()

<TensorType(float64, scalar)>

Variables can be used in expressions

y = 3*(x**2) + 1

y is an expression now
Result is symbolic as well

type(y)
theano.tensor.var.TensorVariable

Shape.0
printing
As we are about to see, normal printing isn't the best when it
comes to theano

print(y)

Elemwise{add,no_inplace}.0

theano.pp(y)

'((TensorConstant{3} * (<TensorType(float64,
scalar)> ** TensorConstant{2})) +
TensorConstant{1})'

theano.printing.debugprint(y)

Elemwise{add,no_inplace} [id A] ''


|Elemwise{mul,no_inplace} [id B] ''
| |TensorConstant{3} [id C]
| |Elemwise{pow,no_inplace} [id D] ''
| |<TensorType(float64, scalar)> [id E]
| |TensorConstant{2} [id F]
|TensorConstant{1} [id G]
Evaluating expressions
Supply a dict mapping variables to values

y.eval({x: 2})

array(13.0)

Or compile a function

f = theano.function([x], y)

f(2)

array(13.0)
Other tensor types

X = T.vector()
X = T.matrix()
X = T.tensor3()
X = T.tensor4()
Automatic differentiation
Gradients are free!

x = T.scalar()
y = T.log(x)

gradient = T.grad(y, x)
print(gradient)
print(gradient.eval({x: 2}))
print((2 * gradient))

Elemwise{true_div}.0
0.5
Elemwise{mul,no_inplace}.0
Shared Variables
Symbolic + Storage

import numpy as np
x = theano.shared(np.zeros((2, 3), dtype=theano.config.floatX))

<TensorType(float64, matrix)>

We can get and set the variable's value

values = x.get_value()
print(values.shape)
print(values)

(2, 3)
[[ 0. 0. 0.]
[ 0. 0. 0.]]

x.set_value(values)
Shared variables can be used in expressions as well

(x + 2) ** 2

Elemwise{pow,no_inplace}.0

Their value is used as input when evaluating

((x + 2) ** 2).eval()

array([[ 4., 4., 4.],


[ 4., 4., 4.]])

theano.function([], (x + 2) ** 2)()

array([[ 4., 4., 4.],


[ 4., 4., 4.]])
Updates
Store results of function evaluation
dict mapping shared variables to new values

count = theano.shared(0)
new_count = count + 1
updates = {count: new_count}

f = theano.function([], count, updates=updates)

f()

array(0)

f()

array(1)

f()

array(2)
Tensorflow
TensorFlow ([Link]) is a software
library, developed by the Google Brain Team within Google's
Machine Learning Intelligence research organization, for the
purposes of conducting machine learning and deep neural
network research.
TensorFlow combines computational algebra with
compilation optimization techniques, making it easy to
compute many mathematical expressions that would otherwise
be cumbersome to calculate.
Tensorflow Main Features
Defining, optimizing, and efficiently calculating
mathematical expressions involving multi-dimensional
arrays (tensors).
Programming support of deep neural networks and
machine learning techniques.
Transparent use of GPU computing, automating
management and optimization of the same memory and the
data used. You can write the same code and run it either on
CPUs or GPUs. More specifically, TensorFlow will figure out
which parts of the computation should be moved to the
GPU.
High scalability of computation across machines and huge
data sets.
TensorFlow is available with Python and C++ support, but
the Python API is better supported and much easier to
learn.
Very Preliminary Example

# A simple calculation in Python


x = 1
y = x + 10
print(y)

11

import tensorflow as tf

# The ~same simple calculation in Tensorflow


x = tf.constant(1, name='x')
y = tf.Variable(x+10, name='y')
print(y)

<tf.Variable 'y:0' shape=() dtype=int32_ref>

Meaning: "When the variable y is computed, take the value


of the constant x and add 10 to it"
Sessions and Models
To actually calculate the value of the y variable and to
evaluate expressions, we need to initialise the variables, and
then create a session where the actual computation happens

model = tf.global_variables_initializer()  # "model" is used by convention

with tf.Session() as session:
    session.run(model)
    print(session.run(y))

11
Data Flow Graph
(IDEA) A Machine Learning application is the result of the
repeated computation of complex mathematical
expressions; thus, we can describe this computation by
using a Data Flow Graph
Data Flow Graph: a graph where:
each Node represents the instance of a mathematical
operation
multiply , add , divide
each Edge is a multi-dimensional data set ( tensors )
on which the operations are performed.
Tensorflow Graph Model
Node: In TensorFlow, each node represents the instantion
of an operation.
Each operation can have any number of inputs (possibly
none) and any number of outputs.
Edges: In TensorFlow, there are two types of edge:
Data Edges: They are carriers of data structures
( tensors ), where an output of one operation (from
one node) becomes the input for another operation.
Dependency Edges: These edges indicate a control
dependency between two nodes (i.e. "happens before"
relationship).
Let's suppose we have two nodes A and B and
a dependency edge connecting A to B . This
means that B will start its operation only when the
operation in A ends.
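As a tiny illustration of the two edge types (a sketch, using the
TensorFlow 1.x API adopted in this notebook; the names A, W, B, C are
made up):

import tensorflow as tf

a = tf.constant([[1.0, 2.0]], name='A')      # node A
w = tf.constant([[3.0], [4.0]], name='W')
b = tf.matmul(a, w, name='B')                # data edge: A's output feeds B

# dependency edge: C is evaluated only after B has been computed
with tf.control_dependencies([b]):
    c = tf.identity(tf.constant(1.0), name='C')

with tf.Session() as sess:
    print(sess.run([b, c]))                  # [array([[11.]]), 1.0]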
Tensorflow Graph Model (cont.)
Operation: This represents an abstract computation, such
as adding or multiplying matrices.
An operation manages tensors, and it can be
polymorphic: the same operation can manipulate
different tensor element types.
For example, the addition of two int32 tensors, the
addition of two float tensors, and so on.
Kernel: This represents the concrete implementation of that
operation.
A kernel defines the implementation of the operation on
a particular device.
For example, an add matrix operation can have
a CPU implementation and a GPU one.
Tensorflow Graph Model Session
Session: When the client program has to establish
communication with the TensorFlow runtime system, a session
must be created.
As soon as the session is created for a client, an initial graph is
created and is empty. It has two fundamental methods:
Session.extend : To be used during a computation,
requesting to add more operations (nodes) and edges
(data). The execution graph is then extended accordingly.
Session.run : The execution graphs are executed to get
the outputs (sometimes, subgraphs are executed
thousands/millions of times using run invocations).
Tensorboard
TensorBoard is a visualization tool, devoted to analyzing Data
Flow Graph and also to better understand the machine learning
models.
It can display different types of statistics about the parameters and
details of any part of the computational graph. It often
happens that a computation graph can be very complex.
Tensorboard Example
Run the TensorBoard Server:

tensorboard --logdir=/tmp/tf_logs

Open TensorBoard

Example

a = [Link](5, name="a")
b = [Link](45, name="b")
y = [Link](a+b*2, name='y')
model = tf.global_variables_initializer()

with [Link]() as session:


# Merge all the summaries collected in the
default graph.
merged = [Link].merge_all()

# Then we create `SummaryWriter`.


# It will write all the summaries (in this
case the execution graph)
# obtained from the code's execution into the
specified path”
writer =
[Link]("tmp/tf_logs_simple",
[Link])
[Link](model)
print([Link](y))
95
Data Types (Tensors)

One Dimensional Tensor (Vector)

import numpy as np
tensor_1d = np.array([1, 2.5, 4.6, 5.75, 9.7])
tf_tensor = tf.convert_to_tensor(tensor_1d, dtype=tf.float64)

with tf.Session() as sess:
    print(sess.run(tf_tensor))
    print(sess.run(tf_tensor[0]))
    print(sess.run(tf_tensor[2]))

[ 1. 2.5 4.6 5.75 9.7 ]


1.0
4.6
Two Dimensional Tensor (Matrix)

tensor_2d = np.arange(16).reshape(4, 4)
print(tensor_2d)
tf_tensor = tf.placeholder(tf.float32, shape=(4, 4))
with tf.Session() as sess:
    print(sess.run(tf_tensor, feed_dict={tf_tensor: tensor_2d}))

[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[ 0. 1. 2. 3.]
[ 4. 5. 6. 7.]
[ 8. 9. 10. 11.]
[ 12. 13. 14. 15.]]
Basic Operations (Examples)

matrix1 = np.array([(2,2,2), (2,2,2), (2,2,2)], dtype='float32')
matrix2 = np.array([(1,1,1), (1,1,1), (1,1,1)], dtype='float32')

tf_mat1 = tf.constant(matrix1)
tf_mat2 = tf.constant(matrix2)

matrix_product = tf.matmul(tf_mat1, tf_mat2)
matrix_sum = tf.add(tf_mat1, tf_mat2)
matrix_det = tf.matrix_determinant(matrix2)

with tf.Session() as sess:
    prod_res = sess.run(matrix_product)
    sum_res = sess.run(matrix_sum)
    det_res = sess.run(matrix_det)

print("matrix1*matrix2 : \n", prod_res)
print("matrix1+matrix2 : \n", sum_res)
print("det(matrix2) : \n", det_res)
matrix1*matrix2 :
[[ 6. 6. 6.]
[ 6. 6. 6.]
[ 6. 6. 6.]]
matrix1+matrix2 :
[[ 3. 3. 3.]
[ 3. 3. 3.]
[ 3. 3. 3.]]
det(matrix2) :
0.0
Handling Tensors

%matplotlib inline

import matplotlib.image as mp_image

filename = "img/[Link]"
input_image = mp_image.imread(filename)

# dimension
print('input dim = {}'.format(input_image.ndim))
# shape
print('input shape = {}'.format(input_image.shape))

input dim = 3
input shape = (300, 300, 3)

import matplotlib.pyplot as plt

plt.imshow(input_image)
plt.show()
Slicing

my_image = [Link]("uint8",[None,None,3])
slice = [Link](my_image,[10,0,0],[16,-1,-1])

with [Link]() as session:


result = [Link](slice,feed_dict=
{my_image: input_image})
print([Link])

(16, 300, 3)

[Link](result)
[Link]()
Transpose

x = tf.Variable(input_image, name='x')
model = tf.global_variables_initializer()

with tf.Session() as session:
    x = tf.transpose(x, perm=[1, 0, 2])
    session.run(model)
    result = session.run(x)

plt.imshow(result)
plt.show()
Computing the Gradient
Gradients are free!

x = tf.placeholder(tf.float32)
y = tf.log(x)
var_grad = tf.gradients(y, x)
with tf.Session() as session:
    var_grad_val = session.run(var_grad, feed_dict={x: 2})
print(var_grad_val)

[0.5]
Why Tensorflow ?
On a typical system, there are multiple computing devices.
In TensorFlow, the supported device types are CPU and GPU.
They are represented as strings. For example:
"/cpu:0" : The CPU of your machine.
"/gpu:0" : The GPU of your machine, if you have one.
"/gpu:1" : The second GPU of your machine, etc.

If a TensorFlow operation has both CPU and GPU


implementations, the GPU devices will be given priority when
the operation is assigned to a device.
For example, matmul has both CPU and GPU kernels. On a
system with devices cpu:0 and gpu:0 , gpu:0 will be
selected to run matmul .

Example 1. Logging Device Placement


tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device:
0, name: GeForce GTX 760, pci bus
id: [Link].0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
Using Multiple GPUs

# Creates a graph.
c = []
for d in ['/gpu:0', '/gpu:1']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device:
0, name: GeForce GTX 760, pci bus
id: [Link].0
/job:localhost/replica:0/task:0/gpu:1 -> device:
1, name: GeForce GTX 760, pci bus
id: [Link].0
Const_3: /job:localhost/replica:0/task:0/gpu:0
Const_2: /job:localhost/replica:0/task:0/gpu:0
MatMul_1: /job:localhost/replica:0/task:0/gpu:0
Const_1: /job:localhost/replica:0/task:0/gpu:1
Const: /job:localhost/replica:0/task:0/gpu:1
MatMul: /job:localhost/replica:0/task:0/gpu:1
AddN: /job:localhost/replica:0/task:0/cpu:0
[[ 44. 56.]
[ 98. 128.]]
More on Tensorflow
Official Documentation
Keras: Deep Learning library for
Theano and TensorFlow
Keras is a minimalist, highly modular neural networks library,
written in Python and capable of running on top of either
TensorFlow or Theano.
It was developed with a focus on enabling fast
experimentation. Being able to go from idea to result with
the least possible delay is key to doing good research. ref:
[Link]

Kaggle Challenge Data


The Otto Group is one of the world's biggest e-commerce
companies. A consistent analysis of the performance of
products is crucial. However, due to diverse global
infrastructure, many identical products get classified
differently. For this competition, we have provided a dataset
with 93 features for more than 200,000 products. The
objective is to build a predictive model which is able to
distinguish between our main product categories. Each row
corresponds to a single product. There are a total of 93
numerical features, which represent counts of different
events. All features have been obfuscated and will not be
defined any further.
[Link]
challenge/data
For this section we will use the Kaggle Otto Group
Challenge Data. You will find these data in
../data/kaggle_ottogroup/ folder.
Logistic Regression
This algorithm has nothing to do with the canonical linear
regression, but it is an algorithm that allows us to solve
problems of classification (supervised learning).
In fact, to estimate the dependent variable, now we make use
of the so-called logistic function or sigmoid.
It is precisely because of this function that the algorithm is called
logistic regression.
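As a quick refresher, the logistic (sigmoid) function squashes any real
value into the (0, 1) interval, which is what lets us read the model
output as a class probability. A minimal NumPy sketch (not part of the
original notebook):

import numpy as np

def sigmoid(z):
    # logistic function: maps R -> (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                          # 0.5 -> maximum uncertainty
print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # values close to 0, 0.5 and 1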
Data Preparation

from kaggle_data import load_data, preprocess_data, preprocess_labels
import numpy as np
import matplotlib.pyplot as plt

Using TensorFlow backend.

X_train, labels =
load_data('../data/kaggle_ottogroup/[Link]',
train=True)
X_train, scaler = preprocess_data(X_train)
Y_train, encoder = preprocess_labels(labels)

X_test, ids =
load_data('../data/kaggle_ottogroup/[Link]',
train=False)
X_test, _ = preprocess_data(X_test, scaler)

nb_classes = Y_train.shape[1]
print(nb_classes, 'classes')

dims = X_train.shape[1]
print(dims, 'dims')

9 classes
93 dims
np.unique(labels)

array(['Class_1', 'Class_2', 'Class_3', 'Class_4',


'Class_5', 'Class_6',
'Class_7', 'Class_8', 'Class_9'],
dtype=object)

Y_train # one-hot encoding

array([[ 0., 0., 0., ..., 0., 0., 0.],


[ 0., 0., 0., ..., 1., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 1., ..., 0., 0., 0.]])
Using Theano

import theano as th
import theano.tensor as T

# Based on example from [Link]

rng = np.random
N = 400
feats = 93
training_steps = 10

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = th.shared(rng.randn(feats), name="w")
b = th.shared(0., name="b")

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))        # Probability that target = 1
prediction = p_1 > 0.5                          # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1)   # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum()      # The cost to minimize
gw, gb = T.grad(cost, [w, b])                   # Compute the gradient of the cost

# Compile
train = th.function(
          inputs=[x, y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)),
          allow_input_downcast=True)
predict = th.function(inputs=[x], outputs=prediction,
                      allow_input_downcast=True)

# Transform for class1
y_class1 = []
for i in Y_train:
    y_class1.append(i[0])
y_class1 = np.array(y_class1)

# Train
for i in range(training_steps):
    print('Epoch %s' % (i+1,))
    pred, err = train(X_train, y_class1)

print("target values for Data:")
print(y_class1)
print("prediction on training set:")
print(predict(X_train))

Epoch 1
Epoch 2
Epoch 3
Epoch 4
Epoch 5
Epoch 6
Epoch 7
Epoch 8
Epoch 9
Epoch 10
target values for Data:
[ 0. 0. 0. ..., 0. 0. 0.]
prediction on training set:
[ True True False ..., True True True]
Using Tensorflow

import tensorflow as tf

# Parameters
learning_rate = 0.01
training_epochs = 25
display_step = 1

# tf Graph Input
x = [Link]("float", [None, dims])
y = [Link]("float", [None, nb_classes])

<[Link] 'Placeholder:0' shape=(?, 93)


dtype=float32>

Model (Introducing Tensorboard)

# Construct (linear) model

with tf.name_scope("model") as scope:
    # Set model weights
    W = tf.Variable(tf.zeros([dims, nb_classes]))
    b = tf.Variable(tf.zeros([nb_classes]))
    activation = tf.nn.softmax(tf.matmul(x, W) + b)  # Softmax

    # Add summary ops to collect data
    w_h = tf.summary.histogram("weights_histogram", W)
    b_h = tf.summary.histogram("biases_histograms", b)
    tf.summary.scalar('mean_weights', tf.reduce_mean(W))
    tf.summary.scalar('mean_bias', tf.reduce_mean(b))

# Minimize error using cross entropy
# Note: More name scopes will clean up graph representation
with tf.name_scope("cost_function") as scope:
    cross_entropy = y*tf.log(activation)
    cost = tf.reduce_mean(-tf.reduce_sum(cross_entropy, reduction_indices=1))
    # Create a summary to monitor the cost function
    tf.summary.scalar("cost_function", cost)
    tf.summary.histogram("cost_histogram", cost)

with tf.name_scope("train") as scope:
    # Set the Optimizer
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

Accuracy

with tf.name_scope('Accuracy') as scope:
    correct_prediction = tf.equal(tf.argmax(activation, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # Create a summary to monitor the accuracy
    tf.summary.scalar("accuracy", accuracy)

Learning in a TF Session

LOGDIR = "/tmp/logistic_logs"
import os, shutil
if [Link](LOGDIR):
[Link](LOGDIR)
[Link](LOGDIR)

# Plug TensorBoard Visualisation


writer = [Link](LOGDIR,
graph=tf.get_default_graph())

for var in
tf.get_collection([Link]):
print([Link])

summary_op = [Link].merge_all()
print('Summary Op: ' + summary_op)

model/weights_histogram:0
model/biases_histograms:0
model/mean_weights:0
model/mean_bias:0
cost_function/cost_function:0
cost_function/cost_histogram:0
Accuracy/accuracy:0
Tensor("add:0", shape=(), dtype=string)

# Launch the graph

with tf.Session() as session:
    # Initializing the variables
    session.run(tf.global_variables_initializer())

    cost_epochs = []
    # Training cycle
    for epoch in range(training_epochs):
        _, summary, c = session.run(fetches=[optimizer, summary_op, cost],
                                    feed_dict={x: X_train, y: Y_train})
        cost_epochs.append(c)
        writer.add_summary(summary=summary, global_step=epoch)
        print("accuracy epoch {}:{}".format(epoch,
              accuracy.eval({x: X_train, y: Y_train})))

    print("Training phase finished")

    # plotting
    plt.plot(range(len(cost_epochs)), cost_epochs, 'o',
             label='Logistic Regression Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()

    prediction = tf.argmax(activation, 1)
    print(prediction.eval({x: X_test}))

accuracy epoch 0:0.6649535894393921


accuracy epoch 1:0.665276825428009
accuracy epoch 2:0.6657131910324097
accuracy epoch 3:0.6659556031227112
accuracy epoch 4:0.6662949919700623
accuracy epoch 5:0.6666181683540344
accuracy epoch 6:0.6668121218681335
accuracy epoch 7:0.6671029925346375
accuracy epoch 8:0.6674585342407227
accuracy epoch 9:0.6678463816642761
accuracy epoch 10:0.6680726408958435
accuracy epoch 11:0.6682504415512085
accuracy epoch 12:0.6684605479240417
accuracy epoch 13:0.6687514185905457
accuracy epoch 14:0.6690422892570496
accuracy epoch 15:0.6692523956298828
accuracy epoch 16:0.6695109605789185
accuracy epoch 17:0.6697695255279541
accuracy epoch 18:0.6699796319007874
accuracy epoch 19:0.6702220439910889
accuracy epoch 20:0.6705452799797058
accuracy epoch 21:0.6708361506462097
accuracy epoch 22:0.6710785627365112
accuracy epoch 23:0.671385645866394
accuracy epoch 24:0.6716926693916321
Training phase finished
[1 5 5 ..., 2 1 1]

%%bash
python -m tensorflow.tensorboard --logdir=/tmp/logistic_logs

Process is terminated.
Using Keras

from keras.models import Sequential
from keras.layers.core import Dense, Activation

dims = X_train.shape[1]
print(dims, 'dims')
print("Building model...")

nb_classes = Y_train.shape[1]
print(nb_classes, 'classes')

model = Sequential()
model.add(Dense(nb_classes, input_shape=(dims,), activation='sigmoid'))
model.add(Activation('softmax'))

model.compile(optimizer='sgd', loss='categorical_crossentropy')
model.fit(X_train, Y_train)

93 dims
Building model...
9 classes
Epoch 1/10
61878/61878 [==============================] - 3s
- loss: 1.9845
Epoch 2/10
61878/61878 [==============================] - 2s
- loss: 1.8337
Epoch 3/10
61878/61878 [==============================] - 2s
- loss: 1.7779
Epoch 4/10
61878/61878 [==============================] - 3s
- loss: 1.7432
Epoch 5/10
61878/61878 [==============================] - 2s
- loss: 1.7187
Epoch 6/10
61878/61878 [==============================] - 3s
- loss: 1.7002
Epoch 7/10
61878/61878 [==============================] - 2s
- loss: 1.6857
Epoch 8/10
61878/61878 [==============================] - 2s
- loss: 1.6739
Epoch 9/10
61878/61878 [==============================] - 2s
- loss: 1.6642
Epoch 10/10
61878/61878 [==============================] - 2s
- loss: 1.6560

<keras.callbacks.History at 0x123026dd8>

Simplicity is pretty impressive, right? :)

Note the default image data format convention of each backend:
Theano:
shape = (channels, rows, cols)
Tensorflow:
shape = (rows, cols, channels)

image_data_format : channels_last | channels_first

!cat ~/.keras/keras.json

"epsilon": 1e-07,

"backend": "tensorflow",

"floatx": "float32",

"image_data_format": "channels_last"

Now let's understand:

The core data structure of Keras is a


model, a way to organize layers. The main
type of model is the Sequential model, a
linear stack of layers.

What we did here is stacking a Fully Connected (Dense) layer


of trainable weights from the input to the output and an
Activation layer on top of the weights layer.
Dense

from keras.layers.core import Dense

Dense(units, activation=None, use_bias=True,


kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)

units : int > 0.

init : name of initialization function for the weights of the


layer (see initializations), or alternatively, Theano function to
use for weights initialization. This parameter is only relevant
if you don't pass a weights argument.
activation : name of activation function to use (see
activations), or alternatively, elementwise Theano function.
If you don't specify anything, no activation is applied (ie.
"linear" activation: a(x) = x).
weights : list of Numpy arrays to set as initial weights.
The list should have 2 elements, of shape (input_dim,
output_dim) and (output_dim,) for weights and biases
respectively.
kernel_regularizer : instance of WeightRegularizer (eg.
L1 or L2 regularization), applied to the main weights matrix.
bias_regularizer : instance of WeightRegularizer,
applied to the bias.
activity_regularizer : instance of ActivityRegularizer,
applied to the network output.
kernel_constraint : instance of the constraints module
(eg. maxnorm, nonneg), applied to the main weights matrix.
bias_constraint : instance of the constraints module,
applied to the bias.
use_bias : whether to include a bias (i.e. make the layer
affine rather than linear).
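For example, a Dense layer with an explicit initializer and an L2
kernel regularizer could be declared as follows (a sketch, not part of
the original notebook; the layer sizes are arbitrary):

from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l2

model_example = Sequential()
model_example.add(Dense(64,
                        activation='relu',
                        kernel_initializer='glorot_uniform',
                        kernel_regularizer=l2(0.001),
                        use_bias=True,
                        input_shape=(93,)))
model_example.add(Dense(9, activation='softmax'))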
(some) other keras.layers.core layers:
keras.layers.core.Flatten()
keras.layers.core.Reshape(target_shape)
keras.layers.core.Permute(dims)

model = Sequential()
model.add(Permute((2, 1), input_shape=(10, 64)))
# now: model.output_shape == (None, 64, 10)
# note: `None` is the batch dimension

keras.layers.core.Lambda(function, output_shape=None, arguments=None)

keras.layers.core.ActivityRegularization(l1=0.0, l2=0.0)

Credits: Yam Peleg (@Yampeleg)

Activation

from keras.layers.core import Activation


Activation(activation)

Supported Activations : [Link]

Advanced Activations : [Link]

Optimizer
If you need to, you can further configure your optimizer. A core
principle of Keras is to make things reasonably simple, while
allowing the user to be fully in control when they need to (the
ultimate control being the easy extensibility of the source code).
Here we used SGD (stochastic gradient descent) as an
optimization algorithm for our trainable weights.
Source & Reference: [Link]
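For example, instead of passing the string 'sgd' to compile, we could
instantiate and configure the optimizer ourselves (a sketch; the lr and
momentum values are arbitrary, and model is the Sequential model built
above):

from keras.optimizers import SGD

sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy')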
"Data Sciencing" this example a
little bit more
What we did here is nice, however in the real world it is not
usable because of overfitting. Let's try and solve it with cross
validation.

Overfitting
In overfitting, a statistical model describes random error or
noise instead of the underlying relationship. Overfitting occurs
when a model is excessively complex, such as having too many
parameters relative to the number of observations.
A model that has been overfit has poor predictive performance,
as it overreacts to minor fluctuations in the training data.

To avoid overfitting, we will first split
our data into a training set and a test set, and
evaluate our model on the test set.
Next: we will use two of Keras's callbacks:
EarlyStopping and ModelCheckpoint
Let's see first the model we implemented

model.summary()

__________________________________________________
_______________
Layer (type) Output Shape
Param #
==================================================
===============
dense_1 (Dense) (None, 9)
846
__________________________________________________
_______________
activation_1 (Activation) (None, 9)
0
==================================================
===============
Total params: 846
Trainable params: 846
Non-trainable params: 0
__________________________________________________
_______________

from sklearn.model_selection import train_test_split
from keras.callbacks import EarlyStopping, ModelCheckpoint

X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train,
                                                  test_size=0.15,
                                                  random_state=42)

fBestModel = 'best_model.h5'
early_stop = EarlyStopping(monitor='val_loss', patience=2, verbose=1)
best_model = ModelCheckpoint(fBestModel, verbose=0, save_best_only=True)

model.fit(X_train, Y_train, validation_data=(X_val, Y_val), epochs=50,
          batch_size=128, verbose=True, callbacks=[best_model, early_stop])

Train on 52596 samples, validate on 9282 samples


Epoch 1/50
52596/52596 [==============================] - 1s
- loss: 1.6516 - val_loss: 1.6513
Epoch 2/50
52596/52596 [==============================] - 0s
- loss: 1.6501 - val_loss: 1.6499
Epoch 3/50
52596/52596 [==============================] - 1s
- loss: 1.6488 - val_loss: 1.6486
Epoch 4/50
52596/52596 [==============================] - 1s
- loss: 1.6474 - val_loss: 1.6473
Epoch 5/50
52596/52596 [==============================] - 0s
- loss: 1.6462 - val_loss: 1.6461
Epoch 6/50
52596/52596 [==============================] - 0s
- loss: 1.6449 - val_loss: 1.6448
Epoch 7/50
52596/52596 [==============================] - 0s
- loss: 1.6437 - val_loss: 1.6437
Epoch 8/50
52596/52596 [==============================] - 0s
- loss: 1.6425 - val_loss: 1.6425
Epoch 9/50
52596/52596 [==============================] - 0s
- loss: 1.6414 - val_loss: 1.6414
Epoch 10/50
52596/52596 [==============================] - 0s
- loss: 1.6403 - val_loss: 1.6403
Epoch 11/50
52596/52596 [==============================] - 0s
- loss: 1.6392 - val_loss: 1.6393
Epoch 12/50
52596/52596 [==============================] - 0s
- loss: 1.6382 - val_loss: 1.6383
Epoch 13/50
52596/52596 [==============================] - 1s
- loss: 1.6372 - val_loss: 1.6373
Epoch 14/50
52596/52596 [==============================] - 0s
- loss: 1.6362 - val_loss: 1.6363
Epoch 15/50
52596/52596 [==============================] - 0s
- loss: 1.6352 - val_loss: 1.6354
Epoch 16/50
52596/52596 [==============================] - 0s
- loss: 1.6343 - val_loss: 1.6345
Epoch 17/50
52596/52596 [==============================] - 0s
- loss: 1.6334 - val_loss: 1.6336
Epoch 18/50
52596/52596 [==============================] - 0s
- loss: 1.6325 - val_loss: 1.6327
Epoch 19/50
52596/52596 [==============================] - 0s
- loss: 1.6316 - val_loss: 1.6319
Epoch 20/50
52596/52596 [==============================] - 0s
- loss: 1.6308 - val_loss: 1.6311
Epoch 21/50
52596/52596 [==============================] - 0s
- loss: 1.6300 - val_loss: 1.6303
Epoch 22/50
52596/52596 [==============================] - 0s
- loss: 1.6292 - val_loss: 1.6295
Epoch 23/50
52596/52596 [==============================] - 0s
- loss: 1.6284 - val_loss: 1.6287
Epoch 24/50
52596/52596 [==============================] - 0s
- loss: 1.6276 - val_loss: 1.6280
Epoch 25/50
52596/52596 [==============================] - 0s
- loss: 1.6269 - val_loss: 1.6273
Epoch 26/50
52596/52596 [==============================] - 0s
- loss: 1.6262 - val_loss: 1.6265
Epoch 27/50
52596/52596 [==============================] - 0s
- loss: 1.6254 - val_loss: 1.6258
Epoch 28/50
52596/52596 [==============================] - 0s
- loss: 1.6247 - val_loss: 1.6252
Epoch 29/50
52596/52596 [==============================] - 0s
- loss: 1.6241 - val_loss: 1.6245
Epoch 30/50
52596/52596 [==============================] - 0s
- loss: 1.6234 - val_loss: 1.6238
Epoch 31/50
52596/52596 [==============================] - 0s
- loss: 1.6227 - val_loss: 1.6232
Epoch 32/50
52596/52596 [==============================] - 0s
- loss: 1.6221 - val_loss: 1.6226
Epoch 33/50
52596/52596 [==============================] - 0s
- loss: 1.6215 - val_loss: 1.6220
Epoch 34/50
52596/52596 [==============================] - 1s
- loss: 1.6209 - val_loss: 1.6214
Epoch 35/50
52596/52596 [==============================] - 0s
- loss: 1.6203 - val_loss: 1.6208
Epoch 36/50
52596/52596 [==============================] - 0s
- loss: 1.6197 - val_loss: 1.6202
Epoch 37/50
52596/52596 [==============================] - 0s
- loss: 1.6191 - val_loss: 1.6197
Epoch 38/50
52596/52596 [==============================] - 0s
- loss: 1.6186 - val_loss: 1.6191
Epoch 39/50
52596/52596 [==============================] - 0s
- loss: 1.6180 - val_loss: 1.6186
Epoch 40/50
52596/52596 [==============================] - 0s
- loss: 1.6175 - val_loss: 1.6181
Epoch 41/50
52596/52596 [==============================] - 0s
- loss: 1.6170 - val_loss: 1.6175
Epoch 42/50
52596/52596 [==============================] - 0s
- loss: 1.6165 - val_loss: 1.6170
Epoch 43/50
52596/52596 [==============================] - 0s
- loss: 1.6160 - val_loss: 1.6166
Epoch 44/50
52596/52596 [==============================] - 0s
- loss: 1.6155 - val_loss: 1.6161
Epoch 45/50
52596/52596 [==============================] - 0s
- loss: 1.6150 - val_loss: 1.6156
Epoch 46/50
52596/52596 [==============================] - 0s
- loss: 1.6145 - val_loss: 1.6151
Epoch 47/50
52596/52596 [==============================] - 0s
- loss: 1.6141 - val_loss: 1.6147
Epoch 48/50
52596/52596 [==============================] - 0s
- loss: 1.6136 - val_loss: 1.6142
Epoch 49/50
52596/52596 [==============================] - 0s
- loss: 1.6132 - val_loss: 1.6138
Epoch 50/50
52596/52596 [==============================] - 0s
- loss: 1.6127 - val_loss: 1.6134

<keras.callbacks.History at 0x11e7a2710>
Multi-Layer Fully Connected
Networks

Forward and Backward Propagation

Q: How hard can it be to build a Multi-Layer Fully-Connected


Network with keras?
A: It is basically the same, just add more layers!

model = Sequential()
model.add(Dense(100, input_shape=(dims,)))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy')
model.summary()

__________________________________________________
_______________
Layer (type) Output Shape
Param #
==================================================
===============
dense_2 (Dense) (None, 100)
9400
__________________________________________________
_______________
dense_3 (Dense) (None, 9)
909
__________________________________________________
_______________
activation_2 (Activation) (None, 9)
0
==================================================
===============
Total params: 10,309
Trainable params: 10,309
Non-trainable params: 0
__________________________________________________
_______________

model.fit(X_train, Y_train, validation_data=(X_val, Y_val), epochs=20,
          batch_size=128, verbose=True)

Train on 52596 samples, validate on 9282 samples
Epoch 1/20
52596/52596 [==============================] - 1s
- loss: 1.2113 - val_loss: 0.8824
Epoch 2/20
52596/52596 [==============================] - 0s
- loss: 0.8229 - val_loss: 0.7851
Epoch 3/20
52596/52596 [==============================] - 0s
- loss: 0.7623 - val_loss: 0.7470
Epoch 4/20
52596/52596 [==============================] - 1s
- loss: 0.7329 - val_loss: 0.7258
Epoch 5/20
52596/52596 [==============================] - 1s
- loss: 0.7143 - val_loss: 0.7107
Epoch 6/20
52596/52596 [==============================] - 0s
- loss: 0.7014 - val_loss: 0.7005
Epoch 7/20
52596/52596 [==============================] - 1s
- loss: 0.6918 - val_loss: 0.6922
Epoch 8/20
52596/52596 [==============================] - 0s
- loss: 0.6843 - val_loss: 0.6868
Epoch 9/20
52596/52596 [==============================] - 0s
- loss: 0.6784 - val_loss: 0.6817
Epoch 10/20
52596/52596 [==============================] - 0s
- loss: 0.6736 - val_loss: 0.6773
Epoch 11/20
52596/52596 [==============================] - 0s
- loss: 0.6695 - val_loss: 0.6739
Epoch 12/20
52596/52596 [==============================] - 1s
- loss: 0.6660 - val_loss: 0.6711
Epoch 13/20
52596/52596 [==============================] - 1s
- loss: 0.6631 - val_loss: 0.6688
Epoch 14/20
52596/52596 [==============================] - 1s
- loss: 0.6604 - val_loss: 0.6670
Epoch 15/20
52596/52596 [==============================] - 1s
- loss: 0.6582 - val_loss: 0.6649
Epoch 16/20
52596/52596 [==============================] - 1s
- loss: 0.6563 - val_loss: 0.6626
Epoch 17/20
52596/52596 [==============================] - 1s
- loss: 0.6545 - val_loss: 0.6611
Epoch 18/20
52596/52596 [==============================] - 1s
- loss: 0.6528 - val_loss: 0.6598
Epoch 19/20
52596/52596 [==============================] - 1s
- loss: 0.6514 - val_loss: 0.6578
Epoch 20/20
52596/52596 [==============================] - 1s
- loss: 0.6500 - val_loss: 0.6571

<keras.callbacks.History at 0x12830b978>
Your Turn!

Hands On - Keras Fully Connected


Take a couple of minutes and try to play with the number of layers
and the number of parameters in the layers to get the best
results.

model = Sequential()
model.add(Dense(100, input_shape=(dims,)))

# ...
# ...
# Play with it! Add as many layers as you want and try to get better results.

model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy')

model.summary()

__________________________________________________
_______________
Layer (type) Output Shape
Param #
==================================================
===============
dense_4 (Dense) (None, 100)
9400
__________________________________________________
_______________
dense_5 (Dense) (None, 9)
909
__________________________________________________
_______________
activation_3 (Activation) (None, 9)
0
==================================================
===============
Total params: 10,309
Trainable params: 10,309
Non-trainable params: 0
__________________________________________________
_______________

model.fit(X_train, Y_train, validation_data=(X_val, Y_val), epochs=20,
          batch_size=128, verbose=True)

Train on 52596 samples, validate on 9282 samples


Epoch 1/20
52596/52596 [==============================] - 1s
- loss: 1.2107 - val_loss: 0.8821
Epoch 2/20
52596/52596 [==============================] - 1s
- loss: 0.8204 - val_loss: 0.7798
Epoch 3/20
52596/52596 [==============================] - 1s
- loss: 0.7577 - val_loss: 0.7393
Epoch 4/20
52596/52596 [==============================] - 0s
- loss: 0.7280 - val_loss: 0.7176
Epoch 5/20
52596/52596 [==============================] - 1s
- loss: 0.7097 - val_loss: 0.7028
Epoch 6/20
52596/52596 [==============================] - 1s
- loss: 0.6973 - val_loss: 0.6929
Epoch 7/20
52596/52596 [==============================] - 1s
- loss: 0.6883 - val_loss: 0.6858
Epoch 8/20
52596/52596 [==============================] - 1s
- loss: 0.6813 - val_loss: 0.6804
Epoch 9/20
52596/52596 [==============================] - 1s
- loss: 0.6757 - val_loss: 0.6756
Epoch 10/20
52596/52596 [==============================] - 1s
- loss: 0.6711 - val_loss: 0.6722
Epoch 11/20
52596/52596 [==============================] - 1s
- loss: 0.6672 - val_loss: 0.6692
Epoch 12/20
52596/52596 [==============================] - 0s
- loss: 0.6641 - val_loss: 0.6667
Epoch 13/20
52596/52596 [==============================] - 0s
- loss: 0.6613 - val_loss: 0.6636
Epoch 14/20
52596/52596 [==============================] - 0s
- loss: 0.6589 - val_loss: 0.6620
Epoch 15/20
52596/52596 [==============================] - 0s
- loss: 0.6568 - val_loss: 0.6606
Epoch 16/20
52596/52596 [==============================] - 0s
- loss: 0.6546 - val_loss: 0.6589
Epoch 17/20
52596/52596 [==============================] - 0s
- loss: 0.6531 - val_loss: 0.6577
Epoch 18/20
52596/52596 [==============================] - 0s
- loss: 0.6515 - val_loss: 0.6568
Epoch 19/20
52596/52596 [==============================] - 0s
- loss: 0.6501 - val_loss: 0.6546
Epoch 20/20
52596/52596 [==============================] - 0s
- loss: 0.6489 - val_loss: 0.6539

<keras.callbacks.History at 0x1285bae80>

Building a question answering system, an image classification


model, a Neural Turing Machine, a word2vec embedder or any
other model is just as fast. The ideas behind deep learning are
simple, so why should their implementation be painful?

Theoretical Motivations for depth


Much has been studied about the depth of neural nets. It
has been proven mathematically[1] and empirically that
convolutional neural networks benefit from depth!
[1] - On the Expressive Power of Deep Learning: A Tensor
Analysis - Cohen, et al 2015

Theoretical Motivations for depth


One much-quoted theorem about neural networks states that:
Universal approximation theorem states[1] that a feed-
forward network with a single hidden layer containing a finite
number of neurons (i.e., a multilayer perceptron), can
approximate continuous functions on compact subsets of
$\mathbb{R}^n$, under mild assumptions on the activation
function. The theorem thus states that simple neural
networks can represent a wide variety of interesting
functions when given appropriate parameters; however, it
does not touch upon the algorithmic learnability of those
parameters.
[1] - Approximation Capabilities of Multilayer Feedforward
Networks - Kurt Hornik 1991
Addendum
2.3.1 Keras Backend
Keras Backend
In this notebook we will be using the Keras backend module,
which provides an abstraction over both Theano and
Tensorflow.
Let's try to re-implement the Logistic Regression Model using
the keras.backend APIs.

The following code will look like very similar to what we would
write in Theano or Tensorflow (with the only difference that it
may run on both the two backends).

import keras.backend as K
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

Using TensorFlow backend.

from kaggle_data import load_data,


preprocess_data, preprocess_labels

X_train, labels =
load_data('../data/kaggle_ottogroup/[Link]',
train=True)
X_train, scaler = preprocess_data(X_train)
Y_train, encoder = preprocess_labels(labels)

X_test, ids =
load_data('../data/kaggle_ottogroup/[Link]',
train=False)

X_test, _ = preprocess_data(X_test, scaler)

nb_classes = Y_train.shape[1]
print(nb_classes, 'classes')

dims = X_train.shape[1]
print(dims, 'dims')

9 classes
93 dims

feats = dims
training_steps = 25

x = K.placeholder(dtype="float", shape=X_train.shape)
target = K.placeholder(dtype="float", shape=Y_train.shape)

# Set model weights
W = K.variable(np.random.rand(dims, nb_classes))
b = K.variable(np.random.rand(nb_classes))

# Define model and loss
y = K.dot(x, W) + b
loss = K.categorical_crossentropy(y, target)

activation = K.softmax(y)  # Softmax

lr = K.constant(0.01)
grads = K.gradients(loss, [W, b])
updates = [(W, W-lr*grads[0]), (b, b-lr*grads[1])]

train = K.function(inputs=[x, target], outputs=[loss], updates=updates)

# Training
loss_history = []
for epoch in range(training_steps):
    current_loss = train([X_train, Y_train])[0]
    loss_history.append(current_loss)
    if epoch % 20 == 0:
        print("Loss: {}".format(current_loss))

Loss: [ 2.13178873 1.99579716 3.72429109 ...,


2.75165343 2.29350972
1.77051127]
Loss: [ 2.95424724 0.10998608 1.07148504 ...,
0.23925911 2.9478302
2.90452051]
loss_history = [np.mean(lh) for lh in loss_history]

# plotting
plt.plot(range(len(loss_history)), loss_history, 'o',
         label='Logistic Regression Training phase')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend()
plt.show()
Your Turn
Please switch to the Theano backend and restart the
notebook.
You should see no difference in the execution!
Reminder: please keep in mind that you can execute shell
commands from a notebook (pre-pending a ! sign). Thus:

!cat ~/.keras/keras.json

should show you the content of your keras configuration file.

Moreover
Try to play a bit with the learning rate parameter to see how
the loss history changes...
Exercise: Linear Regression
To get familiar with automatic differentiation, we start by
learning a simple linear regression model using Stochastic
Gradient Descent (SGD).
Recall that given a dataset $\{(x_i, y_i)\}_{i=0}^N$, with $x_i, y_i \in
\mathbb{R}$, the objective of linear regression is to find two
scalars $w$ and $b$ such that $y = w\cdot x + b$ fits the
dataset. In this tutorial we will learn $w$ and $b$ using SGD
and a Mean Square Error (MSE) loss:
$$\mathcal{l} = \frac{1}{N} \sum_{i=0}^N (w\cdot x_i + b - y_i)^2$$
Starting from random values, parameters $w$ and $b$ will be
updated at each iteration via the following rule:
$$w_t = w_{t-1} - \eta \frac{\partial \mathcal{l}}{\partial w}$$
$$b_t = b_{t-1} - \eta \frac{\partial \mathcal{l}}{\partial b}$$
where $\eta$ is the learning rate.
NOTE: Recall that linear regression is indeed a simple
neuron with a linear activation function!!

Definition: Placeholders and Variables


First of all, we define the necessary variables and placeholders
for our computational graph. Variables maintain state across
executions of the computational graph, while placeholders are
ways to feed the graph with external data.
For the linear regression example, we need three variables:
w , b , and the learning rate for SGD, lr .

Two placeholders x and target are created to store $x_i$


and $y_i$ values.

# Placeholders and variables

x = K.placeholder()
target = K.placeholder()
w = K.variable(np.random.rand())
b = K.variable(np.random.rand())

Notes:
In case you're wondering what's the difference between a
placeholder and a variable, in short:
Use K.variable() for trainable variables such as
weights ( W ) and biases ( b ) for your model.
Use K.placeholder() to feed actual data (e.g. training
examples)
Model definition
Now we can define the $y = w\cdot x + b$ relation as well as
the MSE loss in the computational graph.

# Define model and loss

# %load ../solutions/sol_2311.py

Then, given the gradient of MSE wrt to w and b , we can


define how we update the parameters via SGD:

# %load ../solutions/sol_2312.py
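If you prefer not to peek at the solution files yet, one possible
sketch of those two missing cells (MSE loss and SGD updates), using the
x, target, w and b defined above, could look like this; the learning
rate value is arbitrary:

# Define model and loss (one possible sketch)
y = w * x + b                          # linear model
loss = K.mean(K.square(y - target))    # MSE loss

# SGD update rule for w and b
lr = 0.01                              # learning rate (arbitrary value)
grads = K.gradients(loss, [w, b])
updates = [(w, w - lr * grads[0]), (b, b - lr * grads[1])]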

The whole model can be encapsulated in a function, which
takes as input x and target, returns the current loss value
and updates its parameters according to updates.

train = K.function(inputs=[x, target], outputs=[loss], updates=updates)
Training
Training is now just a matter of calling the function we have
just defined. Each time train is called, indeed, w and b
will be updated using the SGD rule.
Having generated some random training data, we will feed the
train function for several epochs and observe the values of
w , b , and loss.

# Generate data
np_x = np.random.rand(1000)
np_target = 0.96*np_x + 0.24

# Training
loss_history = []
for epoch in range(200):
    current_loss = train([np_x, np_target])[0]
    loss_history.append(current_loss)
    if epoch % 20 == 0:
        print("Loss: %.03f, w, b: [%.02f, %.02f]"
              % (current_loss, K.eval(w), K.eval(b)))

We can also plot the loss history:

# Plot loss history

# %load ../solutions/sol_2313.py
Final Note:
Please switch back your backend to tensorflow before
moving on. It may be useful for next notebooks !-)
MNIST Dataset
Also known as digits if you're familiar with
sklearn :

from sklearn.datasets import load_digits


Problem Definition
Recognize handwritten digits
Data
The MNIST database (link) has a database of handwritten
digits.
The training set has $60,000$ samples. The test set has
$10,000$ samples.
The digits are size-normalized and centered in a fixed-size
image.
The data page has description on how the data was collected. It
also has reports the benchmark of various algorithms on the
test dataset.

Load the data


The data is available in the repo's data folder. Let's load that
using the keras library.

For now, let's load the data and see how it looks.

import numpy as np
import keras
from keras.datasets import mnist

# Load the datasets


(X_train, y_train), (X_test, y_test) =
mnist.load_data()
Basic data analysis on the dataset

# What is the type of X_train?

# What is the type of y_train?

# Find number of observations in training data

# Find number of observations in test data

# Display first 2 records of X_train

# Display the first 10 records of y_train

# Find the number of observations for each digit in the y_train dataset

# Find the number of observations for each digit in the y_test dataset

# What is the dimension of X_train? What does that mean?
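If you want a head start, here is one possible way to answer a few of
the questions above (a sketch, assuming the numpy arrays loaded in the
previous cell):

print(type(X_train), type(y_train))    # both are numpy arrays
print(X_train.shape[0], 'training observations')
print(X_test.shape[0], 'test observations')
print(y_train[:10])                    # first 10 labels

# observations per digit in the training labels
digits, counts = np.unique(y_train, return_counts=True)
print(dict(zip(digits, counts)))

print(X_train.shape)                   # (60000, 28, 28): 60k images of 28x28 pixels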

Display Images
Let's now display some of the images and see how they look
We will be using the matplotlib library for displaying the images

from matplotlib import pyplot


import matplotlib as mpl
%matplotlib inline

# Displaying the first training data

fig = pyplot.figure()
ax = fig.add_subplot(1, 1, 1)
imgplot = ax.imshow(X_train[0], cmap=mpl.cm.Greys)
imgplot.set_interpolation('nearest')
ax.xaxis.set_ticks_position('top')
ax.yaxis.set_ticks_position('left')
pyplot.show()

# Let's now display the 11th record


Fully Connected Feed-Forward
Network
In this notebook we will play with Feed-Forward FC-NN (Fully
Connected Neural Network) for a classification task:
Image Classification on MNIST Dataset
RECALL
In the FC-NN, the output of each layer is computed using the
activations from the previous one, as follows:
$$h_{i} = \sigma(W_i h_{i-1} + b_i)$$
where ${h}_i$ is the activation vector from the $i$-th layer (or
the input data for $i=0$), ${W}_i$ and ${b}_i$ are the weight
matrix and the bias vector for the $i$-th layer, respectively.
$\sigma(\cdot)$ is the activation function. In our example, we
will use the ReLU activation function for the hidden layers and
softmax for the last layer.
To regularize the model, we will also insert a Dropout layer
between consecutive hidden layers.
Dropout works by “dropping out” some unit activations in a
given layer, that is setting them to zero with a given probability.
Our loss function will be the categorical crossentropy.
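To make the recap concrete, here is a tiny NumPy sketch (not part of
the original notebook) of the forward rule above for one hidden layer
with ReLU, together with the dropout idea of zeroing activations with a
given probability; all shapes and values are made up:

import numpy as np

rng = np.random.RandomState(0)
h0 = rng.rand(784)                          # input activations (e.g. one MNIST image)
W1, b1 = rng.randn(512, 784) * 0.01, np.zeros(512)

# h1 = relu(W1 h0 + b1)
h1 = np.maximum(0, W1.dot(h0) + b1)

# dropout at training time: zero each unit with probability p
p = 0.2
mask = rng.binomial(1, 1 - p, size=h1.shape)
h1_dropped = h1 * mask

print(h1.shape, (h1_dropped == 0).mean())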
Model definition
Keras supports two different kind of models: the Sequential
model and the Graph model. The former is used to build linear
stacks of layer (so each layer has one input and one output),
and the latter supports any kind of connection graph.
In our case we build a Sequential model with three Dense (aka
fully connected) layers, with some Dropout. Notice that the
output layer has the softmax activation function.
The resulting model is actually a function of its own inputs
implemented using the Keras backend.
We apply the categorical crossentropy loss and choose SGD as the
optimizer.
Please remind that Keras supports a variety of different
optimizers and loss functions, which you may want to check
out.

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
Introducing ReLU
The ReLU function is defined as $f(x) = \max(0, x)$, [1]
A smooth approximation to the rectifier is the analytic function:
$f(x) = \ln(1 + e^x)$
which is called the softplus function.
The derivative of softplus is $f'(x) = e^x / (e^x + 1) = 1 / (1 + e^{-
x})$, i.e. the logistic function.
[1] [Link] by G. E.
Hinton

Note: Keep in mind this function as it is heavily


used in CNN
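A quick NumPy sketch of the three functions mentioned above (ReLU, its
softplus approximation, and the logistic function, i.e. the derivative
of softplus):

import numpy as np

x = np.linspace(-5, 5, 11)

relu = np.maximum(0, x)             # f(x) = max(0, x)
softplus = np.log(1 + np.exp(x))    # smooth approximation of the rectifier
logistic = 1 / (1 + np.exp(-x))     # derivative of the softplus

print(np.round(relu, 2))
print(np.round(softplus, 2))
print(np.round(logistic, 2))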

from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD

nb_classes = 10

# FC@512+relu -> FC@512+relu ->


FC@nb_classes+softmax
# ... your Code Here

# %load ../solutions/sol_321.py
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.001),
              metrics=['accuracy'])
Data preparation ( [Link] )
We will train our model on the MNIST dataset, which consists of
60,000 28x28 grayscale images of the 10 digits, along with a
test set of 10,000 images.

Since this dataset is provided with Keras, we just ask the
keras.datasets module for the training and test data.

We will:
download the data
reshape data to be in vectorial form (original data are
images)
normalize between 0 and 1.
The categorical_crossentropy loss expects a one-hot vector as
input; therefore we apply the to_categorical function from
keras.utils.np_utils to convert integer labels to one-hot vectors.

from keras.datasets import mnist
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) =


mnist.load_data()

X_train.shape

(60000, 28, 28)

X_train = X_train.reshape(60000, 784)


X_test = X_test.reshape(10000, 784)
X_train = X_train.astype("float32")
X_test = X_test.astype("float32")

# Put everything on grayscale


X_train /= 255
X_test /= 255

# convert class vectors to binary class matrices


Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)

Split Training and Validation Data


from sklearn.model_selection import
train_test_split

X_train, X_val, Y_train, Y_val =


train_test_split(X_train, Y_train)

X_train[0].shape

(784,)

plt.imshow(X_train[0].reshape(28, 28))

<matplotlib.image.AxesImage at 0x7f7f8cea6438>

print(np.asarray(range(10)))
print(Y_train[0].astype('int'))
[0 1 2 3 4 5 6 7 8 9]
[0 0 0 0 0 1 0 0 0 0]

plt.imshow(X_val[0].reshape(28, 28))

<matplotlib.image.AxesImage at 0x7f7f8ce4f9b0>

print(np.asarray(range(10)))
print(Y_val[0].astype('int'))

[0 1 2 3 4 5 6 7 8 9]
[0 0 0 0 1 0 0 0 0 0]
Training
Having defined and compiled the model, it can be trained using
the fit function. We also specify a validation dataset to
monitor validation loss and accuracy.

network_history = [Link](X_train, Y_train,


batch_size=128,
epochs=2, verbose=1,
validation_data=(X_val, Y_val))

Train on 45000 samples, validate on 15000 samples


Epoch 1/2
45000/45000 [==============================] - 1s
- loss: 2.1743 - acc: 0.2946 - val_loss: 2.0402 -
val_acc: 0.5123
Epoch 2/2
45000/45000 [==============================] - 1s
- loss: 1.9111 - acc: 0.6254 - val_loss: 1.7829 -
val_acc: 0.6876

Plotting Network Performance Trend


The return value of the fit function is a
keras.callbacks.History object which contains the entire
history of training/validation loss and accuracy, for each epoch.
We can therefore plot the behaviour of loss and accuracy
during the training phase.
import matplotlib.pyplot as plt
%matplotlib inline

def plot_history(network_history):
    plt.figure()
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.plot(network_history.history['loss'])
    plt.plot(network_history.history['val_loss'])
    plt.legend(['Training', 'Validation'])

    plt.figure()
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.plot(network_history.history['acc'])
    plt.plot(network_history.history['val_acc'])
    plt.legend(['Training', 'Validation'], loc='lower right')
    plt.show()

plot_history(network_history)
After 2 epochs, we get a ~88% validation accuracy.

If you increase the number of epochs, you will definitely get
better results.

Quick Exercise:
Try increasing the number of epochs (if your hardware allows it)

# Your code here


model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.001),
              metrics=['accuracy'])
network_history = model.fit(X_train, Y_train,
                            batch_size=128,
                            epochs=2, verbose=1,
                            validation_data=(X_val, Y_val))

Train on 45000 samples, validate on 15000 samples


Epoch 1/2
45000/45000 [==============================] - 2s
- loss: 0.8966 - acc: 0.8258 - val_loss: 0.8463 -
val_acc: 0.8299
Epoch 2/2
45000/45000 [==============================] - 1s
- loss: 0.8005 - acc: 0.8370 - val_loss: 0.7634 -
val_acc: 0.8382
Introducing the Dropout Layer
The dropout layer has the very specific function of dropping out
a random set of activations in that layer by setting them to
zero in the forward pass. Simple as that.
It helps to avoid overfitting, but it has to be used only at training
time and not at test time.

keras.layers.core.Dropout(rate, noise_shape=None, seed=None)

Applies Dropout to the input.


Dropout consists in randomly setting a fraction rate of input
units to 0 at each update during training time, which helps
prevent overfitting.
Arguments
rate: float between 0 and 1. Fraction of the input units to
drop.
noise_shape: 1D integer tensor representing the shape of
the binary dropout mask that will be multiplied with the
input. For instance, if your inputs have shape (batch_size,
timesteps, features) and you want the dropout mask to be
the same for all timesteps, you can use noise_shape=
(batch_size, 1, features).
seed: A Python integer to use as random seed.
Note: Keras automatically guarantees that this layer is not used
in the Inference (i.e. Prediction) phase (thus it is only used in
training, as it should be!)
See the keras.backend.in_train_phase function

from keras.layers.core import Dropout

## Pls note **where** the `K.in_train_phase` is actually called!!

Dropout??

from keras import backend as K

K.in_train_phase?
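Conceptually, K.in_train_phase just selects between two tensors
depending on the learning phase flag; a minimal sketch of the idea
(this is not the actual Dropout implementation):

import numpy as np
from keras import backend as K

x = K.variable(np.ones((2, 4)))
# at training time return the "dropped" tensor, at test time return x unchanged
out = K.in_train_phase(K.dropout(x, level=0.5), x)

f = K.function([K.learning_phase()], [out])
print(f([1])[0])   # training phase: some activations are zeroed (the rest rescaled)
print(f([0])[0])   # test/inference phase: x is returned unchanged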

Exercise:
Try modifying the previous example network adding a Dropout
layer:

from keras.layers.core import Dropout

# FC@512+relu -> DropOut(0.2) -> FC@512+relu -> DropOut(0.2) -> FC@nb_classes+softmax
# ... your Code Here

# %load ../solutions/sol_312.py
network_history = model.fit(X_train, Y_train,
                            batch_size=128,
                            epochs=4, verbose=1,
                            validation_data=(X_val, Y_val))
plot_history(network_history)

Train on 45000 samples, validate on 15000 samples


Epoch 1/4
45000/45000 [==============================] - 2s
- loss: 1.3746 - acc: 0.6348 - val_loss: 0.6917 -
val_acc: 0.8418
Epoch 2/4
45000/45000 [==============================] - 2s
- loss: 0.6235 - acc: 0.8268 - val_loss: 0.4541 -
val_acc: 0.8795
Epoch 3/4
45000/45000 [==============================] - 1s
- loss: 0.4827 - acc: 0.8607 - val_loss: 0.3795 -
val_acc: 0.8974
Epoch 4/4
45000/45000 [==============================] - 1s
- loss: 0.4218 - acc: 0.8781 - val_loss: 0.3402 -
val_acc: 0.9055
If you continue training, at some point the validation loss will
start to increase: that is when the model starts to overfit.
It is always necessary to monitor training and validation loss
during the training of any kind of Neural Network, either to
detect overfitting or to evaluate the behaviour of the model (any
clue on how to do it??)

# %load solutions/[Link]
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=4, verbose=1)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=SGD(),
              metrics=['accuracy'])

model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=100,
          batch_size=128, verbose=True, callbacks=[early_stop])
Inspecting Layers

# We already used `summary`

model.summary()

__________________________________________________
_______________
Layer (type) Output Shape
Param #
==================================================
===============
dense_4 (Dense) (None, 512)
401920
__________________________________________________
_______________
dropout_1 (Dropout) (None, 512)
0
__________________________________________________
_______________
dense_5 (Dense) (None, 512)
262656
__________________________________________________
_______________
dropout_2 (Dropout) (None, 512)
0
__________________________________________________
_______________
dense_6 (Dense) (None, 10)
5130
==================================================
===============
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
__________________________________________________
_______________

model.layers is iterable

print('Model Input Tensors: ', model.input, end='\n\n')
print('Layers - Network Configuration:', end='\n\n')
for layer in model.layers:
    print(layer.name, layer.trainable)
    print('Layer Configuration:')
    print(layer.get_config(), end='\n{}\n'.format('----'*10))
print('Model Output Tensors: ', model.output)

Model Input Tensors: Tensor("dense_4_input:0",


shape=(?, 784), dtype=float32)

Layers - Network Configuration:

dense_4 True
Layer Configuration:
{'batch_input_shape': (None, 784), 'name':
'dense_4', 'units': 512, 'bias_regularizer': None,
'bias_initializer': {'config': {}, 'class_name':
'Zeros'}, 'trainable': True, 'activation': 'relu',
'use_bias': True, 'bias_constraint': None,
'activity_regularizer': None,
'kernel_regularizer': None, 'kernel_constraint':
None, 'kernel_initializer': {'config': {'seed':
None, 'mode': 'fan_avg', 'scale': 1.0,
'distribution': 'uniform'}, 'class_name':
'VarianceScaling'}, 'dtype': 'float32'}
----------------------------------------
dropout_1 True
Layer Configuration:
{'name': 'dropout_1', 'rate': 0.2, 'trainable':
True}
----------------------------------------
dense_5 True
Layer Configuration:
{'kernel_regularizer': None, 'units': 512,
'bias_regularizer': None, 'bias_initializer':
{'config': {}, 'class_name': 'Zeros'},
'trainable': True, 'activation': 'relu',
'bias_constraint': None, 'activity_regularizer':
None, 'name': 'dense_5', 'kernel_constraint':
None, 'kernel_initializer': {'config': {'seed':
None, 'mode': 'fan_avg', 'scale': 1.0,
'distribution': 'uniform'}, 'class_name':
'VarianceScaling'}, 'use_bias': True}
----------------------------------------
dropout_2 True
Layer Configuration:
{'name': 'dropout_2', 'rate': 0.2, 'trainable':
True}
----------------------------------------
dense_6 True
Layer Configuration:
{'kernel_regularizer': None, 'units': 10,
'bias_regularizer': None, 'bias_initializer':
{'config': {}, 'class_name': 'Zeros'},
'trainable': True, 'activation': 'softmax',
'bias_constraint': None, 'activity_regularizer':
None, 'name': 'dense_6', 'kernel_constraint':
None, 'kernel_initializer': {'config': {'seed':
None, 'mode': 'fan_avg', 'scale': 1.0,
'distribution': 'uniform'}, 'class_name':
'VarianceScaling'}, 'use_bias': True}
----------------------------------------
Model Output Tensors: Tensor("dense_6/Softmax:0",
shape=(?, 10), dtype=float32)
Extract hidden layer representation of the given data
One simple way to do this is to use the weights of your model to
build a new model that is truncated at the layer you want to
read.
Then you can run the .predict(X_batch) method on it to get the
activations for a batch of inputs.

model_truncated = Sequential()
model_truncated.add(Dense(512, activation='relu',
                          input_shape=(784,)))
model_truncated.add(Dropout(0.2))
model_truncated.add(Dense(512, activation='relu'))

for i, layer in enumerate(model_truncated.layers):
    layer.set_weights(model.layers[i].get_weights())

model_truncated.compile(loss='categorical_crossentropy',
                        optimizer=SGD(),
                        metrics=['accuracy'])

# Check
np.all(model_truncated.layers[0].get_weights()[0]
       == model.layers[0].get_weights()[0])

True
hidden_features = model_truncated.predict(X_train)

hidden_features.shape

(45000, 512)

X_train.shape

(45000, 784)

Hint: Alternative Method to get activations
(Using K.function on Tensors)

def get_activations(model, layer, X_batch):
    activations_f = K.function([model.layers[0].input,
                                K.learning_phase()], [layer.output,])
    activations = activations_f((X_batch, False))
    return activations
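
For instance, a minimal usage sketch (the layer choice and the batch size
here are just illustrative):

# activations of the last hidden Dense layer of `model_truncated`
# for the first 128 training images
batch_activations = get_activations(model_truncated,
                                    model_truncated.layers[-1],
                                    X_train[:128])
print(batch_activations[0].shape)  # expected: (128, 512)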

Generate the Embedding of Hidden Features

from sklearn.manifold import TSNE

tsne = TSNE(n_components=2)
X_tsne = tsne.fit_transform(hidden_features[:1000])  ## Reduced for computational issues

colors_map = np.argmax(Y_train, axis=1)

X_tsne.shape

(1000, 2)

nb_classes

10

np.where(colors_map==6)

(array([ 1, 30, 62, 73, 86, 88, 89, 109,


112, 114, 123, 132, 134,
137, 150, 165, 173, 175, 179, 215, 216,
217, 224, 235, 242, 248,
250, 256, 282, 302, 303, 304, 332, 343,
352, 369, 386, 396, 397,
434, 444, 456, 481, 493, 495, 496, 522,
524, 527, 544, 558, 571,
595, 618, 625, 634, 646, 652, 657, 666,
672, 673, 676, 714, 720,
727, 732, 737, 796, 812, 813, 824, 828,
837, 842, 848, 851, 854,
867, 869, 886, 894, 903, 931, 934, 941,
950, 956, 970, 972, 974, 988]),)

colors = np.array([x for x in 'b-g-r-c-m-y-k-purple-coral-lime'.split('-')])
colors_map = colors_map[:1000]
plt.figure(figsize=(10,10))
for cl in range(nb_classes):
    indices = np.where(colors_map==cl)
    plt.scatter(X_tsne[indices, 0], X_tsne[indices, 1],
                c=colors[cl], label=cl)
plt.legend()
plt.show()
Using Bokeh (Interactive Chart)

from bokeh.plotting import figure, output_notebook, show

output_notebook()

<div class="bk-root">
<a href="[Link]
target="_blank" class="bk-logo bk-logo-small bk-
logo-notebook"></a>
<span id="0af86eff-6a55-4644-ab84-
9a6f5fcbeb3e">Loading BokehJS ...</span>
</div>

p = figure(plot_width=600, plot_height=600)

colors = [x for x in 'blue-green-red-cyan-magenta-yellow-black-purple-coral-lime'.split('-')]
colors_map = colors_map[:1000]
for cl in range(nb_classes):
    indices = np.where(colors_map==cl)
    p.circle(X_tsne[indices, 0].ravel(),
             X_tsne[indices, 1].ravel(), size=7,
             color=colors[cl], alpha=0.4,
             legend=str(cl))

# show the results
p.legend.location = 'bottom_right'
show(p)
<div class="bk-root">
<div class="bk-plotdiv" id="e90df1a2-e577-
49fb-89bb-6e083232c9ec"></div>
</div>

Note: We used the default TSNE parameters.
Better results can be achieved by tuning the TSNE hyper-parameters.
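
As a sketch, these are the main knobs worth trying (the values below are
just examples, not tuned settings):

from sklearn.manifold import TSNE

tsne_tuned = TSNE(n_components=2, perplexity=30.0, learning_rate=200.0,
                  n_iter=1000, init='pca')
X_tsne_tuned = tsne_tuned.fit_transform(hidden_features[:1000])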
Exercise 1:

Try a different algorithm to create the manifold.

from sklearn.manifold import MDS

## Your code here

Exercise 2:

Try extracting the hidden features of the first
and the last layer of the model.

## Your code here

## Try using the `get_activations` function relying on the Keras backend
def get_activations(model, layer, X_batch):
    activations_f = K.function([model.layers[0].input,
                                K.learning_phase()], [layer.output,])
    activations = activations_f((X_batch, False))
    return activations
Convolutional Neural Network
References:
Some of the images and the content used here came from this
great couple of blog posts ([1] [Link] and [2]), and from
the terrific book "Neural Networks and Deep Learning" by
Michael Nielsen (strongly recommended).
A convolutional neural network (CNN, or ConvNet) is a type of
feed-forward artificial neural network in which the connectivity
pattern between its neurons is inspired by the organization of
the animal visual cortex.
The networks consist of multiple layers of small neuron
collections which process portions of the input image, called
receptive fields.
The outputs of these collections are then tiled so that their input
regions overlap, to obtain a better representation of the original
image; this is repeated for every such layer.
What does it look like?

source: [Link]
[Link]
The Problem Space

Image Classification
Image classification is the task of taking an input image and
outputting a class (a cat, dog, etc) or a probability of classes
that best describes the image.
For humans, this task of recognition is one of the first skills we
learn from the moment we are born and is one that comes
naturally and effortlessly as adults.
These skills of being able to quickly recognize patterns,
generalize from prior knowledge, and adapt to different image
environments are ones that we do not share with machines.
Inputs and Outputs

source: [Link]
content/uploads/sites/551/2014/11/[Link]
When a computer sees an image (takes an image as input), it
will see an array of pixel values.
Depending on the resolution and size of the image, it will see a
32 x 32 x 3 array of numbers (The 3 refers to RGB values).
Let's say we have a color image in JPG form and its size is 480
x 480: the representative array will be 480 x 480 x 3. Each of
these numbers is given a value from 0 to 255 which describes
the pixel intensity at that point.
Goal
What we want the computer to do is to be able to differentiate
between all the images it’s given and figure out the unique
features that make a dog a dog or that make a cat a cat.
When we look at a picture of a dog, we can classify it as such if
the picture has identifiable features such as paws or 4 legs.
In a similar way, the computer should be able to perform image
classification by looking for low level features such as edges
and curves, and then building up to more abstract concepts
through a series of convolutional layers.
Structure of a CNN
A more detailed overview of what CNNs do would be that
you take the image, pass it through a series of
convolutional, nonlinear, pooling (downsampling), and fully
connected layers, and get an output. As we said earlier, the
output can be a single class or a probability of classes that
best describes the image.
source: [1]
Convolutional Layer
The first layer in a CNN is always a Convolutional Layer.

Reference:
[Link]
html

Convolutional filters
A convolutional filter, much like a kernel in image processing, is
a small matrix useful for blurring, sharpening, embossing, edge
detection, and more.
This is accomplished by means of a convolution between a kernel
and an image.

The main difference here is that the conv


matrices are learned.
As the filter slides, or convolves, around the input image, it
multiplies the values in the filter with the original pixel values
of the image (i.e. it computes element-wise multiplications, which
are then summed up).

Now, we repeat this process for every location on the input
volume (the next step would be moving the filter to the right by 1
unit, then right again by 1, and so on).
After sliding the filter over all the locations, we are left with an
array of numbers usually called an activation map or feature
map.
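
To make the element-wise multiplication and sum concrete, here is a tiny
(and deliberately slow) NumPy sketch of what a single convolutional filter
computes; the 5x5 image and the vertical-edge kernel are just illustrative.

import numpy as np

def naive_conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # element-wise product of the kernel with the current receptive field, then sum
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.random.rand(5, 5)
vertical_edge = np.array([[-1., 0., 1.],
                          [-1., 0., 1.],
                          [-1., 0., 1.]])
print(naive_conv2d(img, vertical_edge).shape)  # (3, 3) activation map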
High Level Perspective
Let's talk briefly about what this convolution is actually doing
from a high level.
Each of these filters can be thought of as feature identifiers
(e.g. straight edges, simple colors, curves)

Visualisation of the Receptive Field


The value is much lower! This is because there wasn’t anything
in the image section that responded to the curve detector filter.
Remember, the output of this conv layer is an activation map.
Going Deeper Through the Network
Now in a traditional convolutional neural network
architecture, there are other layers that are interspersed
between these conv layers.
ReLU (Rectified Linear Units) Layer
After each conv layer, it is conventional to apply a nonlinear layer
(or activation layer) immediately afterward.
The purpose of this layer is to introduce nonlinearity into a
system that has basically just been computing linear operations
during the conv layers (just element-wise multiplications and
summations).
In the past, nonlinear functions like tanh and sigmoid were
used, but researchers found out that ReLU layers work far
better because the network is able to train a lot faster (because
of the computational efficiency) without making a significant
difference to the accuracy.
It also helps to alleviate the vanishing gradient problem,
which is the issue where the lower layers of the network train
very slowly because the gradient decreases exponentially
through the layers
(very briefly)
Vanishing gradient problem depends on the choice of the
activation function.
Many common activation functions (e.g sigmoid or tanh )
squash their input into a very small output range in a very non-
linear fashion.
For example, sigmoid maps the real number line onto a "small"
range of [0, 1].
As a result, there are large regions of the input space which are
mapped to an extremely small range.
In these regions of the input space, even a large change in the
input will produce a small change in the output - hence the
gradient is small.

ReLu
The ReLu function is defined as $f(x) = \max(0, x)$ [2].
A smooth approximation to the rectifier is the analytic function
$f(x) = \ln(1 + e^x)$,
which is called the softplus function.
The derivative of softplus is $f'(x) = e^x / (e^x + 1) = 1 / (1 + e^{-x})$, i.e. the logistic function.
[2] [Link] by G. E.
Hinton
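
As a quick numeric check of the formulas above (a sketch, with arbitrary
sample points):

import numpy as np

relu = lambda x: np.maximum(0., x)
softplus = lambda x: np.log1p(np.exp(x))
logistic = lambda x: 1. / (1. + np.exp(-x))

x = np.linspace(-3., 3., 7)
eps = 1e-5
# the numerical derivative of softplus matches the logistic function
print(np.allclose((softplus(x + eps) - softplus(x - eps)) / (2 * eps),
                  logistic(x), atol=1e-4))  # True
print(relu(x))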
Pooling Layers
After some ReLU layers, it is customary to apply a pooling
layer (aka downsampling layer).
In this category, there are also several layer options, with
maxpooling being the most popular.
Example of a MaxPooling filter

Other options for pooling layers are average pooling and L2-
norm pooling.
The intuition behind this Pooling layer is that once we know that
a specific feature is in the original input volume (there will be a
high activation value), its exact location is not as important as
its relative location to the other features.
Therefore this layer drastically reduces the spatial dimension
(the length and the width but not the depth) of the input volume.
This serves two main purposes: reducing the number of
parameters, and controlling overfitting.
An intuitive explanation of the usefulness of pooling can be
given with an example:
Let's assume that we have a filter that is used for detecting
faces. The exact pixel location of the face is less relevant than
the fact that there is a face "somewhere at the top".
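
A minimal Keras sketch of this behaviour (the 28x28x32 input shape is just
an example): a 2x2 max pooling halves width and height while leaving the
depth untouched.

from keras.models import Sequential
from keras.layers import MaxPooling2D

pool_demo = Sequential()
pool_demo.add(MaxPooling2D(pool_size=(2, 2), input_shape=(28, 28, 32)))
print(pool_demo.output_shape)  # (None, 14, 14, 32)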
Dropout Layer
The dropout layers have the very specific function of dropping out
a random set of activations in that layer by setting them to
zero in the forward pass. Simple as that.
It helps to avoid overfitting, but it has to be used only at training
time and not at test time.
Fully Connected Layer
The last layer, however, is an important one, namely the Fully
Connected Layer.
Basically, a FC layer looks at what high level features most
strongly correlate to a particular class and has particular
weights so that when you compute the products between the
weights and the previous layer, you get the correct probabilities
for the different classes.
Going further: Convolution Arithmetic
If you want to go further with Convolution and you want to fully
understand how convolution works with all the details we
omitted in this notebook, I strongly suggest to read this terrific
paper: A guide to convolution arithmetic for deep learning.
This paper is also referenced (with animations) in the theano
main documentation: convnet tutorial
CNN in Keras
Keras has extensive support for Convolutional Layers:
1D Convolutional Layers;
2D Convolutional Layers;
3D Convolutional Layers;
Depthwise Convolution;
Transpose Convolution;
....
The corresponding keras package is
[Link] .

Take a look at the Convolutional Layers documentation to know


more about Conv Layers that are missing in this notebook.

Convolution1D

from keras.layers import Conv1D

Conv1D(filters, kernel_size, strides=1, padding='valid',
       dilation_rate=1, activation=None, use_bias=True,
       kernel_initializer='glorot_uniform', bias_initializer='zeros',
       kernel_regularizer=None, bias_regularizer=None,
       activity_regularizer=None,
       kernel_constraint=None, bias_constraint=None)
Arguments:
filters: Integer, the dimensionality of the output space (i.e.
the number output of filters in the convolution).
kernel_size: An integer or tuple/list of a single integer,
specifying the length of the 1D convolution window.
strides: An integer or tuple/list of a single integer,
specifying the stride length of the convolution. Specifying
any stride value != 1 is incompatible with specifying any
dilation_rate value != 1.
padding: One of "valid" , "causal" or "same"
(case-insensitive). "causal" results in causal (dilated)
convolutions, e.g. output[t] does not depend on input[t+1:].
Useful when modeling temporal data where the model
should not violate the temporal order. See WaveNet: A
Generative Model for Raw Audio, section 2.1.
dilation_rate: an integer or tuple/list of a single integer,
specifying the dilation rate to use for dilated convolution.
Currently, specifying any dilation_rate value != 1 is
incompatible with specifying any strides value != 1.
activation: Activation function to use (see activations). If
you don't specify anything, no activation is applied (ie.
"linear" activation: a(x) = x ).
use_bias: Boolean, whether the layer uses a bias vector.
kernel_initializer: Initializer for the kernel weights
matrix (see initializers).
bias_initializer: Initializer for the bias vector (see
initializers).
kernel_regularizer: Regularizer function applied to the
kernel weights matrix (see regularizer).
bias_regularizer: Regularizer function applied to the bias
vector (see regularizer).
activity_regularizer: Regularizer function applied to the
output of the layer (its "activation"). (see regularizer).
kernel_constraint: Constraint function applied to the kernel
matrix (see constraints).
bias_constraint: Constraint function applied to the bias
vector (see constraints).
Convolution operator for filtering neighborhoods of one-
dimensional inputs. When using this layer as the first layer
in a model, either provide the keyword argument
input_dim (int, e.g. 128 for sequences of 128-
dimensional vectors), or input_shape (tuple of integers,
e.g. (10, 128) for sequences of 10 vectors of 128-
dimensional vectors).

Example

# apply a 1D convolution of length 3 to a sequence with 10 timesteps,
# with 64 output filters
model = Sequential()
model.add(Conv1D(64, 3, padding='same', input_shape=(10, 32)))
# now model.output_shape == (None, 10, 64)

# add a new conv1d on top
model.add(Conv1D(32, 3, padding='same'))
# now model.output_shape == (None, 10, 32)

Convolution2D
from keras.layers import Conv2D

Conv2D(filters, kernel_size, strides=(1, 1), padding='valid',
       data_format=None, dilation_rate=(1, 1), activation=None,
       use_bias=True,
       kernel_initializer='glorot_uniform', bias_initializer='zeros',
       kernel_regularizer=None, bias_regularizer=None,
       activity_regularizer=None,
       kernel_constraint=None, bias_constraint=None)

Arguments:
filters: Integer, the dimensionality of the output space (i.e.
the number output of filters in the convolution).
kernel_size: An integer or tuple/list of 2 integers, specifying
the width and height of the 2D convolution window. Can be
a single integer to specify the same value for all spatial
dimensions.
strides: An integer or tuple/list of 2 integers, specifying the
strides of the convolution along the width and height. Can
be a single integer to specify the same value for all spatial
dimensions. Specifying any stride value != 1 is incompatible
with specifying any dilation_rate value != 1.
padding: one of "valid" or "same" (case-insensitive).
data_format: A string, one of channels_last (default) or
channels_first . The ordering of the dimensions in the
inputs. channels_last corresponds to inputs with shape
(batch, height, width, channels) while
channels_first corresponds to inputs with shape
(batch, channels, height, width) . It defaults to the
image_data_format value found in your Keras config file
at ~/.keras/[Link] . If you never set it, then it will
be "channels_last".
dilation_rate: an integer or tuple/list of 2 integers,
specifying the dilation rate to use for dilated convolution.
Can be a single integer to specify the same value for all
spatial dimensions. Currently, specifying any
dilation_rate value != 1 is incompatible with specifying
any stride value != 1.
activation: Activation function to use (see activations). If
you don't specify anything, no activation is applied (ie.
"linear" activation: a(x) = x ).
use_bias: Boolean, whether the layer uses a bias vector.
kernel_initializer: Initializer for the kernel weights
matrix (see initializers).
bias_initializer: Initializer for the bias vector (see
initializers).
kernel_regularizer: Regularizer function applied to the
kernel weights matrix (see regularizer).
bias_regularizer: Regularizer function applied to the bias
vector (see regularizer).
activity_regularizer: Regularizer function applied to the
output of the layer (its "activation"). (see regularizer).
kernel_constraint: Constraint function applied to the kernel
matrix (see constraints).
bias_constraint: Constraint function applied to the bias
vector (see constraints).

Example
Assuming K.image_data_format() == "channels_last"

# apply a 3x3 convolution with 64 output filters on a 256x256 RGB image:
model = Sequential()
model.add(Conv2D(64, (3, 3), padding='same',
                 input_shape=(256, 256, 3)))
# now model.output_shape == (None, 256, 256, 64)

# add a 3x3 convolution on top, with 32 output filters:
model.add(Conv2D(32, (3, 3), padding='same'))
# now model.output_shape == (None, 256, 256, 32)
Dimensions of Conv filters in Keras
The complex structure of ConvNets may lead to a
representation that is challenging to understand.
Of course, the dimensions vary according to the dimension of
the Convolutional filters (e.g. 1D, 2D)

Convolution1D
Input Shape:
3D tensor with shape: ( batch_size , steps , input_dim ).

Output Shape:
3D tensor with shape: ( batch_size , new_steps ,
filters ).

Convolution2D
Input Shape:
4D tensor with shape:
( batch_size , channels , rows , cols ) if
image_data_format='channels_first'
( batch_size , rows , cols , channels ) if
image_data_format='channels_last'

Output Shape:
4D tensor with shape:
( batch_size , filters , new_rows , new_cols ) if
image_data_format='channels_first'
( batch_size , new_rows , new_cols , filters ) if
image_data_format='channels_last'
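
A quick sanity check of the 2D shapes above, assuming the (default)
channels_last ordering; the 32x32x3 input and the 16 filters are just
illustrative values.

from keras.models import Sequential
from keras.layers import Conv2D

shape_demo = Sequential()
shape_demo.add(Conv2D(16, (3, 3), padding='valid', input_shape=(32, 32, 3)))
print(shape_demo.input_shape)   # (None, 32, 32, 3)   -> (batch_size, rows, cols, channels)
print(shape_demo.output_shape)  # (None, 30, 30, 16)  -> (batch_size, new_rows, new_cols, filters)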
Convolution Nets for MNIST
Deep Learning models can take quite a bit of time to run,
particularly if GPU isn't used.
In the interest of time, you could sample a subset of
observations (e.g. $1000$) that are a particular number of your
choice (e.g. $6$) and $1000$ observations that aren't that
particular number (i.e. $\neq 6$).
We will build a model using that and see how it performs on the
test dataset

# Import the required libraries
import numpy as np
np.random.seed(1338)

from keras.datasets import mnist

Using TensorFlow backend.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.utils import np_utils
from keras.optimizers import SGD
Loading Data

# Load the training and testing data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_test_orig = X_test
Data Preparation

Very Important:
When dealing with images & convolutions, it is paramount to
handle image_data_format properly

from keras import backend as K

img_rows, img_cols = 28, 28

if K.image_data_format() == 'channels_first':
shape_ord = (1, img_rows, img_cols)
else: # channel_last
shape_ord = (img_rows, img_cols, 1)

Preprocess and Normalise Data

X_train = X_train.reshape((X_train.shape[0],) + shape_ord)
X_test = X_test.reshape((X_test.shape[0],) + shape_ord)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

X_train /= 255
X_test /= 255
np.random.seed(1338)  # for reproducibility!!

# Test data
X_test = X_test.copy()
Y = y_test.copy()

# Converting the output to binary classification (Six=1, Not Six=0)
Y_test = Y == 6
Y_test = Y_test.astype(int)

# Selecting the 5918 examples where the output is 6
X_six = X_train[y_train == 6].copy()
Y_six = y_train[y_train == 6].copy()

# Selecting the examples where the output is not 6
X_not_six = X_train[y_train != 6].copy()
Y_not_six = y_train[y_train != 6].copy()

# Selecting 6000 random examples from the data that
# only contains the data where the output is not 6
random_rows = np.random.randint(0, X_six.shape[0], 6000)
X_not_six = X_not_six[random_rows]
Y_not_six = Y_not_six[random_rows]

# Appending the data with output as 6 and data with output as <> 6
X_train = np.append(X_six, X_not_six)

# Reshaping the appended data to the appropriate form
X_train = X_train.reshape((X_six.shape[0] +
                           X_not_six.shape[0],) + shape_ord)

# Appending the labels and converting the labels to
# binary classification (Six=1, Not Six=0)
Y_labels = np.append(Y_six, Y_not_six)
Y_train = Y_labels == 6
Y_train = Y_train.astype(int)

print(X_train.shape, Y_labels.shape, X_test.shape, Y_test.shape)

(11918, 28, 28, 1) (11918,) (10000, 28, 28, 1) (10000,)

# Converting the classes to their binary categorical form
nb_classes = 2
Y_train = np_utils.to_categorical(Y_train, nb_classes)
Y_test = np_utils.to_categorical(Y_test, nb_classes)
A simple CNN

# -- Initializing the values for the convolutional neural network

nb_epoch = 2  # kept very low! Please increase if you have a GPU

batch_size = 64
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
nb_pool = 2
# convolution kernel size
nb_conv = 3

# Vanilla SGD
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

Step 1: Model Definition

model = Sequential()

model.add(Conv2D(nb_filters, (nb_conv, nb_conv), padding='valid',
                 input_shape=shape_ord))  # note: the very first layer **must** always specify the input_shape
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

Step 2: Compile

model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

Step 3: Fit

hist = model.fit(X_train, Y_train, batch_size=batch_size,
                 epochs=nb_epoch, verbose=1,
                 validation_data=(X_test, Y_test))

Train on 11918 samples, validate on 10000 samples


Epoch 1/2
11918/11918 [==============================] - 8s
- loss: 0.2321 - acc: 0.9491 - val_loss: 0.1276 -
val_acc: 0.9616
Epoch 2/2
11918/11918 [==============================] - 1s
- loss: 0.1065 - acc: 0.9666 - val_loss: 0.0933 -
val_acc: 0.9685

import matplotlib.pyplot as plt
%matplotlib inline

plt.figure()
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.legend(['Training', 'Validation'])

plt.figure()
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.plot(hist.history['acc'])
plt.plot(hist.history['val_acc'])
plt.legend(['Training', 'Validation'], loc='lower right')

<[Link] at 0x7fdb2c0235f8>
Step 4: Evaluate

print('Available Metrics in Model: {}'.format(model.metrics_names))

Available Metrics in Model: ['loss', 'acc']

# Evaluating the model on the test data
loss, accuracy = model.evaluate(X_test, Y_test, verbose=0)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)

Test Loss: 0.0933376350194


Test Accuracy: 0.9685

Let's plot our model Predictions!

import matplotlib.pyplot as plt

%matplotlib inline

slice = 15
predicted = model.predict(X_test[:slice]).argmax(-1)

plt.figure(figsize=(16,8))
for i in range(slice):
    plt.subplot(1, slice, i+1)
    plt.imshow(X_test_orig[i], interpolation='nearest')
    plt.text(0, 0, predicted[i], color='black',
             bbox=dict(facecolor='white', alpha=1))
    plt.axis('off')
Adding more Dense Layers

model = Sequential()
model.add(Conv2D(nb_filters, (nb_conv, nb_conv), padding='valid',
                 input_shape=shape_ord))
model.add(Activation('relu'))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))

model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=batch_size,
          epochs=nb_epoch, verbose=1,
          validation_data=(X_test, Y_test))

Train on 11918 samples, validate on 10000 samples


Epoch 1/2
11918/11918 [==============================] - 2s
- loss: 0.1922 - acc: 0.9503 - val_loss: 0.0864 -
val_acc: 0.9721
Epoch 2/2
11918/11918 [==============================] - 1s
- loss: 0.0902 - acc: 0.9705 - val_loss: 0.0898 -
val_acc: 0.9676

<[Link] at 0x7fdacc048cf8>

# Evaluating the model on the test data
score, accuracy = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score)
print('Test accuracy:', accuracy)

Test score: 0.0898462146357


Test accuracy: 0.9676
Adding Dropout

model = Sequential()

model.add(Conv2D(nb_filters, (nb_conv, nb_conv), padding='valid',
                 input_shape=shape_ord))
model.add(Activation('relu'))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=batch_size,
          epochs=nb_epoch, verbose=1,
          validation_data=(X_test, Y_test))

Train on 11918 samples, validate on 10000 samples


Epoch 1/2
11918/11918 [==============================] - 1s
- loss: 0.2394 - acc: 0.9330 - val_loss: 0.1882 -
val_acc: 0.9355
Epoch 2/2
11918/11918 [==============================] - 1s
- loss: 0.1038 - acc: 0.9654 - val_loss: 0.0900 -
val_acc: 0.9679

<[Link] at 0x7fdacc064be0>

# Evaluating the model on the test data
score, accuracy = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score)
print('Test accuracy:', accuracy)

Test score: 0.0900323278204


Test accuracy: 0.9679
Adding more Convolution Layers

model = Sequential()
model.add(Conv2D(nb_filters, (nb_conv, nb_conv), padding='valid',
                 input_shape=shape_ord))
model.add(Activation('relu'))
model.add(Conv2D(nb_filters, (nb_conv, nb_conv)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=batch_size,
          epochs=nb_epoch, verbose=1,
          validation_data=(X_test, Y_test))

Train on 11918 samples, validate on 10000 samples


Epoch 1/2
11918/11918 [==============================] - 2s
- loss: 0.3680 - acc: 0.8722 - val_loss: 0.1699 -
val_acc: 0.9457
Epoch 2/2
11918/11918 [==============================] - 2s
- loss: 0.1380 - acc: 0.9508 - val_loss: 0.0600 -
val_acc: 0.9793

<[Link] at 0x7fdb308ea978>

# Evaluating the model on the test data
score, accuracy = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score)
print('Test accuracy:', accuracy)

Test score: 0.0600312609494


Test accuracy: 0.9793
Exercise
The above code has been written as a function.
Change some of the hyperparameters and see what happens.

# Function for constructing the convolutional neural network
# Feel free to add parameters, if you want

def build_model():
    """Build, train and evaluate the CNN defined above."""
    model = Sequential()
    model.add(Conv2D(nb_filters, (nb_conv, nb_conv),
                     padding='valid',
                     input_shape=shape_ord))
    model.add(Activation('relu'))
    model.add(Conv2D(nb_filters, (nb_conv, nb_conv)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
    model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(128))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(nb_classes))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  optimizer='sgd',
                  metrics=['accuracy'])

    model.fit(X_train, Y_train, batch_size=batch_size,
              epochs=nb_epoch, verbose=1,
              validation_data=(X_test, Y_test))

    # Evaluating the model on the test data
    score, accuracy = model.evaluate(X_test, Y_test, verbose=0)
    print('Test score:', score)
    print('Test accuracy:', accuracy)

# Timing how long it takes to build the model and test it.
%timeit -n1 -r1 build_model()

Train on 11918 samples, validate on 10000 samples


Epoch 1/2
11918/11918 [==============================] - 2s
- loss: 0.3752 - acc: 0.8672 - val_loss: 0.1512 -
val_acc: 0.9505
Epoch 2/2
11918/11918 [==============================] - 2s
- loss: 0.1384 - acc: 0.9528 - val_loss: 0.0672 -
val_acc: 0.9775
Test score: 0.0671689324878
Test accuracy: 0.9775
5.98 s ± 0 ns per loop (mean ± std. dev. of 1 run,
1 loop each)
Understanding Convolutional Layers
Structure
In this exercise we want to build a (quite shallow) network
which contains two [Convolution, Convolution, MaxPooling]
stages, and two Dense layers.
To test a different optimizer, we will use AdaDelta, which is a bit
more complex than the simple Vanilla SGD with momentum.
A possible sketch of this architecture is shown right after the exercise cell below.

from keras.optimizers import Adadelta

input_shape = shape_ord
nb_classes = 10

## [conv@32x3x3+relu]x2 --> MaxPool@2x2 --> DropOut@0.25 -->
## [conv@64x3x3+relu]x2 --> MaxPool@2x2 --> DropOut@0.25 -->
## Flatten --> FC@512+relu --> DropOut@0.5 --> FC@nb_classes+SoftMax
## NOTE: each couple of Conv filters must have `padding="same"` and `"valid"`, respectively

# %load solutions/[Link]
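
A possible sketch of the architecture described above (the actual solutions
file may differ; this version mirrors the layer shapes printed in the next
cell):

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3), padding='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3), padding='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=Adadelta(),
              metrics=['accuracy'])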

Understanding layer shapes

An important feature of Keras layers is that each of them has
an input_shape attribute, which you can use to visualize the
shape of the input tensor, and an output_shape attribute, for
inspecting the shape of the output tensor.
As we can see, the input shape of the first convolutional layer
corresponds to the input_shape attribute (which must be
specified by the user).
In this case, it is a 28x28 image with a single channel.

Since this convolutional layer has the padding set to same ,
its output width and height will remain the same, and the
number of output channels will be equal to the number of filters
learned by the layer, 32.
The following convolutional layer, instead, has the default
padding ( valid ), and therefore reduces width and height by $(k-1)$,
where $k$ is the size of the kernel.
MaxPooling layers, instead, reduce width and height of the
input tensor, but keep the same number of channels.
Activation layers, of course, don't change the shape.

for i, layer in enumerate(model.layers):
    print("Layer", i, "\t", layer.name, "\t\t",
          layer.input_shape, "\t", layer.output_shape)

Layer 0 conv2d_12 (None, 28, 28, 1)


(None, 28, 28, 32)
Layer 1 activation_21 (None, 28, 28,
32) (None, 28, 28, 32)
Layer 2 conv2d_13 (None, 28, 28, 32)
(None, 26, 26, 32)
Layer 3 activation_22 (None, 26, 26,
32) (None, 26, 26, 32)
Layer 4 max_pooling2d_5 (None, 26,
26, 32) (None, 13, 13, 32)
Layer 5 dropout_6 (None, 13, 13, 32)
(None, 13, 13, 32)
Layer 6 conv2d_14 (None, 13, 13, 32)
(None, 13, 13, 64)
Layer 7 activation_23 (None, 13, 13,
64) (None, 13, 13, 64)
Layer 8 conv2d_15 (None, 13, 13, 64)
(None, 11, 11, 64)
Layer 9 activation_24 (None, 11, 11,
64) (None, 11, 11, 64)
Layer 10 max_pooling2d_6 (None, 11,
11, 64) (None, 5, 5, 64)
Layer 11 dropout_7 (None, 5, 5, 64)
(None, 5, 5, 64)
Layer 12 flatten_6 (None, 5, 5, 64)
(None, 1600)
Layer 13 dense_10 (None, 1600)
(None, 512)
Layer 14 activation_25 (None, 512)
(None, 512)
Layer 15 dropout_8 (None, 512)
(None, 512)
Layer 16 dense_11 (None, 512)
(None, 10)
Layer 17 activation_26 (None, 10)
(None, 10)

Understanding weights shape

In the same way, we can visualize the shape of the weights
learned by each layer.
In particular, Keras lets you inspect weights by using the
get_weights method of a layer object.

This will return a list with two elements, the first one being the
weight tensor and the second one being the bias vector.
In particular:
MaxPooling layers don't have any weight tensor, since they
don't have learnable parameters.
Convolutional layers, instead, learn a $(k, k, n_i, n_o)$
weight tensor (with the TensorFlow ordering), where $k$ is the size of the kernel, $n_i$ is
the number of channels of the input tensor, and $n_o$ is
the number of filters to be learned.
For each of the $n_o$ filters, a bias is also learned.
Dense layers learn a $(n_i, n_o)$ weight tensor, where
$n_o$ is the output size and $n_i$ is the input size of the
layer. Each of the $n_o$ neurons also has a bias.

for i, layer in enumerate(model.layers):
    if len(layer.get_weights()) > 0:
        W, b = layer.get_weights()
        print("Layer", i, "\t", layer.name,
              "\t\t", W.shape, "\t", b.shape)

Layer 0 conv2d_12 (3, 3, 1, 32)


(32,)
Layer 2 conv2d_13 (3, 3, 32, 32)
(32,)
Layer 6 conv2d_14 (3, 3, 32, 64)
(64,)
Layer 8 conv2d_15 (3, 3, 64, 64)
(64,)
Layer 13 dense_10 (1600, 512)
(512,)
Layer 16 dense_11 (512, 10)
(10,)
Batch Normalisation
Normalize the activations of the previous layer at each batch,
i.e. applies a transformation that maintains the mean activation
close to 0 and the activation standard deviation close to 1.
How to BatchNorm in Keras

from keras.layers.normalization import BatchNormalization

BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001,
                   center=True, scale=True,
                   beta_initializer='zeros',
                   gamma_initializer='ones',
                   moving_mean_initializer='zeros',
                   moving_variance_initializer='ones',
                   beta_regularizer=None, gamma_regularizer=None,
                   beta_constraint=None, gamma_constraint=None)

Arguments
axis: Integer, the axis that should be normalized (typically
the features axis). For instance, after a Conv2D layer with
data_format="channels_first" , set axis=1 in
BatchNormalization .
momentum: Momentum for the moving average.
epsilon: Small float added to variance to avoid dividing by
zero.
center: If True, add offset of beta to normalized tensor. If
False, beta is ignored.
scale: If True, multiply by gamma . If False, gamma is not
used. When the next layer is linear (also e.g. [Link] ),
this can be disabled since the scaling will be done by the
next layer.
beta_initializer: Initializer for the beta weight.
gamma_initializer: Initializer for the gamma weight.
moving_mean_initializer: Initializer for the moving mean.
moving_variance_initializer: Initializer for the moving
variance.
beta_regularizer: Optional regularizer for the beta weight.
gamma_regularizer: Optional regularizer for the gamma
weight.
beta_constraint: Optional constraint for the beta weight.
gamma_constraint: Optional constraint for the gamma
weight.
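
As a usage sketch (not tied to any specific model in this notebook),
BatchNormalization is simply inserted as an extra layer, typically right
after a Conv2D or Dense layer and before (or after) its activation; the
shapes below are just illustrative.

from keras.models import Sequential
from keras.layers import Conv2D, Activation, Flatten, Dense
from keras.layers.normalization import BatchNormalization

bn_model = Sequential()
bn_model.add(Conv2D(32, (3, 3), padding='same', input_shape=(28, 28, 1)))
bn_model.add(BatchNormalization())   # normalise the conv activations batch-wise
bn_model.add(Activation('relu'))
bn_model.add(Flatten())
bn_model.add(Dense(10, activation='softmax'))
bn_model.compile(loss='categorical_crossentropy', optimizer='sgd',
                 metrics=['accuracy'])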

Exercise

# Try to add a new BatchNormalization layer to the Model
# (after the Dropout layer) - before or after the ReLU Activation
Addendum:
CNN on CIFAR10
Convolutional Neural Network
In this second exercise-notebook we will play with
Convolutional Neural Networks (CNN).
As you should have seen, a CNN is a feed-forward neural
network typically composed of Convolutional, MaxPooling and
Dense layers.
If the task implemented by the CNN is a classification task, the
last Dense layer should use the Softmax activation, and the
loss should be the categorical crossentropy.
Reference:
[Link]
[Link]
Training the network
We will train our network on the CIFAR10 dataset, which
contains 50,000 32x32 color training images, labeled over 10
categories, and 10,000 test images.
As this dataset is also included in Keras datasets, we just ask
the keras.datasets module for it.

Training and test images are normalized to lie in the
$\left[0,1\right]$ interval.

from keras.datasets import cifar10
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
X_train = X_train.astype("float32")
X_test = X_test.astype("float32")
X_train /= 255
X_test /= 255

To reduce the risk of overfitting, we also apply some image
transformations, like rotations, shifts and flips. All these can be
easily implemented using the Keras Image Data Generator.
Warning: The following cells may be computationally intensive...

from keras.preprocessing.image import ImageDataGenerator

generated_images = ImageDataGenerator(
featurewise_center=True, # set input mean to
0 over the dataset
samplewise_center=False, # set each sample
mean to 0
featurewise_std_normalization=True, # divide
inputs by std of the dataset
samplewise_std_normalization=False, # divide
each input by its std
zca_whitening=False, # apply ZCA whitening
rotation_range=0, # randomly rotate images in
the range (degrees, 0 to 180)
width_shift_range=0.2, # randomly shift
images horizontally (fraction of total width)
height_shift_range=0.2, # randomly shift
images vertically (fraction of total height)
horizontal_flip=True, # randomly flip images
vertical_flip=False) # randomly flip images

generated_images.fit(X_train)

Now we can start training.
At each iteration, a batch of 500 images is requested from the
ImageDataGenerator object, and then fed to the network.

X_train.shape
(50000, 3, 32, 32)

gen = generated_images.flow(X_train, Y_train,


batch_size=500, shuffle=True)
X_batch, Y_batch = next(gen)

X_batch.shape

(500, 3, 32, 32)

from keras.utils import generic_utils

n_epochs = 2
for e in range(n_epochs):
    print('Epoch', e)
    print('Training...')
    progbar = generic_utils.Progbar(X_train.shape[0])

    # the generator loops forever, so we stop after one pass over the data
    samples_seen = 0
    for X_batch, Y_batch in generated_images.flow(X_train, Y_train,
                                                  batch_size=500, shuffle=True):
        loss = model.train_on_batch(X_batch, Y_batch)
        progbar.add(X_batch.shape[0], values=[('train loss', loss[0])])
        samples_seen += X_batch.shape[0]
        if samples_seen >= X_train.shape[0]:
            break
Deep Network Models
Constructing and training your own ConvNet from scratch can
be a hard and long task.
A common trick used in Deep Learning is to use a pre-trained
model and finetune it to the specific data it will be used for.
Famous Models with Keras
This notebook contains code and references for the following
Keras models (gathered from
[Link])
VGG16
VGG19
ResNet50
Inception v3
Xception
... more to come
References
Very Deep Convolutional Networks for Large-Scale Image
Recognition - please cite this paper if you use the VGG
models in your work.
Deep Residual Learning for Image Recognition - please cite
this paper if you use the ResNet model in your work.
Rethinking the Inception Architecture for Computer Vision -
please cite this paper if you use the Inception v3 model in
your work.
All architectures are compatible with both TensorFlow and
Theano, and upon instantiation the models will be built
according to the image dimension ordering set in your Keras
configuration file at ~/.keras/[Link] .

For instance, if you have set


image_data_format="channels_last" , then any model
loaded from this repository will get built according to the
TensorFlow dimension ordering convention, "Width-Height-
Depth".
VGG16
VGG19
[Link]

from keras.applications import VGG16
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
import os

Using TensorFlow backend.

vgg16 = VGG16(include_top=True, weights='imagenet')
vgg16.summary()

__________________________________________________
_______________
Layer (type) Output Shape
Param #
==================================================
===============
input_1 (InputLayer) (None, 224, 224, 3)
0
__________________________________________________
_______________
block1_conv1 (Conv2D) (None, 224, 224, 64)
1792
__________________________________________________
_______________
block1_conv2 (Conv2D) (None, 224, 224, 64)
36928
__________________________________________________
_______________
block1_pool (MaxPooling2D) (None, 112, 112, 64)
0
__________________________________________________
_______________
block2_conv1 (Conv2D) (None, 112, 112, 128)
73856
__________________________________________________
_______________
block2_conv2 (Conv2D) (None, 112, 112, 128)
147584
__________________________________________________
_______________
block2_pool (MaxPooling2D) (None, 56, 56, 128)
0
__________________________________________________
_______________
block3_conv1 (Conv2D) (None, 56, 56, 256)
295168
__________________________________________________
_______________
block3_conv2 (Conv2D) (None, 56, 56, 256)
590080
__________________________________________________
_______________
block3_conv3 (Conv2D) (None, 56, 56, 256)
590080
__________________________________________________
_______________
block3_pool (MaxPooling2D) (None, 28, 28, 256)
0
__________________________________________________
_______________
block4_conv1 (Conv2D) (None, 28, 28, 512)
1180160
__________________________________________________
_______________
block4_conv2 (Conv2D) (None, 28, 28, 512)
2359808
__________________________________________________
_______________
block4_conv3 (Conv2D) (None, 28, 28, 512)
2359808
__________________________________________________
_______________
block4_pool (MaxPooling2D) (None, 14, 14, 512)
0
__________________________________________________
_______________
block5_conv1 (Conv2D) (None, 14, 14, 512)
2359808
__________________________________________________
_______________
block5_conv2 (Conv2D) (None, 14, 14, 512)
2359808
__________________________________________________
_______________
block5_conv3 (Conv2D) (None, 14, 14, 512)
2359808
__________________________________________________
_______________
block5_pool (MaxPooling2D) (None, 7, 7, 512)
0
__________________________________________________
_______________
flatten (Flatten) (None, 25088)
0
__________________________________________________
_______________
fc1 (Dense) (None, 4096)
102764544
__________________________________________________
_______________
fc2 (Dense) (None, 4096)
16781312
__________________________________________________
_______________
predictions (Dense) (None, 1000)
4097000
==================================================
===============
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
__________________________________________________
_______________

If you're wondering where these HDF5 files with the weights are
stored, please take a look at ~/.keras/models/

HandsOn VGG16 - Pre-trained Weights

IMAGENET_FOLDER = 'img/imagenet' #in the repo

!ls img/imagenet

apricot_565.jpeg apricot_787.jpeg
strawberry_1174.jpeg

apricot_696.jpeg strawberry_1157.jpeg
strawberry_1189.jpeg
from keras.preprocessing import image
import numpy as np

img_path = os.path.join(IMAGENET_FOLDER, 'strawberry_1157.jpeg')
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print('Input image shape:', x.shape)

preds = vgg16.predict(x)
print('Predicted:', decode_predictions(preds))

Input image shape: (1, 224, 224, 3)


Predicted: [[('n07745940', 'strawberry',
0.98570204), ('n07836838', 'chocolate_sauce',
0.005128039), ('n04332243', 'strainer',
0.003665844), ('n07614500', 'ice_cream',
0.0021996102), ('n04476259', 'tray',
0.0011693746)]]
img_path = os.path.join(IMAGENET_FOLDER, 'apricot_696.jpeg')
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print('Input image shape:', x.shape)

preds = vgg16.predict(x)
print('Predicted:', decode_predictions(preds))

Input image shape: (1, 224, 224, 3)


Predicted: [[('n07747607', 'orange', 0.84150302),
('n07749582', 'lemon', 0.053847123), ('n07717556',
'butternut_squash', 0.017796788), ('n03937543',
'pill_bottle', 0.015318954), ('n07720875',
'bell_pepper', 0.0083615109)]]
img_path = os.path.join(IMAGENET_FOLDER, 'apricot_565.jpeg')
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print('Input image shape:', x.shape)

preds = vgg16.predict(x)
print('Predicted:', decode_predictions(preds))

Input image shape: (1, 224, 224, 3)


Predicted: [[('n07718472', 'cucumber',
0.37647018), ('n07716358', 'zucchini',
0.25893891), ('n07711569', 'mashed_potato',
0.049320061), ('n07716906', 'spaghetti_squash',
0.033613835), ('n12144580', 'corn', 0.031451162)]]
Hands On:
Try to do the same with the VGG19 Model

# from keras.applications import VGG19

# - Visualise Summary
# - Infer classes using VGG19 predictions

# [your code here]

Residual Networks
ResNet 50
from keras.applications import ResNet50

A ResNet is composed of two main blocks: the Identity Block and
the ConvBlock.
IdentityBlock is the block that has no conv layer at the shortcut
ConvBlock is the block that has a conv layer at the shortcut

from keras.applications.resnet50 import identity_block, conv_block

identity_block??

conv_block??
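
As a sketch mirroring the VGG16 hands-on above (it reuses IMAGENET_FOLDER
and the image/preprocess_input/decode_predictions helpers imported earlier;
the chosen picture is just an example), a pre-trained ResNet50 can be used
for prediction in exactly the same way:

resnet50 = ResNet50(include_top=True, weights='imagenet')

img_path = os.path.join(IMAGENET_FOLDER, 'strawberry_1157.jpeg')
img = image.load_img(img_path, target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
print('Predicted:', decode_predictions(resnet50.predict(x)))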
Visualising Convolutional Filters of a
CNN

import numpy as np
import time
from keras.applications import vgg16
from keras import backend as K

from matplotlib import pyplot as plt

%matplotlib inline

# dimensions of the generated pictures for each filter.
IMG_WIDTH = 224
IMG_HEIGHT = 224

from keras.applications import vgg16

# build the VGG16 network with ImageNet weights
vgg16 = vgg16.VGG16(weights='imagenet', include_top=False)
print('Model loaded.')

Model loaded.
vgg16.summary()

__________________________________________________
_______________
Layer (type) Output Shape
Param #
==================================================
===============
input_2 (InputLayer) (None, None, None, 3)
0
__________________________________________________
_______________
block1_conv1 (Conv2D) (None, None, None,
64) 1792
__________________________________________________
_______________
block1_conv2 (Conv2D) (None, None, None,
64) 36928
__________________________________________________
_______________
block1_pool (MaxPooling2D) (None, None, None,
64) 0
__________________________________________________
_______________
block2_conv1 (Conv2D) (None, None, None,
128) 73856
__________________________________________________
_______________
block2_conv2 (Conv2D) (None, None, None,
128) 147584
__________________________________________________
_______________
block2_pool (MaxPooling2D) (None, None, None,
128) 0
__________________________________________________
_______________
block3_conv1 (Conv2D) (None, None, None,
256) 295168
__________________________________________________
_______________
block3_conv2 (Conv2D) (None, None, None,
256) 590080
__________________________________________________
_______________
block3_conv3 (Conv2D) (None, None, None,
256) 590080
__________________________________________________
_______________
block3_pool (MaxPooling2D) (None, None, None,
256) 0
__________________________________________________
_______________
block4_conv1 (Conv2D) (None, None, None,
512) 1180160
__________________________________________________
_______________
block4_conv2 (Conv2D) (None, None, None,
512) 2359808
__________________________________________________
_______________
block4_conv3 (Conv2D) (None, None, None,
512) 2359808
__________________________________________________
_______________
block4_pool (MaxPooling2D) (None, None, None,
512) 0
__________________________________________________
_______________
block5_conv1 (Conv2D) (None, None, None,
512) 2359808
__________________________________________________
_______________
block5_conv2 (Conv2D) (None, None, None,
512) 2359808
__________________________________________________
_______________
block5_conv3 (Conv2D) (None, None, None,
512) 2359808
__________________________________________________
_______________
block5_pool (MaxPooling2D) (None, None, None,
512) 0
==================================================
===============
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
__________________________________________________
_______________

from collections import OrderedDict

layer_dict = OrderedDict()
# get the symbolic outputs of each "key" layer (we gave them unique names).
for layer in vgg16.layers[1:]:
    layer_dict[layer.name] = layer

Test Image

img_path = os.path.join(IMAGENET_FOLDER, 'strawberry_1157.jpeg')
img = image.load_img(img_path, target_size=(IMG_WIDTH, IMG_HEIGHT))
plt.imshow(img)
<[Link] at 0x7f896378cda0>

input_img_data = image.img_to_array(img)
# input_img_data /= 255
plt.imshow(input_img_data)

<[Link] at 0x7f89636cd240>
input_img_data = np.expand_dims(input_img_data,
axis=0)
print('Input image shape:', input_img_data.shape)

Input image shape: (1, 224, 224, 3)

Visualising the image through the layers

## Recall the function defined in the notebook on hidden features
## (2.1 Hidden Layer Repr. and Embeddings)

def get_activations(model, layer, input_img_data):
    activations_f = K.function([model.layers[0].input,
                                K.learning_phase()], [layer.output,])
    activations = activations_f((input_img_data, False))
    return activations

layer_name = 'block1_conv2'
layer = layer_dict[layer_name]
activations = get_activations(vgg16, layer, input_img_data)

print(len(activations))
activation = activations[0]
activation.shape
1

(1, 224, 224, 64)

activation.shape[-1]  # no. of filters in the selected conv block

64

activated_img = activation[0]
n = 8
fig = plt.figure(figsize=(20, 20))
for i in range(n):
    for j in range(n):
        idx = (n*i)+j
        ax = fig.add_subplot(n, n, idx+1)
        ax.imshow(activated_img[:,:,idx])
conv_img_mean = np.mean(activated_img, axis=2)

conv_img_mean.shape

(224, 224)
plt.imshow(conv_img_mean)

<[Link] at 0x7f895e8be668>

Now visualise the first 64 filters of the block5_conv2 layer

layer_name = 'block5_conv2'
layer = layer_dict[layer_name]
activations = get_activations(vgg16, layer, input_img_data)
activated_img = activations[0][0]  # [0][0] -> first (and only) activation, first (and only) sample in the batch
n = 8
fig = plt.figure(figsize=(20, 20))
for i in range(n):
    for j in range(n):
        idx = (n*i)+j
        ax = fig.add_subplot(n, n, idx+1)
        ax.imshow(activated_img[:,:,idx])
How ConvNets see the world
Reference: [Link]
[Link]

Specify the percentage of filters to scan

In this example, we'll still be using VGG16 as the reference model.
Of course, the same code applies to different CNN models, with
appropriate changes in layer references/names.

Please note that VGG16 includes a variable number of
convolutional filters, depending on the particular layer(s)
selected for processing.
Processing all the convolutional filters can be computationally
intensive and time consuming, largely depending on the
number of parameters of the layer.
On my hardware (1 Tesla K80 GPU on Azure Cloud),
processing one single filter takes almost ~.5 secs (on avg).
So, it would take ~256 secs (e.g. for block5_conv1 )
$\mapsto$ ~4 mins (for one single layer name).

# utility function to convert a tensor into a valid image

def deprocess_image(x):
    # normalize tensor: center on 0., ensure std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1

    # clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)

    # convert to RGB array
    x *= 255
    if K.image_data_format() == 'channels_first':
        x = x.transpose((1, 2, 0))
    x = np.clip(x, 0, 255).astype('uint8')
    return x

# dimensions of the generated pictures for each filter.
img_width = 224
img_height = 224

def collect_filters(input_tensor, output_tensor, filters):
    kept_filters = []
    start_time = time.time()
    for filter_index in range(0, filters):
        if filter_index % 10 == 0:
            print('\t Processing filter {}'.format(filter_index))

        # we build a loss function that maximizes the activation
        # of the nth filter of the layer considered
        if K.image_data_format() == 'channels_first':
            loss = K.mean(output_tensor[:, filter_index, :, :])
        else:
            loss = K.mean(output_tensor[:, :, :, filter_index])

        # we compute the gradient of the input picture wrt this loss
        grads = K.gradients(loss, input_tensor)[0]
        # normalization trick: we normalize the gradient by its L2 norm
        grads = grads / (K.sqrt(K.mean(K.square(grads))) + 1e-5)
        # this function returns the loss and grads given the input picture
        iterate = K.function([input_tensor], [loss, grads])

        # step size for gradient ascent
        step = 1.

        # we start from a gray image with some random noise
        if K.image_data_format() == 'channels_first':
            img_data = np.random.random((1, 3, img_width, img_height))
        else:
            img_data = np.random.random((1, img_width, img_height, 3))

        img_data = (img_data - 0.5) * 20 + 128

        # we run gradient ascent for 20 steps
        for i in range(20):
            loss_value, grads_value = iterate([img_data])
            img_data += grads_value * step
            if loss_value <= 0.:
                # some filters get stuck to 0, we can skip them
                break

        # decode the resulting input image
        if loss_value > 0:
            img_deproc = deprocess_image(img_data[0])
            kept_filters.append((img_deproc, loss_value))

    end_time = time.time()
    print('\t Time required to process {} filters: {}'.format(filters, (end_time - start_time)))

    return kept_filters

# this is the placeholder for the input images
input_t = vgg16.input

def generate_stiched_filters(layer, nb_filters):
    layer_name = layer.name
    print('Processing {} Layer'.format(layer_name))

    # Processing filters of the current layer
    layer_output = layer.output
    kept_filters = collect_filters(input_t, layer_output, nb_filters)

    print('Filter collection: completed!')

    # we will stitch the best filters on an n x n grid.
    limit = min(nb_filters, len(kept_filters))
    n = int(np.floor(np.sqrt(limit)))

    # the filters that have the highest loss are assumed to be better-looking.
    # we will only keep the top n x n filters.
    kept_filters.sort(key=lambda x: x[1], reverse=True)
    kept_filters = kept_filters[:n * n]

    # build a black picture with enough space for the n x n filters,
    # with a margin in between
    margin = 5
    width = n * img_width + (n - 1) * margin
    height = n * img_height + (n - 1) * margin
    stitched_filters = np.zeros((width, height, 3))

    # fill the picture with our saved filters
    for i in range(n):
        for j in range(n):
            img, loss = kept_filters[i * n + j]
            stitched_filters[(img_width + margin) * i: (img_width + margin) * i + img_width,
                             (img_height + margin) * j: (img_height + margin) * j + img_height, :] = img
    return stitched_filters

layer = layer_dict['block1_conv2']  # 64 filters in total
stitched_filters = generate_stiched_filters(layer, layer.filters)
plt.figure(figsize=(10,10))
plt.imshow(stitched_filters)
Processing block1_conv2 Layer
Processing filter 0
Processing filter 10
Processing filter 20
Processing filter 30
Processing filter 40
Processing filter 50
Processing filter 60
Time required to process 64 filters:
22.710692167282104
Filter collection: completed!

<[Link] at 0x7f895737ca90>
layer = layer_dict['block5_conv1']  # 512 filters in total
stitched_filters = generate_stiched_filters(layer, 64)
plt.figure(figsize=(10,10))
plt.imshow(stitched_filters)

Processing block5_conv1 Layer


Processing filter 0
Processing filter 10
Processing filter 20
Processing filter 30
Processing filter 40
Processing filter 50
Processing filter 60
Time required to process 64 filters:
101.60693192481995
Filter collection: completed!

<[Link] at 0x7f884e7d7c50>
HyperParameter Tuning
keras.wrappers.scikit_learn
Example adapted from:
[Link]
klearn_wrapper.py
Problem:
Build simple CNN models on MNIST and use sklearn's
GridSearchCV to find the best model.

import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.utils import np_utils
from keras.wrappers.scikit_learn import KerasClassifier
from keras import backend as K

Using TensorFlow backend.

from sklearn.model_selection import GridSearchCV


Data Preparation

nb_classes = 10

# input image dimensions


img_rows, img_cols = 28, 28

# load training data and do basic data normalization
(X_train, y_train), (X_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
X_train = X_train.reshape(X_train.shape[0], 1,
img_rows, img_cols)
X_test = X_test.reshape(X_test.shape[0], 1,
img_rows, img_cols)
input_shape = (1, img_rows, img_cols)
else:
X_train = X_train.reshape(X_train.shape[0],
img_rows, img_cols, 1)
X_test = X_test.reshape(X_test.shape[0],
img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# convert class vectors to binary class matrices
y_train = np_utils.to_categorical(y_train,
nb_classes)
y_test = np_utils.to_categorical(y_test,
nb_classes)
Build Model

def make_model(dense_layer_sizes, filters, kernel_size, pool_size):
    '''Creates a model comprised of 2 convolutional layers followed by dense layers

    dense_layer_sizes: List of layer sizes. This list has one number for each layer
    filters: Number of convolutional filters in each convolutional layer
    kernel_size: Convolutional kernel size
    pool_size: Size of pooling area for max pooling
    '''

model = Sequential()

[Link](Conv2D(filters, (kernel_size,
kernel_size),
padding='valid',
input_shape=input_shape))
[Link](Activation('relu'))
[Link](Conv2D(filters, (kernel_size,
kernel_size)))
[Link](Activation('relu'))
[Link](MaxPooling2D(pool_size=(pool_size,
pool_size)))
[Link](Dropout(0.25))

[Link](Flatten())
for layer_size in dense_layer_sizes:
[Link](Dense(layer_size))
[Link](Activation('relu'))
[Link](Dropout(0.5))
[Link](Dense(nb_classes))
[Link](Activation('softmax'))

[Link](loss='categorical_crossentropy',
optimizer='adadelta',
metrics=['accuracy'])

return model

dense_size_candidates = [[32], [64], [32, 32],


[64, 64]]
my_classifier = KerasClassifier(make_model,
batch_size=32)
GridSearch HyperParameters

validator = GridSearchCV(my_classifier,
                         param_grid={'dense_layer_sizes': dense_size_candidates,
                                     # 'epochs' is available for tuning even when not
                                     # an argument to the model building function
                                     'epochs': [3, 6],
                                     'filters': [8],
                                     'kernel_size': [3],
                                     'pool_size': [2]},
                         scoring='neg_log_loss',
                         n_jobs=1)
[Link](X_train, y_train)

Epoch 1/3
40000/40000 [==============================] -
ETA: 0s - loss: 0.8971 - acc: 0.694 - 10s - loss:
0.8961 - acc: 0.6953
Epoch 2/3
40000/40000 [==============================] - 9s
- loss: 0.5362 - acc: 0.8299
Epoch 3/3
40000/40000 [==============================] - 10s
- loss: 0.4425 - acc: 0.8594
39552/40000 [============================>.] -
ETA: 0sEpoch 1/3
40000/40000 [==============================] - 11s
- loss: 0.7593 - acc: 0.7543
Epoch 2/3
40000/40000 [==============================] - 10s
- loss: 0.4489 - acc: 0.8597
Epoch 3/3
40000/40000 [==============================] - 10s
- loss: 0.3841 - acc: 0.8814
39648/40000 [============================>.] -
ETA: 0sEpoch 1/3
40000/40000 [==============================] - 10s
- loss: 0.9089 - acc: 0.6946
Epoch 2/3
40000/40000 [==============================] - 9s
- loss: 0.5560 - acc: 0.8228
Epoch 3/3
40000/40000 [==============================] - 10s
- loss: 0.4597 - acc: 0.8556
39680/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 11s
- loss: 0.8415 - acc: 0.7162
Epoch 2/6
40000/40000 [==============================] - 10s
- loss: 0.4929 - acc: 0.8423
Epoch 3/6
40000/40000 [==============================] - 9s
- loss: 0.4172 - acc: 0.8703
Epoch 4/6
40000/40000 [==============================] - 10s
- loss: 0.3819 - acc: 0.8812
Epoch 5/6
40000/40000 [==============================] - 10s
- loss: 0.3491 - acc: 0.8919
Epoch 6/6
40000/40000 [==============================] - 10s
- loss: 0.3284 - acc: 0.8985
39680/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 11s
- loss: 0.7950 - acc: 0.7349
Epoch 2/6
40000/40000 [==============================] - 10s
- loss: 0.4913 - acc: 0.8428
Epoch 3/6
40000/40000 [==============================] - 10s
- loss: 0.4081 - acc: 0.8709
Epoch 4/6
40000/40000 [==============================] - 10s
- loss: 0.3613 - acc: 0.8870
Epoch 5/6
40000/40000 [==============================] - 10s
- loss: 0.3293 - acc: 0.8968
Epoch 6/6
40000/40000 [==============================] - 10s
- loss: 0.3024 - acc: 0.9058
39936/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 11s
- loss: 0.9822 - acc: 0.6735
Epoch 2/6
40000/40000 [==============================] - 10s
- loss: 0.6270 - acc: 0.8009
Epoch 3/6
40000/40000 [==============================] - 9s
- loss: 0.5045 - acc: 0.8409
Epoch 4/6
40000/40000 [==============================] - 10s
- loss: 0.4396 - acc: 0.8599
Epoch 5/6
40000/40000 [==============================] - 10s
- loss: 0.3978 - acc: 0.8775
Epoch 6/6
40000/40000 [==============================] - 10s
- loss: 0.3605 - acc: 0.8871
39872/40000 [============================>.] -
ETA: 0sEpoch 1/3
40000/40000 [==============================] - 11s
- loss: 0.6851 - acc: 0.7777
Epoch 2/3
40000/40000 [==============================] - 10s
- loss: 0.3989 - acc: 0.8776
Epoch 3/3
40000/40000 [==============================] - 10s
- loss: 0.3225 - acc: 0.9021
39552/40000 [============================>.] -
ETA: 0sEpoch 1/3
40000/40000 [==============================] - 11s
- loss: 0.5846 - acc: 0.8164
Epoch 2/3
40000/40000 [==============================] - 10s
- loss: 0.3243 - acc: 0.9053
Epoch 3/3
40000/40000 [==============================] - 10s
- loss: 0.2697 - acc: 0.9213
39680/40000 [============================>.] -
ETA: 0sEpoch 1/3
40000/40000 [==============================] - 11s
- loss: 0.6339 - acc: 0.8017
Epoch 2/3
40000/40000 [==============================] - 10s
- loss: 0.3417 - acc: 0.8975
Epoch 3/3
40000/40000 [==============================] - 10s
- loss: 0.2783 - acc: 0.9184
39648/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 11s
- loss: 0.6652 - acc: 0.7854
Epoch 2/6
40000/40000 [==============================] - 10s
- loss: 0.3693 - acc: 0.8911
Epoch 3/6
40000/40000 [==============================] - 10s
- loss: 0.2923 - acc: 0.9130
Epoch 4/6
40000/40000 [==============================] - 10s
- loss: 0.2479 - acc: 0.9274
Epoch 5/6
40000/40000 [==============================] - 10s
- loss: 0.2176 - acc: 0.9360
Epoch 6/6
40000/40000 [==============================] - 10s
- loss: 0.1994 - acc: 0.9416
39616/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 11s
- loss: 0.6463 - acc: 0.7952
Epoch 2/6
40000/40000 [==============================] - 10s
- loss: 0.3648 - acc: 0.8898
Epoch 3/6
40000/40000 [==============================] - 10s
- loss: 0.2880 - acc: 0.9154
Epoch 4/6
40000/40000 [==============================] - 10s
- loss: 0.2497 - acc: 0.9249
Epoch 5/6
40000/40000 [==============================] - 10s
- loss: 0.2154 - acc: 0.9357
Epoch 6/6
40000/40000 [==============================] - 10s
- loss: 0.1946 - acc: 0.9417
39584/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 11s
- loss: 0.6212 - acc: 0.8012
Epoch 2/6
40000/40000 [==============================] - 10s
- loss: 0.3341 - acc: 0.9008
Epoch 3/6
40000/40000 [==============================] - 10s
- loss: 0.2706 - acc: 0.9195
Epoch 4/6
40000/40000 [==============================] - 10s
- loss: 0.2343 - acc: 0.9307
Epoch 5/6
40000/40000 [==============================] - 10s
- loss: 0.2109 - acc: 0.9383
Epoch 6/6
40000/40000 [==============================] - 10s
- loss: 0.1961 - acc: 0.9420
39648/40000 [============================>.] -
ETA: 0sEpoch 1/3
40000/40000 [==============================] - 12s
- loss: 0.9322 - acc: 0.6835
Epoch 2/3
40000/40000 [==============================] - 10s
- loss: 0.5578 - acc: 0.8202
Epoch 3/3
40000/40000 [==============================] - 11s
- loss: 0.4651 - acc: 0.8518
40000/40000 [==============================] - 4s
Epoch 1/3
40000/40000 [==============================] - 11s
- loss: 0.7615 - acc: 0.7467
Epoch 2/3
40000/40000 [==============================] - 10s
- loss: 0.4369 - acc: 0.8634
Epoch 3/3
40000/40000 [==============================] - 10s
- loss: 0.3646 - acc: 0.8865
39904/40000 [============================>.] -
ETA: 0sEpoch 1/3
40000/40000 [==============================] - 12s
- loss: 0.7744 - acc: 0.7471
Epoch 2/3
40000/40000 [==============================] - 11s
- loss: 0.4294 - acc: 0.8674
Epoch 3/3
40000/40000 [==============================] - 11s
- loss: 0.3620 - acc: 0.8873
39968/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 12s
- loss: 0.8007 - acc: 0.7354
Epoch 2/6
40000/40000 [==============================] - 10s
- loss: 0.4769 - acc: 0.8499
Epoch 3/6
40000/40000 [==============================] - 11s
- loss: 0.4020 - acc: 0.8743
Epoch 4/6
40000/40000 [==============================] - 11s
- loss: 0.3551 - acc: 0.8905
Epoch 5/6
40000/40000 [==============================] - 11s
- loss: 0.3256 - acc: 0.8993
Epoch 6/6
40000/40000 [==============================] - 11s
- loss: 0.3005 - acc: 0.9067
39520/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 12s
- loss: 0.8505 - acc: 0.7123
Epoch 2/6
40000/40000 [==============================] - 10s
- loss: 0.5156 - acc: 0.8321
Epoch 3/6
40000/40000 [==============================] - 11s
- loss: 0.4208 - acc: 0.8660
Epoch 4/6
40000/40000 [==============================] - 11s
- loss: 0.3614 - acc: 0.8854
Epoch 5/6
40000/40000 [==============================] - 11s
- loss: 0.3258 - acc: 0.8980
Epoch 6/6
40000/40000 [==============================] - 11s
- loss: 0.3044 - acc: 0.9046
39936/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 12s
- loss: 0.7670 - acc: 0.7494
Epoch 2/6
40000/40000 [==============================] - 11s
- loss: 0.4593 - acc: 0.8574
Epoch 3/6
40000/40000 [==============================] -
ETA: 0s - loss: 0.3896 - acc: 0.880 - 11s - loss:
0.3898 - acc: 0.8799
Epoch 4/6
40000/40000 [==============================] - 10s
- loss: 0.3514 - acc: 0.8907
Epoch 5/6
40000/40000 [==============================] - 10s
- loss: 0.3124 - acc: 0.9020
Epoch 6/6
40000/40000 [==============================] - 11s
- loss: 0.2981 - acc: 0.9097
39680/40000 [============================>.] -
ETA: 0sEpoch 1/3
40000/40000 [==============================] - 12s
- loss: 0.5547 - acc: 0.8239
Epoch 2/3
40000/40000 [==============================] - 11s
- loss: 0.2752 - acc: 0.9204
Epoch 3/3
40000/40000 [==============================] - 11s
- loss: 0.2183 - acc: 0.9359
39520/40000 [============================>.] -
ETA: 0sEpoch 1/3
40000/40000 [==============================] - 12s
- loss: 0.5718 - acc: 0.8172
Epoch 2/3
40000/40000 [==============================] - 11s
- loss: 0.3141 - acc: 0.9054
Epoch 3/3
40000/40000 [==============================] - 11s
- loss: 0.2536 - acc: 0.9247
39680/40000 [============================>.] -
ETA: 0sEpoch 1/3
40000/40000 [==============================] - 12s
- loss: 0.5111 - acc: 0.8399
Epoch 2/3
40000/40000 [==============================] - 11s
- loss: 0.2469 - acc: 0.9270
Epoch 3/3
40000/40000 [==============================] - 11s
- loss: 0.1992 - acc: 0.9422
20000/20000 [==============================] - 2s
40000/40000 [==============================] - 4s
Epoch 1/6
40000/40000 [==============================] - 12s
- loss: 0.6041 - acc: 0.8066
Epoch 2/6
40000/40000 [==============================] - 11s
- loss: 0.2951 - acc: 0.9132
Epoch 3/6
40000/40000 [==============================] - 11s
- loss: 0.2343 - acc: 0.9315
Epoch 4/6
40000/40000 [==============================] - 11s
- loss: 0.1995 - acc: 0.9418
Epoch 5/6
40000/40000 [==============================] - 11s
- loss: 0.1779 - acc: 0.9487
Epoch 6/6
40000/40000 [==============================] - 11s
- loss: 0.1612 - acc: 0.9540
39680/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 12s
- loss: 0.6137 - acc: 0.8069
Epoch 2/6
40000/40000 [==============================] - 11s
- loss: 0.3075 - acc: 0.9096
Epoch 3/6
40000/40000 [==============================] - 11s
- loss: 0.2309 - acc: 0.9325
Epoch 4/6
40000/40000 [==============================] - 11s
- loss: 0.1935 - acc: 0.9443
Epoch 5/6
40000/40000 [==============================] - 11s
- loss: 0.1679 - acc: 0.9518
Epoch 6/6
40000/40000 [==============================] - 11s
- loss: 0.1576 - acc: 0.9551
39680/40000 [============================>.] -
ETA: 0sEpoch 1/6
40000/40000 [==============================] - 12s
- loss: 0.5143 - acc: 0.8400
Epoch 2/6
40000/40000 [==============================] - 11s
- loss: 0.2743 - acc: 0.9205
Epoch 3/6
40000/40000 [==============================] - 11s
- loss: 0.2248 - acc: 0.9350
Epoch 4/6
40000/40000 [==============================] - 11s
- loss: 0.1964 - acc: 0.9428
Epoch 5/6
40000/40000 [==============================] - 11s
- loss: 0.1736 - acc: 0.9496
Epoch 6/6
40000/40000 [==============================] - 11s
- loss: 0.1643 - acc: 0.9521
39840/40000 [============================>.] -
ETA: 0sEpoch 1/6
60000/60000 [==============================] - 18s
- loss: 0.4674 - acc: 0.8567
Epoch 2/6
60000/60000 [==============================] - 16s
- loss: 0.2417 - acc: 0.9293
Epoch 3/6
60000/60000 [==============================] - 16s
- loss: 0.1966 - acc: 0.9428
Epoch 4/6
60000/60000 [==============================] - 17s
- loss: 0.1695 - acc: 0.9519
Epoch 5/6
60000/60000 [==============================] - 16s
- loss: 0.1504 - acc: 0.9571
Epoch 6/6
60000/60000 [==============================] - 15s
- loss: 0.1393 - acc: 0.9597

GridSearchCV(cv=None, error_score='raise',
estimator=
<[Link].scikit_learn.KerasClassifier
object at 0x7f434a86ce48>,
fit_params={}, iid=True, n_jobs=1,
param_grid={'filters': [8], 'pool_size':
[2], 'epochs': [3, 6], 'dense_layer_sizes': [[32],
[64], [32, 32], [64, 64]], 'kernel_size': [3]},
pre_dispatch='2*n_jobs', refit=True,
return_train_score=True,
scoring='neg_log_loss', verbose=0)

print('The parameters of the best model are: ')


print(validator.best_params_)

# validator.best_estimator_ returns the sklearn-wrapped version of the best model.
# validator.best_estimator_.model returns the (unwrapped) keras model
best_model = validator.best_estimator_.model
metric_names = best_model.metrics_names
metric_values = best_model.evaluate(X_test,
y_test)
for metric, value in zip(metric_names,
metric_values):
print(metric, ': ', value)

The parameters of the best model are:


{'filters': 8, 'pool_size': 2, 'epochs': 6,
'dense_layer_sizes': [64, 64], 'kernel_size': 3}
9920/10000 [============================>.] -
ETA: 0sloss : 0.0577878101223
acc : 0.9822
There's more:
The GridSearchCV model in scikit-learn performs an exhaustive search, considering all the possible combinations of the hyper-parameters we want to optimise.
If we want a more efficient, bounded search of the hyper-parameter space, I strongly suggest taking a look at:
Keras + hyperopt == hyperas :
[Link]
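As a lighter-weight alternative that stays within scikit-learn itself (this is not hyperas, just a sketch of a bounded, non-exhaustive search), RandomizedSearchCV can sample only a handful of combinations from the same grid, reusing the my_classifier wrapper defined above:

from sklearn.model_selection import RandomizedSearchCV

# Sample only a few parameter combinations instead of trying them all
random_validator = RandomizedSearchCV(my_classifier,
                                      param_distributions={
                                          'dense_layer_sizes': dense_size_candidates,
                                          'epochs': [3, 6],
                                          'filters': [8],
                                          'kernel_size': [3],
                                          'pool_size': [2]},
                                      n_iter=4,
                                      scoring='neg_log_loss',
                                      n_jobs=1)
# random_validator.fit(X_train, y_train)  # same fit call as with GridSearchCV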
Transfer Learning and Fine Tuning
Train a simple convnet on the MNIST dataset the first 5
digits [0..4].
Freeze convolutional layers and fine-tune dense layers for
the classification of digits [5..9].

Using GPU (highly recommended)


-> If using theano backend:
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32

import numpy as np
import datetime

[Link](1337) # for reproducibility

from [Link] import mnist


from [Link] import Sequential
from [Link] import Dense, Dropout,
Activation, Flatten
from [Link] import Convolution2D,
MaxPooling2D
from [Link] import np_utils
from keras import backend as K
from numpy import nan

now = [Link]
Using TensorFlow backend.

Settings

now = [Link]

batch_size = 128
nb_classes = 5
nb_epoch = 5

# input image dimensions


img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = 2
# convolution kernel size
kernel_size = 3

if K.image_data_format() == 'channels_first':
input_shape = (1, img_rows, img_cols)
else:
input_shape = (img_rows, img_cols, 1)

def train_model(model, train, test, nb_classes):

X_train =
train[0].reshape((train[0].shape[0],) +
input_shape)
X_test = test[0].reshape((test[0].shape[0],) +
input_shape)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

print('X_train shape:', X_train.shape)


print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class


matrices
Y_train = np_utils.to_categorical(train[1],
nb_classes)
Y_test = np_utils.to_categorical(test[1],
nb_classes)

[Link](loss='categorical_crossentropy',
optimizer='adadelta',
metrics=['accuracy'])

t = now()
[Link](X_train, Y_train,
batch_size=batch_size,
nb_epoch=nb_epoch,
verbose=1,
validation_data=(X_test, Y_test))
print('Training time: %s' % (now() - t))
score = [Link](X_test, Y_test,
verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
Dataset Preparation

# the data, shuffled and split between train and


test sets
(X_train, y_train), (X_test, y_test) =
mnist.load_data()

# create two datasets one with digits below 5 and


one with 5 and above
X_train_lt5 = X_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
X_test_lt5 = X_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]

X_train_gte5 = X_train[y_train >= 5]


y_train_gte5 = y_train[y_train >= 5] - 5 # make
classes start at 0 for
X_test_gte5 = X_test[y_test >= 5] #
np_utils.to_categorical
y_test_gte5 = y_test[y_test >= 5] - 5

# define two groups of layers: feature


(convolutions) and classification (dense)
feature_layers = [
Convolution2D(nb_filters, kernel_size,
kernel_size,
border_mode='valid',
input_shape=input_shape),
Activation('relu'),
Convolution2D(nb_filters, kernel_size,
kernel_size),
Activation('relu'),
MaxPooling2D(pool_size=(pool_size,
pool_size)),
Dropout(0.25),
Flatten(),
]
classification_layers = [
Dense(128),
Activation('relu'),
Dropout(0.5),
Dense(nb_classes),
Activation('softmax')
]

# create complete model


model = Sequential(feature_layers +
classification_layers)

# train model for 5-digit classification [0..4]


train_model(model,
(X_train_lt5, y_train_lt5),
(X_test_lt5, y_test_lt5), nb_classes)

X_train shape: (30596, 1, 28, 28)


30596 train samples
5139 test samples
Train on 30596 samples, validate on 5139 samples
Epoch 1/5
30596/30596 [==============================] - 3s
- loss: 0.2071 - acc: 0.9362 - val_loss: 0.0476 -
val_acc: 0.9848
Epoch 2/5
30596/30596 [==============================] - 3s
- loss: 0.0787 - acc: 0.9774 - val_loss: 0.0370 -
val_acc: 0.9879
Epoch 3/5
30596/30596 [==============================] - 3s
- loss: 0.0528 - acc: 0.9846 - val_loss: 0.0195 -
val_acc: 0.9926
Epoch 4/5
30596/30596 [==============================] - 3s
- loss: 0.0409 - acc: 0.9880 - val_loss: 0.0152 -
val_acc: 0.9942
Epoch 5/5
30596/30596 [==============================] - 3s
- loss: 0.0336 - acc: 0.9901 - val_loss: 0.0135 -
val_acc: 0.9959
Training time: [Link].094398
Test score: 0.0135238260214
Test accuracy: 0.995913601868

# freeze feature layers and rebuild model


for l in feature_layers:
[Link] = False

# transfer: train dense layers for new


classification task [5..9]
train_model(model,
(X_train_gte5, y_train_gte5),
(X_test_gte5, y_test_gte5),
nb_classes)

X_train shape: (29404, 1, 28, 28)


29404 train samples
4861 test samples
Train on 29404 samples, validate on 4861 samples
Epoch 1/5
29404/29404 [==============================] - 1s
- loss: 0.3810 - acc: 0.8846 - val_loss: 0.0897 -
val_acc: 0.9728
Epoch 2/5
29404/29404 [==============================] - 1s
- loss: 0.1245 - acc: 0.9607 - val_loss: 0.0596 -
val_acc: 0.9825
Epoch 3/5
29404/29404 [==============================] - 1s
- loss: 0.0927 - acc: 0.9714 - val_loss: 0.0467 -
val_acc: 0.9860
Epoch 4/5
29404/29404 [==============================] - 1s
- loss: 0.0798 - acc: 0.9755 - val_loss: 0.0408 -
val_acc: 0.9868
Epoch 5/5
29404/29404 [==============================] - 1s
- loss: 0.0704 - acc: 0.9783 - val_loss: 0.0353 -
val_acc: 0.9887
Training time: [Link].964140
Test score: 0.0352752654647
Test accuracy: 0.988685455557
Your Turn
Try to Fine Tune a VGG16 Network

## your code here

...
...
# Plugging new Layers
[Link](Dense(768, activation='sigmoid'))
[Link](Dropout(0.0))
[Link](Dense(768, activation='sigmoid'))
[Link](Dropout(0.0))
[Link](Dense(n_labels,
activation='softmax'))
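One possible starting point is sketched below (assumptions of the sketch: ImageNet weights, 224x224 RGB inputs, and a variable n_labels holding the number of target classes): load VGG16 without its classification top, freeze the convolutional base, and plug the dense layers listed above on top, here written with the functional API.

from keras.applications import vgg16
from keras.layers import Input, Flatten, Dense, Dropout
from keras.models import Model

# Assumptions: 224x224 RGB inputs and `n_labels` target classes
base = vgg16.VGG16(weights='imagenet', include_top=False,
                   input_tensor=Input(shape=(224, 224, 3)))
for layer in base.layers:
    layer.trainable = False  # freeze the convolutional base

x = Flatten()(base.output)
x = Dense(768, activation='sigmoid')(x)
x = Dropout(0.5)(x)
predictions = Dense(n_labels, activation='softmax')(x)

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])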
Tight Integration

import tensorflow as tf

tf.__version__

'1.1.0'

from [Link] import keras
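With TensorFlow 1.1+, the Keras API also ships inside TensorFlow itself, so the very same code can run without a separate Keras install. A minimal sketch (assuming the tf.contrib.keras namespace of TF 1.1/1.2; in later releases it moved to tf.keras):

from tensorflow.contrib.keras import layers, models

# The familiar Sequential / Dense API, served directly from TensorFlow
tf_keras_model = models.Sequential()
tf_keras_model.add(layers.Dense(32, activation='relu', input_shape=(784,)))
tf_keras_model.add(layers.Dense(10, activation='softmax'))
tf_keras_model.compile(optimizer='rmsprop',
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])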

Tensorboard Integration

from [Link] import cifar100

(X_train, Y_train), (X_test, Y_test) =


cifar100.load_data(label_mode='fine')

Using TensorFlow backend.

from keras import backend as K

img_rows, img_cols = 32, 32


if K.image_data_format() == 'channels_first':
shape_ord = (3, img_rows, img_cols)
else: # channel_last
shape_ord = (img_rows, img_cols, 3)

shape_ord

(32, 32, 3)

X_train.shape

(50000, 32, 32, 3)

import numpy as np
nb_classes = len([Link](Y_train))

from [Link] import vgg16


from [Link] import Input

vgg16_model = vgg16.VGG16(weights='imagenet',
include_top=False,

input_tensor=Input(shape_ord))
vgg16_model.summary()
__________________________________________________
_______________
Layer (type) Output Shape
Param #
==================================================
===============
input_1 (InputLayer) (None, 32, 32, 3)
0
__________________________________________________
_______________
block1_conv1 (Conv2D) (None, 32, 32, 64)
1792
__________________________________________________
_______________
block1_conv2 (Conv2D) (None, 32, 32, 64)
36928
__________________________________________________
_______________
block1_pool (MaxPooling2D) (None, 16, 16, 64)
0
__________________________________________________
_______________
block2_conv1 (Conv2D) (None, 16, 16, 128)
73856
__________________________________________________
_______________
block2_conv2 (Conv2D) (None, 16, 16, 128)
147584
__________________________________________________
_______________
block2_pool (MaxPooling2D) (None, 8, 8, 128)
0
__________________________________________________
_______________
block3_conv1 (Conv2D) (None, 8, 8, 256)
295168
__________________________________________________
_______________
block3_conv2 (Conv2D) (None, 8, 8, 256)
590080
__________________________________________________
_______________
block3_conv3 (Conv2D) (None, 8, 8, 256)
590080
__________________________________________________
_______________
block3_pool (MaxPooling2D) (None, 4, 4, 256)
0
__________________________________________________
_______________
block4_conv1 (Conv2D) (None, 4, 4, 512)
1180160
__________________________________________________
_______________
block4_conv2 (Conv2D) (None, 4, 4, 512)
2359808
__________________________________________________
_______________
block4_conv3 (Conv2D) (None, 4, 4, 512)
2359808
__________________________________________________
_______________
block4_pool (MaxPooling2D) (None, 2, 2, 512)
0
__________________________________________________
_______________
block5_conv1 (Conv2D) (None, 2, 2, 512)
2359808
__________________________________________________
_______________
block5_conv2 (Conv2D) (None, 2, 2, 512)
2359808
__________________________________________________
_______________
block5_conv3 (Conv2D) (None, 2, 2, 512)
2359808
__________________________________________________
_______________
block5_pool (MaxPooling2D) (None, 1, 1, 512)
0
==================================================
===============
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
__________________________________________________
_______________

for layer in vgg16_model.layers:


[Link] = False # freeze layer

from [Link] import Dense, Dropout,


Flatten
from [Link] import
BatchNormalization

x = Flatten(input_shape=vgg16_model.[Link])
(vgg16_model.output)
x = Dense(4096, activation='relu', name='ft_fc1')
(x)
x = Dropout(0.5)(x)
x = BatchNormalization()(x)
predictions = Dense(nb_classes, activation =
'softmax')(x)
from [Link] import Model

#create graph of your new model


model = Model(inputs=vgg16_model.input,
outputs=predictions)

#compile the model


[Link](optimizer='rmsprop',
loss='categorical_crossentropy', metrics=
['accuracy'])

[Link]()

__________________________________________________
_______________
Layer (type) Output Shape
Param #
==================================================
===============
input_1 (InputLayer) (None, 32, 32, 3)
0
__________________________________________________
_______________
block1_conv1 (Conv2D) (None, 32, 32, 64)
1792
__________________________________________________
_______________
block1_conv2 (Conv2D) (None, 32, 32, 64)
36928
__________________________________________________
_______________
block1_pool (MaxPooling2D) (None, 16, 16, 64)
0
__________________________________________________
_______________
block2_conv1 (Conv2D) (None, 16, 16, 128)
73856
__________________________________________________
_______________
block2_conv2 (Conv2D) (None, 16, 16, 128)
147584
__________________________________________________
_______________
block2_pool (MaxPooling2D) (None, 8, 8, 128)
0
__________________________________________________
_______________
block3_conv1 (Conv2D) (None, 8, 8, 256)
295168
__________________________________________________
_______________
block3_conv2 (Conv2D) (None, 8, 8, 256)
590080
__________________________________________________
_______________
block3_conv3 (Conv2D) (None, 8, 8, 256)
590080
__________________________________________________
_______________
block3_pool (MaxPooling2D) (None, 4, 4, 256)
0
__________________________________________________
_______________
block4_conv1 (Conv2D) (None, 4, 4, 512)
1180160
__________________________________________________
_______________
block4_conv2 (Conv2D) (None, 4, 4, 512)
2359808
__________________________________________________
_______________
block4_conv3 (Conv2D) (None, 4, 4, 512)
2359808
__________________________________________________
_______________
block4_pool (MaxPooling2D) (None, 2, 2, 512)
0
__________________________________________________
_______________
block5_conv1 (Conv2D) (None, 2, 2, 512)
2359808
__________________________________________________
_______________
block5_conv2 (Conv2D) (None, 2, 2, 512)
2359808
__________________________________________________
_______________
block5_conv3 (Conv2D) (None, 2, 2, 512)
2359808
__________________________________________________
_______________
block5_pool (MaxPooling2D) (None, 1, 1, 512)
0
__________________________________________________
_______________
flatten_1 (Flatten) (None, 512)
0
__________________________________________________
_______________
ft_fc1 (Dense) (None, 4096)
2101248
__________________________________________________
_______________
dropout_1 (Dropout) (None, 4096)
0
__________________________________________________
_______________
batch_normalization_1 (Batch (None, 4096)
16384
__________________________________________________
_______________
dense_1 (Dense) (None, 100)
409700
==================================================
===============
Total params: 17,242,020
Trainable params: 2,519,140
Non-trainable params: 14,722,880
__________________________________________________
_______________

TensorBoard Callback

from [Link] import TensorBoard

# Arguments
log_dir: the path of the directory where to save the log files to be parsed by TensorBoard.
histogram_freq: frequency (in epochs) at which to compute activation and weight histograms for the layers of the model. If set to 0, histograms won't be computed. Validation data (or split) must be specified for histogram visualizations.
write_graph: whether to visualize the graph in TensorBoard. The log file can become quite large when write_graph is set to True.
write_grads: whether to visualize gradient histograms in TensorBoard. `histogram_freq` must be greater than 0.
write_images: whether to write model weights to visualize as an image in TensorBoard.
embeddings_freq: frequency (in epochs) at which selected embedding layers will be saved.
embeddings_layer_names: a list of names of layers to keep an eye on. If None or an empty list, all the embedding layers will be watched.
embeddings_metadata: a dictionary which maps a layer name to the file name in which the metadata for this embedding layer is saved. If the same metadata file is used for all embedding layers, a single string can be passed.
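For instance, a minimal callback (just a sketch, with an arbitrary log directory name) that records activation histograms every epoch and writes the graph could be created as:

tb = TensorBoard(log_dir='./tf_logs_example',  # arbitrary directory, an assumption of this sketch
                 histogram_freq=1,
                 write_graph=True,
                 write_images=False)
# pass it to fit / fit_generator via callbacks=[tb], as done below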

## one-hot Encoding of labels (1 to 100 classes)


from [Link] import np_utils
Y_train.shape

(50000, 1)
Y_train = np_utils.to_categorical(Y_train)

Y_train.shape

(50000, 100)

def generate_batches(X, Y, batch_size=128):
    """Yield successive (X, Y) mini-batches, cycling over the data indefinitely."""
    # Iteration has to go on indefinitely for fit_generator
    start = 0
    while True:
        yield (X[start:start+batch_size], Y[start:start+batch_size])
        start = (start + batch_size) % X.shape[0]  # move to the next batch, wrapping around

batch_size = 64
steps_per_epoch = [Link](X_train.shape[0] / batch_size)
model.fit_generator(generate_batches(X_train, Y_train, batch_size=batch_size),
                    steps_per_epoch=steps_per_epoch, epochs=20, verbose=1,
                    callbacks=[TensorBoard(log_dir='./tf_logs', histogram_freq=10,
                                           write_graph=True, write_images=True,
                                           embeddings_freq=10,
                                           embeddings_layer_names=['block1_conv2',
                                                                   'block5_conv1',
                                                                   'ft_fc1'],
                                           embeddings_metadata=None)])

INFO:tensorflow:Summary name block1_conv1/kernel:0


is illegal; using block1_conv1/kernel_0 instead.
INFO:tensorflow:Summary name block1_conv1/bias:0
is illegal; using block1_conv1/bias_0 instead.
INFO:tensorflow:Summary name block1_conv2/kernel:0
is illegal; using block1_conv2/kernel_0 instead.
INFO:tensorflow:Summary name block1_conv2/bias:0
is illegal; using block1_conv2/bias_0 instead.
INFO:tensorflow:Summary name block2_conv1/kernel:0
is illegal; using block2_conv1/kernel_0 instead.
INFO:tensorflow:Summary name block2_conv1/bias:0
is illegal; using block2_conv1/bias_0 instead.
INFO:tensorflow:Summary name block2_conv2/kernel:0
is illegal; using block2_conv2/kernel_0 instead.
INFO:tensorflow:Summary name block2_conv2/bias:0
is illegal; using block2_conv2/bias_0 instead.
INFO:tensorflow:Summary name block3_conv1/kernel:0
is illegal; using block3_conv1/kernel_0 instead.
INFO:tensorflow:Summary name block3_conv1/bias:0
is illegal; using block3_conv1/bias_0 instead.
INFO:tensorflow:Summary name block3_conv2/kernel:0
is illegal; using block3_conv2/kernel_0 instead.
INFO:tensorflow:Summary name block3_conv2/bias:0
is illegal; using block3_conv2/bias_0 instead.
INFO:tensorflow:Summary name block3_conv3/kernel:0
is illegal; using block3_conv3/kernel_0 instead.
INFO:tensorflow:Summary name block3_conv3/bias:0
is illegal; using block3_conv3/bias_0 instead.
INFO:tensorflow:Summary name block4_conv1/kernel:0
is illegal; using block4_conv1/kernel_0 instead.
INFO:tensorflow:Summary name block4_conv1/bias:0
is illegal; using block4_conv1/bias_0 instead.
INFO:tensorflow:Summary name block4_conv2/kernel:0
is illegal; using block4_conv2/kernel_0 instead.
INFO:tensorflow:Summary name block4_conv2/bias:0
is illegal; using block4_conv2/bias_0 instead.
INFO:tensorflow:Summary name block4_conv3/kernel:0
is illegal; using block4_conv3/kernel_0 instead.
INFO:tensorflow:Summary name block4_conv3/bias:0
is illegal; using block4_conv3/bias_0 instead.
INFO:tensorflow:Summary name block5_conv1/kernel:0
is illegal; using block5_conv1/kernel_0 instead.
INFO:tensorflow:Summary name block5_conv1/bias:0
is illegal; using block5_conv1/bias_0 instead.
INFO:tensorflow:Summary name block5_conv2/kernel:0
is illegal; using block5_conv2/kernel_0 instead.
INFO:tensorflow:Summary name block5_conv2/bias:0
is illegal; using block5_conv2/bias_0 instead.
INFO:tensorflow:Summary name block5_conv3/kernel:0
is illegal; using block5_conv3/kernel_0 instead.
INFO:tensorflow:Summary name block5_conv3/bias:0
is illegal; using block5_conv3/bias_0 instead.
INFO:tensorflow:Summary name ft_fc1/kernel:0 is
illegal; using ft_fc1/kernel_0 instead.
INFO:tensorflow:Summary name ft_fc1/bias:0 is
illegal; using ft_fc1/bias_0 instead.
INFO:tensorflow:Summary name
batch_normalization_1/gamma:0 is illegal; using
batch_normalization_1/gamma_0 instead.
INFO:tensorflow:Summary name
batch_normalization_1/beta:0 is illegal; using
batch_normalization_1/beta_0 instead.
INFO:tensorflow:Summary name
batch_normalization_1/moving_mean:0 is illegal;
using batch_normalization_1/moving_mean_0 instead.
INFO:tensorflow:Summary name
batch_normalization_1/moving_variance:0 is
illegal; using
batch_normalization_1/moving_variance_0 instead.
INFO:tensorflow:Summary name dense_1/kernel:0 is
illegal; using dense_1/kernel_0 instead.
INFO:tensorflow:Summary name dense_1/bias:0 is
illegal; using dense_1/bias_0 instead.
Epoch 1/20
781/781 [==============================] - 49s -
loss: 0.0161 - acc: 0.9974
Epoch 2/20
781/781 [==============================] - 48s -
loss: 1.1923e-07 - acc: 1.0000
Epoch 3/20
781/781 [==============================] - 47s -
loss: 1.1922e-07 - acc: 1.0000 - ETA:
Epoch 4/20
781/781 [==============================] - 47s -
loss: 1.1922e-07 - acc: 1.0000
Epoch 5/20
781/781 [==============================] - 48s -
loss: 1.1922e-07 - acc: 1.0000
Epoch 6/20
781/781 [==============================] - 48s -
loss: 1.1921e-07 - acc: 1.0000
Epoch 7/20
781/781 [==============================] - 47s -
loss: 1.1921e-07 - acc: 1.0000
Epoch 8/20
781/781 [==============================] - 48s -
loss: 1.1922e-07 - acc: 1.0000
Epoch 9/20
781/781 [==============================] - 48s -
loss: 1.1921e-07 - acc: 1.0000
Epoch 10/20
781/781 [==============================] - 47s -
loss: 1.1921e-07 - acc: 1.0000 - ET
Epoch 11/20
781/781 [==============================] - 48s -
loss: 1.1921e-07 - acc: 1.0000
Epoch 12/20
781/781 [==============================] - 47s -
loss: 1.1921e-07 - acc: 1.0000
Epoch 13/20
781/781 [==============================] - 47s -
loss: 1.1921e-07 - acc: 1.0000
Epoch 14/20
781/781 [==============================] - 48s -
loss: 1.1921e-07 - acc: 1.0000
Epoch 15/20
781/781 [==============================] - 46s -
loss: 1.1921e-07 - acc: 1.0000 - ETA: 0s -
loss: 1.1921e-07 - acc:
Epoch 16/20
781/781 [==============================] - 47s -
loss: 1.1921e-07 - acc: 1.0000
Epoch 17/20
781/781 [==============================] - ETA: 0s
- loss: 1.1921e-07 - acc: 1.000 - 47s - loss:
1.1921e-07 - acc: 1.0000
Epoch 18/20
781/781 [==============================] - 47s -
loss: 1.1921e-07 - acc: 1.0000
Epoch 19/20
781/781 [==============================] - 47s -
loss: 1.1921e-07 - acc: 1.0000
Epoch 20/20
781/781 [==============================] - 47s -
loss: 1.1921e-07 - acc: 1.0000
<[Link] at 0x7fdb8f8f2be0>

Running TensorBoard

%%bash
python -m [Link] --logdir=./tf_logs

[Link] integration with Keras


Source:
[Link]
855b29

import operator
import threading
from functools import reduce

import keras
import [Link] as K
from [Link] import Model
import numpy as np
import tensorflow as tf
import time
from [Link] import Conv2D
from tqdm import tqdm

Using TensorFlow backend.


def prod(factors):
return reduce([Link], factors, 1)

TRAINING = True
with K.get_session() as sess:
shp = [10, 200, 200, 3]
shp1 = [10, 7, 7, 80]
inp = [Link](shp)
inp1 = [Link](shp1)
queue = [Link](20, [tf.float32,
tf.float32], [shp, shp1])
x1, y1 = [Link]()
enqueue = [Link]([inp, inp1])
model = [Link].ResNet50(False,
"imagenet", x1, shp[1:])
for i in range(3):
[Link]()
[Link][-1].outbound_nodes = []
[Link] = [[Link][-1].output]
output = [Link][0] # 7x7
# Reduce filter size to avoid OOM
output = Conv2D(32, (1, 1), padding="same",
activation='relu')(output)
output3 = Conv2D(5 * (4 + 11 + 1), (1, 1),
padding="same", activation='relu')(
output) # YOLO output B (4 + nb_class +1)
cost = tf.reduce_sum([Link](output3 - y1))
optimizer =
[Link](0.001).minimize(cost)
[Link](tf.global_variables_initializer())

def get_input():
# Super long processing I/O bla bla bla
return
[Link](prod(shp)).reshape(shp).astype(np.float3
2), [Link](prod(shp1)).reshape(shp1).astype(
np.float32)

def generate(coord, enqueue_op):


while not coord.should_stop():
inp_feed, inp1_feed = get_input()
[Link](enqueue_op, feed_dict={inp:
inp_feed, inp1: inp1_feed})

start = [Link]()
for i in tqdm(range(10)): # EPOCH
for j in range(30): # Batch
x,y = get_input()
optimizer_, s = [Link]([optimizer,
[Link]()],
feed_dict=
{x1:x,y1:y, K.learning_phase(): int(TRAINING)})
print("Took : ", [Link]() - start)

coordinator = [Link]()
threads = [[Link](target=generate,
args=(coordinator, enqueue)) for i in range(10)]
for t in threads:
[Link]()
start = [Link]()
for i in tqdm(range(10)): # EPOCH
for j in range(30): # Batch
optimizer_, s = [Link]([optimizer,
[Link]()],
feed_dict=
{K.learning_phase(): int(TRAINING)})
print("Took : ", [Link]() - start)
def clear_queue(queue, threads):
while any([t.is_alive() for t in
threads]):
_, s = [Link]([[Link](),
[Link]()])
print(s)

coordinator.request_stop()
clear_queue(queue, threads)

[Link](threads)
print("DONE Queue")
Unsupervised learning
AutoEncoders
An autoencoder is an artificial neural network used for learning efficient codings.
The aim of an autoencoder is to learn a representation
(encoding) for a set of data, typically for the purpose of
dimensionality reduction.

Unsupervised learning is a type of machine learning algorithm


used to draw inferences from datasets consisting of input data
without labeled responses. The most common unsupervised
learning method is cluster analysis, which is used for
exploratory data analysis to find hidden patterns or grouping in
data.

Reference
Based on [Link]
[Link]
Introducing Keras Functional API
The Keras functional API is the way to go for defining complex
models, such as multi-output models, directed acyclic graphs,
or models with shared layers.
The entire Functional API relies on the fact that each [Link] object is a callable object!

See 8.2 Multi-Modal Networks for further details.
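As a tiny illustration of what "callable" means here (a toy sketch, unrelated to the autoencoder built below): calling a layer instance on a tensor returns a new tensor, which is exactly what lets us chain layers functionally.

from keras.layers import Input, Dense

t_in = Input(shape=(16,))            # a symbolic input tensor
dense = Dense(8, activation='relu')  # a Layer instance ...
t_out = dense(t_in)                  # ... is callable on a tensor and returns a new tensor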

from [Link] import Input, Dense


from [Link] import Model

from [Link] import mnist

import numpy as np

Using TensorFlow backend.

# this is the size of our encoded representations


encoding_dim = 32 # 32 floats -> compression of
factor 24.5, assuming the input is 784 floats

# this is our input placeholder


input_img = Input(shape=(784,))
# "encoded" is the encoded representation of the
input
encoded = Dense(encoding_dim, activation='relu')
(input_img)

# "decoded" is the lossy reconstruction of the


input
decoded = Dense(784, activation='sigmoid')
(encoded)

# this model maps an input to its reconstruction


autoencoder = Model(input_img, decoded)

# this model maps an input to its encoded


representation
encoder = Model(input_img, encoded)

# create a placeholder for an encoded (32-


dimensional) input
encoded_input = Input(shape=(encoding_dim,))
# retrieve the last layer of the autoencoder model
decoder_layer = [Link][-1]
# create the decoder model
decoder = Model(encoded_input,
decoder_layer(encoded_input))

[Link](optimizer='adadelta',
loss='binary_crossentropy')

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.


x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train),
[Link](x_train.shape[1:])))
x_test = x_test.reshape((len(x_test),
[Link](x_test.shape[1:])))

#note: x_train, x_train :)


[Link](x_train, x_train,
epochs=50,
batch_size=256,
shuffle=True,
validation_data=(x_test, x_test))

Train on 60000 samples, validate on 10000 samples


Epoch 1/50
60000/60000 [==============================] - 1s
- loss: 0.3830 - val_loss: 0.2731
Epoch 2/50
60000/60000 [==============================] - 1s
- loss: 0.2664 - val_loss: 0.2561
Epoch 3/50
60000/60000 [==============================] - 1s
- loss: 0.2463 - val_loss: 0.2336
Epoch 4/50
60000/60000 [==============================] - 1s
- loss: 0.2258 - val_loss: 0.2156
Epoch 5/50
60000/60000 [==============================] - 1s
- loss: 0.2105 - val_loss: 0.2030
Epoch 6/50
60000/60000 [==============================] - 1s
- loss: 0.1997 - val_loss: 0.1936
Epoch 7/50
60000/60000 [==============================] - 1s
- loss: 0.1914 - val_loss: 0.1863
Epoch 8/50
60000/60000 [==============================] - 1s
- loss: 0.1846 - val_loss: 0.1800
Epoch 9/50
60000/60000 [==============================] - 1s
- loss: 0.1789 - val_loss: 0.1749
Epoch 10/50
60000/60000 [==============================] - 1s
- loss: 0.1740 - val_loss: 0.1702
Epoch 11/50
60000/60000 [==============================] - 1s
- loss: 0.1697 - val_loss: 0.1660
Epoch 12/50
60000/60000 [==============================] - 1s
- loss: 0.1657 - val_loss: 0.1622
Epoch 13/50
60000/60000 [==============================] - 1s
- loss: 0.1620 - val_loss: 0.1587
Epoch 14/50
60000/60000 [==============================] - 1s
- loss: 0.1586 - val_loss: 0.1554
Epoch 15/50
60000/60000 [==============================] - 1s
- loss: 0.1554 - val_loss: 0.1524
Epoch 16/50
60000/60000 [==============================] - 1s
- loss: 0.1525 - val_loss: 0.1495
Epoch 17/50
60000/60000 [==============================] - 1s
- loss: 0.1497 - val_loss: 0.1468
Epoch 18/50
60000/60000 [==============================] - 1s
- loss: 0.1470 - val_loss: 0.1441
Epoch 19/50
60000/60000 [==============================] - 1s
- loss: 0.1444 - val_loss: 0.1415
Epoch 20/50
60000/60000 [==============================] - 1s
- loss: 0.1419 - val_loss: 0.1391
Epoch 21/50
60000/60000 [==============================] - 1s
- loss: 0.1395 - val_loss: 0.1367
Epoch 22/50
60000/60000 [==============================] - 1s
- loss: 0.1371 - val_loss: 0.1345
Epoch 23/50
60000/60000 [==============================] - 1s
- loss: 0.1349 - val_loss: 0.1323ss: 0.13
Epoch 24/50
60000/60000 [==============================] - 1s
- loss: 0.1328 - val_loss: 0.1302
Epoch 25/50
60000/60000 [==============================] - 1s
- loss: 0.1308 - val_loss: 0.1283
Epoch 26/50
60000/60000 [==============================] - 1s
- loss: 0.1289 - val_loss: 0.1264
Epoch 27/50
60000/60000 [==============================] - 1s
- loss: 0.1271 - val_loss: 0.1247
Epoch 28/50
60000/60000 [==============================] - 1s
- loss: 0.1254 - val_loss: 0.1230
Epoch 29/50
60000/60000 [==============================] - 1s
- loss: 0.1238 - val_loss: 0.1215
Epoch 30/50
60000/60000 [==============================] - 1s
- loss: 0.1223 - val_loss: 0.1200
Epoch 31/50
60000/60000 [==============================] - 1s
- loss: 0.1208 - val_loss: 0.1186
Epoch 32/50
60000/60000 [==============================] - 1s
- loss: 0.1195 - val_loss: 0.1172
Epoch 33/50
60000/60000 [==============================] - 1s
- loss: 0.1182 - val_loss: 0.1160
Epoch 34/50
60000/60000 [==============================] - 1s
- loss: 0.1170 - val_loss: 0.1149
Epoch 35/50
60000/60000 [==============================] - 1s
- loss: 0.1158 - val_loss: 0.1137
Epoch 36/50
60000/60000 [==============================] - 1s
- loss: 0.1148 - val_loss: 0.1127
Epoch 37/50
60000/60000 [==============================] - 1s
- loss: 0.1138 - val_loss: 0.1117
Epoch 38/50
60000/60000 [==============================] - 1s
- loss: 0.1129 - val_loss: 0.1109
Epoch 39/50
60000/60000 [==============================] - 1s
- loss: 0.1120 - val_loss: 0.1100
Epoch 40/50
60000/60000 [==============================] - 1s
- loss: 0.1112 - val_loss: 0.1093
Epoch 41/50
60000/60000 [==============================] - 1s
- loss: 0.1105 - val_loss: 0.1085
Epoch 42/50
60000/60000 [==============================] - 1s
- loss: 0.1098 - val_loss: 0.1079
Epoch 43/50
60000/60000 [==============================] - 1s
- loss: 0.1092 - val_loss: 0.1072
Epoch 44/50
60000/60000 [==============================] - 1s
- loss: 0.1086 - val_loss: 0.1066
Epoch 45/50
60000/60000 [==============================] - 1s
- loss: 0.1080 - val_loss: 0.1061
Epoch 46/50
60000/60000 [==============================] - 1s
- loss: 0.1074 - val_loss: 0.1056
Epoch 47/50
60000/60000 [==============================] - 1s
- loss: 0.1069 - val_loss: 0.1051
Epoch 48/50
60000/60000 [==============================] - 1s
- loss: 0.1065 - val_loss: 0.1046
Epoch 49/50
60000/60000 [==============================] - 1s
- loss: 0.1060 - val_loss: 0.1042
Epoch 50/50
60000/60000 [==============================] - 1s
- loss: 0.1056 - val_loss: 0.1037

<[Link] at 0x7fd1ce5140f0>

Testing the Autoencoder

from matplotlib import pyplot as plt

%matplotlib inline

encoded_imgs = [Link](x_test)
decoded_imgs = [Link](encoded_imgs)

n = 10
[Link](figsize=(20, 4))
for i in range(n):
# original
ax = [Link](2, n, i + 1)
[Link](x_test[i].reshape(28, 28))
[Link]()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

# reconstruction
ax = [Link](2, n, i + 1 + n)
[Link](decoded_imgs[i].reshape(28, 28))
[Link]()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
[Link]()

Sample generation with Autoencoder

encoded_imgs = [Link](10,32)
decoded_imgs = [Link](encoded_imgs)

n = 10
[Link](figsize=(20, 4))
for i in range(n):
# generation
ax = [Link](2, n, i + 1 + n)
[Link](decoded_imgs[i].reshape(28, 28))
[Link]()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
[Link]()
Convolutional AutoEncoder
Since our inputs are images, it makes sense to use
convolutional neural networks ( convnets ) as encoders and
decoders.
In practical settings, autoencoders applied to images are almost always convolutional autoencoders: they simply perform much better.
The encoder will consist of a stack of Conv2D and MaxPooling2D layers (max pooling being used for spatial down-sampling), while the decoder will consist of a stack of Conv2D and UpSampling2D layers.

from [Link] import Input, Dense, Conv2D,


MaxPooling2D, UpSampling2D
from [Link] import Model
from keras import backend as K

input_img = Input(shape=(28, 28, 1)) # adapt this


if using `channels_first` image data format

x = Conv2D(16, (3, 3), activation='relu',


padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu',
padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu',
padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, 8)
i.e. 128-dimensional

x = Conv2D(8, (3, 3), activation='relu',


padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu',
padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid',
padding='same')(x)

conv_autoencoder = Model(input_img, decoded)


conv_autoencoder.compile(optimizer='adadelta',
loss='binary_crossentropy')

from keras import backend as K

if K.image_data_format() == 'channels_last':
shape_ord = (28, 28, 1)
else:
shape_ord = (1, 28, 28)

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.


x_test = x_test.astype('float32') / 255.

x_train = [Link](x_train, ((x_train.shape[0],)


+ shape_ord))
x_test = [Link](x_test, ((x_test.shape[0],) +
shape_ord))
x_train.shape

(60000, 28, 28, 1)

from [Link] import TensorBoard

batch_size=128
steps_per_epoch = [Link]([Link](x_train.shape[0]
/ batch_size))
conv_autoencoder.fit(x_train, x_train, epochs=50,
batch_size=128,
shuffle=True,
validation_data=(x_test, x_test),
callbacks=
[TensorBoard(log_dir='./tf_autoencoder_logs')])

Train on 60000 samples, validate on 10000 samples


Epoch 1/50
60000/60000 [==============================] - 8s
- loss: 0.2327 - val_loss: 0.1740
Epoch 2/50
60000/60000 [==============================] - 7s
- loss: 0.1645 - val_loss: 0.1551
Epoch 3/50
60000/60000 [==============================] - 7s
- loss: 0.1501 - val_loss: 0.1442
Epoch 4/50
60000/60000 [==============================] - 7s
- loss: 0.1404 - val_loss: 0.1375
Epoch 5/50
60000/60000 [==============================] - 7s
- loss: 0.1342 - val_loss: 0.1316
Epoch 6/50
60000/60000 [==============================] - 7s
- loss: 0.1300 - val_loss: 0.1298
Epoch 7/50
60000/60000 [==============================] - 7s
- loss: 0.1272 - val_loss: 0.1301
Epoch 8/50
60000/60000 [==============================] - 7s
- loss: 0.1243 - val_loss: 0.1221
Epoch 9/50
60000/60000 [==============================] - 7s
- loss: 0.1222 - val_loss: 0.1196
Epoch 10/50
60000/60000 [==============================] - 7s
- loss: 0.1207 - val_loss: 0.1184
Epoch 11/50
60000/60000 [==============================] - 7s
- loss: 0.1188 - val_loss: 0.1162
Epoch 12/50
60000/60000 [==============================] - 7s
- loss: 0.1175 - val_loss: 0.1160
Epoch 13/50
60000/60000 [==============================] - 7s
- loss: 0.1167 - val_loss: 0.1164
Epoch 14/50
60000/60000 [==============================] - 7s
- loss: 0.1154 - val_loss: 0.1160
Epoch 15/50
60000/60000 [==============================] - 7s
- loss: 0.1145 - val_loss: 0.1159
Epoch 16/50
60000/60000 [==============================] - 7s
- loss: 0.1132 - val_loss: 0.1110
Epoch 17/50
60000/60000 [==============================] - 7s
- loss: 0.1127 - val_loss: 0.1108
Epoch 18/50
60000/60000 [==============================] - 7s
- loss: 0.1118 - val_loss: 0.1099
Epoch 19/50
60000/60000 [==============================] - 7s
- loss: 0.1113 - val_loss: 0.1106
Epoch 20/50
60000/60000 [==============================] - 7s
- loss: 0.1108 - val_loss: 0.1120
Epoch 21/50
60000/60000 [==============================] - 7s
- loss: 0.1104 - val_loss: 0.1064
Epoch 22/50
60000/60000 [==============================] - 7s
- loss: 0.1094 - val_loss: 0.1075
Epoch 23/50
60000/60000 [==============================] - 7s
- loss: 0.1088 - val_loss: 0.1088
Epoch 24/50
60000/60000 [==============================] - 7s
- loss: 0.1085 - val_loss: 0.1071
Epoch 25/50
60000/60000 [==============================] - 7s
- loss: 0.1081 - val_loss: 0.1060
Epoch 26/50
60000/60000 [==============================] - 7s
- loss: 0.1075 - val_loss: 0.1062
Epoch 27/50
60000/60000 [==============================] - 7s
- loss: 0.1074 - val_loss: 0.1062
Epoch 28/50
60000/60000 [==============================] - 7s
- loss: 0.1065 - val_loss: 0.1045
Epoch 29/50
60000/60000 [==============================] - 7s
- loss: 0.1062 - val_loss: 0.1043
Epoch 30/50
60000/60000 [==============================] - 7s
- loss: 0.1057 - val_loss: 0.1038
Epoch 31/50
60000/60000 [==============================] - 7s
- loss: 0.1053 - val_loss: 0.1040
Epoch 32/50
60000/60000 [==============================] - 7s
- loss: 0.1048 - val_loss: 0.1041
Epoch 33/50
60000/60000 [==============================] - 7s
- loss: 0.1045 - val_loss: 0.1057
Epoch 34/50
60000/60000 [==============================] - 7s
- loss: 0.1041 - val_loss: 0.1026
Epoch 35/50
60000/60000 [==============================] - 7s
- loss: 0.1041 - val_loss: 0.1042
Epoch 36/50
60000/60000 [==============================] - 7s
- loss: 0.1035 - val_loss: 0.1053
Epoch 37/50
60000/60000 [==============================] - 7s
- loss: 0.1032 - val_loss: 0.1006
Epoch 38/50
60000/60000 [==============================] - 7s
- loss: 0.1030 - val_loss: 0.1011
Epoch 39/50
60000/60000 [==============================] - 7s
- loss: 0.1028 - val_loss: 0.1013
Epoch 40/50
60000/60000 [==============================] - 7s
- loss: 0.1027 - val_loss: 0.1018
Epoch 41/50
60000/60000 [==============================] - 7s
- loss: 0.1025 - val_loss: 0.1019
Epoch 42/50
60000/60000 [==============================] - 7s
- loss: 0.1024 - val_loss: 0.1025
Epoch 43/50
60000/60000 [==============================] - 7s
- loss: 0.1020 - val_loss: 0.1015
Epoch 44/50
60000/60000 [==============================] - 7s
- loss: 0.1020 - val_loss: 0.1018
Epoch 45/50
60000/60000 [==============================] - 7s
- loss: 0.1015 - val_loss: 0.1011
Epoch 46/50
60000/60000 [==============================] - 7s
- loss: 0.1013 - val_loss: 0.0999
Epoch 47/50
60000/60000 [==============================] - 7s
- loss: 0.1010 - val_loss: 0.0995
Epoch 48/50
60000/60000 [==============================] - 7s
- loss: 0.1008 - val_loss: 0.0996
Epoch 49/50
60000/60000 [==============================] - 7s
- loss: 0.1008 - val_loss: 0.0990
Epoch 50/50
60000/60000 [==============================] - 7s
- loss: 0.1006 - val_loss: 0.0995

<[Link] at 0x7fd1bebacfd0>

decoded_imgs = conv_autoencoder.predict(x_test)
n = 10
[Link](figsize=(20, 4))
for i in range(n):
# display original
ax = [Link](2, n, i+1)
[Link](x_test[i].reshape(28, 28))
[Link]()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

# display reconstruction
ax = [Link](2, n, i + n + 1)
[Link](decoded_imgs[i].reshape(28, 28))
[Link]()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
[Link]()

We could also have a look at the 128-dimensional encoded middle representation.

conv_encoder = Model(input_img, encoded)


encoded_imgs = conv_encoder.predict(x_test)

n = 10
[Link](figsize=(20, 8))
for i in range(n):
ax = [Link](1, n, i+1)
[Link](encoded_imgs[i].reshape(4, 4 *
8).T)
[Link]()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
[Link]()
Pretraining encoders
One of the powerful uses of auto-encoders is taking the trained encoder to generate meaningful, compressed representations of the input feature vectors, which can then be reused in downstream (e.g. supervised) tasks.

# Use the encoder to pretrain a classifier
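A minimal sketch of that idea (an illustrative example, not code from the original notebook): reuse the conv_encoder trained above as a frozen feature extractor and stack a small softmax classifier on top of its 128-dimensional code; the MNIST labels have to be reloaded, since they were discarded when the data was first loaded.

from keras.datasets import mnist
from keras.layers import Flatten, Dense
from keras.models import Model
from keras.utils import np_utils

# Reload the labels (they were thrown away with `(x_train, _), (x_test, _)` above)
(_, y_train_lbl), _ = mnist.load_data()
y_train_cat = np_utils.to_categorical(y_train_lbl, 10)

# Freeze the pretrained convolutional encoder and stack a classifier on top
for layer in conv_encoder.layers:
    layer.trainable = False

features = Flatten()(conv_encoder.output)   # (4, 4, 8) code -> 128-dim vector
clf_out = Dense(10, activation='softmax')(features)
classifier = Model(conv_encoder.input, clf_out)
classifier.compile(optimizer='adadelta',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
# classifier.fit(x_train, y_train_cat, epochs=5, batch_size=128)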


Application to Image Denoising
Let's put our convolutional autoencoder to work on an image denoising problem. It's simple: we will train the autoencoder to map noisy digit images to clean digit images.
Here's how we will generate the synthetic noisy digits: we just add a Gaussian noise matrix and clip the images between 0 and 1.

from [Link] import mnist


import numpy as np

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.


x_test = x_test.astype('float32') / 255.
x_train = [Link](x_train, (len(x_train), 28,
28, 1)) # adapt this if using `channels_first`
image data format
x_test = [Link](x_test, (len(x_test), 28, 28,
1)) # adapt this if using `channels_first` image
data format

noise_factor = 0.5
x_train_noisy = x_train + noise_factor *
[Link](loc=0.0, scale=1.0,
size=x_train.shape)
x_test_noisy = x_test + noise_factor *
[Link](loc=0.0, scale=1.0,
size=x_test.shape)

x_train_noisy = [Link](x_train_noisy, 0., 1.)


x_test_noisy = [Link](x_test_noisy, 0., 1.)
Using TensorFlow backend.

Here's what the noisy digits look like:

n = 10
[Link](figsize=(20, 2))
for i in range(n):
ax = [Link](1, n, i+1)
[Link](x_test_noisy[i].reshape(28, 28))
[Link]()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
[Link]()

Question
If you squint you can still recognize them, but barely.
Can our autoencoder learn to recover the original digits?
Let's find out.
Compared to the previous convolutional autoencoder, in order to improve the quality of the reconstruction, we'll use a slightly different model with more filters per layer:

from [Link] import Input, Dense, Conv2D,


MaxPooling2D, UpSampling2D
from [Link] import Model
from [Link] import TensorBoard

Using TensorFlow backend.

input_img = Input(shape=(28, 28, 1)) # adapt this


if using `channels_first` image data format

x = Conv2D(32, (3, 3), activation='relu',


padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu',
padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# at this point the representation is (7, 7, 32)

x = Conv2D(32, (3, 3), activation='relu',


padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',
padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid',
padding='same')(x)

autoencoder = Model(input_img, decoded)


[Link](optimizer='adadelta',
loss='binary_crossentropy')

Let's train the AutoEncoder for 100 epochs


[Link](x_train_noisy, x_train,
epochs=100,
batch_size=128,
shuffle=True,
validation_data=(x_test_noisy,
x_test),
callbacks=
[TensorBoard(log_dir='/tmp/autoencoder_denoise',

histogram_freq=0, write_graph=False)])

Train on 60000 samples, validate on 10000 samples


Epoch 1/100
60000/60000 [==============================] - 9s
- loss: 0.1901 - val_loss: 0.1255
Epoch 2/100
60000/60000 [==============================] - 8s
- loss: 0.1214 - val_loss: 0.1142
Epoch 3/100
60000/60000 [==============================] - 8s
- loss: 0.1135 - val_loss: 0.1085
Epoch 4/100
60000/60000 [==============================] - 8s
- loss: 0.1094 - val_loss: 0.1074
Epoch 5/100
60000/60000 [==============================] - 8s
- loss: 0.1071 - val_loss: 0.1052
Epoch 6/100
60000/60000 [==============================] - 8s
- loss: 0.1053 - val_loss: 0.1046
Epoch 7/100
60000/60000 [==============================] - 8s
- loss: 0.1040 - val_loss: 0.1020
Epoch 8/100
60000/60000 [==============================] - 8s
- loss: 0.1031 - val_loss: 0.1028
Epoch 9/100
60000/60000 [==============================] - 8s
- loss: 0.1023 - val_loss: 0.1009
Epoch 10/100
60000/60000 [==============================] - 8s
- loss: 0.1017 - val_loss: 0.1005
Epoch 11/100
60000/60000 [==============================] - 8s
- loss: 0.1009 - val_loss: 0.1003
Epoch 12/100
60000/60000 [==============================] - 8s
- loss: 0.1007 - val_loss: 0.1010
Epoch 13/100
60000/60000 [==============================] - 8s
- loss: 0.1002 - val_loss: 0.0989
Epoch 14/100
60000/60000 [==============================] - 8s
- loss: 0.1000 - val_loss: 0.0986
Epoch 15/100
60000/60000 [==============================] - 8s
- loss: 0.0998 - val_loss: 0.0983
Epoch 16/100
60000/60000 [==============================] - 8s
- loss: 0.0993 - val_loss: 0.0983
Epoch 17/100
60000/60000 [==============================] - 8s
- loss: 0.0991 - val_loss: 0.0979
Epoch 18/100
60000/60000 [==============================] - 8s
- loss: 0.0988 - val_loss: 0.0988
Epoch 19/100
60000/60000 [==============================] - 8s
- loss: 0.0986 - val_loss: 0.0976
Epoch 20/100
60000/60000 [==============================] - 8s
- loss: 0.0984 - val_loss: 0.0987
Epoch 21/100
60000/60000 [==============================] - 8s
- loss: 0.0983 - val_loss: 0.0973
Epoch 22/100
60000/60000 [==============================] - 8s
- loss: 0.0981 - val_loss: 0.0971
Epoch 23/100
60000/60000 [==============================] - 8s
- loss: 0.0979 - val_loss: 0.0978
Epoch 24/100
60000/60000 [==============================] - 8s
- loss: 0.0977 - val_loss: 0.0968
Epoch 25/100
60000/60000 [==============================] - 8s
- loss: 0.0975 - val_loss: 0.0976
Epoch 26/100
60000/60000 [==============================] - 8s
- loss: 0.0974 - val_loss: 0.0963
Epoch 27/100
60000/60000 [==============================] - 8s
- loss: 0.0973 - val_loss: 0.0963
Epoch 28/100
60000/60000 [==============================] - 8s
- loss: 0.0972 - val_loss: 0.0964
Epoch 29/100
60000/60000 [==============================] - 8s
- loss: 0.0970 - val_loss: 0.0961
Epoch 30/100
60000/60000 [==============================] - 8s
- loss: 0.0970 - val_loss: 0.0968
Epoch 31/100
60000/60000 [==============================] - 8s
- loss: 0.0969 - val_loss: 0.0959
Epoch 32/100
60000/60000 [==============================] - 8s
- loss: 0.0968 - val_loss: 0.0959
Epoch 33/100
60000/60000 [==============================] - 8s
- loss: 0.0967 - val_loss: 0.0957
Epoch 34/100
60000/60000 [==============================] - 8s
- loss: 0.0966 - val_loss: 0.0958
Epoch 35/100
60000/60000 [==============================] - 8s
- loss: 0.0965 - val_loss: 0.0956
Epoch 36/100
60000/60000 [==============================] - 8s
- loss: 0.0965 - val_loss: 0.0959
Epoch 37/100
60000/60000 [==============================] - 8s
- loss: 0.0964 - val_loss: 0.0963
Epoch 38/100
60000/60000 [==============================] - 8s
- loss: 0.0963 - val_loss: 0.0960
Epoch 39/100
60000/60000 [==============================] - 8s
- loss: 0.0963 - val_loss: 0.0963
Epoch 40/100
60000/60000 [==============================] - 8s
- loss: 0.0962 - val_loss: 0.0954
Epoch 41/100
60000/60000 [==============================] - 8s
- loss: 0.0961 - val_loss: 0.0955
Epoch 42/100
60000/60000 [==============================] - 8s
- loss: 0.0960 - val_loss: 0.0953
Epoch 43/100
60000/60000 [==============================] - 8s
- loss: 0.0960 - val_loss: 0.0952
Epoch 44/100
60000/60000 [==============================] - 8s
- loss: 0.0960 - val_loss: 0.0951
Epoch 45/100
60000/60000 [==============================] - 8s
- loss: 0.0959 - val_loss: 0.0951
Epoch 46/100
60000/60000 [==============================] - 8s
- loss: 0.0958 - val_loss: 0.0953
Epoch 47/100
60000/60000 [==============================] - 8s
- loss: 0.0957 - val_loss: 0.0952
Epoch 48/100
60000/60000 [==============================] - 8s
- loss: 0.0957 - val_loss: 0.0954
Epoch 49/100
60000/60000 [==============================] - 8s
- loss: 0.0957 - val_loss: 0.0954
Epoch 50/100
60000/60000 [==============================] - 8s
- loss: 0.0957 - val_loss: 0.0954
Epoch 51/100
60000/60000 [==============================] - 8s
- loss: 0.0955 - val_loss: 0.0948
Epoch 52/100
60000/60000 [==============================] - 8s
- loss: 0.0956 - val_loss: 0.0951
Epoch 53/100
60000/60000 [==============================] - 8s
- loss: 0.0955 - val_loss: 0.0951
Epoch 54/100
60000/60000 [==============================] - 8s
- loss: 0.0955 - val_loss: 0.0951
Epoch 55/100
60000/60000 [==============================] - 8s
- loss: 0.0955 - val_loss: 0.0948
Epoch 56/100
60000/60000 [==============================] - 8s
- loss: 0.0954 - val_loss: 0.0955
Epoch 57/100
60000/60000 [==============================] - 8s
- loss: 0.0954 - val_loss: 0.0950
Epoch 58/100
60000/60000 [==============================] - 8s
- loss: 0.0953 - val_loss: 0.0955
Epoch 59/100
60000/60000 [==============================] - 8s
- loss: 0.0952 - val_loss: 0.0947
Epoch 60/100
60000/60000 [==============================] - 8s
- loss: 0.0953 - val_loss: 0.0947
Epoch 61/100
60000/60000 [==============================] - 8s
- loss: 0.0952 - val_loss: 0.0947
Epoch 62/100
60000/60000 [==============================] - 8s
- loss: 0.0952 - val_loss: 0.0945
Epoch 63/100
60000/60000 [==============================] - 8s
- loss: 0.0952 - val_loss: 0.0945
Epoch 64/100
60000/60000 [==============================] - 8s
- loss: 0.0952 - val_loss: 0.0945
Epoch 65/100
60000/60000 [==============================] - 8s
- loss: 0.0950 - val_loss: 0.0954
Epoch 66/100
60000/60000 [==============================] - 8s
- loss: 0.0951 - val_loss: 0.0945
Epoch 67/100
60000/60000 [==============================] - 8s
- loss: 0.0951 - val_loss: 0.0946
Epoch 68/100
60000/60000 [==============================] - 8s
- loss: 0.0950 - val_loss: 0.0951
Epoch 69/100
60000/60000 [==============================] - 8s
- loss: 0.0950 - val_loss: 0.0952
Epoch 70/100
60000/60000 [==============================] - 8s
- loss: 0.0949 - val_loss: 0.0948
Epoch 71/100
60000/60000 [==============================] - 8s
- loss: 0.0949 - val_loss: 0.0958
Epoch 72/100
60000/60000 [==============================] - 8s
- loss: 0.0949 - val_loss: 0.0953
Epoch 73/100
60000/60000 [==============================] - 8s
- loss: 0.0949 - val_loss: 0.0942
Epoch 74/100
60000/60000 [==============================] - 8s
- loss: 0.0948 - val_loss: 0.0946
Epoch 75/100
60000/60000 [==============================] - 8s
- loss: 0.0948 - val_loss: 0.0942
Epoch 76/100
60000/60000 [==============================] - 8s
- loss: 0.0948 - val_loss: 0.0945
Epoch 77/100
60000/60000 [==============================] - 8s
- loss: 0.0948 - val_loss: 0.0944
Epoch 78/100
60000/60000 [==============================] - 8s
- loss: 0.0948 - val_loss: 0.0942
Epoch 79/100
60000/60000 [==============================] - 8s
- loss: 0.0947 - val_loss: 0.0944
Epoch 80/100
60000/60000 [==============================] - 8s
- loss: 0.0947 - val_loss: 0.0942
Epoch 81/100
60000/60000 [==============================] - 8s
- loss: 0.0946 - val_loss: 0.0943
Epoch 82/100
60000/60000 [==============================] - 8s
- loss: 0.0946 - val_loss: 0.0942
Epoch 83/100
60000/60000 [==============================] - 8s
- loss: 0.0946 - val_loss: 0.0941
Epoch 84/100
60000/60000 [==============================] - 8s
- loss: 0.0947 - val_loss: 0.0940
Epoch 85/100
60000/60000 [==============================] - 8s
- loss: 0.0946 - val_loss: 0.0941
Epoch 86/100
60000/60000 [==============================] - 8s
- loss: 0.0945 - val_loss: 0.0941
Epoch 87/100
60000/60000 [==============================] - 8s
- loss: 0.0946 - val_loss: 0.0945
Epoch 88/100
60000/60000 [==============================] - 8s
- loss: 0.0945 - val_loss: 0.0944
Epoch 89/100
60000/60000 [==============================] - 8s
- loss: 0.0945 - val_loss: 0.0944
Epoch 90/100
60000/60000 [==============================] - 8s
- loss: 0.0945 - val_loss: 0.0941
Epoch 91/100
60000/60000 [==============================] - 8s
- loss: 0.0945 - val_loss: 0.0939
Epoch 92/100
60000/60000 [==============================] - 8s
- loss: 0.0944 - val_loss: 0.0946
Epoch 93/100
60000/60000 [==============================] - 8s
- loss: 0.0944 - val_loss: 0.0941
Epoch 94/100
60000/60000 [==============================] - 8s
- loss: 0.0944 - val_loss: 0.0939
Epoch 95/100
60000/60000 [==============================] - 8s
- loss: 0.0944 - val_loss: 0.0941
Epoch 96/100
60000/60000 [==============================] - 8s
- loss: 0.0944 - val_loss: 0.0939
Epoch 97/100
60000/60000 [==============================] - 8s
- loss: 0.0944 - val_loss: 0.0939
Epoch 98/100
60000/60000 [==============================] - 8s
- loss: 0.0943 - val_loss: 0.0939
Epoch 99/100
60000/60000 [==============================] - 8s
- loss: 0.0944 - val_loss: 0.0941
Epoch 100/100
60000/60000 [==============================] - 8s
- loss: 0.0943 - val_loss: 0.0938

<keras.callbacks.History at 0x7fb45ad95f28>

Now let's take a look...

decoded_imgs = autoencoder.predict(x_test_noisy)

n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i+1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + n + 1)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
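Besides eyeballing the reconstructions, a quick quantitative sanity check (a hypothetical addition, not in the original notebook, reusing the arrays defined above) is to compare the mean squared error of the denoised output against that of the raw noisy input:

import numpy as np

# MSE of the noisy inputs vs. the clean digits, and of the denoised outputs vs. the clean digits
mse_noisy    = np.mean((x_test_noisy - x_test) ** 2)
mse_denoised = np.mean((decoded_imgs - x_test) ** 2)
print("MSE noisy vs clean:   ", mse_noisy)
print("MSE denoised vs clean:", mse_denoised)  # should be substantially lower if denoising worked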
Variational AutoEncoder
(Reference: [Link], [Link])
Variational autoencoders are a slightly more modern and
interesting take on autoencoding.

What is a variational autoencoder?


It's a type of autoencoder with added constraints on the
encoded representations being learned.
More precisely, it is an autoencoder that learns a latent variable
model for its input data.
So instead of letting your neural network learn an arbitrary
function, you are learning the parameters of a probability
distribution modeling your data.
If you sample points from this distribution, you can generate
new input data samples: a VAE is a "generative model".

How does a variational autoencoder work?


First, an encoder network turns the input samples $x$ into two parameters in a latent space, which we will note $z_{\mu}$ and $z_{\log\sigma}$.
Then, we randomly sample similar points $z$ from the latent normal distribution that is assumed to generate the data, via $z = z_{\mu} + \exp(z_{\log\sigma}) * \epsilon$, where $\epsilon$ is a random normal tensor.
Finally, a decoder network maps these latent space points back to the original input data.
The parameters of the model are trained via two loss functions:
a reconstruction loss forcing the decoded samples to
match the initial inputs (just like in our previous
autoencoders);
and the KL divergence between the learned latent
distribution and the prior distribution, acting as a
regularization term.
You could actually get rid of this latter term entirely, although it
does help in learning well-formed latent spaces and reducing
overfitting to the training data.
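Putting the two terms together: for a standard normal prior the KL term has a closed form, and written out to match the `vae_loss` defined further below (which averages rather than sums over the latent dimensions, and keeps the notebook's $z_{\log\sigma}$ convention), the per-sample objective is roughly

$\mathcal{L}(x) \;=\; \mathrm{xent}(x, \hat{x}) \;-\; \frac{1}{2}\sum_{j}\Big(1 + z_{\log\sigma,j} - z_{\mu,j}^{2} - e^{\,z_{\log\sigma,j}}\Big)$

where $\mathrm{xent}$ is the binary cross-entropy reconstruction loss and the second term is the KL regularizer.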
Encoder Network

batch_size = 100
original_dim = 784
latent_dim = 2
intermediate_dim = 256
epochs = 50
epsilon_std = 1.0

x = Input(batch_shape=(batch_size, original_dim))
h = Dense(intermediate_dim, activation='relu')(x)
z_mean = Dense(latent_dim)(h)
z_log_sigma = Dense(latent_dim)(h)

We can use these parameters to sample new similar points from the latent space:

from keras.layers import Lambda
from keras import backend as K

def sampling(args):
    z_mean, z_log_sigma = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim),
                              mean=0., stddev=epsilon_std)
    return z_mean + K.exp(z_log_sigma) * epsilon

# note that "output_shape" isn't necessary with the TensorFlow backend
# so you could write `Lambda(sampling)([z_mean, z_log_sigma])`
z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_sigma])
Decoder Network
Finally, we can map these sampled latent points back to
reconstructed inputs:

decoder_h = Dense(intermediate_dim, activation='relu')
decoder_mean = Dense(original_dim, activation='sigmoid')
h_decoded = decoder_h(z)
x_decoded_mean = decoder_mean(h_decoded)

What we've done so far allows us to instantiate 3 models:


an end-to-end autoencoder mapping inputs to
reconstructions
an encoder mapping inputs to the latent space
a generator that can take points on the latent space and will
output the corresponding reconstructed samples.

# end-to-end autoencoder
vae = Model(x, x_decoded_mean)

# encoder, from inputs to latent space
encoder = Model(x, z_mean)

# generator, from latent space to reconstructed inputs
decoder_input = Input(shape=(latent_dim,))
_h_decoded = decoder_h(decoder_input)
_x_decoded_mean = decoder_mean(_h_decoded)
generator = Model(decoder_input, _x_decoded_mean)
Let's Visualise the VAE Model

from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

SVG(model_to_dot(vae).create(prog='dot', format='svg'))

## Exercise: Let's Do the Same for `encoder` and `generator` Model(s)
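A minimal sketch of one possible solution (simply applying the same utility to the other two models; in a notebook each call would go in its own cell so the SVG renders):

# visualise the encoder and the generator with the same helper
SVG(model_to_dot(encoder).create(prog='dot', format='svg'))
SVG(model_to_dot(generator).create(prog='dot', format='svg'))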

VAE on MNIST
We train the model using the end-to-end model, with a custom
loss function: the sum of a reconstruction term, and the KL
divergence regularization term.

from keras.objectives import binary_crossentropy

def vae_loss(x, x_decoded_mean):
    xent_loss = binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.mean(1 + z_log_sigma - K.square(z_mean) - K.exp(z_log_sigma), axis=-1)
    return xent_loss + kl_loss

vae.compile(optimizer='rmsprop', loss=vae_loss)

Training on MNIST Digits

from keras.datasets import mnist
import numpy as np

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

vae.fit(x_train, x_train,
        shuffle=True,
        epochs=epochs,
        batch_size=batch_size,
        validation_data=(x_test, x_test))

Train on 60000 samples, validate on 10000 samples


Epoch 1/50
60000/60000 [==============================] - 3s
- loss: 0.2932 - val_loss: 0.2629
Epoch 2/50
60000/60000 [==============================] - 3s
- loss: 0.2631 - val_loss: 0.2628
Epoch 3/50
60000/60000 [==============================] - 3s
- loss: 0.2630 - val_loss: 0.2626
Epoch 4/50
60000/60000 [==============================] - 3s
- loss: 0.2630 - val_loss: 0.2629
Epoch 5/50
60000/60000 [==============================] - 3s
- loss: 0.2630 - val_loss: 0.2627
Epoch 6/50
60000/60000 [==============================] - 3s
- loss: 0.2630 - val_loss: 0.2627
Epoch 7/50
60000/60000 [==============================] - 3s
- loss: 0.2630 - val_loss: 0.2626
Epoch 8/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 9/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 10/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 11/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 12/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2625
Epoch 13/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 14/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 15/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 16/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 17/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 18/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 19/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 20/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 21/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2625
Epoch 22/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 23/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 24/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 25/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 26/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 27/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 28/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 29/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 30/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 31/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 32/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2625
Epoch 33/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 34/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 35/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 36/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2625
Epoch 37/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2625
Epoch 38/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 39/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 40/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 41/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 42/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 43/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 44/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 45/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 46/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 47/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626
Epoch 48/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2627
Epoch 49/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2625
Epoch 50/50
60000/60000 [==============================] - 3s
- loss: 0.2629 - val_loss: 0.2626

<keras.callbacks.History at 0x7fb62fc26d30>

Because our latent space is two-dimensional, there are a few cool visualizations that can be done at this point.
One is to look at the neighborhoods of different classes on the latent 2D plane:

x_test_encoded = encoder.predict(x_test, batch_size=batch_size)

plt.figure(figsize=(6, 6))
plt.scatter(x_test_encoded[:, 0], x_test_encoded[:, 1], c=y_test)
plt.colorbar()
plt.show()
Each of these colored clusters is a type of digit. Close clusters
are digits that are structurally similar (i.e. digits that share
information in the latent space).
Because the VAE is a generative model, we can also use it to
generate new digits! Here we will scan the latent plane,
sampling latent points at regular intervals, and generating the
corresponding digit for each of these points. This gives us a
visualization of the latent manifold that "generates" the MNIST
digits.

# display a 2D manifold of the digits
n = 15  # figure with 15x15 digits
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
# we will sample n points within [-15, 15] standard deviations
grid_x = np.linspace(-15, 15, n)
grid_y = np.linspace(-15, 15, n)
for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        z_sample = np.array([[xi, yi]]) * epsilon_std
        x_decoded = generator.predict(z_sample)
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size: (i + 1) * digit_size,
               j * digit_size: (j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
plt.imshow(figure)
plt.show()
Natural Language Processing using
Artificial Neural Networks
“In God we trust. All others must bring data.” – W. Edwards
Deming, statistician
Word Embeddings
What?
Convert words to vectors in a high dimensional space. Each
dimension denotes an aspect like gender, type of object / word.
"Word embeddings" are a family of natural language processing
techniques aiming at mapping semantic meaning into a
geometric space. This is done by associating a numeric vector
to every word in a dictionary, such that the distance (e.g. L2
distance or more commonly cosine distance) between any two
vectors would capture part of the semantic relationship between
the two associated words. The geometric space formed by
these vectors is called an embedding space.

Why?
By converting words to vectors we build relations between words: the more similar two words are along a dimension, the closer their scores are.

Example
W(green) = (1.2, 0.98, 0.05, ...)
W(red) = (1.1, 0.2, 0.5, ...)
Here the vector values of green and red are very similar in one dimension because they are both colours. The value of the second dimension is very different because red might depict something negative in the training data while green is used for positive things.
By vectorizing words we are indirectly building different kinds of relations between them.
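As a small illustration of the "distance" idea mentioned above, here is a minimal sketch computing the cosine similarity between two toy word vectors with NumPy (the vectors and helper below are made up for illustration, not part of the tutorial's data):

import numpy as np

def cosine_similarity(u, v):
    # cosine of the angle between two vectors: 1.0 means same direction
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# toy vectors in the spirit of W(green) and W(red) above (hypothetical values)
w_green = np.array([1.2, 0.98, 0.05])
w_red   = np.array([1.1, 0.20, 0.50])

print(cosine_similarity(w_green, w_red))  # close to 1.0 -> related words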
Example of word2vec using gensim

from gensim.models import word2vec
from gensim.models.word2vec import Word2Vec

Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)

Reading blog posts from the data directory

import os
import pickle

DATA_DIRECTORY = os.path.join(os.path.abspath(os.path.curdir),
                              '..', 'data', 'word_embeddings')

male_posts = []
female_posts = []

with open(os.path.join(DATA_DIRECTORY, "male_blog_list.txt"), "rb") as male_file:
    male_posts = pickle.load(male_file)

with open(os.path.join(DATA_DIRECTORY, "female_blog_list.txt"), "rb") as female_file:
    female_posts = pickle.load(female_file)

print(len(female_posts))
print(len(male_posts))

2252
2611

filtered_male_posts = list(filter(lambda p: len(p) > 0, male_posts))
filtered_female_posts = list(filter(lambda p: len(p) > 0, female_posts))
posts = filtered_female_posts + filtered_male_posts

print(len(filtered_female_posts), len(filtered_male_posts), len(posts))

2247 2595 4842


Word2Vec

w2v = Word2Vec(size=200, min_count=1)
w2v.build_vocab(map(lambda x: x.split(), posts[:100]))

w2v.vocab

{'see.': <gensim.models.word2vec.Vocab at 0x7f61aa4f1908>,
 'never.': <gensim.models.word2vec.Vocab at 0x7f61aa4f1dd8>,
 'driving': <gensim.models.word2vec.Vocab at 0x7f61aa4f1e48>,
 'buddy': <gensim.models.word2vec.Vocab at 0x7f61aa4f0240>,
 'DEFENSE': <gensim.models.word2vec.Vocab at 0x7f61aa4f0438>,
 ... (output truncated: one Vocab entry per word in the built vocabulary) ...
 'extinct.': <gensim.models.word2vec.Vocab at
0x7f61aa5e1550>,
'knowin': <[Link] at
0x7f61aa5545f8>,
'looks': <[Link] at
0x7f61aa554630>,
'alex!': <[Link] at
0x7f61aa554668>,
'analyze': <[Link] at
0x7f61aa5546a0>,
'internet': <[Link] at
0x7f61aa5546d8>,
'am,': <[Link] at
0x7f61aa554710>,
"I'll": <[Link] at
0x7f61aa554748>,
'go:': <[Link] at
0x7f61aa554780>,
'hardest': <[Link] at
0x7f61aa5547b8>,
'bed:': <[Link] at
0x7f61aa5547f0>,
'tower!!': <[Link] at
0x7f61aa554828>,
'(analyze': <[Link] at
0x7f61aa554860>,
'Rice': <[Link] at
0x7f61aa554898>,
'bravest': <[Link] at
0x7f61aa5548d0>,
...}

[Link]('I', 'My')

0.082851942583535218

print(posts[5])
[Link]('ring', 'husband')

I've tried starting blog after blog and it just


never feels right. Then I read today that it
feels strange to most people, but the more you do
it the better it gets (hmm, sounds suspiciously
like something else!) so I decided to give it
another try. My husband bought me a notepad at
urlLink McNally (the best bookstore in Western
Canada) with that title and a picture of a 50s
housewife grinning desperately. Each page has
something funny like "New curtains! Hurrah!".
For some reason it struck me as absolutely
hilarious and has stuck in my head ever since.
What were those women thinking?

0.037229111896779618

[Link]('ring', 'housewife')

0.11547398696865138

[Link]('women', 'housewife') # Diversity friendly

-0.14627530812290576
Doc2Vec
Doc2Vec extends the word2vec technique from words to whole documents:
we learn everything word2vec learns and, in addition, we learn a vector for
each document.
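A minimal Doc2Vec sketch with gensim (argument names follow recent gensim versions; the posts are assumed to be plain strings, as in the cells above):

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

documents = [TaggedDocument(words=post.split(), tags=[i])
             for i, post in enumerate(filtered_male_posts + filtered_female_posts)]
d2v = Doc2Vec(documents, vector_size=100, window=5, min_count=2, epochs=20)

# infer a vector for a new (or existing) document
vec = d2v.infer_vector("my husband bought me a notepad".split())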

import numpy as np

# 0 for male, 1 for female
y_posts = [Link](([Link](len(filtered_male_posts)),
                  [Link](len(filtered_female_posts))))

len(y_posts)

4842
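Before training a classifier on these labels, we would pair each post with its document vector and hold out a test set. A sketch under the assumption that d2v is the Doc2Vec model from the snippet above, and that posts are ordered male first, then female, to match y_posts:

from sklearn.model_selection import train_test_split

all_posts = filtered_male_posts + filtered_female_posts
X_posts = np.array([d2v.infer_vector(post.split()) for post in all_posts])
X_tr, X_te, y_tr, y_te = train_test_split(X_posts, y_posts, test_size=0.2)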
Convolutional Neural Networks for
Sentence Classification
Train a convolutional network for sentiment analysis, based on
"Convolutional Neural Networks for Sentence Classification" by Yoon Kim [Link]
'CNN-non-static' gets to 82.1% after 61 epochs with the following settings:
embedding_dim = 20, filter_sizes = (3, 4), num_filters = 3,
dropout_prob = (0.7, 0.8), hidden_dims = 100
'CNN-rand' gets to 78-79% after 7-8 epochs with the following settings:
embedding_dim = 20, filter_sizes = (3, 4), num_filters = 150,
dropout_prob = (0.25, 0.5), hidden_dims = 150
'CNN-static' gets to 75.4% after 7 epochs with the following settings:
embedding_dim = 100, filter_sizes = (3, 4), num_filters = 150,
dropout_prob = (0.25, 0.5), hidden_dims = 150
It turns out that such a small data set as "Movie reviews with one sentence
per review" (Pang and Lee, 2005) requires a much smaller network than the
one introduced in the original article:
embedding dimension is only 20 (instead of 300; 'CNN-static' still
requires ~100)
2 filter sizes (instead of 3)
higher dropout probabilities, and
3 filters per filter size is enough for 'CNN-non-static' (instead of 100)
embedding initialization does not require prebuilt Google Word2Vec data;
training Word2Vec on the same "Movie reviews" data set is enough to
achieve the performance reported in the article (81.6%)
Another distinct difference is a sliding MaxPooling window of length 2,
instead of MaxPooling over the whole feature map as in the article.

import numpy as np
import word_embedding
from word2vec import train_word2vec

from [Link] import Sequential, Model
from [Link] import (Activation, Dense, Dropout, Embedding,
                    Flatten, Input, Conv1D, MaxPooling1D)
from [Link] import Concatenate

[Link](2)

Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with
initial size: 90.0% of memory, cuDNN 4007)
Using Theano backend.

Parameters
Model variations. See Kim Yoon's Convolutional Neural Networks for
Sentence Classification, Section 3 for details.

model_variation = 'CNN-rand'  # CNN-rand | CNN-non-static | CNN-static
print('Model variation is %s' % model_variation)

Model variation is CNN-rand

# Model Hyperparameters
sequence_length = 56
embedding_dim = 20
filter_sizes = (3, 4)
num_filters = 150
dropout_prob = (0.25, 0.5)
hidden_dims = 150

# Training parameters
batch_size = 32
num_epochs = 100
val_split = 0.1

# Word2Vec parameters, see train_word2vec
min_word_count = 1  # Minimum word count
context = 10        # Context window size

Data Preparation
# Load data
print("Loading data...")
x, y, vocabulary, vocabulary_inv = word_embedding.load_data()

if model_variation == 'CNN-non-static' or model_variation == 'CNN-static':
    embedding_weights = train_word2vec(x, vocabulary_inv,
                                       embedding_dim, min_word_count, context)
    if model_variation == 'CNN-static':
        x = embedding_weights[0][x]
elif model_variation == 'CNN-rand':
    embedding_weights = None
else:
    raise ValueError('Unknown model variation')

Loading data...

# Shuffle data
shuffle_indices = [Link]([Link](len(y)))
x_shuffled = x[shuffle_indices]
y_shuffled = y[shuffle_indices].argmax(axis=1)

print("Vocabulary Size: {:d}".format(len(vocabulary)))

Vocabulary Size: 18765

Building CNN Model

graph_in = Input(shape=(sequence_length, embedding_dim))
convs = []
for fsz in filter_sizes:
    conv = Conv1D(filters=num_filters,
                  kernel_size=fsz,
                  padding='valid',
                  activation='relu',
                  strides=1)(graph_in)
    pool = MaxPooling1D(pool_size=2)(conv)
    flatten = Flatten()(pool)
    [Link](flatten)

if len(filter_sizes) > 1:
    out = Concatenate()(convs)
else:
    out = convs[0]

graph = Model(inputs=graph_in, outputs=out)

# main sequential model
model = Sequential()
if model_variation != 'CNN-static':
    [Link](Embedding(len(vocabulary), embedding_dim,
                     input_length=sequence_length,
                     weights=embedding_weights))
[Link](Dropout(dropout_prob[0],
                input_shape=(sequence_length, embedding_dim)))
[Link](graph)
[Link](Dense(hidden_dims))
[Link](Dropout(dropout_prob[1]))
[Link](Activation('relu'))
[Link](Dense(1))
[Link](Activation('sigmoid'))

[Link](loss='binary_crossentropy', optimizer='rmsprop',
       metrics=['accuracy'])

# Training model
# ==================================================
[Link](x_shuffled, y_shuffled, batch_size=batch_size,
       epochs=num_epochs, validation_split=val_split, verbose=2)

Train on 9595 samples, validate on 1067 samples


Epoch 1/100
1s - loss: 0.6516 - acc: 0.6005 - val_loss: 0.5692
- val_acc: 0.7151
Epoch 2/100
1s - loss: 0.4556 - acc: 0.7896 - val_loss: 0.5154
- val_acc: 0.7573
Epoch 3/100
1s - loss: 0.3556 - acc: 0.8532 - val_loss: 0.5050
- val_acc: 0.7816
Epoch 4/100
1s - loss: 0.2978 - acc: 0.8779 - val_loss: 0.5335
- val_acc: 0.7901
Epoch 5/100
1s - loss: 0.2599 - acc: 0.8972 - val_loss: 0.5592
- val_acc: 0.7769
Epoch 6/100
1s - loss: 0.2248 - acc: 0.9112 - val_loss: 0.5559
- val_acc: 0.7685
Epoch 7/100
1s - loss: 0.1994 - acc: 0.9219 - val_loss: 0.5760
- val_acc: 0.7704
Epoch 8/100
1s - loss: 0.1801 - acc: 0.9326 - val_loss: 0.6014
- val_acc: 0.7788
Epoch 9/100
1s - loss: 0.1472 - acc: 0.9449 - val_loss: 0.6637
- val_acc: 0.7751
Epoch 10/100
1s - loss: 0.1269 - acc: 0.9537 - val_loss: 0.7281
- val_acc: 0.7563
Epoch 11/100
1s - loss: 0.1123 - acc: 0.9592 - val_loss: 0.7452
- val_acc: 0.7788
Epoch 12/100
1s - loss: 0.0897 - acc: 0.9658 - val_loss: 0.8504
- val_acc: 0.7591
Epoch 13/100
1s - loss: 0.0811 - acc: 0.9723 - val_loss: 0.8935
- val_acc: 0.7573
Epoch 14/100
1s - loss: 0.0651 - acc: 0.9764 - val_loss: 0.8738
- val_acc: 0.7685
Epoch 15/100
1s - loss: 0.0540 - acc: 0.9809 - val_loss: 0.9407
- val_acc: 0.7648
Epoch 16/100
1s - loss: 0.0408 - acc: 0.9857 - val_loss: 1.1880
- val_acc: 0.7638
Epoch 17/100
1s - loss: 0.0341 - acc: 0.9886 - val_loss: 1.2878
- val_acc: 0.7638
Epoch 18/100
1s - loss: 0.0306 - acc: 0.9901 - val_loss: 1.4448
- val_acc: 0.7573
Epoch 19/100
1s - loss: 0.0276 - acc: 0.9917 - val_loss: 1.5300
- val_acc: 0.7591
Epoch 20/100
1s - loss: 0.0249 - acc: 0.9917 - val_loss: 1.4825
- val_acc: 0.7666
Epoch 21/100
1s - loss: 0.0220 - acc: 0.9937 - val_loss: 1.4357
- val_acc: 0.7601
Epoch 22/100
1s - loss: 0.0188 - acc: 0.9945 - val_loss: 1.4081
- val_acc: 0.7657
Epoch 23/100
1s - loss: 0.0182 - acc: 0.9954 - val_loss: 1.7145
- val_acc: 0.7610
Epoch 24/100
1s - loss: 0.0129 - acc: 0.9964 - val_loss: 1.7047
- val_acc: 0.7704
Epoch 25/100
1s - loss: 0.0064 - acc: 0.9981 - val_loss: 1.9119
- val_acc: 0.7629
Epoch 26/100
1s - loss: 0.0108 - acc: 0.9969 - val_loss: 1.8306
- val_acc: 0.7704
Epoch 27/100
1s - loss: 0.0105 - acc: 0.9973 - val_loss: 1.9624
- val_acc: 0.7619
Epoch 28/100
1s - loss: 0.0112 - acc: 0.9973 - val_loss: 1.8552
- val_acc: 0.7694
Epoch 29/100
1s - loss: 0.0110 - acc: 0.9968 - val_loss: 1.8585
- val_acc: 0.7657
Epoch 30/100
1s - loss: 0.0071 - acc: 0.9983 - val_loss: 2.0571
- val_acc: 0.7694
Epoch 31/100
1s - loss: 0.0089 - acc: 0.9975 - val_loss: 2.0361
- val_acc: 0.7629
Epoch 32/100
1s - loss: 0.0074 - acc: 0.9978 - val_loss: 2.0010
- val_acc: 0.7648
Epoch 33/100
1s - loss: 0.0074 - acc: 0.9981 - val_loss: 2.0995
- val_acc: 0.7498
Epoch 34/100
1s - loss: 0.0125 - acc: 0.9971 - val_loss: 2.2003
- val_acc: 0.7610
Epoch 35/100
1s - loss: 0.0074 - acc: 0.9981 - val_loss: 2.1526
- val_acc: 0.7582
Epoch 36/100
1s - loss: 0.0068 - acc: 0.9984 - val_loss: 2.1754
- val_acc: 0.7648
Epoch 37/100
1s - loss: 0.0065 - acc: 0.9979 - val_loss: 2.0810
- val_acc: 0.7498
Epoch 38/100
1s - loss: 0.0078 - acc: 0.9980 - val_loss: 2.3443
- val_acc: 0.7460
Epoch 39/100
1s - loss: 0.0038 - acc: 0.9991 - val_loss: 2.1696
- val_acc: 0.7629
Epoch 40/100
1s - loss: 0.0062 - acc: 0.9985 - val_loss: 2.2752
- val_acc: 0.7545
Epoch 41/100
1s - loss: 0.0044 - acc: 0.9985 - val_loss: 2.3457
- val_acc: 0.7535
Epoch 42/100
1s - loss: 0.0066 - acc: 0.9985 - val_loss: 2.1172
- val_acc: 0.7629
Epoch 43/100
1s - loss: 0.0052 - acc: 0.9987 - val_loss: 2.3550
- val_acc: 0.7619
Epoch 44/100
1s - loss: 0.0024 - acc: 0.9993 - val_loss: 2.3832
- val_acc: 0.7610
Epoch 45/100
1s - loss: 0.0042 - acc: 0.9989 - val_loss: 2.4242
- val_acc: 0.7648
Epoch 46/100
1s - loss: 0.0048 - acc: 0.9990 - val_loss: 2.4529
- val_acc: 0.7563
Epoch 47/100
1s - loss: 0.0036 - acc: 0.9994 - val_loss: 2.8412
- val_acc: 0.7282
Epoch 48/100
1s - loss: 0.0037 - acc: 0.9991 - val_loss: 2.4515
- val_acc: 0.7619
Epoch 49/100
1s - loss: 0.0031 - acc: 0.9991 - val_loss: 2.4849
- val_acc: 0.7676
Epoch 50/100
1s - loss: 0.0078 - acc: 0.9990 - val_loss: 2.5083
- val_acc: 0.7563
Epoch 51/100
1s - loss: 0.0105 - acc: 0.9981 - val_loss: 2.3538
- val_acc: 0.7601
Epoch 52/100
1s - loss: 0.0076 - acc: 0.9986 - val_loss: 2.4405
- val_acc: 0.7685
Epoch 53/100
1s - loss: 0.0043 - acc: 0.9991 - val_loss: 2.5753
- val_acc: 0.7591
Epoch 54/100
1s - loss: 0.0044 - acc: 0.9989 - val_loss: 2.5550
- val_acc: 0.7582
Epoch 55/100
1s - loss: 0.0034 - acc: 0.9994 - val_loss: 2.6361
- val_acc: 0.7591
Epoch 56/100
1s - loss: 0.0041 - acc: 0.9994 - val_loss: 2.6753
- val_acc: 0.7563
Epoch 57/100
1s - loss: 0.0042 - acc: 0.9990 - val_loss: 2.6464
- val_acc: 0.7601
Epoch 58/100
1s - loss: 0.0037 - acc: 0.9992 - val_loss: 2.6616
- val_acc: 0.7582
Epoch 59/100
1s - loss: 0.0060 - acc: 0.9990 - val_loss: 2.6052
- val_acc: 0.7619
Epoch 60/100
1s - loss: 0.0051 - acc: 0.9990 - val_loss: 2.7033
- val_acc: 0.7498
Epoch 61/100
1s - loss: 0.0034 - acc: 0.9994 - val_loss: 2.7142
- val_acc: 0.7526
Epoch 62/100
1s - loss: 0.0047 - acc: 0.9994 - val_loss: 2.7656
- val_acc: 0.7591
Epoch 63/100
1s - loss: 0.0083 - acc: 0.9990 - val_loss: 2.7971
- val_acc: 0.7526
Epoch 64/100
1s - loss: 0.0046 - acc: 0.9992 - val_loss: 2.6585
- val_acc: 0.7545
Epoch 65/100
1s - loss: 0.0062 - acc: 0.9989 - val_loss: 2.6194
- val_acc: 0.7535
Epoch 66/100
1s - loss: 0.0062 - acc: 0.9993 - val_loss: 2.6255
- val_acc: 0.7694
Epoch 67/100
1s - loss: 0.0036 - acc: 0.9990 - val_loss: 2.6384
- val_acc: 0.7582
Epoch 68/100
1s - loss: 0.0066 - acc: 0.9991 - val_loss: 2.6743
- val_acc: 0.7648
Epoch 69/100
1s - loss: 0.0030 - acc: 0.9995 - val_loss: 2.8236
- val_acc: 0.7535
Epoch 70/100
1s - loss: 0.0048 - acc: 0.9993 - val_loss: 2.7829
- val_acc: 0.7610
Epoch 71/100
1s - loss: 0.0062 - acc: 0.9990 - val_loss: 2.6402
- val_acc: 0.7573
Epoch 72/100
1s - loss: 0.0037 - acc: 0.9992 - val_loss: 2.9089
- val_acc: 0.7526
Epoch 73/100
1s - loss: 0.0069 - acc: 0.9985 - val_loss: 2.7071
- val_acc: 0.7535
Epoch 74/100
1s - loss: 0.0033 - acc: 0.9995 - val_loss: 2.6727
- val_acc: 0.7601
Epoch 75/100
1s - loss: 0.0069 - acc: 0.9990 - val_loss: 2.6967
- val_acc: 0.7601
Epoch 76/100
1s - loss: 0.0089 - acc: 0.9989 - val_loss: 2.7479
- val_acc: 0.7666
Epoch 77/100
1s - loss: 0.0046 - acc: 0.9994 - val_loss: 2.7192
- val_acc: 0.7629
Epoch 78/100
1s - loss: 0.0069 - acc: 0.9989 - val_loss: 2.7173
- val_acc: 0.7629
Epoch 79/100
1s - loss: 8.6550e-04 - acc: 0.9998 - val_loss:
2.7283 - val_acc: 0.7601
Epoch 80/100
1s - loss: 0.0011 - acc: 0.9995 - val_loss: 2.8405
- val_acc: 0.7629
Epoch 81/100
1s - loss: 0.0040 - acc: 0.9994 - val_loss: 2.8725
- val_acc: 0.7619
Epoch 82/100
1s - loss: 0.0055 - acc: 0.9992 - val_loss: 2.8490
- val_acc: 0.7601
Epoch 83/100
1s - loss: 0.0059 - acc: 0.9989 - val_loss: 2.7838
- val_acc: 0.7545
Epoch 84/100
1s - loss: 0.0054 - acc: 0.9994 - val_loss: 2.8706
- val_acc: 0.7526
Epoch 85/100
1s - loss: 0.0060 - acc: 0.9992 - val_loss: 2.9374
- val_acc: 0.7516
Epoch 86/100
1s - loss: 0.0087 - acc: 0.9982 - val_loss: 2.7966
- val_acc: 0.7573
Epoch 87/100
1s - loss: 0.0084 - acc: 0.9991 - val_loss: 2.8620
- val_acc: 0.7619
Epoch 88/100
1s - loss: 0.0053 - acc: 0.9990 - val_loss: 2.8450
- val_acc: 0.7601
Epoch 89/100
1s - loss: 0.0054 - acc: 0.9990 - val_loss: 2.8303
- val_acc: 0.7629
Epoch 90/100
1s - loss: 0.0073 - acc: 0.9991 - val_loss: 2.8474
- val_acc: 0.7657
Epoch 91/100
1s - loss: 0.0037 - acc: 0.9994 - val_loss: 3.0151
- val_acc: 0.7432
Epoch 92/100
1s - loss: 0.0017 - acc: 0.9999 - val_loss: 2.9555
- val_acc: 0.7582
Epoch 93/100
1s - loss: 0.0080 - acc: 0.9991 - val_loss: 2.9178
- val_acc: 0.7554
Epoch 94/100
1s - loss: 0.0078 - acc: 0.9991 - val_loss: 2.8724
- val_acc: 0.7582
Epoch 95/100
1s - loss: 0.0012 - acc: 0.9997 - val_loss: 2.9582
- val_acc: 0.7545
Epoch 96/100
1s - loss: 0.0058 - acc: 0.9989 - val_loss: 2.8944
- val_acc: 0.7479
Epoch 97/100
1s - loss: 0.0094 - acc: 0.9985 - val_loss: 2.7146
- val_acc: 0.7516
Epoch 98/100
1s - loss: 0.0044 - acc: 0.9993 - val_loss: 2.9052
- val_acc: 0.7498
Epoch 99/100
1s - loss: 0.0030 - acc: 0.9995 - val_loss: 3.1474
- val_acc: 0.7470
Epoch 100/100
1s - loss: 0.0051 - acc: 0.9990 - val_loss: 3.1746
- val_acc: 0.7451

<[Link] at 0x7f78362ae400>
Another Example
Using Keras + GloVe - Global Vectors for Word
Representation
Recurrent Neural Networks
RNN

A recurrent neural network (RNN) is a class of artificial neural network in
which connections between units form a directed cycle. This creates an
internal state that allows the network to exhibit dynamic temporal behaviour.

[Link](units, activation='tanh', use_bias=True,
       kernel_initializer='glorot_uniform',
       recurrent_initializer='orthogonal',
       bias_initializer='zeros',
       kernel_regularizer=None,
       recurrent_regularizer=None,
       bias_regularizer=None,
       activity_regularizer=None,
       kernel_constraint=None, recurrent_constraint=None,
       bias_constraint=None, dropout=0.0,
       recurrent_dropout=0.0)

Arguments:
units: Positive integer, dimensionality of the output space.
activation: Activation function to use (see activations). If
you pass None, no activation is applied (ie. "linear"
activation: a(x) = x ).
use_bias: Boolean, whether the layer uses a bias vector.
kernel_initializer: Initializer for the kernel weights
matrix, used for the linear transformation of the inputs. (see
initializers).
recurrent_initializer: Initializer for the
recurrent_kernel weights matrix, used for the linear
transformation of the recurrent state. (see initializers).
bias_initializer: Initializer for the bias vector (see
initializers).
kernel_regularizer: Regularizer function applied to the
kernel weights matrix (see regularizer).
recurrent_regularizer: Regularizer function applied to the
recurrent_kernel weights matrix (see regularizer).
bias_regularizer: Regularizer function applied to the bias
vector (see regularizer).
activity_regularizer: Regularizer function applied to the
output of the layer (its "activation"). (see regularizer).
kernel_constraint: Constraint function applied to the
kernel weights matrix (see constraints).
recurrent_constraint: Constraint function applied to the
recurrent_kernel weights matrix (see constraints).
bias_constraint: Constraint function applied to the bias
vector (see constraints).
dropout: Float between 0 and 1. Fraction of the units to
drop for the linear transformation of the inputs.
recurrent_dropout: Float between 0 and 1. Fraction of the
units to drop for the linear transformation of the recurrent
state.
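As a quick illustration of these arguments, a minimal SimpleRNN sketch (the sequence length and feature size below are arbitrary, just to show the shapes):

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

toy_rnn = Sequential()
toy_rnn.add(SimpleRNN(32, input_shape=(10, 8)))  # 10 timesteps, 8 features -> (batch, 32)
toy_rnn.add(Dense(1, activation='sigmoid'))
toy_rnn.compile(loss='binary_crossentropy', optimizer='adam')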

Backprop Through Time

In contrast to feed-forward neural networks, an RNN can encode information
from earlier time steps, which makes it well suited to sequential models.
Backpropagation Through Time (BPTT) extends the ordinary backpropagation
algorithm to recurrent architectures.

Reference: Backpropagation through Time
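As a concrete sketch of what BPTT differentiates through, the SimpleRNN recurrence (with the default tanh activation) can be written as

$h_t = \tanh(W_x x_t + W_h h_{t-1} + b), \qquad t = 1, \dots, T$

Since the state $h_T$ depends on every earlier state, the gradient of the loss with respect to the recurrent weights $W_h$ accumulates one term per time step; unrolling the network in time makes this explicit.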

%matplotlib inline

import numpy as np
import pandas as pd
import theano
import [Link] as T
import keras
import [Link] as plt

from [Link] import LabelEncoder
from [Link] import StandardScaler
from sklearn.model_selection import train_test_split

# -- Keras Import
from [Link] import Sequential
from [Link] import image

from [Link] import imdb
from [Link] import mnist

from [Link] import Dense, Dropout, Activation, Flatten
from [Link] import Conv2D, MaxPooling2D

from [Link] import np_utils

from [Link] import sequence
from [Link] import Embedding
from [Link] import LSTM, GRU, SimpleRNN

from [Link] import Activation, TimeDistributed, RepeatVector
from [Link] import EarlyStopping, ModelCheckpoint

Using TensorFlow backend.


IMDB sentiment classification task
This is a dataset for binary sentiment classification containing
substantially more data than previous benchmark datasets.
IMDB provided a set of 25,000 highly polar movie reviews for
training, and 25,000 for testing.
There is additional unlabeled data for use as well. Raw text and
already processed bag of words formats are provided.
[Link]

Data Preparation - IMDB

max_features = 20000
maxlen = 100  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print("Loading data...")
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print('Example:')
print(X_train[:1])

print("Pad sequences (samples x time)")
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

Loading data...
Downloading data from
[Link]
25000 train sequences
25000 test sequences
Example:
[ [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65,
458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100,
43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150,
4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536,
1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6,
147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22,
71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4,
22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62,
386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16,
480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25,
124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12,
215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4,
107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723,
36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4,
12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32,
2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22,
21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51,
36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38,
1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32,
15, 16, 5345, 19, 178, 32]]
Pad sequences (samples x time)
X_train shape: (25000, 100)
X_test shape: (25000, 100)
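The example above is a single review encoded as a list of word indices. A small sketch to map it back to words (imdb.load_data reserves indices 0-2 for padding/start/unknown tokens by default, so the ranks from get_word_index are offset by 3; reserved and padding tokens show up as '?'):

from keras.datasets import imdb

word_index = imdb.get_word_index()
index_word = {i + 3: w for w, i in word_index.items()}
decoded = ' '.join(index_word.get(i, '?') for i in X_train[0])
print(decoded[:200])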
Model building

print('Build model...')
model = Sequential()
[Link](Embedding(max_features, 128, input_length=maxlen))
[Link](SimpleRNN(128))
[Link](Dropout(0.5))
[Link](Dense(1))
[Link](Activation('sigmoid'))

# try using different optimizers and different optimizer configs
[Link](loss='binary_crossentropy', optimizer='adam')

print("Train...")
[Link](X_train, y_train, batch_size=batch_size, epochs=1,
       validation_data=(X_test, y_test))

Build model...
Train...

/Users/valerio/anaconda3/envs/deep-learning-
pydatait-tutorial/lib/python3.5/site-
packages/keras/backend/tensorflow_backend.py:2094:
UserWarning: Expected no kwargs, you passed 1
kwargs passed to function are ignored with
Tensorflow backend
[Link]('\n'.join(msg))
Train on 25000 samples, validate on 25000 samples
Epoch 1/1
25000/25000 [==============================] -
104s - loss: 0.7329 - val_loss: 0.6832

<[Link] at 0x138767780>

LSTM
An LSTM network is an artificial neural network that contains LSTM blocks
instead of, or in addition to, regular network units. An LSTM block may be
described as a "smart" network unit that can remember a value for an
arbitrary length of time.
Unlike traditional RNNs, a long short-term memory network is well suited
to learn from experience to classify, process and predict time series when
there are very long time lags of unknown size between important events.

[Link](units, activation='tanh',
       recurrent_activation='hard_sigmoid', use_bias=True,
       kernel_initializer='glorot_uniform',
       recurrent_initializer='orthogonal',
       bias_initializer='zeros', unit_forget_bias=True,
       kernel_regularizer=None, recurrent_regularizer=None,
       bias_regularizer=None, activity_regularizer=None,
       kernel_constraint=None, recurrent_constraint=None,
       bias_constraint=None, dropout=0.0,
       recurrent_dropout=0.0)

Arguments
units: Positive integer, dimensionality of the output space.
activation: Activation function to use If you pass None, no
activation is applied (ie. "linear" activation: a(x) = x ).
recurrent_activation: Activation function to use for the
recurrent step.
use_bias: Boolean, whether the layer uses a bias vector.
kernel_initializer: Initializer for the kernel weights
matrix, used for the linear transformation of the inputs.
recurrent_initializer: Initializer for the
recurrent_kernel weights matrix, used for the linear
transformation of the recurrent state.
bias_initializer: Initializer for the bias vector.
unit_forget_bias: Boolean. If True, add 1 to the bias of the
forget gate at initialization. Setting it to true will also force
bias_initializer="zeros" . This is recommended in
Jozefowicz et al.
kernel_regularizer: Regularizer function applied to the
kernel weights matrix.
recurrent_regularizer: Regularizer function applied to the
recurrent_kernel weights matrix.
bias_regularizer: Regularizer function applied to the bias
vector.
activity_regularizer: Regularizer function applied to the
output of the layer (its "activation").
kernel_constraint: Constraint function applied to the
kernel weights matrix.
recurrent_constraint: Constraint function applied to the
recurrent_kernel weights matrix.
bias_constraint: Constraint function applied to the bias
vector.
dropout: Float between 0 and 1. Fraction of the units to
drop for the linear transformation of the inputs.
recurrent_dropout: Float between 0 and 1. Fraction of the
units to drop for the linear transformation of the recurrent
state.
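A short sketch of how these layers compose in practice: to stack recurrent layers, every layer but the last must return its full output sequence (the sizes below are arbitrary):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

stacked = Sequential()
stacked.add(Embedding(20000, 128, input_length=100))
stacked.add(LSTM(64, return_sequences=True))  # output: (batch, 100, 64)
stacked.add(LSTM(32))                         # output: (batch, 32)
stacked.add(Dense(1, activation='sigmoid'))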

GRU
Gated recurrent units are a gating mechanism for recurrent neural networks.
They are quite similar to LSTMs, but have fewer parameters, as they lack an
output gate.
[Link](units, activation='tanh',
       recurrent_activation='hard_sigmoid', use_bias=True,
       kernel_initializer='glorot_uniform',
       recurrent_initializer='orthogonal',
       bias_initializer='zeros', kernel_regularizer=None,
       recurrent_regularizer=None, bias_regularizer=None,
       activity_regularizer=None, kernel_constraint=None,
       recurrent_constraint=None, bias_constraint=None,
       dropout=0.0, recurrent_dropout=0.0)

Your Turn! - Hands on RNN

print('Build model...')
model = Sequential()
[Link](Embedding(max_features, 128, input_length=maxlen))

# !!! Play with these! Uncomment (at least) one of the recurrent layers
# and try to get better results!
#[Link](SimpleRNN(128))
#[Link](GRU(128))
#[Link](LSTM(128))

[Link](Dropout(0.5))
[Link](Dense(1))
[Link](Activation('sigmoid'))

# try using different optimizers and different optimizer configs
[Link](loss='binary_crossentropy', optimizer='adam')

print("Train...")
[Link](X_train, y_train, batch_size=batch_size, epochs=4,
       validation_data=(X_test, y_test))
score, acc = [Link](X_test, y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
Convolutional LSTM
This section demonstrates the use of a Convolutional LSTM network.
This network is used to predict the next frame of an artificially generated
movie which contains moving squares.

Artificial Data Generation

Generate movies with 3 to 7 moving squares inside.
The squares are of shape $1 \times 1$ or $2 \times 2$ pixels, and move
linearly over time.
For convenience we first create movies with bigger width and height
( 80x80 ) and at the end we select a $40 \times 40$ window.

# Artificial Data Generation
def generate_movies(n_samples=1200, n_frames=15):
    row = 80
    col = 80
    noisy_movies = [Link]((n_samples, n_frames, row, col, 1), dtype=[Link])
    shifted_movies = [Link]((n_samples, n_frames, row, col, 1), dtype=[Link])

    for i in range(n_samples):
        # Add 3 to 7 moving squares
        n = [Link](3, 8)
        for j in range(n):
            # Initial position
            xstart = [Link](20, 60)
            ystart = [Link](20, 60)
            # Direction of motion
            directionx = [Link](0, 3) - 1
            directiony = [Link](0, 3) - 1

            # Size of the square
            w = [Link](2, 4)

            for t in range(n_frames):
                x_shift = xstart + directionx * t
                y_shift = ystart + directiony * t
                noisy_movies[i, t, x_shift - w: x_shift + w,
                             y_shift - w: y_shift + w, 0] += 1

                # Make it more robust by adding noise.
                # The idea is that if during inference,
                # the value of the pixel is not exactly one,
                # we need to train the network to be robust and still
                # consider it as a pixel belonging to a square.
                if [Link](0, 2):
                    noise_f = (-1)**[Link](0, 2)
                    noisy_movies[i, t,
                                 x_shift - w - 1: x_shift + w + 1,
                                 y_shift - w - 1: y_shift + w + 1,
                                 0] += noise_f * 0.1

                # Shift the ground truth by 1
                x_shift = xstart + directionx * (t + 1)
                y_shift = ystart + directiony * (t + 1)
                shifted_movies[i, t, x_shift - w: x_shift + w,
                               y_shift - w: y_shift + w, 0] += 1

    # Cut to a 40x40 window
    noisy_movies = noisy_movies[::, ::, 20:60, 20:60, ::]
    shifted_movies = shifted_movies[::, ::, 20:60, 20:60, ::]
    noisy_movies[noisy_movies >= 1] = 1
    shifted_movies[shifted_movies >= 1] = 1
    return noisy_movies, shifted_movies

Model

from [Link] import Sequential
from [Link] import Conv3D
from [Link].convolutional_recurrent import ConvLSTM2D
from [Link] import BatchNormalization
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

Using TensorFlow backend.

We create a network which takes as input movies of shape
(n_frames, width, height, channels) and returns a movie of identical shape.

seq = Sequential()
[Link](ConvLSTM2D(filters=40, kernel_size=(3, 3),
                  input_shape=(None, 40, 40, 1),
                  padding='same', return_sequences=True))
[Link](BatchNormalization())

[Link](ConvLSTM2D(filters=40, kernel_size=(3, 3),
                  padding='same', return_sequences=True))
[Link](BatchNormalization())

[Link](ConvLSTM2D(filters=40, kernel_size=(3, 3),
                  padding='same', return_sequences=True))
[Link](BatchNormalization())

[Link](ConvLSTM2D(filters=40, kernel_size=(3, 3),
                  padding='same', return_sequences=True))
[Link](BatchNormalization())

[Link](Conv3D(filters=1, kernel_size=(3, 3, 3),
              activation='sigmoid', padding='same',
              data_format='channels_last'))
[Link](loss='binary_crossentropy', optimizer='adadelta')

Train the Network

Beware: this takes time (~3 mins per epoch on my hardware).

# Train the network
noisy_movies, shifted_movies = generate_movies(n_samples=1200)
[Link](noisy_movies[:1000], shifted_movies[:1000], batch_size=10,
       epochs=50, validation_split=0.05)

Train on 950 samples, validate on 50 samples


Epoch 1/50
950/950 [==============================] - 180s -
loss: 0.3293 - val_loss: 0.6113
Epoch 2/50
950/950 [==============================] - 181s -
loss: 0.0629 - val_loss: 0.4206
Epoch 3/50
950/950 [==============================] - 180s -
loss: 0.0187 - val_loss: 0.2585
Epoch 4/50
950/950 [==============================] - 180s -
loss: 0.0062 - val_loss: 0.2087
Epoch 5/50
950/950 [==============================] - 179s -
loss: 0.0134 - val_loss: 0.1884
Epoch 6/50
950/950 [==============================] - 180s -
loss: 0.0024 - val_loss: 0.1025
Epoch 7/50
950/950 [==============================] - 179s -
loss: 0.0013 - val_loss: 0.0079
Epoch 8/50
950/950 [==============================] - 180s -
loss: 8.1664e-04 - val_loss: 7.7649e-04
Epoch 9/50
950/950 [==============================] - 180s -
loss: 5.9629e-04 - val_loss: 4.9810e-04
Epoch 10/50
950/950 [==============================] - 180s -
loss: 4.8772e-04 - val_loss: 4.5704e-04
Epoch 11/50
950/950 [==============================] - 179s -
loss: 4.1252e-04 - val_loss: 3.7326e-04
Epoch 12/50
950/950 [==============================] - 180s -
loss: 3.6413e-04 - val_loss: 3.3256e-04
Epoch 13/50
950/950 [==============================] - 179s -
loss: 3.2918e-04 - val_loss: 2.8421e-04
Epoch 14/50
950/950 [==============================] - 179s -
loss: 2.9520e-04 - val_loss: 2.8827e-04
Epoch 15/50
950/950 [==============================] - 179s -
loss: 2.7647e-04 - val_loss: 2.5144e-04
Epoch 16/50
950/950 [==============================] - 181s -
loss: 2.5863e-04 - val_loss: 2.5015e-04
Epoch 17/50
950/950 [==============================] - 180s -
loss: 2.4067e-04 - val_loss: 2.2645e-04
Epoch 18/50
950/950 [==============================] - 180s -
loss: 2.2378e-04 - val_loss: 2.1206e-04
Epoch 19/50
950/950 [==============================] - 179s -
loss: 2.1416e-04 - val_loss: 2.0406e-04
Epoch 20/50
950/950 [==============================] - 179s -
loss: 2.0244e-04 - val_loss: 1.9820e-04
Epoch 21/50
20/950 [..............................] - ETA:
170s - loss: 1.8054e-04

--------------------------------------------------
-------------------------

KeyboardInterrupt
Traceback (most recent call last)

<ipython-input-4-5547645715ec> in <module>()
2 noisy_movies, shifted_movies =
generate_movies(n_samples=1200)
3 [Link](noisy_movies[:1000],
shifted_movies[:1000], batch_size=10,
----> 4 epochs=50, validation_split=0.05)

/home/valerio/anaconda3/lib/python3.5/site-
packages/keras/[Link] in fit(self, x, y,
batch_size, epochs, verbose, callbacks,
validation_split, validation_data, shuffle,
class_weight, sample_weight, initial_epoch,
**kwargs)
854
class_weight=class_weight,
855
sample_weight=sample_weight,
--> 856
initial_epoch=initial_epoch)
857
858 def evaluate(self, x, y,
batch_size=32, verbose=1,

/home/valerio/anaconda3/lib/python3.5/site-
packages/keras/engine/[Link] in fit(self, x,
y, batch_size, epochs, verbose, callbacks,
validation_split, validation_data, shuffle,
class_weight, sample_weight, initial_epoch,
**kwargs)
1496 val_f=val_f,
val_ins=val_ins, shuffle=shuffle,
1497
callback_metrics=callback_metrics,
-> 1498
initial_epoch=initial_epoch)
1499
1500 def evaluate(self, x, y,
batch_size=32, verbose=1, sample_weight=None):

/home/valerio/anaconda3/lib/python3.5/site-
packages/keras/engine/[Link] in
_fit_loop(self, f, ins, out_labels, batch_size,
epochs, verbose, callbacks, val_f, val_ins,
shuffle, callback_metrics, initial_epoch)
1150 batch_logs['size'] =
len(batch_ids)
1151
callbacks.on_batch_begin(batch_index, batch_logs)
-> 1152 outs = f(ins_batch)
1153 if not isinstance(outs,
list):
1154 outs = [outs]
/home/valerio/anaconda3/lib/python3.5/site-
packages/keras/backend/tensorflow_backend.py in
__call__(self, inputs)
2227 session = get_session()
2228 updated = [Link]([Link]
+ [self.updates_op],
-> 2229
feed_dict=feed_dict)
2230 return updated[:len([Link])]
2231

/home/valerio/anaconda3/lib/python3.5/site-
packages/tensorflow/python/client/[Link] in
run(self, fetches, feed_dict, options,
run_metadata)
776 try:
777 result = self._run(None, fetches,
feed_dict, options_ptr,
--> 778 run_metadata_ptr)
779 if run_metadata:
780 proto_data =
tf_session.TF_GetBuffer(run_metadata_ptr)

/home/valerio/anaconda3/lib/python3.5/site-
packages/tensorflow/python/client/[Link] in
_run(self, handle, fetches, feed_dict, options,
run_metadata)
980 if final_fetches or final_targets:
981 results = self._do_run(handle,
final_targets, final_fetches,
--> 982
feed_dict_string, options, run_metadata)
983 else:
984 results = []

/home/valerio/anaconda3/lib/python3.5/site-
packages/tensorflow/python/client/[Link] in
_do_run(self, handle, target_list, fetch_list,
feed_dict, options, run_metadata)
1030 if handle is None:
1031 return self._do_call(_run_fn,
self._session, feed_dict, fetch_list,
-> 1032 target_list,
options, run_metadata)
1033 else:
1034 return self._do_call(_prun_fn,
self._session, handle, feed_dict,

/home/valerio/anaconda3/lib/python3.5/site-
packages/tensorflow/python/client/[Link] in
_do_call(self, fn, *args)
1037 def _do_call(self, fn, *args):
1038 try:
-> 1039 return fn(*args)
1040 except [Link] as e:
1041 message = compat.as_text([Link])

/home/valerio/anaconda3/lib/python3.5/site-
packages/tensorflow/python/client/[Link] in
_run_fn(session, feed_dict, fetch_list,
target_list, options, run_metadata)
1019 return tf_session.TF_Run(session,
options,
1020
feed_dict, fetch_list, target_list,
-> 1021 status,
run_metadata)
1022
1023 def _prun_fn(session, handle,
feed_dict, fetch_list):

KeyboardInterrupt:

Test the Network

# Testing the network on one movie
# feed it with the first 7 positions and then
# predict the new positions
which = 1004
track = noisy_movies[which][:7, ::, ::, ::]

for j in range(16):
    new_pos = [Link](track[[Link], ::, ::, ::, ::])
    new = new_pos[::, -1, ::, ::, ::]
    track = [Link]((track, new), axis=0)

# And then compare the predictions
# to the ground truth
track2 = noisy_movies[which][::, ::, ::, ::]
for i in range(15):
    fig = [Link](figsize=(10, 5))

    ax = fig.add_subplot(121)
    if i >= 7:
        [Link](1, 3, 'Predictions !', fontsize=20, color='w')
    else:
        [Link](1, 3, 'Initial trajectory', fontsize=20)

    toplot = track[i, ::, ::, 0]
    [Link](toplot)

    ax = fig.add_subplot(122)
    [Link](1, 3, 'Ground truth', fontsize=20)

    toplot = track2[i, ::, ::, 0]
    if i >= 2:
        toplot = shifted_movies[which][i - 1, ::, ::, 0]

    [Link](toplot)
    [Link]('img/convlstm/%i_animate.png' % (i + 1))
RNN using LSTM

source: [Link]
LSTMs

from [Link] import SGD, RMSprop  # RMSprop is used later for sentence generation
from [Link] import one_hot, text_to_word_sequence
from [Link] import np_utils
from [Link] import Sequential
from [Link] import Dense, Dropout, Activation
from [Link] import Embedding
from [Link] import LSTM, GRU
from [Link] import sequence

Reading blog posts from the data directory

import os
import pickle
import numpy as np

DATA_DIRECTORY = [Link]([Link]([Link]), '..', 'data', 'word_embeddings')
print(DATA_DIRECTORY)

male_posts = []
female_posts = []

with open([Link](DATA_DIRECTORY, "male_blog_list.txt"), "rb") as male_file:
    male_posts = [Link](male_file)
with open([Link](DATA_DIRECTORY, "female_blog_list.txt"), "rb") as female_file:
    female_posts = [Link](female_file)

filtered_male_posts = list(filter(lambda p: len(p) > 0, male_posts))
filtered_female_posts = list(filter(lambda p: len(p) > 0, female_posts))

# text processing - one_hot builds an index of the words
male_one_hot = []
female_one_hot = []
n = 30000
for post in filtered_male_posts:
    try:
        male_one_hot.append(one_hot(post, n, split=" ", lower=True))
    except:
        continue

for post in filtered_female_posts:
    try:
        female_one_hot.append(one_hot(post, n, split=" ", lower=True))
    except:
        continue

# 0 for male, 1 for female
concatenate_array_rnn = [Link](([Link](len(male_one_hot)),
                                [Link](len(female_one_hot))))

from sklearn.model_selection import train_test_split

# keep the same order as the labels above (male first, then female)
X_train_rnn, X_test_rnn, y_train_rnn, y_test_rnn = \
    train_test_split([Link]((male_one_hot, female_one_hot)),
                     concatenate_array_rnn,
                     test_size=0.2)

maxlen = 100
X_train_rnn = sequence.pad_sequences(X_train_rnn, maxlen=maxlen)
X_test_rnn = sequence.pad_sequences(X_test_rnn, maxlen=maxlen)
print('X_train_rnn shape:', X_train_rnn.shape, y_train_rnn.shape)
print('X_test_rnn shape:', X_test_rnn.shape, y_test_rnn.shape)

max_features = 30000
dimension = 128
output_dimension = 128
model = Sequential()
[Link](Embedding(max_features, dimension))
[Link](LSTM(output_dimension))
[Link](Dropout(0.5))
[Link](Dense(1))
[Link](Activation('sigmoid'))

[Link](loss='mean_squared_error', optimizer='sgd',
       metrics=['accuracy'])

[Link](X_train_rnn, y_train_rnn, batch_size=32, epochs=4,
       validation_data=(X_test_rnn, y_test_rnn))

score, acc = [Link](X_test_rnn, y_test_rnn, batch_size=32)

print(score, acc)
Using a TFIDF Vectorizer as input
instead of the one hot encoder

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(decode_error='ignore', norm='l2', min_df=5)
tfidf_male = vectorizer.fit_transform(filtered_male_posts)
tfidf_female = vectorizer.fit_transform(filtered_female_posts)

flattened_array_tfidf_male = tfidf_male.toarray()
flattened_array_tfidf_female = tfidf_female.toarray()

y_rnn = [Link](([Link](len(flattened_array_tfidf_male)),
                 [Link](len(flattened_array_tfidf_female))))

X_train_rnn, X_test_rnn, y_train_rnn, y_test_rnn = \
    train_test_split([Link]((flattened_array_tfidf_male,
                             flattened_array_tfidf_female)),
                     y_rnn, test_size=0.2)

maxlen = 100
X_train_rnn = sequence.pad_sequences(X_train_rnn, maxlen=maxlen)
X_test_rnn = sequence.pad_sequences(X_test_rnn, maxlen=maxlen)
print('X_train_rnn shape:', X_train_rnn.shape, y_train_rnn.shape)
print('X_test_rnn shape:', X_test_rnn.shape, y_test_rnn.shape)

max_features = 30000
model = Sequential()
[Link](Embedding(max_features, dimension))
[Link](LSTM(output_dimension))
[Link](Dropout(0.5))
[Link](Dense(1))
[Link](Activation('sigmoid'))

[Link](loss='mean_squared_error', optimizer='sgd',
       metrics=['accuracy'])

[Link](X_train_rnn, y_train_rnn, batch_size=32, epochs=1,
       validation_data=(X_test_rnn, y_test_rnn))

score, acc = [Link](X_test_rnn, y_test_rnn, batch_size=32)

print(score, acc)
Sentence Generation using LSTM

# reading all the male text data into one string
male_post = ' '.join(filtered_male_posts)

# building the character set for the male posts
character_set_male = set(male_post)
# building two indices - character to index and index to character
char_indices = dict((c, i) for i, c in enumerate(character_set_male))
indices_char = dict((i, c) for i, c in enumerate(character_set_male))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 20
step = 1
sentences = []
next_chars = []
for i in range(0, len(male_post) - maxlen, step):
    [Link](male_post[i: i + maxlen])
    next_chars.append(male_post[i + maxlen])

# Vectorisation of input
x_male = [Link]((len(male_post), maxlen, len(character_set_male)), dtype=[Link])
y_male = [Link]((len(male_post), len(character_set_male)), dtype=[Link])

print(x_male.shape, y_male.shape)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x_male[i, t, char_indices[char]] = 1
    y_male[i, char_indices[next_chars[i]]] = 1

print(x_male.shape, y_male.shape)

# build the model: a single LSTM
print('Build model...')
model = Sequential()
[Link](LSTM(128, input_shape=(maxlen, len(character_set_male))))
[Link](Dense(len(character_set_male)))
[Link](Activation('softmax'))

optimizer = RMSprop(lr=0.01)  # RMSprop imported above
[Link](loss='categorical_crossentropy', optimizer=optimizer,
       metrics=['accuracy'])

import random, sys

# helper function to sample an index from a probability array
def sample(a, diversity=0.75):
    if [Link]() > diversity:
        return [Link](a)
    while 1:
        i = [Link](0, len(a) - 1)
        if a[i] > [Link]():
            return i
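A more common alternative is temperature-based sampling, where a low temperature makes the predictions more conservative and a high one more adventurous. A small sketch (not the helper used above, just an option to experiment with):

import numpy as np

def sample_with_temperature(preds, temperature=1.0):
    # rescale the predicted distribution in log space, then renormalise
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # draw one sample from the rescaled distribution
    return np.argmax(np.random.multinomial(1, preds, 1))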

# train the model, output generated text after each iteration
for iteration in range(1, 10):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    [Link](x_male, y_male, batch_size=128, epochs=1)

    start_index = [Link](0, len(male_post) - maxlen - 1)

    for diversity in [0.2, 0.4, 0.6, 0.8]:
        print()
        print('----- diversity:', diversity)

        generated = ''
        sentence = male_post[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')

        for _ in range(400):
            try:
                x = [Link]((1, maxlen, len(character_set_male)))
                for t, char in enumerate(sentence):
                    x[0, t, char_indices[char]] = 1.

                preds = [Link](x, verbose=0)[0]
                next_index = sample(preds, diversity)
                next_char = indices_char[next_index]

                generated += next_char
                sentence = sentence[1:] + next_char
            except:
                continue

        print(sentence)
        print()
Custom Keras Layer

Idea:
We build a custom activation layer called Antirectifier, which modifies the
shape of the tensor that passes through it.
We need to specify two methods: compute_output_shape and call .

Note that the same result can also be achieved via a Lambda layer
( [Link] ).

[Link](function, output_shape=None, arguments=None)

Because our custom layer is written with primitives from the Keras
backend ( K ), our code can run both on TensorFlow and Theano.
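As a quick illustration, the same Antirectifier behaviour could be sketched with a Lambda layer (a minimal sketch, not used in the model below):

from keras.layers import Lambda
from keras import backend as K

def antirectifier(x):
    # centre, L2-normalise, then concatenate the positive and negative parts
    x -= K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    return K.concatenate([K.relu(x), K.relu(-x)], axis=1)

def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    shape[-1] *= 2
    return tuple(shape)

antirectifier_layer = Lambda(antirectifier,
                             output_shape=antirectifier_output_shape)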

from [Link] import Sequential
from [Link] import Dense, Dropout, Layer, Activation
from [Link] import mnist
from keras import backend as K
from [Link] import np_utils

Using TensorFlow backend.
AntiRectifier Layer

class Antirectifier(Layer):
    '''This is the combination of a sample-wise L2 normalization with the
    concatenation of the positive part of the input with the negative part
    of the input. The result is a tensor of samples that are
    twice as large as the input samples.

    It can be used in place of a ReLU.

    # Input shape
        2D tensor of shape (samples, n)

    # Output shape
        2D tensor of shape (samples, 2*n)

    # Theoretical justification
        When applying ReLU, assuming that the distribution of the previous
        output is approximately centered around 0., you are discarding half
        of your input. This is inefficient.

        Antirectifier allows you to return all-positive outputs like ReLU,
        without discarding any data.

        Tests on MNIST show that Antirectifier allows you to train networks
        with half as many parameters yet with comparable classification
        accuracy as an equivalent ReLU-based network.
    '''

    def compute_output_shape(self, input_shape):
        shape = list(input_shape)
        assert len(shape) == 2  # only valid for 2D tensors
        shape[-1] *= 2
        return tuple(shape)

    def call(self, inputs):
        inputs -= [Link](inputs, axis=1, keepdims=True)
        inputs = K.l2_normalize(inputs, axis=1)
        pos = [Link](inputs)
        neg = [Link](-inputs)
        return [Link]([pos, neg], axis=1)

Parameters and Settings

# global parameters
batch_size = 128
nb_classes = 10
nb_epoch = 10
Data Preparation

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

60000 train samples
10000 test samples
Model with Custom Layer

# build the model
model = Sequential()
[Link](Dense(256, input_shape=(784,)))
[Link](Antirectifier())
[Link](Dropout(0.1))
[Link](Dense(256))
[Link](Antirectifier())
[Link](Dropout(0.1))
[Link](Dense(10))
[Link](Activation('softmax'))

# compile the model
[Link](loss='categorical_crossentropy', optimizer='rmsprop',
       metrics=['accuracy'])

# train the model
[Link](X_train, Y_train, batch_size=batch_size, epochs=nb_epoch,
       verbose=1, validation_data=(X_test, Y_test))

Train on 60000 samples, validate on 10000 samples


Epoch 1/10
60000/60000 [==============================] - 4s
- loss: 0.6029 - acc: 0.9154 - val_loss: 0.1556 -
val_acc: 0.9612
Epoch 2/10
60000/60000 [==============================] - 3s
- loss: 0.1252 - acc: 0.9662 - val_loss: 0.0990 -
val_acc: 0.9714
Epoch 3/10
60000/60000 [==============================] - 3s
- loss: 0.0813 - acc: 0.9766 - val_loss: 0.0796 -
val_acc: 0.9758
Epoch 4/10
60000/60000 [==============================] - 3s
- loss: 0.0634 - acc: 0.9810 - val_loss: 0.0783 -
val_acc: 0.9747
Epoch 5/10
60000/60000 [==============================] - 3s
- loss: 0.0513 - acc: 0.9847 - val_loss: 0.0685 -
val_acc: 0.9792
Epoch 6/10
60000/60000 [==============================] - 3s
- loss: 0.0428 - acc: 0.9867 - val_loss: 0.0669 -
val_acc: 0.9792
Epoch 7/10
60000/60000 [==============================] - 3s
- loss: 0.0381 - acc: 0.9885 - val_loss: 0.0668 -
val_acc: 0.9799
Epoch 8/10
60000/60000 [==============================] - 3s
- loss: 0.0314 - acc: 0.9903 - val_loss: 0.0672 -
val_acc: 0.9790
Epoch 9/10
60000/60000 [==============================] - 3s
- loss: 0.0276 - acc: 0.9913 - val_loss: 0.0616 -
val_acc: 0.9817
Epoch 10/10
60000/60000 [==============================] - 3s
- loss: 0.0238 - acc: 0.9926 - val_loss: 0.0608 -
val_acc: 0.9825
<[Link] at 0x7f2c140fbac8>
Exercise
Compare with an equivalent ReLU-based network that is 2x bigger (in terms
of the size of the Dense layers).

## your code here


Keras Functional API

Recall: all models (and layers) are callables

from [Link] import Input, Dense
from [Link] import Model

# this returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# this creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
[Link](optimizer='rmsprop',
       loss='categorical_crossentropy',
       metrics=['accuracy'])
[Link](data, labels)  # starts training (data and labels are placeholder arrays here)
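Because a whole Model is itself callable, it can be reused on a new tensor just like a layer. A brief sketch reusing the classifier defined above (the new input name is arbitrary):

new_inputs = Input(shape=(784,))
new_predictions = model(new_inputs)  # reuses the model's architecture and weights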
Multi-Input Networks

Keras Merge Layer

Here's a good use case for the functional API: models with multiple inputs
and outputs.
The functional API makes it easy to manipulate a large number of
intertwined datastreams.
Let's consider the following model.

from [Link] import Dense, Input
from [Link] import Model
from [Link] import concatenate

left_input = Input(shape=(784,), name='left_input')
left_branch = Dense(32, input_dim=784, name='left_branch')(left_input)

right_input = Input(shape=(784,), name='right_input')
right_branch = Dense(32, input_dim=784, name='right_branch')(right_input)

x = concatenate([left_branch, right_branch])
predictions = Dense(10, activation='softmax', name='main_output')(x)

model = Model(inputs=[left_input, right_input], outputs=predictions)
The resulting model will look like the following network:

Such a two-branch model can then be trained via e.g.:

[Link](optimizer='rmsprop', loss='categorical_crossentropy',
       metrics=['accuracy'])
[Link]([input_data_1, input_data_2], targets)
# we pass one data array per model input
Try yourself

Step 1: Get Data - MNIST

# let's load MNIST data as we did in the exercise on MNIST with FC Nets

# %load ../solutions/sol_821.py

Step 2: Create the Multi-Input Network

## try yourself

## `evaluate` the model on test data

Keras supports different merge strategies:

add : element-wise sum
concatenate : tensor concatenation. You can specify the
concatenation axis via the argument axis.
multiply : element-wise multiplication
average : tensor average
maximum : element-wise maximum of the inputs.
dot : dot product. You can specify which axes to reduce along via
the argument axes. You can also ask for the result to be normalised
(argument normalize); in that case the output of the dot product is
the cosine proximity between the two samples.
For arbitrary transformations you can wrap a function in a Lambda layer
(the old Keras 1 Merge layer with a mode argument is gone in Keras 2):

merged = Lambda(lambda x: x[0] - x[1])([left_branch, right_branch])
Even more interesting
Here's a good use case for the functional API: models with
multiple inputs and outputs.
The functional API makes it easy to manipulate a large number
of intertwined datastreams.
Let's consider the following model (from: [Link]
started/functional-api-guide/ )
Problem and Data
We seek to predict how many retweets and likes a news
headline will receive on Twitter.
The main input to the model will be the headline itself, as a
sequence of words, but to spice things up, our model will also
have an auxiliary input, receiving extra data such as the time of
day when the headline was posted, etc.
The model will also be supervised via two loss functions.
Using the main loss function earlier in a model is a good
regularization mechanism for deep models.
from [Link] import Input, Embedding, LSTM, Dense
from [Link] import Model

# Headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# Note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

# This embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

# An LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)

Using TensorFlow backend.

Here we insert the auxiliary loss, allowing the LSTM and Embedding layer
to be trained smoothly even though the main loss will be much higher in
the model.

auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

At this point, we feed into the model our auxiliary input data by
concatenating it with the LSTM output:

from [Link] import concatenate

auxiliary_input = Input(shape=(5,), name='aux_input')
x = concatenate([lstm_out, auxiliary_input])

# We stack a deep densely-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# And finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

Model Definition

model = Model(inputs=[main_input, auxiliary_input],
              outputs=[main_output, auxiliary_output])

We compile the model and assign a weight of 0.2 to the auxiliary loss.
To specify a different loss or loss_weights for each output, you can use a
list or a dictionary. Here we use the same binary_crossentropy loss on
both outputs.

Note:
Since our inputs and outputs are named (we passed them a "name"
argument), we can compile and fit the model via:

[Link](optimizer='rmsprop',
       loss={'main_output': 'binary_crossentropy',
             'aux_output': 'binary_crossentropy'},
       loss_weights={'main_output': 1., 'aux_output': 0.2})

# And train it via:
[Link]({'main_input': headline_data, 'aux_input': additional_data},
       {'main_output': labels, 'aux_output': labels},
       epochs=50, batch_size=32)
Conclusions
Keras is a powerful, batteries-included framework for Deep Learning in Python
Keras is simple to use..
...but it is not for simple things!
Some References for ..

Cutting Edge
Fractal Net Implementation with Keras: [Link] -
Please check out: [Link] resources

Hyper-Cool
Hyperas: [Link]
A web dashboard for Keras Models

Super-Cool
[Link]: [Link]
Your Keras Model inside your Browser
Showcase: [Link]
