DEEP LEARNING
UNIT - III
By,
DR. HIMANI DESHPANDE (TSEC, MUMBAI)
UNIT – III AUTOENCODERS
3.1
Introduction, Linear Autoencoder, Undercomplete Autoencoder, Overcomplete
Autoencoders, Regularization in Autoencoders
3.2
Denoising Autoencoders, Sparse Autoencoders, Contractive Autoencoders
3.3
Application of Autoencoders: Image Compression
AUTOENCODERS
¡ An autoencoder is a type of artificial neural network used to learn efficient
codings of unlabeled data (unsupervised learning).
AUTOENCODERS
¡ Autoencoders are designed to reproduce their input, especially for images.
¡ The key point is to reproduce the input from a learned encoding.
AUTOENCODER ARCHITECTURE
AUTOENCODER
HIGHLIGHT NOTES
¡ Just as we highlight and learn the important points for exams instead of learning the whole book chapter, an autoencoder focuses on reproducing the significant information, with some loss.
AUTO ENCODERS
AUTOENCODERS
AUTOENCODERS
AUTOENCODERS
Why can't we just copy the input to the output? Because then the latent layers will not learn anything.
AUTOENCODERS
PCA AND AUTOENCODERS
PCA AND AUTOENCODERS
PCA AND AUTOENCODERS
APPLICATION
SELF DRIVING CARS
SELF DRIVING CARS
PROPERTIES OF AUTOENCODERS
¡ DATA SPECIFIC
¡ UNSUPERVISED
¡ LOSSY
PROPERTIES OF AUTOENCODERS
• Autoencoders are data-specific, which means that they will only be able to compress
data similar to what they have been trained on.
• For example, an autoencoder trained on pictures of faces would do a rather poor job of
compressing pictures of trees, because the features it would learn would be face-specific.
• Autoencoders are lossy, which means that the decompressed outputs will be
degraded compared to the original inputs.
• Autoencoders are learned automatically from data examples, which is a useful
property: it means that it is easy to train specialized instances of the algorithm that will
perform well on a specific type of input. It doesn’t require any new engineering, just
appropriate training data.
PARTS OF AUTO ENCODERS
¡ Encoder : This part of the network encodes or compresses the input data into a
latent-space representation. The compressed data typically looks garbled, nothing
like the original data.
¡ Decoder : This part of network decodes or reconstructs the encoded data(latent
space representation) back to original dimension. The decoded data is a lossy
reconstruction of the original data.
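As an illustrative sketch (not from the original slides), the encoder and decoder can be built as two separate Keras models; the layer sizes (784-dimensional input, 32-dimensional code) are assumptions chosen for illustration:

from keras.layers import Input, Dense
from keras.models import Model

# Encoder: compresses the 784-dimensional input into a 32-dimensional latent code
inp = Input(shape=(784,))
code = Dense(32, activation='relu')(inp)
encoder = Model(inp, code)

# Decoder: reconstructs the original 784 dimensions from the latent code
latent = Input(shape=(32,))
recon = Dense(784, activation='sigmoid')(latent)
decoder = Model(latent, recon)

# Full autoencoder: encoder followed by decoder
autoencoder = Model(inp, decoder(encoder(inp)))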
AUTO-ENCODER
¡ An NN encoder maps the input object (e.g. a 28 × 28 = 784-pixel image) to a code: a compact representation of the input, usually with fewer than 784 dimensions.
¡ An NN decoder takes the code and can reconstruct the original object.
¡ The encoder and decoder are learned together.
AUTOENCODERS
¡ The input x passes through the encoder (weights W) to produce the code c in the hidden layer (here linear), also called the bottleneck layer; the code then passes through the decoder (weights W') to produce the reconstruction x̂ at the output layer.
¡ Training minimizes ‖x − x̂‖², i.e. makes the reconstruction as close as possible to the input.
¡ The output of the hidden layer is the code.
TRAINING AUTOENCODERS
¡ Code size
¡ Number of layers
¡ Number of nodes per layer
¡ Loss function
TRAINING AUTOENCODERS
If we are working with image data, the most popular loss
functions for reconstruction are MSE Loss and L1 Loss.
In case the inputs and outputs are within the range [0,1], as in
MNIST, we can also make use of Binary Cross Entropy as the
reconstruction loss.
TRAINING AUTOENCODERS
You need to set 4 hyperparameters before training an autoencoder:
1.Code size: The code size or the size of the bottleneck is the most important hyperparameter used to tune the
autoencoder. The bottleneck size decides how much the data has to be compressed. This can also act as a
regularisation term.
2.Number of layers: Like all neural networks, an important hyperparameter to tune autoencoders is the depth
of the encoder and the decoder. While a higher depth increases model complexity, a lower depth is faster to
process.
3.Number of nodes per layer: The number of nodes per layer defines the weights we use per layer. Typically,
the number of nodes decreases with each subsequent layer in the autoencoder as the input to each of these
layers becomes smaller across the layers.
4.Reconstruction Loss: The loss function we use to train the autoencoder is highly dependent on the type of
input and output we want the autoencoder to adapt to. If we are working with image data, the most popular
loss functions for reconstruction are MSE Loss and L1 Loss. In case the inputs and outputs are within the range
[0,1], as in MNIST, we can also make use of Binary Cross Entropy as the reconstruction loss.
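As a hedged sketch (the specific sizes and values below are illustrative assumptions, not values from the slides), these four hyperparameters might be set in Keras as follows:

from keras.layers import Input, Dense
from keras.models import Model

code_size = 32                                    # 1. code size (bottleneck)
# 2. number of layers and 3. number of nodes per layer: 784 -> 128 -> 32 -> 128 -> 784
inp = Input(shape=(784,))
h = Dense(128, activation='relu')(inp)
code = Dense(code_size, activation='relu')(h)
h = Dense(128, activation='relu')(code)
out = Dense(784, activation='sigmoid')(h)

autoencoder = Model(inp, out)
# 4. reconstruction loss: MSE here; 'binary_crossentropy' for [0,1] inputs such as MNIST
autoencoder.compile(optimizer='adam', loss='mse')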
AUTO ENCODERS ARCHITECTURE
AUTOENCODER
¡ Autoencoders resemble multilayer perceptron neural networks: like multilayer perceptrons, autoencoders have an input layer, some hidden layers, and an output layer.
¡ The key difference between a multilayer
perceptron network and an
autoencoder is that the output layer of
an autoencoder has the same number
of neurons as that of the input layer.
AUTOENCODER
h = g(W xᵢ + b)
x̂ᵢ = f(W* h + c)
The model is trained to minimize a loss function which ensures that x̂ᵢ is close to xᵢ.
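A minimal NumPy sketch of this forward pass (an illustration, not from the slides), assuming sigmoid activations for g and f and tied decoder weights W* = Wᵀ:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, d, k = 100, 784, 32              # examples, input dim, code dim (assumed sizes)
X = np.random.rand(n, d)
W = np.random.randn(d, k) * 0.01    # encoder weights
b = np.zeros(k)                     # encoder bias
c = np.zeros(d)                     # decoder bias

H = sigmoid(X @ W + b)              # h = g(W x + b)
X_hat = sigmoid(H @ W.T + c)        # x_hat = f(W* h + c), with W* = W.T (tied weights)

loss = np.mean((X - X_hat) ** 2)    # reconstruction loss keeping x_hat close to x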
AUTO ENCODERS ARCHITECTURE
AUTO ENCODERS ARCHITECTURE
ENCODER
TYPES OF AE
¡ Linear Autoencoder
¡ Undercomplete Autoencoder
¡ Overcomplete Autoencoder
¡ Denoising Autoencoder
¡ Sparse Autoencoder
¡ Contractive Autoencoder
LINEAR AUTOENCODERS
¡ A linear autoencoder is a type of autoencoder that uses only linear transformations,
such as matrix multiplication and addition, to compress and reconstruct the data.
¡ A linear autoencoder consists of two parts: an encoder and a decoder. The encoder
takes the input data and maps it to a lower-dimensional space. The decoder then
takes the compressed representation and reconstructs the original data.
¡ The goal of training a linear autoencoder is to minimize the reconstruction error
between the input and output:
L(x, x') = ||x - x'||^2
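A hedged Keras sketch of a linear autoencoder (layer sizes are illustrative assumptions): both the encoder and the decoder use linear activations only, and training minimizes the squared reconstruction error above.

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(784,))
code = Dense(32, activation='linear')(inp)       # linear encoder: a matrix multiply plus bias
out = Dense(784, activation='linear')(code)      # linear decoder
linear_autoencoder = Model(inp, out)
linear_autoencoder.compile(optimizer='adam', loss='mse')   # L(x, x') = ||x - x'||^2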
LINEAR AUTOENCODERS
¡ A linear autoencoder is a type of autoencoder that uses only linear transformations.
In other words, the encoder and decoder are composed of only linear layers. The
advantage of using a linear autoencoder is that it is computationally efficient and can
be trained on large datasets.
LINEAR AUTOENCODERS
UNDERCOMPLETE AUTOENCODER
UNDERCOMPLETE AUTOENCODER
Ø Let us consider the case where dim(h) < dim(xᵢ)
Ø If we are still able to reconstruct x̂ᵢ perfectly from h, then what does it say about h?
Ø h is a loss-free encoding of xᵢ; it captures all the important characteristics of xᵢ
An autoencoder where dim(h) < dim(xᵢ) is called an undercomplete autoencoder
UNDERCOMPLETE AUTOENCODERS
¡ An undercomplete autoencoder is one of the simplest types of autoencoders.
¡ The way it works is very straightforward: an undercomplete autoencoder takes in an image and tries to predict the same image as output, thus reconstructing the image from the compressed bottleneck region.
¡ Undercomplete autoencoders are truly unsupervised as they do not take any form of
label, the target being the same as the input.
UNDERCOMPLETE AE
OVERCOMPLETE AUTOENCODER
APPLICATION
¡ Applications of undercomplete autoencoders include compression, recommendation
systems as well as outlier detection.
OVERCOMPLETE AUTOENCODER
OVERCOMPLETE AUTOENCODER
Ø In such a case the autoencoder could learn a trivial encoding by simply copying xᵢ into h and then copying h into x̂ᵢ
Ø Such an identity encoding is useless in practice as it does not really tell us anything about the important characteristics of the data
An autoencoder where dim(h) ≥ dim(xᵢ) is called an overcomplete autoencoder
APPLICATION
¡ Overcomplete autoencoders have very rare applications, mostly hypothetical scenarios.
¡ BMI → Height & Weight calculation.
¡ Sometimes, in spite of knowing the BMI, we might need our network to gain knowledge about height or weight.
The sparse autoencoder is based on regularization.
UNDERCOMPLETE AND OVERCOMPLETE AUTOENCODERS
ARCHITECTURE
The only difference between the two is in the encoding
output's size.
UNDERCOMPLETE AND OVERCOMPLETE AUTOENCODERS
¡ We can make our latent-space representation learn useful features by giving it smaller dimensions than the input data. In this case the autoencoder is undercomplete. By training an undercomplete representation, we force the autoencoder to learn the most salient features of the training data. If we give the autoencoder too much capacity (for example, almost the same dimensions for the input data and the latent space), then it will just learn the copying task without extracting useful features or information from the data.
¡ If the dimension of the latent space is equal to or greater than that of the input data, the autoencoder is overcomplete. In that case even a linear encoder and linear decoder can learn to copy the input to the output without learning anything useful about the data distribution.
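An illustrative Keras sketch (the dimensions are assumptions) showing that the only structural difference between the two is the size of the code relative to the input:

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(784,))

# Undercomplete: dim(h) < dim(x), forces the network to learn salient features
under_code = Dense(32, activation='relu')(inp)
undercomplete = Model(inp, Dense(784, activation='sigmoid')(under_code))

# Overcomplete: dim(h) >= dim(x), can learn a trivial copy unless regularized
over_code = Dense(1024, activation='relu')(inp)
overcomplete = Model(inp, Dense(784, activation='sigmoid')(over_code))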
STACKED AUTOENCODER
A stacked autoencoder is a neural network consisting of several layers of sparse autoencoders, where the output of each hidden layer is connected to the input of the successive hidden layer.
Regularization in
Autoencoders
It is fine to lose some data in this case
REGULARIZATION
¡ A reliable autoencoder must make a trade-off between two important properties:
• Sensitive enough to its inputs so that it can accurately reconstruct the input data
• Able to generalize well, even when evaluated on unseen data
¡ As a result, the loss function of such an autoencoder is composed of two different parts.
¡ The first part is the reconstruction loss (e.g. mean squared error) measuring the difference between the input data and the output data.
¡ The second part acts as a regularization term, which prevents the autoencoder from overfitting.
REGULARIZATION
¡ Regularization helps with the effects of out-of-control parameters by using different methods to
minimize parameter size over time.
¡ Regularization coefficients L1 and L2 help fight overfitting by making certain weights smaller. Smaller-
valued weights lead to simpler hypotheses, which are the most generalizable.
¡ Unregularized weights with several higher-order polynomials in the feature sets tend to overfit the
training set.
REGULARIZATION
ü While poor generalization could happen even in undercomplete autoencoders, it is an even more serious problem for overcomplete autoencoders
ü Here, the model can simply learn to copy xᵢ to h and then h to x̂ᵢ
ü To avoid poor generalization, we need to introduce regularization
REGULARIZATION
¡ The simplest solution is to add an L2-regularization term to the objective function.
'm' is the number of rows and 'n' the number of columns of the data matrix X (the images).
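A plausible form of this regularized objective (reconstructed here for clarity; the exact notation on the original slide may differ), assuming a squared reconstruction error over m examples with n components each:

\min_{W, W^{*}, b, c} \; \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \hat{x}_{ij} - x_{ij} \right)^{2} \; + \; \lambda \, \|\Theta\|^{2}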
REGULARIZATION
¡ The simplest solution is to add an L2-regularization term to the objective function.
Θ = combination of all the weights and biases
Θ = [ w1, w2, w3, ….. ]
REGULARIZATION
¡ The regularized autoencoders use a loss function that helps the model to have other
properties besides copying input to the output.
¡ We can generally find two types of regularized autoencoder:
¡ the denoising autoencoder and
¡ the sparse autoencoder.
DENOISING AUTOENCODERS
¡ Autoencoders are neural networks which are commonly used for feature selection and extraction. However, when there are more nodes in the hidden layer than there are inputs, the network risks learning the so-called "Identity Function", also called the "Null Function", meaning that the output equals the input, making the autoencoder useless.
¡ Denoising autoencoders solve this problem by corrupting the data on purpose, randomly turning some of the input values to zero. In general, the percentage of input nodes which are set to zero is about 50%. Other sources suggest a lower count, such as 30%. It depends on the amount of data and input nodes you have.
¡ When calculating the loss function, it is important to compare the output values with the original input, not with the corrupted input. That way, the risk of learning the identity function instead of extracting features is eliminated.
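A hedged NumPy sketch of this corruption step (the 50% rate and the array shapes are assumptions for illustration); note that training compares the reconstruction against the original, uncorrupted input:

import numpy as np

rng = np.random.default_rng(0)
x_clean = rng.random((1000, 784))                 # original (clean) inputs

# Corrupt on purpose: randomly set about 50% of the input values to zero
mask = rng.random(x_clean.shape) > 0.5
x_corrupted = x_clean * mask

# Train on corrupted inputs but clean targets, e.g. in Keras:
# autoencoder.fit(x_corrupted, x_clean, epochs=10, batch_size=256)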
DENOISING
AUTOENCODERS
¡ Denoising autoencoders are a robust variant of the standard autoencoder.
¡ They have the same structure as a standard autoencoder but are trained using samples to which some amount of noise has been added.
¡ Thus, we map these noisy samples to their clean versions.
¡ This ensures that the network doesn't learn an identity mapping, which would be pointless.
¡ So, to summarise, denoising autoencoders are used where you want to learn a more robust latent representation for a particular set of input data.
DENOISING AUTOENCODER
DENOISING AUTOENCODER
SPARSE AUTOENCODER
¡ A sparse autoencoder is simply an autoencoder whose training criterion involves a sparsity
penalty. In most cases, we would construct our loss function by penalizing activations of
hidden layers so that only a few nodes are encouraged to activate when a single sample is
fed into the network.
¡ There are actually two different ways to construct our sparsity penalty:
¡ L1 regularization and
¡ KL-divergence.
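A hedged Keras sketch of the L1 variant (the layer sizes and the penalty weight 1e-5 are illustrative assumptions): the activity regularizer penalizes the code-layer activations so that only a few units are active for each sample.

from keras.layers import Input, Dense
from keras.models import Model
from keras import regularizers

inp = Input(shape=(784,))
# L1 penalty on the *activations* of the code layer encourages sparsity
code = Dense(64, activation='relu',
             activity_regularizer=regularizers.l1(1e-5))(inp)
out = Dense(784, activation='sigmoid')(code)
sparse_autoencoder = Model(inp, out)
sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')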
SPARSE AUTOENCODER
SPARSE AUTOENCODER
SPARSE AUTOENCODER
¡ A sparse autoencoder uses a sparsity enforcer that directs a single-layer network to learn a code dictionary which minimizes the error in reproducing the input while constraining the number of code words used for reconstruction.
¡ The sparse autoencoder consists of a single hidden layer, which is connected to the input vector by a weight matrix forming the encoding step. The hidden layer outputs to a reconstruction vector, using a tied weight matrix to form the decoder.
CONTRACTIVE AE
CONTRACTIVE AUTOENCODER
¡ A Contractive Autoencoder is an autoencoder that adds a penalty term to the classical reconstruction cost
function.
¡ This penalty term corresponds to the Frobenius norm of the Jacobian matrix of the encoder activations with
respect to the input.
The Frobenius norm of a matrix is defined as the square root of the sum of the squares of the elements of the matrix. To compute it: find the sum of the squares of the elements and take the square root of the calculated value.
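In symbols (added for clarity), for a matrix A with entries a_{ij}:

\|A\|_F = \sqrt{\sum_{i} \sum_{j} a_{ij}^{2}}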
CONTRACTIVE AUTOENCODER
¡ The Contractive Autoencoder was proposed by researchers at the Université de Montréal in 2011 in the paper Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. The idea behind it is to make autoencoders robust to small changes in the training dataset.
¡ To deal with the above challenge posed by basic autoencoders, the authors proposed to add another penalty term to the loss function of the autoencoder.
¡ The loss function:
The contractive autoencoder adds an extra term to the loss function of the autoencoder, as given below. This penalty term is the Frobenius norm of the Jacobian of the encoder; the Frobenius norm is just a generalization of the Euclidean norm to matrices.
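A standard way to write this loss (reconstructed here; the original slide shows it as an image), where h = f(x) is the encoder output and λ weights the penalty:

L(x, \hat{x}) = \|x - \hat{x}\|^{2} + \lambda \, \|J_f(x)\|_F^{2},
\qquad \|J_f(x)\|_F^{2} = \sum_{ij} \left( \frac{\partial h_j(x)}{\partial x_i} \right)^{2}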
DENOISING AND CONTRACTIVE AUTOENCODER
¡ There is a connection between the denoising autoencoder and the contractive autoencoder:
¡ the denoising reconstruction error is equivalent to a contractive penalty on the reconstruction function that maps x to r = g(f(x)).
¡ In other words, denoising autoencoders make the reconstruction function resist small but finite sized
perturbations of the input, whereas contractive autoencoders make the feature extraction function resist
infinitesimal perturbations of the input.
DEEP AUTO-ENCODER
¡ Of course, the auto-encoder can be deep: the encoder stacks several layers (weights W₁, W₂, …) down to a bottleneck code layer, and the decoder mirrors them (W₂', W₁', …) back up to the output layer, whose output x̂ is trained to be as close as possible to the input x.
¡ Symmetric (tied) encoder and decoder weights are not necessary.
¡ Such deep auto-encoders can be initialized layer-by-layer by RBMs.
APPLICATIONS
¡ Watermark removal
¡ Denoising
¡ Dimensionality reduction
¡ Image compression
¡ Image colorization
¡ Feature variation
USE OF AUTO ENCODERS
¡ Data denoising and dimensionality reduction for data visualization are considered the two main practical applications of autoencoders. With appropriate dimensionality and sparsity constraints, autoencoders can learn data projections that are more interesting than PCA or other basic techniques.
¡ Autoencoders can also be used for image reconstruction, basic image colorization, data compression, converting gray-scale images to colored images, generating higher-resolution images, etc.
APPLICATIONS
Autoencoders present an efficient way to learn a representation of
your data that focuses on the signal, not the noise. You can use them
for a variety of tasks such as:
•Image Compression
•Dimensionality reduction
•Feature extraction
•Denoising of data/images
•Imputing missing data
IMAGE COMPRESSION
IMAGE COMPRESSION
¡ Autoencoders are a deep learning model for transforming data from a high-
dimensional space to a lower-dimensional space. They work by encoding the data,
whatever its size, to a 1-D vector. This vector can then be decoded to reconstruct
the original data (in this case, an image).
¡ An autoencoder consists of two parts: an encoder network and a decoder network.
The encoder network compresses the input data, while the decoder network
reconstructs the compressed data back into its original form. The compressed data,
also known as the bottleneck layer, is typically much smaller than the input data.
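As a hedged illustration (the architecture below is an assumed example, not from the slides), the encoder half alone can be used to compress images into their bottleneck representation, and the full model to reconstruct them:

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(784,))
code = Dense(32, activation='relu')(inp)              # bottleneck: 784 -> 32 (about 24x smaller)
out = Dense(784, activation='sigmoid')(code)
autoencoder = Model(inp, out)

encoder = Model(inp, code)                            # compression model
# compressed = encoder.predict(images)                # 32 numbers per image instead of 784
# reconstructed = autoencoder.predict(images)         # lossy reconstruction of the originals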
AUTOENCODERS: APPLICATIONS
¡ Denoising: input clean image + noise and train to reproduce the clean image.
AUTOENCODERS: APPLICATIONS
¡ Image colorization: input black and white and train to produce color images
AUTOENCODERS: APPLICATIONS
¡ Watermark removal
FEATURE VARIATION
DIMENSIONALITY REDUCTION
PROPERTIES OF AUTOENCODERS
¡ Data-specific: Autoencoders are only able to compress data similar to what they have been trained on.
¡ Lossy: The decompressed outputs will be degraded compared to the original inputs.
¡ Learned automatically from examples: It is easy to train specialized instances of the algorithm that will
perform well on a specific type of input.
https://www.edureka.co/blog/autoencoders-tutorial/
CAPACITY
¡ As with other NNs, overfitting is a problem when capacity is too large for the data.
¡ Autoencoders address this through some combination of:
¡ Bottleneck layer – fewer degrees of freedom than in possible outputs.
¡ Training to denoise.
¡ Sparsity through regularization.
¡ Contractive penalty.
BOTTLENECK LAYER (UNDERCOMPLETE)
¡ Suppose the input images are n×n and the latent space has dimension m < n×n.
¡ Then the latent space is not sufficient to reproduce all images.
¡ Needs to learn an encoding that captures the important features in training data, sufficient for approximate
reconstruction.
SIMPLE BOTTLENECK LAYER IN KERAS
from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))            # 28x28 images, flattened
encoding_dim = 32                          # size of the bottleneck code
encoded = Dense(encoding_dim, activation='relu')(input_img)    # encoder
decoded = Dense(784, activation='sigmoid')(encoded)            # decoder
autoencoder = Model(input_img, decoded)
¡ Maps 28x28 images into a 32-dimensional vector.
¡ Can also use more layers and/or convolutions.
https://blog.keras.io/building-autoencoders-in-keras.html
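A possible training recipe for the model above, along the lines of the cited Keras blog (the epoch count and batch size are assumptions):

from keras.datasets import mnist
import numpy as np

(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0   # flatten and scale to [0,1]
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train,                # input and target are the same image
                epochs=50, batch_size=256,
                validation_data=(x_test, x_test))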
DENOISING AUTOENCODERS
¡ Basic autoencoder trains to minimize the loss between x and the reconstruction g(f(x)).
¡ Denoising autoencoders train to minimize the loss between x and g(f(x+w)), where w is random noise.
¡ Same possible architectures, different training data.
¡ Kaggle has a dataset on damaged documents.
https://blog.keras.io/building-autoencoders-in-keras.html
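A brief sketch of preparing such training data (the noise level 0.5 is an assumption, and x_train is assumed to be prepared as in the MNIST example above): the corrupted x + w is the input and the clean x is the target.

import numpy as np

noise_factor = 0.5
x_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)   # x + w
x_noisy = np.clip(x_noisy, 0.0, 1.0)

# Same architecture as before, different training data:
# autoencoder.fit(x_noisy, x_train, epochs=50, batch_size=256)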
DENOISING AUTOENCODERS
¡ Denoising autoencoders can’t simply memorize the input output relationship.
¡ Intuitively, a denoising autoencoder learns a projection from a neighborhood of our
training data back onto the training data.
SPARSE AUTOENCODERS
¡ Construct a loss function to penalize activations within a layer.
¡ Usually regularize the weights of a network, not the activations.
¡ Individual nodes of a trained model that activate are data-dependent.
¡ Different inputs will result in activations of different nodes through the network.
¡ Selectively activate regions of the network depending on the input data.
https://www.jeremyjordan.me/autoencoders/
SPARSE AUTOENCODERS
¡ Construct a loss function to penalize activations within the network.
¡ L1 regularization: penalize the absolute value of the vector of activations a in layer h for observation i
¡ KL divergence: use the cross-entropy between the average activation and the desired activation
https://www.jeremyjordan.me/autoencoders/
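Written out (added for clarity), with a^(h)_i the activations of layer h for observation i, ρ the desired average activation, and ρ̂_j the observed average activation of hidden unit j:

\Omega_{L1} = \lambda \sum_{i} \left| a^{(h)}_{i} \right|

\Omega_{KL} = \sum_{j} \rho \log \frac{\rho}{\hat{\rho}_{j}} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_{j}}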
CONTRACTIVE AUTOENCODERS
¡ Arrange for similar inputs to have similar activations.
¡ I.e., the derivatives of the hidden-layer activations are small with respect to the input.
¡ Denoising autoencoders make the reconstruction function (encoder+decoder) resist
small perturbations of the input
¡ Contractive autoencoders make the feature extraction function (ie. encoder) resist
infinitesimal perturbations of the input.
https://www.jeremyjordan.me/autoencoders/
CONTRACTIVE AUTOENCODERS
¡ Contractive autoencoders make the feature extraction function (ie. encoder) resist infinitesimal perturbations of
the input.
https://ift6266h17.files.wordpress.com/2017/03/14_autoencoders.pdf
AUTOENCODERS
¡ Both the denoising and contractive autoencoder can perform well
¡ Advantage of the denoising autoencoder: simpler to implement; it requires adding only one or two lines of code to a regular autoencoder, and there is no need to compute the Jacobian of the hidden layer
¡ Advantage of the contractive autoencoder: the gradient is deterministic; second-order optimizers (conjugate gradient, LBFGS, etc.) can be used, and it might be more stable than the denoising autoencoder, which uses a sampled gradient
¡ To learn more on contractive autoencoders:
¡ Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot and Yoshua Bengio, 2011.
https://ift6266h17.files.wordpress.com/2017/03/14_autoencoders.pdf