Deep Learning for AI, Unit 4: Autoencoders
Autoencoders • Supervised learning uses explicit labels/correct outputs to train a network. • E.g., classification of images. • Unsupervised learning relies on the data alone. • E.g., CBOW and skip-gram word embeddings: the output is determined implicitly from word order in the input data. • The key point is to produce a useful embedding of words. • The embedding encodes structure such as word similarity and some relationships. • We still need to define a loss – this is a form of implicit supervision.
Autoencoders • Autoencoders are designed to reproduce their input, especially for images. • Key point is to reproduce the input from a learned encoding.
Autoencoders • Compare with PCA/SVD • PCA takes a collection of vectors (images) and produces a usually smaller set of vectors that can be used to approximate the input vectors via linear combination. • Very efficient for certain applications. • Fourier and wavelet compression are similar. • Neural network autoencoders • Can learn nonlinear dependencies • Can use convolutional layers • Can use transfer learning
Autoencoders: structure • Encoder: compress the input into a latent space, usually of smaller dimension: h = f(x). • Decoder: reconstruct the input from the latent representation: r = g(f(x)), with r as close to x as possible.
Autoencoders: applications • Denoising: input clean image + noise and train to reproduce the clean image.
Autoencoders: Applications • Image colorization: input black-and-white images and train to produce color images.
Autoencoders: Applications • Watermark removal
Properties of Autoencoders • Data-specific: Autoencoders are only able to compress data similar to what they have been trained on. • Lossy: The decompressed outputs will be degraded compared to the original inputs. • Learned automatically from examples: It is easy to train specialized instances of the algorithm that will perform well on a specific type of input.
Capacity • As with other NNs, overfitting is a problem when capacity is too large for the data. • Autoencoders address this through some combination of: • Bottleneck layer – fewer degrees of freedom than the space of possible outputs. • Training to denoise. • Sparsity through regularization. • A contractive penalty.
Bottleneck layer (undercomplete) • Suppose input images are n×n and the latent space has dimension m < n×n. • Then the latent space is not sufficient to reproduce all possible images. • The network needs to learn an encoding that captures the important features of the training data, sufficient for approximate reconstruction.
Simple bottleneck layer in Keras:

    from tensorflow.keras.layers import Input, Dense
    from tensorflow.keras.models import Model

    input_img = Input(shape=(784,))
    encoding_dim = 32
    encoded = Dense(encoding_dim, activation='relu')(input_img)
    decoded = Dense(784, activation='sigmoid')(encoded)
    autoencoder = Model(input_img, decoded)

• Maps 28x28 images (flattened to 784) into a 32-dimensional vector. • Can also use more layers and/or convolutions.
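A minimal training sketch for the bottleneck model above, assuming MNIST digits flattened to 784-dimensional vectors and scaled to [0, 1]; the loss choice, epoch count, and batch size are illustrative, not prescribed by the slides:

    from tensorflow.keras.datasets import mnist

    (x_train, _), (x_test, _) = mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
    x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

    # Reconstruction target is the input itself.
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
    autoencoder.fit(x_train, x_train,
                    epochs=20, batch_size=128, shuffle=True,
                    validation_data=(x_test, x_test))

    # The encoder half alone maps an image to its 32-dimensional code.
    encoder = Model(input_img, encoded)
    codes = encoder.predict(x_test)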
Denoising autoencoders • Basic autoencoder trains to minimize the loss between x and the reconstruction g(f(x)). • Denoising autoencoders train to minimize the loss between x and g(f(x+w)), where w is random noise. • Same possible architectures, different training data. • Kaggle has a dataset on damaged documents.
Denoising autoencoders • Denoising autoencoders can’t simply memorize the input-output relationship. • Intuitively, a denoising autoencoder learns a projection from a neighborhood of the training data back onto the training data.
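A minimal denoising-training sketch, reusing the autoencoder and the x_train/x_test arrays from the earlier sketch; only the training pairs change (noisy input, clean target), and the noise level is an illustrative assumption:

    import numpy as np

    noise_factor = 0.5  # illustrative noise level
    x_train_noisy = np.clip(x_train + noise_factor * np.random.normal(size=x_train.shape), 0.0, 1.0)
    x_test_noisy = np.clip(x_test + noise_factor * np.random.normal(size=x_test.shape), 0.0, 1.0)

    # Same architecture as before: train to map noisy inputs to clean targets.
    autoencoder.fit(x_train_noisy, x_train,
                    epochs=20, batch_size=128, shuffle=True,
                    validation_data=(x_test_noisy, x_test))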
Sparse autoencoders • Construct a loss function that penalizes activations within a layer. • We usually regularize the weights of a network, not the activations; here the activations themselves are penalized. • Which individual nodes of a trained model activate is data-dependent. • Different inputs will result in activations of different nodes through the network. • Regions of the network are selectively activated depending on the input data.
Sparse autoencoders • Construct a loss function that penalizes activations within the network, e.g. via: • L1 regularization: penalize the absolute values of the vector of activations a in layer h for observation i. • KL divergence: penalize the divergence between the average activation of each hidden unit and a desired (low) target activation.
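One way to realize the L1 variant in Keras is an activity regularizer on the encoding layer; a minimal sketch, with the penalty weight 1e-5 as an illustrative choice:

    from tensorflow.keras import regularizers
    from tensorflow.keras.layers import Input, Dense
    from tensorflow.keras.models import Model

    input_img = Input(shape=(784,))
    # activity_regularizer penalizes the encoded activations themselves
    # (not the weights), pushing most code components toward zero.
    encoded = Dense(32, activation='relu',
                    activity_regularizer=regularizers.l1(1e-5))(input_img)
    decoded = Dense(784, activation='sigmoid')(encoded)
    sparse_autoencoder = Model(input_img, decoded)
    sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')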
Contractive autoencoders • Arrange for similar inputs to have similar activations. • I.e., the derivatives of the hidden-layer activations are small with respect to the input. • Denoising autoencoders make the reconstruction function (encoder + decoder) resist small perturbations of the input. • Contractive autoencoders make the feature extraction function (i.e., the encoder) resist infinitesimal perturbations of the input.
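A minimal sketch of the contractive penalty for a single dense sigmoid encoder layer, added through Keras's add_loss mechanism; the layer sizes and the penalty weight lam are illustrative assumptions, not a prescribed recipe:

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    class ContractiveEncoder(layers.Layer):
        # Dense sigmoid encoder that adds lam * ||dh/dx||_F^2 to the model loss.
        def __init__(self, units, lam=1e-4, **kwargs):
            super().__init__(**kwargs)
            self.dense = layers.Dense(units, activation='sigmoid')
            self.lam = lam

        def call(self, x):
            h = self.dense(x)
            # For a sigmoid layer the squared Frobenius norm of the Jacobian is
            #   sum_j [h_j (1 - h_j)]^2 * sum_i W_ij^2
            W = self.dense.kernel                        # (input_dim, units)
            dh = h * (1.0 - h)                           # (batch, units)
            w_sq = tf.reduce_sum(tf.square(W), axis=0)   # (units,)
            penalty = tf.reduce_sum(tf.square(dh) * w_sq, axis=-1)
            self.add_loss(self.lam * tf.reduce_mean(penalty))
            return h

    inputs = layers.Input(shape=(784,))
    h = ContractiveEncoder(32)(inputs)
    outputs = layers.Dense(784, activation='sigmoid')(h)
    cae = Model(inputs, outputs)
    cae.compile(optimizer='adam', loss='mse')  # reconstruction loss + added penalty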
Autoencoders • Both the denoising and the contractive autoencoder can perform well. • Advantage of the denoising autoencoder: simpler to implement – it requires adding only one or two lines of code to a regular autoencoder, and there is no need to compute the Jacobian of the hidden layer. • Advantage of the contractive autoencoder: the gradient is deterministic – second-order optimizers (conjugate gradient, L-BFGS, etc.) can be used, and it may be more stable than the denoising autoencoder, which uses a sampled gradient. • To learn more about contractive autoencoders: • Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot and Yoshua Bengio. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. 2011.