Deep Learning for AI, Unit 4: Autoencoders
Autoencoders • Supervised learning uses explicit labels/correct outputs to train a network. • E.g., classification of images. • Unsupervised learning relies on the data alone. • E.g., CBOW and skip-gram word embeddings: the output is determined implicitly from word order in the input data. • The key point is to produce a useful embedding of words. • The embedding encodes structure such as word similarity and some relationships. • We still need to define a loss – this is a form of implicit supervision.
Autoencoders • Autoencoders are designed to reproduce their input, especially for images. • Key point is to reproduce the input from a learned encoding.
Autoencoders • Compare with PCA/SVD • PCA takes a collection of vectors (images) and produces a usually smaller set of vectors that can be used to approximate the input vectors via linear combination. • Very efficient for certain applications. • Fourier and wavelet compression are similar. • Neural network autoencoders • Can learn nonlinear dependencies • Can use convolutional layers • Can use transfer learning
Autoencoders: structure • Encoder: compress the input into a latent space, usually of smaller dimension: h = f(x). • Decoder: reconstruct the input from the latent representation: r = g(f(x)), with r as close to x as possible.
Autoencoders: applications • Denoising: input clean image + noise and train to reproduce the clean image.
Autoencoders: Applications • Image colorization: input black-and-white images and train to produce color images.
Autoencoders: Applications • Watermark removal
Properties of Autoencoders • Data-specific: Autoencoders are only able to compress data similar to what they have been trained on. • Lossy: The decompressed outputs will be degraded compared to the original inputs. • Learned automatically from examples: It is easy to train specialized instances of the algorithm that will perform well on a specific type of input.
Capacity • As with other NNs, overfitting is a problem when capacity is too large for the data. • Autoencoders address this through some combination of: • Bottleneck layer – fewer degrees of freedom than the space of possible outputs. • Training to denoise. • Sparsity through regularization. • A contractive penalty.
Bottleneck layer (undercomplete) • Suppose input images are n×n and the latent space has dimension m < n×n. • Then the latent space is not sufficient to reproduce all possible images. • The network needs to learn an encoding that captures the important features of the training data, sufficient for approximate reconstruction.
Simple bottleneck layer in Keras:

    from tensorflow.keras.layers import Input, Dense
    from tensorflow.keras.models import Model

    input_img = Input(shape=(784,))
    encoding_dim = 32
    encoded = Dense(encoding_dim, activation='relu')(input_img)
    decoded = Dense(784, activation='sigmoid')(encoded)
    autoencoder = Model(input_img, decoded)

• Maps 28x28 images (flattened to 784) into a 32-dimensional vector. • Can also use more layers and/or convolutions.
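A minimal training sketch for the bottleneck model above, assuming MNIST digits flattened to 784-dimensional vectors and scaled to [0, 1]; the loss choice, epoch count, and batch size are illustrative, not prescribed by the slides:

    from tensorflow.keras.datasets import mnist

    (x_train, _), (x_test, _) = mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
    x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

    # Reconstruction target is the input itself.
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
    autoencoder.fit(x_train, x_train,
                    epochs=20, batch_size=128, shuffle=True,
                    validation_data=(x_test, x_test))

    # The encoder half alone maps an image to its 32-dimensional code.
    encoder = Model(input_img, encoded)
    codes = encoder.predict(x_test)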
Denoising autoencoders • Basic autoencoder trains to minimize the loss between x and the reconstruction g(f(x)). • Denoising autoencoders train to minimize the loss between x and g(f(x+w)), where w is random noise. • Same possible architectures, different training data. • Kaggle has a dataset on damaged documents.
Denoising autoencoders • Denoising autoencoders can’t simply memorize the input-output relationship. • Intuitively, a denoising autoencoder learns a projection from a neighborhood of the training data back onto the training data.
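A minimal denoising-training sketch, reusing the autoencoder and the x_train/x_test arrays from the earlier sketch; only the training pairs change (noisy input, clean target), and the noise level is an illustrative assumption:

    import numpy as np

    noise_factor = 0.5  # illustrative noise level
    x_train_noisy = np.clip(x_train + noise_factor * np.random.normal(size=x_train.shape), 0.0, 1.0)
    x_test_noisy = np.clip(x_test + noise_factor * np.random.normal(size=x_test.shape), 0.0, 1.0)

    # Same architecture as before: train to map noisy inputs to clean targets.
    autoencoder.fit(x_train_noisy, x_train,
                    epochs=20, batch_size=128, shuffle=True,
                    validation_data=(x_test_noisy, x_test))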
Sparse autoencoders • Construct a loss function that penalizes activations within a layer. • We usually regularize the weights of a network, not the activations; here the activations themselves are penalized. • Which individual nodes of a trained model activate is data-dependent. • Different inputs will result in activations of different nodes through the network. • Regions of the network are selectively activated depending on the input data.
Sparse autoencoders • Construct a loss function that penalizes activations within the network, e.g. via: • L1 regularization: penalize the absolute values of the vector of activations a in layer h for observation i. • KL divergence: penalize the divergence between the average activation of each hidden unit and a desired (low) target activation.
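One way to realize the L1 variant in Keras is an activity regularizer on the encoding layer; a minimal sketch, with the penalty weight 1e-5 as an illustrative choice:

    from tensorflow.keras import regularizers
    from tensorflow.keras.layers import Input, Dense
    from tensorflow.keras.models import Model

    input_img = Input(shape=(784,))
    # activity_regularizer penalizes the encoded activations themselves
    # (not the weights), pushing most code components toward zero.
    encoded = Dense(32, activation='relu',
                    activity_regularizer=regularizers.l1(1e-5))(input_img)
    decoded = Dense(784, activation='sigmoid')(encoded)
    sparse_autoencoder = Model(input_img, decoded)
    sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')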
Contractive autoencoders • Arrange for similar inputs to have similar activations. • I.e., the derivatives of the hidden-layer activations are small with respect to the input. • Denoising autoencoders make the reconstruction function (encoder + decoder) resist small perturbations of the input. • Contractive autoencoders make the feature extraction function (i.e., the encoder) resist infinitesimal perturbations of the input.
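A minimal sketch of the contractive penalty for a single dense sigmoid encoder layer, added through Keras's add_loss mechanism; the layer sizes and the penalty weight lam are illustrative assumptions, not a prescribed recipe:

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    class ContractiveEncoder(layers.Layer):
        # Dense sigmoid encoder that adds lam * ||dh/dx||_F^2 to the model loss.
        def __init__(self, units, lam=1e-4, **kwargs):
            super().__init__(**kwargs)
            self.dense = layers.Dense(units, activation='sigmoid')
            self.lam = lam

        def call(self, x):
            h = self.dense(x)
            # For a sigmoid layer the squared Frobenius norm of the Jacobian is
            #   sum_j [h_j (1 - h_j)]^2 * sum_i W_ij^2
            W = self.dense.kernel                        # (input_dim, units)
            dh = h * (1.0 - h)                           # (batch, units)
            w_sq = tf.reduce_sum(tf.square(W), axis=0)   # (units,)
            penalty = tf.reduce_sum(tf.square(dh) * w_sq, axis=-1)
            self.add_loss(self.lam * tf.reduce_mean(penalty))
            return h

    inputs = layers.Input(shape=(784,))
    h = ContractiveEncoder(32)(inputs)
    outputs = layers.Dense(784, activation='sigmoid')(h)
    cae = Model(inputs, outputs)
    cae.compile(optimizer='adam', loss='mse')  # reconstruction loss + added penalty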
Autoencoders • Both the denoising and the contractive autoencoder can perform well. • Advantage of the denoising autoencoder: simpler to implement – it requires adding only one or two lines of code to a regular autoencoder, and there is no need to compute the Jacobian of the hidden layer. • Advantage of the contractive autoencoder: the gradient is deterministic – second-order optimizers (conjugate gradient, L-BFGS, etc.) can be used, and it may be more stable than the denoising autoencoder, which uses a sampled gradient. • To learn more about contractive autoencoders: • Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot and Yoshua Bengio. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. 2011.