Deep Learning Chapter 14 discusses autoencoders. Autoencoders are neural networks trained to copy their input to their output. They have an encoder that maps the input to a hidden representation and a decoder that maps this representation back to the output. Autoencoders are commonly used for dimensionality reduction, feature learning, and extracting a low-dimensional representation of the input data. Regularized autoencoders add constraints such as sparsity or contractive penalties to prevent the autoencoder from learning the identity function and to force it to learn meaningful representations. Denoising autoencoders are trained to reconstruct clean inputs from corrupted versions, which encourages the hidden representation to be robust. Contractive autoencoders add a penalty term that makes the learned representation resist small changes in the input.
14 Autoencoders
An autoencoder (AE) is a neural network trained to attempt to copy its input to its output. It has a hidden layer [h] that encodes the input [x], h = f(Wx + b), and a decoder that maps [h] back to a reconstruction of [x]. AEs are restricted in ways that allow them to copy the input only approximately, so the network is forced to prioritize which aspects of the input should be copied, which makes them useful for feature extraction. AEs have traditionally been used for dimensionality reduction or feature learning. An AE can be considered a special case of a feedforward network and can be trained with the same techniques, such as mini-batch gradient descent with backpropagation. They can also be trained by recirculation.
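A minimal sketch of this encoder/decoder structure (assuming PyTorch; the layer sizes, the decoder weights V and c, and the random mini-batch are placeholders, not from the text):

    import torch
    import torch.nn as nn

    x_dim, h_dim = 784, 32                                          # toy sizes
    encoder = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Sigmoid())  # h = f(Wx + b)
    decoder = nn.Sequential(nn.Linear(h_dim, x_dim), nn.Sigmoid())  # r = g(Vh + c)

    x = torch.rand(16, x_dim)             # fake mini-batch standing in for real data
    h = encoder(x)                        # hidden code
    r = decoder(h)                        # reconstruction of x
    loss = nn.functional.mse_loss(r, x)   # reconstruction error to minimize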
14.1 Undercomplete Autoencoders
An undercomplete AE is one in which the dimension of the hidden layer [h] is less than the dimension of the input layer [x]. We are typically not interested in the AE's output but in the hidden layer [h]. Constraining [h] to a smaller dimension than [x] is what makes the AE undercomplete, and it forces the AE to capture only the most salient features of the training data. If the AE is allowed too much capacity, it simply learns to copy its input without extracting useful information about the distribution of the data.
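As a hedged illustration (assuming PyTorch; the 784/32 sizes and the random data are placeholders), an undercomplete AE trained with mini-batch gradient descent looks roughly like this:

    import torch
    import torch.nn as nn

    x_dim, h_dim = 784, 32                      # h_dim < x_dim -> undercomplete
    model = nn.Sequential(
        nn.Linear(x_dim, h_dim), nn.ReLU(),     # encoder
        nn.Linear(h_dim, x_dim), nn.Sigmoid(),  # decoder
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    data = torch.rand(1024, x_dim)              # placeholder training set
    for epoch in range(5):
        for i in range(0, len(data), 64):       # mini-batch gradient descent
            x = data[i:i + 64]
            loss = nn.functional.mse_loss(model(x), x)  # copy x through the bottleneck
            opt.zero_grad()
            loss.backward()
            opt.step()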
14.2 Regularized Autoencoders
AEs whose hidden layer has dimension equal to or greater than the input are called overcomplete. Regularized autoencoders make it possible to train any architecture of autoencoder successfully, choosing the code dimension and the capacity of the encoder and decoder based on the complexity of the distribution to be modeled. Rather than limiting model capacity by keeping the encoder and decoder shallow and the code size small, regularized autoencoders use a loss function that encourages the model to have other properties besides the ability to copy its input to its output. These properties include sparsity of the representation, smallness of the derivative of the representation, and robustness to noise or to missing inputs. A regularized autoencoder can be nonlinear and overcomplete and still learn something useful about the data distribution, even if its capacity is great enough to learn the identity function.
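In the book's notation, the regularized objective simply adds a penalty term to the reconstruction loss; the concrete form of the penalty gives the variants described in the following subsections:

    L(x, g(f(x))) + \Omega(h)

where L is the reconstruction loss (e.g., squared error), f and g are the encoder and decoder, and \Omega(h) is the regularization penalty (sparsity-based, derivative-based, or noise-robustness-based).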
14.2.1 Sparse Autoencoders
A sparse autoencoder adds a sparsity penalty on the code [h] to the reconstruction error. We can think of this penalty as a regularizer added to a feedforward network whose main task is to copy its input to its output (and possibly also perform some supervised task). Generative models are used in machine learning either to model data directly (i.e., to model observations drawn from a probability density function) or as an intermediate step toward forming a conditional probability density function. Another way to think about the sparse AE framework is as approximating maximum likelihood training of a generative model with latent variables.
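A hedged sketch of the sparsity penalty in code (assuming PyTorch; lam is a hypothetical sparsity weight and the layer sizes are placeholders):

    import torch
    import torch.nn as nn

    x_dim, h_dim, lam = 784, 256, 1e-3          # overcomplete code, sparsity weight
    encoder = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
    decoder = nn.Linear(h_dim, x_dim)

    x = torch.rand(16, x_dim)
    h = encoder(x)                              # code, encouraged to be sparse
    recon = decoder(h)
    # reconstruction error plus an L1-style penalty that pushes most units of h toward zero
    loss = nn.functional.mse_loss(recon, x) + lam * h.abs().mean()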
14.2.2 Denoising Autoencoders
A denoising AE (DAE) receives a corrupted input [x̃] and tries to reconstruct the original, uncorrupted input. We use a corruption process C(x̃ | x), a conditional distribution over corrupted samples x̃ given the original input [x]. The AE then learns a reconstruction distribution p(x | x̃) from training pairs (x, x̃). Typically we can perform gradient-based optimization, and as long as the encoder is deterministic, the denoising AE is a feedforward network and can be trained with the same techniques as any other feedforward network. Denoising AEs show how useful byproducts can emerge simply from minimizing reconstruction error. They also show that high-capacity models can be used as autoencoders and still learn useful features without learning the identity function.
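A minimal training-step sketch (assuming PyTorch, Gaussian corruption with a hypothetical noise level of 0.3, and placeholder sizes): the loss compares the reconstruction of the corrupted input against the clean input.

    import torch
    import torch.nn as nn

    x_dim, h_dim = 784, 128
    model = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(),
                          nn.Linear(h_dim, x_dim), nn.Sigmoid())
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    x = torch.rand(64, x_dim)                          # clean mini-batch
    x_tilde = x + 0.3 * torch.randn_like(x)            # sample from C(x_tilde | x)
    loss = nn.functional.mse_loss(model(x_tilde), x)   # reconstruct the *clean* x
    opt.zero_grad(); loss.backward(); opt.step()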
14.2.3 Regularizing by Penalizing Derivatives
Another strategy for regularizing autoencoders is to use a penalty Ω, as in sparse autoencoders, but with a different form: the penalty is on the derivatives of the hidden representation with respect to the input. This forces the model to learn a representation that does not change much for slight changes in the input vector [x]. Because the penalty applies only to training examples, it forces the AE to capture useful information about the training distribution. An AE regularized this way is called a contractive AE (CAE).
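A hedged sketch of this penalty for a single sigmoid encoder layer (assuming PyTorch; gamma and the sizes are placeholders). For h = sigmoid(Wx + b), the Jacobian dh/dx equals diag(h * (1 - h)) W, so its squared Frobenius norm can be computed in closed form:

    import torch
    import torch.nn as nn

    x_dim, h_dim, gamma = 784, 128, 0.1       # gamma: hypothetical penalty weight
    enc = nn.Linear(x_dim, h_dim)
    dec = nn.Linear(h_dim, x_dim)

    x = torch.rand(32, x_dim)
    h = torch.sigmoid(enc(x))
    recon = dec(h)

    # ||dh/dx||_F^2 per example, averaged over the batch, using the sigmoid closed form
    jac_frob2 = ((h * (1 - h)) ** 2 @ (enc.weight ** 2).sum(dim=1)).mean()
    loss = nn.functional.mse_loss(recon, x) + gamma * jac_frob2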
14.3 Representational Power, Layer Size and Depth
AEs are often trained with single-layer encoders and decoders, but the encoder and decoder can also be made deep. Since the encoder and decoder are both feedforward networks, they each benefit from the advantages of deep architectures. A major advantage of deep feedforward networks is that, given enough hidden units, they can approximate any function to an arbitrary degree of accuracy; in particular, a deep encoder can approximate any mapping from the input [x] to the code [h]. Depth can exponentially reduce the computational cost of representing some functions and the amount of training data needed, and experimentally deep AEs achieve better compression than shallow or linear ones.
14.4 Stochastic Encoders and Decoders
AEs are essentially feedforward neural networks, so they can use the same kinds of output units and loss functions. In a stochastic AE, the encoder and decoder are not simple deterministic functions; instead they define distributions to sample from: p(h | x) for the encoder and p(x | h) for the decoder.
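A hedged sketch of such a pair (assuming PyTorch and a Gaussian p(h | x) whose mean and log-variance are predicted by linear layers, in the style of variational autoencoders; all sizes are placeholders):

    import torch
    import torch.nn as nn

    x_dim, h_dim = 784, 32
    enc_mu = nn.Linear(x_dim, h_dim)          # mean of p(h | x)
    enc_logvar = nn.Linear(x_dim, h_dim)      # log-variance of p(h | x)
    dec_mu = nn.Linear(h_dim, x_dim)          # mean of p(x | h)

    x = torch.rand(8, x_dim)
    mu, logvar = enc_mu(x), enc_logvar(x)
    h = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # h sampled from p(h | x)
    x_recon = dec_mu(h)                                       # parameters of p(x | h)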
14.5.1 Estimating the Score
Score matching is an alternative to maximum likelihood. It provides a consistent estimator of a probability distribution by encouraging the model to have the same score as the data distribution at every training point [x], where the score is the gradient field ∇x log p(x). For AEs, learning this gradient field is one way of learning the structure of p_data. Denoising training of a specific kind of AE (sigmoidal hidden units, linear reconstruction units) with Gaussian corruption noise and mean squared error is equivalent to training an RBM (restricted Boltzmann machine) with Gaussian visible units.
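Stated loosely as a formula (following the book's discussion; sigma is the standard deviation of the Gaussian corruption, and the relation is for the optimally trained DAE as sigma becomes small):

    g(f(x)) - x \approx \sigma^2 \, \nabla_x \log p_{\text{data}}(x)

i.e., the reconstruction displacement g(f(x)) - x estimates the score, pointing back toward regions of high data density.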
14.5.1 Estimating the Score
Score matching applied to RBMs yields a cost function that is identical to the reconstruction error combined with a regularization term similar to the contractive penalty of the CAE.
14.5.2 Historical Perspective
The idea of using an MLP (multilayer perceptron) for denoising goes back to the 1980s. An MLP is a class of feedforward artificial neural network consisting of at least three layers of nodes; except for the input nodes, each node is a neuron with a nonlinear activation function, and MLPs are trained with the supervised backpropagation algorithm. Denoising autoencoders are, in some sense, just MLPs trained to denoise. However, the term "denoising AE" refers to a model that learns not only to denoise its input but also to produce a good internal representation (useful features) as a side effect. The learned representation can be used to pretrain a deeper unsupervised or supervised network. The motivation for the denoising AE was to allow the learning of a very high-capacity encoder while preventing the encoder and decoder from learning the identity function.
14.6 Learning Manifolds with Autoencoders
Most learning algorithms, including AEs, exploit the idea that data concentrates around a low-dimensional manifold. AEs aim to learn the structure of this manifold. All AE training procedures involve a compromise between two forces:
- learning a representation [h] of a training example [x] such that [x] can be approximately reconstructed from [h] through a decoder, and
- satisfying the constraint or regularization penalty. This may be an architectural constraint that limits the capacity of the AE, or a regularization term added to the reconstruction cost; these techniques prefer solutions that are less sensitive to the input.
Together, the two forces push the hidden representation to capture information about the structure of the data-generating distribution.
14.6 Learning Manifolds with Autoencoders
The AE can afford to represent only the variations needed to reconstruct training examples. If the data-generating distribution concentrates near a low-dimensional manifold, the manifold yields a local coordinate system, and the encoder learns a mapping from the input [x] to a representation space that is sensitive only to changes along the manifold directions and not to changes orthogonal to the manifold. The AE recovers the manifold structure when the reconstruction function is insensitive to perturbations of the input in those orthogonal directions. Most machine learning research on learning nonlinear manifolds has focused on nonparametric methods based on the nearest-neighbor graph.
14.6 Learning Manifolds with Autoencoders
These nonparametric approaches conceptually tile the manifold with locally linear, Gaussian-like patches and interpolate between neighboring training examples. However, the manifolds involved in AI problems can have very complicated structure that is difficult to capture from local interpolation alone.
14.7 Contractive Autoencoders
The contractive AE (CAE) introduces a regularizer on the code [h], penalizing the derivatives of [h] with respect to the input. There is a connection between denoising AEs and contractive AEs: in the limit of small Gaussian input noise, the denoising reconstruction error is equivalent to a contractive penalty on the reconstruction function. In other words, denoising AEs make the reconstruction function resist small but finite-sized perturbations of the input, while contractive AEs make the feature-extraction function resist infinitesimal perturbations of the input. The CAE maps a neighborhood of input points to a smaller neighborhood of output points, hence "contracting" the input.
14.7 Contractive Autoencoders
Regularized AEs learn manifolds by balancing two opposing forces; for CAEs, these are the reconstruction error and the contractive penalty. Reconstruction error alone would encourage the CAE to learn an identity function, while the contractive penalty alone would encourage it to learn features that are constant with respect to [x]. A practical strategy for training deep AEs is to train a series of single-layer AEs, each trained to reconstruct the previous AE's hidden layer; the composition of these AEs forms a deep AE, as sketched below. Because each layer was separately trained to be locally contractive, the composed deep AE is contractive as well, although this is not the same as training the full deep AE with a penalty on its Jacobian. The contractive penalty can also yield useless results (for example, if the encoder shrinks its outputs and the decoder compensates by scaling them back up) unless corrective action, such as tying the encoder and decoder weights, is taken.
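A hedged sketch of the layer-wise stacking mechanics (assuming PyTorch; train_autoencoder is a hypothetical helper, and for brevity each layer here is trained on plain reconstruction error, whereas an actual CAE stack would add the contractive penalty from above to each layer's loss; the data and sizes are placeholders):

    import torch
    import torch.nn as nn

    def train_autoencoder(data, in_dim, code_dim, steps=100):
        # hypothetical helper: one single-layer AE trained to reconstruct `data`
        enc = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
        dec = nn.Linear(code_dim, in_dim)
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
        for _ in range(steps):
            loss = nn.functional.mse_loss(dec(enc(data)), data)
            opt.zero_grad(); loss.backward(); opt.step()
        return enc

    x = torch.rand(256, 784)                 # placeholder data
    enc1 = train_autoencoder(x, 784, 128)    # first single-layer AE on the raw input
    h1 = enc1(x).detach()
    enc2 = train_autoencoder(h1, 128, 32)    # second AE reconstructs enc1's hidden layer
    deep_encoder = nn.Sequential(enc1, enc2) # composition gives a deep encoder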
14.8 Predictive Sparse Decomposition
Predictive sparse decomposition (PSD) is a hybrid of sparse coding and a parametric AE: a parametric encoder is trained to predict the output of iterative inference. PSD has been applied to unsupervised feature learning for object recognition in images and video, as well as to audio. The model consists of an encoder f(x) and a decoder g(h) that are both parametric. Training alternates between minimizing the objective with respect to the code [h] and minimizing it with respect to the model parameters; this regularizes the decoder to use parameters for which f(x) can infer good code values. When the model is deployed, the parametric encoder f is used to compute the learned features; evaluating f is computationally inexpensive compared with inferring [h] via gradient descent. PSD models can be stacked and used to initialize a deep network.
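A hedged sketch of the PSD objective and one alternation step (assuming PyTorch; lam and gamma are hypothetical weights, mean-reduced losses stand in for the norms, and the code h is treated as a free variable optimized alongside the parameters):

    import torch
    import torch.nn as nn

    x_dim, h_dim, lam, gamma = 784, 256, 1e-3, 1.0
    f = nn.Linear(x_dim, h_dim)                      # parametric encoder
    g = nn.Linear(h_dim, x_dim)                      # parametric decoder

    x = torch.rand(16, x_dim)
    h = torch.zeros(16, h_dim, requires_grad=True)   # code as a free variable

    def psd_loss(x, h):
        return (nn.functional.mse_loss(g(h), x)             # reconstruction term, x vs g(h)
                + lam * h.abs().mean()                       # sparsity term on the code h
                + gamma * nn.functional.mse_loss(h, f(x)))   # prediction term, h vs f(x)

    # one alternation: a gradient step on h, then a step on the encoder/decoder parameters
    h_opt = torch.optim.SGD([h], lr=0.1)
    p_opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)
    for opt in (h_opt, p_opt):
        loss = psd_loss(x, h)
        opt.zero_grad(); loss.backward(); opt.step()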
14.9 Applications of Autoencoders
AEs have been successfully applied to recommendation systems, dimensionality reduction, and information retrieval. In early work on dimensionality reduction with deep AEs, the learned low-dimensional representations in [h] were qualitatively easier to interpret and relate to the underlying categories, with those categories manifesting as clusters. Lower-dimensional representations can also improve performance on tasks such as classification, since smaller representations consume less memory and are cheaper to run. One task that benefits greatly from dimensionality reduction is information retrieval, since search can become extremely efficient in low-dimensional spaces.
14.9 Applications of Autoencoders
We can use dimensionality reduction to produce a code [h] that is low-dimensional and binary, and then store all database entries in a hash table mapping binary code vectors to entries. Looking up entries that share the query's binary code is then just a hash-table lookup, which is very efficient. This approach to information retrieval via dimensionality reduction and binarization is called semantic hashing. To produce binary codes for semantic hashing, we typically use an encoder with sigmoid activations on the final layer, trained so that the sigmoid outputs saturate near 0 or 1.
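A hedged sketch of the lookup mechanics (assuming PyTorch for the encoder; the "database" is just a Python dict from bit-strings to item indices, and all sizes and data are placeholders):

    import torch
    import torch.nn as nn

    x_dim, code_bits = 784, 16
    encoder = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(),
                            nn.Linear(64, code_bits), nn.Sigmoid())  # sigmoid final layer

    def hash_code(x):
        bits = (encoder(x) > 0.5).int()               # threshold the sigmoid units to 0/1
        return ''.join(str(b.item()) for b in bits)   # e.g. '0110101001011100'

    database = {}
    items = torch.rand(100, x_dim)                    # placeholder document vectors
    for idx, item in enumerate(items):
        database.setdefault(hash_code(item), []).append(idx)

    query = torch.rand(x_dim)
    matches = database.get(hash_code(query), [])      # entries sharing the query's binary code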
Thanks
Special thanks to Laura Montoya and Accel.ai!
Ashish Kumar
ashish.fagna@gmail.com
Twitter: @ashish_fagna
LinkedIn: https://www.linkedin.com/in/ashkmr1