4 | 4 | "cell_type": "markdown", |
5 | 5 | "metadata": {}, |
6 | 6 | "source": [ |
7 | | - "# Deep feature consistent variational auto-encoder\n", |
| 7 | + "# Deep feature consistent variational autoencoder\n", |
8 | 8 | "\n", |
9 | 9 | "## Introduction\n", |
10 | 10 | "\n", |
11 | | - "This article introduces the *deep feature consistent variational auto-encoder*<sup>[1]</sup> (DFC VAE) and provides a Keras implementation to demonstrate the advantages over a plain *variational auto-encoder*<sup>[2]</sup> (VAE).\n", |
| 11 | + "This article introduces the *deep feature consistent variational autoencoder*<sup>[1]</sup> (DFC VAE) and provides a Keras implementation to demonstrate the advantages over a plain *variational autoencoder*<sup>[2]</sup> (VAE).\n", |
12 | 12 | "\n", |
13 | 13 | "A plain VAE is trained with a loss function that makes pixel-by-pixel comparisons between the original image and the reconstructured image. This often leads to generated images that are rather blurry. DFC VAEs on the other hand are trained with a loss function that first feeds the original and reconstructed image into a pre-trained convolutional neural network (CNN) to extract higher level features and then compares the these features to compute a so-called *perceptual loss*. \n", |
14 | 14 | "\n", |
15 | 15 | "The core idea of the perceptual loss is to seek consistency between the hidden representations of two images. Images that are perceived to be similar should also have a small perceptual loss even if they significantly differ in a pixel-by-pixel comparison (due to translation, rotation, ...). This results in generated images that look more naturally and are less blurry. The CNN used for feature extraction is called *perceptual model* in this article.\n", |
16 | 16 | "\n", |
17 | 17 | "### Plain VAE\n", |
18 | 18 | "\n", |
19 | | - "In a [previous article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/master/variational_autoencoder.ipynb?flush_cache=true) I introduced the variational auto-encoder (VAE) and how it can be trained with a variational lower bound $\\mathcal{L}$ as optimization objective using stochastic gradient ascent methods. In context of stochastic gradient descent its negative value is used as loss function $L_{vae}$ which is a sum of a reconstruction loss $L_{rec}$ and a regularization term $L_{kl}$:\n", |
| 19 | + "In a [previous article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/master/variational_autoencoder.ipynb?flush_cache=true) I introduced the variational autoencoder (VAE) and how it can be trained with a variational lower bound $\\mathcal{L}$ as optimization objective using stochastic gradient ascent methods. In context of stochastic gradient descent its negative value is used as loss function $L_{vae}$ which is a sum of a reconstruction loss $L_{rec}$ and a regularization term $L_{kl}$:\n", |
20 | 20 | "\n", |
21 | 21 | "$$\n", |
22 | 22 | "\\begin{align*}\n", |
|
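The perceptual loss described in this cell can be sketched in a few lines of Keras. The snippet below is a minimal illustration rather than the notebook's implementation: `perceptual_model` stands for a pre-trained CNN and `layer_names` for the hidden layers whose activations are compared (both names are placeholders; two or more layers are assumed so the extractor returns a list of feature maps).

```python
import tensorflow as tf
from tensorflow.keras import Model

def make_perceptual_loss(perceptual_model, layer_names):
    """Build a feature-consistency (perceptual) loss from a pre-trained CNN."""
    # Sub-model mapping an input image to the activations of the selected layers.
    feature_extractor = Model(
        inputs=perceptual_model.input,
        outputs=[perceptual_model.get_layer(name).output for name in layer_names])

    def perceptual_loss(original, reconstructed):
        feats_orig = feature_extractor(original)
        feats_rec = feature_extractor(reconstructed)
        # Sum of mean squared differences between corresponding feature maps.
        return tf.add_n([tf.reduce_mean(tf.square(fo - fr))
                         for fo, fr in zip(feats_orig, feats_rec)])

    return perceptual_loss
```
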
42 | 42 | "cell_type": "markdown", |
43 | 43 | "metadata": {}, |
44 | 44 | "source": [ |
45 | | - "*Fig. 1: Plain variational auto-encoder*" |
| 45 | + "*Fig. 1: Plain variational autoencoder*" |
46 | 46 | ] |
47 | 47 | }, |
48 | 48 | { |
|
78 | 78 | "cell_type": "markdown", |
79 | 79 | "metadata": {}, |
80 | 80 | "source": [ |
81 | | - "*Fig. 2. Deep feature consistent variational auto-encoder*" |
| 81 | + "*Fig. 2. Deep feature consistent variational autoencoder*" |
82 | 82 | ] |
83 | 83 | }, |
84 | 84 | { |
|
276 | 276 | "cell_type": "markdown", |
277 | 277 | "metadata": {}, |
278 | 278 | "source": [ |
279 | | - "After loading the MNIST dataset and normalizing pixel values to interval $[0,1]$ we have now everything we need to train the two auto-encoders. This takes a few minutes per model on a GPU. The default setting however is to load the pre-trained weights for the auto-encoders instead of training them." |
| 279 | + "After loading the MNIST dataset and normalizing pixel values to interval $[0,1]$ we have now everything we need to train the two autoencoders. This takes a few minutes per model on a GPU. The default setting however is to load the pre-trained weights for the autoencoders instead of training them." |
280 | 280 | ] |
281 | 281 | }, |
282 | 282 | { |
|
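The preparation step mentioned above reduces to loading MNIST and scaling pixel values to $[0,1]$. A minimal sketch follows; the reshape to `(28, 28, 1)` is an assumption about the convolutional encoder's expected input shape, not taken from the notebook.

```python
from tensorflow.keras.datasets import mnist

# Load MNIST and scale pixel values from [0, 255] to [0, 1].
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

# Add a channel dimension for the convolutional encoder (assumed input shape).
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
```
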
367 | 367 | "from variational_autoencoder_dfc_util import plot_image_rows\n", |
368 | 368 | "\n", |
369 | 369 | "def encode(model, images):\n", |
370 | | - " '''Encodes images with the encoder of the given auto-encoder model'''\n", |
| 370 | + " '''Encodes images with the encoder of the given autoencoder model'''\n", |
371 | 371 | " return model.get_layer('encoder').predict(images)[0]\n", |
372 | 372 | "\n", |
373 | 373 | "\n", |
374 | 374 | "def decode(model, codes):\n", |
375 | | - " '''Decodes latent vectors with the decoder of the given auto-encoder model'''\n", |
| 375 | + " '''Decodes latent vectors with the decoder of the given autoencoder model'''\n", |
376 | 376 | " return model.get_layer('decoder').predict(codes)\n", |
377 | 377 | "\n", |
378 | 378 | "\n", |
379 | 379 | "def encode_decode(model, images):\n", |
380 | | - " '''Encodes and decodes an image with the given auto-encoder model'''\n", |
| 380 | + " '''Encodes and decodes an image with the given autoencoder model'''\n", |
381 | 381 | " return decode(model, encode(model, images))\n", |
382 | 382 | "\n", |
383 | 383 | "\n", |
|
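A hypothetical usage of these helpers, assuming `vae` and `vae_dfc` are the trained plain and DFC models and `x_test` holds the normalized test images (the variable names are placeholders, not necessarily those used in the notebook):

```python
# Reconstruct a few test images with both models.
selected = x_test[:10]

codes_plain = encode(vae, selected)           # latent vectors from the plain VAE
recon_plain = decode(vae, codes_plain)        # reconstructions from those vectors

recon_dfc = encode_decode(vae_dfc, selected)  # same round trip in a single call
```
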
497 | 497 | "cell_type": "markdown", |
498 | 498 | "metadata": {}, |
499 | 499 | "source": [ |
500 | | - "On average, the original images have the highest Laplacian variance (highest focus or least blur) whereas the reconstructed images are more blurry. But the images reconstructed by the DFC VAE are significantly less blurry than those reconstructed by the plain VAE. The statistical significance of this difference can verified with a [t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) for paired samples (the same test images are used by both auto-encoders):" |
| 500 | + "On average, the original images have the highest Laplacian variance (highest focus or least blur) whereas the reconstructed images are more blurry. But the images reconstructed by the DFC VAE are significantly less blurry than those reconstructed by the plain VAE. The statistical significance of this difference can verified with a [t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) for paired samples (the same test images are used by both autoencoders):" |
501 | 501 | ] |
502 | 502 | }, |
503 | 503 | { |
|
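The blur comparison above can be sketched with OpenCV and SciPy: the variance of the Laplacian serves as the focus measure and `ttest_rel` performs the paired t-test. Here `recon_plain` and `recon_dfc` are hypothetical arrays holding both models' reconstructions of the same test images.

```python
import cv2
from scipy import stats

def laplacian_variance(images):
    """Variance of the Laplacian of each image (higher = sharper)."""
    return [cv2.Laplacian(img, cv2.CV_32F).var() for img in images]

lv_plain = laplacian_variance(recon_plain)
lv_dfc = laplacian_variance(recon_dfc)

# Paired t-test: both sets of reconstructions come from the same test images.
t_stat, p_value = stats.ttest_rel(lv_plain, lv_dfc)
print(f't = {t_stat:.3f}, p = {p_value:.3e}')
```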