Introduction to Deep Generative Models Herman Dong Music and Audio Computing Lab (MACLab), Research Center for Information Technology Innovation, Academia Sinica
MuseGAN Learn about our recent work on using GANs to compose pop songs at https://salu133445.github.io/musegan/ Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang and Yi-Hsuan Yang. 2017. MuseGAN: Symbolic-domain Music Generation and Accompaniment with Multi-track Sequential Generative Adversarial Networks. arXiv preprint arXiv:1709.06298.
Outline • Brief introduction to deep generative models • AE (Autoencoder) • VAE (Variational Autoencoder) • GAN (Generative Adversarial Network) • AAE (Adversarial Autoencoder) • VAE/GAN • ADA (Adversarial Domain Adaptation) • Reformulation • Graphical model representation • Connection to the Wake-sleep Algorithm
AE (Autoencoder) • minimize reconstruction loss • Pipeline: data X → encoder E → latent code E(X) → decoder D → reconstruction D(E(X)) • Loss: reconstruction loss between X and D(E(X))
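As a concrete toy illustration of the AE slide above (not part of the original deck), here is a minimal linear autoencoder in NumPy, trained by plain gradient descent on the reconstruction loss. All sizes, names, and the synthetic data are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 points in R^4 that lie exactly on a 2-D subspace,
# so a 2-D latent code can in principle reconstruct them perfectly.
basis = rng.normal(size=(2, 4))
X = rng.normal(size=(100, 2)) @ basis

# Linear encoder E and decoder D (weight matrices only, no biases).
W_enc = rng.normal(scale=0.1, size=(4, 2))
W_dec = rng.normal(scale=0.1, size=(2, 4))

def reconstruction_loss(X, W_enc, W_dec):
    X_hat = (X @ W_enc) @ W_dec          # D(E(X))
    return np.mean((X - X_hat) ** 2)

# Plain gradient descent on the reconstruction loss
# (constant factors are absorbed into the learning rate).
lr = 0.02
for _ in range(8000):
    Z = X @ W_enc                        # E(X), the latent code
    X_hat = Z @ W_dec                    # D(E(X))
    err = X_hat - X
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(reconstruction_loss(X, W_enc, W_dec))  # small after training
```

Because the data is exactly rank 2, the trained autoencoder's reconstruction loss approaches zero, illustrating the single objective on the slide.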
VAE (Variational Autoencoder) • minimize reconstruction loss • minimize the distance between the encoded latent distribution and a prior distribution • Pipeline: data X → encoder Q → latent distribution Q(X) with parameters μ, σ → sampled code z = μ + σ ∙ ε with ε ~ N(0, I) (the reparameterization trick) → decoder P → reconstruction P(z) • Losses: reconstruction loss, plus the KL divergence between Q(X) and the prior p(z) • Testing (generation): sample z ~ p(z) and decode with P
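The two VAE ingredients on this slide can be sketched in a few lines of NumPy (an illustration added for this transcript, not from the deck; function names are ours). The reparameterization trick moves the randomness into ε so gradients can flow through μ and σ, and the KL term below is the closed form for a diagonal Gaussian against a standard-normal prior.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, sigma, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    The randomness lives in eps, so mu and sigma stay differentiable --
    the reparameterization trick from the slide."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions:
    0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1)."""
    return 0.5 * np.sum(mu**2 + sigma**2 - 2.0 * np.log(sigma) - 1.0)

mu = np.array([0.5, -1.0])
sigma = np.array([1.0, 0.5])
z = reparameterize(mu, sigma, rng)
print(kl_to_standard_normal(mu, sigma))  # > 0; zero only when Q(X) = p(z)
```

Note the KL term vanishes exactly when the encoder outputs μ = 0, σ = 1, i.e. when Q(X) already matches the prior.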
GAN (Generative Adversarial Network) • minimize the distance between the distribution of real data and the distribution of generated samples • Pipeline: latent z ~ p(z) → generator G → sample G(z); discriminator D outputs 1/0 (real/fake) for real data X and generated samples G(z) • Adversarial training: • D: distinguish G(z) (fake) from X (real), i.e., maximize log(D(X)) + log(1 − D(G(z))) • G: make G(z) indistinguishable from real data for D, i.e., minimize log(1 − D(G(z))) • Testing (generation): sample z ~ p(z) and feed it through G
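The adversarial objectives on the slide can be written directly as loss functions (a sketch added for this transcript; function names are ours). The non-saturating variant is the generator loss commonly used in practice, as suggested in the Goodfellow et al. paper cited in the references.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss (to minimize): the negative of the slide's
    objective  log D(X) + log(1 - D(G(z))),  averaged over a batch."""
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def g_loss_minimax(d_fake):
    """Original minimax generator loss: minimize log(1 - D(G(z)))."""
    return np.mean(np.log(1.0 - d_fake))

def g_loss_nonsaturating(d_fake):
    """Non-saturating variant: maximize log D(G(z)), i.e. minimize
    -log D(G(z)); gives stronger gradients early in training."""
    return -np.mean(np.log(d_fake))

# At the theoretical equilibrium the discriminator outputs 0.5 everywhere.
half = np.full(4, 0.5)
print(d_loss(half, half))  # 2 * log 2, about 1.386
```

At equilibrium, D cannot tell real from fake, so its loss settles at 2·log 2 — a useful sanity check when monitoring GAN training.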
GAN vs VAE • GAN • Generator aims to fool the discriminator • Discriminator aims to distinguish generated data from real data • output images are sharper • higher diversity, lower training stability • VAE • Objective: reconstruct real data using a pixel-wise loss • output images are blurrier • lower diversity, higher training stability A. B. L. Larsen, S. K. Sønderby, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.
AAE (Adversarial Autoencoder) • minimize reconstruction loss • minimize the distance between the encoded latent distribution and the prior distribution • Same autoencoder pipeline as the VAE, but the KL divergence term is replaced by adversarial training in the latent space, with a discriminator D outputting 1/0 over codes • Adversarial training: • D: distinguish encoded codes Q(X) (fake) from prior samples z ~ p(z) (real) • Q: make Q(X) indistinguishable from prior samples for D
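To make the AAE's latent-space adversarial game concrete, here is a toy NumPy sketch (an addition for this transcript, with arbitrary shapes and names): the "encoder outputs" are simulated as a shifted Gaussian, and a linear discriminator is trained by logistic regression to separate them from prior samples; the encoder's regularization signal is then its adversarial loss against that discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Stand-ins for encoder codes Q(X) (here a shifted Gaussian, "fake" for D)
# and prior samples z ~ p(z) = N(0, I) ("real" for D).
q_codes = rng.normal(loc=2.0, size=(256, 2))
p_codes = rng.normal(loc=0.0, size=(256, 2))

# A linear discriminator on the latent space, trained with
# binary cross-entropy by gradient descent.
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(500):
    scores_real = sigmoid(p_codes @ w + b)
    scores_fake = sigmoid(q_codes @ w + b)
    # BCE gradients: real labeled 1 (grad s - 1), fake labeled 0 (grad s).
    gw = (p_codes.T @ (scores_real - 1) + q_codes.T @ scores_fake) / 512
    gb = (np.sum(scores_real - 1) + np.sum(scores_fake)) / 512
    w -= lr * gw
    b -= lr * gb

# The encoder's adversarial loss: make D score Q(X) as "real",
# which pulls the code distribution toward the prior.
encoder_adv_loss = -np.mean(np.log(sigmoid(q_codes @ w + b) + 1e-12))
print(encoder_adv_loss)
```

In a full AAE, minimizing `encoder_adv_loss` with respect to the encoder (alternating with discriminator updates and the reconstruction loss) replaces the VAE's analytic KL term.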
VAE/GAN • Combine a VAE and a GAN by sharing the VAE decoder and the GAN generator (P = G) • The discriminator D (1/0) sees real data X, reconstructions P(Q(X)), and samples G(z) with z ~ p(z) • Adversarial training: • D: distinguish reconstructions and samples (fake) from X (real) → 𝓛GAN • G: make reconstructions and samples indistinguishable from real data • The pixel-wise reconstruction loss is replaced by a reconstruction loss on D's hidden representation → 𝓛DIS • The VAE's KL divergence to the prior is kept → 𝓛prior • Total loss: 𝓛 = 𝓛prior + 𝓛DIS + 𝓛GAN
VAE/GAN • [Figure: qualitative results — a generation test and a reconstruction test comparing ground truth, VAE, VAE with 𝓛DIS, and VAE/GAN] A. B. L. Larsen, S. K. Sønderby, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.
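The key VAE/GAN ingredient, the hidden-representation reconstruction loss 𝓛DIS, can be sketched in NumPy (an illustration added for this transcript; the fixed random feature map stands in for the discriminator's learned hidden layer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the discriminator's hidden layer: a fixed random
# feature map f(x) = relu(W x). In VAE/GAN this layer is learned.
W = rng.normal(size=(16, 8))

def disc_features(x):
    return np.maximum(W @ x, 0.0)

def l_dis(x, x_recon):
    """Hidden-representation reconstruction loss: compare x and its
    VAE reconstruction inside the discriminator's feature space
    instead of pixel by pixel."""
    return np.mean((disc_features(x) - disc_features(x_recon)) ** 2)

x = rng.normal(size=8)
x_noisy = x + 0.1 * rng.normal(size=8)
print(l_dis(x, x), l_dis(x, x_noisy))  # zero for a perfect reconstruction
```

Measuring reconstruction error in a learned feature space rather than pixel space is what lets VAE/GAN avoid the blurriness of the plain pixel-wise VAE loss.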
What’s going on? • Write every model in the same encoder–decoder template: data X → encoder Q → code Q(X) (+ε) → decoder P → reconstruction P(Q(X)), with prior samples z ~ p(z) where needed • AE: reconstruction loss only • VAE: reconstruction loss + KL divergence between Q(X) and z ~ p(z) • AAE: reconstruction loss + adversarial divergence (a discriminator D, 1/0, on the latent space) • GAN: no encoder; adversarial loss between generated P(z), z ~ p(z), and real X (through D) • VAE/GAN: KL divergence + hidden-representation reconstruction loss + adversarial loss • Seen this way, the models differ only in which divergences they impose, and where
ADA (Adversarial Domain Adaptation) • Goal: given labeled data in a source domain, classify unlabeled data in a target domain • Pipeline: data Xsrc, Xtgt → feature extractor G → features G(Xsrc), G(Xtgt); a classifier C predicts class labels from G(Xsrc) • Without adaptation, C receives bad features for the target domain • Adversarial training: • D (1/0): distinguish G(Xtgt) as target domain from G(Xsrc) as source domain • G: make G(Xtgt) and G(Xsrc) indistinguishable for D • Testing (classification): feed target features G(Xtgt) to C
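The ADA game mirrors the GAN losses, only applied to features rather than images (a sketch added for this transcript; function names are ours):

```python
import numpy as np

def domain_d_loss(d_src, d_tgt):
    """Domain discriminator D: score source features as 1
    and target features as 0 (binary cross-entropy)."""
    return -np.mean(np.log(d_src)) - np.mean(np.log(1.0 - d_tgt))

def feature_adv_loss(d_tgt):
    """Feature extractor G: fool D into scoring target features as
    'source', which pushes G(Xtgt) toward the source feature
    distribution so the classifier C transfers."""
    return -np.mean(np.log(d_tgt))

# When D is maximally confused it outputs 0.5 for both domains.
half = np.full(8, 0.5)
print(domain_d_loss(half, half))  # 2 * log 2
```

In practice these two objectives are often implemented jointly via a gradient-reversal layer, but the alternating-loss form above is equivalent in spirit.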
On Unifying Deep Generative Models • Graphical-model view: latent z → data x → label y (for ADA: data → feature → domain label) • Gθ – generator with parameters θ: x = Gθ(z), z ~ p(z) • Dφ – discriminator with parameters φ • Solid line – generative process • Dashed line – inference process • Hollow arrow – deterministic transformation • Red arrow – adversarial mechanism • qφʳ(y|x) denotes the pair qφ(y|x) and qφ(1 − y|x) • GAN: y = 1 if x is real, y = 0 if x is fake • ADA: y = 1 if x is from the source domain, y = 0 if from the target domain • GAN (and ADA) in this view: generative distribution pθ(x|y) built from Gθ(z); discriminator distribution qφʳ(y|x)
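In this notation the GAN game can be written as a pair of maximizations (a sketch following the reformulation of Hu et al., cited in the references; the precise constants and conditioning are spelled out in that paper):

```latex
\max_{\phi}\; \mathcal{L}_{\phi}
  = \mathbb{E}_{p_{\theta}(x\mid y)\,p(y)}\big[\log q_{\phi}(y\mid x)\big],
\qquad
\max_{\theta}\; \mathcal{L}_{\theta}
  = \mathbb{E}_{p_{\theta}(x\mid y)\,p(y)}\big[\log q_{\phi}^{r}(y\mid x)\big].
```

The discriminator fits qφ(y|x) to the true real/fake indicator, while the generator maximizes the reversed distribution qφʳ, recovering the familiar adversarial game as two maximum-likelihood-style objectives.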
On Unifying Deep Generative Models • VAE in the same view: latent z and indicator y → data x • Generative distribution: pθ(x|z, y) = pθ(x|z) if y = 0, and p_data(x) if y = 1 • Inference distribution: qη(z|x, y) • Discriminator: the VAE implicitly uses a perfect, degenerated adversarial discriminator q*ʳ(y|x) = 1 if x is real, 0 if x is fake • Activating this adversary — learning the discriminator instead of fixing it — yields the adversary-activated VAE (AAVAE)
On Unifying Deep Generative Models • GAN: x = Gθ(z) with z ~ p(z); generative pθ(x|y), discriminator qφʳ(y|x); the latent code space is degenerated (no inference over z) • InfoGAN: a GAN plus an inference network qη(z|x, y) that recovers the latent code from the data • AAE: obtained from the GAN view by swapping the roles of x and z — the generator becomes the encoder, z = Gθ(x) with x ~ p(x), and the discriminator qφʳ(y|z) operates on the latent space
GAN vs VAE • Side by side, the GAN (pθ(x|y), qφʳ(y|x), x = Gθ(z)) and the VAE (pθ(x|z, y), qη(z|x, y), perfect discriminator q*ʳ(y|x)) have mirrored generative and inference arrows InfoGAN vs VAE • Adding the inference network qη(z|x, y) to the GAN (i.e., InfoGAN) makes this symmetry with the VAE explicit InfoGAN vs AAE • InfoGAN and AAE are related by exchanging x and z: InfoGAN infers codes from data via qη(z|x, y), while AAE encodes data into codes (z = Gθ(x)) and runs the adversarial game in the latent space
Wake-sleep Algorithm • h – general latent variables • λ – general parameters • θ – generator parameters • Wake phase: update θ by fitting pθ(x|h) to data x and latents h inferred by qλ(h|x): maxθ E_{qλ(h|x) p_data(x)} [log pθ(x|h)] • Sleep phase: update λ on samples generated ("dreamed") from the model: maxλ E_{pθ(x|h) p(h)} [log qλ(h|x)] • VAE: h → z, λ → η • GAN: h → y, λ → φ
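A tiny runnable illustration of the two phases (an addition for this transcript; the linear-Gaussian model and all names are our assumptions, chosen so each phase's maximization has a closed form): p(h) = N(0, 1), pθ(x|h) = N(θh, 1), and a recognition model qλ(h|x) = N(a·x, s²) with λ = (a, s).

```python
import numpy as np

rng = np.random.default_rng(0)

# Real data generated from the true model with theta_true = 2.
theta_true = 2.0
n = 4000
X = theta_true * rng.standard_normal(n) + rng.standard_normal(n)

theta = 0.5        # generator parameter, to be learned
a, s = 0.3, 1.0    # recognition parameters (a must start away from 0,
                   # otherwise the inferred h carries no signal)

for _ in range(50):
    # Wake phase: infer h ~ q_lambda(h|x) for real data, then set theta
    # to the maximizer of E[log p_theta(x|h)] (closed form here).
    h = a * X + s * rng.standard_normal(n)
    theta = np.mean(X * h) / np.mean(h * h)

    # Sleep phase: dream (h, x) from the generative model, then set
    # lambda to the maximizer of E[log q_lambda(h|x)] -- for a
    # linear-Gaussian q this is just least squares on the dreams.
    h_dream = rng.standard_normal(n)
    x_dream = theta * h_dream + rng.standard_normal(n)
    a = np.mean(x_dream * h_dream) / np.mean(x_dream * x_dream)
    s = np.sqrt(np.mean((h_dream - a * x_dream) ** 2))

print(theta)  # approaches theta_true
```

The wake phase trains the generator on data with latents filled in by the recognition model; the sleep phase trains the recognition model on the generator's dreams — the same division of labor the slide maps onto the VAE (h → z, λ → η) and the GAN (h → y, λ → φ).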
References • D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114, 2014 • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Nets. arXiv preprint arXiv:1406.2661, 2014 • A. B. L. Larsen, S. K. Sønderby, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015 • A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial Autoencoders. arXiv preprint arXiv:1511.05644, 2016 • Z. Hu, Z. Yang, R. Salakhutdinov, and E. P. Xing. On Unifying Deep Generative Models. arXiv preprint arXiv:1706.00550, 2017
