"Can neural networks really compress faces efficiently, without losing identity?"
In this post, I explore this question by building and comparing two popular generative compression architectures: Variational Autoencoder (VAE) and Vector Quantized VAE (VQ-VAE) — trained on passport-style human face images.
🔗 [GitHub Repository](https://github.com/Ertugrulmutlu/VQVAE-and-VAE)
📂 Dataset Source (Kaggle)
📦 Why Autoencoders for Image Compression?
Autoencoders learn to reconstruct input data from a compact representation (latent space). This enables lossy compression by:
- Removing irrelevant pixel-level noise
- Learning semantic structure (e.g., eyes, nose, face contour)
- Outputting reconstructions that are visually close to the original but much smaller in size (see the sketch below this list)
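To make this concrete, here is a minimal PyTorch sketch of a plain convolutional autoencoder. The layer sizes and latent dimension are illustrative assumptions, not the exact architecture from the repository:

```python
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    # Illustrative sizes: 3x64x64 image -> 128-dim latent -> reconstruction
    def __init__(self, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),        # the compact bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)       # compressed representation
        return self.decoder(z)    # reconstruction
```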
But not all autoencoders are created equal. Let’s break down how VAE and VQ-VAE differ — and which one works best for face images.
🔧 Project Setup
- Dataset: 3000+ frontal face images from Kaggle (balanced by lighting, expression, and gender)
- All images resized to 64×64 or 128×128
- Trained on CPU with PyTorch
- Output format: JPEG (quality=85)
```bash
# Install dependencies
pip install -r requirements.txt
```
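For reference, the preprocessing described above can be reproduced with a few lines of torchvision; the `data/faces` path is a placeholder, and `save_jpeg` is a hypothetical helper matching the JPEG quality=85 output setting:

```python
from PIL import Image
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder

# Resize faces to 64x64 (or 128x128) and convert to tensors in [0, 1]
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

dataset = ImageFolder("data/faces", transform=transform)  # placeholder path
loader = DataLoader(dataset, batch_size=64, shuffle=True)

def save_jpeg(tensor, path, quality=85):
    # Hypothetical helper: save a CHW tensor in [0, 1] as a JPEG at quality 85
    array = (tensor.clamp(0, 1) * 255).byte().permute(1, 2, 0).numpy()
    Image.fromarray(array).save(path, quality=quality)
```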
🧠 Architecture 1: Variational Autoencoder (VAE)
VAE is a probabilistic generative model that learns a continuous latent space:
- Encoder outputs the mean (μ) and log variance (logσ²)
- Latent vector sampled via the reparameterization trick: z = μ + σ · ε, where ε ~ N(0, 1)
- Decoder reconstructs the image from z
```python
mu = fc_mu(encoder(x))            # mean of the latent distribution
logvar = fc_logvar(encoder(x))    # log variance of the latent distribution
z = reparameterize(mu, logvar)    # differentiable sampling
x_hat = decoder(z)                # reconstruction
```
Loss = MSE reconstruction + KL divergence (pushing the latent distribution toward the unit Gaussian prior)
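A minimal sketch of the sampling step and this loss, assuming the standard closed-form KL term against a unit Gaussian prior (the `beta` weight is an illustrative knob, not a value taken from the repository):

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, 1), so sampling stays differentiable
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

def vae_loss(x_hat, x, mu, logvar, beta=1.0):
    # MSE reconstruction + closed-form KL( N(mu, sigma^2) || N(0, 1) )
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```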
✅ Pros:
- Smooth latent space, good for interpolation
- Easy to implement
❌ Cons:
- Blurry outputs due to probabilistic sampling
- Gaussian prior limits representation precision
📸 Sample Result (64×64, 50 epochs)
- 🖼️ Original: 93.71 KB
- 🔁 Reconstructed: 1.62 KB
- 📉 Compression rate: 57.84×
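The compression rate is the ratio of the two on-disk file sizes, which you can check yourself (the file names here are placeholders):

```python
import os

original_kb = os.path.getsize("original.jpg") / 1024            # e.g. 93.71 KB
reconstructed_kb = os.path.getsize("reconstructed.jpg") / 1024  # e.g. 1.62 KB
print(f"Compression rate: {original_kb / reconstructed_kb:.2f}x")  # 93.71 / 1.62 ≈ 57.84x
```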
🧠 Architecture 2: Vector Quantized VAE (VQ-VAE)
VQ-VAE replaces the continuous latent space with discrete codebook vectors:
- Encoder outputs feature map → quantized to nearest embedding
- Decoder reconstructs image from quantized features
```python
z = encoder(x)                            # continuous feature map
quantized, vq_loss = vector_quantizer(z)  # snap to nearest codebook vectors
x_hat = decoder(quantized)                # reconstruction
```
Loss = MSE reconstruction + VQ loss (codebook + commitment terms)
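Here is a minimal sketch of the quantization step with the usual straight-through gradient estimator. The commitment cost of 0.25 follows the original VQ-VAE paper, and the codebook size of 512 is an illustrative choice; neither is necessarily what this project uses:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    # Snap each encoder feature vector to its nearest codebook embedding
    def __init__(self, num_embeddings=512, embedding_dim=64, commitment_cost=0.25):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        self.embedding.weight.data.uniform_(-1 / num_embeddings, 1 / num_embeddings)
        self.commitment_cost = commitment_cost

    def forward(self, z):
        # z: (B, C, H, W) -> rows of feature vectors, one per spatial position
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)

        # Squared L2 distance to every codebook vector, then nearest index
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2 * flat @ self.embedding.weight.t()
                + self.embedding.weight.pow(2).sum(1))
        indices = dist.argmin(1)
        quantized = self.embedding(indices).view(b, h, w, c).permute(0, 3, 1, 2)

        # Codebook loss pulls embeddings toward encoder outputs;
        # commitment loss pulls encoder outputs toward the codebook
        vq_loss = (F.mse_loss(quantized, z.detach())
                   + self.commitment_cost * F.mse_loss(quantized.detach(), z))

        # Straight-through estimator: copy decoder gradients back to the encoder
        quantized = z + (quantized - z).detach()
        return quantized, vq_loss
```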
✅ Pros:
- Sharper and more detailed reconstructions
- Discrete representations better for downstream tasks
❌ Cons:
- Slightly harder to train
- Requires codebook tuning (size, commitment cost)
📸 Sample Result (128×128, 50 epochs)
- 🖼️ Original: 93.71 KB
- 🔁 Reconstructed: 3.66 KB
- 📉 Compression rate: 25.58×
⚙️ Why These Architectures?
I chose VAE and VQ-VAE because they represent two fundamentally different approaches to learning compressed representations:
| | VAE | VQ-VAE |
|---|---|---|
| Latent space | Continuous (Gaussian) | Discrete (codebook) |
| Output style | Smooth, blurry | Crisp, pixel-accurate |
| Use case | Interpolation, generation | Compression, deployment |
In practice, the difference was immediately visible: VQ-VAE produced sharper eyes, better skin texture, and preserved the facial layout more accurately.
📊 Comparison Results
| Model | Resolution | Epochs | Output Size | Compression Rate | Visual Quality |
|---|---|---|---|---|---|
| VAE | 64×64 | 20 | 1.54 KB | 60.85× | ⭐⭐☆☆☆ |
| VAE | 64×64 | 50 | 1.62 KB | 57.84× | ⭐⭐⭐☆☆ |
| VQ-VAE | 64×64 | 20 | 1.62 KB | 57.98× | ⭐⭐⭐⭐☆ |
| VQ-VAE | 128×128 | 50 | 3.66 KB | 25.58× | ⭐⭐⭐⭐⭐ |
🖼️ Visual Comparison
- VQ-VAE 128×128 – 50 epochs
- VQ-VAE 64×64 – 20 epochs
- VAE 64×64 – 20 epochs
- VAE 64×64 – 50 epochs
📉 Loss Curves & Insights
VAE Training Loss
- Converges smoothly after ~35 epochs
- Most gain occurs early (first 20 epochs)
VQ-VAE Training Losses
- Breakdown: total, reconstruction, and VQ commitment loss
- VQ loss stabilizes quickly while reconstruction improves more gradually (see the plotting sketch below)
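These curves are straightforward to regenerate with matplotlib, assuming you append per-epoch loss values to lists during training (the function and argument names below are placeholders):

```python
import matplotlib.pyplot as plt

def plot_losses(total, recon, vq, out_path="vqvae_losses.png"):
    # Plot per-epoch total, reconstruction, and VQ commitment losses
    epochs = range(1, len(total) + 1)
    plt.plot(epochs, total, label="total")
    plt.plot(epochs, recon, label="reconstruction")
    plt.plot(epochs, vq, label="VQ commitment")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig(out_path)
```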
🧠 Takeaways
- VAE is easier to train and interpret but suffers from blur due to probabilistic sampling
- VQ-VAE captures high-frequency structure better and preserves identity at higher compression
- At 64×64, both models compress extremely well, but VQ-VAE outperforms visually
- At 128×128, VQ-VAE dominates in realism and perceptual clarity
💻 Run the Code Yourself
```bash
git clone https://github.com/Ertugrulmutlu/VQVAE-and-VAE
cd VQVAE-and-VAE
pip install -r requirements.txt
python main.py
```
If you found this comparison helpful or insightful, consider ⭐ starring the GitHub repository — and feel free to reach out with feedback or questions!
— Ertuğrul Mutlu ([GitHub](https://github.com/Ertugrulmutlu))