Variational Autoencoders Tutorial
Introduction
Variational Autoencoders (VAEs) are a class of generative models that learn to represent high-dimensional data in a lower-dimensional latent space. They combine principles from Bayesian inference and neural networks, making them powerful tools for unsupervised learning tasks.
What is an Autoencoder?
An autoencoder is a type of neural network that is trained to reconstruct its input. It consists of two main parts: the encoder, which compresses the input into a latent representation, and the decoder, which reconstructs the input from this representation.
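To make this concrete, here is a minimal sketch of a plain (non-variational) autoencoder in Keras for flattened 28x28 images; the layer sizes here are illustrative choices and are not part of the VAE we build later:
from tensorflow import keras
from tensorflow.keras import layers

# Encoder: compress a flattened 28x28 image into a small code vector
inputs = layers.Input(shape=(784,))
code = layers.Dense(32, activation='relu')(inputs)
# Decoder: reconstruct the 784 pixel values from the code
outputs = layers.Dense(784, activation='sigmoid')(code)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# An autoencoder is trained with the input as its own target,
# e.g. autoencoder.fit(x, x, epochs=10, batch_size=128)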
Variational Autoencoders Explained
VAEs extend traditional autoencoders by introducing a probabilistic twist. Instead of directly mapping an input to a deterministic latent representation, a VAE encodes the input as a distribution (usually Gaussian). This allows for more robust and diverse generation of data.
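Concretely, for an input x the encoder outputs the mean μ(x) and log variance of a diagonal Gaussian, so the encoded distribution is:
Q(z|x) = N(z; μ(x), σ²(x) · I)
A latent vector z is sampled from this distribution and passed to the decoder, which reconstructs x from it.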
Mathematical Foundation
The main idea behind VAEs is to maximize the evidence lower bound (ELBO) on the log likelihood of the data. This involves two terms:
- The reconstruction loss, which measures how well the decoder can reconstruct the input.
- The Kullback-Leibler divergence, which measures how closely the learned latent distribution approximates a prior distribution (usually a standard normal distribution).
The loss function for VAEs (the negative ELBO, which we minimize) can be expressed as:
L(x) = -E_{Q(z|x)}[log P(x|z)] + KL(Q(z|x) || P(z))
Where:
- x is the input data.
- z is the latent variable.
- P(z) is the prior distribution over z (usually a standard normal).
- P(x|z) is the likelihood of reconstructing x from z, modeled by the decoder.
- Q(z|x) is the variational distribution produced by the encoder.
Building a Variational Autoencoder
In this section, we will implement a simple VAE using Python and TensorFlow/Keras. We will use the MNIST dataset for demonstration purposes.
Install the necessary libraries:
pip install tensorflow numpy matplotlib
Import libraries and load the dataset:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST, scale pixel values to [0, 1], and add a channel dimension
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
Defining the VAE Model
Next, we will define the encoder and decoder networks:
Define the encoder:
latent_dim = 2  # dimensionality of the latent space

encoder_inputs = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation='relu', padding='same')(encoder_inputs)
x = layers.MaxPooling2D()(x)  # 28x28 -> 14x14
x = layers.Flatten()(x)
x = layers.Dense(16, activation='relu')(x)
z_mean = layers.Dense(latent_dim)(x)       # mean of Q(z|x)
z_log_var = layers.Dense(latent_dim)(x)    # log variance of Q(z|x)
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var], name='encoder')
Sampling Layer
We need a sampling function that draws a latent vector z from the distribution defined by the mean and log variance. It uses the reparameterization trick, z = mean + exp(0.5 * log_var) * epsilon with epsilon drawn from a standard normal, so that gradients can flow through the sampling step:
Define the sampling function:
def sampling(args):
    # Reparameterization trick: z = mean + std * epsilon, with epsilon ~ N(0, I)
    z_mean, z_log_var = args
    batch = keras.backend.shape(z_mean)[0]
    dim = keras.backend.int_shape(z_mean)[1]
    epsilon = keras.backend.random_normal(shape=(batch, dim))
    return z_mean + keras.backend.exp(0.5 * z_log_var) * epsilon
Defining the Decoder
Now we will define the decoder network:
Define the decoder:
latent_inputs = layers.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 32, activation='relu')(latent_inputs)
x = layers.Reshape((7, 7, 32))(x)
x = layers.Conv2DTranspose(32, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D()(x)  # 7x7 -> 14x14
x = layers.Conv2DTranspose(32, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D()(x)  # 14x14 -> 28x28
decoder_outputs = layers.Conv2DTranspose(1, 3, activation='sigmoid', padding='same')(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name='decoder')
Building the VAE Model
We can now build the full VAE model by combining the encoder, decoder, and sampling layer:
Define the VAE:
# Wire the encoder, the sampling step, and the decoder into one model
z_mean, z_log_var = encoder(encoder_inputs)
z = layers.Lambda(sampling)([z_mean, z_log_var])
vae_outputs = decoder(z)
vae = keras.Model(encoder_inputs, vae_outputs)
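At this point it is worth sanity-checking the architecture; Keras models expose a summary method for this:
# Print a layer-by-layer overview of each model
encoder.summary()
decoder.summary()
vae.summary()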
Training the VAE
To train the VAE, we need to define its loss, the reconstruction term plus the KL divergence, and compile the model. Because the loss depends on z_mean and z_log_var produced inside the model, we attach it with add_loss rather than passing a loss function to compile:
Compile and train the model:
# Reconstruction loss: binary cross-entropy summed over all 28 * 28 pixels
xent_loss = 28 * 28 * keras.backend.mean(
    keras.backend.binary_crossentropy(keras.backend.flatten(encoder_inputs),
                                      keras.backend.flatten(vae_outputs)))
# KL divergence between Q(z|x) and the standard normal prior P(z)
kl_loss = -0.5 * keras.backend.sum(
    1 + z_log_var - keras.backend.square(z_mean) - keras.backend.exp(z_log_var), axis=-1)
vae.add_loss(keras.backend.mean(xent_loss + kl_loss))

vae.compile(optimizer='adam')
vae.fit(x_train, epochs=30, batch_size=128, validation_data=(x_test, None))
Generating New Data
After training the VAE, we can generate new samples from the latent space:
Generate new samples:
import matplotlib.pyplot as plt
# Sample random points in the latent space
z_samples = np.random.normal(size=(10, latent_dim))
generated_images = decoder.predict(z_samples)
# Display generated images
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(generated_images[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
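As a quick check alongside sampling, you can also pass a few test digits through the trained VAE and compare them with their reconstructions; this is a minimal sketch using the vae model and x_test loaded earlier:
# Reconstruct the first 10 test digits
reconstructions = vae.predict(x_test[:10])
for i in range(10):
    # Top row: original digits; bottom row: their reconstructions
    plt.subplot(2, 10, i + 1)
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    plt.subplot(2, 10, i + 11)
    plt.imshow(reconstructions[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()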
Conclusion
Variational Autoencoders are a powerful tool for generating new data and learning complex distributions. By leveraging probabilistic modeling, VAEs enable robust inference and generation capabilities, making them suitable for various applications in generative modeling.