Variational Autoencoders Tutorial
Introduction
Variational Autoencoders (VAEs) are a class of generative models that learn to represent high-dimensional data in a lower-dimensional latent space. They combine principles from Bayesian inference and neural networks, making them powerful tools for unsupervised learning tasks.
What is an Autoencoder?
An autoencoder is a type of neural network that is trained to reconstruct its input. It consists of two main parts: the encoder, which compresses the input into a latent representation, and the decoder, which reconstructs the input from this representation.
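To make this concrete, here is a minimal sketch of a plain (non-variational) autoencoder in Keras for flattened 28x28 images; the layer sizes here are illustrative choices and are not part of the VAE we build later:
from tensorflow import keras
from tensorflow.keras import layers

# Encoder: compress a flattened 28x28 image into a small code vector
inputs = layers.Input(shape=(784,))
code = layers.Dense(32, activation='relu')(inputs)
# Decoder: reconstruct the 784 pixel values from the code
outputs = layers.Dense(784, activation='sigmoid')(code)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# An autoencoder is trained with the input as its own target,
# e.g. autoencoder.fit(x, x, epochs=10, batch_size=128)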
Variational Autoencoders Explained
VAEs extend traditional autoencoders by introducing a probabilistic twist. Instead of directly mapping an input to a deterministic latent representation, a VAE encodes the input as a distribution (usually Gaussian). This allows for more robust and diverse generation of data.
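Concretely, for an input x the encoder outputs the mean μ(x) and log variance of a diagonal Gaussian, so the encoded distribution is:
Q(z|x) = N(z; μ(x), σ²(x) · I)
A latent vector z is sampled from this distribution and passed to the decoder, which reconstructs x from it.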
Mathematical Foundation
The main idea behind VAEs is to maximize the evidence lower bound (ELBO) on the log likelihood of the data. This involves two terms:
- The reconstruction loss, which measures how well the decoder can reconstruct the input.
- The Kullback-Leibler divergence, which measures how closely the learned latent distribution approximates a prior distribution (usually a standard normal distribution).
The loss function for VAEs (the negative ELBO, which we minimize) can be expressed as:
L(x) = -E_{Q(z|x)}[log P(x|z)] + KL(Q(z|x) || P(z))
Where:
- x is the input data.
- z is the latent variable.
- P(z) is the prior distribution over z (usually a standard normal).
- P(x|z) is the likelihood of reconstructing x from z, modeled by the decoder.
- Q(z|x) is the variational distribution produced by the encoder.
Building a Variational Autoencoder
In this section, we will implement a simple VAE using Python and TensorFlow/Keras. We will use the MNIST dataset for demonstration purposes.
Install the necessary libraries:
pip install tensorflow numpy matplotlib
Import libraries and load the dataset:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST, scale pixel values to [0, 1], and add a channel dimension
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
Defining the VAE Model
Next, we will define the encoder and decoder networks:
Define the encoder:
latent_dim = 2  # dimensionality of the latent space

encoder_inputs = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation='relu', padding='same')(encoder_inputs)
x = layers.MaxPooling2D()(x)  # 28x28 -> 14x14
x = layers.Flatten()(x)
x = layers.Dense(16, activation='relu')(x)
z_mean = layers.Dense(latent_dim)(x)       # mean of Q(z|x)
z_log_var = layers.Dense(latent_dim)(x)    # log variance of Q(z|x)
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var], name='encoder')
Sampling Layer
We need a sampling function that draws a latent vector z from the distribution defined by the mean and log variance. It uses the reparameterization trick, z = mean + exp(0.5 * log_var) * epsilon with epsilon drawn from a standard normal, so that gradients can flow through the sampling step:
Define the sampling function:
def sampling(args):
    # Reparameterization trick: z = mean + std * epsilon, with epsilon ~ N(0, I)
    z_mean, z_log_var = args
    batch = keras.backend.shape(z_mean)[0]
    dim = keras.backend.int_shape(z_mean)[1]
    epsilon = keras.backend.random_normal(shape=(batch, dim))
    return z_mean + keras.backend.exp(0.5 * z_log_var) * epsilon
Defining the Decoder
Now we will define the decoder network:
Define the decoder:
latent_inputs = layers.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 32, activation='relu')(latent_inputs)
x = layers.Reshape((7, 7, 32))(x)
x = layers.Conv2DTranspose(32, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D()(x)  # 7x7 -> 14x14
x = layers.Conv2DTranspose(32, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D()(x)  # 14x14 -> 28x28
decoder_outputs = layers.Conv2DTranspose(1, 3, activation='sigmoid', padding='same')(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name='decoder')
Building the VAE Model
We can now build the full VAE model by combining the encoder, decoder, and sampling layer:
Define the VAE:
# Wire the encoder, the sampling step, and the decoder into one model
z_mean, z_log_var = encoder(encoder_inputs)
z = layers.Lambda(sampling)([z_mean, z_log_var])
vae_outputs = decoder(z)
vae = keras.Model(encoder_inputs, vae_outputs)
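At this point it is worth sanity-checking the architecture; Keras models expose a summary method for this:
# Print a layer-by-layer overview of each model
encoder.summary()
decoder.summary()
vae.summary()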
Training the VAE
To train the VAE, we need to define its loss, the reconstruction term plus the KL divergence, and compile the model. Because the loss depends on z_mean and z_log_var produced inside the model, we attach it with add_loss rather than passing a loss function to compile:
Compile and train the model:
# Reconstruction loss: binary cross-entropy summed over all 28 * 28 pixels
xent_loss = 28 * 28 * keras.backend.mean(
    keras.backend.binary_crossentropy(keras.backend.flatten(encoder_inputs),
                                      keras.backend.flatten(vae_outputs)))
# KL divergence between Q(z|x) and the standard normal prior P(z)
kl_loss = -0.5 * keras.backend.sum(
    1 + z_log_var - keras.backend.square(z_mean) - keras.backend.exp(z_log_var), axis=-1)
vae.add_loss(keras.backend.mean(xent_loss + kl_loss))

vae.compile(optimizer='adam')
vae.fit(x_train, epochs=30, batch_size=128, validation_data=(x_test, None))
Generating New Data
After training the VAE, we can generate new samples from the latent space:
Generate new samples:
import matplotlib.pyplot as plt
# Sample random points in the latent space
z_samples = np.random.normal(size=(10, latent_dim))
generated_images = decoder.predict(z_samples)
# Display generated images
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(generated_images[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
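As a quick check alongside sampling, you can also pass a few test digits through the trained VAE and compare them with their reconstructions; this is a minimal sketch using the vae model and x_test loaded earlier:
# Reconstruct the first 10 test digits
reconstructions = vae.predict(x_test[:10])
for i in range(10):
    # Top row: original digits; bottom row: their reconstructions
    plt.subplot(2, 10, i + 1)
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    plt.subplot(2, 10, i + 11)
    plt.imshow(reconstructions[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()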
Conclusion
Variational Autoencoders are a powerful tool for generating new data and learning complex distributions. By leveraging probabilistic modeling, VAEs enable robust inference and generation capabilities, making them suitable for various applications in generative modeling.