
Recurrent Neural Networks (RNN) Tutorial

Introduction to Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows them to exhibit temporal dynamic behavior. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs, making them suitable for tasks such as speech recognition, language modeling, and time series prediction.

How RNNs Work

In a traditional neural network, it is assumed that all inputs (and outputs) are independent of each other. However, for many tasks, this assumption is not valid. RNNs address this by having loops in the network, allowing information to persist.

Consider an RNN that processes a sequence of words to predict the next word in a sentence. At each time step, the network takes an input and updates its hidden state. The hidden state is passed to the next time step and used along with the new input to produce the output.

time step t:      x_t --------> [hidden state h_t] --------> output y_t
                                        |
                                        v
time step t+1:    x_{t+1} ----> [hidden state h_{t+1}] ----> output y_{t+1}

Mathematical Representation

The RNN cell at each time step can be described mathematically as follows:

Given an input sequence x = (x_1, x_2, ..., x_T), the hidden state h_t and the output y_t at time step t are computed using the following equations:

h_t = f(W_xh * x_t + W_hh * h_{t-1} + b_h)
y_t = W_hy * h_t + b_y

Here:

  • W_xh is the input-to-hidden weight matrix.
  • W_hh is the hidden-to-hidden (recurrent) weight matrix.
  • W_hy is the hidden-to-output weight matrix.
  • b_h and b_y are the bias vectors.
  • f is the activation function, typically tanh or ReLU.
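
To make these equations concrete, here is a minimal forward-pass sketch in plain NumPy. The dimensions, the random initialization, and the choice of tanh for f are illustrative assumptions, not part of any particular model:

import numpy as np

# Illustrative dimensions (assumptions for this sketch)
input_dim, hidden_dim, output_dim = 3, 5, 2
T = 4  # sequence length

# Randomly initialized parameters
W_xh = np.random.randn(hidden_dim, input_dim) * 0.1
W_hh = np.random.randn(hidden_dim, hidden_dim) * 0.1
W_hy = np.random.randn(output_dim, hidden_dim) * 0.1
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

x = np.random.randn(T, input_dim)   # input sequence (x_1, ..., x_T)
h = np.zeros(hidden_dim)            # initial hidden state h_0

for t in range(T):
    # h_t = f(W_xh * x_t + W_hh * h_{t-1} + b_h), with f = tanh
    h = np.tanh(W_xh @ x[t] + W_hh @ h + b_h)
    # y_t = W_hy * h_t + b_y
    y = W_hy @ h + b_y
    print(f"t={t}: y_t = {y}")

Note how the same weight matrices are reused at every time step; only the hidden state h carries information forward.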

Types of RNNs

There are several types of RNNs, each designed to handle different kinds of sequence modeling tasks:

  • One-to-One: a standard neural network with a single input and a single output.
  • One-to-Many: one input, a sequence of outputs (e.g., image captioning).
  • Many-to-One: a sequence of inputs, a single output (e.g., sentiment analysis).
  • Many-to-Many: a sequence of inputs and a sequence of outputs (e.g., machine translation or video classification), as sketched below.

One-to-Many:  Image ----> [RNN] ----> Sentence
Many-to-One:  Sentence ----> [RNN] ----> Sentiment
Many-to-Many: Sentence ----> [RNN] ----> Translation
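In Keras, the practical difference between many-to-one and many-to-many comes down to the return_sequences flag on the recurrent layer. A minimal sketch (the layer sizes here are illustrative choices):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, TimeDistributed

# Many-to-one: the RNN returns only its final hidden state
many_to_one = Sequential([
    SimpleRNN(32, input_shape=(None, 1)),   # output shape: (batch, 32)
    Dense(1)                                # one prediction per sequence
])

# Many-to-many: return_sequences=True emits a hidden state per time step
many_to_many = Sequential([
    SimpleRNN(32, return_sequences=True, input_shape=(None, 1)),  # (batch, T, 32)
    TimeDistributed(Dense(1))               # one prediction per time step
])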

Training RNNs

Training RNNs uses backpropagation through time (BPTT): the network is unrolled across the time steps of the input sequence, and the standard backpropagation algorithm is applied to the unrolled graph to update the shared weights.

However, RNNs can suffer from the vanishing and exploding gradient problems, where gradients shrink or grow exponentially as they are propagated back through many time steps, making it difficult to learn long-range dependencies. Advanced architectures such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) were introduced to mitigate these issues.
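
A common mitigation for exploding gradients is gradient clipping, which Keras optimizers expose directly, and swapping SimpleRNN for LSTM or GRU is equally mechanical. A brief sketch (the clipnorm value of 1.0 is an illustrative choice, not a recommendation):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(50, input_shape=(None, 1)),   # drop-in replacement for SimpleRNN
    Dense(1)
])

# clipnorm rescales any gradient whose L2 norm exceeds 1.0,
# guarding against exploding gradients during BPTT
model.compile(optimizer=tf.keras.optimizers.Adam(clipnorm=1.0), loss='mse')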

Implementing RNNs with Python

Let's implement a simple RNN using Python and the TensorFlow library:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Define the RNN model
model = Sequential()
model.add(SimpleRNN(50, input_shape=(None, 1)))  # 50 units; input: (timesteps, 1 feature)
model.add(Dense(1))                              # single regression output

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Print the model summary
model.summary()
                

In this example, we create a simple RNN with 50 units and a dense output layer. The model is compiled using the Adam optimizer and mean squared error loss function.
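
Because the timestep dimension of input_shape is None, the same model accepts sequences of any length. A quick check (the zero-filled batches here are just hypothetical placeholders):

import numpy as np

short_batch = np.zeros((2, 5, 1))    # 2 sequences of 5 steps
long_batch = np.zeros((2, 50, 1))    # 2 sequences of 50 steps
print(model.predict(short_batch).shape)  # (2, 1)
print(model.predict(long_batch).shape)   # (2, 1)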

Example: Predicting Sine Wave

Let's use our RNN model to predict a sine wave:

import numpy as np
import matplotlib.pyplot as plt

# Generate sine wave data
x = np.linspace(0, 50, 500)
y = np.sin(x)

# Prepare the data for the RNN: sliding windows of 10 samples,
# each labeled with the value that immediately follows the window
X = []
Y = []
for i in range(len(y) - 10):
    X.append(y[i:i+10])
    Y.append(y[i+10])
X = np.array(X).reshape(-1, 10, 1)  # (samples, timesteps, features)
Y = np.array(Y)

# Train the model
model.fit(X, Y, epochs=100, verbose=1)

# Make predictions
predictions = model.predict(X)

# Plot the results (the first prediction aligns with index 10,
# the end of the first input window)
plt.plot(y, label='True')
plt.plot(np.arange(10, 500), predictions.flatten(), label='Predicted')
plt.legend()
plt.show()
                

This example demonstrates how to generate sine wave data, prepare it for the RNN, train the model, and make predictions. The results are plotted to visualize the true and predicted values.
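
Since the model only maps a 10-step window to the next value, forecasting beyond the training data means feeding each prediction back in as input. A minimal autoregressive sketch under that assumption, reusing the model and data from the example above (the horizon of 50 steps is arbitrary):

# Autoregressive forecasting: feed each prediction back as input
window = list(y[-10:])               # start from the last known window
forecast = []
for _ in range(50):                  # predict 50 steps past the data
    x_in = np.array(window[-10:]).reshape(1, 10, 1)
    next_val = model.predict(x_in, verbose=0)[0, 0]
    forecast.append(next_val)
    window.append(next_val)

plt.plot(np.arange(500, 550), forecast, label='Forecast')
plt.legend()
plt.show()

Expect the forecast to degrade as prediction errors compound over successive steps; this is a known limitation of single-step autoregressive models.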

Conclusion

Recurrent Neural Networks (RNNs) are powerful tools for sequence modeling tasks. While they can be challenging to train, advanced architectures like LSTMs and GRUs have made it easier to handle long-term dependencies. With frameworks like TensorFlow, implementing and experimenting with RNNs has become more accessible.