🔄 Recurrent Neural Networks

Deep learning for sequential data

What are RNNs?

Recurrent Neural Networks (RNNs) are designed for sequential data where order matters. They maintain a memory of previous inputs, which makes them well suited to time series, text, speech, and video.

Key Concepts:

  • Hidden State: Memory that carries information across time steps
  • Sequential Processing: Process one element at a time
  • Parameter Sharing: Same weights used at each time step
  • LSTM/GRU: Gated RNN variants that mitigate the vanishing gradient problem

🧠 Basic RNN Architecture

At each time step, the RNN combines the current input with the previous hidden state to produce an output and a new hidden state:

h_t = tanh(W_hh × h_{t-1} + W_xh × x_t + b)
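
As a minimal sketch, the same recurrence can be written directly in NumPy with randomly initialized weights, purely to make the shapes and the time loop explicit (the sizes below are illustrative assumptions):

# Vanilla RNN recurrence in plain NumPy (illustrative only)
import numpy as np

input_dim, hidden_dim, timesteps = 10, 128, 5
W_xh = np.random.randn(input_dim, hidden_dim) * 0.01   # input-to-hidden weights
W_hh = np.random.randn(hidden_dim, hidden_dim) * 0.01  # hidden-to-hidden weights
b = np.zeros(hidden_dim)

x = np.random.randn(timesteps, input_dim)  # one example sequence
h = np.zeros(hidden_dim)                   # initial hidden state

for t in range(timesteps):
    # The same weights are reused at every time step (parameter sharing)
    h = np.tanh(x[t] @ W_xh + h @ W_hh + b)

print(h.shape)  # (128,) -- final hidden state summarizing the sequence

Keras's SimpleRNN layer, used in the example below, implements this same recurrence.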

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# Simple RNN for sequence classification
model = keras.Sequential([
    layers.SimpleRNN(
        units=128,              # Number of hidden units
        activation='tanh',      # Default activation
        return_sequences=False, # Return only last output
        input_shape=(None, 10)  # (timesteps, features)
    ),
    layers.Dense(1, activation='sigmoid')
])

model.summary()

⚡ Long Short-Term Memory (LSTM)

LSTM mitigates the vanishing gradient problem of simple RNNs by using gates to control what information is kept, updated, and exposed at each step.

LSTM Architecture

LSTM has three gates:

  • Forget gate: decides what information to discard from the cell state
  • Input gate: decides what new information to write into the cell state
  • Output gate: decides how much of the cell state to expose as the hidden state
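
To make the gating concrete, here is a hedged NumPy sketch of a single LSTM cell step with randomly initialized weights (biases omitted for brevity; this is illustrative only, not Keras's internal implementation):

# One LSTM cell step in NumPy (illustrative sketch, biases omitted)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_dim, hidden_dim = 10, 32
z_dim = input_dim + hidden_dim
# One weight matrix per gate, plus one for the candidate cell state
W_f, W_i, W_o, W_c = (np.random.randn(z_dim, hidden_dim) * 0.01 for _ in range(4))

x_t = np.random.randn(input_dim)      # current input
h_prev = np.zeros(hidden_dim)         # previous hidden state
c_prev = np.zeros(hidden_dim)         # previous cell state
z = np.concatenate([x_t, h_prev])

f = sigmoid(z @ W_f)                  # forget gate
i = sigmoid(z @ W_i)                  # input gate
o = sigmoid(z @ W_o)                  # output gate
c_tilde = np.tanh(z @ W_c)            # candidate cell state

c_t = f * c_prev + i * c_tilde        # keep part of the old state, add new info
h_t = o * np.tanh(c_t)                # new hidden state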

# Text sentiment classification with LSTM
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

# Load IMDB dataset
max_features = 10000  # Top 10,000 words
maxlen = 200          # Max review length

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to same length
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)

print(f"Training shape: {X_train.shape}")  # (25000, 200)

# Build LSTM model
model = keras.Sequential([
    layers.Embedding(max_features, 128),  # Word embeddings
    layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2),
    layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train
history = model.fit(
    X_train, y_train,
    batch_size=128,
    epochs=10,
    validation_split=0.2,
    verbose=1
)

# Evaluate
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"\nTest accuracy: {test_acc:.4f}")

🚀 Gated Recurrent Unit (GRU)

GRU is a simpler alternative to LSTM with only two gates (reset and update); it is usually faster to train while achieving similar performance.

# GRU for sequence classification
model = keras.Sequential([
    layers.Embedding(max_features, 128),
    layers.GRU(
        units=128,
        dropout=0.2,
        recurrent_dropout=0.2,
        return_sequences=False
    ),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=128, epochs=10, validation_split=0.2)

# LSTM vs GRU:
# - LSTM: More powerful, more parameters, slower
# - GRU: Simpler, faster, often similar performance
# - Try GRU first, use LSTM if needed

📊 Sequence-to-Sequence Tasks

Many-to-One: Sentiment Analysis

# Input: Sequence of words → Output: Single label
vocab_size = 10000  # placeholder vocabulary size (assumption); set to match your data
model = keras.Sequential([
    layers.Embedding(vocab_size, 128),
    layers.LSTM(64, return_sequences=False),  # Only last output
    layers.Dense(1, activation='sigmoid')
])

Many-to-Many: Time Series Forecasting

# Input: Sequence → Output: Sequence (same length)
timesteps, features = 20, 1  # placeholder shapes (assumption); set to match your data
model = keras.Sequential([
    layers.LSTM(64, return_sequences=True, input_shape=(timesteps, features)),
    layers.TimeDistributed(layers.Dense(1))
])

# Example: stock price prediction (many-to-one: a sliding window of past
# values → the next value)

# Generate a sample random-walk series standing in for prices
data = np.cumsum(np.random.randn(1000))

# Create sequences
def create_sequences(data, seq_length):
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i+seq_length])
        y.append(data[i+seq_length])
    return np.array(X), np.array(y)

seq_length = 20
X, y = create_sequences(data, seq_length)
X = X.reshape(-1, seq_length, 1)

# Split
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Build model
model = keras.Sequential([
    layers.LSTM(50, return_sequences=True, input_shape=(seq_length, 1)),
    layers.LSTM(50),
    layers.Dense(1)
])

model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1)

Sequence-to-Sequence: Machine Translation

# Encoder-Decoder architecture
# Placeholder sizes (assumptions; set these to match your dataset)
latent_dim = 256
num_encoder_features = 100   # e.g., size of the source one-hot vocabulary
num_decoder_features = 100   # e.g., size of the target one-hot vocabulary
num_decoder_tokens = 100     # size of the target vocabulary to predict

# Encoder: process the input sequence into a context (the final LSTM states)
encoder_inputs = layers.Input(shape=(None, num_encoder_features))
encoder = layers.LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: Generate output sequence from context
decoder_inputs = layers.Input(shape=(None, num_decoder_features))
decoder_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = layers.Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
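
This wiring is for training with teacher forcing (the true previous target token is fed to the decoder). For inference you would typically rebuild separate encoder and decoder models from the same trained layers and decode one step at a time; a sketch of that setup, reusing the layers defined above:

# Inference sketch: reuse the trained layers to decode step by step
encoder_model = keras.Model(encoder_inputs, encoder_states)

decoder_state_input_h = layers.Input(shape=(latent_dim,))
decoder_state_input_c = layers.Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs_inf, state_h_inf, state_c_inf = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_outputs_inf = decoder_dense(decoder_outputs_inf)

decoder_model = keras.Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs_inf, state_h_inf, state_c_inf]
)
# Decoding loop: feed each prediction and the returned states back in
# until an end-of-sequence token is produced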

🔄 Bidirectional RNN

Process sequences in both forward and backward directions for better context understanding.

# Bidirectional LSTM
model = keras.Sequential([
    layers.Embedding(max_features, 128),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(1, activation='sigmoid')
])

# Benefits:
# - Each position sees both past and future context
# - Better for NLP tasks (e.g., named entity recognition)
# Trade-offs:
# - Roughly 2x parameters (separate forward and backward weights)
# - Not usable for real-time/streaming prediction, since the whole
#   sequence must be available before the backward pass can run

📝 Text Generation with RNN

# Character-level text generation
text = "Your training text here..."

# Create character vocabulary
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for i, c in enumerate(chars)}

# Prepare sequences
seq_length = 40
step = 3
sequences = []
next_chars = []

for i in range(0, len(text) - seq_length, step):
    sequences.append(text[i:i+seq_length])
    next_chars.append(text[i+seq_length])

# Vectorize
X = np.zeros((len(sequences), seq_length, len(chars)), dtype=bool)
y = np.zeros((len(sequences), len(chars)), dtype=bool)

for i, seq in enumerate(sequences):
    for t, char in enumerate(seq):
        X[i, t, char_to_idx[char]] = 1
    y[i, char_to_idx[next_chars[i]]] = 1

# Build model
model = keras.Sequential([
    layers.LSTM(128, input_shape=(seq_length, len(chars))),
    layers.Dense(len(chars), activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X, y, batch_size=128, epochs=30)

# Generate text
def generate_text(model, start_string, length=100, temperature=1.0):
    generated = start_string

    for _ in range(length):
        # One-hot encode the last seq_length characters (zero-padded if shorter)
        x = np.zeros((1, seq_length, len(chars)))
        for t, char in enumerate(generated[-seq_length:]):
            x[0, t, char_to_idx[char]] = 1

        # Predict a probability distribution over the next character
        preds = model.predict(x, verbose=0)[0]

        # Apply temperature: lower = more conservative, higher = more random
        preds = np.log(preds + 1e-8) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)

        # Sample the next character and append it
        next_idx = np.random.choice(len(chars), p=preds)
        generated += idx_to_char[next_idx]

    return generated

print(generate_text(model, "The ", length=200, temperature=0.5))

⏰ Time Series Forecasting

# Multivariate time series prediction
# Example: Predict temperature from multiple weather features

# Generate sample data
n_samples = 1000
n_features = 5  # Temperature, humidity, pressure, wind, etc.
data = np.random.randn(n_samples, n_features)

# Create sliding windows
def create_dataset(data, window_size, horizon=1):
    X, y = [], []
    for i in range(len(data) - window_size - horizon + 1):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size:i+window_size+horizon, 0])  # Predict temp
    return np.array(X), np.array(y)

window_size = 24  # 24 hours
horizon = 6       # Predict 6 hours ahead

X, y = create_dataset(data, window_size, horizon)

# Split
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Build model
model = keras.Sequential([
    layers.LSTM(64, return_sequences=True, input_shape=(window_size, n_features)),
    layers.Dropout(0.2),
    layers.LSTM(32),
    layers.Dropout(0.2),
    layers.Dense(horizon)  # Predict multiple future steps
])

model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Train
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(patience=5, factor=0.5)
    ]
)

# Predict
predictions = model.predict(X_test)

# Visualize
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
plt.plot(y_test[:100, 0], label='Actual')
plt.plot(predictions[:100, 0], label='Predicted')
plt.legend()
plt.title('Time Series Forecast')
plt.show()

🎯 RNN vs LSTM vs GRU

Feature              Simple RNN        LSTM                   GRU
Parameters           Fewest            Most (3 gates)         Middle (2 gates)
Training Speed       Fastest           Slowest                Fast
Long Dependencies    Poor              Excellent              Very Good
Vanishing Gradient   Severe            Largely mitigated      Largely mitigated
Memory Control       None              Cell state + gates     Hidden state + gates
Best For             Short sequences   Complex patterns       Most tasks
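
To see the parameter difference concretely, here is a small sketch that builds one layer of each type with the same hidden size and prints its parameter count (the input feature size of 10 is an illustrative assumption):

# Compare parameter counts of the three recurrent layer types
for layer_cls in [layers.SimpleRNN, layers.LSTM, layers.GRU]:
    m = keras.Sequential([keras.Input(shape=(None, 10)), layer_cls(128)])
    print(f"{layer_cls.__name__}: {m.count_params():,} parameters")

# Roughly: LSTM ≈ 4x SimpleRNN, GRU ≈ 3x SimpleRNN for the same hidden size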

💡 Best Practices

  • Prefer LSTM or GRU over SimpleRNN for anything but very short sequences
  • Clip gradients (e.g., via the optimizer's clipnorm) to guard against exploding gradients (see the sketch below)
  • Pad variable-length sequences to a fixed length and mask the padding so it is ignored (see the sketch below)
  • Regularize with dropout / recurrent_dropout and stop early on validation loss
  • Scale or normalize numeric inputs for time series models
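
A minimal sketch of two of these practices, gradient clipping and masking, applied to the IMDB-style model from earlier (the specific values are illustrative assumptions):

# Gradient clipping and masking of padded tokens (illustrative settings)
model = keras.Sequential([
    layers.Embedding(max_features, 128, mask_zero=True),  # treat index 0 as padding
    layers.LSTM(128, dropout=0.2),
    layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0),  # clip gradient norm at 1.0
    loss='binary_crossentropy',
    metrics=['accuracy']
)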

⚠️ Common Pitfalls

  • Exploding gradients: the loss suddenly jumps or becomes NaN (clip gradients, lower the learning rate)
  • Forgetting return_sequences=True when stacking recurrent layers
  • Heavy use of recurrent_dropout, which disables the fast cuDNN kernels and can slow GPU training dramatically
  • Evaluating time series models with a random split instead of a chronological split (data leakage)
  • Feeding unscaled inputs, which makes training unstable

🎯 RNN Applications

Natural Language Processing

  • Sentiment analysis
  • Machine translation
  • Text generation
  • Named entity recognition
  • Question answering

Time Series

  • Stock price prediction
  • Weather forecasting
  • Energy consumption prediction
  • Anomaly detection

Other Applications

  • Speech recognition
  • Music generation
  • Video analysis
  • Handwriting recognition

🔮 Modern Alternatives: Transformers

Note: For many NLP tasks, Transformer architectures (BERT, GPT) have largely replaced RNNs due to:

  • Parallel processing of whole sequences (no step-by-step recurrence), which makes training much faster
  • Self-attention, which handles long-range dependencies better than recurrence
  • Large pretrained models that can be fine-tuned for downstream tasks

However, RNNs are still useful for:

  • Streaming and real-time inference, where inputs arrive one step at a time
  • Time series forecasting and smaller datasets
  • Resource-constrained settings, where their smaller size is an advantage

🎯 Key Takeaways

  • RNNs process sequences step by step, carrying context in a hidden state and sharing weights across time steps
  • Vanilla RNNs struggle with long sequences; LSTM and GRU use gates to mitigate the vanishing gradient problem
  • Choose return_sequences to match the task: one output per sequence (classification) or one per time step (tagging, forecasting)
  • Bidirectional RNNs help when the full sequence is available up front, but not for streaming prediction
  • Transformers dominate modern NLP, but RNNs remain a solid choice for time series and low-latency applications