
6.4.1 RNN Roadmap: Process Sequences Step by Step

RNNs are built for ordered data: text, time series, clicks, sensor readings, and any input where earlier steps affect later steps.

Figure: RNN sequence model chapter relationship diagram.

Figure: RNN hidden state rolling memory map.

| Concept | First meaning |
|---|---|
| sequence length | how many time steps |
| input size | features per step |
| hidden state | rolling memory |
| LSTM / GRU | gated memory control |
| batch first | shape style [batch, seq_len, features] |
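The "hidden state = rolling memory" idea can be sketched without any library. This is a minimal plain-Python sketch of a simple (Elman-style) RNN cell, assuming small random weights (`W_x`, `W_h`, `b` and `rnn_step` are hypothetical names, not PyTorch internals). It shows that each new hidden state depends on the current step's features and on the previous hidden state.

```python
import math
import random

random.seed(0)
seq_len, input_size, hidden_size = 3, 5, 4

# One toy sequence: seq_len steps, each a list of input_size features.
sequence = [[random.gauss(0, 1) for _ in range(input_size)] for _ in range(seq_len)]

# Hypothetical random weights for a plain RNN cell (not learned here).
W_x = [[random.gauss(0, 0.5) for _ in range(input_size)] for _ in range(hidden_size)]
W_h = [[random.gauss(0, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
b = [0.0] * hidden_size

def rnn_step(x, h):
    """h_t = tanh(W_x x_t + W_h h_{t-1} + b), computed one hidden unit at a time."""
    return [
        math.tanh(
            sum(W_x[i][j] * x[j] for j in range(input_size))
            + sum(W_h[i][k] * h[k] for k in range(hidden_size))
            + b[i]
        )
        for i in range(hidden_size)
    ]

h = [0.0] * hidden_size          # memory starts empty
outputs = []
for t, x_t in enumerate(sequence):
    h = rnn_step(x_t, h)         # new memory mixes this step with the old memory
    outputs.append(h)            # keep one hidden vector per step
    print(f"step {t}: hidden has {len(h)} values")

print("outputs:", len(outputs), "x", len(outputs[0]))  # seq_len x hidden_size
```

The per-step list `outputs` plays the role of the RNN's outputs, and the final `h` plays the role of its last hidden state.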

Create rnn_first_loop.py and run it after installing torch.

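```python
import torch

sequence = torch.randn(2, 3, 5)  # [batch=2, seq_len=3, features=5]
# batch_first=True: input and outputs use [batch, seq_len, features]
gru = torch.nn.GRU(input_size=5, hidden_size=4, batch_first=True)
outputs, hidden = gru(sequence)  # outputs: every step; hidden: final step only

print("sequence_shape:", tuple(sequence.shape))
print("outputs_shape:", tuple(outputs.shape))
print("hidden_shape:", tuple(hidden.shape))
```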

Expected output:

```
sequence_shape: (2, 3, 5)
outputs_shape: (2, 3, 4)
hidden_shape: (1, 2, 4)
```

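Read this as two sequences, three steps each, five features per step. outputs holds one 4-value hidden vector for every step of every sequence, so its shape is (2, 3, 4). hidden holds only the final step's vector for each sequence, laid out as [num_layers, batch, hidden_size] = (1, 2, 4). batch_first=True does not apply to hidden, so the batch dimension comes second there.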
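A quick way to check the relationship between the two return values: for a single-layer, one-direction GRU, the last time step of outputs is the same vector stored in hidden. This sketch reuses the shapes from rnn_first_loop.py and adds a fixed seed (an assumption, only for repeatability).

```python
import torch

torch.manual_seed(0)  # fixed seed so weights and inputs are repeatable
sequence = torch.randn(2, 3, 5)  # [batch=2, seq_len=3, features=5]
gru = torch.nn.GRU(input_size=5, hidden_size=4, batch_first=True)
outputs, hidden = gru(sequence)

last_step = outputs[:, -1, :]  # final step of every sequence -> [2, 4]
final_hidden = hidden[0]       # drop the num_layers axis -> [2, 4]

print("last_step_shape:", tuple(last_step.shape))
print("same_values:", torch.allclose(last_step, final_hidden))
```

With more layers or bidirectional=True, hidden gains extra entries along its first axis, so index it with care.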

| Order | Read | What to practice |
|---|---|---|
| 1 | 6.4.2 RNN Basics | sequence input, hidden state, shape |
| 2 | 6.4.3 LSTM and GRU | gates, long-range dependencies, memory control |
| 3 | 6.4.4 Sequence Practice | sliding windows, train/eval loop |
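Sliding windows turn one long series into many (input sequence, target) pairs. The sketch below uses a hypothetical helper `make_windows` on a toy 1-D series. Each input window becomes one sequence of length `window` with one feature per step, i.e. [seq_len, 1] once wrapped as a tensor.

```python
def make_windows(series, window):
    """Split a 1-D series into (past window, next value) training pairs."""
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]  # seq_len = window steps
        target = series[start + window]        # the value right after the window
        pairs.append((inputs, target))
    return pairs

series = [10, 11, 12, 13, 14, 15]
for inputs, target in make_windows(series, window=3):
    print(inputs, "->", target)
```

For evaluation, splitting the series by time before windowing keeps validation targets out of the training windows.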

Keep one sequence shape note:

- Input: [batch, seq_len, features]
- Outputs: one hidden representation per step, [batch, seq_len, hidden_size]
- Hidden: compressed rolling memory, [num_layers, batch, hidden_size]
- Gate reason: LSTM/GRU gates help preserve or discard information
- Baseline: compare the sequence model against a simple naive rule
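One common naive rule for next-value prediction is "repeat the last observed value". The sketch below scores that rule with mean absolute error on a few toy (window, target) pairs. The helper names and data are hypothetical; a sequence model is only worth its cost if it beats this number on the same validation data.

```python
def naive_last_value(windows):
    """Baseline: predict that the next value equals the last value in the window."""
    return [inputs[-1] for inputs, _ in windows]

def mean_absolute_error(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)

# Toy (window, target) pairs; in practice these come from a validation split.
windows = [([1.0, 2.0, 3.0], 3.5), ([2.0, 3.0, 3.5], 4.0), ([3.0, 3.5, 4.0], 5.0)]
preds = naive_last_value(windows)
targets = [target for _, target in windows]
print("naive MAE:", mean_absolute_error(preds, targets))
```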

You pass this roadmap when you can read [batch, seq_len, features], explain the hidden state as rolling memory, and explain why LSTM and GRU were introduced to handle longer-range dependencies.
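Why gates help with longer dependencies can be seen in a toy scalar comparison. A plain RNN pushes its memory through tanh on every step, so a signal can fade quickly. A GRU-style update blends old memory and a new candidate with a gate z; in PyTorch's GRU convention h_t = (1 - z_t) * n_t + z_t * h_{t-1}, so z near 1 keeps the old memory. In this sketch the weights, the gate values, and the candidate (simplified to tanh(x)) are hand-picked assumptions; in a real LSTM/GRU they are learned and computed per step.

```python
import math

def vanilla_step(h, x, w_h=0.5, w_x=1.0):
    """Plain RNN update: the old memory is squashed through tanh every step."""
    return math.tanh(w_h * h + w_x * x)

def gated_step(h, candidate, z):
    """GRU-style blend in PyTorch's convention: z near 1 keeps the old memory."""
    return (1 - z) * candidate + z * h

inputs = [1.0] + [0.0] * 20  # one important signal, then 20 quiet steps

h_vanilla = 0.0
h_gated = 0.0
for t, x in enumerate(inputs):
    h_vanilla = vanilla_step(h_vanilla, x)
    # Hand-picked gate: overwrite memory at step 0, then mostly keep it.
    z = 0.0 if t == 0 else 0.98
    h_gated = gated_step(h_gated, candidate=math.tanh(x), z=z)

print(f"vanilla memory after 20 quiet steps: {h_vanilla:.6f}")
print(f"gated memory after 20 quiet steps:   {h_gated:.6f}")
```

The vanilla value shrinks toward zero, while the gated value keeps roughly two thirds of the original signal.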

Check reasoning and explanation
  1. A passing answer connects tensors, model layers, loss, backward(), and optimizer updates into one training loop.
  2. The evidence should include a runnable mini experiment, tensor-shape checks, and a loss or validation curve you can explain.
  3. A good self-check names one failure mode such as shape mismatch, no loss decrease, overfitting, data leakage, or using Attention/Transformer words without explaining the data flow.
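The checklist asks for one loop that ties tensors, layers, loss, backward(), and optimizer updates together. This is a minimal sketch under stated assumptions: a hypothetical toy task (predict each sequence's sum), a hypothetical `SumModel`, and hand-picked hyperparameters (hidden_size=16, Adam with lr=0.01, 200 steps). It includes a shape check inside the loop and tracks training loss only; a validation curve would still need a held-out split.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy task: predict the sum of each sequence's values.
x = torch.randn(64, 5, 1)  # [batch, seq_len, features]
y = x.sum(dim=1)           # [batch, 1]

class SumModel(nn.Module):
    def __init__(self, hidden_size=16):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, seq):
        outputs, _ = self.gru(seq)           # [batch, seq_len, hidden_size]
        return self.head(outputs[:, -1, :])  # last step -> [batch, 1]

model = SumModel()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for step in range(200):
    pred = model(x)
    assert pred.shape == y.shape, f"shape mismatch: {pred.shape} vs {y.shape}"
    loss = loss_fn(pred, y)
    optimizer.zero_grad()  # clear gradients from the previous step
    loss.backward()        # compute gradients of the loss
    optimizer.step()       # update the weights
    losses.append(loss.item())

print(f"first loss: {losses[0]:.3f}  last loss: {losses[-1]:.3f}")
```

If the last loss is not clearly below the first, check the learning rate and the target shape before blaming the architecture.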