
6.1.4 Forward and Backward Propagation

Neural network forward and backward propagation diagram

Section Overview

Training a neural network is a loop: predict, measure error, compute gradients, update parameters, repeat.

What You Will Build

This lesson runs one tiny PyTorch example that shows:

  • a forward pass;
  • binary cross-entropy loss;
  • gradients created by loss.backward();
  • parameter updates created by optimizer.step();
  • a mini training loop with decreasing loss.

Backpropagation error responsibility allocation diagram

Setup

python -m pip install -U torch

Run the Complete Lab

Create forward_backward_lab.py:

import torch
import torch.nn as nn

torch.manual_seed(42)

x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])
model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

print("one_training_step")
with torch.no_grad():  # evaluation only, no gradient tracking needed
    before = model(x)
print("prediction_before=", round(float(before.item()), 3))

pred = model(x)          # forward: compute prediction
loss = loss_fn(pred, y)  # loss: measure error
optimizer.zero_grad()    # clear: remove old gradients
loss.backward()          # backward: compute gradients

linear = model[0]
print("loss_before=", round(float(loss.item()), 4))
print("weight_grad=", [[round(float(v), 4) for v in row] for row in linear.weight.grad.tolist()])
print("bias_grad=", [round(float(v), 4) for v in linear.bias.grad.tolist()])
optimizer.step()         # update: change parameters

with torch.no_grad():
    after = model(x)
    new_loss = loss_fn(after, y)
print("prediction_after=", round(float(after.item()), 3))
print("loss_after=", round(float(new_loss.item()), 4))

print("mini_training_loop")
for step in range(1, 6):
    pred = model(x)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step={step} loss={loss.item():.4f} pred={pred.item():.3f}")

Run it:

python forward_backward_lab.py

Expected output:

one_training_step
prediction_before= 0.825
loss_before= 0.1927
weight_grad= [[-0.1753, -0.3505]]
bias_grad= [-0.1753]
prediction_after= 0.888
loss_after= 0.1183
mini_training_loop
step=1 loss=0.1183 pred=0.888
step=2 loss=0.0861 pred=0.918
step=3 loss=0.0678 pred=0.934
step=4 loss=0.0560 pred=0.945
step=5 loss=0.0478 pred=0.953

Forward and backward lab result map

Read the Five Steps

NumPy to PyTorch training loop comparison diagram

One training step has a fixed order:

Step       Code                       Meaning
forward    pred = model(x)            compute prediction
loss       loss = loss_fn(pred, y)    measure error
clear      optimizer.zero_grad()      remove old gradients
backward   loss.backward()            compute gradients
update     optimizer.step()           change parameters

The order matters. If you forget zero_grad(), gradients accumulate from previous steps. If you forget step(), the model never updates.
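
To see why the clearing step matters, the short sketch below (reusing the lab's model, data, and seed, so the printed numbers are only an assumption) calls backward() twice without zero_grad(); the second gradient is the sum of both passes.

import torch
import torch.nn as nn

# Sketch: what happens when zero_grad() is skipped.
torch.manual_seed(42)
model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])

for repeat in range(2):
    loss = loss_fn(model(x), y)
    loss.backward()  # gradients are added into .grad, not overwritten
    print(f"after backward #{repeat + 1}:", model[0].weight.grad.tolist())
# The second printout is twice the first, which is exactly why
# optimizer.zero_grad() belongs in every training step.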

Forward Propagation

Forward propagation means data moves from input to output:

pred = model(x)

Here the model is:

nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())

The linear layer computes a score, and Sigmoid turns it into a probability-like value.
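
As a quick sanity check, the sketch below reproduces model(x) by hand for this architecture, assuming the same seed as the lab: a matrix product plus bias, then a sigmoid.

import torch
import torch.nn as nn

# Sketch: the forward pass of Linear(2, 1) + Sigmoid, written out manually.
torch.manual_seed(42)
model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
x = torch.tensor([[1.0, 2.0]])

linear = model[0]
score = x @ linear.weight.T + linear.bias   # nn.Linear computes x @ W^T + b
manual = torch.sigmoid(score)               # Sigmoid squashes the score into (0, 1)
print(manual.item(), model(x).item())       # the two values should match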

Loss Function

The target is 1.0, and the prediction starts at 0.825, so the model is close but not perfect:

loss_before= 0.1927

BCELoss is binary cross-entropy loss. It is the right choice here because the model ends with a Sigmoid, so its output is a probability between 0 and 1.
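
Because the target is 1, binary cross-entropy reduces to -log(prediction) here, which is where the 0.1927 comes from. A minimal sketch of the formula, using the rounded prediction from the lab (so the result only approximately matches the lab's number):

import torch

# Sketch: binary cross-entropy for a single example, written out.
p = torch.tensor([[0.825]])   # rounded prediction from the lab's first forward pass
y = torch.tensor([[1.0]])

manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
builtin = torch.nn.BCELoss()(p, y)
print(manual.item(), builtin.item())   # both are about 0.192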

For later PyTorch work, remember this pairing:

Output style                  Loss
final Sigmoid probability     nn.BCELoss()
raw logits without Sigmoid    nn.BCEWithLogitsLoss()
multi-class raw logits        nn.CrossEntropyLoss()
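
As one example, the second row of the table can be sketched like this, assuming the same two-input layer as the lab but without the final Sigmoid; BCEWithLogitsLoss applies the sigmoid internally, so both losses come out the same.

import torch
import torch.nn as nn

# Sketch: raw logits paired with BCEWithLogitsLoss.
torch.manual_seed(42)
logit_model = nn.Linear(2, 1)          # no nn.Sigmoid at the end
x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])

logits = logit_model(x)
loss_from_logits = nn.BCEWithLogitsLoss()(logits, y)
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), y)
print(loss_from_logits.item(), loss_from_probs.item())   # numerically the same loss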

Backward Propagation

loss.backward() fills each parameter's .grad field:

weight_grad= [[-0.1753, -0.3505]]
bias_grad= [-0.1753]

A gradient tells the optimizer how changing a parameter would change the loss. You do not manually derive every gradient in PyTorch; autograd records the computation graph during the forward pass and uses it during the backward pass.
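
For this particular model the gradients are easy to check by hand: with a Sigmoid feeding binary cross-entropy, the gradient of the loss with respect to the linear score is prediction minus target, so the weight gradient is (p - y) * x and the bias gradient is (p - y). A sketch of that check, assuming the lab's seed and data:

import torch
import torch.nn as nn

# Sketch: compare autograd's gradients with the hand-derived formula.
torch.manual_seed(42)
model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])

loss = nn.BCELoss()(model(x), y)
loss.backward()

p = model(x).detach()                           # same prediction, weights unchanged
print("autograd:", model[0].weight.grad.tolist())
print("by hand :", ((p - y) * x).tolist())      # same numbers, about [-0.175, -0.350]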

Optimizer Step

After optimizer.step(), the prediction moves closer to the target:

prediction_before= 0.825
prediction_after= 0.888
loss_after= 0.1183

That is training in miniature: parameters changed, the prediction improved, and loss decreased.
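
Under the hood, plain SGD applies w_new = w - lr * grad to every parameter. The sketch below reproduces one optimizer.step() by hand, assuming the lab's seed, data, and learning rate:

import torch
import torch.nn as nn

# Sketch: one SGD step, done manually and by the optimizer.
torch.manual_seed(42)
model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[1.0]])

loss = nn.BCELoss()(model(x), y)
optimizer.zero_grad()
loss.backward()

w_before = model[0].weight.detach().clone()
grad = model[0].weight.grad.clone()
optimizer.step()

print("optimizer:", model[0].weight.detach().tolist())
print("by hand  :", (w_before - 0.5 * grad).tolist())   # the same updated weights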

Practical Debugging Checklist

Symptom                            Likely cause                                    Fix
loss never changes                 forgot optimizer.step()                         call step() after backward()
gradients keep growing strangely   forgot zero_grad()                              clear gradients every step
grad is None                       tensor not connected to loss or no backward()   check computation graph
binary loss errors                 output/target shape mismatch                    make both [batch, 1] here
loss becomes nan                   learning rate too high or invalid values        lower LR, inspect inputs
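
For the "grad is None" row, a small helper that walks model.named_parameters() makes the problem visible. This is only a sketch: the function name gradient_report is made up here, and the usage comment assumes the lab's model variable.

import torch.nn as nn

def gradient_report(model: nn.Module) -> None:
    # Print which parameters received gradients after loss.backward().
    for name, param in model.named_parameters():
        if param.grad is None:
            print(f"{name}: grad is None (not connected to the loss?)")
        else:
            print(f"{name}: grad norm = {param.grad.norm().item():.4f}")

# Usage in the lab, right after loss.backward():
# gradient_report(model)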

Practice

  1. Change lr=0.5 to 0.05 and 1.0. How does loss change?
  2. Remove optimizer.zero_grad() and print gradients. What accumulates?
  3. Replace nn.BCELoss() with nn.BCEWithLogitsLoss() and remove nn.Sigmoid().
  4. Add another sample to x and y, then verify shapes.
  5. Print model weights before and after optimizer.step().

Pass Check

You are done when you can explain:

  • forward pass computes predictions;
  • loss measures error;
  • backward pass computes gradients;
  • optimizer step updates parameters;
  • zero_grad() prevents old gradients from accumulating.