6.1.4 Forward and Backward Propagation

Training a neural network is a loop: predict, measure error, compute gradients, update parameters, repeat.
What You Will Build
This lesson runs one tiny PyTorch example that shows:
- a forward pass;
- binary cross-entropy loss;
- gradients created by loss.backward();
- parameter updates applied by optimizer.step();
- a mini training loop with decreasing loss.

Setup
```
python -m pip install -U torch
```
Run the Complete Lab
Create forward_backward_lab.py:
```python
import torch
import torch.nn as nn

torch.manual_seed(42)

x = torch.tensor([[1.0, 2.0]])  # one sample, two features
y = torch.tensor([[1.0]])       # binary target

model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

print("one_training_step")

with torch.no_grad():
    before = model(x)
print("prediction_before=", round(float(before.item()), 3))

pred = model(x)            # forward pass
loss = loss_fn(pred, y)    # measure error
optimizer.zero_grad()      # clear old gradients
loss.backward()            # backward pass fills .grad

linear = model[0]
print("loss_before=", round(float(loss.item()), 4))
print("weight_grad=", [[round(float(v), 4) for v in row] for row in linear.weight.grad.tolist()])
print("bias_grad=", [round(float(v), 4) for v in linear.bias.grad.tolist()])

optimizer.step()           # update parameters

with torch.no_grad():
    after = model(x)
    new_loss = loss_fn(after, y)
print("prediction_after=", round(float(after.item()), 3))
print("loss_after=", round(float(new_loss.item()), 4))

print("mini_training_loop")
for step in range(1, 6):
    pred = model(x)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step={step} loss={loss.item():.4f} pred={pred.item():.3f}")
```
Run it:
```
python forward_backward_lab.py
```
Expected output:
```
one_training_step
prediction_before= 0.825
loss_before= 0.1927
weight_grad= [[-0.1753, -0.3505]]
bias_grad= [-0.1753]
prediction_after= 0.888
loss_after= 0.1183
mini_training_loop
step=1 loss=0.1183 pred=0.888
step=2 loss=0.0861 pred=0.918
step=3 loss=0.0678 pred=0.934
step=4 loss=0.0560 pred=0.945
step=5 loss=0.0478 pred=0.953
```

Read the Five Steps

One training step has a fixed order:
| Step | Code | Meaning |
|---|---|---|
| forward | pred = model(x) | compute prediction |
| loss | loss = loss_fn(pred, y) | measure error |
| clear | optimizer.zero_grad() | remove old gradients |
| backward | loss.backward() | compute gradients |
| update | optimizer.step() | change parameters |
The order matters. If you forget zero_grad(), gradients accumulate from previous steps. If you forget step(), the model never updates.
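If it helps to see the order as a single unit, here is a minimal sketch that wraps the same five steps in a helper function; the train_step name is only for illustration and is not part of the lab script.

```python
def train_step(model, loss_fn, optimizer, x, y):
    pred = model(x)            # forward: compute prediction
    loss = loss_fn(pred, y)    # loss: measure error
    optimizer.zero_grad()      # clear: remove old gradients
    loss.backward()            # backward: compute gradients
    optimizer.step()           # update: change parameters
    return loss.item()
```

Inside the mini training loop above, each iteration could then be a single call like train_step(model, loss_fn, optimizer, x, y).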
Forward Propagation
Forward propagation means data moves from input to output:
pred = model(x)
Here the model is:
nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
The linear layer computes a score, and Sigmoid turns it into a probability-like value.
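To see exactly what those two layers compute, you can append a small check like this to forward_backward_lab.py; it assumes the model and x defined above.

```python
# sketch: reproduce the forward pass by hand (append to forward_backward_lab.py)
linear = model[0]
score = x @ linear.weight.T + linear.bias  # nn.Linear(2, 1): weighted sum plus bias
prob = torch.sigmoid(score)                # nn.Sigmoid(): squash the score into (0, 1)
print(torch.allclose(prob, model(x)))      # True: same result as pred = model(x)
```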
Loss Function
The target is 1.0, and the prediction starts at 0.825, so the model is close but not perfect:
loss_before= 0.1927
BCELoss means binary cross-entropy. It is suitable here because the output is a binary probability after Sigmoid.
For later PyTorch work, remember this pairing:
| Output style | Loss |
|---|---|
| final Sigmoid probability | nn.BCELoss() |
| raw logits without Sigmoid | nn.BCEWithLogitsLoss() |
| multi-class raw logits | nn.CrossEntropyLoss() |
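For a single probability p and target y, binary cross-entropy is -(y * log(p) + (1 - y) * log(1 - p)). Here is a small self-contained sketch checking that nn.BCELoss matches this formula; the example probability 0.825 is arbitrary.

```python
import torch
import torch.nn as nn

p = torch.tensor([[0.825]])  # a predicted probability in (0, 1)
y = torch.tensor([[1.0]])    # binary target

manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
built_in = nn.BCELoss()(p, y)
print(torch.allclose(manual, built_in))  # True: BCELoss is this formula (mean-reduced)
```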
Backward Propagation
loss.backward() fills gradient fields:
weight_grad= [[-0.1753, -0.3505]]
bias_grad= [-0.1753]
A gradient tells the optimizer how changing a parameter would change the loss. You do not manually derive every gradient in PyTorch; autograd builds the computation graph during the forward pass and reuses it during the backward pass.
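If autograd feels opaque, this tiny self-contained example (separate from the lab model) shows the mechanism: the forward expression builds a graph, and backward() writes the derivative into .grad.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)  # a parameter autograd will track
loss = (3.0 * w).sum()                       # forward: builds the graph for 3 * w
loss.backward()                              # backward: fills w.grad
print(w.grad)                                # tensor([3.]) because d(3w)/dw = 3
```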
Optimizer Step
After optimizer.step(), the prediction moves closer to the target:
prediction_before= 0.825
prediction_after= 0.888
loss_after= 0.1183
That is training in miniature: parameters changed, the prediction improved, and loss decreased.
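With plain SGD (no momentum), the update that optimizer.step() applies is new_value = old_value - lr * gradient. Here is a minimal self-contained check of that rule, separate from the lab model.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.5)

loss = (w ** 2).sum()   # any differentiable loss works for this check
opt.zero_grad()
loss.backward()         # w.grad is now 2 * w

w_before = w.detach().clone()
opt.step()              # plain SGD: w <- w - lr * w.grad
print(torch.allclose(w.detach(), w_before - 0.5 * w.grad))  # True
```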
Practical Debugging Checklist
| Symptom | Likely cause | Fix |
|---|---|---|
| loss never changes | forgot optimizer.step() | call step() after backward() |
| gradients keep growing strangely | forgot zero_grad() | clear gradients every step |
| grad is None | tensor not connected to loss or no backward() | check the computation graph |
| binary loss errors | output/target shape mismatch | make both [batch, 1] here |
| loss becomes nan | learning rate too high or invalid values | lower the learning rate, inspect inputs |
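The zero_grad() symptom is easy to reproduce. In this small self-contained sketch, two backward() calls without clearing add their gradients together.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

for _ in range(2):
    loss = (3.0 * w).sum()
    loss.backward()      # no zero_grad(), so gradients accumulate

print(w.grad)  # tensor([6.]): 3 from the first pass plus 3 from the second
```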
Practice
- Change lr=0.5 to 0.05 and to 1.0. How does the loss change?
- Remove optimizer.zero_grad() and print the gradients. What accumulates?
- Replace nn.BCELoss() with nn.BCEWithLogitsLoss() and remove nn.Sigmoid().
- Add another sample to x and y, then verify the shapes.
- Print the model weights before and after optimizer.step().
Pass Check
You are done when you can explain:
- forward pass computes predictions;
- loss measures error;
- backward pass computes gradients;
- optimizer step updates parameters;
- zero_grad() prevents old gradients from accumulating.