
6.1.1 Neural Network Roadmap: Linear Layer, Activation, Loss, Update

Neural networks are not magic. A layer first computes a weighted sum of its inputs, an activation then bends that signal nonlinearly, and training adjusts the weights to reduce the loss.

Figure: Neural network basics chapter relationship diagram

Keep this loop:

input → weighted sum → activation → loss → gradient → update weights
| Word | First meaning |
| --- | --- |
| neuron | weighted sum plus bias |
| activation | nonlinearity such as ReLU |
| forward pass | compute prediction |
| backward pass | compute responsibility for error |
| optimizer | update weights using gradients |
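
The vocabulary above can be sketched as one small training loop. This is a minimal illustration, not a reference implementation: the toy data (targets generated from y = 2*x1 - x2), the learning rate, and the step count are all assumptions chosen for the example, and the activation step is left out so the model stays a single neuron.

```python
import torch

torch.manual_seed(0)  # make the random initial weights reproducible

# Toy data (an assumption for illustration): targets follow y = 2*x1 - x2.
x = torch.tensor([[1.0, 2.0], [2.0, 0.0], [0.0, 1.0], [3.0, 1.0]])
y = torch.tensor([[0.0], [4.0], [-1.0], [5.0]])

model = torch.nn.Linear(2, 1)      # one neuron: weighted sum plus bias
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

losses = []
for step in range(200):
    prediction = model(x)          # forward pass: compute prediction
    loss = loss_fn(prediction, y)  # loss: how wrong the prediction is
    optimizer.zero_grad()          # clear gradients left from the last step
    loss.backward()                # backward pass: fill in each .grad
    optimizer.step()               # optimizer: update weights using gradients
    losses.append(loss.item())

print("first loss:", round(losses[0], 4))
print("last loss:", round(losses[-1], 4))
```

The exact numbers depend on the random initial weights, but the last loss should be clearly smaller than the first. If it is not, the loop itself is the first thing to inspect.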

Create nn_first_loop.py and run it after installing torch.

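import torch

# One sample with three input features, shape (1, 3).
x = torch.tensor([[1.0, -2.0, 3.0]])
# One weight per feature, shape (3, 1), plus a single bias.
weights = torch.tensor([[0.5], [-1.0], [0.25]])
bias = torch.tensor([0.1])
# Weighted sum: 1.0*0.5 + (-2.0)*(-1.0) + 3.0*0.25 + 0.1 = 3.35
linear_output = x @ weights + bias
# ReLU keeps positive values unchanged and maps negative values to 0.
activated = torch.relu(linear_output)
print("linear_output:", round(linear_output.item(), 3))
print("relu_output:", round(activated.item(), 3))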

Expected output:

linear_output: 3.35
relu_output: 3.35

If the linear output were negative, ReLU would turn it into 0. That small gate is what lets stacked layers model nonlinear patterns.
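
To see the negative case, here is a small sketch that reuses the same input with hypothetical weights chosen so the weighted sum comes out negative. The second half illustrates why the gate matters: the matrices `w1` and `w2` are made-up values, and without an activation between them the two layers collapse into a single linear map.

```python
import torch

x = torch.tensor([[1.0, -2.0, 3.0]])

# Hypothetical weights chosen so the weighted sum is negative.
weights = torch.tensor([[-0.5], [1.0], [-0.25]])
bias = torch.tensor([0.1])

linear_output = x @ weights + bias     # -0.5 - 2.0 - 0.75 + 0.1 = -3.15
activated = torch.relu(linear_output)  # negative input is clipped to 0

print("linear_output:", round(linear_output.item(), 3))
print("relu_output:", round(activated.item(), 3))

# Two stacked layers with made-up weights: 3 inputs -> 2 hidden -> 1 output.
w1 = torch.tensor([[1.0, -1.0], [0.5, 2.0], [0.0, 1.0]])
w2 = torch.tensor([[1.0], [-1.0]])

stacked = (x @ w1) @ w2                # two linear layers, no activation
collapsed = x @ (w1 @ w2)              # one equivalent linear layer
with_relu = torch.relu(x @ w1) @ w2    # ReLU between the two layers

print("same without activation:", torch.allclose(stacked, collapsed))
print("same with ReLU:", torch.allclose(with_relu, collapsed))
```

This prints `linear_output: -3.15`, `relu_output: 0.0`, `same without activation: True`, and `same with ReLU: False`. Only the version with ReLU computes something a single linear layer cannot.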

| Order | Read | What to focus on |
| --- | --- | --- |
| 1 | 6.1.2 ML to DL Bridge | what changes after sklearn |
| 2 | 6.1.3 Neurons and Activation | weighted sum, bias, ReLU |
| 3 | 6.1.4 Forward and Backward | prediction, loss, gradient |
| 4 | 6.1.5 Optimizers | SGD, Momentum, Adam intuition |
| 5 | 6.1.6 Regularization | overfitting controls |
| 6 | 6.1.7 Weight Initialization | stable starting points |
| 7 | 6.1.8 Optional History | why backprop, CNN, RNN, Attention, and Transformer appeared |

By the end of 6.1, keep one short note with these four lines:

One Layer: input @ weights + bias
Nonlinearity: activation lets stacked layers model curved patterns
Training: forward → loss → backward → optimizer step
Debug First: check shape, loss, gradient, update

This note becomes the pocket map for PyTorch, CNN, RNN, and Transformer later in Chapter 6.
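
The "Debug First" line of the note can be turned into four quick checks. The sketch below is one possible way to do that, assuming a single `torch.nn.Linear` layer, random data, and an MSE loss. The batch size, feature count, and learning rate are arbitrary choices for the example.

```python
import torch

torch.manual_seed(0)

x = torch.randn(8, 3)  # batch of 8 samples, 3 features each
y = torch.randn(8, 1)  # one target per sample
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# 1. Shape: prediction and target should have identical shapes.
prediction = model(x)
assert prediction.shape == y.shape, (prediction.shape, y.shape)

# 2. Loss: should be a single finite number.
loss = torch.nn.functional.mse_loss(prediction, y)
assert loss.dim() == 0 and torch.isfinite(loss)

# 3. Gradient: backward() should populate .grad with nonzero values.
optimizer.zero_grad()
loss.backward()
assert model.weight.grad is not None
grad_norm = model.weight.grad.norm().item()

# 4. Update: the optimizer step should actually change the weights.
before = model.weight.detach().clone()
optimizer.step()
weight_change = (model.weight.detach() - before).abs().max().item()

print("shapes:", tuple(prediction.shape), tuple(y.shape))
print("loss is finite:", bool(torch.isfinite(loss)))
print("gradient norm > 0:", grad_norm > 0)
print("weights changed:", weight_change > 0)
```

Running the checks in this order narrows a problem quickly: a shape error stops at step 1, while a gradient norm of zero or unchanged weights points at the backward pass or the optimizer setup.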

You pass this roadmap when you can explain one layer as input @ weights + bias, describe what an activation does, and connect loss, gradient, and optimizer into one training loop.

Check reasoning and explanation
  1. A passing answer connects tensors, model layers, loss, backward(), and optimizer updates into one training loop.
  2. The evidence should include a runnable mini experiment, tensor-shape checks, and a loss or validation curve you can explain.
  3. A good self-check names one failure mode such as shape mismatch, no loss decrease, overfitting, data leakage, or using Attention/Transformer words without explaining the data flow.
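
As one concrete instance of the shape-mismatch failure mode, the sketch below shows a case that raises no error at all. The values are made up for illustration: the prediction has shape (3, 1), the target has shape (3,), and broadcasting silently turns their difference into a 3 by 3 matrix.

```python
import torch

prediction = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1)
target = torch.tensor([1.0, 2.0, 3.0])            # shape (3,)

# Broadcasting compares every prediction with every target.
diff = prediction - target
wrong_loss = (diff ** 2).mean()

# Giving the target the same shape as the prediction fixes the comparison.
aligned_target = target.unsqueeze(1)              # shape (3, 1)
right_loss = ((prediction - aligned_target) ** 2).mean()

print("diff shape:", diff.shape)
print("wrong loss:", round(wrong_loss.item(), 4))
print("right loss:", round(right_loss.item(), 4))
```

The predictions match the targets exactly, so the loss should be 0. The mismatched version instead prints a diff shape of `torch.Size([3, 3])` and a loss of `1.3333`, which is why a tensor-shape check belongs in the evidence.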