Skip to content

4.3.1 Calculus Roadmap: How Models Learn by Reducing Loss

Calculus explains how a model changes its parameters. The first goal is intuition: measure change, move in a better direction, repeat.

Calculus and Optimization Learning Map

The training flow is:

Relationship diagram of calculus and optimization sections

IdeaFirst meaning in AI
derivativehow fast one value changes
gradienthow many parameters should change together
gradient descentupdate parameters toward lower loss
chain ruleconnect changes across steps
backpropagationcompute many gradients efficiently

When you later see loss.backward() and optimizer.step(), this chapter is the background.

Create gradient_descent_first_loop.py. It finds a number close to 3 by reducing (w - 3)^2.

w = 0.0
learning_rate = 0.2
for step in range(1, 7):
gradient = 2 * (w - 3)
w = w - learning_rate * gradient
loss = (w - 3) ** 2
print(step, "w=", round(w, 3), "loss=", round(loss, 3))

Expected output:

Terminal window
1 w= 1.2 loss= 3.24
2 w= 1.92 loss= 1.166
3 w= 2.352 loss= 0.42
4 w= 2.611 loss= 0.151
5 w= 2.767 loss= 0.054
6 w= 2.86 loss= 0.02

The number moves toward 3, and the loss gets smaller. That is the training idea before the neural network becomes large.

OrderReadWhat to focus on first
14.3.2 Derivativesrate of change
24.3.3 Partial Derivatives and Gradientsmany parameters changing together
34.3.4 Gradient Descentupdate loop, learning rate, loss curve
44.3.5 Backpropagationchain rule, loss.backward() intuition

Keep this page’s proof of learning as a small evidence card:

Function
objective, loss, derivative, gradient, or chain-rule expression
Calculation
numeric derivative, gradient step, or backprop trace
Output
slope, gradient vector, updated parameter, or loss change
Failure Check
sign error, learning rate too large, local slope misunderstanding, or broken chain
Expected Output
calculation trace showing how a parameter changes

You pass this roadmap when you can explain why gradient descent repeats “compute loss -> compute gradient -> update parameter,” and why a learning rate that is too large can make training unstable.

Check reasoning and explanation
  • The calculus route is passed when you can explain derivative as local change, gradient as multi-parameter direction, and gradient descent as repeated loss-reducing updates.
  • Keep a derivative plot, one gradient vector, one loss curve, and one manual-versus-autograd comparison as evidence.
  • The safest habit is to always ask: if the parameter moves a little in this direction, does the loss go up or down?