
4.3.2 Derivatives: The Intuition of Rate of Change

[Figure: the derivative as the slope of the tangent line]

  • Intuitively understand derivative = tangent slope = rate of change
  • Use everyday scenarios (speed, stock prices) to understand derivatives
  • Master common differentiation rules
  • Use Python for numerical differentiation and visualization

First, set a very important learning expectation


This section is not meant to turn you into “someone who can compute every derivative” right away, but to help you truly understand:

  • What a derivative is describing
  • Why it is directly related to how a model updates its parameters

If, after one read-through, you still can’t confidently solve lots of derivative problems, that is completely normal. What matters more is:

  • Can you explain a derivative as a “rate of change”?
  • Can you connect it to later topics like loss changes and parameter updates?

It’s best to place this section back into the context of the whole chapter:

[Figure: the derivative as a bridge for rate of change]

So this section is not an isolated math topic; it is laying the foundation for the optimization storyline that follows.

| Scenario | Variable | Rate of change (derivative) |
| --- | --- | --- |
| Driving | Distance changes over time | Speed (km/h) |
| Stocks | Stock price changes over time | Rise/fall speed |
| Learning | Score changes over practice time | Learning efficiency |
| AI training | Loss changes over training steps | Convergence speed |

A derivative = the speed at which some quantity changes at a particular moment.

If “tangent slope” still feels a bit abstract, you can first think of a derivative as:

  • the “current speed” on a dashboard

For example, when driving:

  • total distance is an accumulated value
  • speed is how fast you are changing right now

So a derivative is not asking “how much has it changed in total,” but instead asking:

At this moment, how fast is it changing?
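
The dashboard picture can be made concrete with a small sketch. It assumes a hypothetical car whose distance after t seconds is s(t) = 5t² meters (this function is only an illustration, not something defined earlier). The average speed is measured over shorter and shorter intervals starting at t = 2:

```python
def distance(t):
    """Hypothetical distance travelled (meters) after t seconds."""
    return 5 * t ** 2


def average_speed(t, dt):
    """Average speed over the interval [t, t + dt]."""
    return (distance(t + dt) - distance(t)) / dt


t0 = 2.0
for dt in [1.0, 0.1, 0.01, 0.001]:
    print(f"dt={dt:<6} average speed = {average_speed(t0, dt):.4f} m/s")
```

As dt shrinks, the average speed settles toward 20 m/s, which matches the exact derivative s'(t) = 10t at t = 2. That limiting value is the “current speed” the dashboard would show.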

import numpy as np
import matplotlib.pyplot as plt

# Optional font setup: matplotlib falls back to its default font
# if 'Arial Unicode MS' is not installed
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']
plt.rcParams['axes.unicode_minus'] = False

# Function f(x) = x²
def f(x):
    return x ** 2

# Tangent line at x = 1
x0 = 1
slope = 2 * x0  # f'(x) = 2x → f'(1) = 2
x = np.linspace(-1, 3, 200)
tangent = slope * (x - x0) + f(x0)  # point-slope form of the tangent line

plt.figure(figsize=(8, 6))
plt.plot(x, f(x), 'steelblue', linewidth=2, label='f(x) = x²')
plt.plot(x, tangent, 'r--', linewidth=2, label=f'Tangent line (slope = {slope})')
plt.plot(x0, f(x0), 'ro', markersize=10, zorder=5)  # point of tangency
plt.annotate(f'x={x0}, slope={slope}', xy=(x0, f(x0)),
             xytext=(x0 + 0.5, f(x0) + 1.5), fontsize=12,
             arrowprops=dict(arrowstyle='->', color='gray'))
plt.xlim(-1, 3)
plt.ylim(-1, 8)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Derivative = Tangent Slope')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Interpretation: The derivative of f(x) = x² at x = 1 is 2, which means “when x increases a little near 1, f(x) increases by about twice as much.”
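
The phrase “about twice as much” can be checked directly. The sketch below compares the actual change in f(x) = x² near x = 1 with the change predicted by the tangent line (slope × dx); the step sizes are arbitrary choices for illustration:

```python
def f(x):
    return x ** 2


x0 = 1.0
slope = 2 * x0  # f'(x) = 2x, so f'(1) = 2

for dx in [0.1, 0.01, 0.001]:
    actual_change = f(x0 + dx) - f(x0)
    predicted_change = slope * dx  # change predicted by the tangent line
    print(f"dx={dx}: actual={actual_change:.6f}, predicted={predicted_change:.6f}")
```

For this function the gap between the two columns is exactly dx², so the tangent-line prediction gets better quickly as the step gets smaller. This is why the derivative is a statement about small changes near a point, not about large jumps.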

Numerical differentiation — approximate with Python


You don’t need to know the differentiation formula: as long as you can evaluate the function, you can approximate its derivative numerically:

f’(x) ≈ (f(x + h) - f(x - h)) / (2h) (where h is a small step size)

def numerical_derivative(f, x, h=1e-7):
    """Compute the numerical derivative using the central difference method"""
    return (f(x + h) - f(x - h)) / (2 * h)

# Test: the derivative of f(x) = x² should be 2x
f = lambda x: x ** 2
for x0 in [0, 1, 2, 3]:
    approx = numerical_derivative(f, x0)
    exact = 2 * x0
    print(f"x={x0}: numerical derivative={approx:.6f}, exact derivative={exact}")
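
How small should h be? The sketch below is one way to explore that question. It compares the central difference used above with a simpler forward difference, (f(x + h) - f(x)) / h, on sin(x) at x = 1, where the exact derivative is cos(1). The helper names and the list of step sizes are choices made for this illustration:

```python
import math


def forward_difference(f, x, h):
    """One-sided approximation: (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def central_difference(f, x, h):
    """Two-sided approximation: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 1.0
exact = math.cos(x0)  # the derivative of sin(x) is cos(x)

for h in [1e-1, 1e-3, 1e-5, 1e-7, 1e-9, 1e-11]:
    fwd_err = abs(forward_difference(math.sin, x0, h) - exact)
    ctr_err = abs(central_difference(math.sin, x0, h) - exact)
    print(f"h={h:.0e}  forward error={fwd_err:.2e}  central error={ctr_err:.2e}")
```

For moderate step sizes the central difference is noticeably more accurate, because its error shrinks in proportion to h² rather than h. Making h extremely small does not keep helping: floating-point round-off in f(x + h) - f(x - h) eventually dominates, so “very small” should not mean “as small as possible”.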

Keep this page’s proof of learning as a small evidence card:

  • Function: objective, loss, derivative, gradient, or chain-rule expression
  • Calculation: numeric derivative, gradient step, or backprop trace
  • Output: slope, gradient vector, updated parameter, or loss change
  • Failure check: sign error, learning rate too large, local slope misunderstanding, or broken chain
  • Expected output: calculation trace showing how a parameter changes

You don’t need to memorize all the rules. Just get familiar with the most common ones:

| Function | Derivative | Example |
| --- | --- | --- |
| Constant c | 0 | (5)’ = 0 |
| xⁿ | n·xⁿ⁻¹ | (x³)’ = 3x² |
| eˣ | eˣ | (eˣ)’ = eˣ |
| ln(x) | 1/x | (ln x)’ = 1/x |
| sin(x) | cos(x) | (sin x)’ = cos x |

# Verify common differentiation rules
# (uses np and numerical_derivative defined earlier on this page)
functions = [
    ("x³", lambda x: x**3, lambda x: 3*x**2),
    ("eˣ", lambda x: np.exp(x), lambda x: np.exp(x)),
    ("ln(x)", lambda x: np.log(x), lambda x: 1/x),
    ("sin(x)", lambda x: np.sin(x), lambda x: np.cos(x)),
]
print(f"{'Function':<10} {'x':<5} {'Numerical Derivative':<22} {'Analytical Derivative':<23} {'Error':<12}")
print("-" * 76)
for name, f, f_prime in functions:
    x0 = 1.0
    numerical = numerical_derivative(f, x0)
    analytical = f_prime(x0)
    error = abs(numerical - analytical)
    print(f"{name:<10} {x0:<5} {numerical:<22.8f} {analytical:<23.8f} {error:<12.2e}")
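
Real functions are usually built by combining the basic ones, so the sum, constant-multiple, and product rules are worth checking the same way. These combination rules are standard calculus, but they are not listed in the table above, and the two test functions below are chosen only for illustration:

```python
import math


def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x); h=1e-5 is a choice for this sketch."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 1.5

# Sum and constant-multiple rules: (4x^3 + 2 ln x)' = 12x^2 + 2/x
combo = lambda x: 4 * x ** 3 + 2 * math.log(x)
combo_prime = lambda x: 12 * x ** 2 + 2 / x

# Product rule: (u * v)' = u' * v + u * v', here with u = x^2 and v = e^x
product = lambda x: x ** 2 * math.exp(x)
product_prime = lambda x: 2 * x * math.exp(x) + x ** 2 * math.exp(x)

for name, g, g_prime in [("4x^3 + 2ln(x)", combo, combo_prime),
                         ("x^2 * e^x", product, product_prime)]:
    print(f"{name:<15} numerical={numerical_derivative(g, x0):.6f}  rule={g_prime(x0):.6f}")
```

If the two columns agree to several decimal places, the rule was applied correctly. A mismatch is a quick signal of a sign error or a forgotten term.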

Visualization: functions and their derivatives

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Each case: (title, function, analytical derivative)
cases = [
    ('f(x) = x²', lambda x: x**2, lambda x: 2*x),
    ('f(x) = x³', lambda x: x**3, lambda x: 3*x**2),
    ('f(x) = sin(x)', np.sin, np.cos),
    ('f(x) = eˣ', np.exp, np.exp),
]

for ax, (name, f, f_prime) in zip(axes.flat, cases):
    x = np.linspace(-2, 2, 200)
    ax.plot(x, f(x), 'steelblue', linewidth=2, label='f(x)')
    ax.plot(x, f_prime(x), 'coral', linewidth=2, linestyle='--', label="f'(x)")
    ax.axhline(y=0, color='gray', linewidth=0.5)  # zero line: where f'(x) crosses it, f(x) is flat
    ax.set_title(name, fontsize=12)
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.suptitle('Functions (solid blue) and their derivatives (dashed coral)', fontsize=14)
plt.tight_layout()
plt.show()

The derivative of the loss function = the direction of optimization

flowchart LR
L["Loss function L(w)"] --> D["Compute derivative dL/dw"]
D --> U["Update parameters<br/>w = w - lr × dL/dw"]
U --> L
style L fill:#ffebee,stroke:#c62828,color:#333
style D fill:#e3f2fd,stroke:#1565c0,color:#333
style U fill:#e8f5e9,stroke:#2e7d32,color:#333

A derivative tells you: which direction should the parameters move so that the loss becomes smaller. This is the core idea of gradient descent (we’ll explain it in detail in the next subsection).
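
The update rule w = w - lr × dL/dw can be tried on a one-parameter toy problem. This is a minimal sketch, assuming a hypothetical loss L(w) = (w - 3)² whose minimum is at w = 3; the starting point, learning rate, and number of steps are arbitrary choices:

```python
def loss(w):
    """Hypothetical loss with its minimum at w = 3."""
    return (w - 3) ** 2


def loss_derivative(w):
    """dL/dw for the loss above."""
    return 2 * (w - 3)


w = 0.0   # starting guess
lr = 0.1  # learning rate (step size)

for step in range(25):
    grad = loss_derivative(w)
    w = w - lr * grad  # move against the sign of the derivative
    if step % 5 == 0:
        print(f"step {step:2d}: w = {w:.4f}, loss = {loss(w):.6f}")
```

When w is below 3 the derivative is negative, so subtracting it pushes w up. When w is above 3 the derivative is positive, so subtracting it pushes w down. Either way the loss decreases, as long as the learning rate is not too large.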

# Sigmoid function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# ReLU function and its derivative
def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    # The derivative is undefined at x = 0; this code uses 0 there
    return (x > 0).astype(float)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
x = np.linspace(-5, 5, 200)

# Sigmoid
axes[0].plot(x, sigmoid(x), 'steelblue', linewidth=2, label='sigmoid(x)')
axes[0].plot(x, sigmoid_derivative(x), 'coral', linewidth=2, linestyle='--', label="sigmoid'(x)")
axes[0].set_title('Sigmoid and its derivative')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# ReLU
axes[1].plot(x, relu(x), 'steelblue', linewidth=2, label='ReLU(x)')
axes[1].plot(x, relu_derivative(x), 'coral', linewidth=2, linestyle='--', label="ReLU'(x)")
axes[1].set_title('ReLU and its derivative')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

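Problem with the sigmoid derivative: it never exceeds 0.25, and when x is far from 0 it approaches 0. Gradients that pass through many such layers can shrink toward zero (the “vanishing gradient” problem), which is one reason deep networks more often use ReLU.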
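
A rough numerical picture of this effect is sketched below. It is a simplification: it looks only at the sigmoid-derivative factor that each layer contributes and ignores the weights, which also scale the gradient in a real network.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)


# The derivative shrinks quickly as x moves away from 0.
for x in [0, 2, 5, 10]:
    print(f"x={x:>2}: sigmoid'(x) = {sigmoid_derivative(x):.6f}")

# Best case: every layer contributes the maximum factor sigmoid'(0) = 0.25.
for layers in [1, 5, 10]:
    factor = sigmoid_derivative(0) ** layers
    print(f"{layers:>2} layers: sigmoid factor <= {factor:.2e}")
```

Even in the best case, ten stacked sigmoid factors multiply to less than one millionth, so the earliest layers receive almost no update signal. The derivative of ReLU is 1 for every positive input, so it does not shrink the signal in the same way.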


After learning this, what question should you bring to the next section?


After reading about derivatives, the most valuable questions to carry forward are:

  1. If a function has more than one variable, how should its rate of change be represented?
  2. If there are many parameters, in which direction should the model change them together?
  3. Why does “the derivative of one variable” naturally grow into “the gradient of multiple variables”?

These three questions lead naturally to the next topic: gradients and gradient descent.


| Concept | Intuition | Python implementation |
| --- | --- | --- |
| Derivative | The rate of change of a function at a point | (f(x+h) - f(x-h)) / (2h) |
| Tangent slope | Geometric meaning of a derivative | Visualize by drawing a tangent line |
| Common rules | Power, exponential, logarithmic, trigonometric functions | Verify with numerical derivatives |
| Role in AI | Derivatives indicate the direction of optimization | Foundation of gradient descent |

What should you take away from this section?

  • The most important intuition about derivatives is “the current rate of change”
  • Numerical differentiation helps you see change first, instead of forcing you to memorize derivations first
  • The most crucial role of derivatives in AI is telling the model which direction to adjust its parameters

Use the numerical_derivative function to compute the derivative of the following functions at x=2, and compare with the exact values:

  1. f(x) = 3x² + 2x - 1
  2. f(x) = 1/x
  3. f(x) = x × sin(x)

Plot f(x) = x³ - 3x and its derivative f’(x) = 3x² - 3 over the range [-3, 3]. Observe: where f’(x) = 0 (x = ±1), what feature of f(x) does that correspond to?

Plot the derivative of Sigmoid, find the maximum value of the derivative and where it occurs. Explain why this leads to the “vanishing gradient” problem.

Reference implementation and walkthrough
  • At x=2, the derivatives are 14 for 3x² + 2x - 1, -0.25 for 1/x, and sin(2) + 2cos(2) ≈ 0.0770 for x × sin(x).
  • For f(x) = x³ - 3x, the derivative is zero at x = -1 and x = 1, corresponding to a local maximum and a local minimum on the plotted curve.
  • The sigmoid derivative is largest at x=0, with value 0.25. Far from zero, the derivative gets close to zero, which makes gradient-based updates very small.
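
One possible way to check the first exercise is sketched below. It redefines the central-difference helper so that the snippet runs on its own; the step size h=1e-5 and the `results` dictionary are choices made for this sketch.

```python
import math


def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Each entry: (label, function, hand-derived exact derivative)
exercises = [
    ("3x^2 + 2x - 1", lambda x: 3 * x ** 2 + 2 * x - 1, lambda x: 6 * x + 2),
    ("1/x", lambda x: 1 / x, lambda x: -1 / x ** 2),
    ("x * sin(x)", lambda x: x * math.sin(x), lambda x: math.sin(x) + x * math.cos(x)),
]

x0 = 2.0
results = {}
for name, f, f_prime in exercises:
    results[name] = (numerical_derivative(f, x0), f_prime(x0))
    numerical, exact = results[name]
    print(f"{name:<15} numerical={numerical:.6f}  exact={exact:.6f}")
```

The exact column uses the power rule for the first function, the power rule with n = -1 for the second, and the product rule for the third.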