I Use This When...
I need to answer, "if I change this parameter a tiny bit, what happens?" That question shows up everywhere in ML because learning is just repeated parameter updates.
Why It Exists
The "why" chain is:
- A model has parameters.
- Parameters affect predictions.
- Predictions affect loss.
- To reduce loss, we need local sensitivity.
- The derivative is that local sensitivity.
In one dimension you get a derivative. In many dimensions you get a gradient vector. That vector points in the direction of steepest increase, so an optimizer that wants to reduce loss steps in the opposite direction.
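To make the "move the other way" idea concrete, here is a minimal sketch on the one-dimensional function f(x) = x^2. The starting point, the `learning_rate` value, and the names `f`, `df`, and `history` are illustrative assumptions, not anything taken from the demo.

```python
def f(x):
    # Toy "loss": a simple bowl with its minimum at x = 0.
    return x ** 2


def df(x):
    # Derivative of x^2.
    return 2 * x


x = 3.0               # assumed starting point
learning_rate = 0.1   # assumed step size, chosen only for illustration
history = [x]

for _ in range(5):
    # The derivative points uphill, so subtract it to go downhill.
    x = x - learning_rate * df(x)
    history.append(x)

print(history)
```

With these particular choices every step multiplies x by 0.8, so the values shrink toward the minimum at 0 and f(x) gets smaller each time.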
Visual Intuition
For a 1D curve, the derivative is the slope of the tangent line at one point. Positive means the curve is rising to the right. Negative means it is falling.
For a 2D or higher-dimensional surface, you cannot summarize local change with one number. You need one partial derivative per axis. Stack those together and you get the gradient vector.
In the linear regression demo, the two axes of that parameter space are w and b. The gradient tells you how changing each one will change loss.
-> Interactive Demo: Linear Regression
The Math Inside
For a scalar function f(x), the derivative is:
f'(x) = lim(h -> 0) (f(x + h) - f(x)) / h
This is the instantaneous rate of change.
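The limit can be approximated numerically by picking a small but nonzero h. The sketch below is a forward-difference approximation, not an exact derivative. The helper names `forward_difference` and `square` and the chosen values of h are assumptions made for illustration.

```python
def forward_difference(f, x, h=1e-6):
    # Same quotient as in the limit definition, with a fixed small h.
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


# Shrinking h moves the estimate toward the true slope at x = 3, which is 6.
for h in (1e-1, 1e-3, 1e-6):
    print(h, forward_difference(square, 3.0, h))
```

For this function the quotient works out to 6 + h, so the estimate approaches 6 as h shrinks. In floating point, making h extremely small eventually hurts accuracy because of rounding error.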
For a multivariable function f(x_1, x_2, ..., x_n), the gradient is:
grad(f) = [df/dx_1, df/dx_2, ..., df/dx_n]
Each component answers a separate question:
- if only x_1 changes, how fast does f change?
- if only x_2 changes, how fast does f change?
- and so on
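The "one question per axis" idea can be sketched directly: nudge one coordinate at a time, hold the others fixed, and record how much f moves. This is a rough numerical approximation under assumed names (`numerical_gradient`, `bowl`) and an assumed step h, not how ML libraries compute gradients in practice.

```python
def numerical_gradient(f, point, h=1e-6):
    # Approximate each partial derivative with a forward difference.
    base = f(point)
    grad = []
    for i in range(len(point)):
        shifted = list(point)
        shifted[i] += h          # change only coordinate i
        grad.append((f(shifted) - base) / h)
    return grad


def bowl(p):
    # Example function f(x_1, x_2) = x_1^2 + 3 * x_2^2
    x1, x2 = p
    return x1 ** 2 + 3 * x2 ** 2


approx = numerical_gradient(bowl, [1.0, 2.0])
print(approx)  # close to the exact gradient [2, 12]
```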
For linear regression, the loss is a function of w and b:
L(w, b) = (1/n) * sum((y_i - (w * x_i + b))^2)
So the gradient is:
grad(L) = [dL/dw, dL/db]
That is the object used by gradient descent.
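Applying the chain rule to the loss above gives dL/dw = (-2/n) * sum(x_i * (y_i - (w * x_i + b))) and dL/db = (-2/n) * sum(y_i - (w * x_i + b)). The sketch below computes both for a tiny made-up dataset. The function names and the data are assumptions for illustration and are not taken from the demo.

```python
def mse_loss(w, b, xs, ys):
    # L(w, b) = (1/n) * sum((y_i - (w * x_i + b))^2)
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n


def mse_gradient(w, b, xs, ys):
    # Returns (dL/dw, dL/db) from the closed-form partial derivatives.
    n = len(xs)
    residuals = [y - (w * x + b) for x, y in zip(xs, ys)]
    dL_dw = (-2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
    dL_db = (-2.0 / n) * sum(residuals)
    return dL_dw, dL_db


# Made-up data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

grad_at_zero = mse_gradient(0.0, 0.0, xs, ys)  # far from the fit
grad_at_fit = mse_gradient(2.0, 1.0, xs, ys)   # at the exact fit

print(grad_at_zero)
print(grad_at_fit)
```

At w = 0, b = 0 both components are negative, which says that increasing w and b would lower the loss. At the exact fit every residual is zero, so the gradient vanishes.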
Examples
If f(x) = x^2, then f'(x) = 2x.
- at x = 3, the slope is 6
- at x = 0, the slope is 0
- at x = -2, the slope is -4
So the derivative tells you both direction and steepness: the sign says whether the curve is rising or falling, and the magnitude says how fast.
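These three slopes can be sanity-checked numerically. The sketch below uses a central difference, which is one common choice and an assumption here. The names `central_difference`, `square`, and `slopes` are illustrative.

```python
def central_difference(f, x, h=1e-5):
    # Symmetric estimate of the slope: look h to the right and h to the left.
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    return x ** 2


slopes = {x: central_difference(square, x) for x in (3.0, 0.0, -2.0)}
for x, slope in slopes.items():
    print(x, slope)
```

The estimates land very close to 6, 0, and -4. For a quadratic the central difference is exact apart from floating-point rounding, which makes it a convenient check on a hand-derived formula.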
Code
def derivative_of_square(x):
    # f(x) = x^2, so f'(x) = 2x
    return 2 * x

def gradient_of_quadratic(x, y):
    # f(x, y) = x^2 + 3y^2, so df/dx = 2x and df/dy = 6y
    return (2 * x, 6 * y)
Used In
- Gradient Descent — Following the gradient
- MLP & Backprop — Computing gradients
- Partial Derivatives — Gradients are vectors of partials