Statistics · 1805 · interactive demo

Least Squares

Linear Regression

Turn a cloud of noisy points into a predictive line by minimizing squared error.

Linear Regression Demo

Fit by hand with the slope and intercept sliders, then let gradient descent do the same job from a deliberately bad start. The blue squares are the squared errors gradient descent is shrinking.

[Plot: scatter + fitted line; x and y axes each run from 0 to 10]
Current line (step 0/80)

y = -0.70x + 7.80

Loss (MSE): 13.842
Loss drop: 0.000
Fit by hand
Slope: -0.70 (slider range -2.0 to 2.0)
Intercept: 7.80 (slider range -2.0 to 10.0)
Or let gradient descent fit it
Learning rate: 0.030 (smaller = safer)
Steps: 80 (more steps = closer fit)

Loss curve

MSE, with the y-axis fixed to the starting loss (13.840) so different learning rates look different. Lower is better.

Why the line moves

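The loss has a slope (a partial derivative) with respect to each parameter. If increasing a parameter makes the loss go up, that derivative is positive, so gradient descent moves the parameter in the negative direction instead. If increasing it makes the loss go down, the derivative is negative and the parameter moves up. The learning rate sets how big each move is.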
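The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the demo's actual code: the data points are made up, and the function names (`mse_and_gradients`, `gradient_descent`) are hypothetical. Only the starting line (slope -0.70, intercept 7.80), the learning rate 0.03, and the 80 steps come from the demo's displayed settings.

```python
def mse_and_gradients(xs, ys, slope, intercept):
    """Return the MSE and its partial derivatives for the line y = slope*x + intercept."""
    n = len(xs)
    # Residual = prediction minus observation for each point.
    residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
    loss = sum(r * r for r in residuals) / n
    # d(MSE)/d(slope) and d(MSE)/d(intercept).
    d_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
    d_intercept = 2.0 / n * sum(residuals)
    return loss, d_slope, d_intercept


def gradient_descent(xs, ys, slope, intercept, learning_rate=0.03, steps=80):
    """Move each parameter against its derivative, recording the loss at every step."""
    history = []
    for _ in range(steps):
        loss, d_slope, d_intercept = mse_and_gradients(xs, ys, slope, intercept)
        history.append(loss)
        # Positive derivative -> parameter decreases; negative -> it increases.
        slope -= learning_rate * d_slope
        intercept -= learning_rate * d_intercept
    final_loss, _, _ = mse_and_gradients(xs, ys, slope, intercept)
    history.append(final_loss)
    return slope, intercept, history


# Hypothetical data (assumption: the demo's real points are not available).
xs = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
ys = [2.1, 2.9, 4.2, 5.1, 6.8, 8.2]

# Start from the demo's deliberately bad line.
slope, intercept, history = gradient_descent(xs, ys, slope=-0.70, intercept=7.80)
print(f"y = {slope:.2f}x + {intercept:.2f}")
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

On this made-up data a learning rate of 0.03 is small enough that the loss falls at every step. A much larger rate can overshoot and make the loss grow, which is presumably why the demo labels the slider "smaller = safer".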

Why squares

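Each blue square has a side equal to the size of the residual, so its area is the squared error itself. MSE is the average of those areas. Because the error is squared, one big miss costs more than several small misses that add up to the same total error. Big misses therefore dominate the loss, and they tend to be what gradient descent shrinks first.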
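A small numeric check makes the point about big misses concrete. The residual values below are invented for illustration, and `mse_from_residuals` is a hypothetical helper name; both residual lists have the same total absolute error of 4.

```python
def mse_from_residuals(residuals):
    """Average the areas of the squares whose sides are the residuals."""
    areas = [r * r for r in residuals]
    return sum(areas) / len(areas)


# Same total absolute error, distributed differently (made-up values).
one_big_miss = [4.0, 0.0, 0.0, 0.0]
four_small_misses = [1.0, 1.0, 1.0, 1.0]

mse_big = mse_from_residuals(one_big_miss)
mse_small = mse_from_residuals(four_small_misses)

print("one big miss:     ", mse_big)
print("four small misses:", mse_small)
```

The single miss of 4 contributes one square of area 16, while the four misses of 1 contribute four squares of area 1 each, so the first case has four times the MSE.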