Statistics · 1805 · interactive demo live

Least Squares

Linear Regression

Turn a cloud of noisy points into a predictive line by minimizing squared error.

Linear Regression Demo

Fit by hand with the slope and intercept sliders, then let gradient descent do the same job from a deliberately bad start. The blue squares are the squared errors gradient descent is shrinking.
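What the gradient descent button does can be sketched in a few lines of Python. This is a minimal sketch, not the demo's actual code: the data points `XS` and `YS` are made up for illustration (the page does not list its data), and the function names are hypothetical. The starting line, learning rate, and step count are the values shown in the demo's readout.

```python
# Hypothetical noisy points; the demo's real data is not given in the text.
XS = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
YS = [1.2, 2.9, 5.1, 6.8, 9.1, 10.9]


def mse(slope, intercept, xs=XS, ys=YS):
    """Mean of the squared residuals for the line y = slope * x + intercept."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(slope, intercept, xs=XS, ys=YS):
    """Partial derivatives of the MSE with respect to slope and intercept."""
    n = len(xs)
    residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
    d_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
    d_intercept = 2.0 / n * sum(residuals)
    return d_slope, d_intercept


def fit(slope=-0.70, intercept=7.80, learning_rate=0.03, steps=80):
    """Run plain gradient descent and record the loss after every step."""
    history = [mse(slope, intercept)]
    for _ in range(steps):
        d_slope, d_intercept = gradient(slope, intercept)
        # Step against the gradient: a positive derivative lowers the parameter.
        slope -= learning_rate * d_slope
        intercept -= learning_rate * d_intercept
        history.append(mse(slope, intercept))
    return slope, intercept, history


if __name__ == "__main__":
    final_slope, final_intercept, losses = fit()
    print(f"y = {final_slope:.2f}x + {final_intercept:.2f}")
    print(f"loss went from {losses[0]:.3f} to {losses[-1]:.3f}")
```

The `history` list plays the role of the loss curve: on this made-up data, with this learning rate, it falls at every step. A learning rate that is too large for the spread of the x values would make it climb instead.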


Current line (step 0/80): y = -0.70x + 7.80

Loss (MSE): 13.842

Loss drop: 0.000


Fit by hand

Slope: -0.70 (range -2.0 to 2.0)

Intercept: 7.80 (range -2.0 to 10.0)

Or let gradient descent fit it

Learning rate: 0.030 (smaller = safer)

Iterations: 80 (more steps = closer fit)

Loss curve

MSE at each step. The y-axis is fixed to the starting loss, so runs with different learning rates look different.

Why the line moves

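The loss has its own slope (a partial derivative) with respect to each parameter. If increasing a parameter makes the loss go up, that derivative is positive, so gradient descent moves the parameter in the negative direction. If the derivative is negative, it moves the parameter up instead.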
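Written out for a line with slope m and intercept b, the derivatives and the update look like this. The symbols m, b, n (number of points), and η (learning rate) are notation chosen here for illustration, not taken from the demo.

```latex
\[
\begin{aligned}
L(m, b) &= \frac{1}{n} \sum_{i=1}^{n} (m x_i + b - y_i)^2 \\
\frac{\partial L}{\partial m} &= \frac{2}{n} \sum_{i=1}^{n} (m x_i + b - y_i)\, x_i \\
\frac{\partial L}{\partial b} &= \frac{2}{n} \sum_{i=1}^{n} (m x_i + b - y_i) \\
m &\leftarrow m - \eta \, \frac{\partial L}{\partial m}, \qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
\end{aligned}
\]
```

The minus sign in the last line is the whole rule: a positive derivative lowers the parameter, a negative one raises it, and η scales how far each step goes.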

Why squares

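Each blue square has a side equal to the size of the residual, so its area is the squared error itself. MSE is the average of those areas. Because each error is squared, one big miss costs far more than several small misses that add up to the same distance. It also contributes the largest term to the gradient, so it is what gradient descent shrinks first.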
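A quick arithmetic check makes the effect of squaring concrete. The residuals below are made-up numbers, not values from the demo, and the function names are hypothetical. Both sets of points are off by the same total distance (4 units), yet the loss differs by a factor of four.

```python
def square_areas(residuals):
    """Area of each square: the side is the residual, so the area is residual squared."""
    return [r * r for r in residuals]


def mean_squared_error(residuals):
    """MSE is the average of the square areas."""
    areas = square_areas(residuals)
    return sum(areas) / len(areas)


# Made-up residuals for four points; total absolute miss is 4 in both cases.
one_big_miss = [4.0, 0.0, 0.0, 0.0]
four_small_misses = [1.0, 1.0, 1.0, 1.0]

print(mean_squared_error(one_big_miss))       # 4.0
print(mean_squared_error(four_small_misses))  # 1.0
```

The derivative of a squared residual is twice the residual, so the single miss of 4 also pulls on the line four times as hard as any one of the misses of 1.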
