Statistics · 1805
Least Squares
Linear Regression
Turn a cloud of noisy points into a predictive line by minimizing squared error.
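Gradient descent finds this line iteratively. For a single input variable, the least-squares line also has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept places the line through the point of means. Below is a minimal Python sketch. The function name `fit_least_squares` is illustrative and not part of the demo.

```python
def fit_least_squares(xs, ys):
    """Closed-form least-squares fit of y ≈ slope * x + intercept.

    Returns the (slope, intercept) pair that minimizes the mean squared error.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two points, with as many y values as x values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered cross-products and squares; any 1/n factor would cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical points lying exactly on y = 2x + 1.
    print(fit_least_squares([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```

Python 3.10 and later also provide `statistics.linear_regression(x, y)`, which returns the same slope and intercept for ordinary least squares.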
Linear Regression Demo
Fit by hand with the slope and intercept sliders, then let gradient descent do the same job from a deliberately bad start. The blue squares are the squared errors gradient descent is shrinking.
Current line (step 0 of 80): y = -0.70x + 7.80
Loss (MSE): 13.842
Loss drop: 0.000
Fit by hand
Slope: -0.70 (slider range -2.0 to 2.0)
Intercept: 7.80 (slider range -2.0 to 10.0)
Or let gradient descent fit it
Learning rate: 0.030 (smaller = safer)
Steps: 80 (more steps = closer fit)
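The gradient descent run can be sketched in a few lines of Python. This is a minimal batch version that assumes the demo updates both parameters from all points at every step. The function names and the sample points are hypothetical. The defaults mirror the demo's starting line y = -0.70x + 7.80, its learning rate of 0.030, and its 80 steps. The function records the loss before the first step and after each step, which is what a loss curve plots.

```python
def mse(slope, intercept, xs, ys):
    """Mean squared error of the line y = slope * x + intercept over the points."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, slope=-0.70, intercept=7.80, learning_rate=0.03, steps=80):
    """Batch gradient descent on MSE for a straight line.

    Returns the final (slope, intercept) and the loss history: the starting
    loss followed by the loss after each step (steps + 1 values in total).
    """
    n = len(xs)
    history = [mse(slope, intercept, xs, ys)]
    for _ in range(steps):
        # Residual = prediction minus observation, for every point.
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to each parameter.
        grad_slope = 2 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2 / n * sum(residuals)
        # Move each parameter against its derivative, scaled by the learning rate.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
        history.append(mse(slope, intercept, xs, ys))
    return slope, intercept, history


if __name__ == "__main__":
    # Hypothetical points scattered around y = 2x + 1 (not the demo's data).
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.2, 2.8, 5.1, 7.0]
    for lr in (0.03, 0.3):
        slope, intercept, history = gradient_descent(xs, ys, learning_rate=lr)
        print(f"learning rate {lr}: loss {history[0]:.3f} -> {history[-1]:.3g}")
```

With these points, a learning rate of 0.03 lowers the loss at every step. A rate of 0.3 overshoots, and the loss blows up. This is the sense in which a smaller learning rate is safer. The point where instability begins depends on the spread of the x values, so it is not a fixed number.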
Loss curve
MSE at each step. The y-axis is fixed at the starting loss, so runs with different learning rates are drawn on the same scale and their curves can be told apart.
Why the line moves
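The loss has a partial derivative (a slope of its own) with respect to each parameter: the line's slope and its intercept. If increasing a parameter makes the loss go up, that derivative is positive, so gradient descent moves the parameter in the negative direction instead. The step is the learning rate times the derivative.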
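The rule can be written out for the line y = mx + b. The symbols m and b, the residual r_i, and the learning rate η are notation chosen here, not taken from the demo. With n points, the loss, its partial derivatives, and the update applied at each step are:

```latex
L(m, b) = \frac{1}{n} \sum_{i=1}^{n} r_i^2, \qquad r_i = m x_i + b - y_i

\frac{\partial L}{\partial m} = \frac{2}{n} \sum_{i=1}^{n} r_i x_i, \qquad
\frac{\partial L}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} r_i

m \leftarrow m - \eta \frac{\partial L}{\partial m}, \qquad
b \leftarrow b - \eta \frac{\partial L}{\partial b}
```

For example, if most points sit below the line, most r_i are positive. Then ∂L/∂b > 0, and the update lowers the intercept, moving the line down toward the points.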
Why squares
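Each blue square has a side equal to the point's residual (its vertical distance from the line), so its area is that point's squared error. MSE is the average of those areas. Because area grows with the square of the side, doubling a residual quadruples its penalty, so one big miss can outweigh many small ones. Each point's pull on the gradient also scales with its residual, which is why gradient descent tends to shrink the biggest misses first.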
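The squares-and-areas picture translates directly into code. Below is a minimal sketch with hypothetical helper names and a made-up dataset. Against the line y = x, three points miss by 1 and one misses by 4. The single large miss therefore supplies 16 of the 19 units of squared error, about 84% of the loss.

```python
def residual_squares(slope, intercept, xs, ys):
    """Area of each point's square: its residual (prediction minus observation), squared."""
    return [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]


def mse(slope, intercept, xs, ys):
    """MSE is the average area of those squares."""
    areas = residual_squares(slope, intercept, xs, ys)
    return sum(areas) / len(areas)


def loss_shares(slope, intercept, xs, ys):
    """Fraction of the total squared error contributed by each point."""
    areas = residual_squares(slope, intercept, xs, ys)
    total = sum(areas)
    if total == 0:
        return [0.0] * len(areas)  # perfect fit: no point contributes anything
    return [area / total for area in areas]


if __name__ == "__main__":
    # Hypothetical data and the line y = x: three points miss by 1, one misses by 4.
    xs, ys = [0, 1, 2, 3], [1, 2, 3, 7]
    print(residual_squares(1.0, 0.0, xs, ys))  # [1.0, 1.0, 1.0, 16.0]
    print(mse(1.0, 0.0, xs, ys))  # 4.75
    print(round(loss_shares(1.0, 0.0, xs, ys)[-1], 3))  # 0.842
```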