Interactive Timeline

Why this model existed, and what broke before it

This page follows the same chain for every model: the practical model first, then the visual intuition, then the math underneath. Click any node to open its model page. The first live demo is linear regression, and the rest of the timeline is already linked, so the history can grow without dead ends.

Flow

I use this when...

Why it exists, what failed before it, what changed after it.

See it move, inspect the math, then jump to the wiki.

Status

3 live demos, 15 planned nodes.

Timeline


1805

Statistics · live demo

Least Squares

Linear Regression

Turn a cloud of noisy points into a predictive line by minimizing squared error.
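A minimal sketch of the least-squares fit for a single input feature, in plain Python. The function names and the five sample points are made up for illustration. The slope is the covariance of x and y divided by the variance of x, which is exactly the line that minimizes the sum of squared errors.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = cov(x, y) / var(x); this choice minimizes the sum of squared errors.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best-fit line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """The quantity least squares minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
slope, intercept = fit_line(xs, ys)  # slope ≈ 1.96, intercept ≈ 1.10
```

Nudging either parameter away from the fitted values only increases `sum_squared_error`, which is the whole idea in one check.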

Prediction was not enough. The next question was how to compress high-dimensional data without losing what matters.

1901

Statistics · roadmap

PCA

Project to directions of variance

Principal Component Analysis compresses high-dimensional data by projecting it onto the directions that preserve the most variance.
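A rough sketch of the first principal component for 2-D points, assuming plain Python and power iteration on the 2x2 covariance matrix (real libraries usually use an SVD instead). The point set and function names are hypothetical. Projecting onto the returned direction turns each 2-D point into a single number, which is the compression step.

```python
import math


def first_principal_component(points, iterations=100):
    """Direction of maximum variance for 2-D points, found by power iteration."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    # Power iteration: multiply by the covariance matrix, renormalize, repeat.
    vx, vy = 1.0, 0.0
    for _ in range(iterations):
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
    return (vx, vy), centered


def project(centered, direction):
    """1-D coordinate of each centered point along a direction (the compressed data)."""
    dx, dy = direction
    return [x * dx + y * dy for x, y in centered]


points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
direction, centered = first_principal_component(points)
scores = project(centered, direction)
```

The data lies close to the diagonal, so the component comes out close to (0.71, 0.71), and the variance of `scores` is larger than along any axis.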

Compression and visualization mattered, but many unlabeled problems still needed a way to discover groups without any targets.

1957

Classical ML · live demo

k-Means

Cluster around centroids

k-Means finds groups in unlabeled data by alternating between cluster assignment and centroid updates.
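A minimal sketch of the two alternating steps (Lloyd's algorithm) for 2-D points, assuming the starting centroids are passed in by hand rather than chosen randomly. The data and names are illustrative only.

```python
def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    labels = []
    for p in points:
        dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def kmeans(points, centroids, iterations=20):
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid where it was
        if new_centroids == centroids:
            break  # nothing moved, so assignments cannot change either
        centroids = new_centroids
    return centroids, assign(points, centroids)


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

No labels are used anywhere: the two groups fall out of the geometry alone.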

Unsupervised structure discovery mattered even before neural networks, especially when labels did not exist at all.

1967

Classical ML · roadmap

k-NN

Classify by neighborhood

k-Nearest Neighbors skips parameter fitting and predicts from the labels of nearby examples instead.
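A minimal sketch of k-NN classification with Euclidean distance and a majority vote. The training points, labels, and `knn_predict` name are hypothetical. Note that there is no training step at all: the stored examples are the model.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training examples closest to the query."""
    def sq_dist(item):
        (x, y), _ = item
        return (x - query[0]) ** 2 + (y - query[1]) ** 2

    nearest = sorted(train, key=sq_dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((1.0, 1.0), "red"), ((1.2, 0.8), "red"), ((0.9, 1.3), "red"),
         ((5.0, 5.0), "blue"), ((5.2, 4.8), "blue"), ((4.8, 5.3), "blue")]
```

An odd `k` avoids ties when there are two classes.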

Local voting was intuitive, but later models tried to learn more global representations and decision rules from data.

1986

Deep Learning · roadmap

Backpropagation

MLP and chain rule

Backprop made deep networks trainable by pushing error signals backward through each layer.
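A minimal sketch of backpropagation through a hypothetical one-input, one-hidden-unit network (tanh hidden layer, linear output, squared-error loss). The parameter values are arbitrary. The hand-derived chain-rule gradients are checked against finite differences, which is the standard sanity test for backprop code.

```python
import math


def forward(params, x):
    w1, b1, w2, b2 = params
    z = w1 * x + b1
    h = math.tanh(z)
    y_hat = w2 * h + b2
    return z, h, y_hat


def loss(params, x, y):
    return 0.5 * (forward(params, x)[2] - y) ** 2


def backprop(params, x, y):
    """Gradients w.r.t. (w1, b1, w2, b2), pushed backward layer by layer."""
    _, _, w2, _ = params
    _, h, y_hat = forward(params, x)
    d_yhat = y_hat - y            # dL/dy_hat
    d_w2 = d_yhat * h             # output layer weights
    d_b2 = d_yhat
    d_h = d_yhat * w2             # error signal sent back to the hidden unit
    d_z = d_h * (1.0 - h * h)     # tanh'(z) = 1 - tanh(z)^2
    d_w1 = d_z * x
    d_b1 = d_z
    return [d_w1, d_b1, d_w2, d_b2]


def numerical_gradient(params, x, y, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, y) - loss(down, x, y)) / (2 * eps))
    return grads


params = [0.5, -0.1, 1.5, 0.2]
analytic = backprop(params, x=0.8, y=1.0)
numeric = numerical_gradient(params, x=0.8, y=1.0)
```

One backward pass gives every gradient at roughly the cost of a forward pass, while finite differences need two forward passes per parameter; that gap is what made deep networks trainable.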

Neural nets came back, but many teams still needed models that were easier to explain and debug.

1986

Classical ML · roadmap

Decision Trees

Split by questions

Instead of weights, learn a sequence of human-readable if/then splits that reduce uncertainty.
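A minimal sketch of how one split is chosen, assuming Gini impurity as the uncertainty measure (CART uses Gini; ID3 uses entropy). The ages and labels are hypothetical, and thresholds are taken at observed values rather than midpoints for simplicity. A full tree just applies this greedily and recursively to each child.

```python
def gini(labels):
    """Gini impurity: chance that two random draws (with replacement) disagree."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, labels):
    """Try every threshold on one numeric feature; keep the lowest weighted impurity."""
    n = len(labels)
    best = (None, gini(labels))  # (threshold, impurity): no split unless it beats the parent
    for threshold in sorted(set(xs)):
        left = [lab for x, lab in zip(xs, labels) if x <= threshold]
        right = [lab for x, lab in zip(xs, labels) if x > threshold]
        if not left or not right:
            continue
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


ages = [22, 25, 30, 35, 40, 52, 60]
bought = ["no", "no", "no", "yes", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)  # the learned question: "age <= threshold?"
```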

Greedy rules were practical, but researchers wanted stronger geometry and cleaner optimization theory.

1989

Classical ML · roadmap

Q-Learning

Learn from reward, not labels

Q-Learning estimates how valuable each action is in each state and improves behavior through trial, error, and delayed reward.
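A minimal tabular sketch on a hypothetical five-state corridor where the only reward sits at the right end. The environment, `step`, and the hyperparameters (alpha, gamma, epsilon, episode count) are assumptions for illustration. The key line is the update toward `reward + gamma * max(q[next_state])`, which lets a delayed reward flow backward one state at a time.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal and ends the episode
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(state, action):
    """Hypothetical toy environment: reward 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
greedy = [max(range(2), key=lambda i: q[s][i]) for s in range(N_STATES - 1)]
```

No state is ever labeled with the right answer; the preference for moving right is learned purely from the reward at the end.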

Reinforcement learning opened a second path beyond labeled supervision, while classical ML still pushed toward cleaner geometric classifiers.

1992

Classical ML · roadmap

SVM

Maximum margin geometry

Support Vector Machines look for the widest possible separating boundary, not just any boundary.
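This sketch does not train an SVM (that is a quadratic optimization problem); it only scores hand-picked candidate boundaries by the quantity an SVM maximizes, the geometric margin: the smallest distance from any training point to the boundary. The points, candidates, and function name are made up for illustration. The third candidate is the perpendicular bisector of the closest opposite-class pair, worked out by hand for this toy data.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from a point to the line w . x + b = 0.
    Positive only if every point lies on its correct side."""
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm
               for x, y in zip(points, labels))


points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, +1, +1]

# Every candidate separates the two classes; they differ only in margin.
candidates = {
    "vertical x1 = 3": ((1.0, 0.0), -3.0),
    "diagonal x1 + x2 = 5.5": ((1.0, 1.0), -5.5),
    # Perpendicular bisector of the closest opposite pair (2, 1) and (4, 4).
    "bisector 2*x1 + 3*x2 = 13.5": ((2.0, 3.0), -13.5),
}
margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
widest = max(margins, key=margins.get)
```

All three boundaries classify the training data perfectly, but the bisector leaves the widest gap (about 1.80 versus 1.00 for the vertical line), which is the boundary an SVM would pick here.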

Classical ML sharpened geometry, but sequence problems still needed models that could carry context over time.

1997

Deep Learning · roadmap

LSTM

Sequence memory with gates

LSTM made recurrent networks much better at carrying information across long sequences by using gates to control what gets written to memory, what is kept or forgotten, and what is read out.
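A minimal sketch of one step of a scalar LSTM cell in the common modern form, which includes a forget gate (a later addition to the 1997 design). The parameter names (`wf`, `ui`, `bo`, ...) and the `make_params` / `run` helpers are hypothetical. The cell state update `c = f * c_prev + i * g` is additive, which is what lets information survive many steps.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps names like 'wf', 'uf', 'bf' to numbers."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])     # forget gate: keep old memory?
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])     # input gate: write new content?
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])   # candidate content
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])     # output gate: expose memory?
    c = f * c_prev + i * g      # additive memory update
    h = o * math.tanh(c)
    return h, c


def make_params(**biases):
    """All weights zero, so only the given gate biases matter."""
    p = {f"{kind}{gate}": 0.0 for kind in ("w", "u", "b") for gate in "figo"}
    p.update(biases)
    return p


def run(p, steps=50, c0=1.0):
    h, c = 0.0, c0
    for _ in range(steps):
        h, c = lstm_step(0.0, h, c, p)
    return h, c
```

With the forget gate held open and the input gate closed, a stored value survives 50 steps almost untouched; flip the forget bias and it is wiped after one step.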

Sequence models finally had memory, while tree ensembles were about to become the strongest default weapon for structured tabular problems.

1999

Classical ML · roadmap

Gradient Boosting

Fix the previous tree

Gradient boosting builds trees sequentially so each new learner focuses on the residual mistakes of the current ensemble.
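A minimal sketch of gradient boosting for regression with squared loss, where each weak learner is a one-split regression stump (a depth-1 tree) fit to the current residuals. The data, learning rate, and helper names are assumptions for illustration. For squared loss the negative gradient is exactly the residual, so "fit the gradient" and "fix the previous mistakes" coincide.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump for the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:          # skip the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    base = sum(ys) / len(ys)                # start from the mean prediction
    stumps = []

    def predict(x):
        return base + sum(learning_rate * s(x) for s in stumps)

    for _ in range(rounds):
        # Squared loss: negative gradient = residual = y - current prediction.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.8, 4.1, 4.0, 7.2, 6.9]
```

The learning rate shrinks each correction so no single tree dominates; more rounds keep driving the training error down.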

Structured data kept rewarding smarter tree ensembles, while deep learning was about to explode in vision at scale.

2012

Deep Learning · roadmap

AlexNet

Deep learning wins at scale

Convolutional nets plus GPU training won the 2012 ImageNet challenge (ILSVRC) by a wide margin and reset the field.

Deep learning was now winning perception, and the next question was whether neural nets could generate convincing new data from scratch.

2014

Deep Learning · roadmap

GAN

Generate by adversarial play

GANs trained a generator and discriminator in opposition, making neural generation vivid but notoriously unstable.
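A minimal sketch of the two opposing objectives, written over discriminator output probabilities rather than real networks (the networks and their gradient steps are left out). The generator loss shown is the non-saturating form, which the original GAN paper suggested as a practical alternative to minimizing log(1 - D(G(z))). Function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(real) toward 1 and D(fake) toward 0.
    Inputs are lists of discriminator probabilities in (0, 1)."""
    real = -sum(math.log(p) for p in d_real) / len(d_real)
    fake = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real + fake


def generator_loss(d_fake):
    """Non-saturating generator loss: small when D calls the fakes real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)
```

Training alternates a step that lowers `discriminator_loss` with a step that lowers `generator_loss`; each step changes the other player's target, which is one source of the instability. When the discriminator is reduced to guessing 0.5 everywhere, its loss is 2 ln 2.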

Adversarial generation was powerful, but sequence modeling was about to be reorganized around attention instead of recurrence.

2017

LLM Era · roadmap

Transformer

Attention replaces recurrence

Self-attention made long-range context easier to model and training much more parallel.
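A minimal sketch of scaled dot-product attention over small Python lists (no batching, masking, or multiple heads). The function names and inputs are hypothetical. Each query scores every key, turns the scores into softmax weights, and returns a weighted average of the values; no query depends on another query's result, which is why all positions can be computed in parallel.

```python
import math


def softmax(scores):
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """softmax(q . k / sqrt(d)) weighted average of the values, for each query."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[col] for wj, v in zip(w, values))
                        for col in range(len(values[0]))])
    return outputs, weights


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
outputs, weights = attention([[3.0, -3.0]], keys, values)
```

Any two positions are one step apart here, regardless of how far apart they sit in the sequence, unlike a recurrent model that has to pass information through every step in between.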

Once attention scaled, the next move was to pretrain giant language models and reuse them everywhere.

2018

LLM Era · roadmap

BERT

Pretrain understanding

Bidirectional pretraining changed NLP from task-specific models to one large reusable foundation.
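A minimal sketch of BERT-style masked-language-model input corruption, following the split from the BERT paper: about 15% of tokens are selected for prediction, and of those 80% become `[MASK]`, 10% become a random token, and 10% are left unchanged. The function name and word-level tokens are simplifications (BERT uses subword tokens). Because the model must recover each selected token from context on both sides, the learned representation is bidirectional.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (corrupted inputs, targets); a target is None where nothing is predicted."""
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < select_prob:
            targets.append(tok)                  # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)              # 80%: hide it
            elif roll < 0.9:
                inputs.append(rng.choice(vocab)) # 10%: swap in a random token
            else:
                inputs.append(tok)               # 10%: leave it as is
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets
```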

Understanding was powerful, but generation at scale ended up reshaping the user interface of AI.

2020

LLM Era · roadmap

GPT

Next-token prediction at scale

Scaling one simple objective, predicting the next token, over huge text corpora produced flexible general-purpose behavior (GPT-3, 2020).
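A toy illustration of the next-token objective. GPT computes the next-token distribution with a Transformer; here a hypothetical count-based bigram table with add-one smoothing stands in, purely to show what "average cross-entropy of the next token" means. The tiny corpus is made up.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens, vocab, smoothing=1.0):
    """Count-based stand-in for a language model: P(next | current)."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1

    def prob(cur, nxt):
        total = sum(counts[cur].values()) + smoothing * len(vocab)
        return (counts[cur][nxt] + smoothing) / total

    return prob


def next_token_loss(prob, tokens):
    """Average cross-entropy (in nats) of predicting each token from the previous one."""
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log(prob(cur, nxt)) for cur, nxt in pairs) / len(pairs)


text = "the cat sat on the mat the cat sat on the mat".split()
vocab = sorted(set(text))
prob = train_bigram(text, vocab)
trained = next_token_loss(prob, text)
uniform = math.log(len(vocab))  # loss of a model that guesses uniformly
```

The trained table beats uniform guessing on this loss; a GPT-style model lowers the same number, but conditions on the whole preceding context instead of one token.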

Language took off first. Generative image models soon followed with a very different training story.

2020

LLM Era · roadmap

Diffusion

Generate by denoising

Learn to reverse a gradual noising process step by step, then start from pure noise and run that learned reverse process to generate images.
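A minimal sketch of the forward (noising) half of a DDPM-style diffusion model, assuming the linear beta schedule from 1e-4 to 0.02 over 1000 steps used in the DDPM paper. Only the fixed forward process is shown; the learned part, a network trained to predict `noise` from `(xt, t)`, is omitted. Helper names are hypothetical.

```python
import math
import random


def linear_betas(steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def alpha_bars(betas):
    """alpha_bar_t = prod over s <= t of (1 - beta_s): the signal fraction left at step t."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def noisy_sample(x0, t, a_bar, rng):
    """Jump straight to step t: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar[t]) * x + math.sqrt(1.0 - a_bar[t]) * n
          for x, n in zip(x0, noise)]
    return xt, noise  # a denoiser would be trained to predict `noise` from (xt, t)


betas = linear_betas()
a_bar = alpha_bars(betas)
```

By the last step almost no signal remains, so generation can start from pure Gaussian noise and apply the learned denoiser in reverse.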

Generation spread beyond text, but language models still needed preference shaping to become useful assistants.