What Is AI/ML?
Machine Learning is a subset of AI where systems learn patterns from data instead of being explicitly programmed. Deep Learning is a subset of ML using neural networks with many layers.
AI (Artificial Intelligence)
└── ML (Machine Learning)
    ├── Supervised Learning           labeled data → predict
    │   ├── Regression                predict a number
    │   └── Classification            predict a category
    ├── Unsupervised Learning         no labels → find structure
    │   ├── Clustering                group similar data
    │   └── Dimensionality Reduction  compress data
    ├── Deep Learning                 neural networks with depth
    │   ├── CNN                       images
    │   ├── RNN / LSTM                sequences
    │   ├── Transformer               attention-based (LLMs)
    │   ├── GAN                       generate data
    │   └── Diffusion                 generate images
    └── Reinforcement Learning        learn by trial and error
        ├── Q-Learning                value-based
        ├── Policy Gradient           policy-based
        └── RLHF                      align AI with humans
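Here is a toy contrast that makes "learn patterns from data instead of being explicitly programmed" concrete. It is a minimal Python sketch. `rule_based` hard-codes a cutoff that a programmer chose. The hypothetical helper `learn_threshold` instead picks the cutoff that makes the fewest mistakes on labeled examples. The data is made up for illustration.

```python
def rule_based(x: float) -> int:
    """Explicitly programmed: a human chose the cutoff 5.0."""
    return 1 if x > 5.0 else 0


def learn_threshold(xs: list[float], ys: list[int]) -> float:
    """Learned from data: try each observed x as a cutoff t and keep the one
    with the fewest training errors for the rule `1 if x > t else 0`."""
    best_t, best_err = xs[0], len(xs) + 1
    for t in sorted(xs):
        errors = sum((1 if x > t else 0) != y for x, y in zip(xs, ys))
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t


xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]  # toy inputs
ys = [0, 0, 0, 1, 1, 1]              # toy labels
t = learn_threshold(xs, ys)
print(t)  # 3.0
```

Change the labels and the learned cutoff moves with them, while the hand-written rule stays fixed. The same idea, scaled up to millions of parameters, is what every model in the tree does.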
The Historical Arc
Every model exists because the previous one had a limitation. See Timeline for the full story.
Statistics (1800s)
→ Perceptron (1958) — "mimic the brain"
→ AI Winter (1970s) — "it doesn't work"
→ Decision Trees, SVM (1980s-90s) — "try something else"
→ Backprop revival (1986) — "neural nets can learn"
→ Deep Learning (2012) — "GPUs change everything"
→ Transformer (2017) — "attention is all you need"
→ LLMs (2020s) — "scale is all you need"
Learning Path
Start from what you use, then drill down to why it works:
- I use it — What does this model do in practice?
- Why it exists — What problem did it solve that others couldn't?
- Visualize — Interactive demo, see it work
- Code — Implement it yourself
- Math inside — The equation it optimizes
- Math foundations — The prerequisite knowledge
Topics
Supervised Learning
- Linear Regression — Where it all began
- Polynomial / Ridge / Lasso — Bend the line when straight isn't enough, then penalize it so it doesn't overfit
- Logistic Regression — Classification with probabilities
- k-NN — Simplest classifier: follow your neighbors
- Naive Bayes — Bayes' theorem in action
- Decision Tree — Human-readable rules
- Random Forest — Wisdom of crowds
- Gradient Boosting / XGBoost — Kaggle's weapon of choice
- SVM — The maximum-margin boundary
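Linear Regression, the first entry above, has a closed-form answer when there is a single input feature. The slope is the covariance of x and y divided by the variance of x, and the fitted line passes through the point of means. Below is a minimal plain-Python sketch that assumes squared-error loss (ordinary least squares). `fit_line` and the data are illustrative.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y ≈ slope * x + intercept (one feature).
    Needs at least two distinct x values, otherwise var is zero."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator and denominator of cov(x, y) / var(x); the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1, no noise
print(fit_line(xs, ys))  # (2.0, 1.0)
```

With noisy data the same formula returns the line with the smallest total squared error. Polynomial, Ridge, and Lasso regression build on this least-squares fit.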
Unsupervised Learning
- k-Means — Find groups without labels
- Hierarchical Clustering — Tree-shaped grouping
- DBSCAN — Density-based clustering
- GMM — Probabilistic clustering
- PCA — Compress dimensions
- t-SNE — Visualize high dimensions
- UMAP — Faster than t-SNE, often better at keeping global structure
- Autoencoder — Neural compression
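k-Means, the first entry above, repeats two steps until nothing changes (Lloyd's algorithm). First it assigns each point to its nearest centroid. Then it moves each centroid to the mean of its points. This is a minimal sketch. It assumes Euclidean distance, initial centroids drawn at random from the data, and a fixed seed. The toy points are made up.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments won't change
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

No labels are involved: the two groups emerge from distances alone. On messier data the result depends on the starting centroids. That is why practical implementations usually restart several times or use a smarter initialization such as k-means++.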
Deep Learning
- Perceptron — The first neuron
- MLP & Backpropagation — How networks learn their weights
- CNN — See like a machine
- RNN / LSTM / GRU — Remember sequences
- Transformer — Attention is all you need
- GAN — Create by competing
- Diffusion Models — Create from noise
- AlexNet → ResNet — The ImageNet revolution
- BERT — Understanding language
- GPT — Generating language
- ViT — Transformer meets vision
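The Perceptron, the first entry above, learns a linear boundary with a single rule. Whenever it misclassifies an example, it adds that example's inputs to the weights or subtracts them. Below is a minimal plain-Python sketch of Rosenblatt's learning rule, trained on the logical AND function. AND was chosen as a toy dataset because it is linearly separable. `predict` and `train_perceptron` are illustrative names.

```python
def predict(w, b, x):
    """Step activation: fire (1) if the weighted sum is positive."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Rosenblatt's rule: nudge weights toward each misclassified example."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(w, b, x)  # -1, 0, or +1
            if error:
                w[0] += lr * error * x[0]
                w[1] += lr * error * x[1]
                b += lr * error
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: done
            break
    return w, b


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

Swap in XOR targets and the loop never reaches a mistake-free pass, because no single straight line separates XOR's classes. That limitation was highlighted in the late 1960s. Multi-layer networks trained with backpropagation overcome it.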
Reinforcement Learning
- Q-Learning — Learn by reward
- DQN — Q-Learning + Neural Network
- Policy Gradient / PPO — Learn the strategy directly
- RLHF — Aligning AI with humans
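Q-Learning, the first entry above, keeps a table of estimated returns Q(s, a). After every step it nudges the entry it just used toward r + γ·max Q(s′, a′). This is a minimal sketch on a hypothetical five-cell corridor where only reaching the right end pays a reward. The environment, `step`, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions, not tuned values.

```python
import random

N_STATES = 5          # hypothetical corridor cells 0..4; cell 4 is the goal
MOVES = (-1, +1)      # action 0 = step left, action 1 = step right


def step(state, action):
    """Hypothetical environment: reward 1.0 only on reaching the goal."""
    nxt = min(max(state + MOVES[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random, otherwise take a best action
            # (ties broken at random so the untrained agent still wanders).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)  # ['right', 'right', 'right', 'right']
```

The agent is never told that "right" is correct. The reward at the goal propagates backward through the table, discounted by `gamma` at each step. DQN keeps this same update but replaces the table with a neural network, which is what makes large state spaces tractable.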
Foundations (Cross-cutting)
- Bias-Variance Tradeoff
- Loss Functions
- Optimization (SGD → Adam)
- Regularization
- Model Evaluation
- Ensemble Methods
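Several of the foundations above meet in one loop. There is a loss function (mean squared error), an optimizer (plain full-batch gradient descent), and a model (a line y = w·x + b). This is a minimal sketch; the learning rate and step count are illustrative assumptions. SGD would compute each gradient on a random mini-batch instead of the full dataset. Adam would additionally adapt the step size per parameter.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize mean squared error L(w, b) = mean((w*x + b - y)^2)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of L with respect to w and b.
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        # Step downhill, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```

Raising `lr` far enough (for this data, past about 0.24) makes the steps overshoot, so the loss blows up instead of shrinking. Adding a penalty such as `lam * w**2` to the loss turns this into the regularized fit behind Ridge regression.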