I Use This When...
<!-- Practical use case -->History
Mnih et al. (DeepMind, 2013/2015). Played Atari games from raw pixels. Published in Nature. DeepMind acquired by Google for $500M shortly after.
Why It Exists
Q-table doesn't scale — can't store Q-values for every possible state (e.g., every possible screen pixel combination). Replace the table with a neural network.
How It Works
Visual Intuition
<!-- 3B1B-style animation description -->Step by Step
<!-- Algorithm walkthrough -->Code
# Implementation sketch
The Math Inside
Neural net approximates Q: Q(s,a;θ) ≈ Q*(s,a). Key tricks: experience replay (store transitions, sample randomly) + target network (stabilize training by updating slowly).
Math Prerequisites
<!-- Links to math wiki -->Related
- Q-Learning — The tabular foundation
- Policy Gradient / PPO — Policy-based alternative
- MLP & Backprop — The neural network inside