I Use This When...
I have many features and I want a smaller representation that still preserves most of the structure. PCA is useful for 2D/3D visualization, noise reduction, feature compression, and preprocessing before clustering or other models.
History
Introduced by Karl Pearson (1901) and developed independently by Harold Hotelling (1933). More than 120 years later, it is still one of the most widely used techniques in data science.
Why It Exists
The "why" chain is:
- High-dimensional data is hard to see and hard to compute with.
- Some directions in feature space matter much more than others.
- We want to keep the informative directions and discard the weak ones.
- Variance gives a way to measure which directions are carrying structure.
PCA exists because projection is only useful if we project onto the right axes.
How It Works
Visual Intuition
Imagine a cloud of points stretched diagonally across a plane.
- the original x-axis and y-axis are not the best summary
- PCA rotates to a new axis that follows the longest spread
- projecting onto that axis keeps most of the information with fewer dimensions
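As a rough illustration of the picture above, the sketch below builds a synthetic diagonal cloud and checks that the direction of largest variance lines up with the diagonal. The data, the seed, and the names `first_axis`, `alignment`, and `share` are assumptions made for this example, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Synthetic 2D cloud: large spread along the diagonal, small spread across it.
t = rng.normal(scale=3.0, size=500)       # position along the diagonal
noise = rng.normal(scale=0.3, size=500)   # offset perpendicular to it
X = np.column_stack([t + noise, t - noise])

X_centered = X - X.mean(axis=0)
C = X_centered.T @ X_centered / len(X_centered)

# eigh returns eigenvalues in ascending order, so the last column is the
# direction of largest variance.
eigenvalues, eigenvectors = np.linalg.eigh(C)
first_axis = eigenvectors[:, -1]

# Up to sign, first_axis should be close to the diagonal [1, 1] / sqrt(2).
diagonal = np.array([1.0, 1.0]) / np.sqrt(2.0)
alignment = abs(first_axis @ diagonal)

# Fraction of the total variance that lies along that single axis.
share = eigenvalues[-1] / eigenvalues.sum()
print(f"alignment with diagonal: {alignment:.4f}, variance share: {share:.4f}")
```

Neither the x-axis nor the y-axis captures this cloud on its own, but the single rotated axis should carry almost all of the spread.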
Step by Step
- Center the data by subtracting the mean
- Compute the covariance matrix
- Find eigenvectors and eigenvalues of that covariance matrix
- Sort directions by eigenvalue size
- Project data onto the top k directions
Those top directions are the principal components.
Code
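import numpy as np
X_centered = X - X.mean(axis=0)  # center each feature (column)
C = X_centered.T @ X_centered / len(X_centered)  # covariance matrix, 1/n convention
eigenvalues, eigenvectors = np.linalg.eigh(C)  # eigh: C is symmetric; eigenvalues come back ascending
order = np.argsort(eigenvalues)[::-1]  # indices that sort eigenvalues largest first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
X_reduced = X_centered @ eigenvectors[:, :k]  # project onto the top k components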
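The snippet above leaves X and k undefined. A self-contained sketch of the same steps might look like the following; the helper name `pca_project` and the random test data are hypothetical, chosen only for illustration.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n_samples, n_features) onto its top-k principal components.

    Returns the projected data, the eigenvalues (largest first), and the
    matching eigenvectors as columns.
    """
    X_centered = X - X.mean(axis=0)
    C = X_centered.T @ X_centered / len(X_centered)   # 1/n covariance
    eigenvalues, eigenvectors = np.linalg.eigh(C)     # ascending order
    order = np.argsort(eigenvalues)[::-1]             # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    return X_centered @ eigenvectors[:, :k], eigenvalues, eigenvectors

# Hypothetical data: 200 samples, 5 correlated features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

X_reduced, eigenvalues, eigenvectors = pca_project(X, k=2)
print(X_reduced.shape)
```

Using `np.linalg.eigh` rather than a general eigensolver is a design choice here: the covariance matrix is symmetric, so `eigh` returns real eigenvalues and orthonormal eigenvectors. The variance of each projected column should match the corresponding eigenvalue.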
The Math Inside
If X is the mean-centered data matrix with n rows (one per sample), the covariance matrix is
C = (1/n) X^T X
Then solve
C v = lambda v
- v: eigenvector, a direction in feature space
- lambda: eigenvalue, how much variance lies along that direction
The principal components are the eigenvectors with the largest eigenvalues.
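One way to see why the eigenvalue measures variance, assuming v has unit length and X is mean-centered (so the projected values Xv also have zero mean), is this short derivation:

```latex
\operatorname{Var}(Xv)
  = \frac{1}{n}\,\lVert Xv \rVert^2
  = v^\top \Big( \tfrac{1}{n} X^\top X \Big) v
  = v^\top C v
  = \lambda\, v^\top v
  = \lambda
```

So the variance of the data projected onto an eigenvector equals that eigenvector's eigenvalue, which is why sorting by eigenvalue ranks directions by how much spread they capture.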
Projection onto the top k components:
X_reduced = X V_k
Explained variance ratio:
lambda_i / sum_j lambda_j
So PCA is really "find the directions where the data varies most, then keep those directions."
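The formulas above can be checked numerically. The sketch below verifies C v = lambda v for the leading pair, computes the explained variance ratio, and picks the smallest k that reaches a 95% threshold. The data, the per-feature scales, and the 95% cutoff are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 3 independent features with very different spreads.
X = rng.normal(size=(300, 3)) * np.array([5.0, 2.0, 0.5])

X_centered = X - X.mean(axis=0)
C = X_centered.T @ X_centered / len(X_centered)
eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Check the eigen equation C v = lambda v for the leading pair.
v, lam = eigenvectors[:, 0], eigenvalues[0]
residual = np.linalg.norm(C @ v - lam * v)

# Explained variance ratio: lambda_i / sum_j lambda_j
ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(ratio)

# Smallest k whose components together explain at least 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95) + 1)
print("ratio:", np.round(ratio, 3), "k:", k)
```

With these scales, the first two components should account for nearly all of the variance, so the third can be dropped at little cost.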
Math Prerequisites
- Vectors & Matrices - data matrix representation
- Eigenvalues & Eigenvectors - the core idea behind principal directions
- Matrix Decomposition - PCA can also be computed via SVD
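To make the SVD remark concrete, here is a minimal sketch comparing the two routes on the same centered data. It assumes the 1/n covariance convention used in these notes; the random data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated features
X_centered = X - X.mean(axis=0)
n = len(X_centered)

# Route 1: eigendecomposition of the covariance matrix.
C = X_centered.T @ X_centered / n
eigenvalues = np.linalg.eigh(C)[0][::-1]  # reversed so the largest comes first

# Route 2: SVD of the centered data, X_centered = U @ diag(S) @ Vt.
# Rows of Vt are the principal directions, already sorted by singular value.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
svd_eigenvalues = S**2 / n  # squared singular values over n give the eigenvalues

# Projection onto the top k directions via the SVD.
k = 2
X_reduced = X_centered @ Vt[:k].T

print(np.allclose(eigenvalues, svd_eigenvalues))
```

The SVD route never forms the covariance matrix explicitly, which is one reason it is often preferred in practice. Individual directions from the two routes may differ by a sign flip, so comparing eigenvalues is the safer check.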
Related
- Eigenvalues — The core math
- t-SNE — Non-linear alternative
- Autoencoder — Neural network version