Principal Component Analysis (PCA)

I Use This When...

I have many features and I want a smaller representation that still preserves most of the structure. PCA is useful for 2D/3D visualization, noise reduction, feature compression, and preprocessing before clustering or other models.

History

Introduced by Karl Pearson (1901) and developed independently by Harold Hotelling (1933). More than 120 years old, it remains one of the most widely used techniques in data science.

Why It Exists

The "why" chain is:

High-dimensional data is hard to see and hard to compute with.
Some directions in feature space matter much more than others.
We want to keep the informative directions and discard the weak ones.
Variance gives a way to measure which directions are carrying structure.

PCA exists because projection is only useful if we project onto the right axes.

How It Works

Visual Intuition

Imagine a cloud of points stretched diagonally across a plane.

The original x-axis and y-axis are not the best summary of the cloud.
PCA rotates to a new axis that follows the direction of longest spread.
Projecting onto that axis keeps most of the variance while using one dimension instead of two.
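
A minimal sketch of this picture, assuming NumPy is available. The synthetic cloud, the seed, and the variable names are illustrative choices, not part of the original note. It builds points stretched along the diagonal and looks at where the first principal direction ends up.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, arbitrary choice
t = rng.normal(scale=3.0, size=500)      # large spread along the diagonal
noise = rng.normal(scale=0.3, size=500)  # small spread across it

# Each point is t * (1, 1)/sqrt(2) + noise * (-1, 1)/sqrt(2)
X = np.column_stack([t - noise, t + noise]) / np.sqrt(2)

X_centered = X - X.mean(axis=0)
C = X_centered.T @ X_centered / len(X_centered)
eigenvalues, eigenvectors = np.linalg.eigh(C)  # eigenvalues in ascending order

first_pc = eigenvectors[:, -1]                 # direction of longest spread
share = eigenvalues[-1] / eigenvalues.sum()    # fraction of variance it carries

print("first principal direction:", first_pc)
print("share of total variance:", share)
```

The first direction should come out close to the diagonal (up to sign), and a single axis should carry nearly all of the variance, which is why dropping the second axis loses little.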

The timeline node is already wired here:

-> MLViz Node: PCA

Step by Step

Center the data by subtracting the mean
Compute the covariance matrix
Find eigenvectors and eigenvalues of that covariance matrix
Sort directions by eigenvalue, largest first
Project data onto the top k directions

Those top directions are the principal components.

Code

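import numpy as np  # X: (n_samples, n_features) array; k: number of components to keep
X_centered = X - X.mean(axis=0)
C = X_centered.T @ X_centered / len(X_centered)   # covariance, 1/n convention
eigenvalues, eigenvectors = np.linalg.eigh(C)     # C is symmetric; eigh returns ascending order
order = np.argsort(eigenvalues)[::-1]             # sort by eigenvalue, largest first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
X_reduced = X_centered @ eigenvectors[:, :k]      # project onto the top k directions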

The Math Inside

If X is mean-centered data with n samples as rows, the covariance matrix is

C = (1/n) X^T X

Then solve

C v = lambda v

v: eigenvector, a direction in feature space
lambda: eigenvalue, how much variance lies along that direction

The principal components are the eigenvectors with the largest eigenvalues.
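
A small numerical check of these statements, as a sketch under assumptions: the toy data, the mixing matrix A, and all variable names are made up for illustration. It confirms that the 1/n formula matches NumPy's biased covariance, that the top eigenpair satisfies C v = lambda v, and that the variance of the data projected onto v equals lambda.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 200 samples, 3 correlated features (the mixing matrix is arbitrary)
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.2, 0.1]])
X = rng.normal(size=(200, 3)) @ A.T
X = X - X.mean(axis=0)                     # mean-center
n = len(X)

C = X.T @ X / n                            # covariance with the 1/n convention
# np.cov with bias=True uses the same 1/n normalization
same_as_numpy = np.allclose(C, np.cov(X, rowvar=False, bias=True))

lam, V = np.linalg.eigh(C)                 # symmetric solver, ascending eigenvalues
v, lam_max = V[:, -1], lam[-1]             # top principal direction and its eigenvalue

eigen_equation_holds = np.allclose(C @ v, lam_max * v)

# Variance of the data projected onto v (projection has zero mean)
variance_along_v = np.mean((X @ v) ** 2)
variance_matches = np.isclose(variance_along_v, lam_max)

print(same_as_numpy, eigen_equation_holds, variance_matches)
```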

Projection onto the top k components, where V_k is the matrix whose columns are the k eigenvectors with the largest eigenvalues:

X_reduced = X V_k
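
To make the shapes concrete, here is a hedged sketch of the projection and of mapping the reduced data back to the original space. The data, the choice k = 2, and the names X_approx, mse, and discarded are assumptions for illustration. With the 1/n convention, the mean squared reconstruction error per sample equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy data: 100 samples, 5 features with very different spreads
X = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
X = X - X.mean(axis=0)
n, k = len(X), 2

C = X.T @ X / n
lam, V = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]          # largest eigenvalue first
lam, V = lam[order], V[:, order]

V_k = V[:, :k]                         # shape (5, k): top k directions as columns
X_reduced = X @ V_k                    # shape (100, k): coordinates in the new axes
X_approx = X_reduced @ V_k.T           # shape (100, 5): back in the original space

# Squared reconstruction error, averaged over samples
mse = np.sum((X - X_approx) ** 2) / n
discarded = lam[k:].sum()              # variance along the dropped directions

print(X_reduced.shape, mse, discarded)
```

The last two printed numbers should agree, which is one way to see that keeping the largest eigenvalues is the choice that loses the least variance.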

Explained variance ratio:

lambda_i / sum_j lambda_j

So PCA is really "find the directions where the data varies most, then keep those directions."
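
The explained variance ratio is often used to pick k. The sketch below assumes a hypothetical set of eigenvalues and a 90% target; both are made-up values for illustration, not recommendations from this note.

```python
import numpy as np

# Hypothetical eigenvalues, already sorted largest first
eigenvalues = np.array([5.0, 3.0, 1.5, 0.5])

ratio = eigenvalues / eigenvalues.sum()   # lambda_i / sum_j lambda_j
cumulative = np.cumsum(ratio)             # variance kept by the first 1, 2, ... components

threshold = 0.9                           # assumed target share of variance
# Smallest k whose cumulative ratio reaches the threshold
k = int(np.searchsorted(cumulative, threshold)) + 1

print("ratios:", ratio)
print("cumulative:", cumulative)
print("components needed:", k)
```

With these made-up eigenvalues, two components keep 80% of the variance and three keep 95%, so k comes out as 3.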

Math Prerequisites

Vectors & Matrices - data matrix representation
Eigenvalues & Eigenvectors - the core idea behind principal directions
Matrix Decomposition - PCA can also be computed via SVD
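
As a sketch of the SVD route mentioned above, the following compares it with the eigendecomposition on toy data. The data and variable names are assumptions for illustration. The singular values S of the centered data relate to the eigenvalues by lambda_i = S_i^2 / n, and the rows of Vt are the principal directions, possibly with flipped signs.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
X_centered = X - X.mean(axis=0)
n, k = len(X_centered), 2

# Route 1: eigendecomposition of the covariance matrix
C = X_centered.T @ X_centered / n
lam, V = np.linalg.eigh(C)
lam, V = lam[::-1], V[:, ::-1]             # reorder to largest first

# Route 2: SVD of the centered data, X_centered = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

eigs_from_svd = S ** 2 / n                 # singular values -> eigenvalues (1/n convention)
X_reduced_svd = X_centered @ Vt[:k].T      # project onto the first k right singular vectors
X_reduced_eig = X_centered @ V[:, :k]

# Each direction is only defined up to sign, so compare absolute values
same_up_to_sign = np.allclose(np.abs(X_reduced_svd), np.abs(X_reduced_eig))

print(np.allclose(eigs_from_svd, lam), same_up_to_sign)
```

The SVD route avoids forming the covariance matrix explicitly, which is one reason it is a common way to compute PCA in practice.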

Eigenvalues - The core math
t-SNE - Non-linear alternative
Autoencoder - Neural network version