I Use This When...
I want the simplest possible learnable classifier: weighted inputs, one decision boundary, one yes/no output. It is mostly a teaching model today, but it is the right place to understand what a neuron is before adding depth, nonlinearities, and backpropagation.
History
Frank Rosenblatt (1958) at Cornell. Built a physical machine (Mark I Perceptron) that could learn to recognize shapes. The New York Times headline: 'New Navy Device Learns by Doing.' Killed by Minsky & Papert (1969) proving it can't learn XOR.
Why It Exists
The "why" chain is:
- We want a machine that can make a yes/no decision from data.
- A weighted sum already gives a score.
- We can turn that score into a hard decision with a threshold.
- If the decision is wrong, adjust the weights toward the correct answer.
That is the perceptron: the smallest possible learning machine that still feels like a neuron.
How It Works
Visual Intuition
Picture points on a plane with two classes. The perceptron draws one straight
line. Everything on one side becomes class 1, everything on the other side
becomes class 0.
Each input feature casts a weighted vote. The perceptron adds those votes and fires only if the total passes a threshold.
This is powerful enough to separate many simple patterns, but not patterns like XOR, where no single straight line can do the job.
Step by Step
- Start with weights
wand biasb - Compute the score
z = w . x + b - Apply a step function to get a hard prediction
- Compare prediction to the true label
- If wrong, shift the weights toward the correct class
The learning is local and simple: misclassified points push the boundary.
Code
def step(z):
return 1 if z >= 0 else 0
def predict(x, w, b):
z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
return step(z)
def update(x, y_true, w, b, alpha=0.1):
y_pred = predict(x, w, b)
error = y_true - y_pred
w = [w_i + alpha * error * x_i for w_i, x_i in zip(w, x)]
b = b + alpha * error
return w, b
The Math Inside
Prediction:
y_hat = step(w . x + b)
x: input vectorw: weight vectorb: biasstep(.): returns1if the score is nonnegative, else0
Learning rule:
w <- w + alpha (y - y_hat) x
b <- b + alpha (y - y_hat)
Why does this work?
- if a positive point is predicted as
0, the update pushes weights toward that point - if a negative point is predicted as
1, the update pulls weights away from that point
But there is a hard limit:
- the perceptron only learns linearly separable patterns
- XOR is the famous counterexample
- that limitation is exactly what leads to multi-layer networks
Math Prerequisites
- Dot Product - what
w . xmeans geometrically - Logistic Regression - a probabilistic linear classifier
- MLP & Backprop - what fixes the XOR limitation
Related
- MLP & Backprop — Solves the XOR limitation
- SVM — Another linear classifier, but optimal
- Timeline — The full story