I Use This When...
I want to simplify a probabilistic model by saying two variables stop informing each other once a third variable is known. This assumption is what makes Naive Bayes tractable and what lets larger graphical models factorize cleanly.
Why It Exists
The "why" chain is:
- Joint distributions become hard very quickly.
- Modeling every interaction between variables is expensive.
- Sometimes a hidden or known variable explains the correlation.
- Once that variable is known, the remaining dependence disappears.
Conditional independence exists because many complicated distributions become manageable when you know what common cause or label variable to condition on.
Visual Intuition
Imagine two observations:
- whether a person carries an umbrella
- whether the street is wet
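These two are related overall: seeing a wet street makes it more likely that someone is carrying an umbrella. But if you already know whether it rained, the wet street adds little or no information about the umbrella, because rain is the common cause behind both.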
That is the basic intuition:
- marginally dependent
- conditionally independent once a third variable is known
Naive Bayes uses this idea in a strong form: features are assumed independent once you know the class label.
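You can check the umbrella and wet-street story by exact enumeration. The sketch below assumes a tiny common-cause model (rain drives both observations) with made-up probabilities. The variable names and numbers are illustrative and do not come from any dataset.

```python
from itertools import product

# Made-up numbers for a common-cause model: rain -> umbrella, rain -> wet street.
P_RAIN = 0.3
P_UMBRELLA_GIVEN_RAIN = {True: 0.8, False: 0.1}
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.2}


def joint(rain, umbrella, wet):
    """P(rain, umbrella, wet) = P(rain) P(umbrella | rain) P(wet | rain)."""
    p_r = P_RAIN if rain else 1 - P_RAIN
    p_u = P_UMBRELLA_GIVEN_RAIN[rain] if umbrella else 1 - P_UMBRELLA_GIVEN_RAIN[rain]
    p_w = P_WET_GIVEN_RAIN[rain] if wet else 1 - P_WET_GIVEN_RAIN[rain]
    return p_r * p_u * p_w


def prob(event):
    """Total probability of all (rain, umbrella, wet) worlds where event(...) is True."""
    return sum(
        joint(r, u, w)
        for r, u, w in product([True, False], repeat=3)
        if event(r, u, w)
    )


# Marginal view: does seeing a wet street change P(umbrella)?
p_umbrella = prob(lambda r, u, w: u)
p_umbrella_given_wet = prob(lambda r, u, w: u and w) / prob(lambda r, u, w: w)

# Conditional view: once rain is known, does the wet street still matter?
p_umbrella_given_rain = prob(lambda r, u, w: u and r) / prob(lambda r, u, w: r)
p_umbrella_given_rain_wet = prob(lambda r, u, w: u and r and w) / prob(lambda r, u, w: r and w)

print(f"P(umbrella)             = {p_umbrella:.3f}")
print(f"P(umbrella | wet)       = {p_umbrella_given_wet:.3f}")
print(f"P(umbrella | rain)      = {p_umbrella_given_rain:.3f}")
print(f"P(umbrella | rain, wet) = {p_umbrella_given_rain_wet:.3f}")
```

With these numbers, P(umbrella) is 0.31 and P(umbrella | wet) is about 0.56, so the two observations are marginally dependent. Once rain is fixed, both conditional probabilities equal 0.8, and the wet street adds nothing.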
How It Works
- Start with variables A, B, and C
- Ask whether the link between A and B remains after conditioning on C
- If not, factorize the joint distribution using that simplification
- Use the simpler factorization for inference or learning
This can dramatically reduce the number of parameters you need to estimate.
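Here is a small count that makes the parameter savings concrete. It assumes binary features and a binary label, and it assumes the full conditional joint would otherwise be stored as an unrestricted table. The function names are hypothetical helpers for this note.

```python
def full_joint_params(n_features: int, n_classes: int) -> int:
    """Free parameters to store P(x_1, ..., x_n | y) exactly, for binary features."""
    # Each class needs a table over 2**n outcomes; the last entry is fixed by summing to 1.
    return n_classes * (2 ** n_features - 1)


def naive_bayes_params(n_features: int, n_classes: int) -> int:
    """Free parameters when features are conditionally independent given y."""
    # One Bernoulli probability P(x_i = 1 | y) per feature, per class.
    return n_classes * n_features


for n in (5, 10, 30):
    print(n, full_joint_params(n, 2), naive_bayes_params(n, 2))
```

For 30 binary features and 2 classes, the full table needs 2,147,483,646 free parameters. The factorized model needs 60.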
The Math
Conditional independence means:
P(A, B | C) = P(A | C) P(B | C)
Equivalent statement (whenever P(B, C) > 0):
P(A | B, C) = P(A | C)
Interpretation:
- once C is known, B gives no extra information about A
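You can turn the definition into a direct check on a small discrete joint distribution. This is a minimal sketch. It assumes the joint is given as a Python dict that maps (a, b, c) tuples to probabilities summing to 1. The helper is_conditionally_independent and the example numbers are hypothetical.

```python
from itertools import product


def is_conditionally_independent(joint, tol=1e-9):
    """Check P(A, B | C) == P(A | C) * P(B | C) for every c with P(C = c) > 0.

    joint maps (a, b, c) -> probability; missing keys are treated as probability 0.
    """
    a_vals = {a for a, _, _ in joint}
    b_vals = {b for _, b, _ in joint}
    c_vals = {c for _, _, c in joint}
    for c in c_vals:
        p_c = sum(p for (_, _, cc), p in joint.items() if cc == c)
        if p_c == 0:
            continue  # conditioning on an impossible value is undefined; skip it
        for a, b in product(a_vals, b_vals):
            p_ab_c = joint.get((a, b, c), 0.0) / p_c
            p_a_c = sum(p for (aa, _, cc), p in joint.items() if aa == a and cc == c) / p_c
            p_b_c = sum(p for (_, bb, cc), p in joint.items() if bb == b and cc == c) / p_c
            if abs(p_ab_c - p_a_c * p_b_c) > tol:
                return False
    return True


# A joint built as P(c) P(a | c) P(b | c), with made-up numbers.
prior_c = {0: 0.6, 1: 0.4}
p_a1 = {0: 0.2, 1: 0.7}  # P(A = 1 | C = c)
p_b1 = {0: 0.5, 1: 0.9}  # P(B = 1 | C = c)
ci_joint = {
    (a, b, c): prior_c[c]
    * (p_a1[c] if a else 1 - p_a1[c])
    * (p_b1[c] if b else 1 - p_b1[c])
    for a, b, c in product([0, 1], repeat=3)
}

# A joint where B is always a copy of A, whatever C is.
copy_joint = {(v, v, c): 0.25 for v in (0, 1) for c in (0, 1)}

print(is_conditionally_independent(ci_joint))
print(is_conditionally_independent(copy_joint))
```

The first joint passes because it was constructed from the factorization itself. The second fails: given C, learning B tells you A exactly.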
Why ML cares:
- Naive Bayes factorizes P(x_1, ..., x_n | y) into a product of per-feature terms
- graphical models rely on conditional independence structure to stay tractable
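The Naive Bayes factorization follows in two steps. The chain rule is always exact. Conditional independence given y then collapses each factor to a per-feature term, and Bayes' theorem turns that into a class score:

```latex
\begin{aligned}
P(x_1, \dots, x_n \mid y)
  &= \prod_{i=1}^{n} P(x_i \mid x_1, \dots, x_{i-1}, y) && \text{chain rule, always exact} \\
  &= \prod_{i=1}^{n} P(x_i \mid y) && \text{if } x_i \perp (x_1, \dots, x_{i-1}) \mid y \\
P(y \mid x_1, \dots, x_n)
  &\propto P(y) \prod_{i=1}^{n} P(x_i \mid y) && \text{Bayes' theorem}
\end{aligned}
```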
Examples
In Naive Bayes sentiment classification:
- C = sentiment label
- A = whether the word "great" appears
- B = whether the word "amazing" appears
These words are not truly independent in raw language, but the model assumes they are conditionally independent given the sentiment label so the classifier stays simple.
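The sketch below works through the sentiment example under some assumptions. It treats each feature as Bernoulli (a word either appears or it does not), and all probabilities are made up. PRIOR, P_WORD, and posterior are illustrative names, not part of any library.

```python
# Hypothetical parameters, not estimated from any real corpus.
PRIOR = {"positive": 0.5, "negative": 0.5}
# P(word appears | sentiment) for the two features A and B.
P_WORD = {
    "positive": {"great": 0.30, "amazing": 0.20},
    "negative": {"great": 0.05, "amazing": 0.02},
}


def posterior(observed):
    """P(sentiment | features) under the Naive Bayes assumption.

    observed maps each word to True (appears) or False (absent).
    """
    scores = {}
    for label, prior in PRIOR.items():
        score = prior
        for word, present in observed.items():
            p = P_WORD[label][word]
            # Conditional independence given the label: one factor per word.
            score *= p if present else 1 - p
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


print(posterior({"great": True, "amazing": True}))
```

When both words appear, the unnormalized scores are 0.5 × 0.30 × 0.20 = 0.03 for positive and 0.5 × 0.05 × 0.02 = 0.0005 for negative. The positive posterior is therefore about 0.98. "Great" and "amazing" tend to co-occur, so multiplying both terms counts overlapping evidence twice. This is one reason Naive Bayes posteriors are often overconfident even when the predicted label is right.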
Code
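def naive_bayes_likelihood(feature_probs):
    """Multiply per-feature probabilities P(x_i | y), assuming conditional independence given y."""
    total = 1.0
    for prob in feature_probs:
        total *= prob  # each feature contributes one independent factor
    return total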
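In practice, multiplying many small probabilities can underflow to 0.0 in floating point. A common alternative is to sum log-probabilities instead. The sketch below assumes every probability is strictly positive (for example, after smoothing), and naive_bayes_log_likelihood is a hypothetical helper name.

```python
import math


def naive_bayes_log_likelihood(feature_probs):
    """Sum of log-probabilities, i.e. the log of the product of feature_probs."""
    # Assumes every p > 0; math.log(0) raises ValueError.
    return sum(math.log(p) for p in feature_probs)


probs = [1e-5] * 100
print(math.prod(probs))                   # 0.0, because 1e-500 underflows a double
print(naive_bayes_log_likelihood(probs))  # roughly -1151.29, no underflow
```

Log is monotonic, so comparing summed log-likelihoods (plus log priors) across classes picks the same winner as comparing the raw products.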
Used In
- Naive Bayes — The 'naive' assumption
- Bayes' Theorem — The framework