Bayes’ Theorem: Study Notes for Further Mathematics (9665)
Hello future statistician! This chapter dives into one of the most powerful and fascinating concepts in probability: Bayes’ Theorem. It’s the mathematical tool we use to update our beliefs when we receive new evidence.
Don't worry if this sounds complicated—we will break it down using clear steps and visual aids like tree diagrams. Mastering Bayes’ Theorem is essential not just for passing your FS1 exam, but for understanding how probability works in the real world, from medical diagnostics to AI!
1. Revisiting Conditional Probability (The Foundation)
Bayes’ Theorem is all about conditional probability, but in reverse. Let’s quickly review the basic formula:
The probability of event \(A\) happening given that event \(B\) has already happened is:
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
Where:
- \(P(A|B)\) is the probability of \(A\) given \(B\).
- \(P(A \cap B)\) is the probability that both \(A\) and \(B\) occur (the intersection).
- \(P(B)\) is the probability of event \(B\).
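As a quick numerical sanity check, the formula can be sketched in Python. The numbers below are invented purely for illustration, not taken from the notes:

```python
# Conditional probability: P(A|B) = P(A and B) / P(B)
# Hypothetical values: P(A and B) = 0.12, P(B) = 0.40.
p_a_and_b = 0.12
p_b = 0.40

p_a_given_b = p_a_and_b / p_b  # divide intersection by the conditioning event
print(p_a_given_b)  # 0.3
```

Notice that dividing by \(P(B)\) "rescales" the probability so that \(B\) becomes the whole sample space.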
The Problem Bayes' Theorem Solves
In many real-life scenarios, we often know \(P(B|A)\) (the probability of an effect given a cause) but we need to find \(P(A|B)\) (the probability of the cause given the effect). These are not the same!
Analogy: Imagine a factory. It’s easy to know the probability that a machine produces a faulty item (\(P(\text{Faulty} | \text{Machine A})\)). It is much harder, but far more useful, to know the probability that an item came from Machine A, given that it is faulty (\(P(\text{Machine A} | \text{Faulty})\)). This is what Bayes’ Theorem allows us to calculate.
✅ Quick Review: Key Difference
Do not confuse \(P(A|B)\) with \(P(B|A)\)! They are almost always different values. Bayes' Theorem is the method used to link them.
2. Total Probability and Tree Diagrams
The syllabus requires you to use tree diagrams. These are excellent tools for visualizing and calculating conditional probabilities, and they are especially helpful for the denominator in Bayes’ formula, which is usually found using the Law of Total Probability.
Construction of Tree Diagrams
A tree diagram starts with initial events that are mutually exclusive (cannot happen simultaneously) and exhaustive (cover all possible outcomes).
Let the initial events be \(A_1\) and \(A_2\). A secondary event \(B\) can occur after either \(A_1\) or \(A_2\).
- Step 1: Initial Branches
The first set of branches represents the unconditional probabilities, e.g., \(P(A_1)\) and \(P(A_2)\).
- Step 2: Secondary Branches
The second set of branches represents the conditional probabilities, e.g., \(P(B|A_1)\) and \(P(B'|A_1)\).
The Multiplication Rule (Along Branches)
To find the probability of the intersection (going along a path), you multiply:
$$P(A_1 \cap B) = P(A_1) \times P(B|A_1)$$
The Law of Total Probability (Summing Branches)
To find the total probability of event \(B\), you sum the probabilities of all paths that lead to \(B\). This is the critical step needed for the denominator of Bayes' Theorem.
If \(A_1\) and \(A_2\) partition the sample space:
$$P(B) = P(A_1 \cap B) + P(A_2 \cap B)$$
Substituting the multiplication rule gives the formal Law of Total Probability:
$$P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2)$$
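The multiply-along-branches, sum-across-paths procedure can be sketched in Python. The probabilities here are hypothetical, chosen only to show the mechanics:

```python
# Law of Total Probability with two initial events A1 and A2.
# Hypothetical values: P(A1) = 0.6, P(A2) = 0.4,
# P(B|A1) = 0.05, P(B|A2) = 0.10.
p_a1, p_a2 = 0.6, 0.4
p_b_given_a1, p_b_given_a2 = 0.05, 0.10

# Multiply along each branch to get the path probabilities...
path_1 = p_b_given_a1 * p_a1   # P(A1 and B) = 0.03
path_2 = p_b_given_a2 * p_a2   # P(A2 and B) = 0.04

# ...then sum every path that leads to B.
p_b = path_1 + path_2
print(p_b)  # 0.07
```

This value of \(P(B)\) is exactly what goes in the denominator of Bayes’ Theorem.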
Syllabus Note: You must be able to apply this method to problems involving at most three events (e.g., three different machines, \(A_1, A_2, A_3\), producing the item \(B\)).
The British Presbyterian minister Thomas Bayes developed this theorem in the 18th century, but it wasn't published until after his death! It remained relatively obscure until the 20th century, when modern computing made it essential for data analysis and machine learning.
3. The Formal Bayes’ Theorem
Bayes’ Theorem connects the two conditional probabilities, \(P(A|B)\) and \(P(B|A)\). It is derived from the standard conditional probability formula: replace the numerator \(P(A \cap B)\) with \(P(B|A)P(A)\) (the multiplication rule), and expand the denominator \(P(B)\) using the Law of Total Probability.
The Formula
For two events \(A\) and \(B\):
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
In the context where \(A\) is one of a set of mutually exclusive and exhaustive events (\(A_i\)) that lead to \(B\):
$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_j P(B|A_j)P(A_j)}$$
This looks intimidating, but remember the structure:
$$P(\text{Cause} | \text{Effect}) = \frac{P(\text{Effect} | \text{Cause}) \times P(\text{Cause})}{\text{Total Probability of Effect}}$$
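The general formula translates directly into a short helper function. This is a minimal sketch: the function name `bayes_posterior` and the example numbers are invented for illustration:

```python
def bayes_posterior(priors, likelihoods, i):
    """Return P(A_i | B), given priors P(A_j) and likelihoods P(B | A_j).

    priors and likelihoods are parallel lists over the mutually
    exclusive, exhaustive events A_1, ..., A_n.
    """
    # Denominator: total probability of B (sum over all causes).
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    # Numerator: likelihood times prior for the cause of interest.
    return likelihoods[i] * priors[i] / p_b

# Hypothetical two-event example: priors 0.6 / 0.4,
# likelihoods P(B|A1) = 0.05, P(B|A2) = 0.10.
print(round(bayes_posterior([0.6, 0.4], [0.05, 0.10], 0), 4))  # 0.4286
```

Note how the numerator, \(P(B|A_i)P(A_i)\), is just one of the terms already computed in the denominator sum.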
Understanding the Terms
- \(P(A_i)\): The Prior Probability
This is your initial belief or knowledge about the cause \(A_i\) before any new evidence (\(B\)) is introduced.
- \(P(B|A_i)\): The Likelihood
This is how likely the evidence \(B\) is, assuming the cause \(A_i\) is true.
- \(P(A_i|B)\): The Posterior Probability
This is the updated probability of the cause \(A_i\) after observing the evidence \(B\). This is what you are solving for.
- \(P(B)\): The Evidence (or Marginal Probability)
This is the total probability of observing the event \(B\), calculated using the Law of Total Probability.
4. Step-by-Step Application of Bayes’ Theorem
Bayes' Theorem problems are procedural. If you follow these steps, you will usually find the solution, even if the numbers look messy.
Example Scenario: Medical Testing
A rare disease affects 1% of the population. A test is 90% accurate (it detects the disease 90% of the time, and gives a negative result 90% of the time if the person is healthy).
If a person tests positive, what is the probability they actually have the disease?
Goal: Find \(P(\text{Disease} | \text{Positive Test})\)
Step 1: Define Events and Assign Priors
- \(D\): Has the Disease. \(P(D) = 0.01\) (The Prior)
- \(D'\): Does Not have the Disease (Healthy). \(P(D') = 1 - 0.01 = 0.99\)
- \(T\): Positive Test Result.
Step 2: Assign Likelihoods (Conditional Probabilities)
- Test detects disease (True Positive): \(P(T|D) = 0.90\)
- Test misses disease (False Negative): \(P(T'|D) = 1 - 0.90 = 0.10\)
- Test is negative when healthy (True Negative): \(P(T'|D') = 0.90\)
- Test is positive when healthy (False Positive): \(P(T|D') = 1 - 0.90 = 0.10\)
Step 3: Calculate the Numerator (Intersection)
We need the probability of having the disease AND testing positive:
$$P(D \cap T) = P(T|D)P(D) = 0.90 \times 0.01 = 0.009$$
Step 4: Calculate the Denominator (Total Probability of Positive Test, \(P(T)\))
A positive test can happen in two ways: True Positive (Path 1) OR False Positive (Path 2).
- Path 1: \(P(D \cap T) = 0.009\) (from Step 3)
- Path 2: \(P(D' \cap T) = P(T|D')P(D') = 0.10 \times 0.99 = 0.099\)
$$P(T) = P(D \cap T) + P(D' \cap T) = 0.009 + 0.099 = 0.108$$
Step 5: Apply Bayes’ Formula (Calculate the Posterior)
$$P(D|T) = \frac{P(D \cap T)}{P(T)} = \frac{0.009}{0.108} \approx 0.0833$$
Key Takeaway: Even though the test is 90% accurate, the probability that a person who tests positive actually has the disease is only about 8.33%! This is because the disease is very rare (low prior probability), meaning the number of healthy people getting false positives (0.099) vastly outweighs the number of sick people getting true positives (0.009).
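The five steps above can be reproduced in a few lines of Python, using exactly the figures from the example, which makes it easy to experiment with different priors and accuracies:

```python
# Medical-testing example from the notes.
p_d = 0.01              # Step 1: prior, P(D)
p_t_given_d = 0.90      # Step 2: true positive rate, P(T|D)
p_t_given_not_d = 0.10  # Step 2: false positive rate, P(T|D')

# Step 3: numerator, P(D and T)
numerator = p_t_given_d * p_d                   # 0.009

# Step 4: denominator, P(T) = true positives + false positives
p_t = numerator + p_t_given_not_d * (1 - p_d)   # 0.009 + 0.099 = 0.108

# Step 5: posterior, P(D|T)
posterior = numerator / p_t
print(round(posterior, 4))  # 0.0833
```

Try raising the prior to, say, 0.10 and rerunning: the posterior jumps dramatically, which shows how strongly the rarity of the disease drives the result.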
Common Mistakes to Avoid
- Not Using the Law of Total Probability: The biggest error is forgetting to calculate the denominator \(P(B)\) by summing all possible ways event \(B\) could have happened.
- Mixing up Conditionals: Always double-check if you are given \(P(A|B)\) or \(P(B|A)\) in the question text. The structure is crucial.
- Non-Exhaustive Events: Ensure your initial events (\(A_1, A_2, \dots\)) cover 100% of the possibilities. If they don't sum to 1, your Total Probability calculation will be wrong.
5. Bayes’ Theorem with Multiple Events (At Most Three)
In Further Maths (9665), you may encounter problems involving up to three mutually exclusive and exhaustive events, say \(A_1, A_2\), and \(A_3\). This often happens when goods are produced by three different sources (factories, shifts, machines).
If we want to find the probability that an observed outcome \(B\) came specifically from source \(A_1\), we use the extended formula:
$$P(A_1|B) = \frac{P(B|A_1)P(A_1)}{P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3)}$$
Process using Tree Diagrams for Three Events
The method remains the same—just with an extra branch!
- The Total Probability Denominator \(P(B)\) will be the sum of three products:
Path 1: \(P(B|A_1)P(A_1)\)
Path 2: \(P(B|A_2)P(A_2)\)
Path 3: \(P(B|A_3)P(A_3)\)
Example context: Three shifts (Morning \(A_1\), Afternoon \(A_2\), Night \(A_3\)) produce bolts. We know their production percentages (\(P(A_i)\)) and their respective fault rates (\(P(B|A_i)\)). If a random faulty bolt (\(B\)) is found, we can use the formula above to determine the probability it came from the Night Shift, \(P(A_3|B)\).
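The three-shift context can be sketched numerically. The production shares and fault rates below are hypothetical, invented only to show the method:

```python
# Hypothetical figures (not from the notes): Morning, Afternoon and
# Night shifts produce 50%, 30% and 20% of bolts, with fault rates
# of 2%, 3% and 5% respectively.
priors = {"Morning": 0.50, "Afternoon": 0.30, "Night": 0.20}
fault_rates = {"Morning": 0.02, "Afternoon": 0.03, "Night": 0.05}

# Denominator: total probability a random bolt is faulty,
# P(B) = sum of P(B|A_i) * P(A_i) over the three shifts.
p_faulty = sum(priors[s] * fault_rates[s] for s in priors)

# Posterior: probability a faulty bolt came from the Night shift.
p_night_given_faulty = priors["Night"] * fault_rates["Night"] / p_faulty
print(round(p_night_given_faulty, 4))  # 0.3448
```

Even though the Night shift makes only 20% of the bolts, its higher fault rate means it accounts for roughly a third of the faulty ones, exactly the kind of "inverted" conclusion Bayes' Theorem delivers.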
💯 Key Takeaway on Bayes' Theorem
Bayes' Theorem is the rule for inverting conditional probability. It calculates \(P(\text{Cause} | \text{Effect})\) when you are given \(P(\text{Effect} | \text{Cause})\). Always structure your solution using the tree diagram method: calculate the probability along the required branch (numerator), then calculate the total probability of the outcome (denominator) by summing all paths.