M1 Chapter: Probability Distribution, Expectation and Variance
Hello everyone! Welcome to one of the most interesting topics in statistics. Don't worry if probability has seemed a bit abstract before. In this chapter, we're going to make it concrete by exploring probability distributions. We'll learn how to predict the "average" outcome of a random event using expectation and measure how spread out the results can be using variance.
Why is this important? It's the mathematics behind games of chance, insurance policies, and even financial investments. By the end of this, you'll be able to analyse a simple game and decide if it's worth playing!
1. Discrete Probability Distributions: The Rulebook for Random Events
What is a Discrete Random Variable?
Think of a "random variable" as a variable (we usually call it X) whose value is a number determined by the outcome of a random experiment. The word "discrete" just means it can only take on specific, countable values. You can't have "half" a result.
- Example: Let X be the number you get when you roll a standard six-sided die. X can be 1, 2, 3, 4, 5, or 6. It can't be 2.5.
- Example: Let Y be the number of heads you get when you flip a coin 3 times. Y can be 0, 1, 2, or 3.
- Analogy: A discrete variable is like the number of people in a room. You can have 3 people or 4 people, but not 3.5 people.
What is a Probability Distribution?
A probability distribution is simply a table, graph, or formula that links every possible value of a discrete random variable with its probability. It's the complete "rulebook" for that random variable.
There are two golden rules for any probability distribution:
- The probabilities must be between 0 and 1. For any value x, $$0 \le P(X=x) \le 1$$.
- The sum of all probabilities must be exactly 1. $$\sum P(X=x) = 1$$. (This means we have accounted for all possible outcomes).
Think of it like a pizza: all the slices (probabilities) must add up to one whole pizza!
Representing a Probability Distribution
The most common way to show a probability distribution is with a simple table.
Example: A biased coin
Imagine a biased coin where the probability of getting a Head is 0.4. Let X be the number of heads in two flips. The possible values for X are 0, 1, or 2.
P(X=0) = P(TT) = 0.6 * 0.6 = 0.36
P(X=1) = P(HT or TH) = (0.4 * 0.6) + (0.6 * 0.4) = 0.24 + 0.24 = 0.48
P(X=2) = P(HH) = 0.4 * 0.4 = 0.16
We can represent this in a distribution table:
| x | 0 | 1 | 2 |
|---|---|---|---|
| P(X = x) | 0.36 | 0.48 | 0.16 |
Let's check the rules: All probabilities are between 0 and 1. And if we sum them up: $$0.36 + 0.48 + 0.16 = 1.00$$. It works!
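If you like to check this kind of bookkeeping in code, here is a minimal Python sketch (the dictionary layout and variable names are just illustrative choices, not part of the chapter's notation) that builds the same distribution and verifies both golden rules:

```python
# Biased coin: P(Head) = 0.4, X = number of heads in two flips
p_head = 0.4
p_tail = 1 - p_head

# Build the distribution as a dictionary {value: probability}
distribution = {
    0: p_tail * p_tail,                    # TT
    1: p_head * p_tail + p_tail * p_head,  # HT or TH
    2: p_head * p_head,                    # HH
}

# Golden rule 1: every probability lies between 0 and 1
assert all(0 <= p <= 1 for p in distribution.values())

# Golden rule 2: the probabilities sum to exactly 1 (allowing for float rounding)
assert abs(sum(distribution.values()) - 1) < 1e-9

print(distribution)  # {0: 0.36, 1: 0.48, 2: 0.16} (up to float rounding)
```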
Key Takeaway for Section 1
A discrete probability distribution is a list of all possible numerical outcomes of a random event and their corresponding probabilities. The probabilities must all add up to 1.
2. Expectation (E[X]): The Long-Run Average
What is Expectation?
The expectation or expected value, written as E[X], is the average value we would expect to get if we repeated the experiment an infinite number of times. It's a "weighted average," where more likely outcomes have a bigger influence.
Important: The expected value might be a number you can't actually get in a single trial! For example, the expected value of a single die roll is 3.5, but you can't roll a 3.5.
Analogy: Imagine a simple game. You win $10 with a probability of 0.1, and you win $0 with a probability of 0.9. If you play this game 100 times, you'd expect to win 10 times, for a total of $100. Your average winning per game would be $100 / 100 = $1. The expected value is $1.
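To see the "long-run average" idea concretely, you can simulate this game many times and watch the average winnings settle near $1. This is only a sketch using Python's standard random module; the seed and the number of plays are arbitrary choices:

```python
import random

random.seed(0)  # fix the seed so repeated runs give the same result

def play_once():
    """Win $10 with probability 0.1, otherwise win $0."""
    return 10 if random.random() < 0.1 else 0

n_plays = 100_000
average_winnings = sum(play_once() for _ in range(n_plays)) / n_plays
print(average_winnings)  # settles close to the expected value of $1
```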
Calculating Expectation
The formula is simple and intuitive. You multiply each outcome by its probability, and then add them all up.
Formula: $$E[X] = \sum x \cdot P(X=x)$$
Step-by-Step Example
Let's find the expected number of heads from our biased coin example above.
| x | 0 | 1 | 2 |
|---|---|---|---|
| P(X = x) | 0.36 | 0.48 | 0.16 |
- Multiply each x by its P(X=x):
  (0 * 0.36) = 0
  (1 * 0.48) = 0.48
  (2 * 0.16) = 0.32
- Sum the results:
$$E[X] = 0 + 0.48 + 0.32 = 0.8$$
So, on average, we expect to get 0.8 heads per trial (of two flips).
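The same weighted-average arithmetic is easy to mirror in code. A minimal sketch, reusing the distribution dictionary from the earlier example:

```python
distribution = {0: 0.36, 1: 0.48, 2: 0.16}

# E[X] = sum over all outcomes of (outcome * probability)
expectation = sum(x * p for x, p in distribution.items())
print(expectation)  # 0.8
```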
Properties of Expectation (Useful Shortcuts!)
These rules make calculations much faster. Let a and b be constants.
Property 1: $$E[aX + b] = aE[X] + b$$
In words: if you multiply every outcome by 'a' and then add 'b', the expectation is also multiplied by 'a' and then has 'b' added to it.
Example: A game's payout in dollars is X, and we found that E[X] = $5. The game organiser decides to double the payout and charge a $2 service fee, so the new net payout is Y = 2X - 2. What's the new expected payout?
$$E[Y] = E[2X - 2] = 2E[X] - 2 = 2(5) - 2 = 8$$ So the new expected payout is $8, with no need to recalculate the whole distribution!
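If you want to double-check the shortcut, transform every outcome directly and compare. The two-outcome distribution below is invented purely so that E[X] = 5:

```python
# Invented two-outcome payout distribution with E[X] = 5
payout = {0: 0.5, 10: 0.5}

a, b = 2, -2  # Y = 2X - 2: double the payout, subtract the $2 fee

e_x = sum(x * p for x, p in payout.items())
e_y_direct = sum((a * x + b) * p for x, p in payout.items())
e_y_shortcut = a * e_x + b

print(e_x, e_y_direct, e_y_shortcut)  # 5.0 8.0 8.0
```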
Expectation of a Function, E[g(X)]
Sometimes we are interested in the expectation of a function of X, like $$X^2$$. The method is very similar.
Formula: $$E[g(X)] = \sum g(x) \cdot P(X=x)$$
To find $$E[X^2]$$, we just square each x-value before multiplying by the probability. This will be VERY important for calculating variance later.
Example: Finding E[X²]
Using our biased coin example again:
$$E[X^2] = (0^2 \cdot 0.36) + (1^2 \cdot 0.48) + (2^2 \cdot 0.16)$$
$$= (0 \cdot 0.36) + (1 \cdot 0.48) + (4 \cdot 0.16)$$
$$= 0 + 0.48 + 0.64 = 1.12$$
Common Mistake Alert: Notice that $$E[X^2] = 1.12$$ is NOT the same as $$(E[X])^2 = (0.8)^2 = 0.64$$. This is a crucial distinction!
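The distinction is easy to verify in code, reusing the biased coin distribution:

```python
distribution = {0: 0.36, 1: 0.48, 2: 0.16}

e_x = sum(x * p for x, p in distribution.items())               # E[X]
e_x_squared = sum(x ** 2 * p for x, p in distribution.items())  # E[X^2]

print(e_x_squared)  # 1.12
print(e_x ** 2)     # about 0.64 -- not the same number!
```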
Key Takeaway for Section 2
Expectation (E[X]) is the theoretical long-term average of a random variable. Calculate it by summing up `(outcome * probability)` for all outcomes. The properties, especially $$E[aX + b] = aE[X] + b$$, are powerful shortcuts.
3. Variance (Var(X)): Measuring Spread and Risk
What is Variance?
While expectation tells us the "center" of a distribution, variance tells us how "spread out" the outcomes are from that center. It measures the volatility or risk.
- A low variance means most outcomes are clustered tightly around the expected value. The result is predictable and consistent.
- A high variance means outcomes are spread far from the expected value. The result is unpredictable and risky.
Analogy: Two students, Amy and Ben, both have an average test score (expectation) of 80.
- Amy's scores: 79, 80, 81, 80. (Low Variance - very consistent)
- Ben's scores: 100, 60, 100, 60. (High Variance - all over the place!)
They have the same average, but their performance is very different. Variance captures this difference.
Calculating Variance
There are two formulas for variance. One is good for understanding the concept, and the other is your best friend for actually calculating it.
Formula 1: The Definition (Good for theory)
Let $$\mu = E[X]$$. The variance is the expected value of the squared differences from the mean:
$$Var(X) = E[(X-\mu)^2] = \sum (x-\mu)^2 \cdot P(X=x)$$
This formula is a bit slow to use in practice.
Formula 2: The Computational Formula (Use this one!)
This is almost always faster and easier.
Formula: $$Var(X) = E[X^2] - (E[X])^2$$
Memory Aid: "The mean of the squares, minus the square of the mean."
Step-by-Step Example (using the fast formula)
Let's calculate the variance for our biased coin example. We already did the hard work in the previous section!
- Recall our previous results:
  $$E[X] = 0.8$$
  $$E[X^2] = 1.12$$
- Plug them into the formula:
  $$Var(X) = E[X^2] - (E[X])^2 = 1.12 - (0.8)^2 = 1.12 - 0.64 = 0.48$$
The variance of the number of heads is 0.48.
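Both formulas are easy to compare in a short sketch (again reusing the biased coin distribution), which also shows why the computational formula is the quicker route:

```python
distribution = {0: 0.36, 1: 0.48, 2: 0.16}

mu = sum(x * p for x, p in distribution.items())  # E[X] = 0.8

# Formula 1: the definition, E[(X - mu)^2]
var_definition = sum((x - mu) ** 2 * p for x, p in distribution.items())

# Formula 2: the computational formula, E[X^2] - (E[X])^2
e_x_squared = sum(x ** 2 * p for x, p in distribution.items())
var_computational = e_x_squared - mu ** 2

print(var_definition, var_computational)  # both come out to about 0.48
```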
Standard Deviation
Notice that the units of variance are squared (e.g., dollars-squared, heads-squared), which can be strange to interpret. To fix this, we use the standard deviation, which is simply the square root of the variance.
Formula: $$SD(X) = \sigma = \sqrt{Var(X)}$$
For our example, the standard deviation is $$\sqrt{0.48} \approx 0.693$$. This value is in the original units ("heads") and gives a more direct sense of the spread.
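Continuing the sketch above, the standard deviation is just one extra step using the math module:

```python
import math

variance = 0.48
standard_deviation = math.sqrt(variance)
print(standard_deviation)  # about 0.693
```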
Properties of Variance (More Awesome Shortcuts!)
Just like with expectation, these rules save a lot of time. Let a and b be constants.
Property 1: $$Var(aX + b) = a^2 Var(X)$$
Let's break this down:
- Why does the '+ b' disappear? Adding a constant 'b' shifts the entire distribution, but it doesn't change the spread. Imagine adding 10 points to everyone's test score. The average goes up by 10, but the gap between the highest and lowest score remains the same. The spread is unchanged, so the variance is unchanged.
- Why is it 'a²'? Variance is based on squared distances. If you multiply all outcomes by 'a', the distances from the mean are also scaled by 'a'. When you square these distances, the scaling factor becomes 'a²'.
Example:
A game has a payout X with Var(X) = 4. The organiser changes the game so the new payout is Y = 3X + 10. What is the new variance?
$$Var(Y) = Var(3X + 10) = 3^2 \, Var(X) = 9 \cdot 4 = 36$$
The new variance is 36. Notice the "+ 10" had no effect.
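As with expectation, you can check this property numerically. The two-outcome distribution below is invented purely so that Var(X) = 4:

```python
# Invented two-outcome payout distribution with Var(X) = 4
payout = {0: 0.5, 4: 0.5}

a, b = 3, 10  # Y = 3X + 10

def variance(dist):
    """Var(X) = E[X^2] - (E[X])^2 for a {value: probability} distribution."""
    mean = sum(x * p for x, p in dist.items())
    return sum(x ** 2 * p for x, p in dist.items()) - mean ** 2

# Build the distribution of Y = aX + b by transforming every outcome
payout_y = {a * x + b: p for x, p in payout.items()}

print(variance(payout))           # 4.0
print(variance(payout_y))         # 36.0
print(a ** 2 * variance(payout))  # 36.0 -- matches the shortcut
```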
Key Takeaway for Section 3
Variance (Var(X)) measures the spread or risk of a distribution. Always use the computational formula $$Var(X) = E[X^2] - (E[X])^2$$ to calculate it. Remember the key property: $$Var(aX + b) = a^2Var(X)$$. Standard Deviation is just the square root of the variance.