📚 Further Maths Study Notes: Linear Combinations of Discrete Random Variables (FS1.5)
Welcome to one of the most practical and powerful topics in Further Statistics! This chapter is all about combining two or more random variables (like profits from different product lines or measurements from different experiments) into a single, new variable.
You already know how to find the mean (\(E(X)\)) and variance (\(Var(X)\)) of a single variable. Here, we learn how these properties behave when we scale, shift, and mix variables together. This skill is essential for modelling real-world uncertainty!
1. Quick Review: Mean and Variance Basics
Before we combine variables, let's refresh the essentials for a single discrete random variable \(X\):
- Expected Value (Mean), \(E(X)\) or \(\mu\): This is the long-term average value of the variable.
- Variance, \(Var(X)\) or \(\sigma^2\): This measures the spread or dispersion of the variable around the mean. A high variance means the values are very spread out.
1.1 The Effects of Scaling and Shifting
If we create a new variable \(W\) by transforming \(X\) (e.g., \(W = aX + b\)), here is how the mean and variance change:
Expected Value Rule:
\[E(aX + b) = a E(X) + b\]
(Example: If you double all scores (a=2) and add 5 (b=5), the average score also doubles and increases by 5.)

Variance Rule:
\[Var(aX + b) = a^2 Var(X)\]
(Example: Shifting the data (adding b) does not change the spread, so \(b\) disappears. But scaling the data by \(a\) scales the spread by \(a^2\).)

⚠ Key Tip: Remember that constants only affect the mean (they shift the distribution), but they never affect the variance (they don't change how spread out the distribution is!).
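The scaling and shifting rules can be checked numerically. Here is a minimal sketch using a small discrete distribution whose values and probabilities are made up for the demo:

```python
# Assumed discrete distribution for X (values and probabilities invented for the demo)
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

def mean(vals, ps):
    # E(X) = sum of x * P(X = x)
    return sum(v * p for v, p in zip(vals, ps))

def variance(vals, ps):
    # Var(X) = E((X - mu)^2)
    mu = mean(vals, ps)
    return sum(p * (v - mu) ** 2 for v, p in zip(vals, ps))

a, b = 2, 5
# W = aX + b: each outcome is scaled and shifted; the probabilities are unchanged
w_values = [a * v + b for v in values]

E_X, Var_X = mean(values, probs), variance(values, probs)
E_W, Var_W = mean(w_values, probs), variance(w_values, probs)

assert abs(E_W - (a * E_X + b)) < 1e-9    # E(aX + b) = a E(X) + b
assert abs(Var_W - a**2 * Var_X) < 1e-9   # Var(aX + b) = a^2 Var(X)
```

Note that only the outcomes transform; the probability column is untouched, which is exactly why \(b\) shifts the mean but cannot stretch the spread.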
2. Linear Combinations of Two Variables
A linear combination of two discrete random variables, \(X\) and \(Y\), is usually written in the form:
\[W = aX + bY + c\]
where \(a\), \(b\), and \(c\) are constants (coefficients).
Imagine a coffee shop where X is the daily sales of Lattes and Y is the daily sales of Cappuccinos. If Lattes cost £3 (a=3) and Cappuccinos cost £4 (b=4), the total revenue (W) is \(W = 3X + 4Y\).

2.1 Calculating the Expected Value of a Combination
The rules for the expected value are delightfully simple. Whether \(X\) and \(Y\) are related (dependent) or not (independent), the rule is the same:
Rule for Expected Value of a Sum/Difference:
\[E(aX + bY + c) = a E(X) + b E(Y) + c\]
Key Takeaway: The expected value of a combination is just the combination of the expected values.
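To see that the rule really does not care about dependence, here is a sketch using a small joint distribution (the pmf below is an assumption invented for the demo; note that \(X\) and \(Y\) are deliberately *dependent* here, since \(P(X=1, Y=1) = 0.5 \neq P(X=1)P(Y=1)\)):

```python
# Assumed joint pmf for dependent X and Y, stored as {(x, y): probability}
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.5}

E_X = sum(x * p for (x, y), p in joint.items())  # marginal mean of X
E_Y = sum(y * p for (x, y), p in joint.items())  # marginal mean of Y
a, b, c = 3, 4, 2

# Expectation of W = aX + bY + c computed directly over the joint distribution
E_W = sum((a * x + b * y + c) * p for (x, y), p in joint.items())

# Linearity of expectation holds even though X and Y are dependent
assert abs(E_W - (a * E_X + b * E_Y + c)) < 1e-9
```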
3. Introducing Dependence: Covariance and Correlation
When we move on to variance, things get complicated quickly—unless the variables are independent. We must consider how \(X\) and \(Y\) relate to each other.
3.1 What is Covariance?
Covariance, denoted \(Cov(X, Y)\), tells us the direction of the relationship between \(X\) and \(Y\).
- If \(Cov(X, Y) > 0\), then when \(X\) is high, \(Y\) tends to be high (and vice versa). They move together. (Positive relationship)
- If \(Cov(X, Y) < 0\), then when \(X\) is high, \(Y\) tends to be low (and vice versa). They move in opposite directions. (Negative relationship)
- If \(Cov(X, Y) = 0\), there is no linear relationship. This is always true if \(X\) and \(Y\) are independent.
⚠ Common Mistake Alert: \(Cov(X, Y) = 0\) means there is no *linear* relationship, but it does not automatically prove independence. The implication only runs one way: independence guarantees zero covariance, but zero covariance does not guarantee independence. Always check the problem context!
3.2 Correlation
The Correlation Coefficient (\(\rho\) or \(r\)) is a standardized measure derived from covariance. It tells you the strength and direction of the linear relationship, always falling between -1 and 1.
\[\rho = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}\]
The syllabus requires you to understand and apply these concepts; typically you will be *given* the covariance or correlation value, or asked to compute it from the definition, rather than derive the formulas themselves.
Did you know? Covariance is measured in the units of \(X\) multiplied by the units of \(Y\), which is often not intuitive. Correlation fixes this by turning the measure into a pure number between -1 and 1.
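As a sketch, here is how both quantities can be computed from a small joint distribution, using the standard computational formula \(Cov(X, Y) = E(XY) - E(X)E(Y)\). The joint pmf is an assumption invented for the demo:

```python
import math

# Assumed joint pmf for X and Y, stored as {(x, y): probability}
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
E_XY = sum(x * y * p for (x, y), p in joint.items())

# Cov(X, Y) = E(XY) - E(X)E(Y); positive here, so X and Y tend to move together
cov = E_XY - E_X * E_Y

Var_X = sum((x - E_X) ** 2 * p for (x, y), p in joint.items())
Var_Y = sum((y - E_Y) ** 2 * p for (x, y), p in joint.items())

# Standardising by the two spreads turns covariance into a pure number in [-1, 1]
rho = cov / math.sqrt(Var_X * Var_Y)
assert -1 <= rho <= 1
```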
4. Variance of a Linear Combination (The Critical Formula)
Calculating the variance of a combination requires incorporating the covariance term if the variables are dependent. This is the general, universal rule:
4.1 The General Rule (Dependent Variables)
For two discrete random variables \(X\) and \(Y\), and a new variable \(W = aX + bY\):
\[Var(W) = Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)\]
If there is an added constant \(c\), it is ignored, as constants do not affect variance: \(Var(aX + bY + c) = Var(aX + bY)\).
The \(2ab\,Cov(X, Y)\) term is essential! It accounts for the way the variables amplify or cancel each other's spread.
- If \(Cov(X, Y)\) is positive, the variance of the combination increases (the spreads add up).
- If \(Cov(X, Y)\) is negative, the variance of the combination decreases (the variables tend to balance each other out).
Step-by-Step for Calculating Variance:
- Identify the coefficients \(a\) and \(b\).
- Square the coefficients for the variance terms: \(a^2 Var(X)\) and \(b^2 Var(Y)\).
- Calculate the covariance term: \(2ab Cov(X, Y)\).
- Add the three terms together.
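The steps above can be carried out directly in code. All the numbers here (the coefficients, variances, and the given covariance) are assumptions chosen for illustration:

```python
# Assumed inputs: coefficients, variances, and a given (negative) covariance
a, b = 3, 4
Var_X, Var_Y = 2.0, 1.5
Cov_XY = -0.5  # negative: X and Y tend to offset each other

term_X = a**2 * Var_X             # step 2: 9 * 2.0  = 18.0
term_Y = b**2 * Var_Y             # step 2: 16 * 1.5 = 24.0
term_cov = 2 * a * b * Cov_XY     # step 3: 24 * (-0.5) = -12.0

Var_W = term_X + term_Y + term_cov  # step 4: Var(3X + 4Y) = 30.0
print(Var_W)  # 30.0
```

Notice how the negative covariance pulls the total variance down from 42.0 to 30.0: the variables partially cancel each other's fluctuations.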
4.2 The Rule for Independent Variables (The Simplification)
If \(X\) and \(Y\) are independent, their covariance must be zero: \(Cov(X, Y) = 0\). This simplifies the formula dramatically:
\[Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)\]
Memory Aid (VAA): Variance Always Adds! When combining independent variables, you always add their variances (or their scaled variances, \(a^2 Var(X)\) and \(b^2 Var(Y)\)).
4.3 Handling Subtraction: \(Var(X - Y)\)
Subtraction is a common trap. Consider \(W = X - Y\). Here, \(a=1\) and \(b=-1\).
General Rule for Subtraction (Dependent):
\[Var(X - Y) = (1)^2 Var(X) + (-1)^2 Var(Y) + 2(1)(-1) Cov(X, Y)\]
\[Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y)\]
Rule for Subtraction (Independent):
If \(X\) and \(Y\) are independent, \(Cov(X, Y)=0\). The variances still add:
\[Var(X - Y) = Var(X) + Var(Y)\]
⚠ Crucial Point: Whether you add or subtract the variables, their variances always contribute positively to the total spread (because spread is measured by squares, \(a^2\) and \(b^2\)). If you subtract variables, you only subtract the covariance term.
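The subtraction formula can be verified directly against a joint distribution, with no appeal to the rule itself. The joint pmf below is an assumption invented for the demo:

```python
# Assumed joint pmf for X and Y, stored as {(x, y): probability}
joint = {(1, 2): 0.2, (1, 3): 0.3, (2, 2): 0.3, (2, 3): 0.2}

def E(f):
    # Expectation of f(X, Y) over the joint distribution
    return sum(f(x, y) * p for (x, y), p in joint.items())

Var_X = E(lambda x, y: x**2) - E(lambda x, y: x) ** 2
Var_Y = E(lambda x, y: y**2) - E(lambda x, y: y) ** 2
Cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

# Variance of D = X - Y computed directly from its own distribution
Var_D = E(lambda x, y: (x - y) ** 2) - E(lambda x, y: x - y) ** 2

# Matches Var(X) + Var(Y) - 2 Cov(X, Y): variances add, covariance subtracts
assert abs(Var_D - (Var_X + Var_Y - 2 * Cov)) < 1e-9
```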
🔖 Quick Review: The Variance Checklist
When calculating \(Var(aX + bY)\), ask yourself one question:
- Are X and Y independent?
- YES: Use \(a^2 Var(X) + b^2 Var(Y)\).
- NO: Use \(a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)\). (You must find or be given \(Cov(X, Y)\) first!)
5. Applications and Generalization
The rules extend smoothly when dealing with more than two variables or repeated identical variables.
5.1 Sum of \(n\) Independent Variables
If you have \(n\) independent and identically distributed random variables, \(X_1, X_2, \dots, X_n\), and you form their sum \(S = X_1 + X_2 + \dots + X_n\):
Expected Value:
\[E(S) = E(X_1) + E(X_2) + \dots + E(X_n) = n E(X)\]
(If you flip a fair coin 10 times, the expected number of heads is 10 times the expected number of heads in one flip.)

Variance:
Since they are independent, the covariances are zero. We just add the variances:
\[Var(S) = Var(X_1) + Var(X_2) + \dots + Var(X_n) = n Var(X)\]
(This principle is crucial for understanding distributions like the Binomial distribution, which is just the sum of many independent Bernoulli trials.)
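The coin-flip example can be checked by brute force: enumerate all \(2^{10}\) equally likely sequences of tosses and compute the mean and variance of the number of heads directly.

```python
from itertools import product

# S = number of heads in n fair tosses, i.e. the sum of n iid Bernoulli(0.5) variables
n, p = 10, 0.5
E_X, Var_X = p, p * (1 - p)   # mean and variance of a single toss

# Brute-force check over all 2^n equally likely toss sequences
sums = [sum(seq) for seq in product([0, 1], repeat=n)]
E_S = sum(sums) / len(sums)
Var_S = sum((s - E_S) ** 2 for s in sums) / len(sums)

assert abs(E_S - n * E_X) < 1e-9     # E(S) = n E(X) = 5.0
assert abs(Var_S - n * Var_X) < 1e-9 # Var(S) = n Var(X) = 2.5 (independence)
```

These are exactly the Binomial(10, 0.5) mean \(np\) and variance \(np(1-p)\).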
5.2 Example Application: Packaging Items
Let \(X\) be the weight of a bottle of juice (in kg), with \(E(X) = 1.2\) and \(Var(X) = 0.04\).
Let \(P\) be the weight of the packaging, a constant \(P = 0.05\) kg.
A crate holds 5 bottles. The total weight \(W\) is \(W = X_1 + X_2 + X_3 + X_4 + X_5 + 5P\). Assume the bottle weights are independent.
1. Find the Expected Total Weight:
\[E(W) = E(X_1) + \dots + E(X_5) + E(5P)\]
Since \(E(X) = 1.2\):
\[E(W) = 5(1.2) + 5(0.05) = 6.0 + 0.25 = 6.25 \text{ kg}\]
2. Find the Variance of the Total Weight:
Since the bottles are independent, and \(5P\) is a constant (variance is 0):
\[Var(W) = Var(X_1) + \dots + Var(X_5) + Var(5P)\]
\[Var(W) = 5(0.04) + 0 = 0.20\]
The standard deviation is \(\sqrt{0.20} \approx 0.447\) kg.
Key Takeaway: Always separate the independent variables (whose variances add up) from the constants (which affect the mean but not the variance).
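The crate calculation above translates directly into a few lines of code (same numbers as the worked example):

```python
import math

E_X, Var_X = 1.2, 0.04   # one bottle's weight: mean and variance (kg)
P = 0.05                 # packaging per bottle: a constant (kg)
n = 5                    # bottles per crate

E_W = n * E_X + n * P    # constants shift the mean: 6.25 kg
Var_W = n * Var_X        # constants contribute no variance: 0.20
sd_W = math.sqrt(Var_W)  # about 0.447 kg

assert abs(E_W - 6.25) < 1e-9
assert abs(Var_W - 0.20) < 1e-9
```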
6. A Note on Correlation and Decision Making
When solving application problems, understanding correlation is vital for interpreting the result, even if you don't calculate \(\rho\).
- If two variables you are adding (like investment returns) are positively correlated, the total risk (variance) of your combination is higher than if they were independent.
- If they are negatively correlated (e.g., when one product sells well, the other sells poorly), the covariance term is negative, leading to a smaller overall variance. This is why diversification (choosing negatively correlated assets) reduces risk!
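A tiny sketch makes the diversification effect concrete. The variances and the three covariance values below are hypothetical numbers chosen for the demo:

```python
# Same individual variances, three hypothetical covariances: only the
# covariance term changes the total risk of the combination X + Y.
Var_X = Var_Y = 4.0
a = b = 1  # equal weighting of the two variables

results = {}
for cov in (2.0, 0.0, -2.0):
    results[cov] = a**2 * Var_X + b**2 * Var_Y + 2 * a * b * cov

print(results)  # {2.0: 12.0, 0.0: 8.0, -2.0: 4.0}
```

Positive covariance amplifies the combined risk (12.0), independence gives the plain sum (8.0), and negative covariance dampens it (4.0).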
Summary: The Power of the Formula
The linear combination rules allow us to model complex systems—whether they involve independent components (like a chain of manufacturing steps) or highly interconnected components (like the interdependent sales figures of two competing products)—using just the means, variances, and covariance of the individual parts.