Study Notes: 5.5 The Normal Distribution
Hello! Welcome to the final chapter of Probability & Statistics 1. The Normal Distribution is perhaps the most important continuous distribution in all of statistics. Why? Because so many things in the real world, from people's heights to measurement errors, approximately follow this pattern. Mastering this topic means you can solve complex problems about these real-world variables. Don't worry if this seems tricky at first; we'll break it down step by step!
1. Understanding the Normal Distribution \(X \sim N(\mu, \sigma^2)\)
The Normal Distribution is used to model a Continuous Random Variable (CRV). A CRV is a variable that can take any value within a given range (e.g., height, temperature, time).
The Bell Curve
The graph of the Normal Distribution is often called the bell curve because of its distinctive shape.
- It is perfectly symmetrical around the mean, \(\mu\).
- The mean (\(\mu\)), median, and mode all occur at the same central point.
- The total area under the curve is always 1 (representing 100% probability).
Notation and Parameters
We describe a normal distribution using two key parameters: the mean \(\mu\) and the variance \(\sigma^2\).
The notation for a random variable \(X\) that is normally distributed is:
$$X \sim N(\mu, \sigma^2)$$
- \(\mu\) (Mu): This is the mean (or expectation). It locates the centre of the curve.
- \(\sigma^2\) (Sigma squared): This is the variance. It measures the spread of the data.
- \(\sigma\) (Sigma): This is the standard deviation. It is the square root of the variance and is often easier to interpret regarding spread.
Key Takeaway: The mean \(\mu\) tells you where the bell curve is centered, and the standard deviation \(\sigma\) tells you how wide or narrow it is.
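As a quick illustration of the variance versus standard deviation distinction, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The height figures (mean 170, variance 100) are made up for this example and do not come from the notes.

```python
from statistics import NormalDist

# Hypothetical example: X ~ N(170, 100), i.e. mean 170 and VARIANCE 100.
mu = 170.0
variance = 100.0
sigma = variance ** 0.5  # standard deviation = square root of the variance

# NormalDist takes the standard deviation, not the variance.
heights = NormalDist(mu, sigma)

print(heights.mean)      # 170.0
print(heights.stdev)     # 10.0
print(heights.variance)  # 100.0
print(heights.median)    # 170.0, same as the mean by symmetry
```

Notice that the notation \(N(\mu, \sigma^2)\) quotes the variance, while this library (like the standardisation formula) works with \(\sigma\).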
2. The Standard Normal Distribution (\(Z\))
Every normal distribution looks slightly different depending on its \(\mu\) and \(\sigma\). To avoid needing a separate table for every possible pair of \(\mu\) and \(\sigma\), we convert any Normal variable \(X\) into a standard form, called the Standard Normal Distribution.
The standard normal random variable is denoted by \(Z\).
$$Z \sim N(0, 1)$$
This distribution has a mean \(\mu = 0\) and a variance \(\sigma^2 = 1\).
Using the Normal Distribution Tables (\(\Phi(z)\))
The tables provided in the MF19 booklet give values for the standard normal distribution, denoted by \(\Phi(z)\).
\(\Phi(z) = P(Z < z)\)
This means the table always gives you the area (probability) to the left of a given \(Z\)-value.
Memory Aid: Think of \(\Phi\) (Phi) as the cumulative probability—it gathers the probability up to that point.
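If you want to check a table value, \(\Phi(z)\) can be computed from the error function in Python's `math` module. This is only a sketch for checking your table look-ups; the function name `phi` is our own choice, and in the exam you must still use the tables.

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative probability, Phi(z) = P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(phi(0.0), 4))   # 0.5    (half the area lies left of the mean)
print(round(phi(1.0), 4))   # 0.8413
print(round(phi(1.96), 4))  # 0.975
```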
3. Standardising: The Z-Formula
To turn a variable \(X\) from any normal distribution \(N(\mu, \sigma^2)\) into a standard variable \(Z\), we use the process of Standardisation:
$$Z = \frac{X - \mu}{\sigma}$$
- \((X - \mu)\) calculates the distance of \(X\) from the mean.
- Dividing by \(\sigma\) measures this distance in terms of standard deviations.
Example Analogy: Imagine a score of 70 on a test. Is that good? It depends!
If the mean (\(\mu\)) is 50 and the standard deviation (\(\sigma\)) is 10:
\(Z = \frac{70 - 50}{10} = 2\). The score is 2 standard deviations above average—very good!
The \(Z\)-score tells you exactly how far above or below the average any score is.
Step-by-Step Standardisation Process:
- Identify \(\mu\) and \(\sigma\) (Remember: \(\sigma\) is the square root of the variance \(\sigma^2\)).
- Identify the \(x\)-value you are interested in.
- Use the formula \(Z = \frac{X - \mu}{\sigma}\) to convert \(X\) into \(Z\).
- Sketch the curve! This is crucial for determining which area to calculate.
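The standardisation steps above can be sketched in a few lines of Python. The helper name `standardise` is hypothetical; it takes the variance (as written in the \(N(\mu, \sigma^2)\) notation) and takes the square root for you, which is the step most often forgotten.

```python
import math

def standardise(x: float, mu: float, variance: float) -> float:
    """Convert a value x from N(mu, variance) into a z-score."""
    sigma = math.sqrt(variance)  # standard deviation, not variance
    return (x - mu) / sigma

# The test-score example: mean 50, standard deviation 10 (variance 100).
z = standardise(70, 50, 100)
print(z)  # 2.0
```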
Quick Review: Important Properties
Because the Normal Distribution is continuous:
$$P(X < x) = P(X \leq x)$$
The probability of hitting an exact single value is always zero, \(P(X = x) = 0\), so including or excluding the endpoint makes no difference.
4. Solving Probability Problems using Z-Tables
When solving problems, you must always translate the required area on the original \(X\) curve into an area on the \(Z\) curve that you can look up in the \(\Phi(z)\) table.
Scenario 1: \(P(Z < a)\) where \(a > 0\) (Area to the left)
This is a direct table look-up:
$$P(Z < a) = \Phi(a)$$
Scenario 2: \(P(Z > a)\) where \(a > 0\) (Area to the right)
Since the total area is 1, the area to the right is 1 minus the area to the left:
$$P(Z > a) = 1 - P(Z < a) = 1 - \Phi(a)$$
Scenario 3: \(P(Z < -a)\) where \(a > 0\) (Area to the left of a negative value)
The tables only show positive \(Z\). Due to symmetry, the area to the left of \(-a\) is the same as the area to the right of \(a\).
$$P(Z < -a) = P(Z > a) = 1 - \Phi(a)$$
(Syllabus Note: The tables specify this relationship: \( \Phi(-z) = 1 - \Phi(z) \))
Scenario 4: \(P(Z > -a)\) where \(a > 0\) (Area to the right of a negative value)
The area to the right of \(-a\) consists of the entire positive half plus the area \(P(-a < Z < 0)\). By symmetry, \(P(-a < Z < 0) = P(0 < Z < a)\), so the total is equal to the area \(P(Z < a)\):
$$P(Z > -a) = P(Z < a) = \Phi(a)$$
Scenario 5: \(P(a < Z < b)\) (Area between two values)
Subtract the cumulative probability of the smaller value from the cumulative probability of the larger value.
$$P(a < Z < b) = P(Z < b) - P(Z < a) = \Phi(b) - \Phi(a)$$
Crucial Tip for Struggling Students: Always sketch the bell curve and shade the required region. This visually confirms which formula (1 - \(\Phi\), or just \(\Phi\)) you need to use.
Key Takeaway: All normal probability problems rely on manipulating the desired area until you can express it using the basic cumulative function \(\Phi(z)\).
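To see the five scenarios side by side, here is a small sketch using `statistics.NormalDist` from the Python standard library, with \(a = 1\) and \(b = 2\) chosen purely for illustration. Each line uses only \(\Phi\) of a positive value, just as you would with the tables.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal N(0, 1): Phi(z) = P(Z < z)
a, b = 1.0, 2.0         # illustrative values, not from the notes

s1 = Phi(a)           # Scenario 1: P(Z < a)
s2 = 1 - Phi(a)       # Scenario 2: P(Z > a)
s3 = 1 - Phi(a)       # Scenario 3: P(Z < -a), by symmetry
s4 = Phi(a)           # Scenario 4: P(Z > -a), by symmetry
s5 = Phi(b) - Phi(a)  # Scenario 5: P(a < Z < b)

print(round(s1, 4))  # 0.8413
print(round(s2, 4))  # 0.1587
print(round(s3, 4))  # 0.1587
print(round(s4, 4))  # 0.8413
print(round(s5, 4))  # 0.1359
```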
5. Reverse Normal Distribution Problems
Sometimes, you are given the probability (the area) and asked to find the corresponding value of \(X\) or the parameters \(\mu\) or \(\sigma\). This is called Reverse Standardisation.
Step-by-Step Reverse Process:
- Find the \(Z\)-score: Use the given probability (area) and the Z-table (working backwards) to find the corresponding \(z\)-value.
- Determine the Sign:
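- If the left-tail probability \(P(Z < z)\) is less than 0.5, the \(z\)-value must be negative.
- If the left-tail probability \(P(Z < z)\) is greater than 0.5, the \(z\)-value must be positive.
- If you are given a right-tail probability \(P(Z > z)\), convert it to a left-tail probability first by subtracting it from 1.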
- Use the Standardisation Formula: Substitute the known values into \(Z = \frac{X - \mu}{\sigma}\) and solve for the unknown parameter (\(X\), \(\mu\), or \(\sigma\)).
Example: Suppose \(P(X < x_1) = 0.1587\). Since 0.1587 is less than 0.5, \(x_1\) must be to the left of the mean, and its corresponding \(Z\)-score, \(z_1\), must be negative.
We look up the area \(1 - 0.1587 = 0.8413\) in the table. This gives \(z = 1.00\).
Therefore, the actual \(Z\)-score for \(x_1\) is \(\mathbf{z_1 = -1.00}\).
Common Mistake to Avoid: Not adjusting the sign of the Z-score in reverse problems when the left-tail probability is less than 0.5.
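The reverse look-up in the example above can be checked with `NormalDist.inv_cdf` from Python's standard library, which works backwards from a left-tail probability to a \(z\)-value. The parameters \(\mu = 50\) and \(\sigma = 10\) used to finish the calculation are assumptions made up for this sketch.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

# inv_cdf takes the LEFT-tail probability and returns z with its sign.
z1 = Z.inv_cdf(0.1587)
print(round(z1, 2))  # -1.0

# Hypothetical parameters to finish the problem: mu = 50, sigma = 10.
mu, sigma = 50, 10
x1 = mu + z1 * sigma  # rearranged from Z = (X - mu) / sigma
print(round(x1, 1))   # 40.0
```

Unlike the printed tables, `inv_cdf` handles probabilities below 0.5 directly, so it is a handy way to confirm you chose the correct sign.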
6. Normal Approximation to the Binomial Distribution
The Normal Distribution, being continuous, can sometimes be used to estimate probabilities for the Binomial Distribution, which is discrete. This is useful when the number of trials, \(n\), is very large, making direct calculation difficult.
Conditions for Approximation
The normal approximation to the Binomial distribution \(X \sim B(n, p)\) is considered appropriate when:
- The number of trials, \(n\), is large.
- Both \(\mathbf{np > 5}\) and \(\mathbf{nq > 5}\) (where \(q = 1 - p\)).
Parameters for the Approximation
If the conditions are met, we approximate \(X\) using the Normal Distribution \(N(\mu, \sigma^2)\) with:
$$ \mu = np $$
$$ \sigma^2 = npq $$
The Continuity Correction (CC) - THIS IS ESSENTIAL!
Since we are switching from a discrete distribution (Binomial, where outcomes are integers) to a continuous one (Normal), we must apply a Continuity Correction.
The continuity correction involves adjusting the boundary of the integer value by 0.5. Think of each integer \(x\) as covering the interval from \((x - 0.5)\) to \((x + 0.5)\) on the continuous scale.
Summary of Continuity Corrections:
| Discrete Binomial Probability | Continuous Normal Approximation |
|---|---|
| \(P(X = x)\) | \(P(x - 0.5 < X < x + 0.5)\) |
| \(P(X \leq x)\) | \(P(X < x + 0.5)\) |
| \(P(X < x)\) | \(P(X < x - 0.5)\) |
| \(P(X \geq x)\) | \(P(X > x - 0.5)\) |
| \(P(X > x)\) | \(P(X > x + 0.5)\) |
Example: Suppose you want to find the probability of exactly 10 successes, \(P(X=10)\).
On the continuous scale, "exactly 10" is represented by the interval from 9.5 up to 10.5.
Approximation: \(P(9.5 < X < 10.5)\).
Example: Suppose a question asks for "less than 15", which means \(X \leq 14\).
The largest integer allowed is 14. The continuous boundary must go up to 14.5.
Approximation: \(P(X < 14.5)\).
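The correction table can be written as a small lookup function. This is a sketch only: the function name `continuity_bounds` and the string codes for each inequality are our own choices for illustration.

```python
import math

def continuity_bounds(relation: str, x: int) -> tuple:
    """Return (lower, upper) on the continuous scale for a discrete event.

    relation is one of "==", "<=", "<", ">=", ">" (a hypothetical encoding
    chosen for this sketch).
    """
    if relation == "==":
        return (x - 0.5, x + 0.5)
    if relation == "<=":
        return (-math.inf, x + 0.5)
    if relation == "<":
        return (-math.inf, x - 0.5)
    if relation == ">=":
        return (x - 0.5, math.inf)
    if relation == ">":
        return (x + 0.5, math.inf)
    raise ValueError(f"unknown relation: {relation!r}")

print(continuity_bounds("==", 10))  # (9.5, 10.5)
print(continuity_bounds("<", 15))   # (-inf, 14.5)
```

The two printed lines match the two worked examples: "exactly 10" becomes the interval 9.5 to 10.5, and "less than 15" has an upper boundary of 14.5.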
Step-by-Step Approximation Process:
- Check the conditions: \(np > 5\) and \(nq > 5\).
- Calculate \(\mu = np\) and \(\sigma^2 = npq\).
- Apply the Continuity Correction to the required integer boundary (add or subtract 0.5).
- Standardise using \(Z = \frac{X - \mu}{\sigma}\).
- Solve using the \(Z\)-tables as in Section 4.
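Here is the whole process as a short Python sketch, using only the standard library. The problem itself, \(X \sim B(100, 0.4)\) and \(P(X \leq 45)\), is a made-up example rather than one from the notes. The exact binomial value is computed at the end only so you can see how close the approximation is.

```python
import math
from statistics import NormalDist

# Hypothetical example: X ~ B(100, 0.4), find P(X <= 45).
n, p = 100, 0.4
q = 1 - p

# Step 1: check the conditions np > 5 and nq > 5.
assert n * p > 5 and n * q > 5

# Step 2: parameters of the approximating normal distribution.
mu = n * p            # 40
variance = n * p * q  # 24
sigma = math.sqrt(variance)

# Step 3: continuity correction, P(X <= 45) becomes P(X < 45.5).
boundary = 45 + 0.5

# Step 4: standardise.
z = (boundary - mu) / sigma

# Step 5: look up Phi(z); NormalDist() is the standard normal N(0, 1).
approx = NormalDist().cdf(z)

# For comparison only: the exact binomial probability.
exact = sum(math.comb(n, k) * p**k * q**(n - k) for k in range(0, 46))

print(f"z = {z:.3f}")
print(f"normal approximation = {approx:.4f}")
print(f"exact binomial       = {exact:.4f}")
```

Try removing the `+ 0.5` in Step 3 and rerunning: the approximation should move further from the exact value, which is why the continuity correction matters.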
Did you know? The normal distribution appears so frequently because of the Central Limit Theorem (a topic for Paper 6/S2), which basically states that the sum or average of a large number of independent random variables will tend to follow an approximately normal distribution, largely regardless of the individual variables' original distributions!
Key Takeaway: When approximating Binomial with Normal, remember the two crucial steps: calculate the correct \(\mu\) and \(\sigma^2\) from the Binomial parameters, and ALWAYS apply the continuity correction (adding or subtracting 0.5).