Study Notes: S2.4 The Normal Distribution

Hello! Welcome to the Normal Distribution. If you want to understand how things like human height, test scores, or manufacturing errors are naturally distributed in the real world, this is the chapter for you. It's one of the most important concepts in all of statistics because it models so many real-life phenomena.

Don't worry if the tables and formulas look intimidating at first. We’ll break down every step, focusing on the core ideas of symmetry and standardization. Once you master the Z-score, everything else falls into place!

1. Defining the Normal Distribution

1.1 Key Characteristics and Notation

The Normal Distribution is used for continuous random variables. Unlike discrete variables (like counts), continuous variables can take any value within a range (like height or time).

  • The shape of the distribution is a distinct, symmetrical, bell-shaped curve.
  • It is completely defined by two parameters: the mean (\(\mu\)) and the variance (\(\sigma^2\)).
  • It is symmetrical about the mean, meaning Mean = Median = Mode.
  • The tails of the curve extend infinitely, but the probability quickly approaches zero.

Key Notation:

If a random variable \(X\) follows a Normal Distribution with mean \(\mu\) and variance \(\sigma^2\), we write:

\( X \sim N(\mu, \sigma^2) \)

Common Mistake Warning!
Always remember that the second number in the notation \( N(\mu, \sigma^2) \) is the variance (\(\sigma^2\)). If the question gives you the standard deviation (\(\sigma\)), you must square it before using the notation or specific formulas, and you must use \(\sigma\) (the square root) when calculating the Z-score.
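If you have Python handy, you can see this distinction in action. The sketch below uses the standard-library class `statistics.NormalDist` (purely for illustration, it isn't part of the tables-based syllabus); note that it expects the standard deviation, not the variance, as its second argument:

```python
from statistics import NormalDist

# Suppose X ~ N(50, 16): mean 50, VARIANCE 16, so sigma = 4.
mu = 50
variance = 16
sigma = variance ** 0.5  # square-root the variance before building the distribution

X = NormalDist(mu, sigma)  # NormalDist takes the standard deviation, not the variance
print(X.stdev)     # 4.0
print(X.variance)  # 16.0 -- matches the second number in N(50, 16)
```

Passing 16 directly as the second argument would silently model the wrong distribution, which is exactly the mistake the warning above is about.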

1.2 Properties and the Empirical Rule

The total area under the curve is 1 (or 100%), representing the total probability.

The syllabus requires knowledge of how the data is spread relative to the standard deviation (\(\sigma\)):

  • Approximately \(\frac{2}{3}\) of the observations (about 68%) lie within one standard deviation of the mean: \( \mu \pm \sigma \).
  • Approximately 95% lie within two standard deviations: \( \mu \pm 2\sigma \).
  • Almost all (about 99.7%) lie within three standard deviations: \( \mu \pm 3\sigma \).
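You can verify these three percentages yourself with a short Python sketch (again using the standard-library `statistics.NormalDist` for illustration). Because standardization preserves areas, checking the rule on the standard normal \(Z\) checks it for every normal distribution:

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # the standard normal distribution

# P(mu - k*sigma < X < mu + k*sigma) equals P(-k < Z < k) for any normal X,
# so the empirical rule can be checked directly on Z.
for k in (1, 2, 3):
    prob = Z.cdf(k) - Z.cdf(-k)
    print(f"within {k} sd: {prob:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```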

Did you know? The Normal Distribution is often called the "Gaussian Distribution" after the mathematician Carl Friedrich Gauss.

Quick Review: The Normal Curve
  • Shape: Bell-shaped and Symmetrical.
  • Defined by: Mean (\(\mu\)) and Variance (\(\sigma^2\)).
  • Area: Total area under the curve = 1.

2. Standardization: The Z-Transformation

Since a Normal Distribution can have any mean and any standard deviation, we can't create a table for every possibility. Instead, we use a trick: we convert every Normal variable \(X\) into a Standard Normal Variable, \(Z\).

2.1 The Standard Normal Distribution

The Standard Normal Distribution is a specific normal distribution where the mean is 0 and the variance is 1.

\( Z \sim N(0, 1) \)

2.2 Calculating the Z-Score

The Z-score tells us exactly how many standard deviations an observation (\(X\)) is away from the mean (\(\mu\)).

The Z-Transformation Formula:

\( Z = \frac{X - \mu}{\sigma} \)

Where:
\(X\) is the observation value.
\(\mu\) is the mean.
\(\sigma\) is the standard deviation (NOT variance).

Step-by-Step Standardization:

  1. Identify the known values: \(X\), \(\mu\), and \(\sigma\).
  2. Calculate the difference between the observation and the mean: \( X - \mu \).
  3. Divide this difference by the standard deviation: \(\frac{X - \mu}{\sigma}\).
  4. The resulting \(Z\)-value (rounded to two decimal places, as per syllabus guidance) is what you use with the tables.
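The four steps above can be sketched in a few lines of Python. The marks data here (mean 60, standard deviation 8, observation 74) are made up for illustration:

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardise an observation: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical example: test marks with mean 60 and standard deviation 8.
z = z_score(74, 60, 8)
print(round(z, 2))  # 1.75

# The table value Phi(z) = P(Z < z) can then be read off (or computed):
print(round(NormalDist().cdf(z), 4))  # 0.9599
```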

Analogy: Think of the Z-score as a universal language. No matter whether you are measuring height (in cm) or weight (in kg), the Z-score translates the measurement into a standard unit (how far from average).

3. Calculating Probabilities using Tables

The Standard Normal Distribution tables (often called the \(\Phi\) table) give us the area under the curve to the left of a given Z-score. This is written as \(\Phi(z)\), which means \( P(Z < z) \).

3.1 Rules of Symmetry and Area

Because the Normal Distribution is perfectly symmetrical, we can use the tables to find any probability we need, even those involving negative Z-scores or areas to the right.

Case 1: Area to the Left (Direct Reading)

For \( P(Z < z) \) where \(z\) is positive, read the value \(\Phi(z)\) directly from the table.

Case 2: Area to the Right

The total area is 1. If we want the area to the right of \(z\), we subtract the area to the left from 1:

\( P(Z > z) = 1 - P(Z < z) = 1 - \Phi(z) \)

Case 3: Negative Z-Scores

If we have a negative Z-score, say \(-z\) (where \(z\) is positive), the area to its left is the same as the area to the right of \(z\), by symmetry.

\( P(Z < -z) = P(Z > z) = 1 - \Phi(z) \)

Case 4: Area Between Two Z-Scores

To find the probability between two scores \(a\) and \(b\):

\( P(a < Z < b) = P(Z < b) - P(Z < a) = \Phi(b) - \Phi(a) \)

Memory Aid for Probability Calculations

Always draw a sketch! Shade the area you want. This immediately shows you whether you need \( \Phi(z) \), \( 1 - \Phi(z) \), or a subtraction.

  • Left Area: \(\Phi(z)\)
  • Right Area: \(1 - \Phi(z)\)
  • Inner Area (Symmetrical): \( \Phi(z) - \Phi(-z) = 2\Phi(z) - 1 \)
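All four cases can be checked numerically. In the Python sketch below (using the standard-library `statistics.NormalDist` in place of printed tables), `phi` plays the role of \(\Phi\), and \(z = 1.5\) is an arbitrary example value:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # phi(z) = P(Z < z), i.e. the table function Phi
z = 1.5

left  = phi(z)            # Case 1: area to the left, read directly
right = 1 - phi(z)        # Case 2: area to the right
neg   = phi(-z)           # Case 3: equals 1 - phi(z) by symmetry
inner = phi(z) - phi(-z)  # symmetric inner area, equals 2*phi(z) - 1

print(round(left, 4))   # 0.9332, the familiar table value for z = 1.50
print(round(right, 4))  # 0.0668
```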

4. Reverse Problems: Finding Unknown Parameters

Sometimes, the probability (area) is given, and you need to find the specific value of \(X\) or the unknown mean (\(\mu\)) or standard deviation (\(\sigma\)).

Step-by-Step for Reverse Problems:

  1. Find the Critical Z-score: Use the given probability (e.g., the top 10% or the middle 50%) to look up the corresponding Z-score in the table (you may use the special Percentage Points Table for common critical values).
  2. Determine the Sign: If the area relates to a value below the mean, the Z-score must be negative. If it's above the mean, the Z-score is positive.
  3. Use the Formula: Substitute the Z-score, along with any known values for \(X\), \(\mu\), or \(\sigma\), into the standardization formula: \( Z = \frac{X - \mu}{\sigma} \).
  4. Solve: Solve the resulting equation for the unknown parameter.

Example Scenario: If you know that 90% of students scored below 75 marks, you use 0.90 to find the corresponding Z-score (\(z\)), then set up: \( z = \frac{75 - \mu}{\sigma} \).
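The scenario above has two unknowns, so to make it fully solvable in a quick sketch we additionally assume \(\sigma = 10\) is known (that value is invented for illustration). The inverse table lookup is done here with `inv_cdf` from the standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

# Hypothetical: 90% of students scored below 75 marks, and sigma = 10 is known.
p = 0.90
x = 75
sigma = 10

z = NormalDist().inv_cdf(p)  # critical z with 90% of the area to its left
mu = x - z * sigma           # rearranged from z = (x - mu) / sigma

print(round(z, 4))   # 1.2816, the familiar percentage-point value for 0.90
print(round(mu, 2))  # 62.18
```

Substituting back, \(P(X < 75)\) for \(X \sim N(62.18,\, 10^2)\) does indeed come out as 0.90.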

5. Sums and Differences of Independent Normal Variables

This section deals with combining two or more independent random variables that are normally distributed. This is a powerful concept used when, for instance, calculating the total weight of two randomly selected components.

If \( X_1 \sim N(\mu_1, \sigma_1^2) \) and \( X_2 \sim N(\mu_2, \sigma_2^2) \) are independent, then their sum or difference is also Normally Distributed.

5.1 Combining Means (Expectation)

The mean of the sum or difference is simply the sum or difference of the individual means.

For a sum: \( E(X_1 + X_2) = \mu_1 + \mu_2 \)

For a difference: \( E(X_1 - X_2) = \mu_1 - \mu_2 \)

5.2 Combining Variances (The Golden Rule)

When dealing with independent normal variables, the variances always add, regardless of whether you are finding the probability of a sum (\(X_1 + X_2\)) or a difference (\(X_1 - X_2\)).

For both sum and difference:

\( Var(X_1 \pm X_2) = Var(X_1) + Var(X_2) = \sigma_1^2 + \sigma_2^2 \)

Crucial Point: When calculating the combined distribution for a difference (e.g., \( X_1 - X_2 \)), you subtract the means but you still add the variances. You must then take the square root of the combined variance to find the new standard deviation (\(\sigma_{new}\)) for your Z-score calculation.

5.3 The Resulting Distribution

If \( X_1 \) and \( X_2 \) are independent and Normally distributed:

For Sum: \( X_1 + X_2 \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2) \)

For Difference: \( X_1 - X_2 \sim N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2) \)
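These combination rules can be confirmed in Python: the standard-library `statistics.NormalDist` supports adding and subtracting independent normal variables directly. The component figures below are made up for illustration:

```python
from statistics import NormalDist

X1 = NormalDist(mu=100, sigma=3)  # variance 9
X2 = NormalDist(mu=80, sigma=4)   # variance 16

total = X1 + X2  # sum of independent normals
diff = X1 - X2   # difference of independent normals

print(total.mean, total.variance)  # mean 180, variance 25 (= 9 + 16)
print(diff.mean, diff.variance)    # mean 20, variance 25 -- variances STILL add
```

Notice the new standard deviation is \(\sqrt{25} = 5\) in both cases, which is what you would use in any subsequent Z-score calculation.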

Key Takeaway

The Normal Distribution is defined by its mean and variance. The entire key to solving normal distribution problems is the Z-score transformation, which allows you to use the standard tables. Remember the symmetry rules for finding probabilities and the rule that variance always adds when combining independent normal variables, whether you are summing or subtracting them.