Welcome to The Normal Distribution!

Hey there, Statistician! Get ready to dive into perhaps the most important distribution in all of statistics: The Normal Distribution. Sometimes called the Gaussian distribution, it describes countless phenomena in the real world, from human height and IQ scores to manufacturing errors and measurement tolerances.

Understanding this chapter is crucial because it gives you the tools to model continuous data and calculate probabilities for almost anything that follows that famous "bell curve" shape. Don't worry if some of the concepts seem theoretical. We'll break them down step by step, making sure you ace those table look-ups!


1. Defining the Normal Distribution

1.1 Characteristics of the Bell Curve

The Normal distribution is a continuous probability distribution defined by two parameters: its mean and its variance.

Imagine measuring the height of every student in a large school. Most students cluster around the average height, and very few are extremely short or extremely tall. When you plot this data, you get the characteristic bell shape.

  • Symmetry: The curve is perfectly symmetrical around its center.
  • Mean = Median = Mode: The highest point of the curve (the mode) is also the average (mean) and the middle value (median).
  • Asymptotic: The curve approaches the horizontal axis but never actually touches it (it extends to infinity in both directions).
  • Total Area: The total area under the curve is exactly 1 (since the total probability must equal 1).
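
The symmetry and total-area properties can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the mean of 170 cm and standard deviation of 8 cm are assumed values chosen only for illustration, and the helper `area_under_curve` is a hypothetical name.

```python
from statistics import NormalDist

# Hypothetical height distribution: mean 170 cm, standard deviation 8 cm (assumed values).
heights = NormalDist(mu=170, sigma=8)

# Symmetry: the curve has the same height at equal distances either side of the mean.
left_height = heights.pdf(170 - 12)
right_height = heights.pdf(170 + 12)

def area_under_curve(dist, lower, upper, steps=20_000):
    """Approximate the area under dist's density with the midpoint rule."""
    width = (upper - lower) / steps
    return sum(dist.pdf(lower + (i + 0.5) * width) for i in range(steps)) * width

# The tails never reach zero, so integrate over mean +/- 8 standard deviations,
# which leaves out only a negligible sliver of area.
total_area = area_under_curve(heights, 170 - 8 * 8, 170 + 8 * 8)

print(abs(left_height - right_height) < 1e-12)
print(round(total_area, 6))
```

The area is only approximated here because the curve extends to infinity in both directions; widening the integration range brings the result ever closer to 1.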

1.2 Normal Distribution Notation

We use a special notation to describe a variable \(X\) that follows a Normal distribution:

$$X \sim N(\mu, \sigma^2)$$

Let's break down those parameters:

  1. \(\mu\) (Mu): This is the mean (or average) of the distribution. It determines the center of the bell curve.
  2. \(\sigma^2\) (Sigma Squared): This is the variance. It measures the spread of the data.
  3. \(\sigma\) (Sigma): The square root of the variance, the standard deviation. This determines the shape of the curve. A larger \(\sigma\) means the curve is wider and flatter (more spread out).

Quick Tip: Always remember the notation uses variance (\(\sigma^2\)), but almost all calculations use the standard deviation (\(\sigma\)). If you are given \(\sigma^2 = 25\), you must use \(\sigma = 5\) in your formulas!
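
The same variance-versus-standard-deviation trap appears in software. As a small sketch, Python's standard-library `statistics.NormalDist` expects the standard deviation, so a variance of 25 must be square-rooted first. The mean of 0 is an arbitrary value assumed for illustration.

```python
import math
from statistics import NormalDist

variance = 25                # sigma^2, as written in X ~ N(mu, sigma^2)
sigma = math.sqrt(variance)  # standard deviation, the value used in calculations

# NormalDist takes the standard deviation, not the variance.
# The mean of 0 is an arbitrary choice for this illustration.
X = NormalDist(mu=0, sigma=sigma)

print(sigma)       # 5.0
print(X.variance)  # 25.0
```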


Key Takeaway: The Normal distribution is symmetrical and defined entirely by its mean (\(\mu\)) and variance (\(\sigma^2\)).



2. The Standard Normal Distribution (Z-Scores)

2.1 Why Standardize?

Imagine comparing a height of 180 cm (from a population with mean 170 cm) to an IQ score of 115 (from a population with mean 100). These are different variables with different means and standard deviations. How can we compare them objectively?

We need a universal scale! This scale is the Standard Normal Distribution.

The Standard Normal Distribution, often denoted by the variable \(Z\), is a special Normal distribution where:

  • The mean is 0: \(\mu = 0\)
  • The standard deviation is 1: \(\sigma = 1\) (and variance \(\sigma^2 = 1\))

We write this as: $$Z \sim N(0, 1)$$

2.2 The Z-Score Formula (Standardisation)

Standardisation is the process of converting any Normal variable \(X\) into the standard variable \(Z\).

The formula for the Z-score (also called the standardised score) is:

$$Z = \frac{X - \mu}{\sigma}$$

What does the Z-score tell you?
The Z-score tells you exactly how many standard deviations (\(\sigma\)) the value \(X\) is away from the mean (\(\mu\)).

Example: Suppose a student scores \(X=80\) on a test where \(\mu=60\) and \(\sigma=10\). Then
$$Z = \frac{80 - 60}{10} = 2$$ This means the student scored 2 standard deviations above the mean.

Analogy: Think of Z-scores as a universal measuring tape. If you standardise two scores, you can directly compare how extreme they are relative to their own population's average and spread.
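
The Z-score formula can be sketched as a one-line function. The example below reproduces the test-score calculation and then compares the height and IQ values from Section 2.1. The text gives only the means for that comparison, so both standard deviations used here (8 cm and 15 points) are assumptions made purely for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Test-score example from the text: X = 80, mu = 60, sigma = 10.
test_z = z_score(80, 60, 10)

# Height vs IQ: the means come from the text, but both standard deviations
# below are ASSUMED purely for illustration.
height_z = z_score(180, 170, 8)  # assumed sigma = 8 cm
iq_z = z_score(115, 100, 15)     # assumed sigma = 15 points

print(test_z)    # 2.0
print(height_z)  # 1.25
print(iq_z)      # 1.0
```

Under these assumed spreads, the height would be the more extreme of the two values, because it sits further from its own mean in standard-deviation units.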


Key Takeaway: Standardisation converts any X value into a Z-score, allowing us to use the universal Z-tables to find probabilities. \(Z = \frac{X - \mu}{\sigma}\).



3. Using the Standard Normal Tables

Once you have standardized your value \(X\) to a Z-score, you look up the probability in the Normal Distribution tables provided in your exam formula booklet.

3.1 Understanding the Tables

The tables give the probability that a standardized variable \(Z\) is less than or equal to a positive value \(z\). This is often denoted as \(\Phi(z)\) (Phi of z).

The shaded area shown in the table's diagram always represents: $$P(Z \le z)$$

Important Rule: The table only works for positive Z-scores (\(z \ge 0\)). We use the symmetry of the curve for negative Z-scores.
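
If you want to check a table value, \(\Phi(z)\) can be computed from the error function using the standard identity \(\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)\). The sketch below uses Python's `math.erf`; the function name `phi` is just an illustrative choice. Unlike a printed table, it also accepts negative \(z\).

```python
import math

def phi(z):
    """Cumulative probability P(Z <= z) for the standard Normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(phi(0.0), 4))   # 0.5
print(round(phi(1.0), 4))   # 0.8413
print(round(phi(1.96), 4))  # 0.975
```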

3.2 The Symmetry Rules (Essential!)

When calculating probabilities, you will encounter four main situations:

Case 1: Finding the area BELOW a positive Z-score.

$$P(Z \le z)$$

Action: Look up \(z\) directly in the table.

Case 2: Finding the area ABOVE a positive Z-score.

$$P(Z > z)$$

Because the total area under the curve is 1, this is the remaining area:

Action: Calculate \(1 - P(Z \le z)\).

Memory Aid: "Greater than means 1 minus."

Case 3: Finding the area BELOW a negative Z-score.

$$P(Z \le -z)$$

Because the curve is symmetrical, the area in the far left tail is the same as the area in the far right tail:

$$P(Z \le -z) = P(Z > z)$$

Action: Use the rule for Case 2: \(1 - P(Z \le z)\).

Case 4: Finding the area BETWEEN two Z-scores.

$$P(a < Z < b)$$

Action: Find the area up to \(b\) and subtract the area up to \(a\): $$P(Z \le b) - P(Z \le a)$$
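
The four cases can be sketched in code. To mirror the exam situation, the hypothetical helper `table_lookup` below refuses negative inputs, just like a printed table, so the other functions have to use the same symmetry rules you would use by hand. All function names are illustrative.

```python
from statistics import NormalDist

_STANDARD = NormalDist()  # Z ~ N(0, 1)

def table_lookup(z):
    """Mimic a printed table: defined only for z >= 0, returns P(Z <= z)."""
    if z < 0:
        raise ValueError("the printed table only covers z >= 0")
    return _STANDARD.cdf(z)

def prob_below(z):
    """P(Z <= z): Case 1 for z >= 0, Case 3 (symmetry) for z < 0."""
    if z >= 0:
        return table_lookup(z)
    return 1 - table_lookup(-z)

def prob_above(z):
    """P(Z > z): Case 2, the area remaining out of a total of 1."""
    return 1 - prob_below(z)

def prob_between(a, b):
    """P(a < Z < b): Case 4, area up to b minus area up to a."""
    return prob_below(b) - prob_below(a)

print(round(prob_between(-1.0, 1.0), 4))  # 0.6827
```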

Common Mistake to Avoid: Not drawing a sketch! Always draw a quick sketch of the bell curve, shade the area you need, and decide which case (1, 2, 3, or 4) applies. This prevents sign errors.


Worked Example of Symmetry

Suppose you calculate \(z = 1.50\). Find \(P(Z > 1.50)\).

1. Look up \(P(Z \le 1.50)\) in the table: \(0.9332\).
2. Apply Case 2: \(P(Z > 1.50) = 1 - P(Z \le 1.50) = 1 - 0.9332 = 0.0668\).

Find \(P(Z < -1.50)\).

1. Apply Case 3: \(P(Z < -1.50) = P(Z > 1.50)\).
2. This is the same result: \(0.0668\).
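
As a quick check of the worked example, the sketch below recomputes the three values with Python's standard-library `statistics.NormalDist`. The left tail is computed directly rather than by symmetry, so agreement with the Case 2 result confirms the rule.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

below = Z.cdf(1.50)       # P(Z <= 1.50), the table value
above = 1 - below         # Case 2: P(Z > 1.50)
left_tail = Z.cdf(-1.50)  # P(Z < -1.50), computed directly

print(round(below, 4))      # 0.9332
print(round(above, 4))      # 0.0668
print(round(left_tail, 4))  # 0.0668
```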


Key Takeaway: The tables only give \(P(Z \le z)\) for \(z \ge 0\). Use the total probability of 1 and the symmetry of the curve to find probabilities for areas above \(z\), below \(-z\), or between two values.



4. Inverse Normal Calculations (Working Backwards)

Sometimes you are given the probability (the area) and asked to find the actual value \(X\) or the standardized score \(Z\). This is called the Inverse Normal Distribution problem.

4.1 Step-by-Step Inverse Process

1. Draw and Adjust: Sketch the curve and shade the given probability. Determine the positive Z-score (\(z\)) corresponding to the area from the left (the value that the table gives). If the given probability is in the tail, you must adjust it (often using 1 minus) so that it matches the format \(P(Z \le z)\).

2. Find \(z\): Use the Inverse Normal table (or the main table in reverse) to find the Z-score, \(z\), that corresponds to that cumulative probability.

3. Determine Sign: If the required value \(X\) is below the mean, the Z-score must be negative (e.g., \(Z = -z\)). If \(X\) is above the mean, it must be positive (\(Z = +z\)).

4. Un-standardise: Use the standardization formula rearranged to find \(X\): $$\mathbf{X = \mu + Z\sigma}$$
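
The four steps above can be sketched as a single function. Software can invert negative Z-scores directly, but this version deliberately follows the table-based route: it converts the probability to one of at least 0.5, finds the positive \(z\), and then attaches the sign. The function name `inverse_normal` and the values \(\mu = 100\), \(\sigma = 15\) in the usage lines are assumptions for illustration.

```python
from statistics import NormalDist

def inverse_normal(p_below, mu, sigma):
    """Return x such that P(X < x) = p_below for X ~ N(mu, sigma^2)."""
    if not 0 < p_below < 1:
        raise ValueError("p_below must be strictly between 0 and 1")
    # Steps 1-2: adjust to a probability >= 0.5 so that the positive z applies,
    # then find that z from the cumulative probability.
    table_p = max(p_below, 1 - p_below)
    z = NormalDist().inv_cdf(table_p)
    # Step 3: a cumulative probability below 0.5 means x is below the mean.
    if p_below < 0.5:
        z = -z
    # Step 4: un-standardise.
    return mu + z * sigma

# Hypothetical distribution with mu = 100 and sigma = 15 (assumed values).
upper = inverse_normal(0.90, 100, 15)
lower = inverse_normal(0.10, 100, 15)
print(round(upper, 1), round(lower, 1))
```

Because 0.10 and 0.90 are mirror-image probabilities, the two answers sit the same distance either side of the mean.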


Example: Finding X

The masses of apples are normally distributed with \(\mu=150\)g and \(\sigma=10\)g. Find the mass \(k\) such that 90% of apples weigh less than \(k\).

1. Draw and Adjust: We are looking for \(P(X < k) = 0.90\). Since 0.90 is greater than 0.5, \(k\) must be above the mean, so \(Z\) is positive.

2. Find \(z\): Look inside the table for the probability closest to 0.9000.
(Using standard tables, 0.9000 corresponds to \(z \approx 1.28\)).

3. Determine Sign: Since \(k\) is above the mean, \(Z = +1.28\).

4. Un-standardise: $$k = \mu + Z\sigma$$ $$k = 150 + (1.28)(10)$$ $$k = 150 + 12.8 = 162.8 \text{ grams}$$
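
The apple example can be checked with Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method performs the inverse look-up directly. Software uses a more precise Z-score than the two-decimal table value, so small differences in later decimal places are expected.

```python
from statistics import NormalDist

apples = NormalDist(mu=150, sigma=10)  # masses in grams

k = apples.inv_cdf(0.90)  # mass below which 90% of apples fall

print(round(k, 1))  # 162.8
```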

Did You Know? The Z-scores \(Z=1.645\) (for 95% cumulative area) and \(Z=2.326\) (for 99% cumulative area) are very common and often appear in the critical values table section of your formula sheet.
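
These two critical values can be reproduced with a short sketch using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal

z_95 = Z.inv_cdf(0.95)  # Z-score with 95% of the area to its left
z_99 = Z.inv_cdf(0.99)  # Z-score with 99% of the area to its left

print(round(z_95, 3))  # 1.645
print(round(z_99, 3))  # 2.326
```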


Key Takeaway: Inverse problems involve finding the Z-score first using the table (based on the cumulative probability), and then converting back to the original units using \(X = \mu + Z\sigma\).



5. Solving for Unknown Parameters (\(\mu\) or \(\sigma\))

The most challenging type of Normal distribution problem involves finding the unknown mean \(\mu\) or the unknown standard deviation \(\sigma\), or both. These problems almost always require setting up equations using the Z-score formula.

5.1 The Two-Point Problem

If you are asked to find both \(\mu\) and \(\sigma\), you must be given two pieces of probability information (two different X-scores and their corresponding probabilities).

Step-by-Step Process:

  1. Standardise the first point: For the first piece of information (\(X_1\)), convert the given probability into a Z-score (\(Z_1\)) using the Inverse Normal method (Symmetry rules and tables).
  2. Form Equation 1: Substitute \(X_1\), \(Z_1\), \(\mu\), and \(\sigma\) into the Z-score formula, rearranged: $$X_1 = \mu + Z_1\sigma$$
  3. Standardise the second point: Repeat the process for the second piece of information (\(X_2\)) to find \(Z_2\).
  4. Form Equation 2: $$X_2 = \mu + Z_2\sigma$$
  5. Solve Simultaneously: Solve the two linear equations for the two unknowns, \(\mu\) and \(\sigma\).
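
Subtracting Equation 1 from Equation 2 eliminates \(\mu\) and gives \(\sigma = \frac{X_2 - X_1}{Z_2 - Z_1}\), after which \(\mu = X_1 - Z_1\sigma\). The sketch below applies this, assuming both pieces of information are given as "less than" probabilities. The function name `solve_mu_sigma` and the data (5% below 12, 90% below 30) are hypothetical.

```python
from statistics import NormalDist

def solve_mu_sigma(x1, p1, x2, p2):
    """Find (mu, sigma) given P(X < x1) = p1 and P(X < x2) = p2."""
    standard = NormalDist()
    z1 = standard.inv_cdf(p1)  # signed Z-score for the first point
    z2 = standard.inv_cdf(p2)  # signed Z-score for the second point
    if z1 == z2:
        raise ValueError("the two probabilities must differ")
    # Equation 2 minus Equation 1: x2 - x1 = (z2 - z1) * sigma
    sigma = (x2 - x1) / (z2 - z1)
    # Substitute back into Equation 1: x1 = mu + z1 * sigma
    mu = x1 - z1 * sigma
    return mu, sigma

# Hypothetical data: 5% of values lie below 12 and 90% lie below 30.
mu, sigma = solve_mu_sigma(12, 0.05, 30, 0.90)
print(round(mu, 2), round(sigma, 2))
```

If a question gives a "greater than" probability instead, convert it to a "less than" probability with the 1 minus rule before calling the function.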

Encouragement: Don't worry if the signs of your Z-scores look complicated—just be extremely careful. Remember that any score \(X\) below the mean must result in a negative Z-score, and any score \(X\) above the mean must result in a positive Z-score.

5.2 Common Error: Z-Score Sign

If you are told that 5% of scores are less than 12 (i.e., \(P(X < 12) = 0.05\)):

  • Since 0.05 is a small probability (< 0.5), the score \(X=12\) is in the left tail, meaning it is below the mean.
  • The Z-score corresponding to a cumulative area of 0.05 must be negative.
  • (If \(P(Z < -z) = 0.05\), then \(P(Z > z) = 0.05\). The positive z-score is 1.645. Therefore, the required Z-score is \(Z = -1.645\).)
  • Your equation must be: \(12 = \mu - 1.645\sigma\)
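
A short sketch confirms the sign. Software returns the negative Z-score directly for a cumulative area of 0.05. To show the equation in use, the last lines assume a hypothetical known standard deviation of \(\sigma = 4\), which is not part of the original example.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.05)  # Z-score with 5% of the area to its left
print(round(z, 3))  # -1.645

# Assumed for illustration only: sigma = 4 is known and mu is required.
sigma = 4
mu = 12 - z * sigma  # rearranged from 12 = mu + z * sigma, with z negative
print(round(mu, 2))  # 18.58
```

Dropping the minus sign would place the mean below 12, which contradicts the fact that only 5% of scores lie below 12.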

Quick Review: Normal Distribution Checklist
  1. Identify \(\mu\) and \(\sigma\) (careful with variance vs. standard deviation).
  2. Draw a sketch! (Essential for visualising the area and determining Z-score signs).
  3. Standardise: Convert \(X\) to \(Z\) using \(Z = \frac{X - \mu}{\sigma}\).
  4. Use Tables: Adjust the probability using symmetry (1 minus rules) if necessary.
  5. Inverse Problems: Go from Probability \(\to\) Z-score \(\to\) X.
  6. Parameter Problems: Set up simultaneous equations \(X = \mu + Z\sigma\).