Welcome to The Normal Distribution!

Hello future statisticians! You are about to dive into one of the most famous and fundamental concepts in all of statistics: The Normal Distribution.

Don't worry if this chapter seems tricky at first. It’s all about standardisation and using symmetry. We will break down every concept into clear, simple steps. By the end of this, you’ll be masters of the classic "bell curve"!

Why is the Normal Distribution important?

Many things in the real world approximately follow this distribution: the heights of adults, the scores on a large exam, the length of time taken to complete a task, and even measurement errors. If a variable is Normally distributed, we can calculate the probability that it falls in any given range of values.


1. Properties of the Normal Distribution

The Characteristics of \(X \sim N(\mu, \sigma^2)\)

When we say a random variable \(X\) is Normally distributed, we write it using the notation:

\(X \sim N(\mu, \sigma^2)\)

Here is what the symbols mean:

  • \(X\): The random variable (e.g., height, temperature, score).
  • \(N\): Stands for "Normal Distribution".
  • \(\mu\) (pronounced 'mu'): This is the mean (average) of the distribution. It controls the position of the centre of the curve.
  • \(\sigma^2\) (pronounced 'sigma squared'): This is the variance.
  • \(\sigma\): The square root of the variance, known as the standard deviation. This measures the spread or width of the curve.

!!! Common Mistake Alert !!!
Always check the notation! Sometimes the question gives you the variance (\(\sigma^2\)), and sometimes the standard deviation (\(\sigma\)). If you are given \(\sigma^2\), remember to take the square root to get \(\sigma\) before using the standardisation formula (Section 3).
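The variance-versus-standard-deviation check can be sketched in Python. This is only an illustration: the numbers (mean 50, variance 16) are made up for the example, and it assumes Python's standard-library `statistics.NormalDist` class, which expects the standard deviation rather than the variance.

```python
import math
from statistics import NormalDist

# Hypothetical question data: X ~ N(50, 16), where 16 is the VARIANCE.
mu = 50
variance = 16

# Take the square root to get the standard deviation before doing anything else.
sigma = math.sqrt(variance)

# NormalDist takes the standard deviation (sigma), not the variance.
X = NormalDist(mu=mu, sigma=sigma)

print(sigma)       # 4.0
print(X.variance)  # 16.0
```

Passing 16 as `sigma` by mistake would describe a much wider curve than the question intended.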

Key Features of the Normal Curve

  1. Symmetry: The curve is perfectly symmetrical around the mean \(\mu\).
  2. Central Tendency: The mean, median, and mode are all equal and located at the centre peak of the curve.
  3. Bell Shape: It has a characteristic "bell" shape.
  4. Asymptotic: The tails of the curve never actually touch the horizontal axis; they extend indefinitely (though the probability becomes tiny very quickly).
  5. Area: The total area under the curve is always 1 (or 100%), representing the total probability.

Quick Review: Shape and Spread

If two normal distributions have the same mean \(\mu\), the one with the larger standard deviation (\(\sigma\)) will be flatter and wider, indicating the data is more spread out. The one with the smaller \(\sigma\) will be taller and narrower.
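To see "taller and narrower" versus "flatter and wider" in numbers, here is a small sketch comparing the height of the curve at its peak. The two distributions (both with mean 0, with \(\sigma = 1\) and \(\sigma = 2\)) are illustrative choices, and the `pdf` method of `statistics.NormalDist` is used for the curve height, which these notes do not otherwise cover.

```python
from statistics import NormalDist

# Two illustrative distributions with the same mean but different spreads.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)

# Height of each curve at its peak, which sits at the mean.
peak_narrow = narrow.pdf(0)
peak_wide = wide.pdf(0)

print(round(peak_narrow, 4))  # 0.3989
print(round(peak_wide, 4))    # 0.1995
```

Doubling \(\sigma\) halves the peak height: the total area must stay equal to 1, so a wider curve has to be lower.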


2. The Empirical Rule (68-95-99.7)

Every Normal distribution has the same shape once distances are measured in standard deviations (\(\sigma\)) from the mean, so certain probabilities are the same for every Normal distribution. This is sometimes called the Empirical Rule.

  • Approximately 68% of the data falls within 1 standard deviation of the mean (i.e., between \(\mu - \sigma\) and \(\mu + \sigma\)).
  • Approximately 95% of the data falls within 2 standard deviations of the mean (i.e., between \(\mu - 2\sigma\) and \(\mu + 2\sigma\)).
  • Approximately 99.7% of the data falls within 3 standard deviations of the mean (i.e., between \(\mu - 3\sigma\) and \(\mu + 3\sigma\)).

This rule is fantastic for quickly checking if your answer makes sense. If you calculate the probability of a value falling 4 standard deviations above the mean, you know it should be a very, very small probability!
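The three percentages can be checked numerically. The sketch below assumes Python's standard-library `statistics.NormalDist` in place of printed tables; the helper name `within` is made up for this example.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # standard Normal distribution

def within(k):
    """Probability of lying within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Probability of being more than 4 standard deviations ABOVE the mean.
tail_4 = 1 - Z.cdf(4)
print(tail_4 < 0.0001)  # True
```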


3. Standardisation: The Z-Score

Imagine you have two different exams: Math (mean 70, standard deviation 5) and Physics (mean 60, standard deviation 10). If you score 75 on both, which result is better?

We can't compare the raw scores directly because the tests have different means and different spreads. We need a standard measure. This is where the Z-score comes in!

What is a Z-score?

The Z-score (or standard score) tells us exactly how many standard deviations a particular value (\(X\)) is above or below the mean (\(\mu\)).

The formula for standardisation is:

\(Z = \frac{X - \mu}{\sigma}\)

  • If \(X\) is above the mean, \(Z\) will be positive.
  • If \(X\) is below the mean, \(Z\) will be negative.
  • If \(X\) equals the mean, \(Z\) will be 0.

Analogy: Think of the Z-score as a universal currency converter. No matter what units the original Normal variable was measured in (dollars, euros, points), standardisation converts it all into the universal Z-currency so we can use one common table to calculate probabilities.

Key Takeaway: Before you can use the Normal Distribution tables, you must convert your random variable \(X\) into a \(Z\)-score.
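Applying the formula to the two exams from the start of this section settles the comparison. This is a minimal sketch; the function name `z_score` is just a label chosen for the example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Exam figures from the example: Math (mean 70, sd 5), Physics (mean 60, sd 10).
z_math = z_score(75, 70, 5)
z_physics = z_score(75, 60, 10)

print(z_math)     # 1.0
print(z_physics)  # 1.5
```

A score of 75 is 1 standard deviation above the mean in Math but 1.5 standard deviations above the mean in Physics, so the Physics result is the stronger one relative to the rest of the class.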


4. The Standard Normal Distribution \(Z \sim N(0, 1)\)

When we standardise any Normal variable \(X\), it becomes the variable \(Z\), which always follows the Standard Normal Distribution.

\(Z \sim N(0, 1)\)

This means the Standard Normal Distribution always has:

  • Mean \(\mu = 0\)
  • Variance \(\sigma^2 = 1\) (and Standard Deviation \(\sigma = 1\))

The probabilities for the Standard Normal Distribution are found using statistical tables (or calculators).

Understanding the Normal Distribution Tables

The tables provided in your exam materials give you the value of \(\Phi(z)\) (pronounced 'Phi of z').

\(\Phi(z) = P(Z \le z)\)

This is the probability that the standardised variable \(Z\) is less than or equal to a specific value \(z\). Crucially, the table only shows the area to the LEFT of the Z-score.

Because the Normal Distribution is continuous, remember that:

\(P(X < x) = P(X \le x)\)
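If you want to check a table value without the tables, \(\Phi(z)\) can be computed from the error function. This is a sketch using the standard identity \(\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)\), which is not part of these notes, together with Python's `math.erf`.

```python
import math

def phi(z):
    """Area to the LEFT of z under the standard Normal curve, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(phi(0.0))            # 0.5
print(round(phi(1.5), 4))  # 0.9332
print(round(phi(2.0), 4))  # 0.9772
```

The value at 0 is exactly one half because the curve is symmetrical about the mean.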


5. Using Symmetry and the Tables

Since the tables only give us areas to the left of a positive \(Z\)-score, we must use the properties of symmetry and the fact that the total area is 1 to find other probabilities.

Case 1: Finding \(P(Z > z)\) (Area to the Right)

If you want the area to the right of \(z\), you need to subtract the area to the left (which the table gives you) from the total area (1).

\(P(Z > z) = 1 - P(Z < z) = 1 - \Phi(z)\)

Example: If the table gives \(P(Z < 1.5) = 0.9332\), then \(P(Z > 1.5) = 1 - 0.9332 = 0.0668\).

Case 2: Finding \(P(Z < -z)\) (Area in the Left Tail)

The tables usually don't list negative Z-scores, but we don't need them! Because the curve is symmetrical around zero:

The area far to the left of a negative value (\(P(Z < -z)\)) is exactly the same as the area far to the right of the corresponding positive value (\(P(Z > z)\)).

\(P(Z < -z) = P(Z > z) = 1 - \Phi(z)\)

Case 3: Finding \(P(Z > -z)\) (Area to the Right of a Negative Z)

This is the mirror image of Case 2. If you want the area to the right of a negative score (which is a large area, including the whole curve above 0):

\(P(Z > -z) = P(Z < z) = \Phi(z)\)

Case 4: Finding \(P(z_1 < Z < z_2)\) (Area Between Two Scores)

To find the area between two scores, calculate the area to the left of the bigger score and subtract the area to the left of the smaller score.

\(P(z_1 < Z < z_2) = P(Z < z_2) - P(Z < z_1)\)
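The four cases can be checked side by side. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for the tables; the values \(z = 1.5\), \(z_1 = -1\) and \(z_2 = 2\) are illustrative choices.

```python
from statistics import NormalDist

phi = NormalDist(0, 1).cdf  # phi(z) = P(Z < z), the table value

z = 1.5

right_tail = 1 - phi(z)           # Case 1: P(Z > 1.5)
left_tail = phi(-z)               # Case 2: P(Z < -1.5), computed directly
right_of_negative = 1 - phi(-z)   # Case 3: P(Z > -1.5), computed directly
between = phi(2.0) - phi(-1.0)    # Case 4: P(-1 < Z < 2)

print(round(right_tail, 4))         # 0.0668
print(round(left_tail, 4))          # 0.0668, same as Case 1 by symmetry
print(round(right_of_negative, 4))  # 0.9332, same as phi(1.5)
print(round(between, 4))            # 0.8186
```

Unlike the printed tables, the function accepts negative values directly, which makes it a handy way to confirm that the symmetry rules give the same answers.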

Tip for Struggling Students: Draw a Sketch!

Always sketch the normal curve, mark the mean (0), and shade the area you are trying to find. This visual aid will immediately tell you whether your probability should be big (close to 1) or small (close to 0) and will guide your choice of formula (1 minus the table value, or just the table value).


6. Solving Complete Problems: Finding Probabilities

Step-by-Step Procedure

Suppose \(X \sim N(50, 4^2)\). Find \(P(X < 58)\).

Step 1: Identify Parameters.
\(\mu = 50\). \(\sigma^2 = 4^2 = 16\). Therefore, \(\sigma = 4\).

Step 2: Standardise the variable \(X\) into a \(Z\)-score.
Use \(Z = \frac{X - \mu}{\sigma}\).

\(Z = \frac{58 - 50}{4} = \frac{8}{4} = 2.00\)

So, \(P(X < 58)\) is the same as \(P(Z < 2.00)\).

Step 3: Look up the probability in the Normal Tables.
Find \(\Phi(2.00)\).

\(P(Z < 2.00) = 0.9772\)

Step 4: Check against the context (Optional but Recommended).
Since 58 is two standard deviations above the mean, the Empirical Rule tells us that about 95% of the area lies within two standard deviations, leaving roughly 2.5% in each tail. The area below 58 should therefore be about 97.5%, so 0.9772 is a reasonable answer.
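The same worked example can be reproduced in a few lines. This sketch assumes Python's standard-library `statistics.NormalDist` replaces the table lookup in Step 3; it works the problem both through the Z-score and directly on \(X\) to show that the two routes agree.

```python
from statistics import NormalDist

# Step 1: parameters of X ~ N(50, 4^2).
mu, sigma = 50, 4
x = 58

# Step 2: standardise.
z = (x - mu) / sigma

# Step 3: area to the left of z under the standard Normal curve.
p_via_z = NormalDist(0, 1).cdf(z)

# Cross-check: ask for P(X < 58) directly, without standardising by hand.
p_direct = NormalDist(mu, sigma).cdf(x)

print(z)                   # 2.0
print(round(p_via_z, 4))   # 0.9772
print(round(p_direct, 4))  # 0.9772
```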


7. Inverse Problems: Finding X Given a Probability

Often, you are given a probability (a percentage or area) and asked to find the actual score or value \(X\) that corresponds to that area. These are often called Inverse Problems.

Step-by-Step Procedure for Inverse Problems

Suppose \(X \sim N(50, 4^2)\). Find the score \(x\) such that \(P(X > x) = 0.10\).

Step 1: Convert the required probability to the 'Area to the Left'.
The tables give \(P(Z < z)\). If \(P(X > x) = 0.10\), then the area to the left is \(P(X < x) = 1 - 0.10 = 0.90\).

Step 2: Use the Inverse Tables (or the main table backwards) to find the Z-score (\(z\)).
We are looking for \(z\) such that \(\Phi(z) = 0.90\). Looking up 0.9000 in the main body of the tables gives approximately \(z = 1.282\). (Since the probability 0.90 is greater than 0.5, we know the Z-score must be positive).

Step 3: Convert the Z-score back into the original score \(X\).
Rearrange the standardisation formula:

\(X = \mu + Z\sigma\)

Substitute the values: \(\mu = 50\), \(\sigma = 4\), \(Z = 1.282\).

\(X = 50 + (1.282)(4)\)
\(X = 50 + 5.128 = 55.128\)

Step 4: Conclusion.
The score \(x\) required is 55.1 (3 s.f.).
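The inverse procedure can be sketched the same way. This assumes the `inv_cdf` method of Python's standard-library `statistics.NormalDist` plays the role of reading the table backwards; the variable names are chosen for the example.

```python
from statistics import NormalDist

mu, sigma = 50, 4
upper_tail = 0.10  # given: P(X > x) = 0.10

# Step 1: convert to the area to the left.
area_left = 1 - upper_tail

# Step 2: find z such that phi(z) = area_left.
z = NormalDist(0, 1).inv_cdf(area_left)

# Step 3: convert back to the original scale.
x = mu + z * sigma

print(round(z, 3))  # 1.282
print(round(x, 2))  # 55.13
```

The unrounded result differs slightly from a hand calculation in the third decimal place because the software carries more digits of \(z\) than the table value 1.282.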

Handling Inverse Problems with Negative Z-scores

What if the question asked for the score \(x\) such that \(P(X < x) = 0.10\)?

The area to the left is 0.10. Since 0.10 is less than 0.5, the score \(x\) must be below the mean (\(Z\) must be negative).

  1. We look up the area to the right, \(1 - 0.10 = 0.90\), to find the magnitude of the Z-score, \(z_0 = 1.282\).
  2. Because our required probability (0.10) is in the left tail, the actual Z-score we need is negative: \(Z = -1.282\).
  3. Calculate \(X\): \(X = 50 + (-1.282)(4) = 50 - 5.128 = 44.872\).

Memory Aid: If the area to the left, \(P(X < x)\), is less than 0.5, \(Z\) is negative. If it is greater than 0.5, \(Z\) is positive.
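A quick numerical check of the left-tail case, again assuming `statistics.NormalDist.inv_cdf` from Python's standard library as a substitute for the tables. Software handles areas below 0.5 directly, so it is a useful way to confirm the sign you worked out by symmetry.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)
mu, sigma = 50, 4

# Area to the left is 0.10 (less than 0.5), so the Z-score should be negative.
z_lower = Z.inv_cdf(0.10)

# The matching positive Z-score, for an area of 0.90 to the left.
z_upper = Z.inv_cdf(0.90)

x_lower = mu + z_lower * sigma

print(round(z_lower, 3))  # -1.282
print(round(z_upper, 3))  # 1.282
print(round(x_lower, 2))  # 44.87
```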


Final Summary and Key Takeaways

You have successfully tackled the Normal Distribution! Remember these essential facts:

  • The notation is \(X \sim N(\mu, \sigma^2)\). Be careful with variance (\(\sigma^2\)) vs. standard deviation (\(\sigma\)).
  • To solve any problem, you must standardise using \(Z = \frac{X - \mu}{\sigma}\).
  • The Standard Normal Distribution \(Z \sim N(0, 1)\) is what the tables measure.
  • The tables give you the area to the left, \(\Phi(z) = P(Z < z)\).
  • Use symmetry and \(1 - \Phi(z)\) to find areas outside of the direct table lookup.
  • For inverse problems, find the Z-score first, then convert back to \(X\) using \(X = \mu + Z\sigma\).

Keep practising your standardisation and symmetry rules. You've got this!