Introduction: Guessing with Confidence!
Hi everyone! Ever wondered about the average height of all Form 6 students in Hong Kong, or the average amount of time people spend on Instagram each day? It's impossible to ask every single person, right? That would take forever!
So, what do we do? We take a sample (a smaller group) and calculate its average. This is called a point estimate. But here's the catch: our sample's average is probably not exactly the same as the true average of the whole population. It's just a single, best guess.
This is where Confidence Intervals come to the rescue! Instead of giving just one number, we create a range of values and say, "We are pretty confident the true average is somewhere in this range." It's like switching from trying to hit a tiny target with one dart to throwing a hoop over it. Much better, right?
In this chapter, you will learn:
- The difference between a point estimate and an interval estimate.
- What a "confidence level" really means (it's not what you might think!).
- How to calculate a confidence interval for the population mean (μ) in two different situations.
Don't worry if this seems tricky at first. We'll break it down step-by-step with simple examples. Let's get started!
Section 1: The Building Blocks - Parameters vs. Statistics
Quick Recap: Population vs. Sample
Before we build our intervals, let's remember these crucial terms. Think of making a large pot of soup...
- Population: This is the entire group you are interested in.
Example: All the soup in the pot.
We use Greek letters for population values, called parameters:
- $$ \mu $$ (mu) = population mean
- $$ \sigma $$ (sigma) = population standard deviation
- Sample: This is a small part of the population that you actually collect data from.
Example: The spoonful of soup you taste to check the seasoning.
We use regular letters for sample values, called statistics:
- $$ \bar{x} $$ ("x-bar") = sample mean
- $$ s $$ = sample standard deviation
Point Estimates: Our Best Single Guess
When we use a sample statistic to estimate a population parameter, it's called a point estimate. It's our single best guess.
- We use the sample mean $$ \bar{x} $$ as a point estimate for the population mean $$ \mu $$.
- We use the sample variance $$ s^2 $$ as a point estimate for the population variance $$ \sigma^2 $$.
The Problem: A point estimate is almost never exactly correct! The spoonful of soup might be slightly saltier or less salty than the whole pot. We need a way to account for this uncertainty.
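If you like to check things on a computer, here is a minimal sketch of how the point estimates above could be calculated in Python. The list `heights` is made-up data for illustration only, and it assumes the sample variance uses the $$ n - 1 $$ divisor, which is what Python's `statistics` module does.

```python
import statistics

# Hypothetical sample of eight student heights in cm (made-up data).
heights = [162.0, 170.5, 158.0, 175.0, 168.5, 165.0, 172.0, 161.0]

x_bar = statistics.mean(heights)          # point estimate of mu
s_squared = statistics.variance(heights)  # point estimate of sigma^2 (divides by n - 1)
s = statistics.stdev(heights)             # square root of the sample variance

print(f"x-bar = {x_bar:.2f}, s^2 = {s_squared:.2f}, s = {s:.2f}")
```

A different spoonful (sample) would give slightly different numbers, which is exactly the uncertainty described above.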
Key Takeaway
We use sample statistics ($$ \bar{x}, s $$) to estimate unknown population parameters ($$ \mu, \sigma $$). A single guess ($$ \bar{x} $$) is a point estimate, but a range of values (a confidence interval) is much more informative.
Section 2: What is a Confidence Interval?
The Fishing Net Analogy
Imagine the true population mean, $$ \mu $$, is a single, invisible fish swimming in a huge lake.
- A point estimate ($$ \bar{x} $$) is like trying to catch the fish with a spear. You have to be incredibly accurate (and lucky!) to hit it. You will probably miss.
- A confidence interval is like using a fishing net. You cast your net in the area where you think the fish is. You might not know its exact location, but you can be pretty confident you've caught it inside your net!
The confidence interval gives us a range of plausible values for the true population mean $$ \mu $$.
Understanding the Confidence Level
You'll see phrases like "a 95% confidence interval". What does 95% actually mean?
This is a very common point of confusion, so read carefully!
Incorrect Meaning: "There is a 95% probability that the true mean $$ \mu $$ is inside my calculated interval." (This is wrong because once you've calculated an interval, the true mean is either in it or it's not. The probability is 1 or 0).
Correct Meaning: "I am 95% confident in the method I used to create this interval."
Let's go back to the fishing net analogy. A 95% confidence level means that if we took 100 different random samples from the population and created 100 different "nets" (intervals), we would expect about 95 of those nets to successfully capture the true mean $$ \mu $$.
Confidence Level: The success rate of the method (e.g., 90%, 95%, 99%).
Significance Level ($$ \alpha $$): The failure rate of the method. It's simply $$ 1 - \text{Confidence Level} $$.
- For a 95% confidence level, $$ \alpha = 1 - 0.95 = 0.05 $$.
- For a 99% confidence level, $$ \alpha = 1 - 0.99 = 0.01 $$.
Key Takeaway
A confidence interval is a range estimate for $$ \mu $$. The confidence level tells us how reliable our interval-building procedure is over many repeated samples.
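To see the "success rate of the method" idea in action, here is a small simulation sketch in Python. It assumes a made-up normal population (the values of `MU`, `SIGMA` and `N` are hypothetical) and builds each interval with the known-$$ \sigma $$ formula covered later in this chapter. It casts many "nets" and counts how many capture the true mean.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA, N = 50.0, 10.0, 25     # hypothetical "true" population values
Z = NormalDist().inv_cdf(0.975)   # critical value for 95% confidence, about 1.96
TRIALS = 2000                     # number of nets we cast

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    margin = Z * SIGMA / N ** 0.5
    # Did this net capture the true mean?
    if x_bar - margin < MU < x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{hits} of {TRIALS} intervals captured mu ({coverage:.1%})")
```

The printed proportion should land close to 95%. Any single interval either contains $$ \mu $$ or it doesn't; the 95% describes the long-run behaviour of the method.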
Section 3: Constructing the Confidence Interval - The Formula!
The General Structure
All confidence intervals for a mean have the same basic structure. It's a formula you should memorize!
Confidence Interval = Point Estimate ± Margin of Error
Let's break that down:
- Point Estimate: Our best guess for $$ \mu $$, which is the sample mean $$ \bar{x} $$.
- Margin of Error (E): How much "give or take" we add to our point estimate to create the range. It determines the width of our interval.
The Margin of Error itself has a formula:
Margin of Error (E) = (Critical Value) × (Standard Error of the Mean)
Finding the Critical Value ($$ z_{\alpha/2} $$)
The critical value is a z-score from the standard normal distribution. It's determined by your confidence level. We write it as $$ z_{\alpha/2} $$ because the "error" probability $$ \alpha $$ is split equally between the two tails of the normal curve, so $$ z_{\alpha/2} $$ is the z-score with an area of $$ \alpha/2 $$ to its right: $$ P(Z > z_{\alpha/2}) = \alpha/2 $$.
You don't need to calculate these from scratch every time. Just memorize the common ones!
Quick Review: Common Critical Values
- For 90% confidence: $$ \alpha = 0.10 $$, $$ \alpha/2 = 0.05 $$. The critical value is $$ z_{0.05} \approx 1.645 $$
- For 95% confidence: $$ \alpha = 0.05 $$, $$ \alpha/2 = 0.025 $$. The critical value is $$ z_{0.025} \approx 1.96 $$
- For 99% confidence: $$ \alpha = 0.01 $$, $$ \alpha/2 = 0.005 $$. The critical value is $$ z_{0.005} \approx 2.576 $$
Memory Aid: In statistics, 95% and 1.96 are best friends. You'll see them together all the time!
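If you ever forget a critical value, it can be recovered from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard library. The helper name `critical_value` is just an illustrative choice, not something from the syllabus.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence_level
    # Area to the LEFT of z_{alpha/2} is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_value(level):.3f}")
```

The three printed values agree with the table above to 3 decimal places.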
Now, let's look at the two specific scenarios you need to know for the HKDSE syllabus.
Section 4: Case 1 - We KNOW the Population Variance ($$ \sigma^2 $$)
The Situation
This is the first scenario you'll encounter. The key conditions are:
- The population is assumed to be normally distributed.
- The population variance $$ \sigma^2 $$ (and therefore the standard deviation $$ \sigma $$) is KNOWN.
(In real life, this is rare. If you don't know the population mean $$ \mu $$, why would you know its variance $$ \sigma^2 $$? But it's a perfect starting point for learning!)
The Formula
The 100(1-α)% confidence interval for $$ \mu $$ is given by:
$$ \left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) $$
Where:
- $$ \bar{x} $$ is the sample mean.
- $$ z_{\alpha/2} $$ is the critical value for your confidence level.
- $$ \sigma $$ is the known population standard deviation.
- $$ n $$ is the sample size.
Step-by-Step Example
The weights of a certain type of apple are normally distributed with a population standard deviation $$ \sigma = 20 $$ grams. A random sample of $$ n=16 $$ apples is taken, and the sample mean weight is found to be $$ \bar{x} = 150 $$ grams. Construct a 95% confidence interval for the true mean weight of all such apples.
Step 1: Identify all your values.
$$ \bar{x} = 150 $$, $$ \sigma = 20 $$, $$ n = 16 $$
Step 2: Find your critical value.
Confidence level = 95%. This means $$ \alpha = 0.05 $$, so we need $$ z_{\alpha/2} = z_{0.025} $$.
From our table, $$ z_{0.025} \approx 1.96 $$.
Step 3: Calculate the Margin of Error (E).
$$ E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 1.96 \times \frac{20}{\sqrt{16}} = 1.96 \times \frac{20}{4} = 1.96 \times 5 = 9.8 $$
Step 4: Construct the interval.
Interval = $$ (\bar{x} - E, \bar{x} + E) $$
$$ (150 - 9.8, 150 + 9.8) = (140.2, 159.8) $$
Step 5: Write your conclusion.
We are 95% confident that the true mean weight of all such apples is between 140.2 grams and 159.8 grams.
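As a check on the working above, here is a short Python sketch of the Case 1 calculation. The function name `z_interval_known_sigma` is hypothetical. It uses the unrounded critical value, so the margin of error is very slightly different from 9.8 before rounding.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_known_sigma(x_bar, sigma, n, confidence_level):
    """Confidence interval for mu when the population sd (sigma) is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # critical value
    margin = z * sigma / sqrt(n)                              # margin of error E
    return x_bar - margin, x_bar + margin

# Values from the apple example: x-bar = 150, sigma = 20, n = 16, 95% confidence.
lower, upper = z_interval_known_sigma(x_bar=150, sigma=20, n=16, confidence_level=0.95)
print(f"({lower:.1f}, {upper:.1f})")  # prints (140.2, 159.8)
```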
Key Takeaway for Case 1
When the population is normal and you are GIVEN the value for $$ \sigma $$, this is the formula to use. It's the simplest case.
Section 5: Case 2 - Population Variance ($$ \sigma^2 $$) is UNKNOWN
The Situation
This is a much more realistic scenario. The key conditions are:
- The population variance $$ \sigma^2 $$ is UNKNOWN.
- The sample size $$ n $$ is sufficiently large.
Why does "large n" matter? Because of the amazing Central Limit Theorem (CLT)! The CLT tells us that if the sample size $$ n $$ is large enough, the distribution of sample means ($$ \bar{x} $$) will be approximately normal, regardless of the original population's distribution. This allows us to still use the z-distribution!
Since we don't know $$ \sigma $$, what do we do? We use our best estimate for it: the sample standard deviation, $$ s $$.
The Formula
The 100(1-α)% confidence interval for $$ \mu $$ is given by:
$$ \left( \bar{x} - z_{\alpha/2} \frac{s}{\sqrt{n}}, \bar{x} + z_{\alpha/2} \frac{s}{\sqrt{n}} \right) $$
Did you notice? The ONLY change from the first formula is that we replaced the unknown $$ \sigma $$ with the known $$ s $$!
Step-by-Step Example
A school principal wants to estimate the mean number of hours M1 students study per week. A large random sample of $$ n=100 $$ students is selected. The sample mean is $$ \bar{x} = 15.5 $$ hours, and the sample standard deviation is $$ s = 2.5 $$ hours. Construct a 99% confidence interval for the true mean study time.
Step 1: Identify all your values.
$$ \bar{x} = 15.5 $$, $$ s = 2.5 $$, $$ n = 100 $$
Step 2: Find your critical value.
Confidence level = 99%. This means $$ \alpha = 0.01 $$, so we need $$ z_{\alpha/2} = z_{0.005} $$.
From our table, $$ z_{0.005} \approx 2.576 $$.
Step 3: Calculate the Margin of Error (E).
$$ E = z_{\alpha/2} \frac{s}{\sqrt{n}} = 2.576 \times \frac{2.5}{\sqrt{100}} = 2.576 \times \frac{2.5}{10} = 2.576 \times 0.25 = 0.644 $$
Step 4: Construct the interval.
Interval = $$ (\bar{x} - E, \bar{x} + E) $$
$$ (15.5 - 0.644, 15.5 + 0.644) = (14.856, 16.144) $$
Step 5: Write your conclusion.
We are 99% confident that the true mean number of hours M1 students study per week is between 14.856 and 16.144 hours.
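The same calculation can be sketched in Python to confirm the numbers. The function name `z_interval_large_sample` is a hypothetical choice. The only difference from Case 1 is that the sample standard deviation $$ s $$ takes the place of $$ \sigma $$.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_large_sample(x_bar, s, n, confidence_level):
    """Approximate confidence interval for mu when sigma is unknown and n is large."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # critical value
    margin = z * s / sqrt(n)                                  # margin of error E
    return x_bar - margin, x_bar + margin

# Values from the study-time example: x-bar = 15.5, s = 2.5, n = 100, 99% confidence.
lower, upper = z_interval_large_sample(x_bar=15.5, s=2.5, n=100, confidence_level=0.99)
print(f"({lower:.3f}, {upper:.3f})")  # prints (14.856, 16.144)
```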
Common Mistakes to Avoid!
- Using $$ \sigma $$ when it's unknown: If the question gives you $$ s $$ (sample standard deviation), use the second formula. Don't mix them up!
- Forgetting the $$ \sqrt{n} $$: A very common error is to multiply the critical value by $$ s $$ or $$ \sigma $$ alone, without dividing by $$ \sqrt{n} $$. The margin of error depends on the standard error of the mean, which is always $$ \frac{s}{\sqrt{n}} $$ or $$ \frac{\sigma}{\sqrt{n}} $$.
- Using the wrong z-value: Double-check if the question asks for 90%, 95%, or 99% confidence and use the correct $$ z_{\alpha/2} $$.
Key Takeaway for Case 2
When $$ \sigma $$ is unknown and $$ n $$ is large, simply replace $$ \sigma $$ with $$ s $$ in the formula. Everything else is the same!
Section 6: What Affects the Width of the Interval?
Think about our fishing net. Sometimes we want a very precise estimate (a small net), and sometimes we need to be more certain (a big net). The width of the interval is simply `2 × Margin of Error`. What makes it wider or narrower?
1. The Confidence Level
- Higher Confidence Level $$ \rightarrow $$ Larger $$ z_{\alpha/2} $$ $$ \rightarrow $$ Wider Interval.
- Analogy: If you want to be more confident that you'll catch the fish, you need a bigger net!
2. The Sample Size (n)
- Larger Sample Size (n) $$ \rightarrow $$ Larger Denominator ($$ \sqrt{n} $$) $$ \rightarrow $$ Smaller Standard Error $$ \rightarrow $$ Narrower Interval.
- Analogy: The more information (data) you have, the more precise your estimate can be. A bigger sample reduces uncertainty.
3. The Standard Deviation ($$ \sigma $$ or $$ s $$)
- Larger Standard Deviation $$ \rightarrow $$ Wider Interval.
- Analogy: If the population is very spread out (high variability), it's harder to pinpoint the true mean, so you need a wider net to be confident.
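These three effects can be checked numerically. The sketch below is only an illustration: the helper `interval_width` and the baseline values (sd = 20, n = 16, 95%) are assumptions chosen to match the apple example, and each comparison changes one factor at a time.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sd, n, confidence_level):
    """Width of the interval = 2 x margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z * sd / sqrt(n)

baseline = interval_width(sd=20, n=16, confidence_level=0.95)
higher_confidence = interval_width(sd=20, n=16, confidence_level=0.99)  # wider
larger_sample = interval_width(sd=20, n=64, confidence_level=0.95)      # narrower
larger_sd = interval_width(sd=40, n=16, confidence_level=0.95)          # wider

print(f"baseline:          {baseline:.2f}")
print(f"99% instead of 95%: {higher_confidence:.2f}")
print(f"n = 64 instead of 16: {larger_sample:.2f}")
print(f"sd = 40 instead of 20: {larger_sd:.2f}")
```

Notice that making the sample four times larger only halves the width, because the formula divides by $$ \sqrt{n} $$, not by $$ n $$.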
Chapter Summary & Final Tips
You've made it! Confidence intervals are a fundamental concept in statistics. Here's a simple decision process to help you on an exam:
Decision Flowchart:
- Read the question carefully. What is the sample mean ($$ \bar{x} $$), sample size (n), and confidence level?
- Ask: Is the population standard deviation $$ \sigma $$ KNOWN?
- YES: Use the first formula with $$ \sigma $$.
$$ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} $$
- NO: The question will give you the sample standard deviation $$ s $$ and state that $$ n $$ is large. Use the second formula with $$ s $$.
$$ \bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}} $$
- Calculate the Margin of Error, then add and subtract it from the sample mean $$ \bar{x} $$.
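For those who like to think in code, the decision process above could be sketched as a single Python function. The name `mean_confidence_interval` and its parameters are hypothetical. The function simply picks $$ \sigma $$ when it is given and falls back to $$ s $$ otherwise, leaving it to you to check that $$ n $$ is large in the second case.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, n, confidence_level, sigma=None, s=None):
    """Follow the flowchart: use sigma if known, otherwise use s (n assumed large)."""
    if sigma is not None:
        spread = sigma   # Case 1: population sd is known
    elif s is not None:
        spread = s       # Case 2: sigma unknown, large sample
    else:
        raise ValueError("Provide either sigma (known) or s (sample sd)")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * spread / sqrt(n)
    return x_bar - margin, x_bar + margin

case1 = mean_confidence_interval(150, 16, 0.95, sigma=20)   # apple example
case2 = mean_confidence_interval(15.5, 100, 0.99, s=2.5)    # study-time example
print(case1, case2)
```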
Final encouraging words: The best way to master confidence intervals is through practice. Work through past paper questions. Pay close attention to the wording to determine which of the two cases you're dealing with. You can do this!