M1 Study Notes: Sampling Distribution and Point Estimates

Hello everyone! Welcome to a new chapter in statistics. Don't worry if the chapter title sounds a bit scary. We're going to break it all down. In this topic, we'll learn how we can use a small group (a sample) to make a smart guess about a very large group (a population). This is super useful in real life, from predicting election results to checking if a batch of iPhones is good quality without testing every single one. Let's get started!


1. Population vs. Sample: The Big Picture

To understand statistics, we first need to know the difference between a 'population' and a 'sample'.

What is a Population?

A population is the entire group that you want to study or know something about. It’s everyone or everything.

Example: If you want to know the average height of all Secondary 6 students in Hong Kong, the population is EVERY SINGLE S6 student in Hong Kong.

What is a Sample?

A sample is a small part of the population that you actually collect data from. Since it's often impossible or too expensive to study everyone in a population, we take a sample instead.

Example: You can't measure all 50,000 S6 students. So, you randomly select 200 students from different schools and measure their heights. This group of 200 is your sample.

Analogy: Think of tasting soup. The whole pot of soup is the population. The spoonful you taste is the sample. You use the taste of the sample to guess what the whole pot tastes like!

Parameters vs. Statistics

Now, let's add two more important terms. We use different symbols for the population and the sample.

Population Parameters: These are numbers that describe the whole population. They are usually unknown because we can't measure everyone. We often use Greek letters for them.

  • Population Mean (μ): The true average of the entire population.
  • Population Variance (σ²): A measure of how spread out the data is for the entire population.

Sample Statistics: These are numbers calculated from your sample data. You can always calculate these. We use them to estimate the population parameters.

  • Sample Mean (x̄): The average of your sample. Pronounced "x-bar".
  • Sample Variance (s²): A measure of how spread out the data is in your sample.

Quick Review: Key Terms & Symbols

This table is your new best friend for this chapter! Make sure you know it.

| Concept  | Population (The Whole Group)              | Sample (A Small Part)            |
|----------|-------------------------------------------|----------------------------------|
| Mean     | Parameter: $$ \mu $$ (mu)                 | Statistic: $$ \bar{x} $$ (x-bar) |
| Variance | Parameter: $$ \sigma^2 $$ (sigma-squared) | Statistic: $$ s^2 $$             |
| Size     | N (population size)                       | n (sample size)                  |


Calculating Population Variance

The syllabus requires you to recognise the formula for population variance. If you magically knew the data for every single person in a population of size N, the formula would be:

Population Variance:

$$ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} $$

This means: for each person, find the difference between their value $$ x_i $$ and the population mean (μ), square it, add all those squares up, and finally divide by the population size (N).
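To make the arithmetic concrete, here is a minimal sketch in Python (with a tiny made-up population of five heights; real populations are of course far larger):

```python
# A tiny made-up "population" of N = 5 heights (cm), purely for illustration.
population = [160, 165, 170, 175, 180]
N = len(population)

# Step 1: the population mean, mu.
mu = sum(population) / N

# Step 2: sum of squared differences from mu, divided by N (not N - 1!).
sigma_squared = sum((x - mu) ** 2 for x in population) / N

print(mu)             # 170.0
print(sigma_squared)  # 50.0
```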

Key Takeaway: We study a sample (with statistics like $$ \bar{x} $$ and $$ s^2 $$) to make an educated guess about the entire population (which has parameters like $$ \mu $$ and $$ \sigma^2 $$).


2. The Sampling Distribution of the Sample Mean (X̄)

This sounds complicated, but the idea is actually quite cool. Don't worry if it seems tricky at first; we'll build it up step by step.

Imagine we want to know the true average height (μ) of all S6 students. We know we can't measure them all. So, what do we do?

1. Take a random sample of 30 students and calculate their average height, $$ \bar{x}_1 $$. Maybe we get 168 cm.
2. Do it again! Take a different random sample of 30 students and calculate their average, $$ \bar{x}_2 $$. Maybe this time we get 171 cm.
3. Do this again and again, maybe thousands of times. We will get a long list of different sample means: {168, 171, 169.5, 170, 167.8, ...}.

The sampling distribution of the sample mean is the probability distribution of all these possible sample means. If we made a histogram of our list of sample means, we'd see this distribution.
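You can see this for yourself with a quick simulation. Here is a minimal sketch (Python with numpy; the population numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Pretend this is the whole population of S6 heights: mean 170 cm, sd 6 cm.
population = rng.normal(loc=170, scale=6, size=50_000)

# Take 2,000 random samples of 30 students each; record every sample mean.
sample_means = [rng.choice(population, size=30).mean() for _ in range(2_000)]

# This list of sample means is a simulated sampling distribution of the mean.
# A histogram of it would show the bell-shaped spread around 170.
print(round(float(np.mean(sample_means)), 2))  # close to 170
```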

Two Magical Properties You MUST Know

For a random sample of size n taken from a population with mean μ and variance σ², the distribution of sample means ($$\bar{X}$$) has two very important properties:

1. The Mean of the Sample Means
$$ E[\bar{X}] = \mu $$

In plain English: The average of all the possible sample means you could ever take is equal to the true population mean. This is great news! It means our sample mean is, on average, "on target" to estimate the population mean.

2. The Variance of the Sample Means
$$ Var(\bar{X}) = \frac{\sigma^2}{n} $$

In plain English: This formula tells us how spread out the sample means are. Notice the 'n' on the bottom. This is super important!

  • As the sample size (n) gets bigger, the variance of the sample means gets smaller.
  • This means that with a larger sample, your sample mean ($$\bar{x}$$) is much more likely to be very close to the true population mean (μ). It makes sense, right? A bigger sample gives you a more reliable estimate.

The standard deviation of this distribution is called the standard error of the mean: $$ \sigma_{\bar{X}} = \sqrt{Var(\bar{X})} = \frac{\sigma}{\sqrt{n}} $$
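As a quick numerical check of both properties (again a sketch with made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
mu, sigma, n = 170.0, 6.0, 30

# Simulate 10,000 samples of size n at once: each row is one sample.
samples = rng.normal(mu, sigma, size=(10_000, n))
sample_means = samples.mean(axis=1)

print(round(sample_means.mean(), 2))   # ~170.0, so E[X-bar] = mu
print(round(sample_means.var(), 2))    # ~1.2, i.e. sigma^2 / n = 36 / 30
print(round(sigma / n ** 0.5, 3))      # standard error = 1.095
```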

What if the Original Population is Normal?

If the original population you are sampling from is already normally distributed, that is, $$ X \sim N(\mu, \sigma^2) $$, then the sampling distribution of the sample mean will also be exactly normal, no matter how small the sample size is.

Result: If $$ X \sim N(\mu, \sigma^2) $$, then $$ \bar{X} \sim N(\mu, \frac{\sigma^2}{n}) $$
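For example, with made-up numbers: if heights follow $$ X \sim N(170, 36) $$ (mean 170 cm, variance 36) and we take random samples of size n = 9, then

$$ \bar{X} \sim N\left(170, \frac{36}{9}\right) = N(170, 4) $$

so the sample mean has a standard deviation of only $$ \sqrt{4} = 2 $$ cm, compared with 6 cm for a single student.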

Key Takeaway: The sampling distribution of the mean is the distribution we get by taking many samples and looking at their means. Its mean is $$ \mu $$ and its variance is $$ \frac{\sigma^2}{n} $$. Bigger samples lead to less spread-out sample means.


3. The Central Limit Theorem (CLT)

This is one of the most important and amazing theorems in all of statistics! It's like a superpower.

So, what happens if the original population is NOT normally distributed? What if it's skewed, or bimodal, or just a weird shape?

The Central Limit Theorem (CLT) states:

For a sufficiently large sample size (n), the sampling distribution of the sample mean ($$\bar{X}$$) will be approximately normal, regardless of the shape of the original population's distribution.

How cool is that?! Even if we start with a weird-looking population, the distribution of its sample means will look like a nice, familiar bell curve (a normal distribution) if our sample is big enough.

How large is "sufficiently large"?

A common rule of thumb used in statistics is:

n ≥ 30

If your sample size is 30 or more, you can usually assume the Central Limit Theorem applies.

Putting it all together (The BIG Result):

If n is sufficiently large (e.g., n ≥ 30), then by the Central Limit Theorem:

$$ \bar{X} \approx N(\mu, \frac{\sigma^2}{n}) $$

Note the "approximately" symbol ($$\approx$$) because it's an approximation, not an exact distribution (unless the original population was normal).

Did you know? The CLT is why the normal distribution is so common in the real world. Many things, like the total weight of a bag of 50 apples, are the result of adding up many small random effects. The CLT predicts that such sums and averages will tend to follow a normal distribution.

Key Takeaway: The Central Limit Theorem is our secret weapon. It allows us to use the normal distribution to solve problems involving the sample mean, as long as our sample size is large enough (n ≥ 30), even if we have no idea what the original population looks like.


4. Point Estimates: Our Best Guess

We've talked a lot about using samples to understand populations. A point estimate is the simplest way to do this. It's a single number that we use as our "best guess" for an unknown population parameter.

Analogy: If someone asks you to estimate the temperature, you give a single number like "25 degrees". You don't say "it's between 24 and 26". That single number is a point estimate.

Estimating the Population Mean (μ)

What's our best guess for the unknown population mean, μ?

The sample mean ($$\bar{x}$$) is the best point estimate for the population mean (μ).

Example: If the average height in your sample of 200 students ($$\bar{x}$$) is 170.5 cm, then your best point estimate for the true average height of all S6 students in Hong Kong (μ) is 170.5 cm.

Estimating the Population Variance (σ²)

What's our best guess for the unknown population variance, σ²?

The sample variance ($$s^2$$) is the best point estimate for the population variance (σ²). But be careful with the formula!

The Crucial Formula for Sample Variance (s²)

When we calculate variance from a sample to estimate the population variance, we use a slightly different formula. We divide by n-1, not n.

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} $$
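If you ever check your work with software, note that most libraries let you choose the divisor. Here is a minimal sketch (Python with numpy, made-up sample data) computing $$ s^2 $$ both by the formula and with numpy's ddof=1 option, which means "divide by n - 1":

```python
import numpy as np

sample = [168, 171, 169, 174, 173]   # made-up heights (cm), n = 5
n = len(sample)
x_bar = sum(sample) / n              # sample mean = 171.0

# Sample variance by the formula: divide by n - 1, not n.
s_squared = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

print(s_squared)                 # 6.5
print(np.var(sample, ddof=1))    # 6.5 as well (ddof=1 -> divide by n - 1)
```
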
Why divide by n-1? The Idea of an Unbiased Estimator

This is a key concept. Dividing by n-1 makes $$ s^2 $$ an unbiased estimator of $$ \sigma^2 $$.

Simple explanation: In a sample, we measure spread around the sample mean ($$\bar{x}$$), which is calculated from that very same data, so the squared deviations come out slightly smaller, on average, than they would around the true population mean (μ). If we divided by 'n', our estimate of the variance ($$s^2$$) would therefore tend to be a little too small. By dividing by the smaller number (n-1), we make the answer a little bigger, which corrects for this tendency on average.

You don't need to prove this, but you DO need to remember to use n-1 for the sample variance $$s^2$$!
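If you are curious, though, you can see the bias for yourself with a quick simulation (a sketch in Python with numpy, using made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(seed=4)
true_variance, n = 25.0, 10          # made-up population: variance 25

# 100,000 samples of size n from a normal population with variance 25.
samples = rng.normal(0.0, true_variance ** 0.5, size=(100_000, n))

# Average the variance estimates across all samples, two ways:
print(round(samples.var(axis=1, ddof=0).mean(), 2))  # ~22.5: divide by n, too small
print(round(samples.var(axis=1, ddof=1).mean(), 2))  # ~25.0: divide by n-1, on target
```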

Common Mistake to Avoid!

Do not mix up the formulas for population variance and sample variance.

  • Population Variance $$ \sigma^2 $$: Divide by N. You use this when you have data for the ENTIRE population. (Rare in reality).
  • Sample Variance $$ s^2 $$: Divide by n-1. You use this when you have sample data and want to ESTIMATE the population variance. (Very common).

Key Takeaway: A point estimate is a single-value guess for a parameter. The sample mean ($$\bar{x}$$) estimates the population mean (μ). The sample variance ($$s^2$$, with n-1 in the denominator) is an unbiased estimator for the population variance ($$ \sigma^2 $$).