👋 Welcome to Further Hypothesis Testing!
Hello future statistician! In your AS Mathematics course, you mastered the basics of hypothesis testing: checking whether a single population mean or proportion differed from a claimed value. This chapter is where we level up!
In Further Mathematics, we move into more complex, real-world scenarios. We will learn how to compare two populations (Are the salaries in Country A really higher than in Country B?) and how to test if data fits a specific distribution (Does this die truly behave randomly?).
This topic requires you to choose the right statistical tool for the job—whether it’s a Z-test, a t-test, a $\chi^2$ test, or an F-test. Don’t worry, we'll break down which test to use and when!
1. Evaluating the Test: The Power of a Test
When we perform a hypothesis test, we risk making a mistake. You already know about the two types of errors, but let's quickly review them because they are essential for understanding Power.
1.1 Reviewing Errors (The Courtroom Analogy)
- Type I Error ($\alpha$): Rejecting the Null Hypothesis ($H_0$) when it is actually true.
  Analogy: A jury convicts an innocent person.
- Type II Error ($\beta$): Failing to reject $H_0$ when the Alternative Hypothesis ($H_1$) is actually true.
  Analogy: A jury lets a guilty person go free.
The Significance Level ($\alpha$) is the maximum probability of making a Type I Error.
1.2 Definition of Power
The Power of a test is the probability of correctly rejecting a false null hypothesis.
$$ \text{Power} = 1 - P(\text{Type II Error}) = 1 - \beta $$
Interpretation: A powerful test is very good at spotting a difference when a difference truly exists. We want the power to be high!
1.3 Calculating $P(\text{Type II Error})$ and Power
Calculating $\beta$ (and thus Power) is only possible when the alternative hypothesis ($H_1$) is a simple alternative. This means $H_1$ specifies a single value for the population parameter, e.g., $H_1: \mu = 105$ (instead of $H_1: \mu > 100$).
Step-by-Step: Calculating $\beta$
- Step 1: Set up the rejection region (under $H_0$): Find the critical value(s) ($C$) using the Null Hypothesis ($H_0$) and the significance level ($\alpha$). This critical value is usually in terms of the sample mean ($\bar{X}$).
- Step 2: Calculate $\beta$: Assume the alternative hypothesis ($H_1$) is true. Use the critical value ($C$) found in Step 1, but now calculate the probability that the test statistic falls outside the rejection region, assuming the distribution defined by $H_1$ is correct.
- Step 3: Calculate Power: $1 - \beta$.
Quick Review: Power tells you how good your test is. If the true mean is far from the null hypothesis mean, the power will be high (easy to detect). If the true mean is very close, the power will be low (hard to detect).
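The three steps above can be sketched numerically. The following is a minimal illustration with made-up numbers (hypothetical $H_0\!: \mu = 100$ vs the simple alternative $H_1\!: \mu = 105$, known $\sigma = 10$, $n = 25$, one-tailed at $\alpha = 0.05$), using only the standard library:

```python
import math

def phi(x):
    # Standard normal CDF, computed via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Hypothetical setup: H0: mu = 100 vs simple alternative H1: mu = 105,
# known sigma = 10, n = 25, one-tailed test at alpha = 0.05.
mu0, mu1, sigma, n = 100, 105, 10, 25
z_alpha = 1.6449                      # upper 5% point from normal tables

se = sigma / math.sqrt(n)             # standard error of the sample mean
c = mu0 + z_alpha * se                # Step 1: critical value (reject if xbar > c)
beta = phi((c - mu1) / se)            # Step 2: P(xbar < c) assuming H1 is true
power = 1 - beta                      # Step 3

print(f"C = {c:.3f}, beta = {beta:.4f}, power = {power:.4f}")
```

Here the power comes out at roughly 0.80: because the true mean (105) is a full 2.5 standard errors above the null mean, the test detects the shift about 80% of the time.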
2. Tests for Comparing Two Population Means ($\mu_1$ vs $\mu_2$)
In Further Maths, we frequently compare the means of two different populations, $\mu_1$ and $\mu_2$. The null hypothesis is usually $H_0: \mu_1 = \mu_2$, or equivalently, $H_0: \mu_1 - \mu_2 = 0$.
2.1 Independent Samples with Known Variances (Z-Test)
If both population variances ($\sigma_1^2$ and $\sigma_2^2$) are known, or if both samples are large ($n_1 > 30$ and $n_2 > 30$), we use the Z-test based on the normal distribution.
The test statistic is: $$ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$
Note: If samples are large but variances are unknown, we substitute the sample variances ($S_1^2, S_2^2$) for the population variances ($\sigma_1^2, \sigma_2^2$).
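As a quick numerical sketch (all sample values hypothetical), the $Z$ statistic is just the difference in sample means divided by its standard error:

```python
import math

# Hypothetical data: two independent large samples with known variances.
xbar1, var1, n1 = 52.0, 9.0, 40
xbar2, var2, n2 = 50.0, 16.0, 50

se = math.sqrt(var1 / n1 + var2 / n2)   # standard error of xbar1 - xbar2
z = (xbar1 - xbar2) / se                # test statistic under H0: mu1 = mu2

print(f"Z = {z:.3f}")                   # compare with +/-1.96 at the 5% level
```

With these numbers $Z \approx 2.71 > 1.96$, so at the 5% level we would reject $H_0$ in a two-tailed test.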
2.2 Independent Small Samples with Unknown, but Equal, Variances (Pooled t-Test)
This is the trickiest case. If the samples are small ($n < 30$), the population variances are unknown, AND we assume the populations have the same variance ($\sigma_1^2 = \sigma_2^2$), we must use the pooled $t$-test.
Why "Pooling"?
Since we assume $\sigma_1^2 = \sigma_2^2$, it makes sense to combine the information from both samples to get a better overall estimate of this common variance. This combined estimate is called the pooled estimate of variance, $S_p^2$.
The formula for the pooled variance is: $$ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} $$
The resulting $t$-test statistic is: $$ T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} $$
The Degrees of Freedom ($v$) for this test is \( n_1 + n_2 - 2 \).
Common Mistake: Students often forget to use the pooled variance when the samples are small and the variances are assumed equal. Look for keywords like "assume the population variances are the same."
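The pooling formula and the resulting $t$ statistic can be checked with a short calculation (the summary statistics below are hypothetical):

```python
import math

# Hypothetical summary statistics for two small independent samples,
# assuming equal population variances.
n1, xbar1, s1_sq = 8, 20.5, 5.2
n2, xbar2, s2_sq = 10, 18.3, 6.0

# Pooled estimate of the common variance
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

t = (xbar1 - xbar2) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(f"Sp^2 = {sp_sq:.3f}, T = {t:.3f}, df = {df}")
```

Note that $S_p^2$ is a weighted average of the two sample variances, weighted by their degrees of freedom, so it always lies between $S_1^2$ and $S_2^2$.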
2.3 Paired Samples (Non-Independent Data)
If the data points are paired (i.e., they are related, like 'weight before a diet' and 'weight after a diet'), the two samples are not independent.
In this case, we do not compare the means directly. Instead, we calculate the difference (D) for each pair and test if the mean difference ($\mu_D$) is zero.
- $H_0: \mu_D = 0$
- We use the standard one-sample $t$-test formula on the difference data $D$.
- $T = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$
- Degrees of Freedom: \( n - 1 \) (where $n$ is the number of pairs).
Analogy: Imagine testing two types of tyres. If you put Type A on one car and Type B on a different car, they are independent. If you put Type A on the left side of 10 cars and Type B on the right side of those 10 cars, they are paired—the differences in the cars themselves are cancelled out.
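The paired procedure reduces to a one-sample $t$-test on the differences, as the following sketch shows (the six difference values are hypothetical, e.g. weight loss for six subjects):

```python
import math
import statistics

# Hypothetical paired differences (before - after) for 6 subjects.
d = [2.1, 1.5, 3.0, 0.5, 1.8, 2.2]

n = len(d)
d_bar = statistics.mean(d)        # mean difference
s_d = statistics.stdev(d)         # sample standard deviation (divisor n - 1)
t = d_bar / (s_d / math.sqrt(n))  # test statistic under H0: mu_D = 0
df = n - 1

print(f"D_bar = {d_bar:.3f}, T = {t:.3f}, df = {df}")
```

Note that `statistics.stdev` uses the $n-1$ divisor, which is exactly the $S_D$ required by the formula.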
3. Tests for Variance: $\chi^2$ and $F$ Distributions
Sometimes, the variability (variance) within a population is just as important as the mean. For example, a quality control team wants to ensure the size of manufactured parts does not vary too much.
3.1 Testing a Single Variance ($\sigma^2$)
To test if a single population variance ($\sigma^2$) is equal to a specific value ($\sigma_0^2$), we use the Chi-Squared distribution ($\chi^2$).
Did you know?
The $\chi^2$ distribution is used because variances (squared standard deviations) cannot be negative. The resulting distribution is skewed, unlike the normal distribution.
The test statistic is: $$ \chi^2 = \frac{(n-1) S^2}{\sigma_0^2} $$
- $S^2$ is the sample variance.
- $\sigma_0^2$ is the hypothesized population variance (under $H_0$).
- The Degrees of Freedom ($v$) is \( n-1 \).
We compare the calculated $\chi^2$ value to critical values found in the $\chi^2$ tables. Since the distribution is non-symmetrical, you must check both tails for a two-tailed test.
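A one-line calculation suffices for the statistic itself; the critical value still comes from tables. All numbers below are hypothetical:

```python
# Hypothetical example: test H0: sigma^2 = 9 against H1: sigma^2 > 9,
# given a sample of n = 20 with sample variance S^2 = 12.5.
n, s_sq, sigma0_sq = 20, 12.5, 9.0

chi_sq = (n - 1) * s_sq / sigma0_sq
df = n - 1

print(f"chi^2 = {chi_sq:.3f}, df = {df}")
# From chi-squared tables, the upper 5% point with 19 df is 30.144,
# so with these numbers we would not reject H0.
```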
3.2 Testing the Ratio of Two Variances ($F$-Test)
If you need to compare the variances of two independent normal populations ($\sigma_1^2$ vs $\sigma_2^2$), you use the $F$-distribution.
This test is often used as a preliminary check before running the pooled $t$-test (Section 2.2).
$H_0: \sigma_1^2 = \sigma_2^2$ (The ratio is 1)
The test statistic is: $$ F = \frac{S_1^2}{S_2^2} $$
- Convention: When performing an $F$-test, it is standard practice to place the larger of the two sample variances in the numerator. This ensures $F \ge 1$.
- This converts the test into a one-tailed test (as we only check the right tail of the $F$-distribution).
- The $F$-distribution has two sets of degrees of freedom: \( v_1 \) (numerator) and \( v_2 \) (denominator). If $S_1^2$ is on top, then $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$.
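The convention in the bullet points above is easy to encode: compare the two sample variances first, then assign the degrees of freedom to match whichever ends up on top (sample values hypothetical):

```python
# Hypothetical sample variances from two independent normal samples.
n1, s1_sq = 10, 8.4
n2, s2_sq = 13, 3.6

# Convention: the larger sample variance goes in the numerator, so F >= 1.
if s1_sq >= s2_sq:
    f, v1, v2 = s1_sq / s2_sq, n1 - 1, n2 - 1
else:
    f, v1, v2 = s2_sq / s1_sq, n2 - 1, n1 - 1

print(f"F = {f:.3f}, df = ({v1}, {v2})")  # compare to the upper-tail F critical value
```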
Quick Review:
- To test ONE variance: use $\chi^2$ (df = n-1).
- To test the RATIO of TWO variances: use $F$ (df = $n_1-1$, $n_2-1$).
4. Chi-Squared Tests for Categorical Data and Fit
The $\chi^2$ statistic is also used extensively when dealing with non-numerical, categorical data, or when we want to see if observed data matches a known probability distribution.
4.1 Goodness of Fit (GoF) Tests
A GoF test checks if observed frequencies ($O_i$) from a sample align well with the expected frequencies ($E_i$) derived from a specific theoretical distribution (e.g., Uniform, Poisson, Normal).
The test statistic measures the discrepancy: $$ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} $$
Crucial Rules for GoF Tests
- Expected Frequencies ($E_i$): It is a standard convention that all expected frequencies $E_i$ must be greater than 5.
- Pooling: If an $E_i$ value is 5 or less, you must pool (combine) that category with an adjacent category until the combined expected frequency is greater than 5.
- Degrees of Freedom ($v$): $v = (\text{Number of cells after pooling}) - 1 - (\text{Number of parameters estimated})$.
Example: If you test for a Poisson distribution, you must estimate the mean ($\lambda$) from the data, so $p=1$. If testing for a Normal distribution, you estimate the mean ($\mu$) and standard deviation ($\sigma$), so $p=2$. If no parameters are estimated (e.g., Uniform distribution), $p=0$.
Analogy: You are comparing your actual cake (Observed) against the recipe's picture (Expected). The $\chi^2$ value tells you how far off your cake is from the ideal.
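A classic GoF example is testing whether a die is fair: under a uniform distribution, every face has the same expected frequency, so no parameters are estimated ($p = 0$). The observed counts below are hypothetical:

```python
# Hypothetical GoF test: is this die fair? 120 rolls, so each face
# has expected frequency 120 / 6 = 20 under a uniform distribution.
observed = [22, 17, 20, 13, 24, 24]
expected = [20] * 6

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 0  # p = 0: no parameters estimated

print(f"chi^2 = {chi_sq:.3f}, df = {df}")
# The upper 5% point of chi-squared with 5 df is 11.070 (from tables),
# so with these counts we would not reject H0: the die appears fair.
```

All expected frequencies here are 20, comfortably above 5, so no pooling is needed.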
4.2 Contingency Tables (Test of Independence)
This test is used to determine if there is an association between two different classifications (variables) collected from a single population.
- $H_0$: The two classifications are independent.
- $H_1$: The two classifications are dependent (associated).
The $\chi^2$ test statistic formula remains the same: $$ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} $$
How to calculate $E_i$: $$ E_{i} = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} $$
The Degrees of Freedom ($v$) for a contingency table with $R$ rows and $C$ columns is: $$ v = (R - 1)(C - 1) $$
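The $E_i$ formula and the $(R-1)(C-1)$ degrees of freedom can be applied mechanically to any table. Here is a sketch for a hypothetical $2 \times 3$ table:

```python
# Hypothetical 2x3 contingency table of observed frequencies.
observed = [[30, 20, 10],
            [20, 30, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand  # (row total x col total) / grand total
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R - 1)(C - 1)
print(f"chi^2 = {chi_sq:.3f}, df = {df}")
```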
4.3 Yates' Correction for $2 \times 2$ Tables
When dealing with a small contingency table (a $2 \times 2$ table, 1 degree of freedom), the continuous $\chi^2$ distribution is a very rough approximation of the discrete data. To improve the accuracy of the approximation, we use Yates' correction for continuity.
This correction reduces the magnitude of the difference between the observed and expected frequencies by 0.5 before squaring: $$ \chi_{\text{corrected}}^2 = \sum \frac{(|O_i - E_i| - 0.5)^2}{E_i} $$
Remember: Use Yates' correction only for $2 \times 2$ tables, and note that the $E_i > 5$ convention still applies.
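The corrected statistic differs from the ordinary one only in the $-\,0.5$ inside the square, as this sketch on a hypothetical $2 \times 2$ table shows:

```python
# Hypothetical 2x2 contingency table (1 degree of freedom).
observed = [[10, 20],
            [20, 10]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

chi_sq_corrected = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand
        chi_sq_corrected += (abs(o - e) - 0.5) ** 2 / e  # Yates' correction

print(f"corrected chi^2 = {chi_sq_corrected:.3f}")  # df = (2-1)(2-1) = 1
```

Because the correction shrinks every $|O_i - E_i|$ before squaring, the corrected statistic is always smaller than the uncorrected one, making the test slightly more conservative.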
🧠 Key Takeaways from Further Hypothesis Testing
- Power (1 - $\beta$): The probability of detecting a real effect. Calculated by finding the non-rejection region under $H_0$, computing $\beta$ as the probability of falling within that region under $H_1$, and taking $1 - \beta$.
- Comparing Means: Use $Z$ (large samples/known $\sigma^2$), t with pooling (small samples, $\sigma^2$ unknown but assumed equal), or t on differences (paired samples).
- Testing $\sigma^2$: Use $\chi^2$ for a single variance, and $F$ for the ratio of two variances. Always put the larger $S^2$ in the numerator for the $F$-test.
- Categorical $\chi^2$ Tests: Used for Goodness of Fit (GoF) or Testing Independence (Contingency).
- Ensure all expected frequencies $E_i > 5$ (pool categories if necessary).
- Adjust degrees of freedom for estimated parameters in GoF tests.
- Use Yates' correction for $2 \times 2$ contingency tables.