Unit S2: Statistics 2 - Hypothesis Tests

Hello future statistician! This chapter is where everything you learned about probability distributions comes together. Hypothesis testing is perhaps the most practical and exciting part of statistics because it allows us to use sample data to formally assess the evidence for or against a claim about the real world. Note that a test never proves a claim outright; it measures how strongly the data contradict the default assumption.
Don't worry if this seems tricky at first; we will break down the process step-by-step. By the end, you’ll be able to confidently test claims made by scientists, politicians, or even your local shopkeeper!

1. The Core Concept: What is a Hypothesis Test?

A hypothesis test is a formal procedure for deciding whether to reject a statistical claim (the null hypothesis) based on evidence gathered from a sample.

Analogy: The Courtroom

Think of a hypothesis test like a criminal trial:

  • The default position is that the defendant is innocent (This is the status quo).
  • The prosecution (the claimant) needs evidence to prove guilt.
  • If the evidence is strong enough (beyond reasonable doubt, or statistically significant), we reject the default position.

Key Terminology You Must Know

You cannot perform a test without mastering this vocabulary:

  • Null Hypothesis (\(H_0\)): This is the status quo, the current widely accepted belief, or the assumption we start with. It always includes an equality sign (\(p = 0.5\), \(\lambda = 10\)).
  • Alternative Hypothesis (\(H_1\)): This is the claim being tested, suggesting the parameter has changed. It never includes an equality sign (\(p < 0.5\), \(\lambda > 10\), or \(p \ne 0.5\)).
  • Population Parameter: The true value being tested (e.g., the true probability \(p\) or the true mean rate \(\lambda\)).
  • Test Statistic: The piece of data you actually measure from your sample (e.g., the number of successes, \(X\)).
  • Significance Level (\(\alpha\)): The threshold probability of rejecting \(H_0\) when it is actually true. Common levels are 5% (0.05) or 1% (0.01).
  • Critical Region (Rejection Region): The range of values for the test statistic that would lead us to reject \(H_0\). If our observed test statistic falls in this region, the result is "too unusual" to support \(H_0\).
  • Acceptance Region: The range of values where we do not reject \(H_0\).

Quick Review: \(H_0\) is boring (equal to); \(H_1\) is exciting (less than, greater than, or not equal to).

2. One-Tailed vs. Two-Tailed Tests

The type of test you perform depends entirely on the question being asked and how you write \(H_1\).

One-Tailed Test (Directional)

This is used when the alternative hypothesis specifies a change in one direction only: either an increase or a decrease, but not both.

Example: A factory claims that the defect rate is \(p = 0.1\). A manager suspects the rate has increased.
$$H_0: p = 0.1$$ $$H_1: p > 0.1$$

Two-Tailed Test (Non-Directional)

This is used when the alternative hypothesis specifies that the parameter has simply changed (i.e., it could be higher or lower).

Example: A company claims that 50% of people prefer their product (\(p = 0.5\)). A researcher suspects this proportion is no longer 50%.
$$H_0: p = 0.5$$ $$H_1: p \ne 0.5$$

The Crucial Step for Two-Tailed Tests: Halving the Significance Level

If you use a two-tailed test at a significance level of \(\alpha\), you must split that risk equally between the two tails.

If \(\alpha = 5\%\), then 2.5% goes into the lower critical region and 2.5% goes into the upper critical region.

Memory Aid:

How many tails do you see in the symbol for \(H_1\)?

  • \(>\) or \(<\) looks like one tail. \(\rightarrow\) One-Tailed Test.
  • \(\ne\) has two ends pointing away. \(\rightarrow\) Two-Tailed Test (split \(\alpha\)).
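
To see the halving of \(\alpha\) in action, here is a minimal Python sketch that finds both critical regions for a hypothetical two-tailed test of \(H_0: p = 0.5\) with \(n = 20\) at the 5% level (the values \(n = 20\) and \(p = 0.5\) are illustrative, not from the text):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p, alpha = 20, 0.5, 0.05  # hypothetical two-tailed test of H0: p = 0.5

# Lower tail: largest k with P(X <= k) <= alpha/2
lower = max(k for k in range(n + 1) if binom_cdf(k, n, p) <= alpha / 2)

# Upper tail: smallest k with P(X >= k) <= alpha/2
upper = min(k for k in range(n + 1) if 1 - binom_cdf(k - 1, n, p) <= alpha / 2)

print(f"Critical region: X <= {lower} or X >= {upper}")
```

Note that 2.5% (not 5%) is the cut-off probability in each tail, which is exactly the "split \(\alpha\)" rule above.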

3. The Standard 5-Step Hypothesis Testing Procedure (Using Critical Regions)

No matter which distribution you use, follow these steps perfectly. Getting the structure right often earns method marks!

  1. Define the Hypotheses and the Model:

    State \(H_0\) and \(H_1\) clearly in terms of the population parameter (e.g., \(p\) or \(\lambda\)).
    Also, define the distribution model and its parameters (e.g., \(X \sim B(n, p)\) or \(X \sim Po(\lambda)\)).

  2. Determine the Critical Region (CR):

    Using the significance level (\(\alpha\)), find the boundary value(s) for the test statistic \(X\).
    This involves using your statistical tables (Binomial or Poisson) to find the probability cut-off point.

  3. State the Test Statistic:

    State the actual observed value from the sample data. Call this \(x\).

  4. Compare and Decide:

    Check if the test statistic \(x\) falls into the Critical Region.

    • If \(x \in CR\), the result is significant. Reject \(H_0\).
    • If \(x \notin CR\), the result is not significant. Do not reject \(H_0\).

  5. Write the Conclusion in Context:

    Translate your statistical decision back into simple English relating to the original problem. (E.g., "There is sufficient evidence to suggest that the proportion of faulty items has increased.")

4. Hypothesis Testing using the Binomial Distribution

This is the most common form of hypothesis test in S2, used when you are testing a proportion or probability \(p\), and you have a fixed number of trials \(n\).

Example Walkthrough: The Faulty Lightbulbs

A company claims that 20% of its lightbulbs are faulty (\(p = 0.2\)). An inspector, who suspects the true rate is higher, tests a random sample of \(n=15\) bulbs and finds 6 are faulty. Test the claim at the 5% significance level.

Step 1: Hypotheses and Model

We assume the number of faulty bulbs \(X\) follows a Binomial distribution.
$$X \sim B(15, 0.2)$$ $$H_0: p = 0.2$$ $$H_1: p > 0.2 \quad \text{(One-tailed test)}$$

Step 2: Determine the Critical Region

We are looking for the upper tail where \(P(X \ge x) \leq 0.05\). Since tables only give \(P(X \le x)\), we use the complement rule: \(P(X \ge x) = 1 - P(X \le x-1)\).

  • Try \(x=5\): \(P(X \ge 5) = 1 - P(X \le 4) = 1 - 0.8358 = 0.1642\) (Too high, not in CR)
  • Try \(x=6\): \(P(X \ge 6) = 1 - P(X \le 5) = 1 - 0.9389 = 0.0611\) (Too high, not in CR)
  • Try \(x=7\): \(P(X \ge 7) = 1 - P(X \le 6) = 1 - 0.9819 = 0.0181\) (\(\leq 0.05\)! This is in the CR)

The Critical Region is \(X \ge 7\).
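
The table lookups above can be checked directly. A minimal Python sketch, using only the standard library, that reproduces the critical-region search for \(X \sim B(15, 0.2)\):

```python
from math import comb

def upper_tail(x, n, p):
    """P(X >= x) for X ~ B(n, p), via the complement of the table value P(X <= x-1)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x))

n, p, alpha = 15, 0.2, 0.05
for x in (5, 6, 7):
    tail = upper_tail(x, n, p)
    print(f"P(X >= {x}) = {tail:.4f} ->", "in CR" if tail <= alpha else "not in CR")
```

The printed values match the table figures 0.1642, 0.0611 and 0.0181, confirming that the critical region starts at \(X \ge 7\).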

Step 3: Test Statistic

The observed number of faulty bulbs is \(x = 6\).

Step 4: Comparison and Decision

Is \(6\) in the Critical Region (\(X \ge 7\))? No.
Decision: Do not reject \(H_0\).

Step 5: Conclusion in Context

There is insufficient evidence at the 5% significance level to conclude that the proportion of faulty lightbulbs has increased above 20%.

Alternative Method: Using the P-Value

Instead of finding the Critical Region boundary first, you can calculate the probability (P-value) of getting the observed result (or more extreme) assuming \(H_0\) is true.

For the example above, the P-value is \(P(X \ge 6) = 0.0611\).
Since \(P\)-value (0.0611) \(> \alpha\) (0.05), we Do Not Reject \(H_0\). (Same result, just a different method).
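
The P-value route is a one-liner once you can compute cumulative Binomial probabilities. A sketch for the lightbulb example:

```python
from math import comb

n, p0, observed, alpha = 15, 0.2, 6, 0.05

# P-value for a one-tailed (upper) test: P(X >= observed) under H0
p_value = 1 - sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(observed))
print(f"p-value = {p_value:.4f}")  # 0.0611
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```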

5. Hypothesis Testing using the Poisson Distribution

The Poisson distribution is used when you are testing a rate of occurrence (\(\lambda\)) in a fixed interval of time or space. The steps are exactly the same as the Binomial test.

Key Consideration: The Rate \(\lambda\)

If the sample period/area is different from the stated rate, you must adjust \(\lambda\) for the test model.

Example: A call centre claims they receive 8 calls per hour (\(\lambda=8\)). They monitor the centre for a single half-hour period and receive 7 calls.

The model must be adjusted for the half-hour period:
$$X \sim Po(4) \quad \text{(Since } 8 \times 0.5 = 4)$$
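
The example stops at the adjusted model. To illustrate how the test would finish, suppose (an assumption added here for illustration; the text does not state a direction) that the monitor suspects the rate has increased, and compute the one-tailed p-value for observing 7 calls under \(Po(4)\):

```python
from math import exp, factorial

def pois_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

rate_per_hour = 8
lam = rate_per_hour * 0.5   # scale the rate to the half-hour observation window
observed = 7

# One-tailed (upper) p-value: P(X >= 7) under X ~ Po(4)
p_value = 1 - pois_cdf(observed - 1, lam)
print(f"lambda = {lam}, p-value = {p_value:.4f}")
```

The p-value is about 0.11, well above 5%, so on these figures there would be insufficient evidence of an increased call rate.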

Key Takeaway for S2: Whether you use Binomial or Poisson, the overall testing structure remains the same. The only thing that changes is the calculation of the probability (using tables or formulas).

6. Hypothesis Testing using the Normal Approximation

When \(n\) is large (for Binomial) or \(\lambda\) is large (for Poisson), the exact probabilities fall outside the range of your statistical tables and become tedious to calculate directly. In these cases we use the Normal Approximation.

When to Use Normal Approximations in S2:

  1. Normal Approximation to the Binomial: \(X \sim B(n, p)\) is approximated by \(Y \sim N(\mu, \sigma^2)\) if:
    • \(n\) is large (usually \(n > 50\)).
    • \(np > 5\) AND \(n(1-p) > 5\).

    Parameters: \(\mu = np\) and \(\sigma^2 = np(1-p)\).

  2. Normal Approximation to the Poisson: \(X \sim Po(\lambda)\) is approximated by \(Y \sim N(\mu, \sigma^2)\) if:
    • \(\lambda\) is large (usually \(\lambda > 10\) or \(\lambda > 15\)).

    Parameters: \(\mu = \lambda\) and \(\sigma^2 = \lambda\).

The Critical Step: Continuity Correction

Since we are switching from a discrete distribution (\(X\)) to a continuous one (\(Y\)), we must use a continuity correction (CC). This is the single biggest source of errors in these tests!

Discrete Probability (\(X\)) \(\rightarrow\) Continuous Approximation (\(Y\)):

  • \(P(X \le 10)\) \(\rightarrow\) \(P(Y < 10.5)\)
  • \(P(X < 10)\) or \(P(X \le 9)\) \(\rightarrow\) \(P(Y < 9.5)\)
  • \(P(X \ge 10)\) \(\rightarrow\) \(P(Y > 9.5)\)
  • \(P(X = 10)\) \(\rightarrow\) \(P(9.5 < Y < 10.5)\)

Rule of Thumb: Include half the gap to make sure you capture the original whole number. If you are including 10, start/end the correction at 10.5. If you are stopping *before* 10, stop at 9.5.
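
A quick numerical check of why the correction matters, using a hypothetical \(X \sim B(50, 0.2)\) (so \(\mu = 10\), \(\sigma^2 = 8\); these numbers are illustrative, not from the text). The exact tail probability is compared with the Normal approximation with and without the half-unit shift:

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard Normal CDF, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 50, 0.2                       # hypothetical large-n Binomial
mu, var = n * p, n * p * (1 - p)     # mu = 10, sigma^2 = 8

# Exact tail: P(X <= 10)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(11))

# With continuity correction: P(Y < 10.5)
approx = phi((10.5 - mu) / sqrt(var))

# Without the correction: P(Y < 10)
no_cc = phi((10 - mu) / sqrt(var))

print(f"exact = {exact:.4f}, with CC = {approx:.4f}, without CC = {no_cc:.4f}")
```

The corrected approximation lands much closer to the exact Binomial probability than the uncorrected one, which is off by several percentage points.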

Procedure using Normal Approximation (Z-test)

Follow the 5-step plan, but Step 4 changes:

  1. Define \(H_0, H_1\), and the Normal model \(Y \sim N(np, np(1-p))\) (or \(N(\lambda, \lambda)\)).
  2. Apply the Continuity Correction to your observed test statistic \(X\).
  3. Standardise the corrected value \(Y\) using the formula: $$Z = \frac{Y - \mu}{\sigma}$$
  4. Compare Z:

    Compare your calculated Z-value to the critical Z-value from the standard Normal table for your significance level \(\alpha\) (use \(\alpha/2\) in each tail for a two-tailed test).

    • One-tailed test: Reject \(H_0\) if \(Z\) lies beyond the critical value in the direction of \(H_1\).
    • Two-tailed test: Reject \(H_0\) if \(|Z_{calculated}| > Z_{critical}\).

  5. Conclusion in context.
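
Putting the Z-test steps together on a hypothetical example (testing \(H_0: p = 0.5\) against \(H_1: p > 0.5\) with \(n = 100\) trials and 60 observed successes at the 5% level; these numbers are illustrative, not from the text):

```python
from math import erf, sqrt

def phi(z):
    """Standard Normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical test: H0: p = 0.5 vs H1: p > 0.5, n = 100, 60 successes
n, p0, observed = 100, 0.5, 60
mu, sigma = n * p0, sqrt(n * p0 * (1 - p0))   # mu = 50, sigma = 5

y = observed - 0.5          # continuity correction: P(X >= 60) -> P(Y > 59.5)
z = (y - mu) / sigma        # standardise
z_crit = 1.6449             # upper 5% point of N(0, 1)

print(f"Z = {z:.2f}, critical value = {z_crit}")
print("Reject H0" if z > z_crit else "Do not reject H0")
```

Here \(Z = 1.9 > 1.6449\), so \(H_0\) is rejected: there is evidence at the 5% level that the proportion exceeds 0.5. Note the correction moves 60 to 59.5, not 60.5, because the event \(X \ge 60\) must include the whole number 60.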

Did you know? The term 'p-value' comes from 'probability value', and its use was popularised by statistician Ronald Fisher in the 1920s!

7. Understanding Errors (Alpha and Beta)

In any hypothesis test, there is always a risk that you make the wrong decision.

Type I Error (\(\alpha\))

A Type I Error occurs when you Reject \(H_0\), but \(H_0\) was actually TRUE.

In the court analogy: Convicting an innocent person.
For a continuous test statistic, the probability of a Type I error equals the significance level \(\alpha\). For discrete distributions (Binomial, Poisson), it equals the probability of the critical region under \(H_0\), which is usually somewhat less than the nominal \(\alpha\); this is called the actual significance level of the test.
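
In the lightbulb example the critical region was \(X \ge 7\), so the actual probability of a Type I error there is \(P(X \ge 7 \mid p = 0.2)\), which we can compute directly:

```python
from math import comb

n, p0 = 15, 0.2
# Critical region from the lightbulb example: X >= 7
actual_alpha = 1 - sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(7))
print(f"P(Type I error) = P(X >= 7 | H0) = {actual_alpha:.4f}")  # 0.0181
```

So although the test was set up at the 5% level, the chance of wrongly rejecting \(H_0\) is only about 1.8%, because the Binomial distribution is discrete.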

Type II Error (\(\beta\))

A Type II Error occurs when you Do Not Reject \(H_0\), but \(H_0\) was actually FALSE (and \(H_1\) was true).

In the court analogy: Letting a guilty person go free.
The calculation of \(\beta\) (the probability of Type II error) is complex because it depends on the actual (unknown) value of the population parameter, but you must understand the concept.

Common Pitfall

If you try to reduce the risk of a Type I error (by lowering \(\alpha\), say from 5% to 1%), you make your critical region smaller, making it harder to reject \(H_0\). This automatically increases the probability of a Type II error (\(\beta\)). You must balance these risks!
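
To see the trade-off numerically: for the lightbulb test (critical region \(X \ge 7\)), a Type II error means observing \(X \le 6\) even though \(p\) has genuinely increased. Assuming a hypothetical true value of \(p = 0.4\) (this alternative value is illustrative, not from the text):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 15
# Lightbulb test: CR is X >= 7, so we fail to reject H0 whenever X <= 6,
# even if the true defect rate is actually p = 0.4.
beta = binom_cdf(6, n, 0.4)
print(f"P(Type II error) = P(X <= 6 | p = 0.4) = {beta:.4f}")
```

Even with a true rate double the claimed one, \(\beta\) is around 0.61: with only 15 bulbs this test will usually miss the change, which is why \(\beta\) depends so heavily on the true parameter value and the sample size.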

Chapter Key Takeaway Summary
  • Always state \(H_0\) (equal) and \(H_1\) (unequal).
  • Be careful with one-tailed vs. two-tailed tests; remember to split \(\alpha\) for two tails.
  • Critical regions define rejection boundaries; P-values compare observed probability directly to \(\alpha\).
  • When using Normal Approximation, the Continuity Correction is mandatory.
  • The probability of a Type I Error is the actual significance level of the test: the probability of the critical region under \(H_0\), which is at most \(\alpha\).