Welcome to Hypothesis Tests: Being a Statistical Detective!

Hello future statisticians! Hypothesis testing might sound intimidating, but it is one of the most practical and exciting topics in Statistics. Essentially, you are learning how to use maths to test claims about the world.

This chapter is the cornerstone of Unit S2: Statistics 2. We will focus specifically on testing parameters, especially the probability \(p\) associated with a Binomial Distribution, which is the main type of hypothesis test you will encounter in this unit.

Let's dive in and learn how to put a claim to the test using solid statistical evidence!

Key Takeaway from the Introduction

Hypothesis testing allows us to use sample data to make decisions about a parameter (like the mean or a probability) for a whole population.


1. The Foundation: What is a Hypothesis Test?

Imagine someone claims that a coin is fair, meaning the probability of landing heads is \(p = 0.5\). You suspect they are lying and that the coin is weighted. A Hypothesis Test is a formal procedure to check if there is enough evidence to reject the original claim.

There are two main statements involved in any test:

  • The Null Hypothesis (\(H_0\)): This is the status quo or the existing belief. It always states that the parameter (e.g., \(p\), the probability) has a specific value.
  • The Alternative Hypothesis (\(H_1\)): This is the claim you are trying to find evidence for. It challenges the null hypothesis, suggesting the parameter is less than, greater than, or simply different from the value in \(H_0\).

Rule for Setting up \(H_0\)

The Null Hypothesis (\(H_0\)) must always include an equality sign (\(= \)).

Example: If we are testing the claim that the proportion of students who cycle to school is 20%, then:

\(H_0: p = 0.2\)

And if we suspect the proportion is higher:

\(H_1: p > 0.2\)

Quick Review: The Two Hypotheses

\(H_0\) (The Status Quo): Always uses \(= \). This is what you assume is true until proven otherwise.
\(H_1\) (The Challenger): Uses \(< \), \(> \), or \(\ne \). This is what you are testing for.


2. The Significance Level (\(\alpha\)): How Sure Do We Need To Be?

We need a standard for how strong our evidence must be before we reject \(H_0\). This standard is called the Significance Level, denoted by the Greek letter \(\alpha\) (alpha).

The significance level is a measure of risk. It defines the maximum probability of incorrectly rejecting a true \(H_0\).

  • Common significance levels are 10% (0.1), 5% (0.05), or 1% (0.01).

Analogy: Think of the significance level as the "beyond a reasonable doubt" standard in a court of law. If \(\alpha = 0.05\), we are demanding that the evidence against \(H_0\) is so extreme that it would only happen by chance 5% of the time (or less!) if \(H_0\) were actually true.


3. One-Tailed vs. Two-Tailed Tests

The type of alternative hypothesis (\(H_1\)) determines whether your test is one-tailed or two-tailed. This is critical for setting up your critical region later.

3.1 One-Tailed Tests

These tests look for a change in only one direction.

  • If \(H_1\) uses \(> \): We are testing for an increase (Upper Tail Test).
  • If \(H_1\) uses \(< \): We are testing for a decrease (Lower Tail Test).

Example: A manufacturer claims a component lasts 1000 hours. A consumer group suspects it lasts less than 1000 hours.
\(H_0: \mu = 1000\)
\(H_1: \mu < 1000\) (One-tailed, lower tail)

3.2 Two-Tailed Tests

These tests look for a change in either direction (increase OR decrease).

  • If \(H_1\) uses \(\ne \): The parameter is simply different from the stated value.

Example: A coin is claimed to be fair (\(p = 0.5\)). You just want to test if it is not fair (weighted in any way).
\(H_0: p = 0.5\)
\(H_1: p \ne 0.5\) (Two-tailed)

Important Note for Two-Tailed Tests

If the significance level is \(\alpha\), you must split this risk equally between the two tails.

If \(\alpha = 5\%\) (0.05) and the test is two-tailed:
The significance level applied to the upper tail is \(0.05 / 2 = 0.025\) (2.5%).
The significance level applied to the lower tail is \(0.05 / 2 = 0.025\) (2.5%).
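To see this split in action, here is a short Python sketch using only the standard library. The scenario (\(X \sim B(20, 0.5)\), \(\alpha = 5\%\), two-tailed) is a hypothetical example of my own, not one from the text: we put at most 2.5% in each tail and read off both critical boundaries.

```python
from math import comb

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p), summed directly from the pmf."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, p0, alpha = 20, 0.5, 0.05
tail = alpha / 2  # 0.025 in each tail for a two-tailed test

# Lower tail: largest c with P(X <= c) <= 0.025
c_low = max(c for c in range(n + 1) if binom_cdf(c, n, p0) <= tail)

# Upper tail: smallest d with P(X >= d) = 1 - P(X <= d - 1) <= 0.025
c_high = min(d for d in range(n + 1) if 1 - binom_cdf(d - 1, n, p0) <= tail)

print(c_low, c_high)  # boundaries of the regions X <= c_low and X >= c_high

# Total probability actually placed in the two tails
actual = binom_cdf(c_low, n, p0) + (1 - binom_cdf(c_high - 1, n, p0))
print(round(actual, 4))
```

Because \(B(20, 0.5)\) is symmetric, the two boundaries (5 and 15) mirror each other; for \(p_0 \ne 0.5\) each tail must be found separately, exactly as the code does.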


4. Hypothesis Testing using the Binomial Distribution (The S2 Core)

In Unit S2, you will often deal with situations where the outcome is a count of "successes" in a fixed number of trials, which follows the Binomial distribution.

We assume the number of successes, \(X\), follows the distribution \(X \sim B(n, p)\).

  • \(n\) is the fixed sample size (number of trials).
  • \(p\) is the probability of success assumed under \(H_0\).

The 5-Step Hypothesis Testing Procedure

Follow these steps for every test you perform:

Step 1: Define Hypotheses and Significance Level

State \(H_0\), \(H_1\), and the significance level \(\alpha\). Define the distribution of your test statistic \(X\).
Example: Test if \(p\) has increased from 0.4, using a sample size of 20 and \(\alpha = 5\%\).
\(H_0: p = 0.4\)
\(H_1: p > 0.4\)
\(X \sim B(20, 0.4)\)

Step 2: Determine the Test Statistic

This is the value actually observed in your sample. For a binomial test, it is the observed number of successes.
Example: If 13 successes were observed in the 20 trials, the test statistic is \(x = 13\).

Step 3: Find the Critical Region (CR) or Calculate the P-value

This step determines whether your observed result is "extreme enough" to reject \(H_0\). You must use the cumulative probabilities from the tables or your calculator.

Step 4: Compare and Make a Decision

Is your observation in the critical region? Or is the p-value less than \(\alpha\)?

Step 5: State the Conclusion in Context

Link your statistical decision back to the original real-world problem.
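The five steps above can be sketched end-to-end in Python for the running example from Step 1 (testing \(H_1: p > 0.4\) with \(n = 20\), \(\alpha = 5\%\), and 13 observed successes). This is an illustrative sketch rather than exam working; it uses only `math.comb` from the standard library.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Step 1: hypotheses and model. H0: p = 0.4, H1: p > 0.4, X ~ B(20, 0.4)
n, p0, alpha = 20, 0.4, 0.05

# Step 2: the observed test statistic
x = 13

# Step 3: for an upper-tail test the p-value is P(X >= x) under H0
p_value = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))

# Step 4: compare with the significance level
reject = p_value <= alpha

# Step 5: state the conclusion (here just printed; in an exam, in context)
print(f"p-value = {p_value:.4f}; reject H0: {reject}")
```

Here the p-value comes out at about 0.021, which is below 0.05, so there is sufficient evidence at the 5% level to suggest \(p\) has increased from 0.4.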


5. Critical Regions and P-Values (The Decision Tools)

The Critical Region (CR) is the range of values for the test statistic that would lead to the rejection of \(H_0\).

5.1 Method 1: Finding the Critical Region

Since \(X\) is discrete (you can only have whole numbers of successes), we look for the region that is as close as possible to the significance level \(\alpha\), without exceeding it.

Example Walkthrough (One-Tailed Test)

Suppose \(X \sim B(10, 0.8)\) and \(H_1: p < 0.8\). We use \(\alpha = 5\%\) (0.05).

1. We are looking for small values of \(X\). The critical region \(C\) starts at \(X=0\) and goes up to a value \(c\). We want to find \(c\) such that:
\(P(X \le c) \le 0.05\)

2. Using the binomial tables for \(n=10, p=0.8\):

  • \(P(X \le 5) = 0.0328\) (This is less than 0.05 - Good!)
  • \(P(X \le 6) = 0.1209\) (This is greater than 0.05 - Too high!)

3. Therefore, the Critical Region is \(X \le 5\).

If your observed result \(x\) is 5 or less, you reject \(H_0\). If \(x\) is 6 or more, you do not reject \(H_0\).
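A quick way to check a boundary like this (a sketch, assuming Python with just the standard library) is to scan the cumulative probabilities directly, mirroring how you would read down a table column:

```python
from math import comb

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, p0, alpha = 10, 0.8, 0.05

# Lower-tail test: the largest c with P(X <= c) <= alpha
c = max(k for k in range(n + 1) if binom_cdf(k, n, p0) <= alpha)

print(c)                               # boundary of the critical region X <= c
print(round(binom_cdf(c, n, p0), 4))   # probability of that region
```

The second printed value (0.0328) is exactly the actual significance level discussed in the next subsection.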

5.2 Actual Significance Level

Because \(X\) is discrete, we often cannot hit \(\alpha\) exactly. The Actual Significance Level is the probability associated with the critical region you actually found.
In the example above, the Actual Significance Level is \(P(X \le 5) = 0.0328\) or 3.28%.

5.3 Method 2: Calculating the P-Value

The P-value is the probability of observing a result as extreme as (or more extreme than) your observed test statistic, assuming \(H_0\) is true.

Rule of Thumb for P-values:
  • If P-value \(\le \alpha\): Reject \(H_0\) (The result is very unlikely under \(H_0\)).
  • If P-value \(> \alpha\): Do not reject \(H_0\).

Example:
Using the previous example: \(X \sim B(10, 0.8)\), \(H_1: p < 0.8\), \(\alpha = 0.05\). Suppose we observed \(x = 6\).

1. Since \(H_1\) is \(p < 0.8\), the extreme region is the lower tail.
2. Calculate the probability of observing 6 or something more extreme (lower):
P-value \( = P(X \le 6)\)
P-value \( = 0.1209\) (from tables)

3. Compare: \(0.1209 > 0.05\).
4. Conclusion: Since the P-value is greater than \(\alpha\), we do not reject \(H_0\).
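The same comparison can be sketched in Python (standard library only), reproducing the P-value working above:

```python
from math import comb

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, p0, alpha, x = 10, 0.8, 0.05, 6

# Lower-tail test, so "as extreme or more extreme" means X <= x
p_value = binom_cdf(x, n, p0)

print(round(p_value, 4))   # 0.1209, matching the tables
print(p_value <= alpha)    # False, so do not reject H0
```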

Key Takeaway (Decision Rule)

Whether you use the Critical Region or the P-value method, the outcome must be the same! Use the method you find easiest, but be comfortable with both.


6. Making the Conclusion (Step 5)

This step requires clear, non-mathematical language.

If you Reject \(H_0\):

There is sufficient evidence, at the \(\alpha\) significance level, to suggest that [state the claim from \(H_1\) in context].

If you Do Not Reject \(H_0\):

There is insufficient evidence, at the \(\alpha\) significance level, to reject the Null Hypothesis. We conclude there is no significant evidence to suggest that [state the claim from \(H_1\) in context].

A Common Mistake to Avoid!

NEVER say "Accept \(H_0\)." When we fail to reject \(H_0\), it simply means we don't have *enough* evidence to prove \(H_1\). We don't prove \(H_0\) is true; we just didn't prove it false!

Think of the court analogy: A jury can find someone "not guilty" (Do not reject \(H_0\)), but they don't necessarily prove the person is "innocent" (Accept \(H_0\)).


7. Errors in Hypothesis Testing

Since we are using probability and samples, there is always a chance we will make the wrong decision. There are two types of errors:

7.1 Type I Error

This happens when you Reject \(H_0\), but \(H_0\) was actually true.
(You find the coin is biased, but it was actually fair.)

  • The probability of a Type I error is controlled by the Significance Level (\(\alpha\)). For a discrete test statistic (such as a binomial count) with a pre-determined critical region, it equals the actual significance level of that region, which is at most \(\alpha\).
    \(P(\text{Type I Error}) = P(\text{Reject } H_0 \mid H_0 \text{ is true})\)

7.2 Type II Error

This happens when you Do Not Reject \(H_0\), but \(H_0\) was actually false (i.e., \(H_1\) was true).
(You find the coin is fair, but it was actually biased.)

  • The probability of a Type II error is denoted by \(\beta\) (beta).
  • Finding \(\beta\) is more complicated because \(H_1\) is a range of values (e.g., \(p > 0.5\)). To calculate \(\beta\), you must be given a specific value under \(H_1\) to test against.

Calculating the Probability of a Type II Error (\(\beta\))

\(\beta = P(\text{Do Not Reject } H_0 \mid H_1 \text{ is true at a specific value } p_1)\)

Process:

  1. First, determine the Acceptance Region (AR) for the original test (the region where you *Do Not Reject* \(H_0\)).
  2. Use the specific value \(p_1\) given under \(H_1\) to set up a new distribution: \(X \sim B(n, p_1)\).
  3. Calculate the probability of falling into the Acceptance Region using this new distribution. That probability is \(\beta\).
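That three-step process can be sketched in Python for the earlier test (\(X \sim B(10, 0.8)\), critical region \(X \le 5\)). The specific value \(p_1 = 0.6\) is a hypothetical choice of my own; in an exam, the question would supply it.

```python
from math import comb

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n = 10

# Step 1: acceptance region from the original test.
# The critical region was X <= 5, so we do NOT reject H0 when X >= 6.

# Step 2: the specific value assumed under H1 (hypothetical choice)
p1 = 0.6

# Step 3: beta = P(X >= 6) under the new distribution X ~ B(10, 0.6)
beta = 1 - binom_cdf(5, n, p1)
print(round(beta, 4))   # probability of a Type II error
```

Notice how large \(\beta\) is here (over 0.6): with only 10 trials, a drop from \(p = 0.8\) to \(p = 0.6\) is very easy to miss.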

Did you know?

There is a trade-off between the two errors. For a fixed sample size, if you reduce the risk of a Type I error (\(\alpha\)), you automatically increase the risk of a Type II error (\(\beta\)), and vice versa! Making your test super strict (e.g., \(\alpha = 1\%\)) makes it harder to reject \(H_0\), meaning you are more likely to miss a real effect. The only way to reduce both risks at once is to collect a larger sample.

Key Takeaway (Errors)

Type I Error: Rejecting \(H_0\) when it's true. Probability is \(\alpha\). (False Positive)
Type II Error: Not rejecting \(H_0\) when it's false. Probability is \(\beta\). (False Negative)


Chapter Summary Checklist

You are ready to tackle exam questions if you can confidently:

  • State the Null (\(H_0\)) and Alternative (\(H_1\)) hypotheses correctly.
  • Identify whether a test is one-tailed or two-tailed.
  • Split the significance level (\(\alpha\)) correctly for a two-tailed test.
  • Find the Critical Region for a Binomial test using cumulative probabilities.
  • Calculate the P-value and use it to reach a decision.
  • Define and calculate the probability of a Type I error (\(\alpha\)).
  • Calculate the probability of a Type II error (\(\beta\)) given a specific alternative parameter value.

Well done! Hypothesis testing is often seen as challenging, but by following the steps systematically, you can master it! Keep practicing those critical region boundaries!