Welcome to Hypothesis Testing!

Hello! This is one of the most powerful and exciting topics in Statistics. Hypothesis testing is essentially the statistical method for making informed decisions based on sample data. It's like being a detective or a judge, deciding whether there is enough evidence to reject a long-held belief or claim.

In this chapter, we will learn how to formally structure a statistical argument, evaluate evidence, and conclude whether a claim is statistically significant. Don't worry if this seems tricky at first—we'll break it down into simple, manageable steps!


1. The Core Idea: The Statistical Trial

Think of hypothesis testing as a trial in a court of law.

  • The Null Hypothesis (\(H_0\)) is the assumption of innocence (the status quo).
  • The Alternative Hypothesis (\(H_1\)) is the accusation (the claim we suspect is true).
  • The Sample Data is the evidence presented.
  • The Significance Level is how strong the evidence needs to be to convict.

The Rule of Assumption

In statistics, we always assume the Null Hypothesis (\(H_0\)) is true, just like assuming a person is innocent until proven guilty. We then use our sample data to see if it provides enough contradictory evidence to reject \(H_0\).

Key Takeaway

We never "prove" \(H_1\) is true; we only decide whether the evidence is strong enough to reject \(H_0\) in favour of \(H_1\).

2. Setting up Hypotheses (\(H_0\) and \(H_1\))

2.1. The Null Hypothesis (\(H_0\))

The null hypothesis is the starting point, stating that there is no change, no effect, or that the parameter equals a specified value.

  • \(H_0\) always contains an equals sign (\(=\)).
  • Example (Proportion \(p\)): \(H_0: p = 0.5\)
  • Example (Mean \(\mu\) or \(\lambda\)): \(H_0: \mu = 10\)

2.2. The Alternative Hypothesis (\(H_1\))

The alternative hypothesis reflects the claim or suspicion that the parameter has changed. This dictates whether the test is one-tailed or two-tailed.

  • One-Tailed Test: If the change is expected in a specific direction (either higher or lower).
    Example: A manufacturer claims a new process reduces defects.
    \(H_1: p < 0.5\) (Less than the old proportion)
  • Two-Tailed Test: If the change is simply expected to be different, without a specified direction.
    Example: We suspect the average score is no longer 10.
    \(H_1: \mu \neq 10\) (Not equal to 10)

⛔ Common Pitfall Alert!

NEVER put a less than (\(<\)), greater than (\(>\)), or not equal to (\(\neq\)) sign in \(H_0\). \(H_0\) must always specify a single parameter value, e.g., \(H_0: \mu = 5\).

3. Decision Making: The Critical Components

3.1. Significance Level (\(\alpha\))

The significance level (\(\alpha\)), usually 5% (0.05) or 1% (0.01), is the threshold for rejecting \(H_0\).

  • It represents the maximum probability of making a Type I Error (rejecting \(H_0\) when it is actually true).
  • A smaller \(\alpha\) (e.g., 1%) requires stronger evidence to reject \(H_0\).

3.2. Critical Value and Critical Region

The Critical Region (or Rejection Region) is the range of values of the test statistic that would lead to rejecting \(H_0\). The boundary of this region is the Critical Value.

  • If your calculated test statistic falls into the Critical Region, you reject \(H_0\).
  • If the test is two-tailed, the significance level \(\alpha\) must be split equally between both tails. For instance, a 5% test will have a 2.5% rejection region in the upper tail and a 2.5% region in the lower tail.
  • The remaining range of values is the Acceptance Region.
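The critical values for a \(Z\)-test can be found with standard-library Python, as a quick sketch (the numbers here are the familiar table values for a 5% test):

```python
from statistics import NormalDist

# Standard normal distribution: the distribution of the Z-statistic under H0
z = NormalDist(mu=0, sigma=1)

alpha = 0.05

# One-tailed (upper) critical value: reject H0 if Z exceeds this
upper_one_tailed = z.inv_cdf(1 - alpha)

# Two-tailed critical values: alpha is split equally between the two tails
lower, upper = z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2)

print(round(upper_one_tailed, 3))        # 1.645
print(round(lower, 2), round(upper, 2))  # -1.96 1.96
```

Note how the two-tailed boundaries (\(\pm 1.96\)) sit further out than the one-tailed boundary (\(1.645\)): splitting \(\alpha\) between two tails leaves only 2.5% in each.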

3.3. The p-Value Method

The p-value is an alternative way to make a decision, and it is usually easier, especially with modern calculators.

  • The p-value is the probability of observing a result as extreme as, or more extreme than, the one calculated from the sample data, assuming \(H_0\) is true.
  • Decision Rule: If p-value < \(\alpha\) (the significance level), we reject \(H_0\).

Analogy: If the probability of seeing this evidence by chance (the p-value) is lower than our acceptable risk (the significance level), the evidence is strong enough to reject the assumption, \(H_0\).
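The decision rule itself is a one-line comparison. A minimal sketch (the p-value 0.032 is a made-up illustration):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    # Reject H0 exactly when the p-value falls below the significance level
    if p_value < alpha:
        return "Reject H0: the result is significant at this level"
    return "Do not reject H0: insufficient evidence"

print(decide(0.032, alpha=0.05))  # rejected at the 5% level
print(decide(0.032, alpha=0.01))  # not rejected at the 1% level
```

The same evidence can be significant at 5% but not at 1% — the conclusion always depends on the significance level chosen before the test.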

4. Errors in Hypothesis Testing (Type I and Type II)

Since we are relying on sample data, we always risk drawing the wrong conclusion.

4.1. Type I Error

A Type I Error occurs when we reject \(H_0\), but \(H_0\) was actually true.

  • Court Analogy: Finding an innocent person guilty.
  • The risk (probability) of making a Type I Error is equal to the significance level, \(\alpha\).

4.2. Type II Error

A Type II Error occurs when we fail to reject \(H_0\) (i.e., we "accept" it), but \(H_0\) was actually false.

  • Court Analogy: Finding a guilty person innocent.
  • The risk of making a Type II Error is usually denoted by \(\beta\).

Did You Know?

If you decrease the probability of a Type I error (e.g., changing \(\alpha\) from 5% to 1%), you generally increase the probability of a Type II error (\(\beta\)), assuming the sample size remains constant. You can't usually reduce both risks simultaneously unless you collect more data!
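This trade-off can be seen with an exact binomial calculation. The sketch below assumes a hypothetical test of \(H_0: p = 0.5\) against \(H_1: p > 0.5\) with \(n = 20\), where the true proportion is actually 0.7:

```python
from math import comb

def upper_tail(n, p, c):
    # P(X >= c) for X ~ B(n, p)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def critical_value(n, p0, alpha):
    # Smallest c such that P(X >= c) <= alpha when H0 is true
    for c in range(n + 1):
        if upper_tail(n, p0, c) <= alpha:
            return c

# Hypothetical setup: H0: p = 0.5, H1: p > 0.5, n = 20, true p = 0.7
n, p0, p_true = 20, 0.5, 0.7

betas = {}
for alpha in (0.05, 0.01):
    c = critical_value(n, p0, alpha)
    # Type II error: H0 is not rejected even though p is really 0.7
    betas[alpha] = 1 - upper_tail(n, p_true, c)
    print(f"alpha = {alpha}: reject if X >= {c}, beta = {betas[alpha]:.3f}")

# Shrinking alpha from 5% to 1% pushes the critical value out and raises beta
print(betas[0.01] > betas[0.05])  # True
```

Tightening \(\alpha\) moves the critical value further from the \(H_0\) mean, so more "guilty" samples escape detection — exactly the trade-off described above.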

5. Specific Tests Covered by the Syllabus

The procedure remains the same for all tests: 1. State hypotheses, 2. Define significance level, 3. Calculate test statistic/p-value, 4. Compare and conclude. What changes is the distribution we use for the calculation.

5.1. Test for a Population Proportion (Binomial)

If you are testing a proportion \(p\) (e.g., the proportion of people who favour a candidate) and using a small, finite sample size \(n\), you must use the Binomial distribution.

  • Distribution assumed under \(H_0\): \(X \sim B(n, p_0)\).
  • We use the exact binomial probabilities (cumulative tables or the formula) to find the p-value.

Step-by-Step for Binomial/Poisson Tests (Using p-value):

  1. State \(H_0\) (e.g., \(p = 0.2\)) and \(H_1\) (e.g., \(p > 0.2\)).
  2. Identify the observed frequency \(x\) from the sample.
  3. Calculate the p-value: Find \(P(X \ge x)\) (if \(H_1\) is \(>\)) or \(P(X \le x)\) (if \(H_1\) is \(<\)), using the assumed distribution \(B(n, p_0)\).
  4. If the test is two-tailed (\(H_1: p \neq p_0\)): Find the probability of a result at least as extreme as the observed one in the relevant tail, then double it to get the two-tailed p-value.
  5. Compare the p-value with \(\alpha\).
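These steps can be sketched in standard-library Python. The numbers are hypothetical: 8 successes in a sample of \(n = 20\), testing \(H_0: p = 0.2\) against \(H_1: p > 0.2\) at the 5% level:

```python
from math import comb

def binom_pmf(n, p, k):
    # P(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical sample: x = 8 successes out of n = 20
n, p0, x, alpha = 20, 0.2, 8, 0.05

# Step 3: p-value = P(X >= x) under the assumed distribution B(20, 0.2)
p_value = sum(binom_pmf(n, p0, k) for k in range(x, n + 1))

# Step 5: compare with alpha
print(f"p-value = {p_value:.4f}")  # about 0.0321
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Since \(0.0321 < 0.05\), there is sufficient evidence at the 5% level to reject \(H_0\) in favour of \(H_1: p > 0.2\).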

5.2. Test for the Mean of a Poisson Distribution

If the data consists of counts (e.g., number of emails received per hour) and we are testing the rate parameter \(\lambda\), we use the Poisson distribution.

  • Distribution assumed under \(H_0\): \(X \sim Po(\lambda_0)\).
  • We use exact Poisson probabilities (cumulative tables or the formula, which requires evaluating \(e^{-\lambda}\)) to find the p-value.

Note: If the sample is collected over a time \(t\) (or length \(L\)) rather than a single unit, the parameter used in the Poisson distribution must be \(\lambda_0 \times t\) (or \(\lambda_0 \times L\)).
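A sketch of a Poisson test including this time scaling, with hypothetical numbers: \(H_0\) says emails arrive at \(\lambda = 3\) per hour, we watch for \(t = 2\) hours, and we observe 11 emails (so under \(H_0\), \(X \sim Po(6)\)):

```python
from math import exp

def poisson_pmf(mean, k):
    # P(X = k) for X ~ Po(mean), built iteratively to avoid huge factorials
    prob = exp(-mean)
    for i in range(1, k + 1):
        prob *= mean / i
    return prob

# Hypothetical: H0: lambda = 3 per hour, observed over t = 2 hours, x = 11 seen
lam0, t, x, alpha = 3, 2, 11, 0.05
mean = lam0 * t  # the scaled Poisson parameter: 3 * 2 = 6

# One-tailed p-value for H1: lambda > 3, i.e. P(X >= 11) = 1 - P(X <= 10)
p_value = 1 - sum(poisson_pmf(mean, k) for k in range(x))

print(f"p-value = {p_value:.4f}")  # about 0.0426
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Forgetting to scale (using \(Po(3)\) instead of \(Po(6)\)) is a very common exam error — the observation window and the rate must refer to the same interval.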

5.3. Tests for the Mean (\(\mu\)) using the \(Z\)-Statistic

When testing the mean \(\mu\), we often use the Normal distribution (and thus the \(Z\)-statistic) in three key scenarios:

Scenario A: Normal Distribution with Known Variance (\(\sigma^2\))

If the original population is Normal, \(X \sim N(\mu, \sigma^2)\), and we know \(\sigma^2\), the sample mean \(\bar{X}\) follows the distribution:
\(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\)
The test statistic is the Z-statistic:
\(Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\)

Scenario B: Large Sample Tests (Normal Approximation)

According to the Central Limit Theorem (CLT), if the sample size \(n\) is large (generally \(n > 30\)), the sampling distribution of the mean \(\bar{X}\) can be approximated by a Normal distribution, regardless of the original population distribution.

  • If the population variance \(\sigma^2\) is known, use it in the \(Z\)-formula.
  • If \(\sigma^2\) is unknown, for large samples, we can substitute the sample variance \(S^2\) (or sample standard deviation \(S\)) for \(\sigma^2\) and still use the \(Z\)-test.
  • Test Statistic (Large Sample): \(Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}\)
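A large-sample \(Z\)-test can be carried out directly from summary statistics. This sketch assumes hypothetical values: \(n = 50\), \(\bar{x} = 10.6\), \(s = 1.8\), testing \(H_0: \mu = 10\) against \(H_1: \mu \neq 10\):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics for a large sample (n > 30)
n, xbar, s, mu0, alpha = 50, 10.6, 1.8, 10, 0.05

# Z-statistic: sample variance substituted for the unknown sigma^2
z = (xbar - mu0) / (s / sqrt(n))

# Two-tailed p-value: double the upper-tail probability beyond |Z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"Z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Equivalently, one could compare \(Z \approx 2.36\) with the two-tailed critical values \(\pm 1.96\); both routes give the same decision.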

5.4. Tests for the Mean (\(\mu\)) using the t-Statistic

This advanced test is used when we test the mean of a population where:

  • The population is assumed to be Normally distributed.
  • The sample size \(n\) is small (e.g., \(n < 30\)).
  • The population variance \(\sigma^2\) is unknown and must be estimated using the sample variance \(S^2\).

When these conditions hold, the sampling distribution of the mean follows a t-distribution, not the standard Normal distribution.

  • Test statistic: \(T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}\)
  • The t-distribution depends on the degrees of freedom, which is usually \(n-1\). You will need to use t-distribution tables (provided in your formula booklet) instead of Normal tables.
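A sketch of the critical-value method for a small-sample t-test, with hypothetical data: \(n = 10\), \(\bar{x} = 10.8\), \(s = 1.2\), testing \(H_0: \mu = 10\) against \(H_1: \mu > 10\) at the 5% level (the critical value 1.833 is the standard table entry for 9 degrees of freedom, one-tailed):

```python
from math import sqrt

# Hypothetical small sample (n < 30) from a Normal population
n, xbar, s, mu0 = 10, 10.8, 1.2, 10

# t-statistic: same formula as Z, but compared against the t-distribution
t_stat = (xbar - mu0) / (s / sqrt(n))

# Critical value from t-tables: t with n - 1 = 9 degrees of freedom,
# one-tailed, 5% significance level
t_crit = 1.833

print(f"T = {t_stat:.3f}")  # about 2.108
print("Reject H0" if t_stat > t_crit else "Do not reject H0")
```

Since \(2.108 > 1.833\), the statistic falls in the critical region and \(H_0\) is rejected. Had we (wrongly) used the Normal critical value 1.645 instead of the wider t-value, we would risk rejecting \(H_0\) too readily for small samples.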

Quick Review: Hypothesis Testing Steps

  1. Hypotheses: State \(H_0\) (with \(=\)) and \(H_1\) (with \(<\), \(>\), or \(\neq\)).
  2. Level: Define the significance level, \(\alpha\).
  3. Distribution: Identify the appropriate distribution (Binomial, Poisson, or Normal/t).
  4. Test: Calculate the Test Statistic (or calculate the p-value).
  5. Decision: Compare the p-value to \(\alpha\), or compare the test statistic to the Critical Value.
  6. Conclusion: Write a clear concluding statement in the context of the problem (e.g., "There is sufficient evidence at the 5% level to reject the claim that the mean is 10.")

You've successfully mastered the foundations of statistical decision-making! Keep practicing those steps and you'll be hypothesis testing like a pro.