Welcome to Non-parametric Tests!

Hello future statistician! This chapter might feel like a breath of fresh air after the strict assumptions of \(t\)-tests and Normal distribution inferences. We are diving into **Non-parametric tests** (sometimes called distribution-free tests).

What will we learn? We will learn how to test hypotheses about population characteristics (like the median) when we cannot assume the underlying data follows a nice, clean distribution like the Normal distribution.

Why is this important? In the real world, data often breaks the rules! Non-parametric tests give us powerful tools to analyze skewed or non-normally distributed data, ensuring our conclusions are robust.

1. Understanding Non-parametric Tests

What makes a test "Non-parametric"?

In the previous statistics modules, you used tests (like the \(t\)-test) that are **parametric**. These tests make specific assumptions about the population from which the sample is drawn. The biggest assumption is often that the population is Normally distributed.

Key Differences: Parametric vs. Non-parametric
  • Parametric Tests (e.g., \(t\)-test):

    Assume data comes from a specific distribution (usually Normal).
    Test hypotheses about parameters (like the population mean, \(\mu\)).
    They are generally more powerful when assumptions hold.

  • Non-parametric Tests (e.g., Sign Test, Wilcoxon):

    Make no assumption about the population distribution (or only very general ones, like symmetry).
    Test hypotheses about medians or distributions.
    Used when normality cannot be justified, for example when the sample is too small to check it or the data is heavily skewed.

Analogy: Imagine choosing clothes for an event. Parametric tests are like tailored suits—they look great if the fit is perfect (Normal distribution), but if the fit is wrong, they look terrible. Non-parametric tests are like stretchy casual wear—they adapt well, regardless of the shape of the data, making them useful when the distribution is unknown or non-normal.

Quick Takeaway: Use non-parametric tests when you cannot assume normality. They usually focus on the median rather than the mean.

2. The Sign Test

The Sign Test is the simplest non-parametric test. It only looks at the direction (sign) of the data relative to a hypothesized value, or the direction of the difference between paired observations. It ignores the magnitude of the difference entirely.

2.1 Single-Sample Sign Test (Testing a Population Median)

We use this test to check if the population median \(M\) is equal to a hypothesized value \(M_0\).

Hypotheses usually look like: \(H_0: M = M_0\) vs. \(H_1: M \neq M_0\) (or one-tailed equivalents).

Step-by-Step Procedure:
  1. Determine the differences: Calculate the difference between each data point \(x_i\) and the hypothesized median \(M_0\): \(d_i = x_i - M_0\).
  2. Assign Signs: Assign a sign (+ or -) to each difference.
  3. Handle Zeros: Observations exactly equal to \(M_0\) (i.e., difference is zero) are ignored. The sample size \(n\) is reduced accordingly.
  4. Calculate the Test Statistic: Count the positive signs and the negative signs. For a two-tailed test, the test statistic \(k\) is the smaller of the two counts. For a one-tailed test, \(k\) is the count of the sign that \(H_1\) predicts to be rare (e.g., the number of negative signs when \(H_1: M > M_0\)).
  5. Find the P-value: Under \(H_0\), each sign is equally likely, so the count of either sign follows a Binomial distribution \(B(n, 0.5)\). With \(X \sim B(n, 0.5)\), the p-value is \(P(X \le k)\) for a one-tailed test or \(2 \times P(X \le k)\) for a two-tailed test. Reject \(H_0\) if the p-value is below the significance level.

Example: If we expect the median study time to be 10 hours (\(M_0 = 10\)), and we find 12 students studied more (positive sign) and 3 studied less (negative sign), the test statistic \(k\) is 3. We then look up the probability of getting 3 or fewer negative signs in \(B(15, 0.5)\).
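The Binomial lookup in this example can also be done by hand or in a few lines of Python. The sketch below is only an illustration: the helper name `binom_cdf_half` is made up for this note, and it assumes a two-tailed test is wanted as well as the one-tailed probability.

```python
from math import comb

def binom_cdf_half(k: int, n: int) -> float:
    """P(X <= k) for X ~ B(n, 0.5)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2**n

n_plus, n_minus = 12, 3        # counts from the study-time example
n = n_plus + n_minus           # 15 non-zero differences
k = min(n_plus, n_minus)       # test statistic: 3

one_tailed = binom_cdf_half(k, n)
two_tailed = min(1.0, 2 * one_tailed)

print(f"P(X <= {k}) = {one_tailed:.4f}")    # 0.0176
print(f"two-tailed p = {two_tailed:.4f}")   # 0.0352
```

Here \(P(X \le 3) = \frac{1 + 15 + 105 + 455}{2^{15}} = \frac{576}{32768} \approx 0.0176\), so at the 5% level the evidence against \(H_0\) is significant for either a one-tailed or a two-tailed test.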

Memory Aid: The Sign Test is just a coin flip! If the null hypothesis is true, a data point is equally likely to be above or below \(M_0\). So, the probability \(p=0.5\).

2.2 Paired-Sample Sign Test

Used for matched-pairs data (e.g., comparing scores before and after an intervention). We are testing the hypothesis that the two populations (before and after) are identical (which implies the median difference is zero).

The procedure is identical to the single-sample test, except:

  1. The differences \(d_i\) are calculated between the paired observations (e.g., \(d_i = \text{Score}_A - \text{Score}_B\)).
  2. We then count the number of positive and negative differences (ignoring zeros) and proceed with the Binomial test based on \(p=0.5\).
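As a rough sketch of the paired version, the function below drops zero differences, counts the signs, and returns a two-tailed p-value from \(B(n, 0.5)\). The function name and the scores are invented purely for illustration; they do not come from any real data set.

```python
from math import comb

def paired_sign_test(a, b):
    """Two-tailed paired sign test; returns (n, k, p_value)."""
    diffs = [x - y for x, y in zip(a, b)]
    plus = sum(d > 0 for d in diffs)
    minus = sum(d < 0 for d in diffs)
    n = plus + minus                     # zero differences are dropped
    k = min(plus, minus)                 # count of the less frequent sign
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return n, k, min(1.0, 2 * tail)

# Hypothetical scores, made up for this illustration
after  = [68, 74, 59, 81, 77, 65, 70, 72, 66, 80]
before = [62, 70, 61, 75, 77, 60, 66, 69, 67, 73]

n, k, p = paired_sign_test(after, before)
print(n, k, round(p, 4))   # 9 2 0.1797
```

One pair is tied (77 and 77), so \(n\) falls from 10 to 9. With 7 positive and 2 negative differences, the two-tailed p-value is \(2 \times \frac{46}{512} \approx 0.18\), which is not significant at the 5% level.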

Key Takeaway: The Sign Test is simple, relying on the Binomial distribution \(B(n, 0.5)\). It is effective but ignores valuable information about how large the differences are.

3. Wilcoxon Tests (Rank Tests)

Wilcoxon tests are more powerful than the Sign Test because they incorporate the magnitude of the differences, not just the sign. They do this by assigning ranks.

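IMPORTANT SYLLABUS NOTE: The Wilcoxon signed-rank test is only valid if the underlying population distribution (of the data, or of the paired differences) is assumed to be symmetrical. If the data is known to be non-symmetrical (e.g., highly skewed), you should stick to the Sign Test. The rank-sum test does not need symmetry, but it does assume that the two populations have the same shape and differ, if at all, only in location.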

3.1 Wilcoxon Signed-Rank Test

This test is the non-parametric equivalent of the single-sample or paired-sample \(t\)-test. It is used to test a hypothesis about the population median \(M\).

Step-by-Step Procedure (Paired Sample/Testing \(M_0\)):
  1. Calculate Differences: Find the differences \(d_i\) (either \(x_i - M_0\) or \(A_i - B_i\)).
  2. Ignore Zeros: Discard any differences equal to zero. Reduce \(n\) accordingly.
  3. Rank Absolute Differences: Rank the absolute values of the remaining differences, \(|d_i|\). (Note: The syllabus states that questions will not involve tied ranks, simplifying this step.)
  4. Assign Signs to Ranks: Give each rank the sign of its original difference \(d_i\).
  5. Calculate Rank Sums:
    • \(P\): Sum of the ranks corresponding to the positive differences.
    • \(Q\): Sum of the ranks corresponding to the negative differences.
    (Check: \(P + Q = \sum_{i=1}^{n} i = \frac{n(n+1)}{2}\), where \(n\) is the number of non-zero differences.)
  6. Determine the Test Statistic (\(T\)):

    The test statistic \(T\) is the smaller of \(P\) and \(Q\).

    \[ T = \min(P, Q) \]

  7. Consult Critical Values: Use the critical values table for the Wilcoxon Signed-Rank Test (MF19). If the calculated \(T\) is less than or equal to the critical value, we reject \(H_0\).
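Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration that assumes no tied absolute differences (as the notes do); the function name and the data are hypothetical.

```python
def signed_rank_sums(diffs):
    """Return (P, Q, T) for the Wilcoxon signed-rank test.

    Assumes there are no tied absolute differences.
    """
    nonzero = [d for d in diffs if d != 0]          # step 2: drop zeros
    ordered = sorted(nonzero, key=abs)              # step 3: rank 1 = smallest |d|
    P = sum(r for r, d in enumerate(ordered, start=1) if d > 0)
    Q = sum(r for r, d in enumerate(ordered, start=1) if d < 0)
    n = len(nonzero)
    assert P + Q == n * (n + 1) // 2                # the check from step 5
    return P, Q, min(P, Q)

# Hypothetical differences x_i - M_0, made up for this illustration
diffs = [1.2, -0.4, 2.5, 0.9, -1.6, 3.1, 0.0, 2.0, -0.7, 1.4]
P, Q, T = signed_rank_sums(diffs)
print(P, Q, T)   # 36 9 9
```

The negative differences take ranks 1, 2 and 6, so \(Q = 9\) and \(P = 45 - 9 = 36\), giving \(T = 9\) with \(n = 9\). The final step is still to compare \(T\) with the tabulated critical value.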

Why the smaller sum? If \(H_0\) is true, the sum of positive ranks and the sum of negative ranks should be roughly equal. If one is very small (meaning the other is very large), it indicates a strong shift away from the null median.

Accessibility Tip: Don't worry about memorizing the critical values. You will be given the table (MF19) in the exam. Just remember that smaller \(T\) means stronger evidence against \(H_0\) in this test.

3.2 Wilcoxon Rank-Sum Test (Mann-Whitney U Test)

This test is used for comparing two independent populations to see if they are identical (i.e., if their distributions are the same). It is the non-parametric alternative to the 2-sample \(t\)-test.

Let \(m\) be the size of the smaller sample and \(n\) be the size of the larger sample, with \(m \le n\).

Step-by-Step Procedure:
  1. Combine and Rank: Combine all data points from both samples into one set. Rank all \(m+n\) observations from smallest (rank 1) to largest.
  2. Calculate \(R_m\): Find the sum of the ranks belonging only to the smaller sample (size \(m\)). This sum is called \(R_m\).
  3. Determine the Test Statistic (\(W\)): Compare two values:
    • \(R_m\) (the sum of the ranks of the smaller sample)
    • \(m(n+m+1) - R_m\) (the rank sum the smaller sample would have if the ranking were reversed, i.e., from largest to smallest)
    The test statistic \(W\) is the smaller of these two values.
    \[ W = \min\left( R_m, \ m(n+m+1) - R_m \right) \]
  4. Consult Critical Values: Use the critical values table for the Wilcoxon Rank-Sum Test (MF19), using the relevant values of \(m\) and \(n\). If the calculated \(W\) is less than or equal to the critical value, we reject \(H_0\) (that the two populations are identical).
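The ranking and the calculation of \(W\) can be sketched like this. It assumes no tied values in the combined data; the function name and the two groups of measurements are hypothetical.

```python
def rank_sum_statistic(sample_a, sample_b):
    """Return (m, n, R_m, W) for the Wilcoxon rank-sum test.

    Assumes no tied values across the combined data.
    """
    small, large = sorted((list(sample_a), list(sample_b)), key=len)
    m, n = len(small), len(large)
    combined = sorted(small + large)
    rank = {value: i for i, value in enumerate(combined, start=1)}
    R_m = sum(rank[v] for v in small)               # rank sum of smaller sample
    W = min(R_m, m * (n + m + 1) - R_m)
    return m, n, R_m, W

# Hypothetical measurements, made up for this illustration
group_x = [12.1, 15.3, 9.8, 11.4]
group_y = [14.2, 16.7, 13.5, 17.9, 15.8, 18.4]
print(rank_sum_statistic(group_x, group_y))   # (4, 6, 12, 12)
```

The smaller sample takes ranks 1, 2, 3 and 6, so \(R_m = 12\). The comparison sum is \(4 \times 11 - 12 = 32\), so \(W = 12\), to be compared with the table entry for \(m = 4\), \(n = 6\).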

Did you know? The Wilcoxon Rank-Sum Test is mathematically equivalent to the Mann-Whitney U Test, although they use slightly different formulas for their test statistics.
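One common form of the link is \(U = R_m - \frac{1}{2}m(m+1)\), where \(U\) counts the pairs (one value from each sample) in which the smaller-sample value is the larger of the two. The sketch below checks this on made-up data, assuming no ties; the variable and function names are illustrative only.

```python
def mann_whitney_u(small, large):
    """Count pairs (x, y) with x from `small`, y from `large`, and x > y."""
    return sum(x > y for x in small for y in large)

# Hypothetical measurements, made up for this illustration
small = [12.1, 15.3, 9.8, 11.4]
large = [14.2, 16.7, 13.5, 17.9, 15.8, 18.4]

m = len(small)
combined = sorted(small + large)
R_m = sum(combined.index(x) + 1 for x in small)   # rank sum of smaller sample

U = mann_whitney_u(small, large)
print(U, R_m - m * (m + 1) // 2)   # 2 2
```

Both routes give the same number, which is why the two tests always reach the same conclusion.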

Key Takeaway: Wilcoxon tests rely on ranking the data. The Signed-Rank test (one sample or paired data) ranks the differences from a hypothesized median (or from zero), while the Rank-Sum test (two independent samples) compares the sum of ranks between two groups.

4. Normal Approximations for Large Samples

Just like with the Binomial and Poisson distributions, when the sample sizes become large, calculating critical values or probabilities using tables becomes tedious (and tables only go up to certain limits, e.g., \(n=20\)).

When \(n\) (or \(m\) and \(n\)) is large, we can use the Normal distribution to approximate the distribution of the test statistic. The justification is a Central Limit Theorem style argument: each rank sum is built from many individual rank contributions.

4.1 Normal Approximation for Wilcoxon Signed-Rank Test \(T\)

Used when the sample size \(n\) (number of non-zero differences) is large.

Under \(H_0\), each of the rank sums \(P\) and \(Q\) is approximately Normally distributed with:

Mean: \(E(P) = \mu_P = \frac{1}{4}n(n+1)\)
Variance: \(Var(P) = \sigma^2_P = \frac{1}{24}n(n+1)(2n+1)\)
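
For the curious, here is a sketch of where these two formulas come from. It assumes no ties and that, under \(H_0\), each rank \(i\) independently receives a positive sign with probability \(\frac{1}{2}\). Writing \(P = \sum_{i=1}^{n} i\,I_i\), where the indicator \(I_i\) (a symbol introduced just for this note) is 1 if rank \(i\) is positive and 0 otherwise, we have \(E(I_i) = \frac{1}{2}\) and \(Var(I_i) = \frac{1}{4}\), so:

\[ E(P) = \sum_{i=1}^{n} i \cdot \frac{1}{2} = \frac{1}{2} \cdot \frac{n(n+1)}{2} = \frac{1}{4}n(n+1) \]

\[ Var(P) = \sum_{i=1}^{n} i^2 \cdot \frac{1}{4} = \frac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{1}{24}n(n+1)(2n+1) \]

The same argument applies to \(Q\), which is why both rank sums share one mean and one variance.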

Since \(T\) is the smaller of \(P\) and \(Q\), we use this Normal distribution to find the probability that a rank sum is less than or equal to the observed value \(T\), doubling it for a two-tailed test.
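A minimal Python sketch of this calculation follows. It assumes a continuity correction of \(+0.5\) on the lower tail, and the values \(n = 30\) and \(T = 140\) are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_normal_p(T: int, n: int) -> float:
    """One-tailed P(rank sum <= T) using the Normal approximation.

    Applies a +0.5 continuity correction (an assumption of this sketch).
    """
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (T + 0.5 - mean) / sqrt(var)
    return NormalDist().cdf(z)

# Hypothetical values: 30 non-zero differences, observed T = 140
p_one = signed_rank_normal_p(140, 30)
print(round(p_one, 4), round(2 * p_one, 4))
```

Here the mean is \(232.5\) and the variance is \(2363.75\), so \(z = \frac{140.5 - 232.5}{\sqrt{2363.75}} \approx -1.89\). The one-tailed probability is just under 0.03, and the two-tailed p-value is twice that.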

4.2 Normal Approximation for Wilcoxon Rank-Sum Test \(R_m\)

Used when both sample sizes \(m\) and \(n\) are large. We typically look at the distribution of the rank sum of the smaller sample, \(R_m\).

Under \(H_0\), the rank sum \(R_m\) is approximately Normally distributed with:

Mean: \(E(R_m) = \mu_{R_m} = \frac{1}{2}m(m+n+1)\)
Variance: \(Var(R_m) = \sigma^2_{R_m} = \frac{1}{12}mn(m+n+1)\)

In both cases, once we have the mean and variance, we can calculate the standardized test statistic \(Z\). For the signed-rank test, the observed \(T\) is standardized using the mean and variance of \(P\):

\[ Z = \frac{T - \mu_P}{\sigma_P} \quad \text{or} \quad Z = \frac{R_m - \mu_{R_m}}{\sigma_{R_m}} \]

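(Note: Rank sums take only integer values, so you may need to apply a continuity correction when using the Normal approximation, for example using \(T + 0.5\) in place of \(T\) for a lower-tail probability. The adjustment is usually small, and whether it is expected depends on the specific question context.)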
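For the rank-sum case, a sketch of the standardization might look like this. The function name, the choice to shift \(R_m\) by 0.5 towards the mean, and the values \(m = 12\), \(n = 15\), \(R_m = 120\) are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_z(R_m: float, m: int, n: int, continuity: bool = True) -> float:
    """Standardize R_m using the Normal approximation."""
    mean = m * (m + n + 1) / 2
    var = m * n * (m + n + 1) / 12
    if continuity:
        # Shift R_m half a unit towards the mean (assumed correction)
        if R_m < mean:
            R_m += 0.5
        elif R_m > mean:
            R_m -= 0.5
    return (R_m - mean) / sqrt(var)

# Hypothetical values: m = 12, n = 15, observed R_m = 120
z = rank_sum_z(120, 12, 15)
p_two = 2 * NormalDist().cdf(-abs(z))
print(round(z, 3), round(p_two, 4))
```

With these numbers the mean is \(168\) and the variance is \(420\), so \(z = \frac{120.5 - 168}{\sqrt{420}} \approx -2.32\), which gives a two-tailed p-value of roughly 0.02.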

Common Mistake Alert!

Students often confuse when to use which Wilcoxon test:

  • Signed-Rank Test (\(T\)): Used for one sample or paired data. It uses differences.
  • Rank-Sum Test (\(W\)): Used for two independent samples. It uses combined ranks.

Key Takeaway: For large samples, non-parametric tests can be approximated using the Normal distribution, using specific formulas for the mean and variance derived from the ranks.