Introduction: What are Averages and Range, and Why Do We Care?

Welcome to the chapter on Statistics! Don't worry if numbers sometimes feel overwhelming—the goal of this chapter is to take big, messy lists of data and summarize them using just a few simple numbers.

These summarizing numbers fall into two main categories:

1. Averages (Measures of Central Tendency): These tell you what a typical value looks like (e.g., "The average score on the test was 75%").
2. Range (Measures of Spread): These tell you how spread out the data is (e.g., "The scores ranged from 10% to 100%").

Mastering these concepts allows you to analyze and compare different data sets, which is a crucial skill in the real world!

Section 1: Measures of Central Tendency (The Averages)

When people talk about “average,” they usually mean the mean. However, in mathematics, there are three main types of averages: the Mean, the Median, and the Mode.

1.1 The Mode (The Most Popular)

The Mode is the easiest average to find. It is simply the value that occurs most frequently in a dataset.

Key Facts about the Mode:
  • It can be used for non-numerical data (like favorite colors or types of cars).
  • A data set can have no mode (if all values are unique) or two or more modes (bimodal, multimodal).

Example: In the list of shoe sizes: 7, 8, 8, 9, 10, 10, 10, 11.
The number 10 appears three times, more than any other size.
Mode = 10
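
As a minimal sketch of the counting idea above, the Python snippet below tallies each value and keeps those with the highest count. The function name `modes` is made up for illustration, and treating "every value appears once" as "no mode" follows the key fact stated above.

```python
from collections import Counter


def modes(values):
    """Return every value that ties for the highest frequency.

    Returns an empty list when there is no mode (all values are unique).
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # If nothing repeats in a list of several values, treat it as "no mode".
    if highest == 1 and len(counts) > 1:
        return []
    return [value for value, count in counts.items() if count == highest]


shoe_sizes = [7, 8, 8, 9, 10, 10, 10, 11]
print(modes(shoe_sizes))                      # [10]
print(modes(["red", "blue", "red", "blue"]))  # ['red', 'blue'] (bimodal)
```

Because the function only counts, it works just as well for non-numerical data such as colours.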

1.2 The Median (The Middle Ground)

The Median is the middle value when the data is arranged in order of size. It is useful because it is barely affected by extreme values (outliers).

Step-by-Step: Finding the Median for Individual Data
  1. Order the Data: Arrange all values from smallest to largest. (If you forget this step, your answer will be wrong!)
  2. Find the Position: Use the formula for the position of the median:
    \[\text{Position} = \frac{n+1}{2}\] where \(n\) is the total number of values in the dataset.
  3. Find the Value: Use the position to count into the ordered list and find the actual median value.

Case A: Odd number of data points (n is odd)
Example: Scores: 5, 2, 8, 1, 4 (n=5)
1. Order: 1, 2, 4, 5, 8
2. Position: \(\frac{5+1}{2} = 3\).
3. Value: The 3rd value is 4.
Median = 4

Case B: Even number of data points (n is even)
Example: Scores: 10, 12, 16, 20 (n=4)
1. Order: 10, 12, 16, 20
2. Position: \(\frac{4+1}{2} = 2.5\). This means the median is halfway between the 2nd and 3rd values.
3. Value: Median = \(\frac{12+16}{2} = 14\).
Median = 14
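
The three steps can be sketched in Python as follows. This is one possible way to code the position method, not the only one; the function name `median_by_position` is invented for this example.

```python
import math


def median_by_position(values):
    """Find the median using the (n + 1) / 2 position rule."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)        # Step 1: order the data
    n = len(ordered)
    position = (n + 1) / 2          # Step 2: 1-based position (may end in .5)
    # Step 3: take the values on either side of the position.
    # For a whole-number position both are the same value.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


print(median_by_position([5, 2, 8, 1, 4]))    # 4.0  (Case A)
print(median_by_position([10, 12, 16, 20]))   # 14.0 (Case B)
```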

Memory Tip: The median is like the median strip on a road—it's the thing right in the middle!

1.3 The Mean (The Standard Average)

The Mean is the most common average. You find it by adding up all the data values and then dividing by the total number of values.

The Formula for the Mean (Individual Data)

\[\text{Mean} (\bar{x}) = \frac{\text{Sum of all values}}{\text{Number of values}}\]

Using mathematical notation (which you should be familiar with): \[\bar{x} = \frac{\sum x}{n}\]

Where:
\(\sum x\) (pronounced 'sigma x') means "the sum of all the data values."
\(n\) is the total number of values.

Example: Temperatures recorded (in °C): 20, 25, 22, 21
1. Sum of values (\(\sum x\)): \(20 + 25 + 22 + 21 = 88\)
2. Number of values (\(n\)): 4
3. Mean: \(\frac{88}{4} = 22\)
Mean = 22 °C
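
The same calculation as a short Python sketch (the function name `mean` is just a label chosen here):

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


temperatures = [20, 25, 22, 21]   # in degrees Celsius
print(sum(temperatures))          # 88
print(mean(temperatures))         # 22.0
```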

1.4 Distinguishing the Purposes of Averages

Why do we have three averages? Because they tell us different things about the data!

Quick Review: When to Use Which Average

| Average | Purpose/Best Used When... | Sensitivity to Outliers |
| --- | --- | --- |
| Mode | You need the most frequent/popular result (e.g., stocking sizes). Best for non-numerical data. | None |
| Median | The data contains outliers (extreme values). It gives a reliable centre point that ignores the extremes (e.g., house prices). | Low (Robust) |
| Mean | The data is symmetric, numerical, and you need to use all the data points in the calculation (e.g., scientific measurements). | High (Sensitive) |

Did you know? If you were calculating the average income in a small town, and Bill Gates suddenly moved there, the Mean income would shoot up and no longer represent the typical person's wage. The Median income would be much more representative!
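
To see this effect with numbers, here is a small Python sketch using the standard `statistics` module. All of the income figures are made up purely for illustration.

```python
import statistics

# Hypothetical annual incomes for a small town (made-up numbers).
town_incomes = [30_000, 32_000, 35_000, 38_000, 40_000]
# The same town after one extremely wealthy person moves in (also made up).
with_billionaire = town_incomes + [1_000_000_000]

mean_before = statistics.mean(town_incomes)
median_before = statistics.median(town_incomes)
mean_after = statistics.mean(with_billionaire)
median_after = statistics.median(with_billionaire)

print("Before:", mean_before, median_before)   # both are 35000
print("After mean:", round(mean_after))        # over 166 million
print("After median:", median_after)           # 36500.0
```

One extreme value drags the mean far away from every ordinary income, while the median moves only slightly.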


Key Takeaway for Section 1: Mean, Median, and Mode all describe the centre of the data, but in different ways: the Mean is calculated from every value, the Median is found by position in the ordered list, and the Mode is found by frequency.


Section 2: Measures of Spread (Range and Quartiles)

Averages tell you the middle, but they don't tell you how spread out the numbers are. To compare two datasets effectively (as required by the syllabus), you need a measure of spread.

2.1 The Range

The Range is the simplest measure of spread. It tells you the total distance between the highest and lowest values.

Formula for the Range

\[\text{Range} = \text{Maximum Value} - \text{Minimum Value}\]

Example: Scores: 10, 45, 50, 52, 98
Range = \(98 - 10 = 88\)

Limitation: Because the Range only uses two values (the max and min), it is highly affected by outliers. If the 98 were actually 150, the range would jump from 88 to 140, even though the middle scores haven't changed.
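
A quick Python sketch of the range formula, including what happens when the top score is replaced by an outlier. The function name `data_range` is chosen here only to avoid clashing with Python's built-in `range`.

```python
def data_range(values):
    """Maximum value minus minimum value."""
    if not values:
        raise ValueError("range of an empty list is undefined")
    return max(values) - min(values)


scores = [10, 45, 50, 52, 98]
print(data_range(scores))               # 88

# Swap the top score for an outlier: the middle scores are unchanged.
scores_with_outlier = [10, 45, 50, 52, 150]
print(data_range(scores_with_outlier))  # 140
```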

2.2 Quartiles and the Interquartile Range (IQR)

To get a measure of spread that isn't affected by extreme outliers, we use Quartiles. Quartiles divide the ordered data into four equal quarters.

Understanding Quartiles
  • \(Q_1\) (Lower Quartile): The median of the lower half of the data. 25% of the data is below this value.
  • \(Q_2\) (Median): This is the overall median (50%).
  • \(Q_3\) (Upper Quartile): The median of the upper half of the data. 75% of the data is below this value.

The Interquartile Range (IQR)

The Interquartile Range (IQR) is the spread of the middle 50% of the data. It measures the distance between the lower and upper quartiles.

Formula for the Interquartile Range

\[\text{IQR} = Q_3 - Q_1\]

Finding the Position of Quartiles:
While there are slight variations in methods, for IGCSE, if you have \(n\) pieces of individual data, the simplest approach is:

  • \(Q_1\) Position: \(\frac{1}{4} (n+1)\)
  • \(Q_3\) Position: \(\frac{3}{4} (n+1)\)

Example (Finding IQR): Data: 10, 12, 15, 16, 18, 20, 25, 30, 35 (n=9)
(The data is already ordered.)

1. Find the Median (\(Q_2\)): Position \(\frac{9+1}{2} = 5\). Median = 18.

2. Find \(Q_1\): Position \(\frac{1}{4} (9+1) = 2.5\). This is halfway between the 2nd (12) and 3rd (15) values.
\[Q_1 = \frac{12+15}{2} = 13.5\]

3. Find \(Q_3\): Position \(\frac{3}{4} (9+1) = 7.5\). This is halfway between the 7th (25) and 8th (30) values.
\[Q_3 = \frac{25+30}{2} = 27.5\]

4. Calculate IQR:
\[\text{IQR} = Q_3 - Q_1 = 27.5 - 13.5 = 14\]
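
The position method can be sketched in Python as shown below. The helper names are invented for this example. Two choices here are assumptions rather than rules from these notes: a fractional position is handled by moving that fraction of the way between the two neighbouring values (for a position ending in .5 this is exactly "halfway"), and positions that would fall outside a very short list are clamped to its ends.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in an ordered list."""
    n = len(ordered)
    position = min(max(position, 1), n)   # assumption: clamp to the list ends
    index = position - 1                  # convert to a 0-based index
    low = int(index)
    fraction = index - low
    if fraction == 0:
        return ordered[low]
    # Move the given fraction of the way from one value to the next.
    return ordered[low] + fraction * (ordered[low + 1] - ordered[low])


def quartiles(values):
    """Return (Q1, Q2, Q3) using the (n + 1) position formulas."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q2 = value_at_position(ordered, (n + 1) / 2)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q2, q3


data = [10, 12, 15, 16, 18, 20, 25, 30, 35]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)   # 13.5 18 27.5
print(q3 - q1)      # 14.0
```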

Analogy: Think of the IQR like a target's bullseye. It measures how tightly clustered the most typical half of the data is, ignoring the outer rings (outliers).


Key Takeaway for Section 2: Range measures the total spread but is easily distorted by outliers. The Interquartile Range (IQR) measures the spread of the middle 50% and is usually more reliable for comparing data sets.


Section 3: Averages with Frequency Tables

Often, data is presented in a frequency table, which shows how often each value appears. The methods for finding averages change slightly when dealing with frequency.

3.1 Mode and Median from Frequency Tables (Individual Data)

Finding the Mode

For a frequency table (where x is the value and f is the frequency), the mode is simply the value \(x\) with the highest frequency \(f\).

Finding the Median

When dealing with a frequency table, the total number of data points \(n\) is the total frequency: \(n = \sum f\).

1. Calculate the total frequency, \(n = \sum f\).
2. Find the position: \(\frac{n+1}{2}\).
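3. Use the cumulative frequency (running total of \(f\)) to locate where this position falls. The value \(x\) corresponding to that position is the median. If the position ends in .5 and the two neighbouring positions fall on different values, take the mean of those two values.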
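
Here is a rough Python sketch of the cumulative frequency search. The table is stored as a dictionary mapping each value \(x\) to its frequency \(f\); both the helper names and the sample tables are illustrative assumptions.

```python
import math


def value_at_rank(table, rank):
    """Return the value sitting at a whole-number position (1-based)."""
    running_total = 0                        # cumulative frequency so far
    for value, frequency in sorted(table.items()):
        running_total += frequency
        if rank <= running_total:
            return value
    raise ValueError("rank is larger than the total frequency")


def median_from_table(table):
    """Median of a frequency table given as {value: frequency}."""
    n = sum(table.values())                  # Step 1: total frequency
    position = (n + 1) / 2                   # Step 2: median position
    # Step 3: look up the values either side of the position and average them.
    lower = value_at_rank(table, math.floor(position))
    upper = value_at_rank(table, math.ceil(position))
    return (lower + upper) / 2


# Illustrative table: score 1 three times, score 2 five times, score 3 twice.
print(median_from_table({1: 3, 2: 5, 3: 2}))   # 2.0
# Position 2.5 falls between a 1 and a 2, so the median is 1.5.
print(median_from_table({1: 2, 2: 2}))         # 1.5
```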

3.2 Calculating the Mean from a Frequency Table (Individual Data)

If a score of 10 was achieved 5 times, instead of adding \(10 + 10 + 10 + 10 + 10\), we calculate \(10 \times 5 = 50\). The formula adapts to incorporate this multiplication.

Formula for the Mean (Frequency Table)

\[\bar{x} = \frac{\sum fx}{\sum f}\]

Step-by-Step Process:
1. Add a column to your table called \(fx\) (Value \(\times\) Frequency).
2. Calculate every entry in the \(fx\) column.
3. Sum the \(fx\) column (this is \(\sum fx\)).
4. Sum the frequency column (this is \(\sum f\)).
5. Divide the totals: \(\frac{\sum fx}{\sum f}\).

Example: Scores (x) and Frequency (f)

| x (Score) | f (Frequency) | fx |
| --- | --- | --- |
| 1 | 3 | 3 |
| 2 | 5 | 10 |
| 3 | 2 | 6 |
| Totals | \(\sum f = 10\) | \(\sum fx = 19\) |

Mean = \(\frac{19}{10} = 1.9\)
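
The \(fx\) column method translates directly into a few lines of Python. This is a sketch only; the dictionary layout `{value: frequency}` and the function name are choices made for this example.

```python
def mean_from_table(table):
    """Mean of a frequency table given as {value: frequency}."""
    sum_fx = sum(value * frequency for value, frequency in table.items())
    sum_f = sum(table.values())
    if sum_f == 0:
        raise ValueError("total frequency must be greater than zero")
    return sum_fx / sum_f


scores = {1: 3, 2: 5, 3: 2}       # x: f, taken from the example table
print(mean_from_table(scores))    # 1.9
```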

3.3 Estimating the Mean for Grouped Data (Advanced Concept)

If your data is presented in classes or groups (e.g., ages \(10 < a \leq 20\), \(20 < a \leq 30\)), you don't know the exact value of each item. Therefore, you can only calculate an estimate of the mean.

Don't worry if this seems tricky at first; the underlying concept is the same as the frequency table mean, but with one critical extra step!

The Extra Step: Using the Midpoint

Since we don't know the exact values, we must assume that the values in each group are clustered around the centre of that group. We use the midpoint (m) of the class interval as the representative value (\(x\)).

\[\text{Midpoint} (m) = \frac{\text{Lower Boundary} + \text{Upper Boundary}}{2}\]

Formula for the Estimated Mean (Grouped Data)

\[\text{Estimated Mean} = \frac{\sum fm}{\sum f}\]

Step-by-Step Process:
1. Calculate the Midpoint (m) for every class interval.
2. Add a column called \(fm\) (Frequency \(\times\) Midpoint).
3. Calculate every entry in the \(fm\) column.
4. Sum the \(fm\) column (\(\sum fm\)).
5. Sum the frequency column (\(\sum f\)).
6. Divide the totals: \(\frac{\sum fm}{\sum f}\).

Example: Heights (cm) and Frequency (f)

| Height Class | f | m (Midpoint) | fm |
| --- | --- | --- | --- |
| 150 < h \(\leq\) 160 | 5 | 155 | 775 |
| 160 < h \(\leq\) 170 | 10 | 165 | 1650 |
| 170 < h \(\leq\) 180 | 5 | 175 | 875 |
| Totals | \(\sum f = 20\) | | \(\sum fm = 3300\) |

Estimated Mean = \(\frac{3300}{20} = 165\) cm.
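
The midpoint method might be coded like this in Python. Each class is written as a tuple `(lower boundary, upper boundary, frequency)`; that layout and the function name are assumptions made for the sketch.

```python
def estimated_mean(classes):
    """Estimate the mean of grouped data from (lower, upper, frequency) tuples."""
    sum_f = 0
    sum_fm = 0
    for lower, upper, frequency in classes:
        midpoint = (lower + upper) / 2      # representative value for the class
        sum_f += frequency
        sum_fm += frequency * midpoint
    if sum_f == 0:
        raise ValueError("total frequency must be greater than zero")
    return sum_fm / sum_f


heights = [(150, 160, 5), (160, 170, 10), (170, 180, 5)]
print(estimated_mean(heights))   # 165.0
```

The result is still only an estimate, because every height in a class is treated as if it sat exactly at the midpoint.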

Common Mistake to Avoid: When dealing with grouped data, you can only find the modal class (the group with the highest frequency), not the exact mode. You cannot find the exact range either, only the maximum possible range (Max upper boundary - Min lower boundary).
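
As a small illustration, the Python sketch below picks out the modal class and the maximum possible range from grouped data. The `(lower, upper, frequency)` tuple layout and the function names are assumptions for this example, and if two classes tie for the highest frequency this simple version only reports the first one.

```python
def modal_class(classes):
    """Return (lower, upper) for the class with the highest frequency."""
    lower, upper, _ = max(classes, key=lambda c: c[2])
    return lower, upper


def max_possible_range(classes):
    """Highest upper boundary minus lowest lower boundary."""
    highest_upper = max(upper for _, upper, _ in classes)
    lowest_lower = min(lower for lower, _, _ in classes)
    return highest_upper - lowest_lower


heights = [(150, 160, 5), (160, 170, 10), (170, 180, 5)]
print(modal_class(heights))          # (160, 170)
print(max_possible_range(heights))   # 30
```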


Key Takeaway for Section 3: When using frequency tables, multiply the value (or midpoint) by its frequency before summing. Always divide by the total frequency (\(\sum f\)).