Welcome to the World of Statistics!

Hey there! Ready to become a data detective? That's what statistics is all about! It's the cool part of maths that helps us collect, organise, and understand information (which we call data) from the world around us. Why is it important? Because it helps us answer questions like:

- What is the most popular video game in our class?
- How much has my height changed over the past year?
- Is our school's basketball team getting better?

In this chapter, we'll learn how to make sense of all this information. Don't worry if it sounds complicated; we'll break it down into easy steps. Let's get started!


Part 1: Getting Our Data in Order

Imagine you've just asked everyone in your class for their shoe size. Now you have a messy list of numbers! Our first job is to organise this data so we can actually understand it.

Two Types of Data: Discrete vs. Continuous

Before we organise, we need to know what kind of data we're dealing with. There are two main types:

1. Discrete Data
This is data that can be counted. It can only take specific, separate values, with nothing in between.
Examples:

  • The number of students in a classroom (you can't have 25.5 students).
  • Your shoe size (it's a size 7 or 7.5, not 7.23).
  • The number of goals scored in a football match.

2. Continuous Data
This is data that can be measured. It can take on any value within a certain range.
Examples:

  • The height of a person (you could be 150 cm, 150.1 cm, 150.11 cm...).
  • The time it takes to run 100 metres.
  • The weight of your backpack.

Quick Review Box

Discrete: If you can COUNT it, it's discrete.
Continuous: If you can MEASURE it, it's continuous.

Organising Data: Frequency Distribution Tables

A frequency distribution table is a super neat way to organise data. "Frequency" is just a fancy word for how many times something happens.

For Ungrouped Data (usually discrete data with a small range)

Let's say these are the scores of 15 students in a quiz (out of 10):
7, 8, 9, 6, 8, 7, 9, 10, 8, 7, 6, 8, 9, 8, 7

Step 1: List the possible scores in one column.
Step 2: Go through the data and make a tally mark ( | ) for each score.
Step 3: Count the tally marks to find the frequency.

Example Table: Quiz Scores

Score   Tally    Frequency
6       ||       2
7       ||||     4
8       |||||    5
9       |||      3
10      |        1
Total            15

See? So much easier to read! We can quickly see that the most common score was 8.
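
If you'd like to check a tally by computer, here is a minimal Python sketch of the three steps above. It uses the quiz scores from this example; the variable names are just illustrative choices.

```python
from collections import Counter

# The 15 quiz scores from the example above.
scores = [7, 8, 9, 6, 8, 7, 9, 10, 8, 7, 6, 8, 9, 8, 7]

# Counter does the tallying for us: it maps each score to how often it occurs.
frequency = Counter(scores)

print("Score  Tally   Frequency")
for score in sorted(frequency):
    count = frequency[score]
    # One tally mark per occurrence of the score.
    print(f"{score:<6} {'|' * count:<7} {count}")
print(f"Total          {sum(frequency.values())}")
```

The frequencies it prints should match the table: 2, 4, 5, 3 and 1, with a total of 15.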

For Grouped Data (usually continuous data or data with a large range)

What if you had the heights of 20 students? Listing each one would be too long. So, we group them into class intervals.

Example Heights (in cm): 155, 168, 172, 158, 163, 175, 151, 160, 165, 178, 153, 166, 170, 159, 161, 169, 174, 156, 162, 167

We can group them like this:

Example Table: Student Heights

Height (cm) (Class Interval)   Frequency
150 - 159                      6
160 - 169                      9
170 - 179                      5
Total                          20
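
Here is one possible way to do the same grouping in Python. It is only a sketch: it assumes every class is 10 cm wide and starts on a multiple of 10, which happens to fit this example but won't fit every data set.

```python
# The 20 student heights (in cm) from the example above.
heights = [155, 168, 172, 158, 163, 175, 151, 160, 165, 178,
           153, 166, 170, 159, 161, 169, 174, 156, 162, 167]

class_width = 10  # assumption: each class interval covers 10 cm

grouped = {}
for height in heights:
    # Round down to the start of the class, e.g. 168 -> 160.
    lower = (height // class_width) * class_width
    label = f"{lower} - {lower + class_width - 1}"
    grouped[label] = grouped.get(label, 0) + 1

for label in sorted(grouped):
    print(f"{label}: {grouped[label]}")
print(f"Total: {sum(grouped.values())}")
```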

Key Takeaways for Part 1

- Data is just information.
- Discrete data is counted (e.g., number of pets).
- Continuous data is measured (e.g., height).
- Frequency tables help us organise data to see patterns easily.


Part 2: Drawing Pictures with Data

They say a picture is worth a thousand words. In statistics, a chart is worth a thousand numbers! Let's learn how to draw our data.

Stem-and-Leaf Diagrams

This is a clever way to show the exact values in a set of data while also organising them. Think of a tree's stem (the first digit or digits) and its leaves (the last digit).

Example Data: Test scores - 78, 93, 85, 76, 81, 88, 95, 76

Step 1: The "stems" will be the tens digits (7, 8, 9).
Step 2: The "leaves" will be the units digits. Write them next to their stem.
Step 3: Always put the leaves in order from smallest to largest and add a key.

Test Scores Stem-and-Leaf Diagram

Stem | Leaf
  7  | 6, 6, 8
  8  | 1, 5, 8
  9  | 3, 5

Key: 7 | 6 means 76

This diagram shows us the spread of scores and every single original score!
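
The three steps can also be sketched in Python. This version assumes two-digit scores, so the stem is the tens digit and the leaf is the units digit.

```python
# The test scores from the example above.
scores = [78, 93, 85, 76, 81, 88, 95, 76]

stems = {}
# Sorting first means the leaves end up in order from smallest to largest.
for score in sorted(scores):
    stem, leaf = divmod(score, 10)  # e.g. 76 -> stem 7, leaf 6
    stems.setdefault(stem, []).append(leaf)

print("Stem | Leaf")
for stem, leaves in stems.items():
    print(f"  {stem}  | {', '.join(str(leaf) for leaf in leaves)}")
print("Key: 7 | 6 means 76")
```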

Histograms: The Bar Chart's Cousin

A histogram looks like a bar chart, but it's used for continuous data that has been grouped. There are two big differences:

  1. The bars are drawn touching each other (no gaps!).
  2. The horizontal axis (x-axis) is a continuous scale, marked with class boundaries.

What are class boundaries? The class intervals 150-159 and 160-169 leave a gap between 159 and 160. The boundary is halfway between 159 and 160, which is 159.5. So the boundaries for our height data are 149.5, 159.5, 169.5 and 179.5. This closes the gaps!
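
As a rough check, the boundaries can be worked out in Python. This sketch assumes the gap between neighbouring classes is the same everywhere (here it is 1 cm), and the names are only illustrative.

```python
# Class intervals from the student heights table, as (lower limit, upper limit).
class_intervals = [(150, 159), (160, 169), (170, 179)]

# The gap between one class's upper limit and the next class's lower limit.
gap = class_intervals[1][0] - class_intervals[0][1]  # 160 - 159 = 1

# Move each limit outwards by half the gap to get the class boundaries.
boundaries = [(lower - gap / 2, upper + gap / 2) for lower, upper in class_intervals]

for (lower, upper), (low_b, up_b) in zip(class_intervals, boundaries):
    print(f"{lower} - {upper}: boundaries {low_b} to {up_b}")
```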

Don't Get Confused!

Bar Chart: Has gaps. Used for categories or discrete data (e.g., favourite colours, number of pets).
Histogram: No gaps. Used for continuous, grouped data (e.g., height, weight).

Frequency Polygons and Curves

A frequency polygon is another way to show grouped data. It's basically a line graph.

How to make one:

  1. Find the middle of each class interval. This is the class mark. (For 160-169, the class mark is $$ \frac{160+169}{2} = 164.5 $$).
  2. Place a dot at the class mark for each group, at the height of its frequency.
  3. Connect the dots with straight lines!
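
To see which dots you would plot, here is a small Python sketch that finds the class mark for each group in the student heights table. It only lists the points; drawing the graph is left to you (or to a plotting tool of your choice).

```python
# Class intervals and frequencies from the student heights table.
class_intervals = [(150, 159), (160, 169), (170, 179)]
frequencies = [6, 9, 5]

# Each dot sits at (class mark, frequency), where the class mark
# is the midpoint of the class interval.
points = [((lower + upper) / 2, frequency)
          for (lower, upper), frequency in zip(class_intervals, frequencies)]

for class_mark, frequency in points:
    print(f"Plot a dot at ({class_mark}, {frequency})")
```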

A frequency curve is just a smoothed-out version of a frequency polygon, drawn freehand.

Cumulative Frequency Polygons and Curves

This sounds tricky, but the word "cumulative" just means "add up as you go."

Step 1: Create a Cumulative Frequency Table. Just keep adding the frequencies.

Example Table: Student Heights

Height (cm)   Frequency   Cumulative Frequency
150 - 159     6           6
160 - 169     9           6 + 9 = 15
170 - 179     5           15 + 5 = 20

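Step 2: Plot the graph. You plot the cumulative frequency against the UPPER class boundary (e.g., plot a point at (159.5, 6), then (169.5, 15), etc.), starting from (149.5, 0) because no one is shorter than the lowest boundary. Joining the points with straight lines gives a cumulative frequency polygon; joining them with a smooth line gives a cumulative frequency curve, which often has an 'S' shape.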
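
The "add up as you go" idea is easy to sketch in Python. The upper class boundaries below are the ones worked out for the student heights; the variable names are just for illustration.

```python
from itertools import accumulate

# Frequencies and upper class boundaries from the student heights table.
frequencies = [6, 9, 5]
upper_boundaries = [159.5, 169.5, 179.5]

# accumulate keeps a running total: 6, then 6 + 9, then 6 + 9 + 5.
cumulative = list(accumulate(frequencies))

# Each point to plot is (upper class boundary, cumulative frequency).
points = list(zip(upper_boundaries, cumulative))
print(points)
```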

Using the Curve to Find Treasures!

This curve is really useful for finding estimates:

  • Median (Q2): The middle value. Find the 50% mark on the vertical axis (for 20 students, this is the 10th student), draw a line across to the curve, and then down to the horizontal axis to read the median height.
  • Lower Quartile (Q1): The 25% mark (for 20 students, the 5th student).
  • Upper Quartile (Q3): The 75% mark (for 20 students, the 15th student).
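
Reading values off the graph can be imitated in Python. This is only a sketch under two assumptions: the points are joined with straight lines (a polygon rather than a smooth curve), and the graph starts at (149.5, 0). The function name estimate_from_graph is made up for this example, and a hand-drawn smooth curve may give slightly different answers.

```python
def estimate_from_graph(points, target):
    """Read across at cumulative frequency `target`, then down to the height axis."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1:
            # Straight-line interpolation between two plotted points.
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target is outside the range of the graph")

# (upper class boundary, cumulative frequency), starting from zero.
points = [(149.5, 0), (159.5, 6), (169.5, 15), (179.5, 20)]
total = 20

lower_quartile = estimate_from_graph(points, 0.25 * total)  # 5th student
median = estimate_from_graph(points, 0.50 * total)          # 10th student
upper_quartile = estimate_from_graph(points, 0.75 * total)  # 15th student

print(f"Q1 is about {lower_quartile:.1f} cm")
print(f"Median is about {median:.1f} cm")
print(f"Q3 is about {upper_quartile:.1f} cm")
```

With these assumptions the estimates come out at roughly 157.8 cm, 163.9 cm and 169.5 cm.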

Be a Data Detective: Uses and Abuses of Charts

Charts are powerful, but they can be used to trick you! Always look for:

  • Broken Axis: Does the vertical axis start at 0? If not, it can make differences look much bigger than they are.
  • Uneven Scales: Are the numbers on the axis spaced out evenly?
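  • Misleading Pictures: Using pictures instead of bars can be misleading. If a picture is made twice as tall and also twice as wide to show a value that has doubled, its area becomes four times as big, so the change looks larger than it really is.

Key Takeaways for Part 2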

- We use different charts for different types of data.
- Stem-and-leaf diagrams show all data values neatly.
- Histograms are for grouped continuous data and have no gaps.
- Cumulative Frequency Curves help us estimate the median and quartiles.
- Always look at charts carefully to make sure they aren't misleading!


Part 3: Finding the "Centre" of Your Data

Often, we want to describe a whole set of data with just a single, typical number. This is called a measure of central tendency. Let's learn the three main ones.

The Mean (The Average)

This is the one you probably already know! It's the most common type of "average".

How to find it: Add up all the values and divide by how many values there are.
Example: For scores 6, 7, 8, 9, 10
$$ \text{Mean} = \frac{6+7+8+9+10}{5} = \frac{40}{5} = 8 $$

Strength: Uses every single piece of data.
Weakness: Can be misleading if there is an extremely high or low value (an outlier). Imagine calculating the average pocket money in a group where one person gets $1000! The mean would end up far higher than what almost everyone in the group actually gets.
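
Here is a quick Python sketch of both the calculation and the outlier problem. The pocket money amounts are invented purely for illustration; only the $1000 outlier comes from the example above.

```python
from statistics import mean

# The scores from the worked example above.
scores = [6, 7, 8, 9, 10]
print(mean(scores))  # (6 + 7 + 8 + 9 + 10) / 5 = 8

# Hypothetical weekly pocket money in dollars, with one extreme value.
pocket_money = [10, 12, 15, 11, 1000]
print(mean(pocket_money))       # 209.6, pulled way up by the $1000
print(mean(pocket_money[:-1]))  # 12, the mean without the outlier
```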

The Median (The Middle Value)

The median is the value right in the middle after you've put all the data in order.

Memory Aid: The median is in the middle of the road.

How to find it:

  1. Put the data in order from smallest to largest.
  2. Find the middle number.

Example 1 (Odd number of values): 6, 7, 8, 9, 10. The median is 8.

Example 2 (Even number of values): 6, 7, 8, 9, 10, 11. The middle is between 8 and 9. So we find the mean of these two: $$ \frac{8+9}{2} = 8.5 $$. The median is 8.5.

Strength: Hardly affected by outliers! This makes it great for things like house prices or salaries.
Weakness: It doesn't use all the data values in the calculation.
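
The two steps for finding the median can be written as a short Python function. This is a minimal sketch (Python's built-in statistics.median does the same job); the function name find_median is just an illustrative choice.

```python
def find_median(values):
    # Step 1: put the data in order from smallest to largest.
    ordered = sorted(values)
    count = len(ordered)
    middle = count // 2
    # Step 2: find the middle number.
    if count % 2 == 1:
        return ordered[middle]  # odd count: one middle value
    # Even count: take the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(find_median([6, 7, 8, 9, 10]))      # 8
print(find_median([6, 7, 8, 9, 10, 11]))  # 8.5
```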

The Mode (The Most Popular)

The mode is the value that appears most often.

Memory Aid: Mode = Most often.

Example: In the data 7, 8, 9, 6, 8, 7, 9, 10, 8, 7, 6, 8, 9, 8, 7, the number 8 appears 5 times, more than any other number. So, the mode is 8.

For grouped data, we find the modal class, which is the class interval with the highest frequency.

Strength: Easy to find and can be used for non-numerical data (e.g., the mode of favourite colours could be "Blue").
Weakness: Sometimes a set of data can have no mode, or more than one mode.
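
In Python, one way to find the mode is statistics.multimode (available from Python 3.8), which returns a list so it can cope with more than one mode. The colour list and the tied list below are made-up examples.

```python
from statistics import multimode

# The quiz scores from the example above.
scores = [7, 8, 9, 6, 8, 7, 9, 10, 8, 7, 6, 8, 9, 8, 7]
print(multimode(scores))  # [8]

# The mode works for non-numerical data too (hypothetical survey answers).
colours = ["Blue", "Red", "Blue", "Green", "Blue", "Red"]
print(multimode(colours))  # ['Blue']

# Two values tie for most frequent, so there are two modes.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```

Note that when every value appears equally often, multimode simply returns all of them, which is the computer's way of saying there is no single mode.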

The Weighted Mean: When Some Data is More Important

Sometimes, not all data is equal. Think about your school grades: an exam is usually worth more than a quiz. This is where the weighted mean comes in.

Example: Your final mark is based on Homework (worth 20%) and a Final Exam (worth 80%). You score 90 on homework and 75 on the exam.
Normal Mean: $$ \frac{90+75}{2} = 82.5 $$ (This is wrong, because it treats both parts as equally important!)
Weighted Mean: $$ (90 \times 0.20) + (75 \times 0.80) = 18 + 60 = 78 $$
Your final mark is 78. This is a more accurate reflection because it considers the "weight" of each part.
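
The same calculation can be sketched in Python. The dictionary layout is just one convenient way to pair each mark with its weight; dividing by the total weight is an extra safety step for cases where the weights don't add up to 1.

```python
# Marks and weights from the example above.
marks = {"Homework": 90, "Final Exam": 75}
weights = {"Homework": 0.20, "Final Exam": 0.80}

# Multiply each mark by its weight, add the results,
# then divide by the total weight (which is 1 here).
weighted_total = sum(marks[part] * weights[part] for part in marks)
weighted_mean = weighted_total / sum(weights.values())

normal_mean = sum(marks.values()) / len(marks)

print(f"Normal mean:   {normal_mean:.1f}")
print(f"Weighted mean: {weighted_mean:.1f}")
```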

What Happens When We Change All The Data?

This is a handy shortcut! What happens to the mean, median, and mode if we do the same thing to every piece of data?

  • If you add a constant: If you add 5 to every student's test score, the mean, median, and mode will also increase by 5.
  • If you multiply by a constant: If you double every student's score, the mean, median, and mode will also double.
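
You can test this shortcut yourself with a short Python sketch. It reuses the 15 quiz scores from Part 1; the helper name summary is made up for this example.

```python
from statistics import mean, median, mode

# The quiz scores from Part 1.
scores = [7, 8, 9, 6, 8, 7, 9, 10, 8, 7, 6, 8, 9, 8, 7]

def summary(values):
    """Return (mean, median, mode) for a list of values."""
    return mean(values), median(values), mode(values)

original = summary(scores)
plus_five = summary([score + 5 for score in scores])
doubled = summary([score * 2 for score in scores])

print("Original:", original)   # mean 7.8, median 8, mode 8
print("Add 5:   ", plus_five)  # each measure goes up by 5
print("Double:  ", doubled)    # each measure doubles
```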

Key Takeaways for Part 3

- The Mean is the sum divided by the count (sensitive to outliers).
- The Median is the middle value when data is in order (not sensitive to outliers).
- The Mode is the most frequent value.
- Choose the best one based on your data: use the median if there are strong outliers!


You've Mastered the Basics of Statistics!

Great job! You now know how to collect, organise, draw, and interpret data. You can find the mean, median, and mode to describe what's typical, and you know how to choose the right tools for the job. This is a super useful skill, not just in maths class, but in everyday life. Keep practising, and you'll be a data whiz in no time!