Uses and abuses of statistics - Mathematics - HKDSE


Chapter 18: Uses and Abuses of Statistics - Your Guide to Becoming a Data Detective!

Hey everyone! Ever seen a headline that says "90% of users love our new app!" or "This one food can make you live longer!"? Statistics are everywhere – in the news, in ads, on social media. They can be incredibly powerful for understanding the world, but they can also be used to mislead you. Don't worry, this chapter is your secret weapon! We'll learn how to read between the lines, spot suspicious claims, and understand how statistics *should* be used. Think of it as developing a math superpower to see the truth!

Part 1: The Big Picture - Population vs. Sample

Before we collect any data, we need to know who we're talking about. This is where two of our most important words come in: Population and Sample.

Imagine you want to know the favourite subject of all senior secondary students in Hong Kong.

The Population is the entire group you're interested in. In this case, it's ALL senior secondary students in Hong Kong. Asking every single one would be impossible!
A Sample is a small part of the population that you actually collect data from. You might survey 500 students from different schools. This is your sample.

Analogy Time: Tasting the Soup!

Think of the population as a giant pot of soup. You don't need to eat the whole pot to know if it's tasty. You just take a small spoonful – that's your sample. If the spoonful is a good mix of everything in the pot, it will give you a good idea of what the whole soup tastes like. The goal is to make sure our "spoonful" of data truly represents the whole "pot".

Quick Review Box

Population: The ENTIRE group of individuals we want to study.
Sample: A SUBSET of the population that we actually collect data from.
Why sample? It's cheaper, faster, and more practical than studying the entire population.

Key Takeaway for Part 1

We study a small sample to draw conclusions about a large population. The most important thing is that the sample must be a good representation of the population, otherwise our conclusions may be badly wrong!

Part 2: How to Choose Your Sample - Sampling Techniques

So, how do we get a good "spoonful" of the population? The method we use to pick our sample is super important. We can split these methods into two main types.

Probability Sampling (The Fair Methods)

In probability sampling, every member of the population has a known chance of being selected. This is the best way to get an unbiased, representative sample.

1. Simple Random Sampling

What it is: Every person has an equal chance of being picked. It's like putting everyone's name into a giant hat and drawing names out randomly.
Example: To survey 50 students from a school of 1000, you could assign each student a number and use a random number generator to pick 50 numbers.
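
Here is a minimal sketch of that example in Python. The function name simple_random_sample and the seed value are made up for illustration; the seed only makes the draw repeatable and is not part of the method itself.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct student numbers from 1..population_size."""
    rng = random.Random(seed)  # a seed makes the same draw come out each run
    # rng.sample draws without replacement, so no student is picked twice
    # and every student has the same chance of being chosen.
    return rng.sample(range(1, population_size + 1), sample_size)

chosen = simple_random_sample(1000, 50, seed=2024)
print(sorted(chosen))
```

Leave out the seed argument and you get a fresh random sample every time.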

2. Stratified Sampling

What it is: First, you divide the population into important subgroups (called 'strata'). Then, you take a simple random sample from each subgroup. This guarantees you have representation from all important groups.
Example: You want to survey students about a new school policy. You know boys and girls might have different opinions. So you divide the student population into two strata: 'Boys' and 'Girls'. Then you randomly sample from each group, making sure the proportion in your sample matches the school's proportion (e.g., if the school is 60% girls, your sample should be 60% girls).
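
The proportional idea above can be sketched in Python like this. The school of 600 girls and 400 boys, the sample size of 50, and the function name stratified_sample are all assumptions chosen so the numbers divide evenly.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Proportional stratified sample.

    strata maps a stratum name to the list of its members.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Each stratum gets the same share of the sample as it has of the
        # population. round() is safe here because the numbers divide evenly;
        # in general, rounding can make the total differ from sample_size.
        k = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, k)  # simple random sample inside the stratum
    return sample

# Hypothetical school: 600 girls and 400 boys (60% girls).
school = {
    "girls": [f"G{i}" for i in range(1, 601)],
    "boys": [f"B{i}" for i in range(1, 401)],
}
picked = stratified_sample(school, 50, seed=1)
print({name: len(members) for name, members in picked.items()})
# {'girls': 30, 'boys': 20}
```

The sample ends up 60% girls (30 out of 50), matching the school.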

3. Systematic Sampling

What it is: You select a starting point at random, and then select every 'k-th' member of the population.
Example: To survey 100 people from a list of 1000, you could decide to pick every 10th person. You randomly choose a starting number between 1 and 10 (say, 7). You then pick person #7, #17, #27, #37, and so on.
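
A short Python sketch of this example follows. The function name systematic_sample is hypothetical, and the sketch assumes the list length divides evenly by the sample size, as 1000 and 100 do here.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Pick every k-th position from a list numbered 1..population_size."""
    step = population_size // sample_size  # the interval k (10 in the example)
    if start is None:
        # Random starting point between 1 and k, inclusive.
        start = random.Random(seed).randint(1, step)
    return list(range(start, population_size + 1, step))

picked = systematic_sample(1000, 100, start=7)  # start fixed at 7 to match the example
print(picked[:4], len(picked))
# [7, 17, 27, 37] 100
```

In a real survey you would leave start unset so that the starting point is chosen at random.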

Non-probability Sampling (The Easy, but Biased, Methods)

These methods are quicker and easier, but they often lead to biased results because nobody knows each person's chance of being chosen, and some people have no chance at all. Be very suspicious of studies using these methods!

1. Convenience Sampling

What it is: You survey people who are easy to reach. It's convenient for the researcher, but terrible for getting a representative sample.
Example: A researcher stands outside one MTR station at lunchtime and surveys the first 100 people who agree to talk. This sample will over-represent office workers and miss out on students, elderly people, and people from other districts.

2. Quota Sampling

What it is: A bit like stratified sampling, but not random. The researcher decides on subgroups and a quota for each (e.g., "I need 50 men and 50 women"). Then, they use convenience sampling to fill those quotas.
Example: A researcher needs to survey 10 male and 10 female university students. They go to a university campus and stop the first students they meet until both quotas are filled. The quotas control the gender mix, but within each group this is still a convenience sample.

Did you know?

A famous mistake in sampling happened in the 1936 US presidential election. A magazine called the Literary Digest collected over two million poll responses and predicted a clear victory for Alf Landon. But they were completely wrong: Franklin D. Roosevelt won by a landslide! Why? Their mailing list was drawn largely from telephone directories and car registration lists. In 1936, phones and cars were far more common among wealthier households, so the sample wasn't representative of the whole voting population. This is a classic example of sampling bias, and it shows that a huge sample cannot make up for a biased one!

Key Takeaway for Part 2

How you choose your sample is crucial. Probability sampling methods (like simple random, stratified, and systematic sampling) give every member a known chance of being selected, so the sample is much more likely to represent the population. Be very critical of results that come from non-probability sampling (like convenience or quota sampling) as they are often biased.

Part 3: Asking the Right Questions - Questionnaire Design

Okay, so you've got your sample. Now you have to ask them questions. But the way you word a question can totally change the answer! A good questionnaire asks clear, neutral questions. A bad one can trick people into giving a certain answer.

Common Traps to Avoid in Questionnaires:

1. Leading Questions: These questions suggest a "correct" answer.
Bad: "Don't you agree that the new, improved school lunch is much more delicious?"
Good: "How would you rate the quality of the new school lunch on a scale of 1 to 5?"

2. Vague or Ambiguous Questions: The words used are unclear.
Bad: "Do you exercise regularly?" (What does "regularly" mean? Once a day? Once a week? Once a month?)
Good: "How many days did you do at least 30 minutes of exercise last week?"

3. Double-Barrelled Questions: Asking two things in one question.
Bad: "Do you think the school should spend less money on books and more on sports facilities?" (What if you agree with one part but not the other?)
Good: Split it into two questions: "Do you think the school should spend less money on books?" and "Do you think the school should spend more on sports facilities?"

4. Inappropriate Options: The choices are confusing or don't cover all possibilities.
Bad: "How old are you? (a) 20 or under (b) 20-30 (c) 30 or over" (What if you are exactly 20 or 30? You fit two options, so the choices overlap and are not mutually exclusive.)
Good: "Which age group do you belong to? (a) under 20 (b) 20-29 (c) 30 or over"

5. Question Order: The order of questions can influence later answers.
Example: If you first ask "How happy are you with your life?" and then "How often do you go on dates?", the answers might be different than if you asked in the reverse order.

Key Takeaway for Part 3

The wording of a question matters! When you see a survey result, try to find out the exact questions that were asked. Watch out for leading, vague, or tricky questions designed to produce a certain result.

Part 4: Spotting the Lies - The Abuses of Statistics

This is where we put on our detective hats! People can misuse statistics at every stage: in how they collect data, how they present it, and how they interpret it.

Abuse 1: Misleading Data Collection

This goes back to our first two sections. If someone uses a biased sampling method (like convenience sampling) or a poorly designed questionnaire, their data is fundamentally flawed. It's like building a house on a shaky foundation – it doesn't matter how nice it looks, it's not reliable.

Red Flag: A headline says "85% of people prefer Brand X coffee!" but the "survey" was conducted by giving free samples outside a Brand X store. (This is a biased sample!)

Abuse 2: Misleading Graphs and Charts

A picture can paint a thousand words, but it can also tell a thousand lies. It's easy to manipulate a graph to make differences look bigger or smaller than they really are.

Common Graph Tricks:

The Truncated Y-Axis: This is one of the most common tricks! The vertical axis (y-axis) doesn't start at zero. This makes small differences look like massive changes.
Inconsistent Scale: The numbers on an axis don't go up by a consistent amount (e.g., it goes 0, 10, 20, 100, 200), which distorts the graph.
Misleading Pictograms: Using pictures where both the height and width are scaled up. The area of the picture then grows with the square of the scale factor (double the height and width and the area becomes 4 times as big), exaggerating the difference.
Confusing 3D Charts: 3D effects can make it hard to read the actual values and can make the parts closer to the viewer look bigger than they are.
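
To see how big the distortion from a truncated y-axis or a scaled pictogram can get, here is a rough sketch in Python. The sales figures (50 and 55) and both function names are made up for illustration.

```python
def apparent_ratio(a, b, axis_start=0):
    """How many times taller bar b is drawn than bar a
    when the y-axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

def pictogram_area_ratio(scale):
    """Area ratio when BOTH height and width are multiplied by scale."""
    return scale ** 2

# Hypothetical sales: 50 units last year, 55 units this year (a 10% rise).
print(apparent_ratio(50, 55))                  # 1.1 -> honest axis starting at 0
print(apparent_ratio(50, 55, axis_start=45))   # 2.0 -> bar looks twice as tall
print(pictogram_area_ratio(2))                 # 4   -> "double" looks four times as big
```

A 10% rise drawn on an axis starting at 45 looks like the value has doubled, even though the data has not changed.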

Abuse 3: Misleading Interpretation

Even with good data and good graphs, the conclusions drawn can be wrong.

Common Interpretation Tricks:

Using the "Wrong" Average: Remember mean, median, and mode? A company might say its "average" salary is very high by using the mean, which is pulled up by a few millionaire executives. The median (the middle value) would give a much more honest picture of what a typical employee earns.
Correlation is NOT Causation: This is a huge one! Just because two things happen at the same time doesn't mean one causes the other.
Classic Example: Ice cream sales and the number of shark attacks are correlated (they both go up in the summer). Does eating ice cream cause shark attacks? Of course not! The real cause is the hot weather (the 'lurking variable') which makes more people swim AND eat ice cream.
Cherry-Picking Data: Only presenting the data that supports your argument while ignoring the data that doesn't.
Small Sample Size: Results from a very small sample (e.g., "3 out of 4 people agree") are not reliable and could easily be due to random chance.
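
The "wrong average" trick is easy to check with numbers. Below is a small sketch using Python's built-in statistics module; the salary figures are invented for illustration and do not come from any real company.

```python
from statistics import mean, median

# Hypothetical monthly salaries in HK$: eight ordinary staff and two executives.
salaries = [18000] * 4 + [20000] * 4 + [300000] * 2

company_mean = mean(salaries)      # pulled up by the two very large salaries
company_median = median(salaries)  # middle value of the sorted list

print("Mean salary:  ", company_mean)
print("Median salary:", company_median)
```

With these figures the mean is HK$75,200 but the median is only HK$20,000. Eight of the ten employees earn far less than the "average" the company might advertise.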

Key Takeaway for Part 4

Be a critical observer! Always question the source of the data, check graph axes, and think carefully about whether the conclusion is truly supported by the evidence. Don't let flashy numbers or charts fool you.

Part 5: Your Statistics Detective Toolkit

Congratulations, you've learned the secrets! Now, whenever you encounter a statistic in the real world, you can use this simple checklist to assess it like a pro.

Ask Yourself These Questions:

1. Who paid for this study and who conducted it?
(Do they have a reason to want a particular outcome?)

2. What was the sample size and how was the sample chosen?
(Was it large enough? Was it a random sample or a biased convenience sample?)

3. What were the exact questions asked?
(Were they leading, vague, or tricky?)

4. How is the data being presented?
(Does the graph's y-axis start at 0? Is the scale consistent?)

5. Is the conclusion logical?
(Are they confusing correlation with causation? Are they using the most appropriate 'average'?)

By learning about the uses and abuses of statistics, you're not just learning math – you're learning how to be a smarter, more critical thinker in your everyday life. Now go out there and be a data detective!