Welcome to Mathematical Models in Probability and Statistics!

Hello future statistician! This chapter might sound abstract, but it’s actually the foundation for almost everything we do in Statistics 1. We’re going to learn how to use Maths to predict and understand random events in the real world—from flipping a coin to forecasting the results of an election.

Don't worry if this seems tricky at first. A mathematical model is simply a structured way of dealing with uncertainty. We'll break down the concepts and use simple analogies to make sure you nail this important topic!

1. Understanding Mathematical Models

What is a Mathematical Model?

In statistics, the real world is complicated. People, weather, dice rolls—everything is complex and full of tiny variations. A mathematical model is a simplified description of a real-world system using mathematical concepts and language.

Analogy: Think of a city map. A map is a model of the real city. It leaves out unimportant details (like every individual tree or parked car) and focuses only on the essential information (roads, landmarks, train lines). Our statistical models do the same thing: they strip away complexity to focus on the probabilities.

Key Characteristics of Statistical Models
  • They require assumptions (e.g., assuming a coin is "fair").
  • They aim to predict outcomes based on probability.
  • They are often based on the idea of a random experiment—a repeatable process whose outcome is uncertain.

The Foundation: Random Variables

When we model an event, we need a way to represent the outcome numerically. This is done using a random variable.

A random variable, often denoted by a capital letter like \(X\), is a variable whose value is determined by the outcome of a random experiment.

We classify random variables into two main types based on the values they can take:

Discrete Random Variables (DRV)

These are variables that can only take on a countable number of distinct values.

  • Example: The number of heads when flipping a coin four times (\(X\) can be 0, 1, 2, 3, or 4).
  • Example: The number of defective items in a batch.

Continuous Random Variables (CRV)

These are variables that can take any value within a given range (they are usually measured rather than counted).

  • Example: The height of a student (e.g., 170.1 cm, 170.15 cm, 170.153 cm...).
  • Example: The time it takes for a bus to arrive.

Memory Aid: Discrete = Distinct (countable). Continuous = Can be anything in a range (measurable).
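To make the distinction concrete, here is a short Python sketch simulating one variable of each type. The uniform bus-wait of up to 10 minutes is an assumption invented purely for illustration:

```python
import random

random.seed(1)

# Discrete random variable: X = number of heads in four fair coin flips.
# X can only take the distinct values 0, 1, 2, 3 or 4.
def flips_heads(n_flips=4):
    return sum(random.random() < 0.5 for _ in range(n_flips))

# Continuous random variable: T = waiting time (minutes) for a bus.
# Purely for illustration, assume T is uniform on [0, 10]: T can take
# ANY value in that range, e.g. 3.174592... minutes.
def bus_wait(max_wait=10.0):
    return random.uniform(0.0, max_wait)

x = flips_heads()   # always a whole number between 0 and 4
t = bus_wait()      # a measured value, e.g. 4.0943...
print(x, t)
```

Notice that the discrete variable always lands on one of a short list of whole numbers, while the continuous one can come out to as many decimal places as you care to record.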

Quick Review: Modeling

A statistical model uses mathematics to simplify and predict outcomes of a random process. The outcomes are measured using a random variable, which is either discrete (countable) or continuous (measurable).

2. Probability: Theoretical vs. Experimental

To build models, we need to understand the two ways we calculate probability. Sometimes we calculate what *should* happen, and sometimes we calculate what *did* happen.

Theoretical Probability (The Ideal)

Theoretical probability (or classical probability) is based on logical reasoning and the assumption that all possible outcomes are equally likely. This is the probability derived directly from the mathematical model.

We calculate it using the formula:

$$P(A) = \frac{\text{Number of ways event A can occur}}{\text{Total number of equally likely outcomes}}$$

  • Example 1: Rolling a fair six-sided die. The theoretical probability of rolling a 4 is \(P(4) = 1/6\).
  • Example 2: Flipping a fair coin. The theoretical probability of getting heads is \(P(\text{Heads}) = 0.5\).

Key Point: Theoretical probability is what the model predicts under perfect conditions.
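The counting formula above can be applied directly in code. This sketch uses Python's exact fractions to count favourable outcomes for a fair die (the helper name `theoretical_probability` is ours, not standard):

```python
from fractions import Fraction

# Sample space for one fair six-sided die: all outcomes equally likely.
outcomes = [1, 2, 3, 4, 5, 6]

def theoretical_probability(event):
    """P(A) = favourable outcomes / total equally likely outcomes."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

p_four = theoretical_probability(lambda o: o == 4)      # 1/6
p_even = theoretical_probability(lambda o: o % 2 == 0)  # 3/6 = 1/2
print(p_four, p_even)  # 1/6 1/2
```

Using `Fraction` keeps the answer exact (1/6) instead of a rounded decimal, which matches how theoretical probabilities are usually quoted.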

Experimental Probability (The Reality)

Experimental probability (also known as relative frequency) is based on actual data collected by conducting an experiment repeatedly. It tells us what actually happened during the trials.

We calculate it using the formula:

$$P(\text{Event}) = \frac{\text{Number of successful trials}}{\text{Total number of trials}}$$

  • Example: You roll a die 100 times. You get the number 4 exactly 18 times.
    The experimental probability of rolling a 4 is \(18/100 = 0.18\).
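You can reproduce this kind of experiment in a few lines of Python. The count of 4s will vary from run to run, which is exactly the point: experimental probability reflects what actually happened in your trials.

```python
import random

random.seed(42)

# Roll a fair die 100 times and record the relative frequency of a 4.
trials = 100
rolls = [random.randint(1, 6) for _ in range(trials)]

successes = rolls.count(4)
experimental_p = successes / trials   # what actually happened
theoretical_p = 1 / 6                 # what the model predicts

print(f"Rolled a 4 in {successes} of {trials} trials: "
      f"experimental P = {experimental_p:.2f}, "
      f"theoretical P = {theoretical_p:.3f}")
```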

The Law of Large Numbers (An Important Connection)

The key connection between these two types of probability is known as the Law of Large Numbers.

The longer you run an experiment (i.e., the larger the number of trials), the closer the experimental probability will get to the theoretical probability.

Did you know? If you flipped a fair coin 10 times, you might get 7 heads (experimental probability of 0.7). But if you flipped it 10,000 times, you would expect the experimental probability to be extremely close to the theoretical probability of 0.5. The long run smooths out the randomness!
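You can watch the Law of Large Numbers in action with a quick simulation. This sketch flips a simulated fair coin 10,000 times and prints the running proportion of heads at a few checkpoints:

```python
import random

random.seed(0)

# Flip a fair coin and track the running proportion of heads.
# The Law of Large Numbers says this proportion approaches 0.5.
heads = 0
checkpoints = {10, 100, 1000, 10000}
for n in range(1, 10001):
    heads += random.random() < 0.5
    if n in checkpoints:
        print(f"after {n:>5} flips: proportion of heads = {heads / n:.3f}")

# With few flips the proportion can stray well away from 0.5,
# but after 10,000 flips it is typically very close to 0.5.
```

Run it a few times with different seeds: the early proportions jump around, while the final one barely moves.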

Avoid This Common Mistake!

Students sometimes confuse theoretical and experimental probability. Remember:

  • Theoretical = True/Ideal/Predicted (based on the math).
  • Experimental = Evidence/Experience/Observed (based on trials).

A question asking for the "relative frequency" is always asking for the experimental probability.

3. The Power and Limitations of Statistical Models

Mathematical models are incredibly useful tools, allowing us to make powerful predictions (e.g., insurance companies using models to calculate risk). However, they are not perfect copies of reality, and it is crucial to understand their limitations.

Assumptions: The Weakness of the Model

Every mathematical model in statistics relies on certain assumptions. If these assumptions are incorrect or are significantly violated in the real world, the model will break down, and the predictions will be inaccurate.

Example: Modeling a Die Roll

The mathematical model for a die roll assumes:

  1. The die is fair (each side has an equal chance of landing up).
  2. The rolls are independent (the result of one roll doesn't affect the next).

If the die is secretly weighted (not fair), the model's prediction \(P(4) = 1/6\) no longer holds, and any calculations built on it will be wrong.
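We can simulate a weighted die to see the model break down. The weights below are invented purely for illustration; the point is that the data quickly contradicts the fair-die assumption:

```python
import random

random.seed(7)

# A secretly weighted die: side '4' comes up far more often than 1/6.
# These weights are made up for illustration only.
faces = [1, 2, 3, 4, 5, 6]
weights = [0.1, 0.1, 0.1, 0.5, 0.1, 0.1]   # P(4) is really 0.5

trials = 10_000
rolls = random.choices(faces, weights=weights, k=trials)
observed_p4 = rolls.count(4) / trials

print(f"fair-die model predicts P(4) = {1/6:.3f}, "
      f"data shows about {observed_p4:.3f}")
# The observed relative frequency (about 0.5) badly disagrees with the
# model's 0.167: the assumption of fairness has been violated.
```

This is exactly the kind of check a statistician performs: compare what the model predicts with what the data shows.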

Identifying Limitations in Models

When you are asked to critique or discuss the reliability of a model, you must always think about the assumptions:

  • Assumption of Independence: Are the events truly separate? (e.g., If we model the chance of rain tomorrow, that chance is highly dependent on whether it rained today.)
  • Assumption of Uniformity/Fairness: Is the object or sample truly unbiased? (e.g., Is the coin balanced? Was the sample selected randomly?)
  • Simplification: Has the model ignored important real-world factors? (e.g., A simple model of human reaction time ignores factors like tiredness, age, and caffeine intake.)

Key Takeaway: A model is only as good as its underlying assumptions. Statisticians must constantly check if the observed data contradicts the model’s assumptions.

When Is a Model Useful?

Despite their limitations, mathematical models are indispensable when:

  1. They provide a sufficiently accurate approximation of reality for the purpose needed (e.g., predicting global temperature trends).
  2. They allow us to simulate complex events quickly and affordably (e.g., running thousands of climate simulations on a computer).
  3. The underlying random variable and process fit the required distribution (a concept you will explore in later chapters like Binomial and Normal distributions).
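Point 2 above, simulating events quickly, is easy to demonstrate. This sketch estimates the probability of getting at least one six in four rolls of a fair die by brute-force simulation, then compares it with the exact theoretical value:

```python
import random

random.seed(3)

# Monte Carlo sketch: estimate P(at least one six in four fair die rolls)
# and compare with the exact value from the theoretical model.
trials = 100_000
hits = sum(any(random.randint(1, 6) == 6 for _ in range(4))
           for _ in range(trials))

estimate = hits / trials
exact = 1 - (5 / 6) ** 4   # theoretical probability, about 0.518

print(f"simulated: {estimate:.3f}, theoretical: {exact:.3f}")
```

A hundred thousand "experiments" run in well under a second, something that would take days with a physical die. That speed is a big part of why models are worth the simplifying assumptions they carry.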

Conversational Note: Think of a model as a simplified blueprint. It helps you build the house, but you still need to account for real-world details like crooked nails and uneven ground!

Chapter Summary: Mathematical Models

  • Model Definition: A mathematical description used to simplify and predict complex real-world phenomena.
  • Random Variable: The numerical outcome of a random experiment (either discrete or continuous).
  • Theoretical Probability: The ideal probability based on assumptions (\(1/6\) for a fair die).
  • Experimental Probability: Probability based on observed data (Relative Frequency).
  • Limitations: Models are limited by their simplifying assumptions. If assumptions are violated, the model fails.