Welcome to Mathematical Models in Probability and Statistics!
Hello future statistician! This chapter is where probability meets real-world application. We move beyond simple probability rules and start using powerful mathematical tools—called models—to predict outcomes and understand uncertainty.
Don't worry if probability felt a bit abstract before. By the end of this unit, you will be able to choose the correct model to describe situations like the number of successful free throws in a basketball game or the chance of finding a faulty item in a large batch. This is essential knowledge for higher-level statistics!
Section 1: Discrete Random Variables (DRVs)
What is a Random Variable?
A Random Variable, usually denoted by a capital letter like \(X\), \(Y\), or \(R\), is a variable whose value is determined by the outcome of a random event.
For example, if you roll a die, the outcome is random, but we can assign numbers (1, 2, 3, 4, 5, 6) to these outcomes. \(X\) could be the score shown on the die.
A variable is discrete if it can only take specific, separated values, usually whole numbers. You can count the possible outcomes.
Example: The number of heads when flipping a coin three times (X can be 0, 1, 2, or 3).
Non-Example (Continuous): The height of a person (which can take any value within a range).
Probability Distribution
A Probability Distribution for a discrete random variable \(X\) is a complete list of all the possible values the variable can take, along with the probability associated with each value.
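This is often presented as a table, with one row listing each possible value \(x\) and a second row listing the corresponding probability \(P(X=x)\).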
Key Properties of a Probability Distribution:
- Every probability must be between 0 and 1: \(0 \le P(X=x) \le 1\).
- The sum of all probabilities must equal 1: \(\sum P(X=x) = 1\).
Common Mistake Alert! Always check that your probabilities sum exactly to 1. If they don't, you've missed an outcome or made a calculation error.
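To see the two key properties in action, here is a minimal Python sketch that builds the distribution for the coin example above (the number of heads in three flips) and checks it. It assumes the coin is fair, which the example does not state, and the helper name is_valid_distribution is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 8 equally likely outcomes of three coin flips.
# Assumption: the coin is fair, so each outcome has probability 1/8.
outcomes = list(product("HT", repeat=3))

# Build the distribution of X = number of heads.
distribution = {}
for outcome in outcomes:
    heads = outcome.count("H")
    distribution[heads] = distribution.get(heads, Fraction(0)) + Fraction(1, len(outcomes))

def is_valid_distribution(dist):
    """Check both key properties: each probability lies in [0, 1] and the total is 1."""
    in_range = all(0 <= p <= 1 for p in dist.values())
    return in_range and sum(dist.values()) == 1

for x in sorted(distribution):
    print(x, distribution[x])
print(is_valid_distribution(distribution))
```

Using exact fractions rather than decimals means the "sums to 1" check is not disturbed by rounding.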
Quick Review: Discrete Random Variables
A DRV takes specific, countable values. Its distribution lists every value and its corresponding probability, and all probabilities must add up to one.
Section 2: Describing Discrete Distributions
Once we have a distribution, we need ways to summarize it. The two most important measures are the expected value (mean) and the variance (spread).
1. Expected Value \(E(X)\)
The Expected Value, \(E(X)\), is the long-term average outcome of the random variable. It is also known as the mean (\(\mu\)).
Analogy: If you played a game thousands of times, \(E(X)\) is the average amount you would expect to win (or lose) per game.
The formula is: \[E(X) = \mu = \sum x P(X=x)\]
In simple terms: Multiply each possible value by its probability, and then add them all together.
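As a quick illustration of "multiply, then add", the sketch below computes \(E(X)\) for the score on a die. It assumes the die is fair (each score has probability 1/6), and the function name expected_value is just a label chosen for this example.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every score has probability 1/6.
die = {score: Fraction(1, 6) for score in range(1, 7)}

def expected_value(dist):
    """Sum of (value * probability) over every value in the distribution."""
    return sum(x * p for x, p in dist.items())

mean = expected_value(die)
print(mean)  # 7/2, i.e. 3.5
```

Notice that 3.5 is not a score you can actually roll. The expected value is a long-run average, not a prediction of any single outcome.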
2. Variance and Standard Deviation
The Variance, \(Var(X)\), measures the spread or variability of the distribution. It is the average squared distance of the outcomes from the mean.
The standard formula (definition): \[Var(X) = E((X - \mu)^2) = \sum (x - \mu)^2 P(X=x)\]
The quicker calculation formula (often used in exams): \[Var(X) = E(X^2) - [E(X)]^2\]
Where \(E(X^2) = \sum x^2 P(X=x)\).
The Standard Deviation (\(\sigma\)) is simply the square root of the variance. It is easier to interpret because it is in the same units as \(X\). \[\sigma = \sqrt{Var(X)}\]
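The two variance formulas should always agree. Here is a small sketch that checks this, again assuming a fair six-sided die as the example distribution (the variable names are illustrative only).

```python
from fractions import Fraction
from math import sqrt

# Assumption: a fair six-sided die.
die = {score: Fraction(1, 6) for score in range(1, 7)}

mean = sum(x * p for x, p in die.items())                # E(X)
mean_of_squares = sum(x**2 * p for x, p in die.items())  # E(X^2)

# Definition: average squared distance from the mean.
var_definition = sum((x - mean) ** 2 * p for x, p in die.items())

# Quicker formula: E(X^2) - [E(X)]^2.
var_shortcut = mean_of_squares - mean**2

std_dev = sqrt(var_shortcut)
print(var_definition, var_shortcut)  # both 35/12
print(round(std_dev, 4))
```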
3. Linear Transformations
What happens if we transform the random variable? For constants \(a\) and \(b\), and a random variable \(X\):
Expectation of a Transformation: \[E(aX + b) = a E(X) + b\] The expected value is affected by both multiplication (\(a\)) and addition (\(b\)).
Variance of a Transformation: \[Var(aX + b) = a^2 Var(X)\] The variance is affected only by multiplication (\(a\)), and you must square the coefficient \(a\). Adding a constant (\(b\)) shifts the whole distribution but doesn't change its spread, so \(b\) disappears!
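If you would like to convince yourself of both rules, the sketch below transforms a fair die score with \(a = 2\) and \(b = 3\) (values picked arbitrarily for illustration) and compares the directly calculated mean and variance of \(2X + 3\) with what the rules predict.

```python
from fractions import Fraction

def mean_and_variance(dist):
    """Return (E(X), Var(X)) using Var(X) = E(X^2) - [E(X)]^2."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum(x**2 * p for x, p in dist.items()) - mean**2
    return mean, variance

# Assumption: a fair six-sided die; a and b are arbitrary illustrative constants.
die = {score: Fraction(1, 6) for score in range(1, 7)}
a, b = 2, 3

# Distribution of Y = aX + b: each value moves, its probability stays the same.
transformed = {a * x + b: p for x, p in die.items()}

mean_x, var_x = mean_and_variance(die)
mean_y, var_y = mean_and_variance(transformed)

print(mean_y == a * mean_x + b)  # True: E(aX + b) = aE(X) + b
print(var_y == a**2 * var_x)     # True: Var(aX + b) = a^2 Var(X)
```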
Key Takeaway for Calculations
Remember the flow: Calculate \(E(X)\) first. Then calculate \(E(X^2)\). Finally, use the quicker formula \(Var(X) = E(X^2) - [E(X)]^2\) to find the variance. Be careful to square the entire \(E(X)\) term in the variance calculation!
Section 3: Mathematical Models in Probability
What is a Probability Model?
A probability model is a theoretical probability distribution that we use to represent a real-world situation. We use these models because they save us from having to calculate every single probability from scratch every time.
Using a model requires us to make certain assumptions about the real-world scenario. If these assumptions are reasonable, the model is a good fit. If they are violated (meaning the assumptions are untrue), the model may give inaccurate results.
In this unit, the most important discrete model we study is the Binomial Distribution.
Section 4: The Binomial Distribution \(B(n, p)\)
The Binomial Distribution is a powerful model used when you have a fixed number of independent trials, and each trial has only two possible outcomes: success or failure.
Conditions for a Binomial Model (The BINS Check)
You can only use the Binomial distribution \(X \sim B(n, p)\) if four conditions are met. Use the mnemonic BINS to check them:
- Binary Outcomes: Each trial must have only two outcomes (Success or Failure).
- Independent Trials: The outcome of one trial does not affect the outcome of any other trial.
- Number of Trials is fixed: The number of trials, \(n\), must be decided beforehand.
- Same Probability: The probability of success, \(p\), must be constant for every trial.
Did you know? The Binomial distribution is frequently used in quality control (is an item defective or not?) and medical testing (does a patient recover or not?).
Notation
If \(X\) follows a Binomial distribution, we write: \[X \sim B(n, p)\] Where:
- \(n\) is the number of trials.
- \(p\) is the probability of success in a single trial.
The Binomial Probability Formula
The probability of getting exactly \(x\) successes in \(n\) trials is given by: \[P(X=x) = \binom{n}{x} p^x (1-p)^{n-x}\]
Let's break down this formula:
- \(\binom{n}{x}\) (read as "n choose x") is the number of ways to choose which \(x\) of the \(n\) trials are the successes. This is calculated as \(\frac{n!}{x!(n-x)!}\).
- \(p^x\) is the probability that \(x\) particular trials are all successes.
- \((1-p)^{n-x}\) is the probability that the remaining \(n-x\) trials are all failures. (The probability of failure is \(1-p\), often called \(q\).)
Step-by-Step Example Calculation:
If \(X \sim B(10, 0.3)\), find \(P(X=2)\). (There are 10 trials, the probability of success is 0.3, and we want exactly 2 successes.)
- Identify parameters: \(n=10\), \(x=2\), \(p=0.3\), \(1-p=0.7\).
- Calculate the combinations: \(\binom{10}{2} = 45\).
- Calculate the probability: \(P(X=2) = 45 \times (0.3)^2 \times (0.7)^{10-2}\)
- \(P(X=2) = 45 \times 0.09 \times (0.7)^8 \approx 0.2335\)
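The same calculation can be reproduced in a few lines of Python. This is only a sketch of the formula, not a replacement for your calculator's binomial function. The name binomial_pmf is chosen for this example, and math.comb needs Python 3.8 or later.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): (n choose x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

print(comb(10, 2))                         # 45
print(round(binomial_pmf(2, 10, 0.3), 4))  # 0.2335
```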
Using Binomial Tables and Calculators (Cumulative Probability)
For large values of \(n\), calculating probabilities using the formula is tedious. We often use statistical tables or the calculator's built-in functions.
Statistical tables usually provide Cumulative Probabilities: \[P(X \le x)\] This is the probability of getting \(x\) successes or fewer.
How to handle different inequalities using cumulative tables/functions:
- \(P(X < x)\) is the same as \(P(X \le x-1)\). (If you want fewer than 5 successes, you want 4 or fewer.)
- \(P(X \ge x) = 1 - P(X \le x-1)\). (The complement rule).
- \(P(X > x) = 1 - P(X \le x)\).
- \(P(a \le X \le b) = P(X \le b) - P(X \le a-1)\).
Don't worry if this seems tricky at first. Practice transforming the inequality into the form \(P(X \le k)\). This is a crucial skill!
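The four conversions above can be practised in code too. The sketch below uses \(X \sim B(10, 0.3)\) and the value 5 purely as illustrative choices. It writes each inequality in terms of a hypothetical helper binomial_cdf, which plays the role of the cumulative table.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(k, n, p):
    """P(X <= k), like a cumulative table entry; gives 0 when k is below 0."""
    return sum(binomial_pmf(x, n, p) for x in range(0, min(k, n) + 1))

n, p = 10, 0.3  # illustrative parameters

less_than_5 = binomial_cdf(4, n, p)                             # P(X < 5)  = P(X <= 4)
at_least_5 = 1 - binomial_cdf(4, n, p)                          # P(X >= 5) = 1 - P(X <= 4)
more_than_5 = 1 - binomial_cdf(5, n, p)                         # P(X > 5)  = 1 - P(X <= 5)
between_2_and_5 = binomial_cdf(5, n, p) - binomial_cdf(1, n, p) # P(2 <= X <= 5)

print(less_than_5, at_least_5, more_than_5, between_2_and_5)
```

In every case the inequality is first rewritten as one or two \(P(X \le k)\) terms, which is the same step you perform before looking up a table.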
Expectation and Variance of the Binomial Distribution
Unlike general DRVs where we had to sum up \(x P(X=x)\), the Binomial distribution has wonderfully simple formulas for its mean and variance, derived from its structure:
Expected Value (Mean): \[E(X) = np\]
Variance: \[Var(X) = np(1-p)\]
Example: If you toss a fair coin (\(p=0.5\)) 20 times (\(n=20\)), the expected number of heads is \(E(X) = 20 \times 0.5 = 10\), and the variance is \(Var(X) = 20 \times 0.5 \times 0.5 = 5\).
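To check that the shortcut formulas really match the general definitions, this sketch computes the mean and variance of \(B(20, 0.5)\) the long way (summing over every value) and compares the results with \(np\) and \(np(1-p)\). The variable names are illustrative only.

```python
from math import comb, isclose

n, p = 20, 0.5  # the fair-coin example: 20 tosses

# Full distribution: P(X = x) for x = 0, 1, ..., n.
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# The long way, using the general DRV formulas.
mean_by_sum = sum(x * prob for x, prob in pmf.items())
var_by_sum = sum(x**2 * prob for x, prob in pmf.items()) - mean_by_sum**2

# The shortcut formulas for the Binomial distribution.
mean_by_formula = n * p
var_by_formula = n * p * (1 - p)

print(isclose(mean_by_sum, mean_by_formula))  # True
print(isclose(var_by_sum, var_by_formula))    # True
```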
Key Takeaway: The Binomial Model
The Binomial model \(B(n, p)\) is used when counting successes in a fixed number of independent trials. Always check the BINS conditions. Use \(E(X)=np\) and \(Var(X)=np(1-p)\) for quick calculations of the distribution's center and spread.
Chapter Summary
We started with Discrete Random Variables, learning how to calculate their Expectation (mean) and Variance (spread). We then applied these concepts to the first major mathematical model: the Binomial Distribution. Success in this chapter relies on being able to identify when the BINS conditions are met and accurately using tables or formulas for cumulative probabilities. Keep practicing those inequality conversions!