Welcome to Unit S2: The Binomial and Poisson Distributions!

Hello, future statistician! This chapter matters because it moves beyond describing data: it lets us model and predict the probability of events happening in the real world. Don't worry if probability felt abstract before. We're going to break down the two key distributions, the Binomial and the Poisson, into manageable steps. By the end, you'll be able to tell which model to use and how to calculate probabilities for counted outcomes!

Let's dive in!


Section 1: Quick Review of Discrete Random Variables

Before jumping into the specific models, remember that both the Binomial and Poisson distributions deal with Discrete Random Variables, \(X\).

  • A Discrete Random Variable is a variable whose possible values are countable (usually integers).
  • It represents counts, not continuous measurements such as height or weight.
  • Examples: the number of heads when flipping a coin 10 times; the number of emails received in an hour.

Section 2: The Binomial Distribution \(B(n, p)\)

The Binomial distribution helps us calculate probabilities when we have a fixed number of independent trials, and each trial has only two possible outcomes.

4.1. Conditions for the Binomial Distribution (BINS)

A random variable \(X\) must meet four strict conditions to be modelled by a Binomial distribution. Use the mnemonic BINS to remember them:

  1. Binary outcomes: Each trial must result in either "Success" or "Failure".
  2. Independent trials: The result of one trial does not affect the outcome of any other trial.
  3. Number of trials is fixed: We must know the number of trials, \(n\), beforehand.
  4. Same probability: The probability of success, \(p\), must be constant for every trial.

Analogy: Think about shooting basketball free throws. \(n\) is the number of shots you take (fixed). \(p\) is your success rate (constant). Each shot is independent, and the outcome is either 'Success' (in) or 'Failure' (out).

4.2. Notation and Formula

If \(X\) follows a Binomial distribution, we write the notation:

\(X \sim B(n, p)\)

Where:

  • \(n\) is the number of trials.
  • \(p\) is the probability of success in a single trial.

The probability of getting exactly \(x\) successes in \(n\) trials is given by the formula:

\[ P(X=x) = \binom{n}{x} p^x (1-p)^{n-x} \]

Where:

  • \(\binom{n}{x} = \frac{n!}{x!(n-x)!}\) (read as "n choose x") is the number of ways to choose which \(x\) of the \(n\) trials are the successes.
  • \(1-p\) is often denoted as \(q\), the probability of failure.

Step-by-Step Probability Calculation

Example: A biased coin lands on heads with probability 0.6. If it is flipped 5 times, what is the probability of getting exactly 3 heads?

Here, \(n=5\), \(p=0.6\), and we want \(x=3\). \((1-p) = 0.4\).

  1. Find the number of arrangements: \(\binom{5}{3} = 10\).
  2. Find the probability of 3 successes: \((0.6)^3 = 0.216\).
  3. Find the probability of 2 failures: \((0.4)^{5-3} = (0.4)^2 = 0.16\).
  4. Multiply them together: \(P(X=3) = 10 \times 0.216 \times 0.16 = 0.3456\).
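
The four steps above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `binomial_pmf` is an illustrative choice, not something defined in this unit.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p): arrangements * successes * failures."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Biased coin: n = 5 flips, p = 0.6, exactly x = 3 heads.
prob = binomial_pmf(3, 5, 0.6)
print(round(prob, 4))  # 0.3456
```

Writing the formula as a function makes it easy to repeat the calculation for other values of \(x\), \(n\), and \(p\).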

4.3. Mean, Variance, and Standard Deviation

Calculating the expected number of successes, \(E(X)\), and the variance, \(Var(X)\), is very straightforward for a Binomial distribution. You do not need to use the general formulas for discrete variables.

Mean (Expectation):

\[ E(X) = np \]

Variance:

\[ Var(X) = np(1-p) \]

Standard Deviation:

\[ SD(X) = \sqrt{np(1-p)} \]

Memory Aid: The expected value is just what you'd intuitively think: the number of trials multiplied by the chance of success. The variance is that expected value, multiplied by the chance of failure (\(1-p\)).
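
As a rough check, the shortcuts can be compared with the general formulas for a discrete random variable. The sketch below reuses the biased-coin values (\(n=5\), \(p=0.6\)); the variable names are illustrative assumptions.

```python
from math import comb, sqrt

n, p = 5, 0.6  # biased-coin example: 5 flips, P(heads) = 0.6

# Full distribution P(X = x) for x = 0, 1, ..., n.
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# General formulas: E(X) = sum of x*P(X=x), Var(X) = E(X^2) - E(X)^2.
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum(x**2 * px for x, px in enumerate(pmf)) - mean**2

# Binomial shortcuts for comparison.
print(round(mean, 6), round(n * p, 6))                 # both 3.0
print(round(variance, 6), round(n * p * (1 - p), 6))   # both 1.2
print(round(sqrt(variance), 4))                        # standard deviation
```

Both routes give the same answers, which is why the shortcuts are safe to use whenever the Binomial conditions hold.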

Quick Review: Binomial
  • Conditions: BINS (Binary outcomes, Independent trials, Number of trials fixed at \(n\), Same probability \(p\)).
  • Notation: \(X \sim B(n, p)\).
  • Key Formulas: \(E(X) = np\) and \(Var(X) = np(1-p)\).

Section 3: The Poisson Distribution \(Po(\lambda)\)

The Poisson distribution is used to model the number of times an event occurs in a fixed interval of time or space. Unlike the Binomial, there is no fixed upper limit (\(n\)) for the count.

5.1. Conditions for the Poisson Distribution

A random variable \(X\) must satisfy the following conditions:

  1. Events occur singly (one at a time, not simultaneously).
  2. Events occur at a constant average rate.
  3. Events occur independently of each other and of the time since the last event.
  4. Events occur randomly in time or space.

Analogy: Think about counting phone calls arriving at a help desk between 9 AM and 10 AM. You know the average rate (\(\lambda\), say 5 calls per hour), but the total number of calls could be 0, 5, 10, or even 100!

5.2. Notation and Formula

If \(X\) follows a Poisson distribution, we write the notation:

\(X \sim Po(\lambda)\)

Where \(\lambda\) (lambda) is the average rate of occurrence (the mean number of events in the given interval).

The probability of getting exactly \(x\) occurrences is given by the formula:

\[ P(X=x) = \frac{e^{-\lambda} \lambda^x}{x!} \]

Where:

  • \(e\) is the base of the natural logarithm (\(e \approx 2.71828\)).
  • \(x!\) is \(x\) factorial (\(x \times (x-1) \times \dots \times 1\)), with \(0! = 1\) so that \(P(X=0) = e^{-\lambda}\).
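
The formula translates directly into code. The following is a minimal standard-library sketch; `poisson_pmf` is a hypothetical helper name, and the rate of 5 calls per hour is taken from the help-desk analogy in this section.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


# Help desk receiving an average of 5 calls per hour.
for x in range(4):
    print(x, round(poisson_pmf(x, 5.0), 4))
```

Because there is no upper limit on the count, the function accepts any non-negative integer \(x\); the probabilities simply become very small for large \(x\).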

5.3. Scaling \(\lambda\) (The Most Common Mistake!)

Attention! \(\lambda\) is tied to the interval specified in the problem. If the time or space interval changes, you MUST adjust \(\lambda\).

Example: If the average number of texts received per hour is \(\lambda = 6\), then:

  • For a 30-minute interval (half an hour), the new rate \(\lambda_{new} = 6 \times 0.5 = 3\).
  • For a 2-hour interval, the new rate \(\lambda_{new} = 6 \times 2 = 12\).

Always check that the time interval for \(\lambda\) matches the interval required for the probability calculation.
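
To see how much the scaling matters, the sketch below computes the probability of exactly 2 texts in 30 minutes with the scaled and the unscaled \(\lambda\). The helper names `poisson_pmf` and `scaled_lambda` are assumptions made for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


def scaled_lambda(rate_per_hour, minutes):
    """Mean number of events in an interval of the given length."""
    return rate_per_hour * minutes / 60


RATE_PER_HOUR = 6  # texts per hour, as in the example

lam_30 = scaled_lambda(RATE_PER_HOUR, 30)    # 3.0
lam_120 = scaled_lambda(RATE_PER_HOUR, 120)  # 12.0

# Exactly 2 texts in 30 minutes: use lambda = 3, not 6.
right = poisson_pmf(2, lam_30)
wrong = poisson_pmf(2, RATE_PER_HOUR)
print(round(right, 4), round(wrong, 4))  # about 0.224 versus about 0.0446
```

Forgetting to rescale changes the answer by roughly a factor of five here, which is why this step deserves a deliberate check.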

5.4. Mean and Variance

One of the most defining characteristics of the Poisson distribution is the relationship between its mean and variance.

Mean (Expectation):

\[ E(X) = \lambda \]

Variance:

\[ Var(X) = \lambda \]

Did you know? Because \(E(X) = Var(X) = \lambda\), if you are testing data to see if it fits a Poisson model, one of the first checks is whether the sample mean is approximately equal to the sample variance.
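
The equality of mean and variance can be checked numerically from the probability formula. This is a sketch under one stated assumption: the infinite sum is truncated at \(x = 99\), where the remaining probabilities are negligible for \(\lambda = 5\).

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


lam = 5.0
xs = range(100)  # truncation point is an assumption of this sketch

# General formulas for a discrete random variable.
mean = sum(x * poisson_pmf(x, lam) for x in xs)
variance = sum(x**2 * poisson_pmf(x, lam) for x in xs) - mean**2

print(round(mean, 6), round(variance, 6))  # both 5.0
```

With real data, the same comparison would be made between the sample mean and the sample variance, which will only agree approximately.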

Quick Review: Poisson
  • Conditions: Events are independent, random, occur singly, and at a constant rate.
  • Notation: \(X \sim Po(\lambda)\).
  • Key Formulas: \(E(X) = \lambda\) and \(Var(X) = \lambda\).
  • Crucial Step: Scale \(\lambda\) if the time/space interval changes.

Section 4: The Poisson Approximation to the Binomial

Sometimes, we encounter Binomial situations where the numbers are so large that calculating the probability using the Binomial formula becomes extremely difficult or time-consuming (especially finding \(\binom{n}{x}\)).

Fortunately, under specific conditions, the Poisson distribution provides an excellent, much simpler approximation for the Binomial.

6.1. When to Use the Approximation

We can approximate a Binomial distribution \(B(n, p)\) with a Poisson distribution \(Po(\lambda)\) when the following two conditions are met:

  1. \(n\) is large: the number of trials is large, usually \(n > 50\).
  2. \(p\) is small: the probability of success is small, usually \(p < 0.1\).

Think of it this way: You have millions of lottery tickets (large \(n\)), but the chance of winning is tiny (small \(p\)). This scenario fits the random, rare occurrence pattern of the Poisson model.

6.2. The Approximation Rule

If the conditions are met, we set the mean of the Binomial equal to the mean of the Poisson:

\[ \text{Set } \lambda = np \]

Thus, the approximation is:

\[ B(n, p) \approx Po(np) \]

We then use the Poisson formula (or tables) with \(\lambda = np\) to calculate probabilities.

Example of Approximation

A factory produces items with a defect rate of 0.005. If a batch of 1000 items is checked, find the probability of exactly 4 defective items.

  1. Check Conditions: \(n=1000\) is large (\(n > 50\)) and \(p=0.005\) is small (\(p < 0.1\)), so the approximation is valid.
  2. Calculate \(\lambda\): \(\lambda = np = 1000 \times 0.005 = 5\).
  3. State Approximation: \(X \sim Po(5)\).
  4. Calculate \(P(X=4)\): Using the Poisson formula with \(\lambda=5\) and \(x=4\): \[ P(X=4) = \frac{e^{-5} 5^4}{4!} \approx 0.1755 \]

    This is much easier than calculating \(\binom{1000}{4} (0.005)^4 (0.995)^{996}\).
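
A computer can evaluate the exact Binomial probability directly, which gives a way to see how close the approximation is for this factory example. The sketch below uses only the standard library; the variable names are illustrative.

```python
from math import comb, exp, factorial

n, p, x = 1000, 0.005, 4
lam = n * p  # mean of the approximating Poisson distribution

# Exact Binomial probability P(X = 4) for X ~ B(1000, 0.005).
exact = comb(n, x) * p**x * (1 - p) ** (n - x)

# Poisson approximation P(X = 4) for X ~ Po(5).
approx = exp(-lam) * lam**x / factorial(x)

print(f"exact binomial: {exact:.4f}")
print(f"Poisson approx: {approx:.4f}")
print(f"difference:     {abs(exact - approx):.4f}")
```

The two values agree to about three decimal places, which is typical when \(n\) is large and \(p\) is small.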

6.3. Common Pitfalls and Tips

Be careful when using cumulative tables!

For both distributions, tables often give cumulative probabilities \(P(X \le x)\). Remember these rules:

  • \(P(X=x) = P(X \le x) - P(X \le x-1)\)
  • \(P(X > x) = 1 - P(X \le x)\)
  • \(P(X \ge x) = 1 - P(X \le x-1)\)
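
The three rules can be tried out with a function that plays the role of a cumulative table. This is a sketch for the Poisson case with \(\lambda = 5\) and \(x = 4\); `poisson_cdf` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_cdf(x, lam):
    """P(X <= x), the value a cumulative table would give."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))


lam, x = 5.0, 4

p_equal = poisson_cdf(x, lam) - poisson_cdf(x - 1, lam)  # P(X = 4)
p_more = 1 - poisson_cdf(x, lam)                         # P(X > 4)
p_at_least = 1 - poisson_cdf(x - 1, lam)                 # P(X >= 4)

print(round(p_equal, 4), round(p_more, 4), round(p_at_least, 4))
```

Note that \(P(X \ge 4)\) and \(P(X > 4)\) differ by exactly \(P(X = 4)\), which is the off-by-one error these rules guard against.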

Don't worry if this seems tricky at first. Practice identifying \(n\), \(p\), and \(\lambda\) carefully. If the problem involves counting rare events over a large population or a long time, think Poisson! If it involves a fixed number of trials with a success/failure outcome, think Binomial!


Chapter Summary: Key Takeaways

Model Identification Checklist

| Feature | Binomial \(B(n, p)\) | Poisson \(Po(\lambda)\) |
|---|---|---|
| Goal | Counting successes in a fixed number of trials. | Counting occurrences in a fixed interval (time/space). |
| Trial Count | Fixed (\(n\)). | Unlimited (no fixed upper limit). |
| Key Parameter | \(n\) (trials) and \(p\) (prob. of success). | \(\lambda\) (average rate). |
| Mean/Variance | \(E(X) = np\); \(Var(X) = np(1-p)\). | \(E(X) = \lambda\); \(Var(X) = \lambda\). |
| Approximation Use | Use Poisson if \(n\) is large and \(p\) is small, setting \(\lambda = np\). | N/A |