Unit S2: Statistics 2 - Chapter Notes: Continuous Distributions

Welcome to the World of Continuous Probability!

Hi there! This chapter marks a big shift from the discrete distributions (like Binomial and Poisson) where we counted specific outcomes (0, 1, 2, 3...). Now, we are diving into continuous distributions, which deal with measurements—things that can take on any value within a range, like height, time, or temperature.

Don't worry if this sounds intimidating! We will break down how to handle integrals and derivatives in a statistical context. If you feel confident with Calculus, you have a huge head start. If not, treat this as a fantastic opportunity to sharpen those skills!

Key Takeaway: In continuous distributions, we focus on the probability density, not the probability of a single exact value.

Section 1: The Probability Density Function (PDF)

Defining \(f(x)\)

For a continuous random variable \(X\), we define the Probability Density Function (PDF), denoted by \(f(x)\).

Analogy: You can think of \(f(x)\) as a map showing how "dense" the probability is at every point \(x\). Where the graph of \(f(x)\) is high, values of \(X\) near that point are more likely to occur.

Two Essential Rules for all PDFs

Every function \(f(x)\) must satisfy these two rules to qualify as a legitimate PDF:

  1. Non-negativity: The density can never be negative.
    $$f(x) \ge 0 \quad \text{for all values of } x$$
  2. Total Area is One: The total probability across the entire domain (range of possible values) must equal 1.
    $$\int_{-\infty}^{\infty} f(x) \, dx = 1$$

    In practice, \(f(x)\) is zero outside the range given in the question, so the limits of integration reduce to that range (e.g., \(\int_{0}^{5} f(x) \, dx = 1\)).
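
As a quick numerical sanity check, both rules can be tested on a candidate function. The sketch below assumes a hypothetical density \(f(x) = 3x^2\) on \(0 \le x \le 1\) (chosen for easy arithmetic, not taken from a particular question) and uses a simple midpoint-rule approximation in place of exact integration. The helper name `integrate` is also just an illustration.

```python
def f(x):
    """Hypothetical PDF for illustration: f(x) = 3x^2 on [0, 1], 0 elsewhere."""
    return 3 * x**2 if 0 <= x <= 1 else 0.0


def integrate(g, a, b, n=100_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# Rule 1: the density is never negative (checked on a grid of sample points).
non_negative = all(f(i / 1000) >= 0 for i in range(1001))

# Rule 2: the total area is 1 (f is zero outside [0, 1], so those are the limits).
total_area = integrate(f, 0, 1)

print(non_negative)          # True
print(round(total_area, 6))  # 1.0
```

In an exam you would do the integral by hand; a check like this is only useful for confirming your answer afterwards.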

Calculating Probabilities

Since we are dealing with density, the probability of the random variable \(X\) falling between two points, \(a\) and \(b\), is the area under the curve between those points.

$$P(a < X < b) = \int_{a}^{b} f(x) \, dx$$

!!! Crucial Concept Alert !!!

For any continuous distribution, the probability of \(X\) being exactly equal to a specific value is zero.
$$P(X = x) = 0$$

Think about it: A single point has no width, so the area above it is zero. This means that for continuous variables, strict and non-strict inequalities give the same probability:

$$P(a < X < b) = P(a \le X \le b) = P(a < X \le b)$$
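
To see both ideas numerically, the sketch below again assumes the hypothetical density \(f(x) = 3x^2\) on \(0 \le x \le 1\) and a basic midpoint-rule integrator. It computes \(P(0.5 < X < 1)\) as an area, then shrinks an interval around \(x = 0.5\) to show the probability heading towards zero.

```python
def f(x):
    """Hypothetical PDF for illustration: f(x) = 3x^2 on [0, 1], 0 elsewhere."""
    return 3 * x**2 if 0 <= x <= 1 else 0.0


def integrate(g, a, b, n=10_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def prob(a, b):
    """P(a < X < b) as the area under f between a and b."""
    return integrate(f, a, b)


p = prob(0.5, 1.0)
print(round(p, 6))  # 0.875

# Narrower and narrower intervals centred on x = 0.5:
# the area shrinks with the width, which is why P(X = 0.5) = 0.
widths = [0.1, 0.01, 0.001]
shrinking = [prob(0.5 - w / 2, 0.5 + w / 2) for w in widths]
print(shrinking)
```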

Section 2: The Cumulative Distribution Function (CDF)

While the PDF tells us the probability density at a point, the Cumulative Distribution Function (CDF), denoted by \(F(x)\), tells us the total probability up to a certain point \(x\).

Definition of the CDF, \(F(x)\)

The CDF is the probability that the random variable \(X\) is less than or equal to a specific value \(x\).

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t) \, dt$$

(Note: We use \(t\) as the variable inside the integral to avoid confusing it with the upper limit \(x\).)

Relationship Between PDF and CDF

Since the CDF is found by integrating the PDF, we can go back the other way by differentiating:

$$f(x) = \frac{d}{dx} F(x)$$

Memory Aid:
F (CDF) is found by Integrating \(f(x)\).
Differentiate \(F(x)\) to get f(x) (PDF).
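
As a worked illustration, take the hypothetical density \(f(x) = 3x^2\) for \(0 \le x \le 1\) (and 0 otherwise); this function is an assumption chosen for easy arithmetic. Integrating from the lower limit of the range gives the CDF inside the range:

$$F(x) = \int_{0}^{x} 3t^2 \, dt = x^3$$

Written out in full, with the parts below and above the range included:

$$ F(x) = \begin{cases} 0 & \text{for } x < 0 \\ x^3 & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases} $$

Differentiating the middle piece recovers the PDF: \(\frac{d}{dx} x^3 = 3x^2\).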

Using the CDF to Find Probabilities

Once you have the CDF, calculating probabilities is much faster than integrating the PDF every time.

Quick Step-by-Step Probability Calculation

To find \(P(a < X < b)\):

  1. Calculate \(F(b)\) (the probability up to \(b\)).
  2. Calculate \(F(a)\) (the probability up to \(a\)).
  3. Subtract the two:
    $$P(a < X < b) = F(b) - F(a)$$
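
The three steps above can be sketched in a few lines. The CDF used here, \(F(x) = x^3\) on \(0 \le x \le 1\), belongs to the hypothetical density \(f(x) = 3x^2\) and is an assumption for illustration only.

```python
def F(x):
    """CDF of the hypothetical density f(x) = 3x^2 on [0, 1]."""
    if x < 0:
        return 0.0  # no probability below the range
    if x > 1:
        return 1.0  # all probability accumulated above the range
    return x**3


def prob_between(a, b):
    """P(a < X < b) = F(b) - F(a)."""
    return F(b) - F(a)


print(prob_between(0.5, 1.0))  # 0.875
```

Note that no integration happens at this stage: once \(F(x)\) is known, each probability is just two substitutions and a subtraction.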

Section 3: Key Statistical Measures

Just like with discrete distributions, we need ways to describe the centre and spread of continuous distributions.

1. The Expected Value (Mean), \(E(X)\)

The Expected Value, \(\mu\), is the long-run average of the variable. For a continuous distribution, the formula is:

$$E(X) = \mu = \int_{-\infty}^{\infty} x \cdot f(x) \, dx$$

Tip: Compare this to discrete distributions, where we used summation: \(\sum x P(X=x)\). Here, integration replaces summation, and \(f(x) \, dx\) replaces \(P(X=x)\).

2. The Variance and Standard Deviation

Variance, \(\text{Var}(X)\), measures the spread of the distribution around the mean. We use the same identity as for discrete distributions:

$$\text{Var}(X) = E(X^2) - [E(X)]^2$$

To find \(E(X^2)\), we adjust the expected value formula:

$$E(X^2) = \int_{-\infty}^{\infty} x^2 \cdot f(x) \, dx$$

The Standard Deviation (\(\sigma\)) is simply the square root of the variance.
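
For polynomial densities these integrals can be done exactly. The sketch below is one possible check, assuming the hypothetical density \(f(x) = 3x^2\) on \(0 \le x \le 1\), stored as a list of coefficients. The helper names `integrate_poly` and `times_x` are made up for this example.

```python
from fractions import Fraction
from math import sqrt


def integrate_poly(coeffs, a, b):
    """Exactly integrate sum(coeffs[k] * x**k) from a to b."""
    a, b = Fraction(a), Fraction(b)
    return sum(
        Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
        for k, c in enumerate(coeffs)
    )


def times_x(coeffs, power):
    """Multiply a polynomial by x**power by shifting its coefficients up."""
    return [0] * power + list(coeffs)


pdf = [0, 0, 3]      # hypothetical f(x) = 3x^2
lower, upper = 0, 1  # range on which f is non-zero

mean = integrate_poly(times_x(pdf, 1), lower, upper)     # E(X)   = integral of x f(x)
mean_sq = integrate_poly(times_x(pdf, 2), lower, upper)  # E(X^2) = integral of x^2 f(x)
variance = mean_sq - mean**2                             # Var(X) = E(X^2) - [E(X)]^2
sd = sqrt(variance)                                      # standard deviation

print(mean, mean_sq, variance)  # 3/4 3/5 3/80
```

Working in fractions mirrors what you would write by hand: \(E(X) = \frac{3}{4}\), \(E(X^2) = \frac{3}{5}\), so \(\text{Var}(X) = \frac{3}{5} - \frac{9}{16} = \frac{3}{80}\).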

3. The Median, \(m\)

The Median (\(m\)) is the value that splits the distribution exactly in half. Half the probability lies below \(m\), and half lies above.

To find the median, you must solve for \(m\) using either the PDF or the CDF:

Using the CDF:
$$F(m) = 0.5$$

Using the PDF:
$$\int_{-\infty}^{m} f(x) \, dx = 0.5$$
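
When \(F(m) = 0.5\) cannot be solved neatly by hand, a numerical search gives a useful check. The sketch below uses bisection (a swapped-in numerical technique, not the algebraic method expected in an exam) on the hypothetical CDF \(F(x) = x^3\) for \(0 \le x \le 1\).

```python
def F(x):
    """CDF of the hypothetical density f(x) = 3x^2 on [0, 1]."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x**3


def median(cdf, lo, hi, tol=1e-10):
    """Bisection: a CDF never decreases, so keep halving [lo, hi] around F(m) = 0.5."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid  # median lies to the right of mid
        else:
            hi = mid  # median lies at or to the left of mid
    return (lo + hi) / 2


m = median(F, 0, 1)
print(round(m, 4))  # 0.7937
```

By hand, the same example gives \(m^3 = 0.5\), so \(m = \sqrt[3]{0.5} \approx 0.7937\).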

4. The Mode

The Mode is the value of \(x\) where the probability density function \(f(x)\) reaches its maximum height. Since \(P(X = x) = 0\) for any single value, read it as the value around which outcomes are most concentrated, not as a "most likely outcome".

Step-by-Step for Finding the Mode:

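  1. If \(f(x)\) is simple (e.g., linear or quadratic), you can often find the peak by inspection (sketching the graph).
  2. If \(f(x)\) is more complex, use differentiation (standard calculus optimisation):
    a) Find the derivative: \(\frac{d}{dx} f(x)\)
    b) Set the derivative equal to zero and solve for \(x\): \(\frac{d}{dx} f(x) = 0\)
    c) Check that this value of \(x\) lies within the defined range of the distribution and that it gives a maximum, not a minimum.
  3. Compare with the endpoints of the range. If \(f(x)\) is increasing or decreasing throughout, the maximum sits at an endpoint, where the derivative need not be zero.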
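
The search for the mode can be sketched as "collect the candidates, then pick the one with the largest density". The example assumes a hypothetical density \(f(x) = 12x^2(1 - x)\) on \(0 \le x \le 1\), whose derivative \(24x - 36x^2 = 12x(2 - 3x)\) has been solved by hand. The endpoints are included as candidates because a maximum can occur there without the derivative being zero.

```python
from fractions import Fraction

LOWER, UPPER = Fraction(0), Fraction(1)  # range of the hypothetical distribution


def f(x):
    """Hypothetical PDF for illustration: f(x) = 12x^2(1 - x) on [0, 1]."""
    return 12 * x**2 * (1 - x) if LOWER <= x <= UPPER else Fraction(0)


# Roots of f'(x) = 12x(2 - 3x), found by hand.
stationary_points = [Fraction(0), Fraction(2, 3)]

# Keep only stationary points inside the range, then add both endpoints.
candidates = [x for x in stationary_points if LOWER <= x <= UPPER]
candidates += [LOWER, UPPER]

# The mode is the candidate where the density is highest.
mode = max(candidates, key=f)
print(mode, f(mode))  # 2/3 16/9
```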

Quick Review of Measures

  • Mean: Requires \(\int x f(x) \, dx\)
  • Variance: Requires \(\int x^2 f(x) \, dx\) and \(\int x f(x) \, dx\)
  • Median: Requires solving \(F(m) = 0.5\)
  • Mode: Requires finding maximum of \(f(x)\) (often differentiation)

Section 4: The Uniform (Rectangular) Distribution

The Uniform distribution is the simplest type of continuous distribution, where the probability density is constant over a given interval.

If a random variable \(X\) is uniformly distributed over the interval \([a, b]\), we write \(X \sim U(a, b)\).

Defining the PDF of \(U(a, b)\)

The graph of the PDF looks like a rectangle. Since the area must equal 1, the height (the constant density \(k\)) multiplied by the width (\(b-a\)) must be 1.

$$k \times (b - a) = 1 \quad \Rightarrow \quad k = \frac{1}{b-a}$$

Therefore, the PDF is:

$$ f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases} $$

Calculating Probabilities in \(U(a, b)\)

Because the density is constant, finding a probability is a simple area calculation (rectangle area = height × width), with no integration needed.

Example: Suppose your waiting time for a bus, \(X\) minutes, is uniformly distributed between 0 and 10, so \(X \sim U(0, 10)\). The height of the PDF is \(1/10\). The probability that you wait between 2 and 5 minutes is \((5-2) \times (1/10) = 3/10\).
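
The rectangle calculation can be written as a small function. This is a minimal sketch, assuming integer endpoints so that exact fractions can be used; the name `uniform_prob` is hypothetical. It also clips the requested interval to \([a, b]\), since the density is zero outside it.

```python
from fractions import Fraction


def uniform_prob(a, b, lo, hi):
    """P(lo < X < hi) for X ~ U(a, b), computed as height * width."""
    height = Fraction(1, b - a)  # constant density 1 / (b - a)
    # Only the part of (lo, hi) that overlaps [a, b] carries any probability.
    left, right = max(lo, a), min(hi, b)
    width = max(right - left, 0)
    return height * width


print(uniform_prob(0, 10, 2, 5))  # 3/10
```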

Mean and Variance of \(U(a, b)\)

We can derive the mean and variance using the integration formulas, but for the Uniform distribution, these simplified formulas are crucial and should be memorised:

Mean (Expected Value): Since the density is symmetric, the mean is exactly in the middle.
$$E(X) = \frac{a+b}{2}$$

Variance:
$$\text{Var}(X) = \frac{(b-a)^2}{12}$$
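
For reference, here is a sketch of how both formulas follow from the integration definitions, using the density \(\frac{1}{b-a}\) on \([a, b]\):

$$E(X) = \int_{a}^{b} \frac{x}{b-a} \, dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}$$

$$E(X^2) = \int_{a}^{b} \frac{x^2}{b-a} \, dx = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3}$$

$$\text{Var}(X) = \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4} = \frac{a^2 - 2ab + b^2}{12} = \frac{(b-a)^2}{12}$$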

Did you know? The 12 in the denominator is not arbitrary: it appears when \(E(X^2) - [E(X)]^2\) is simplified for a constant density.

Mode and Median of \(U(a, b)\)

Since the density is constant over the interval \([a, b]\):

  • Mode: There is no unique mode. Every value between \(a\) and \(b\) has the same density, so no single value stands out.
  • Median: The median is the same as the mean: \(\frac{a+b}{2}\).

Summary and Study Tips

Mastering Continuous Distributions relies heavily on your comfort with differentiation and integration. If you find a problem tricky, it's often a calculus error, not a statistics one!

Common Mistakes to Avoid

  • Forgetting the Limits: Always use the correct limits of integration defined by the PDF (or the specific probability range).
  • Mixing Up PDF and CDF: Remember, if the question asks for the median or a probability \(P(X \le x)\), the CDF (\(F(x)\)) is usually the most efficient tool.
  • Constants of Integration: When finding the CDF (\(F(x)\)) by integrating the PDF (\(f(x)\)), remember to use the domain limits to determine the constant of integration. Crucially, \(F(\text{lower limit}) = 0\) and \(F(\text{upper limit}) = 1\).
  • Incorrect \(E(X)\) Formula: Don't forget to multiply \(f(x)\) by \(x\) inside the integral when calculating the mean! (\(\int x f(x) \, dx\))
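
To illustrate the constant-of-integration point, take a hypothetical density whose range does not start at zero: \(f(x) = \frac{x-1}{2}\) for \(1 \le x \le 3\) (an assumption for illustration only). Integrating gives

$$F(x) = \frac{x^2}{4} - \frac{x}{2} + C$$

Using \(F(1) = 0\):

$$\frac{1}{4} - \frac{1}{2} + C = 0 \quad \Rightarrow \quad C = \frac{1}{4}$$

so \(F(x) = \frac{x^2}{4} - \frac{x}{2} + \frac{1}{4} = \frac{(x-1)^2}{4}\) for \(1 \le x \le 3\). As a check, \(F(3) = \frac{4}{4} = 1\), as required at the upper limit.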

Keep practising those integration skills, and you will find that these problems follow predictable patterns. Good luck!