Continuous Random Variables: Study Notes (9231 Further Probability & Statistics)
Welcome to the fascinating world of Continuous Random Variables (CRVs)! In your previous statistics course (9709), you focused mainly on discrete variables (like counting how many times an event occurs). Now, we dive into variables that can take *any* value within a range—measurements like time, height, or temperature.
This chapter is crucial because it moves beyond standard distributions (like the Normal Distribution) and teaches you how to handle custom probability models using calculus (integration and differentiation). Don't worry if this seems tricky at first; we will break down the calculus into simple, clear steps!
1. Understanding Continuous Random Variables (CRVs)
A Continuous Random Variable (CRV), usually denoted \(X\), is a variable that can take any value within a specified interval.
Example: The exact time (in seconds) it takes a computer to boot up. It could be 15.00 seconds, 15.01 seconds, or 15.0000001 seconds.
Why CRVs are Different from Discrete Variables
- For a discrete variable, we can find \(P(X = x)\).
- For a CRV, the probability of hitting one exact value is always zero: \(P(X = a) = 0\).
  Analogy: Imagine a continuous rope 1 meter long. What is the chance that a random point chosen lands exactly on the value 0.5000000...? Zero, because there are infinitely many points around it.
- Therefore, we only talk about probabilities over an interval, e.g., \(P(a < X < b)\).
Quick Review Box: Because \(P(X=a)=0\), the inclusion of equality signs does not change the probability:
\(P(a < X < b) = P(a \leq X \leq b) = P(a < X \leq b)\)
2. The Probability Density Function (PDF)
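Since we cannot assign a probability to a single point, we use a function called the Probability Density Function (PDF), denoted \(f(x)\). This function describes how probability is spread across the possible values: \(f(x)\) is not itself a probability, but the area under it over a range gives the probability that the variable falls within that range.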
Properties of the PDF, \(f(x)\)
For a function to be a valid PDF, it must satisfy two fundamental rules:
- Non-Negative: The density cannot be negative.
  $$f(x) \geq 0 \text{ for all values of } x$$
- Total Area is One: The total probability for all possible outcomes must equal 1 (or 100%). This is found by integrating over the entire domain.
  $$\int_{-\infty}^{\infty} f(x) dx = 1$$
Note on Piecewise Functions: Often, CRVs are defined using a piecewise function, meaning the PDF is non-zero only over a specific range \([a, b]\), and \(f(x) = 0\) outside that range.
In this case, the total probability rule simplifies to:
$$\int_{a}^{b} f(x) dx = 1$$
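As a quick illustration of this rule, suppose (purely as a hypothetical example, not one taken from the syllabus) that \(f(x) = kx^2\) for \(0 \leq x \leq 3\) and \(f(x) = 0\) otherwise. The total-area rule then fixes the unknown constant \(k\):
$$\int_{0}^{3} kx^2 dx = k\left[\frac{x^3}{3}\right]_{0}^{3} = 9k = 1 \quad \Rightarrow \quad k = \frac{1}{9}$$
Since \(k > 0\), the density is also non-negative, so both conditions for a valid PDF hold.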
Step-by-Step: Finding the Probability \(P(a < X < b)\)
The probability that \(X\) lies between \(a\) and \(b\) is the area under the PDF curve between those points.
- Identify Limits: Determine the range \([a, b]\) required.
- Integrate: Calculate the definite integral of the PDF over that range.
  $$P(a < X < b) = \int_{a}^{b} f(x) dx$$
Key Takeaway: The PDF uses integration to define probability as an area under the curve. Remember that the total area must be exactly 1.
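The two steps above can also be checked numerically. The sketch below assumes a hypothetical PDF, \(f(x) = 3x^2\) on \([0, 1]\), and uses a hand-written Simpson's rule (the names `pdf` and `integrate` are illustrative, not part of any library). In an exam you would integrate by hand; this is only a way to check your answer.

```python
def pdf(x):
    """Hypothetical PDF: f(x) = 3x^2 on [0, 1], and 0 elsewhere."""
    return 3 * x**2 if 0 <= x <= 1 else 0.0


def integrate(func, lower, upper, steps=1000):
    """Approximate a definite integral with Simpson's rule (steps must be even)."""
    width = (upper - lower) / steps
    total = func(lower) + func(upper)
    for i in range(1, steps):
        # Interior points alternate between weight 4 (odd) and 2 (even).
        weight = 4 if i % 2 == 1 else 2
        total += weight * func(lower + i * width)
    return total * width / 3


total_area = integrate(pdf, 0, 1)   # should be very close to 1 for a valid PDF
prob = integrate(pdf, 0.2, 0.5)     # P(0.2 < X < 0.5)
print(total_area, prob)
```

By hand, \(\int_{0.2}^{0.5} 3x^2 dx = 0.5^3 - 0.2^3 = 0.117\), which is the value the numerical result should agree with.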
3. The Cumulative Distribution Function (CDF)
The Cumulative Distribution Function (CDF), denoted \(F(x)\), gives the probability that the random variable \(X\) is less than or equal to a certain value \(x\).
$$\text{Definition: } F(x) = P(X \leq x)$$
The Relationship between PDF and CDF (Using Calculus)
Since the CDF measures the accumulated probability up to \(x\), it is found by integrating the PDF:
$$F(x) = \int_{-\infty}^{x} f(t) dt$$
Conversely, if you have the CDF and need the PDF, you differentiate:
$$f(x) = \frac{d}{dx} F(x)$$
Memory Aid: Think of the alphabet: C (CDF) comes before P (PDF). Integration involves moving 'up' (increasing power), and differentiation moves 'down' (decreasing power). You integrate the PDF to get the CDF, and you differentiate the CDF to get the PDF.
Properties of the CDF, \(F(x)\)
If the variable \(X\) is defined for \(x\) in the range \([a, b]\):
- Lower Limit: \(F(a) = 0\) (The probability of being less than the starting point is zero).
- Upper Limit: \(F(b) = 1\) (The probability of being less than the ending point is one).
- Non-Decreasing: \(F(x)\) must always be increasing or constant (it can never go down).
Using the CDF for Probability:
You can calculate \(P(a < X < b)\) without integration if you have the CDF:
$$P(a < X < b) = F(b) - F(a)$$
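A minimal sketch of this shortcut, assuming a hypothetical CDF \(F(x) = x^3\) on \([0, 1]\) (the CDF belonging to the density \(f(x) = 3x^2\)); the function names are illustrative only:

```python
def cdf(x):
    """Hypothetical CDF: F(x) = x^3 on [0, 1], 0 below the range, 1 above it."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x**3


def prob_between(lower, upper):
    """P(lower < X < upper) = F(upper) - F(lower), with no integration needed."""
    return cdf(upper) - cdf(lower)


p = prob_between(0.2, 0.5)
print(p)
```

The piecewise definition matters: writing the CDF as 0 below the range and 1 above it keeps `prob_between` correct even when a limit falls outside \([0, 1]\).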
Did you know? In many professional statistical software packages, the CDF is used far more frequently than the PDF because it directly provides cumulative probability values, which are easier to interpret.
4. Percentiles and Measures of Location
A percentile is a value that divides the probability distribution into specific proportions. Quartiles are special cases (the 25th, 50th and 75th percentiles).
The \(p\)-th percentile is the value \(k\) such that the probability of \(X\) being less than or equal to \(k\) is \(p\%\).
$$\text{Mathematically: } F(k) = \frac{p}{100}$$
Key Percentiles
- Median (\(m\)): The 50th percentile. This is the value \(m\) where \(F(m) = 0.5\). Half the probability lies below it, and half lies above it.
- Lower Quartile (\(Q_1\)): The 25th percentile, where \(F(Q_1) = 0.25\).
- Upper Quartile (\(Q_3\)): The 75th percentile, where \(F(Q_3) = 0.75\).
Step-by-Step: Finding the Median
- Set up the integral: Find the median \(m\) by solving:
  $$\int_{-\infty}^{m} f(x) dx = 0.5$$
- Alternatively, use the CDF: If you already calculated the CDF, \(F(x)\), simply solve:
  $$F(m) = 0.5$$
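When \(F(m) = 0.5\) cannot be rearranged neatly, the equation can be solved numerically. The sketch below assumes a hypothetical CDF \(F(x) = x^3\) on \([0, 1]\) and uses bisection, which works because a CDF never decreases. The helper name `percentile_point` is an illustrative choice, not a standard function.

```python
def cdf(x):
    """Hypothetical CDF: F(x) = x^3 on [0, 1], 0 below the range, 1 above it."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x**3


def percentile_point(p, lower=0.0, upper=1.0, tol=1e-10):
    """Solve F(k) = p by bisection on [lower, upper], for 0 < p < 1."""
    while upper - lower > tol:
        mid = (lower + upper) / 2
        if cdf(mid) < p:
            lower = mid   # the solution lies to the right of mid
        else:
            upper = mid   # the solution lies at or to the left of mid
    return (lower + upper) / 2


median = percentile_point(0.5)
lower_quartile = percentile_point(0.25)
upper_quartile = percentile_point(0.75)
print(median, lower_quartile, upper_quartile)
```

For this particular CDF the exact median is \(m = 0.5^{1/3} \approx 0.794\), so the numerical answer can be compared against the algebraic one.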
5. Expectation (Mean and Variance)
The Expected Value or Mean, \(E(X)\) or \(\mu\), is the long-run average value of the variable.
Expected Value of X (The Mean)
For a CRV, the summation used for discrete variables is replaced by integration:
$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x) dx$$
Variance of X
The variance, \(Var(X)\), measures the spread of the distribution around the mean. We use the standard formula, but substitute expectation calculations with integrals:
$$Var(X) = E(X^2) - [E(X)]^2$$
where \(E(X^2)\) is calculated using the following general formula:
Crucial Further Math Concept: Expected Value of a Function
This is a key result for Paper 4. If \(g(X)\) is any function of the random variable \(X\), its expected value is found by replacing \(x\) in the integral with \(g(x)\).
$$\mathbf{E(g(X))} = \int_{-\infty}^{\infty} \mathbf{g(x)} f(x) dx$$
Example: To find \(E(X^2)\), we let \(g(x) = x^2\):
$$E(X^2) = \int x^2 f(x) dx$$
Example: If you needed the expected cost of an item whose price is determined by the formula \(C = 5X + 10\), you would calculate \(E(5X + 10) = \int (5x + 10) f(x) dx\).
Common Mistake to Avoid: When calculating variance, students often forget to square the *final* mean in the formula \([E(X)]^2\). Make sure you calculate \(E(X)\) first, then calculate \(E(X^2)\), and finally apply the variance formula.
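The order of calculation described above can be sketched numerically. This example assumes a hypothetical PDF \(f(x) = 3x^2\) on \([0, 1]\) and a hand-written Simpson's rule; `expect` is an illustrative helper that evaluates \(E(g(X)) = \int g(x) f(x) dx\) for any function \(g\) passed to it.

```python
def pdf(x):
    """Hypothetical PDF: f(x) = 3x^2 on [0, 1], and 0 elsewhere."""
    return 3 * x**2 if 0 <= x <= 1 else 0.0


def integrate(func, lower, upper, steps=1000):
    """Approximate a definite integral with Simpson's rule (steps must be even)."""
    width = (upper - lower) / steps
    total = func(lower) + func(upper)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * func(lower + i * width)
    return total * width / 3


def expect(g, lower=0.0, upper=1.0):
    """E(g(X)): integrate g(x) * f(x) over the range where f is non-zero."""
    return integrate(lambda x: g(x) * pdf(x), lower, upper)


mean = expect(lambda x: x)                   # E(X)
mean_sq = expect(lambda x: x**2)             # E(X^2)
variance = mean_sq - mean**2                 # square the mean, not the integrand
expected_cost = expect(lambda x: 5 * x + 10) # E(5X + 10)
print(mean, mean_sq, variance, expected_cost)
```

By hand, \(E(X) = 3/4\), \(E(X^2) = 3/5\) and \(Var(X) = 3/5 - 9/16 = 3/80\). The cost example also equals \(5E(X) + 10 = 13.75\), as expected from the linearity of expectation.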
6. Functions of Continuous Random Variables (The Advanced Step)
In Further Mathematics, you must be able to find the distribution (both CDF and PDF) of a new variable \(Y\) which is defined as a function of \(X\), i.e., \(Y = g(X)\).
The most reliable method is to first find the CDF of Y, \(F_Y(y)\), and then differentiate it to find the PDF of Y, \(f_Y(y)\).
Step-by-Step: Finding the Distribution of \(Y = g(X)\)
- Define the CDF of Y: Start with the definition:
  $$F_Y(y) = P(Y \leq y)$$
- Substitute and Relate to X: Replace \(Y\) with the function \(g(X)\):
  $$F_Y(y) = P(g(X) \leq y)$$
- Solve for X: Rearrange the inequality \(g(X) \leq y\) to isolate \(X\). If \(g\) is increasing, the resulting inequality is \(X \leq h(y)\), where \(h\) is the inverse of \(g\).
  (Be careful with signs and directions if the function \(g\) is decreasing! The inequality then reverses to \(X \geq h(y)\), so \(F_Y(y) = 1 - F_X(h(y))\).)
- Use the CDF of X: Since we now have a probability statement about \(X\), we use the known CDF of \(X\), \(F_X(x)\):
  $$F_Y(y) = P(X \leq h(y)) = F_X(h(y))$$
- Find the PDF of Y: Differentiate the expression for \(F_Y(y)\) with respect to \(y\) using the chain rule:
  $$f_Y(y) = \frac{d}{dy} F_Y(y)$$
Simple Example: \(Y = X^3\)
Assume \(X\) is a CRV defined for \(x > 0\). We want the PDF of \(Y\).
- $$F_Y(y) = P(Y \leq y)$$
- $$F_Y(y) = P(X^3 \leq y)$$
- Solve for X: Assuming \(y>0\), this means \(X \leq y^{1/3}\). (Here, \(h(y) = y^{1/3}\))
- Use \(F_X\): $$F_Y(y) = F_X(y^{1/3})$$
- Differentiate to get \(f_Y(y)\): Using the chain rule, \(f_Y(y) = F'_X(y^{1/3}) \cdot \frac{d}{dy}(y^{1/3})\).
  Since \(F'_X = f_X\), we get:
  $$f_Y(y) = f_X(y^{1/3}) \cdot \frac{1}{3} y^{-2/3}$$
You would then substitute the specific PDF for \(X\), \(f_X(x)\), into this result to get the final answer.
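To see the final substitution in action, the sketch below assumes a hypothetical density for \(X\), namely \(f_X(x) = 2x\) on \((0, 1)\), so that \(F_X(x) = x^2\). It compares the chain-rule formula for \(f_Y(y)\) with a numerical derivative of \(F_Y(y) = F_X(y^{1/3})\). All function names are illustrative.

```python
def pdf_x(x):
    """Hypothetical PDF of X: f_X(x) = 2x on (0, 1), and 0 elsewhere."""
    return 2 * x if 0 < x < 1 else 0.0


def cdf_x(x):
    """Matching CDF of X: F_X(x) = x^2 on [0, 1]."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x**2


def cdf_y(y):
    """F_Y(y) = F_X(y^(1/3)) for Y = X^3 (an increasing function of X)."""
    return cdf_x(y ** (1 / 3)) if y > 0 else 0.0


def pdf_y(y):
    """Chain-rule result: f_Y(y) = f_X(y^(1/3)) * (1/3) * y^(-2/3) on (0, 1)."""
    if not 0 < y < 1:
        return 0.0
    return pdf_x(y ** (1 / 3)) * (1 / 3) * y ** (-2 / 3)


def numeric_derivative(func, point, h=1e-6):
    """Central-difference estimate of the derivative of func at point."""
    return (func(point + h) - func(point - h)) / (2 * h)


y = 0.3
slope = numeric_derivative(cdf_y, y)   # derivative of the CDF of Y
formula = pdf_y(y)                     # value from the chain-rule formula
print(slope, formula)
```

For this choice of \(f_X\), the formula simplifies by hand to \(f_Y(y) = \frac{2}{3} y^{-1/3}\) for \(0 < y < 1\), and the two printed values should agree closely.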
Key Takeaway: Finding the distribution of a function of X always follows the same pattern: CDF, Substitute, Solve, Differentiate.
Chapter Summary: Continuous Random Variables
- PDF \(f(x)\): Defines probability density. Must be \(\geq 0\) and \(\int f(x) dx = 1\).
- Probability: \(P(a < X < b) = \int_a^b f(x) dx\).
- CDF \(F(x)\): Accumulates probability: \(F(x) = \int_{-\infty}^{x} f(t) dt\).
- Calculus Link: \(f(x) = F'(x)\).
- Expectation: The general result for finding expected values of functions is:
  $$E(g(X)) = \int g(x) f(x) dx$$
- Functions of CRVs: Find \(F_Y(y)\) by relating the inequality \(Y \leq y\) back to \(X\), then differentiate \(F_Y(y)\) to find \(f_Y(y)\).