📚 Study Notes: Continuous Random Variables (9709 P&S 2, Section 6.3)
Welcome to one of the most theoretical yet essential topics in Probability and Statistics 2! In Paper 5 (P&S 1), you dealt mostly with Discrete Random Variables (like counts). Now, we step into the world of Continuous Random Variables, where we deal with measurements (like time, weight, or temperature). This is where your Pure Mathematics skills (especially integration!) meet statistics. Don't worry if this seems tricky at first—it’s just finding the area under a graph!
1. Understanding Continuous Random Variables (CRVs)
A Continuous Random Variable (X) is a variable that can take any value within a given range (or interval). Think of measuring something, rather than counting something.
CRV vs. DRV (Quick Review)
- Discrete Random Variable (DRV): Takes countable values (e.g., number of heads, score on a die). Probability is given by \(P(X=x)\).
- Continuous Random Variable (CRV): Takes any value in an interval (e.g., the time taken to run 100m, height).
⚠ Crucial CRV Property:
For any specific single value \(x\), the probability is always zero:
$$P(X = x) = 0$$
Analogy: Imagine throwing a dart at a number line. The chance of hitting the exact point \(x=5.000000...\) is zero. We can only find the probability of landing in a range (e.g., between 4.9 and 5.1).
Because of this, whether you include the boundary points doesn't matter when calculating probability:
$$P(a \le X \le b) = P(a < X < b) = P(a < X \le b)$$
Key Takeaway: For CRVs, we calculate probability over intervals (areas), not individual points.
2. The Probability Density Function (PDF), \(f(x)\)
Since we can't use a probability mass function (like in discrete stats), we use a Probability Density Function, \(f(x)\). This function describes the relative likelihood of the random variable taking values near \(x\). Note that \(f(x)\) itself is not a probability: only the area under it over an interval is.
Think of the PDF as the outline of a hill. The height of the hill (\(f(x)\)) shows where the values are most likely to cluster.
Properties of the PDF
For \(f(x)\) to be a valid PDF, it must satisfy two fundamental properties:
1. Non-negativity
- The function must never be negative, since probability cannot be negative.
- $$f(x) \ge 0 \quad \text{for all } x$$
2. Total Area is 1
- The total area under the entire graph must equal 1 (or 100%), because the random variable must take some value.
- $$\int_{-\infty}^{\infty} f(x) \, dx = 1$$
- In practice, since \(f(x)\) is usually defined over a specific interval \([a, b]\), this simplifies to: $$\int_{a}^{b} f(x) \, dx = 1$$
Accessibility Note: This property is usually used to find an unknown constant, \(k\), within the function definition (e.g., \(f(x) = kx^2\)).
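Worked illustration (an assumed example, not taken from a past paper): suppose \(f(x) = kx^2\) for \(0 \le x \le 3\) and \(f(x) = 0\) otherwise. Setting the total area equal to 1 gives:
$$\int_{0}^{3} kx^2 \, dx = \left[\frac{kx^3}{3}\right]_{0}^{3} = 9k = 1 \quad \Rightarrow \quad k = \frac{1}{9}$$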
3. Calculating Probabilities (The Power of Integration)
The probability that a continuous random variable \(X\) lies between two values, \(a\) and \(b\), is given by the area under the PDF curve between those two points.
Formula for Probability
$$P(a < X < b) = \int_{a}^{b} f(x) \, dx$$
Step-by-Step Calculation
If you are given \(f(x)\) and asked for \(P(a < X < b)\):
- Identify the Limits: Determine the lower limit \(a\) and upper limit \(b\).
- Set up the Integral: Write the definite integral \(\int_{a}^{b} f(x) \, dx\).
- Integrate: Find the antiderivative using your Pure Mathematics integration rules (a substitution may occasionally be needed, although complex P3 substitutions are rare here).
- Evaluate: Substitute the limits \(b\) and \(a\) to find the final probability.
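The four steps above can be checked numerically. The sketch below assumes a hypothetical density \(f(x) = x^2/9\) on \(0 \le x \le 3\) (chosen only for illustration) and uses a hand-written Simpson's rule helper, `simpson`, which is not part of any library. In the exam you would integrate by hand; this is just one way to confirm an answer.

```python
def f(x):
    """Hypothetical PDF (an assumption): f(x) = x^2 / 9 on [0, 3], 0 elsewhere."""
    return x * x / 9 if 0 <= x <= 3 else 0.0


def simpson(g, a, b, n=1000):
    """Approximate the integral of g from a to b using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        # Odd-indexed points get weight 4, even-indexed interior points get weight 2.
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


total_area = simpson(f, 0, 3)  # validity check: should be 1
prob = simpson(f, 1, 2)        # P(1 < X < 2); by hand this is 7/27
print(round(total_area, 6), round(prob, 6))  # 1.0 0.259259
```

By hand, the same probability is \(\left[\frac{x^3}{27}\right]_{1}^{2} = \frac{8 - 1}{27} = \frac{7}{27}\).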
Common Mistake to Avoid: When calculating probability, make sure your integration limits \((a, b)\) fall within the defined domain of \(f(x)\). Outside that domain \(f(x) = 0\), so if the variable only exists between 0 and 5, then \(P(3 < X < 10)\) is found by integrating from 3 to 5, not from 3 to 10.
Key Takeaway: Probability is found by integrating the PDF (finding the area under it) over the desired interval.
4. Finding Measures of Location and Spread
Just like discrete variables, we need the mean and variance to understand the center and spread of the distribution.
A. The Mean (Expectation), \(E(X)\)
The mean, \(\mu\), is the expected value of \(X\).
The formula replaces the summation \(\sum x P(X=x)\) from discrete variables with an integral:
$$E(X) = \mu = \int_{-\infty}^{\infty} x f(x) \, dx$$
If the function is only defined from \(a\) to \(b\):
$$E(X) = \int_{a}^{b} x f(x) \, dx$$
Memory Aid: To find the mean, you integrate x times the function (\(x \cdot f(x)\)).
B. The Variance, \(Var(X)\)
The variance measures the spread around the mean. The standard formula applies:
$$Var(X) = E(X^2) - [E(X)]^2$$
First, you must calculate \(E(X^2)\) using integration:
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x) \, dx$$
Then, substitute this value and your previously calculated mean, \(E(X)\), into the variance formula.
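As a rough numerical check of the two integrals, the sketch below again assumes the hypothetical density \(f(x) = x^2/9\) on \(0 \le x \le 3\) and a hand-written Simpson's rule helper (`simpson` is an illustrative name, not a library function). By hand, \(E(X) = \frac{9}{4}\), \(E(X^2) = \frac{27}{5}\), and so \(Var(X) = \frac{27}{5} - \frac{81}{16} = \frac{27}{80}\).

```python
def f(x):
    """Hypothetical PDF (an assumption): f(x) = x^2 / 9 on [0, 3], 0 elsewhere."""
    return x * x / 9 if 0 <= x <= 3 else 0.0


def simpson(g, a, b, n=1000):
    """Approximate the integral of g from a to b using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


mean = simpson(lambda x: x * f(x), 0, 3)         # E(X): integrate x * f(x)
mean_sq = simpson(lambda x: x * x * f(x), 0, 3)  # E(X^2): integrate x^2 * f(x)
variance = mean_sq - mean ** 2                   # Var(X) = E(X^2) - [E(X)]^2
print(round(mean, 4), round(mean_sq, 4), round(variance, 4))  # 2.25 5.4 0.3375
```

Notice that the mean is squared only after the integration: a frequent slip is to forget the \([E(X)]^2\) term and report \(E(X^2)\) as the variance.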
Key Takeaway: Mean and variance calculations require integrating \(x f(x)\) and \(x^2 f(x)\) respectively. Use your calculator wisely for the heavy arithmetic after integration!
5. Finding the Median and Percentiles
The median and percentiles locate specific points on the distribution, often requiring you to solve an equation involving an integral.
A. The Median, \(m\)
The median (\(m\)) is the value that splits the distribution exactly in half. Half the probability mass lies below it, and half lies above it.
Therefore, to find the median \(m\), you solve the equation:
$$\int_{\text{lower bound}}^{m} f(x) \, dx = 0.5$$
Example: If the function is defined from 0 to 4, you find \(m\) such that \(\int_{0}^{m} f(x) \, dx = 0.5\).
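To make this concrete, take an assumed density \(f(x) = \frac{x}{8}\) for \(0 \le x \le 4\) (chosen only for illustration; its total area is \(\frac{16}{16} = 1\)). Then:
$$\int_{0}^{m} \frac{x}{8} \, dx = \left[\frac{x^2}{16}\right]_{0}^{m} = \frac{m^2}{16} = 0.5 \quad \Rightarrow \quad m^2 = 8 \quad \Rightarrow \quad m = 2\sqrt{2} \approx 2.83$$
Only the positive root is kept, because the median must lie inside the domain \(0 \le m \le 4\).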
B. Percentiles
The \(p^{th}\) percentile (\(k\)) is the value such that \(p\%\) of the distribution lies below it.
To find the value \(k\) that represents the \(p^{th}\) percentile (e.g., the 90th percentile, where \(p=90\)), you solve:
$$\int_{\text{lower bound}}^{k} f(x) \, dx = \frac{p}{100}$$
Did you know? The median is simply the 50th percentile!
☞ Example: Finding the 90th Percentile
If \(X\) is defined for \(x > 0\), and you need the 90th percentile \(k\):
$$P(X < k) = 0.9$$
You set up the integral: \(\int_{0}^{k} f(x) \, dx = 0.9\). You then integrate, substitute \(k\) and 0, and solve the resulting equation for \(k\).
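When the resulting equation is awkward to rearrange, a numerical solver can confirm your answer. The sketch below assumes a hypothetical density \(f(x) = e^{-x}\) for \(x > 0\), so that \(\int_{0}^{k} f(x) \, dx = 1 - e^{-k}\), and solves for \(k\) by bisection. The names `cdf` and `percentile` and the search bracket \([0, 50]\) are illustrative choices, not anything prescribed by the syllabus.

```python
import math


def cdf(k):
    """P(X < k) for the assumed PDF f(x) = e^(-x), x > 0 (integral from 0 to k)."""
    return 1 - math.exp(-k) if k > 0 else 0.0


def percentile(p, lo=0.0, hi=50.0, tol=1e-10):
    """Solve cdf(k) = p / 100 for k by bisection on the bracket [lo, hi]."""
    target = p / 100
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < target:
            lo = mid  # too little area below mid, so k lies to the right
        else:
            hi = mid  # enough area below mid, so k lies to the left
    return (lo + hi) / 2


k90 = percentile(90)     # by hand: 1 - e^(-k) = 0.9 gives k = ln 10
median = percentile(50)  # by hand: 1 - e^(-m) = 0.5 gives m = ln 2
print(round(k90, 4), round(median, 4))  # 2.3026 0.6931
```

Bisection works here because the area below \(k\) only ever increases as \(k\) increases, so there is exactly one solution inside the bracket.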
Key Takeaway: The median and percentiles involve setting the area integral equal to the required probability (0.5 for the median) and solving for the upper limit of integration.
6. Quick Review of Formulas (MF19 Reference)
Here are the key formulas you must know for CRVs, as listed in your formula booklet (MF19, Probability & Statistics section):
Continuous Random Variables
Expected Value (Mean):
$$E(X) = \int x f(x) \, dx$$
Variance:
$$Var(X) = \int x^2 f(x) \, dx - \{E(X)\}^2$$
Remember that the absolute requirement for any PDF \(f(x)\) is that the total area must be 1:
$$\int f(x) \, dx = 1$$
Success in this topic relies heavily on accurate integration and solving the resulting equations. Take your time setting up the limits correctly!
Final Encouragement: You have already mastered integration in Pure Mathematics. This chapter is simply teaching you what to integrate and why, within the context of probability. Keep practicing your integral calculations!