Probability Generating Functions (PGF): FS1.4 Study Notes

Hello Future Mathematician! Welcome to one of the most clever and powerful tools in statistics: the Probability Generating Function (PGF).

Don't worry if the name sounds intimidating! A PGF is essentially a compact, mathematical "code" or "data file" that holds *all* the information about a discrete random variable's probability distribution in a single, neat polynomial (or power series).

This chapter will show you how to unlock this code to easily find probabilities, the mean, and the variance, often without lengthy summations. Let's dive in!


1. Defining the Probability Generating Function (\(G_X(t)\))

What is a PGF?

The PGF, denoted \(G_X(t)\), for a discrete random variable \(X\) is an expected value involving a dummy variable \(t\).

Think of \(t\) as a placeholder. We use its powers to "store" the probability corresponding to each possible outcome.

The Formal Definition

The PGF of a discrete random variable \(X\) taking non-negative integer values \(x = 0, 1, 2, \dots\) (as every distribution in these notes does) is defined as:

\(G_X(t) = E(t^X) = \sum_{\text{all } x} t^x P(X=x)\)

In simpler terms: You take every possible value \(x\) that the random variable \(X\) can take, multiply \(t\) raised to that power (\(t^x\)) by its probability (\(P(X=x)\)), and sum them all up.

Example: A Simple Die Roll

Let \(X\) be the outcome of rolling a fair six-sided die. \(X\) can be 1, 2, 3, 4, 5, or 6, each with probability \(1/6\).
The PGF is:
\(G_X(t) = t^1 P(X=1) + t^2 P(X=2) + \dots + t^6 P(X=6)\)
\(G_X(t) = \frac{1}{6}t + \frac{1}{6}t^2 + \frac{1}{6}t^3 + \frac{1}{6}t^4 + \frac{1}{6}t^5 + \frac{1}{6}t^6\)
\(G_X(t) = \frac{1}{6}(t + t^2 + t^3 + t^4 + t^5 + t^6)\)
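As a minimal sketch of the definition, assuming the distribution is stored as a Python dictionary mapping each value \(x\) to \(P(X=x)\) (the names `pgf_value` and `die` are illustrative, not standard), \(G_X(t)\) can be evaluated directly as the sum \(\sum t^x P(X=x)\). Exact fractions avoid rounding error.

```python
from fractions import Fraction

def pgf_value(pmf, t):
    """Evaluate G_X(t) = sum over x of t**x * P(X = x)."""
    return sum(t**x * p for x, p in pmf.items())

# Fair six-sided die: values 1..6, each with probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(pgf_value(die, Fraction(1, 2)))  # 21/128
print(pgf_value(die, 1))               # 1
```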

Key Takeaway: The probability \(P(X=x)\) is simply the coefficient of the term \(t^x\) in the function \(G_X(t)\).


2. Properties: Finding Probabilities from the PGF

One of the most immediate and useful properties is retrieving the original probabilities.

Property 1: Extracting Probabilities

If you have the PGF \(G_X(t)\), the probability of the random variable \(X\) taking a specific value \(x\) is found by:

\(P(X = x) = \text{coefficient of } t^x \text{ in } G_X(t)\)

This is why the PGF is so powerful: it generates the probabilities!

Quick Check Property: \(G_X(1) = 1\)

If we set \(t=1\) in the PGF formula:
\(G_X(1) = \sum 1^x P(X=x) = \sum P(X=x)\)
Since the sum of all probabilities must be 1, \(G_X(1)\) must always equal 1. This is a great way to check your PGF derivation.
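A short sketch of both properties, assuming the PGF is stored as a list of coefficients in which position \(x\) holds the coefficient of \(t^x\) (a representation chosen here for illustration). Reading off \(P(X=x)\) is then an index lookup, and \(G_X(1)\) is just the sum of the list.

```python
from fractions import Fraction

# G_X(t) = (1/6)(t + t^2 + ... + t^6): index x holds the coefficient of t^x
die_coeffs = [Fraction(0)] + [Fraction(1, 6)] * 6

def prob(coeffs, x):
    """P(X = x) is the coefficient of t^x (zero if x is out of range)."""
    return coeffs[x] if 0 <= x < len(coeffs) else Fraction(0)

def value_at_one(coeffs):
    """G_X(1) = sum of 1**x * P(X = x) = sum of the coefficients."""
    return sum(coeffs)

print(prob(die_coeffs, 4))       # 1/6
print(prob(die_coeffs, 0))       # 0
print(value_at_one(die_coeffs))  # 1
```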

Common Mistake to Avoid: Remember that \(G_X(t)\) is a function of \(t\). Do not treat \(t\) as an unknown to solve for; it is a symbolic placeholder whose powers carry the probabilities.


3. Properties: Generating the Mean and Variance (Moments)

A major use of the PGF is calculating the mean (\(\mu\)) and variance (\(\sigma^2\)) using simple differentiation.

Finding the Mean (\(\mu\))

The mean, or expectation \(E(X)\), is found using the first derivative of the PGF, evaluated at \(t=1\).

Step 1: Find the first derivative, \(G'_X(t)\).
Step 2: Substitute \(t=1\) into the derivative.

Mean: \(\mu = E(X) = G'_X(1)\)
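The two steps can be sketched with a coefficient-list representation (an illustrative assumption: index \(x\) holds the coefficient of \(t^x\)). Differentiating \(\sum c_x t^x\) term by term gives \(\sum x c_x t^{x-1}\), and evaluating at \(t=1\) sums the new coefficients. For the fair die this recovers the familiar mean of 3.5.

```python
from fractions import Fraction

def derivative(coeffs):
    """Differentiate sum c_x t^x term by term: the new coefficient of t^(x-1) is x*c_x."""
    return [x * c for x, c in enumerate(coeffs)][1:]

def evaluate(coeffs, t):
    """Evaluate the polynomial sum c_x t^x at t."""
    return sum(c * t**x for x, c in enumerate(coeffs))

die_coeffs = [Fraction(0)] + [Fraction(1, 6)] * 6

mean = evaluate(derivative(die_coeffs), 1)  # Step 2: G'_X(1)
print(mean)  # 7/2
```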

Finding the Variance (\(\sigma^2\))

The variance requires both the first and second derivatives, evaluated at \(t=1\).

Step 1: Find the second derivative, \(G''_X(t)\).
Step 2: Evaluate \(G''_X(1)\).
Step 3: Apply the variance formula (which is provided in the syllabus):

Variance: \(\sigma^2 = G''_X(1) + \mu - \mu^2\)

Wait, why is this formula used?
Did you know? Differentiating term by term gives \(G''_X(t) = \sum x(x-1) t^{x-2} P(X=x)\), so \(G''_X(1) = E[X(X-1)] = E(X^2) - E(X)\), or \(E(X^2) - \mu\). Rearranging gives \(E(X^2) = G''_X(1) + \mu\), and substituting this into the standard formula for variance \(\sigma^2 = E(X^2) - \mu^2\) gives:
\(\sigma^2 = [G''_X(1) + \mu] - \mu^2\).
This is why the extra \(+\mu\) term appears in \(\sigma^2 = G''_X(1) + \mu - \mu^2\): \(G''_X(1)\) on its own is not \(E(X^2)\).
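As a sketch for the fair die, again assuming an illustrative coefficient-list representation, the code below computes \(G''_X(1)\), confirms that it equals \(E(X^2) - E(X)\), and then applies \(\sigma^2 = G''_X(1) + \mu - \mu^2\). The result matches the die's known variance of \(35/12\).

```python
from fractions import Fraction

def derivative(coeffs):
    """Term-by-term derivative of sum c_x t^x."""
    return [x * c for x, c in enumerate(coeffs)][1:]

def evaluate(coeffs, t):
    """Evaluate the polynomial sum c_x t^x at t."""
    return sum(c * t**x for x, c in enumerate(coeffs))

die_coeffs = [Fraction(0)] + [Fraction(1, 6)] * 6

mu = evaluate(derivative(die_coeffs), 1)              # G'_X(1)
g2 = evaluate(derivative(derivative(die_coeffs)), 1)  # G''_X(1)

# Direct moments for comparison
e_x = sum(x * p for x, p in enumerate(die_coeffs))
e_x2 = sum(x * x * p for x, p in enumerate(die_coeffs))

print(g2, e_x2 - e_x)  # 35/3 35/3
variance = g2 + mu - mu**2
print(variance)        # 35/12
```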

Key Takeaway: PGFs turn complex summations into easy differentiation steps for finding the mean and variance.


4. Derivations of PGFs for Standard Distributions

You must know how to derive (or instantly recall) the PGFs for key distributions, as specified by the syllabus.

4.1 Bernoulli Distribution (Ber(p))

A Bernoulli variable \(X\) takes \(x=0\) (failure) or \(x=1\) (success).
\(P(X=0) = q\) (where \(q = 1-p\))
\(P(X=1) = p\)

\(G_X(t) = \sum t^x P(X=x)\)
\(G_X(t) = t^0 P(X=0) + t^1 P(X=1)\)
\(G_X(t) = 1 \cdot q + t \cdot p\)

Bernoulli PGF: \(G_X(t) = q + pt\)
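Applying the differentiation results from Section 3 to this PGF gives a quick worked check of the Bernoulli mean and variance:

```latex
\begin{aligned}
G_X(t) &= q + pt, \qquad G'_X(t) = p, \qquad G''_X(t) = 0 \\
\mu &= G'_X(1) = p \\
\sigma^2 &= G''_X(1) + \mu - \mu^2 = 0 + p - p^2 = p(1 - p) = pq
\end{aligned}
```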

4.2 Binomial Distribution (B(n, p))

A Binomial variable \(X\) counts the successes in \(n\) independent Bernoulli trials, i.e. it is the sum of \(n\) independent \(Ber(p)\) variables.
\(P(X=x) = \binom{n}{x} p^x q^{n-x}\) for \(x=0, 1, \dots, n\).

\(G_X(t) = \sum_{x=0}^{n} t^x P(X=x)\)
\(G_X(t) = \sum_{x=0}^{n} t^x \binom{n}{x} p^x q^{n-x}\)
\(G_X(t) = \sum_{x=0}^{n} \binom{n}{x} (pt)^x q^{n-x}\)

Recognise this structure? It is the Binomial Expansion of \((A+B)^n\), where \(A=q\) and \(B=pt\).

Binomial PGF: \(G_X(t) = (q + pt)^n\)
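As a numerical sketch of this identity, the code below expands \((q + pt)^n\) by repeated polynomial multiplication and compares each coefficient with \(\binom{n}{x} p^x q^{n-x}\). The values \(n = 5\) and \(p = 1/3\) are arbitrary illustrative choices, and `poly_mul` is a hypothetical helper written for this example.

```python
from fractions import Fraction
from math import comb

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of t)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

n, p = 5, Fraction(1, 3)
q = 1 - p

# Expand (q + pt)^n starting from the constant polynomial 1
expanded = [Fraction(1)]
for _ in range(n):
    expanded = poly_mul(expanded, [q, p])

binomial_pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
print(expanded == binomial_pmf)  # True
```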

4.3 Geometric Distribution (Geo(p))

A Geometric variable \(X\) is the number of trials needed to get the first success (\(x=1, 2, 3, \dots\)).
\(P(X=x) = q^{x-1} p\)

\(G_X(t) = \sum_{x=1}^{\infty} t^x P(X=x)\)
\(G_X(t) = \sum_{x=1}^{\infty} t^x q^{x-1} p\)
\(G_X(t) = p \sum_{x=1}^{\infty} t^x q^{x-1}\)

Expanding the series:
\(G_X(t) = p (t^1 q^0 + t^2 q^1 + t^3 q^2 + \dots)\)
\(G_X(t) = pt (1 + qt + (qt)^2 + (qt)^3 + \dots)\)

The bracketed expression is an infinite geometric series with first term \(a=1\) and common ratio \(r=qt\).
The sum of this series is \(\frac{a}{1-r} = \frac{1}{1 - qt}\), valid provided \(|qt| < 1\).

Geometric PGF: \(G_X(t) = \frac{pt}{1 - qt}\)
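A quick numerical sketch, assuming \(p = 0.3\) and \(t = 0.5\) (illustrative values with \(|qt| < 1\)): a long partial sum of \(\sum t^x q^{x-1} p\) should agree with the closed form \(\frac{pt}{1-qt}\) to within floating-point tolerance.

```python
import math

p, t = 0.3, 0.5
q = 1 - p

# Partial sum of t^x * q^(x-1) * p for x = 1..200; the terms shrink like (qt)^x
partial = sum(t**x * q**(x - 1) * p for x in range(1, 201))
closed_form = p * t / (1 - q * t)

print(math.isclose(partial, closed_form))  # True
```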

4.4 Uniform Distribution (Discrete)

Let \(X\) be a discrete uniform variable taking values \(1, 2, \dots, N\), each with probability \(1/N\).

\(G_X(t) = \sum_{x=1}^{N} t^x P(X=x) = \sum_{x=1}^{N} t^x \frac{1}{N}\)
\(G_X(t) = \frac{1}{N} (t^1 + t^2 + t^3 + \dots + t^N)\)

The expression in the bracket is a geometric series with \(a=t\), ratio \(r=t\), and \(N\) terms. The sum is \(\frac{a(1-r^N)}{1-r}\), valid for \(t \neq 1\); at \(t = 1\) the bracket is simply \(N\), so \(G_X(1) = 1\) as expected.

Uniform PGF: \(G_X(t) = \frac{t(1 - t^N)}{N(1 - t)}\)
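The closed form can be checked exactly against the direct sum using Python fractions, at an illustrative \(t = 1/2\) and \(N = 6\) (the fair die from Section 1). The check uses \(t \neq 1\) because the closed form divides by \(1 - t\); the function names are hypothetical.

```python
from fractions import Fraction

def uniform_pgf_direct(t, N):
    """(1/N)(t + t^2 + ... + t^N), straight from the definition."""
    return sum(t**x for x in range(1, N + 1)) / N

def uniform_pgf_closed(t, N):
    """t(1 - t^N) / (N(1 - t)); only valid for t != 1."""
    return t * (1 - t**N) / (N * (1 - t))

t, N = Fraction(1, 2), 6
print(uniform_pgf_direct(t, N), uniform_pgf_closed(t, N))  # 21/128 21/128
```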

Quick Review Box: Key PGFs to Memorise

Bernoulli \(Ber(p)\): \(G_X(t) = q + pt\)

Binomial \(B(n, p)\): \(G_X(t) = (q + pt)^n\)

Geometric \(Geo(p)\): \(G_X(t) = \frac{pt}{1 - qt}\)


5. Sum of Independent Random Variables

This is arguably the most powerful reason PGFs are used: they simplify the process of adding independent random variables.

Property 2: The PGF of a Sum

If \(X\) and \(Y\) are independent discrete random variables, and \(W = X + Y\), then the PGF of the sum \(W\) is simply the product of their individual PGFs.

\(G_{X+Y}(t) = G_X(t) G_Y(t)\)

Analogy: Imagine you have two separate data files (PGFs) for two different processes. If those processes are independent, you can combine their information just by multiplying the files together!
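To see the property in action, the sketch below (illustrative helper names) multiplies the die PGF from Section 1 by itself to get the PGF of the total of two independent fair dice, then compares the coefficient of \(t^7\) with a direct count over all 36 equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

def poly_mul(a, b):
    """Multiply two coefficient lists (index = power of t)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

die = [Fraction(0)] + [Fraction(1, 6)] * 6  # G_X(t) for one fair die
total = poly_mul(die, die)                  # G_{X+Y}(t) = G_X(t) G_Y(t)

# Direct count: outcomes (a, b) with a + b = 7, out of 36
direct = Fraction(sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == 7), 36)

print(total[7], direct)  # 1/6 1/6
```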

Application Example: Sum of Binomials

Suppose \(X_1 \sim B(n_1, p)\) and \(X_2 \sim B(n_2, p)\) are independent, measuring the number of successes in two different sets of trials (with the same probability \(p\)).
Let \(W = X_1 + X_2\).
\(G_{X_1}(t) = (q + pt)^{n_1}\)
\(G_{X_2}(t) = (q + pt)^{n_2}\)

\(G_W(t) = G_{X_1}(t) G_{X_2}(t) = (q + pt)^{n_1} \cdot (q + pt)^{n_2}\)
\(G_W(t) = (q + pt)^{n_1 + n_2}\)

Since the resulting PGF has the form of a Binomial PGF, and a PGF uniquely determines its distribution, we immediately know that the sum \(W\) follows a Binomial distribution:
\(W \sim B(n_1 + n_2, p)\).

This multiplication property makes combining distributions simple, whereas finding the distribution of the sum directly by convolution (summing \(P(X_1 = k)P(X_2 = w - k)\) over all \(k\) for each value \(w\)) can be lengthy.
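A small sketch of the binomial example, assuming illustrative values \(n_1 = 2\), \(n_2 = 3\) and \(p = 1/4\): multiplying the two coefficient lists (which is exactly the convolution of the two distributions) reproduces the coefficients of \((q + pt)^{n_1 + n_2}\). The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_coeffs(n, p):
    """Coefficients of (q + pt)^n, i.e. the B(n, p) probabilities."""
    q = 1 - p
    return [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

def poly_mul(a, b):
    """Product of two PGFs given as coefficient lists (a convolution)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

n1, n2, p = 2, 3, Fraction(1, 4)
g_w = poly_mul(binomial_coeffs(n1, p), binomial_coeffs(n2, p))
print(g_w == binomial_coeffs(n1 + n2, p))  # True
```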

Key Takeaway: When adding independent random variables, multiply their PGFs. This often helps identify the distribution of the resultant sum.

Important Reminders & Common Errors

1. Differentiation is key: To find the variance, remember to calculate both \(G'_X(1)\) (the mean, \(\mu\)) and \(G''_X(1)\) before using the formula: \(\sigma^2 = G''_X(1) + \mu - \mu^2\).

2. Independent Sums ONLY: The property \(G_{X+Y}(t) = G_X(t) G_Y(t)\) is guaranteed only when \(X\) and \(Y\) are independent. If they are not independent, you cannot rely on this relationship.

3. Check \(G_X(1) = 1\): If your derived PGF does not equal 1 when \(t=1\), you have made an error!