
The probability mass function and probability density, cdf, and survival functions are all ways of specifying the probability distribution of a random variable. They are all defined as probabilities or as probability per unit length, and thus have natural interpretations and visualizations.

But there are also more abstract ways of describing distributions. One that you have encountered is the probability generating function (pgf), which we defined for random variables with finitely many non-negative integer values.

We now define another such transform of a distribution. More general than the pgf, it is a powerful tool for studying distributions.

Let $X$ be a random variable. The moment generating function (mgf) of $X$ is a function defined on the real numbers by the formula

$$M_X(t) ~ = ~ E(e^{tX})$$

for all $t$ for which the expectation is finite. It is a fact (which we will not prove) that the domain of the mgf has to be an interval, not necessarily finite but necessarily including 0 because $M_X(0) = 1$.

For $X$ with finitely many non-negative integer values, we had defined the pgf by $G_X(s) = E(s^X)$. Notice that this is a special case of the mgf with $s = e^t$ and hence positive. For a random variable $X$ that has both a pgf $G_X$ and an mgf $M_X$, the two functions are related by $M_X(\log(s)) = G_X(s)$. Therefore the properties of $M_X$ near 0 reflect the properties of $G_X$ near 1.

This section presents three ways in which the mgf is useful. Other ways are demonstrated in the subsequent sections of this chapter. Much of what we say about mgf’s will not be accompanied by complete proofs as the math required is beyond the scope of this class. But the results should seem reasonable, even without formal proofs.

We will list the three ways first, and then use them all in examples.


19.2.1 Generating Moments

For non-negative integers $k$, the expectation $E(X^k)$ is called the $k$th moment of $X$. You saw in Data 8 and again in this course that the mean $E(X)$ is the center of gravity of the probability histogram of $X$. In physics, the center of mass is called the first moment. The terminology of moments is used in probability theory as well.

In this course we are only going to work with mgf’s that are finite in some interval around 0. The interval could be the entire real line. It is a fact that if the mgf is finite around 0 (not just to one side of 0), then all the moments exist.

Expand $e^{tX}$ to see that

$$
\begin{align*}
M_X(t) ~ &= ~ E \big( 1 + t \frac{X}{1!} + t^2 \frac{X^2}{2!} + t^3 \frac{X^3}{3!} + \cdots \big) \\
&= ~ 1 + t \frac{E(X)}{1!} + t^2 \frac{E(X^2)}{2!} + t^3 \frac{E(X^3)}{3!} + \cdots
\end{align*}
$$

by blithely switching the expectation and the infinite sum. This requires justification, which we won’t go into.

Continue to set aside questions about whether we can switch infinite sums with other operations. Just go ahead and differentiate $M_X$ term by term. Let $M_X^{(n)}$ denote the $n$th derivative. Then

$$M_X^{(1)}(t) ~ = ~ \frac{d}{dt} M_X(t) ~ = ~ \frac{E(X)}{1!} + 2t \frac{E(X^2)}{2!} + 3t^2 \frac{E(X^3)}{3!} + \cdots$$

and hence

$$M_X^{(1)}(0) ~ = ~ E(X)$$

Now differentiate $M_X^{(1)}$ to see that $M_X^{(2)}(0) = E(X^2)$, and, by induction,

$$M_X^{(n)}(0) ~ = ~ E(X^n), ~~~~ n = 1, 2, 3, \ldots$$

Hence we can generate the moments of $X$ by evaluating successive derivatives of $M_X$ at $t = 0$. This is one way in which mgf's are helpful.
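
As a quick sanity check of this recipe, here is a short SymPy sketch. The mgf $\lambda/(\lambda - t)$ used below is the exponential $(\lambda)$ mgf, an assumed example (it is the $r = 1$ case of the gamma mgf derived later in this section); differentiating it repeatedly at 0 should return $E(X^n) = n!/\lambda^n$.

```python
# A minimal sketch (not part of the text): generate moments by differentiating
# an mgf at t = 0. The exponential (lam) mgf lam/(lam - t) is an assumed example.
from sympy import symbols, diff, simplify

t, lam = symbols('t lam', positive=True)
mgf = lam / (lam - t)

for n in (1, 2, 3):
    moment = simplify(diff(mgf, t, n).subs(t, 0))   # nth derivative at 0
    print(n, moment)                                # 1/lam, 2/lam**2, 6/lam**3
```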

19.2.2 Identifying the Distribution

In this class we have made heavy use of the first and second moments, and no use at all of the higher moments. That will continue to be the case. But mgf’s do involve all the moments, and this results in a property that is very useful for proving facts about distributions.

If two distributions have the same mgf, then they must be the same distribution. This property is valid if the mgf exists in an interval around 0, which we assumed earlier in this section.

For example, if you recognize the mgf of a random variable as the mgf of a normal distribution, then the random variable must be normal.

By contrast, if you know the expectation of a random variable you can’t identify the distribution of the random variable; even if you know both the mean and the SD (equivalently, the first and second moments), you can’t identify the distribution. But if you know the moment generating function, and hence all the moments, then you can.

19.2.3 Working Well with Sums

The third reason mgf's are useful is that, like the pgf, the mgf of the sum of independent random variables is easily computed as a product.

Let $X$ and $Y$ be independent. Then

$$M_{X+Y}(t) ~ = ~ E(e^{t(X+Y)}) ~ = ~ E(e^{tX} \cdot e^{tY})$$

Since $X$ and $Y$ are independent, so are $e^{tX}$ and $e^{tY}$, and the expectation of their product is the product of their expectations. So if $X$ and $Y$ are independent,

$$M_{X+Y}(t) ~ = ~ M_X(t) M_Y(t)$$
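
Here is a minimal numerical sketch of this property, using Monte Carlo estimates of the mgf's at a single value of $t$. The choice of independent Poisson variables and the value of $t$ are assumptions made only for illustration.

```python
# Estimate M_{X+Y}(t) and M_X(t) * M_Y(t) by simulation; they should be close.
# The distributions and the value of t below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, t = 1_000_000, 0.3

x = rng.poisson(2, n)
y = rng.poisson(3, n)

lhs = np.mean(np.exp(t * (x + y)))                      # estimate of M_{X+Y}(t)
rhs = np.mean(np.exp(t * x)) * np.mean(np.exp(t * y))   # estimate of M_X(t)M_Y(t)

print(lhs, rhs)
```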

It’s time for some examples. Remember that the mgf of XX is the expectation of a function of XX. In some cases we will calculate it using the non-linear function rule for expectations. In other cases we will use the multiplicative property of the mgf of the sum of independent random variables.

19.2.4 MGFs of Some Discrete Random Variables

19.2.4.1 Bernoulli $(p)$

$P(X = 1) = p$ and $P(X = 0) = 1 - p = q$. So

$$M_X(t) ~ = ~ qe^{t \cdot 0} + pe^{t \cdot 1} ~ = ~ q + pe^t ~ = ~ 1 + p(e^t - 1) ~~~ \text{for all } t$$

19.2.4.2 Binomial $(n, p)$

A binomial random variable is the sum of $n$ i.i.d. indicators. So

$$M_X(t) ~ = ~ (q + pe^t)^n ~~~ \text{for all } t$$
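
To see the moment-generating use of the mgf on this formula, here is a hedged SymPy sketch that differentiates $(q + pe^t)^n$ at $t = 0$ and recovers the familiar values $E(X) = np$ and $E(X^2) = npq + (np)^2$.

```python
# Differentiate the binomial (n, p) mgf at t = 0 to confirm the first two moments.
from sympy import symbols, exp, diff, simplify

t, n, p = symbols('t n p', positive=True)
q = 1 - p
mgf = (q + p * exp(t))**n

first = simplify(diff(mgf, t, 1).subs(t, 0))     # E(X)
second = simplify(diff(mgf, t, 2).subs(t, 0))    # E(X^2)

print(first)                                     # n*p
print(simplify(second - (n*p*q + (n*p)**2)))     # 0, so E(X^2) = npq + (np)^2
```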

19.2.4.3 Poisson $(\mu)$

This one is an exercise.

$$M_X(t) ~ = ~ e^{\mu(e^t - 1)} ~~~ \text{for all } t$$

You can also use this to show that the sum of independent Poisson variables is Poisson.
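
Here is a small numerical sketch of the exercise, not a derivation: truncate the sum $E(e^{tX}) = \sum_{k \ge 0} e^{tk} e^{-\mu} \mu^k / k!$ for one assumed choice of $\mu$ and $t$, and compare it with $e^{\mu(e^t - 1)}$.

```python
# Truncated-sum check of the Poisson (mu) mgf at one value of t.
# The values of mu and t are illustrative assumptions.
import math

mu, t = 3.0, 0.5

truncated = sum(math.exp(t * k) * math.exp(-mu) * mu**k / math.factorial(k)
                for k in range(100))             # 100 terms is plenty here
closed_form = math.exp(mu * (math.exp(t) - 1))

print(truncated, closed_form)                    # agree to many decimal places
```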

19.2.5 MGF of a Gamma $(r, \lambda)$ Random Variable

Let $X$ have the gamma $(r, \lambda)$ distribution. Then

$$
\begin{align*}
M_X(t) ~ &= ~ \int_0^\infty e^{tx} \frac{\lambda^r}{\Gamma(r)} x^{r-1} e^{-\lambda x} dx \\
&= ~ \frac{\lambda^r}{\Gamma(r)} \int_0^\infty x^{r-1} e^{-(\lambda - t)x} dx \\
&= ~ \frac{\lambda^r}{\Gamma(r)} \cdot \frac{\Gamma(r)}{(\lambda - t)^r} ~~~~ t < \lambda \\
&= ~ \big( \frac{\lambda}{\lambda - t} \big)^r ~~~~ t < \lambda
\end{align*}
$$
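
As a numerical check of this calculation (the values of $r$, $\lambda$, and $t < \lambda$ below are assumptions for illustration), we can integrate $e^{tx}$ against the gamma $(r, \lambda)$ density with SciPy and compare with the closed form.

```python
# Numerically integrate e^{tx} times the gamma (r, lam) density and compare
# with (lam / (lam - t))^r. The values of r, lam, and t are illustrative.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn

r, lam, t = 2.5, 2.0, 0.7                        # requires t < lam

def integrand(x):
    return np.exp(t * x) * lam**r / gamma_fn(r) * x**(r - 1) * np.exp(-lam * x)

numeric, _ = quad(integrand, 0, np.inf)
closed_form = (lam / (lam - t))**r

print(numeric, closed_form)
```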

19.2.6 Sums of Independent Gamma Variables with the Same Rate

If $X$ has the gamma $(r, \lambda)$ distribution and $Y$ independent of $X$ has the gamma $(s, \lambda)$ distribution, then

$$
\begin{align*}
M_{X+Y}(t) ~ &= ~ \big( \frac{\lambda}{\lambda - t} \big)^r \cdot \big( \frac{\lambda}{\lambda - t} \big)^s ~~~~ t < \lambda \\
&= ~ \big( \frac{\lambda}{\lambda - t} \big)^{r+s} ~~~~ t < \lambda
\end{align*}
$$

That’s the mgf of the gamma $(r+s, \lambda)$ distribution. Because the mgf identifies the distribution, $X+Y$ must have the gamma $(r+s, \lambda)$ distribution.

This is what we observed in an earlier section by simulation, using numerical values of $r$ and $\lambda$.
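
Here is one way such a simulation might look (the values of $r$, $s$, and $\lambda$ below are assumptions for illustration): simulate independent gamma $(r, \lambda)$ and gamma $(s, \lambda)$ variables and compare the distribution of the sum with the gamma $(r+s, \lambda)$ cdf.

```python
# Simulate X + Y for independent gamma variables with the same rate and compare
# with the gamma (r+s, lam) distribution via a Kolmogorov-Smirnov statistic.
# The parameter values are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
r, s, lam = 3.0, 2.5, 4.0
n = 100_000

# numpy and scipy parametrize the gamma by shape and scale = 1 / rate
x = rng.gamma(shape=r, scale=1/lam, size=n)
y = rng.gamma(shape=s, scale=1/lam, size=n)

result = stats.kstest(x + y, stats.gamma(a=r + s, scale=1/lam).cdf)
print(result.statistic)                          # small value: distributions agree
```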


19.2.7 Note on Existence

Let $X$ be a random variable. For all $t$, the random variable $e^{tX}$ is positive, so $M_X(t)$ is either positive or $+\infty$.

The rough statements below should give you a sense of the connection between the tails of the distribution of XX and the existence of the mgf. We will not cover the proofs.

If $t > 0$ then $e^{tX}$ is large for large positive values of $X$. So if $M_X(t)$ is finite for a positive $t$, then the right hand tail of the distribution of $X$ can't be heavy.

If $t < 0$ then $e^{tX}$ is large for large negative values of $X$. So if $M_X(t)$ is finite for a negative $t$, then the left hand tail of the distribution of $X$ can't be heavy.

So if $M_X(t)$ is finite for a positive value of $t$ as well as for a negative value of $t$, then neither of the tails is heavy.

It can be shown that if $M_X(t)$ is finite for some $t$, then $M_X(s)$ is finite for all $s$ between 0 and $t$. So $M_X(t)$ being finite for a positive $t$ as well as for a negative $t$ is equivalent to $M_X$ being finite on an interval around 0. The interval might be very small, but as long as it straddles 0, all the properties listed in this section hold.