
Once we have a random variable, we often want to work with functions of it. For example, if a random variable is an estimator, we usually want to see how far it is from the value it is trying to estimate. For instance, we might want to see how far a random variable $X$ is from the number 10. That's a function of $X$. Let's call it $Y$. Then

$$Y = |X - 10|$$

This section is about finding the expectation of a function of a random variable whose distribution you know. Throughout, we will assume that all the expectations that we are discussing are well defined.

In what follows, let $X$ be a random variable whose distribution (and hence also expectation) is known.

8.3.1 Linear Function Rule

Let $X$ be a random variable with expectation $E(X)$ and let $Y = aX + b$ for some constants $a$ and $b$.

This kind of transformation happens, for example, when you change units of measurement.

  • If you switch from Celsius to Fahrenheit, then $a = 9/5$ and $b = 32$.

  • If you switch from inches to centimeters, then $a = 2.54$ and $b = 0$.

We can find $E(Y)$ by applying the definition of expectation on the domain $\Omega$. For every $\omega \in \Omega$, we have $Y(\omega) = aX(\omega) + b$. So

$$
\begin{align*}
E(Y) ~ &= ~ \sum_{\text{all } \omega} (aX(\omega)+b)P(\omega) \\
&= ~ a\sum_{\text{all } \omega} X(\omega)P(\omega) ~ + ~ b\sum_{\text{all } \omega} P(\omega) \\
&= ~ aE(X) + b
\end{align*}
$$

For example, $E(2X - 3) = 2E(X) - 3$. Also $E(X/2) = E(X)/2$, and $E(1 - X) = 1 - E(X)$.

The expectation of a linear transformation of XX is the linear transformation of the expectation of XX. This is a handy result as we will often be transforming variables linearly.
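
As a quick numerical check, here is a minimal sketch in plain NumPy (using a small hypothetical distribution, the same one that reappears in the examples below) comparing $E(aX+b)$ computed from the definition with $aE(X) + b$:

import numpy as np

# A small hypothetical distribution for X: possible values and their probabilities
x = np.array([1, 2, 3, 4, 5])
probs = np.array([0.15, 0.25, 0.3, 0.2, 0.1])

ev_X = np.sum(x * probs)                    # E(X) by the definition of expectation

# A linear change of units, e.g. Celsius to Fahrenheit
a, b = 9/5, 32
ev_Y_direct = np.sum((a*x + b) * probs)     # E(aX + b) computed from the definition
ev_Y_rule = a*ev_X + b                      # E(aX + b) by the linear function rule
ev_Y_direct, ev_Y_rule                      # the two agree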

But expectation behaves differently under non-linear transformation.


8.3.2 Non-linear Function Rule

Now let $Y = g(X)$ where $g$ is any numerical function. Remember that $X$ is a function on $\Omega$. So the function that defines the random variable $Y$ is a composition:

$$Y(\omega) = (g \circ X)(\omega) \qquad \text{for } \omega \in \Omega$$

This allows us to write $E(Y)$ in three equivalent ways:

On the range of $Y$

$$E(Y) = \sum_{\text{all } y} yP(Y=y)$$

On the domain $\Omega$

$$E(Y) = E(g(X)) = \sum_{\omega \in \Omega} (g \circ X)(\omega)P(\omega)$$

On the range of $X$

$$E(Y) = E(g(X)) = \sum_{\text{all } x} g(x)P(X=x)$$

As before, it is a straightforward matter of grouping to show that all the forms are equivalent.
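
For instance, the third form follows from the second by grouping the outcomes $\omega$ according to the value of $X(\omega)$:

$$\sum_{\omega \in \Omega} (g \circ X)(\omega)P(\omega) ~ = ~ \sum_{\text{all } x} \sum_{\omega: X(\omega) = x} g(x)P(\omega) ~ = ~ \sum_{\text{all } x} g(x)P(X=x)$$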

The first form looks the simplest, but there’s a catch: you need to first find $P(Y=y)$. The second form involves an unnecessarily high level of detail.

The third form is the one to use. It uses the known distribution of $X$. It says that to find $E(Y)$ where $Y = g(X)$ for some function $g$:

  • Take a generic value $x$ of $X$.

  • Apply $g$ to $x$; this $g(x)$ is a generic value of $Y$.

  • Weight $g(x)$ by $P(X=x)$, which is known.

  • Do this for all $x$ and add. The sum is $E(Y)$.

The crucial thing to note about this method is that we didn’t have to first find the distribution of YY. That saves us a lot of work.
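
As a sketch, this recipe can be packaged as a small helper function; the name `ev_of_function` and the array representation of the distribution are just for illustration here, not part of any library:

import numpy as np

def ev_of_function(x, probs, g):
    """E(g(X)) for a distribution given by arrays of possible values and probabilities."""
    return np.sum(g(np.asarray(x)) * np.asarray(probs))   # weight each g(x) by P(X=x), then add

# For example, E(|X - 10|) for a hypothetical X that is uniform on 8, 9, 10, 11:
ev_of_function([8, 9, 10, 11], [0.25, 0.25, 0.25, 0.25], lambda x: np.abs(x - 10))   # 1.0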


Let’s see how our method works in some examples.

8.3.3 $Y = |X - 3|$

Let $X$ have a distribution we worked with earlier:

import numpy as np
from datascience import *   # Table, make_array
from prob140 import *       # distribution methods on Table

x = np.arange(1, 6)
probs = make_array(0.15, 0.25, 0.3, 0.2, 0.1)
dist = Table().values(x).probabilities(probs)
dist = dist.relabel('Value', 'x').relabel('Probability', 'P(X=x)')
dist

Let $g$ be the function defined by $g(x) = |x - 3|$, and let $Y = g(X)$. In other words, $Y = |X - 3|$.

To calculate $E(Y)$, we first have to create a column that transforms the values of $X$ into values of $Y$:

dist_with_Y = dist.with_column('g(x)', np.abs(dist.column('x')-3)).move_to_end('P(X=x)')

dist_with_Y

To get $E(Y)$, find the appropriate weighted average: multiply the g(x) and P(X=x) columns, and add. The calculation shows that $E(Y) = 0.95$.

ev_Y = sum(dist_with_Y.column('g(x)') * dist_with_Y.column('P(X=x)'))
ev_Y
0.94999999999999996

8.3.4 $Y = \min(X, 3)$

Let $X$ be as above, but now let $Y = \min(X, 3)$. We want $E(Y)$. What we know is the distribution of $X$:

dist

To find $E(Y)$ we can just go row by row and replace the value of $x$ by the value of $\min(x, 3)$, and then find the weighted average:

ev_Y = 1*0.15 + 2*0.25 + 3*0.3 + 3*0.2 + 3*0.1
ev_Y
2.45
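
The tabular method of the previous example gives the same answer; here is a sketch, assuming `dist` is the distribution table defined above and using `np.minimum` for the row-by-row replacement:

# Replace each value x by min(x, 3), then take the weighted average
dist_with_min = dist.with_column('g(x)', np.minimum(dist.column('x'), 3)).move_to_end('P(X=x)')
ev_Y = sum(dist_with_min.column('g(x)') * dist_with_min.column('P(X=x)'))
ev_Y   # approximately 2.45, matching the calculation above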

8.3.5 $E(X(X-1))$ for a Poisson Variable $X$

Let $X$ have the Poisson $(\mu)$ distribution.

$$
\begin{align*}
E(X(X-1)) &= \sum_{k=0}^\infty k(k-1) e^{-\mu} \frac{\mu^k}{k!} \\ \\
&= e^{-\mu} \mu^2 \sum_{k=2}^\infty \frac{\mu^{k-2}}{(k-2)!} \\ \\
&= e^{-\mu} \mu^2 e^\mu \\ \\
&= \mu^2
\end{align*}
$$
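
As a numerical sanity check, here is a sketch that approximates the infinite sum by truncating it at a large value of $k$; the use of `scipy.stats` and the choice $\mu = 3$ are just for illustration:

from scipy import stats
import numpy as np

mu = 3                                        # an illustrative value of the parameter
k = np.arange(101)                            # truncate the sum; the tail beyond 100 is negligible
np.sum(k * (k-1) * stats.poisson.pmf(k, mu)), mu**2   # both are (essentially) 9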

In the next section we will use this to find $E(X^2)$. For now, notice that

$$E(X^2) ~ = ~ \sum_{k=0}^\infty k^2 e^{-\mu} \frac{\mu^k}{k!} ~ \ge ~ \sum_{k=0}^\infty k(k-1) e^{-\mu} \frac{\mu^k}{k!} ~ = ~ E(X(X-1)) ~ = ~ \mu^2$$

Since $E(X) = \mu$, we have $E(X^2) \ge (E(X))^2$. We will see later that this inequality is true for all random variables for which the expected square is finite.