
Let $T$ be a random variable, and let $S$ be a random variable defined on the same space as $T$. As we have seen, conditioning on $S$ might be a good way to find probabilities for $T$ if $S$ and $T$ are related. In this section we will see that conditioning on $S$ can also be a good way to find the expectation of $T$.

We will start with a simple example to illustrate the ideas.


Let the joint distribution of $T$ and $S$ be as in the table below.

from prob140 import Table  # prob140's Table provides values and probabilities

t = [3, 4]
s = [5, 6, 7]
pp = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
jt_dist = Table().values('T', t, 'S', s).probabilities(pp)
jt_dist

How can $S$ be involved in the calculation of $E(T)$?

Notice that to find $E(T)$, you could use the joint distribution table and the definition of expectation as follows:

3*(0.3 + 0.2 + 0.1) + 4*(0.1 + 0.2 + 0.1) 
3.4

This is equivalent to going to each cell of the table, weighting the value of $T$ in that cell with the probability in the cell, and then adding. Here’s another way of looking at this.
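As a check, the same cell-by-cell calculation can be done in plain Python, without the prob140 library used above; the joint probabilities below are transcribed from the table.

```python
# Joint distribution P(T = t, S = s), transcribed from the table above.
joint = {
    (3, 5): 0.1, (3, 6): 0.2, (3, 7): 0.3,
    (4, 5): 0.1, (4, 6): 0.2, (4, 7): 0.1,
}

# Weight the value of T in each cell by the cell's probability, then add.
ev_T = sum(t * p for (t, s), p in joint.items())
print(round(ev_T, 10))   # 3.4
```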

Let’s condition on $S$:

jt_dist.conditional_dist('T', 'S')

Each of the three conditional distributions is a distribution in its own right. Therefore its histogram has a balance point, just as the marginal distribution of $T$ does.

jt_dist.conditional_dist('T', 'S', show_ev=True)

You can see $E(T) = 3.4$ in the row corresponding to the distribution of $T$. And you can also see the conditional expectation of $T$ given each possible value of $S$:

  • $E(T \mid S=5) = 3.5$

  • $E(T \mid S=6) = 3.5$

  • $E(T \mid S=7) = 3.25$

This defines a function of $S$: for each value $s$ of $S$, the function returns $E(T \mid S=s)$.

ev_T_given_S = Table().with_columns(
    's', s,
    'E(T | S = s)', [3.5, 3.5, 3.25],
    'P(S = s)', [0.2, 0.4, 0.4]
)
ev_T_given_S

This function of $S$ is called the conditional expectation of $T$ given $S$ and is denoted $E(T \mid S)$. Unlike an expectation, which is a number, a conditional expectation is a random variable.

As it’s a random variable, it has an expectation, which we can calculate using the non-linear function rule. The answer is a quantity that you will recognize.

ev = sum(ev_T_given_S.column('E(T | S = s)')*ev_T_given_S.column('P(S = s)'))
ev
3.4000000000000004

That’s right: it’s the expectation of $T$.

What we have learned from this is that $E(T)$ is the average of the conditional expectations of $T$ given the different values of $S$, weighted by the probabilities of those values.

In short, $E(T)$ is the expectation of the conditional expectation of $T$ given $S$.
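The whole calculation — conditional expectations from the joint distribution, then the weighted average — can be sketched in plain Python; the joint probabilities are the ones from the table at the start of the section.

```python
# Joint distribution P(T = t, S = s) from the table earlier in the section.
joint = {
    (3, 5): 0.1, (3, 6): 0.2, (3, 7): 0.3,
    (4, 5): 0.1, (4, 6): 0.2, (4, 7): 0.1,
}

# Marginal distribution of S.
p_S = {}
for (t, s), p in joint.items():
    p_S[s] = p_S.get(s, 0.0) + p

# E(T | S = s) for each value s of S.
ev_T_given_s = {
    s: sum(t * p for (t, s2), p in joint.items() if s2 == s) / p_S[s]
    for s in sorted(p_S)
}

# E(E(T | S)): weight each conditional expectation by P(S = s).
ev = sum(ev_T_given_s[s] * p_S[s] for s in p_S)
print(ev)   # 3.4 up to floating-point error: the expectation of T
```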

9.2.1 Conditional Expectation as a Random Variable

In general, suppose $T$ and $S$ are two random variables on a probability space.

Then for each fixed value $s$, $T$ has a conditional distribution given $S=s$. This is an ordinary distribution and has an expectation. That is called the conditional expectation of $T$ given $S=s$ and is denoted $E(T \mid S=s)$.

So for each $s$, there is a value $E(T \mid S=s)$. This defines a function of the random variable $S$. It is called the conditional expectation of $T$ given $S$, and is denoted $E(T \mid S)$.

The key difference between expectation and conditional expectation:

  • $E(T)$, the expectation of $T$, is a real number.

  • $E(T \mid S)$, the conditional expectation of $T$ given $S$, is a function of $S$ and hence is a random variable.


Since $E(T \mid S)$ is a random variable, it has an expectation. That expectation is equal to $E(T)$. We observed this in an example; now here is a proof.

9.2.2 Iterated Expectations

Suppose we want the expectation of a random variable, and suppose it is easy for us to say what that expectation would be if we were given the value of a related random variable. The rule of iterated expectations says that we can find that conditional expectation first, and take its expectation to get our answer.

Formally, let $S$ and $T$ be two random variables on the same space. Then

$$E(T) = E(E(T \mid S))$$

Proof:

$$
\begin{align*}
E(T) &= \sum_{\text{all }t} tP(T=t) \\
&= \sum_{\text{all }t} t \sum_{\text{all }s} P(S=s, T=t) \\
&= \sum_{\text{all }t} t \sum_{\text{all }s} P(S=s)P(T=t \mid S=s) \\
&= \sum_{\text{all }s} \Big( \sum_{\text{all }t} tP(T=t \mid S=s) \Big) P(S=s) \\
&= \sum_{\text{all }s} E(T \mid S=s)P(S=s) \\
&= E(E(T \mid S))
\end{align*}
$$
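The algebra in the proof can also be checked numerically on an arbitrary joint distribution, not just the one in the worked example. This is a sketch; the grid of values and the random weights below are illustrative choices.

```python
import random

random.seed(7)

# A random joint distribution on a 3 x 4 grid of (t, s) values.
ts = [1, 2, 3]
ss = [10, 20, 30, 40]
weights = [[random.random() for _ in ss] for _ in ts]
total = sum(sum(row) for row in weights)
joint = {(t, s): weights[i][j] / total
         for i, t in enumerate(ts) for j, s in enumerate(ss)}

# Left side: E(T) computed directly from the joint distribution.
ev_T = sum(t * p for (t, s), p in joint.items())

# Right side: the sum over s of E(T | S = s) P(S = s).
p_S = {s: sum(p for (t, s2), p in joint.items() if s2 == s) for s in ss}
rhs = sum(
    (sum(t * p for (t, s2), p in joint.items() if s2 == s) / p_S[s]) * p_S[s]
    for s in ss
)

print(abs(ev_T - rhs) < 1e-12)   # True: the two sides agree
```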

9.2.3 Example: Random Sums

Let $X_1, X_2, \ldots$ be i.i.d. and let $E(X_1) = \mu_X$. Let $N$ be a non-negative integer valued random variable that is independent of the sequence of $X$’s, and let $E(N) = \mu_N$.

Define the random sum $S$ to be

$$S = X_1 + X_2 + \cdots + X_N$$

where $S = 0$ if $N = 0$.

Notice that $S$ is the sum of a random number of terms.

Question: What is $E(S)$?

Answer: If $N$ were the constant 10, then the answer would be $10\mu_X$. This is our signal to condition on $N$. Here are the steps to follow.

  • First condition on a fixed value of $N$. Given $N=n$, $S$ is the sum of $n$ i.i.d. terms. Hence

$$E(S \mid N=n) = n\mu_X$$

This is an equality of real numbers. Note that it is true for all $n$, including 0.

  • Next write the conditional expectation in random variable notation.

$$E(S \mid N) = N\mu_X$$

This is an equality of random variables.

  • Now use iterated expectations.

$$E(S) = E(E(S \mid N)) = E(N\mu_X) = E(N)\mu_X = \mu_N\mu_X$$

This is a natural answer. It is the expected number of terms being added times the expected size of each of those terms.

This is an important point to note about calculating expectations by conditioning. The natural answer is often correct.
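A quick simulation illustrates the result. The specific distributions here — $N$ uniform on $\{0, 1, 2, 3, 4\}$ (so $\mu_N = 2$) and each $X_i$ a fair die roll (so $\mu_X = 3.5$) — are illustrative choices, not from the text.

```python
import random

random.seed(2023)

def random_sum():
    # N uniform on {0, 1, 2, 3, 4}: mu_N = 2 (illustrative choice).
    n = random.randrange(5)
    # Each X_i is a fair die roll: mu_X = 3.5 (illustrative choice).
    return sum(random.randrange(1, 7) for _ in range(n))

reps = 100_000
avg = sum(random_sum() for _ in range(reps)) / reps
print(avg)   # should be close to mu_N * mu_X = 2 * 3.5 = 7
```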

9.2.4 Example: Population Size in a Branching Process

In a Galton-Watson branching process, each individual has a random number of progeny. Assume that the numbers of progeny of the different individuals are i.i.d. with mean $\mu$. Suppose the process starts with one individual in Generation 0.

Question: Assuming that there are no deaths, what is the expected total number of individuals in Generations 0 through $n$?

Answer: Let $T_k$ be the number of individuals born in Generation $k$. We are assuming $T_0 = 1$. By the example above, for each $k \ge 1$,

$$E(T_k) = E(T_{k-1})\mu$$

So by induction, for each $k \ge 1$ the expected number of individuals in Generation $k$ is

$$E(T_k) = \mu^k$$

Indeed, the result is true for $k = 0$ as well. So the expected total number of individuals in Generations 0 through $n$ is

$$\sum_{k=0}^n \mu^k = \begin{cases} n+1 & \text{if } \mu = 1 \\ \frac{1 - \mu^{n+1}}{1 - \mu} = \frac{\mu^{n+1} - 1}{\mu - 1} & \text{if } \mu \ne 1 \end{cases}$$

The value of $\mu$, the expected number of progeny of a single individual, determines how this expected total behaves as $n$ gets large. Even with no deaths, if $\mu < 1$ the expected population size tends to the positive constant $\frac{1}{1-\mu}$ as $n \to \infty$. But if $\mu \ge 1$ then the expected population size explodes.
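The three regimes can be seen by computing the expected total for a few values of $\mu$; this is a minimal sketch, where `expected_total` simply evaluates the sum above.

```python
def expected_total(mu, n):
    """Expected number of individuals in Generations 0 through n."""
    return sum(mu ** k for k in range(n + 1))

# mu < 1: the expected total settles near 1 / (1 - mu).
print(expected_total(0.5, 50))    # close to 2
# mu = 1: the expected total is n + 1.
print(expected_total(1, 50))      # 51
# mu > 1: the expected total grows geometrically in n.
print(expected_total(1.5, 50))
```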

This is closely related to the $R_0$ value you might have read about in the context of the Covid-19 pandemic. $R_0$ denotes the average number of people infected by a single infected individual, and is the equivalent of $\mu$ in our example. The assumptions of the Covid-19 models are more complex than ours, but the conclusion is the same: for the epidemic to be under control, $R_0$ has to be below 1.

9.2.5 Other Properties of Conditional Expectation

The most important property of conditional expectation is the iteration that we have studied in this section. But conditional expectation has other properties that are analogous to those of expectation. They are now expressed as equalities of random variables instead of equalities of real numbers.

Go through the list and notice that all the moves you’d naturally want to make are justified. The proofs are routine; we won’t go through them.

  • Additivity. $E(T+U \mid S) = E(T \mid S) + E(U \mid S)$

  • Linear Transformation. $E(aT+b \mid S) = aE(T \mid S) + b$

Two more properties formalize the idea that the variable that is given can be treated as a constant in conditional expectations.

  • “Constant”: Let $g$ be a function. Then $E(g(S) \mid S) = g(S)$.

  • “Pulling out a Constant”: $E(g(S)T \mid S) = g(S)E(T \mid S)$.

For example,

$$E(3ST + \log(S)U + 7 \mid S) = 3SE(T \mid S) + \log(S)E(U \mid S) + 7$$

though we sincerely hope you won’t encounter a random variable as bizarre as this.
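As a sanity check on “pulling out a constant”, here is a plain-Python verification on the joint distribution from the start of the section: for every value $s$, $E(ST \mid S=s)$ should equal $s\,E(T \mid S=s)$.

```python
# Joint distribution P(T = t, S = s) from the table earlier in the section.
joint = {
    (3, 5): 0.1, (3, 6): 0.2, (3, 7): 0.3,
    (4, 5): 0.1, (4, 6): 0.2, (4, 7): 0.1,
}

# Marginal distribution of S.
p_S = {}
for (t, s), p in joint.items():
    p_S[s] = p_S.get(s, 0.0) + p

max_diff = 0.0
for s in p_S:
    # E(T | S = s) and E(S*T | S = s) from the conditional distribution.
    ev_T = sum(t * p for (t, s2), p in joint.items() if s2 == s) / p_S[s]
    ev_ST = sum(s2 * t * p for (t, s2), p in joint.items() if s2 == s) / p_S[s]
    max_diff = max(max_diff, abs(ev_ST - s * ev_T))

print(max_diff)   # essentially 0: S pulls out of the conditional expectation
```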