
Let $X$ have density $f$. Let $g$ be a real-valued function on the real line, and suppose you want to find $E(g(X))$. Then you can follow a procedure analogous to the non-linear function rule we developed for finding expectations of functions of discrete random variables.

  • Write a generic value of $X$: that’s $x$.

  • Apply the function $g$ to get $g(x)$.

  • Weight $g(x)$ by the chance that $X$ is just around $x$, resulting in the product $g(x) \cdot f(x)dx$.

  • “Sum” over all $x$, that is, integrate.

The expectation is

$$
E(g(X)) ~ = ~ \int_{-\infty}^{\infty} g(x)\cdot f(x)dx
$$
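As a numerical sketch of this formula, the integral can be handed to `scipy.integrate.quad`. The helper name `expectation` is hypothetical, and the example (a standard normal $X$ with $g(x) = x^2$) is our own choice: since a standard normal has mean 0 and variance 1, the result should be very close to $E(X^2) = 1$.

```python
import numpy as np
from scipy import integrate, stats

def expectation(g, density, lower=-np.inf, upper=np.inf):
    """Approximate E(g(X)) by numerically integrating g(x) * f(x) dx."""
    value, _abs_err = integrate.quad(lambda x: g(x) * density(x), lower, upper)
    return value

# Illustration: X standard normal and g(x) = x^2, so E(g(X)) = E(X^2) = 1.
ex2 = expectation(lambda x: x**2, stats.norm.pdf)
print(ex2)
```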

Technical Note: We must be careful here as $g$ is an arbitrary function and the integral above need not exist. If $g$ is non-negative, then the integral is either finite or diverges to $+\infty$, but it doesn’t oscillate. So if $g$ is non-negative, define

$$
E(g(X)) ~ = ~ \int_{-\infty}^{\infty} g(x)\cdot f(x)dx ~~~ \text{provided the integral is finite.}
$$

For a general $g$, first check whether $E(\lvert g(X) \rvert)$ is finite, that is, whether

$$
\int_{-\infty}^{\infty} \lvert g(x) \rvert \cdot f(x)dx ~ < ~ \infty
$$

If it is finite, then there is a theorem that says $\int_{-\infty}^{\infty} g(x)\cdot f(x)dx$ exists, so it makes sense to define

$$
E(g(X)) ~ = ~ \int_{-\infty}^{\infty} g(x)\cdot f(x)dx
$$

Non-technical Note: In almost all of our examples, we will not be faced with questions about the existence of integrals. For example, if the set of possible values of $g(X)$ is bounded, then its expectation exists. But we will see a few examples of random variables that don’t have expectations. Such random variables have “heavy tails” and are important in many applications.
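To see what a divergent integral looks like, here is a sketch with a heavy-tailed density of our own choosing, the standard Cauchy density $f(x) = \frac{1}{\pi(1+x^2)}$, and the non-negative function $g(x) = \lvert x \rvert$. The integral of $\lvert x \rvert f(x)$ over $(-m, m)$ equals $\log(1+m^2)/\pi$, which grows without bound as $m$ increases, so $E(\lvert X \rvert) = \infty$ and, by the Technical Note, $E(X)$ is not defined. The code checks the closed form numerically for a few cutoffs $m$; the helper names are hypothetical.

```python
import numpy as np
from scipy import integrate

def cauchy_density(x):
    # Standard Cauchy density: an example of a heavy-tailed distribution.
    return 1 / (np.pi * (1 + x**2))

def truncated_abs_moment(m):
    # Integral of |x| f(x) over (-m, m); by symmetry, twice the integral over (0, m).
    half, _abs_err = integrate.quad(lambda x: x * cauchy_density(x), 0, m, limit=200)
    return 2 * half

cutoffs = [10, 100, 1_000]
values = [truncated_abs_moment(m) for m in cutoffs]
closed_forms = [np.log(1 + m**2) / np.pi for m in cutoffs]   # exact value of the same integral
for m, v, exact in zip(cutoffs, values, closed_forms):
    print(m, v, exact)   # keeps growing as the cutoff m increases
```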

All the properties of means, variances, and covariances that we proved for discrete variables are still true. The proofs need to be rewritten for random variables with densities, but we won’t take the time to do that. Just use the properties as you did before. The Central Limit Theorem holds as well.

15.3.1 Uniform $(0, 1)$

The random variable $U$ is uniform on the unit interval if its density is flat over that interval and zero everywhere else:

$$
f_U(u) = \begin{cases} 1 ~~~~~~ \text{if } 0 < u < 1 \\ 0 ~~~~~~ \text{otherwise} \end{cases}
$$

The area under $f_U$ over an interval is a rectangle. So it follows easily that the probability of an interval is its length relative to the total length of the unit interval, which is 1. For example, for every pair $u_1$ and $u_2$ with $0 \le u_1 < u_2 \le 1$,

$$
P(u_1 < U < u_2) ~ = ~ u_2 - u_1
$$

Equivalently, the cdf of $U$ is

$$
F_U(u) = \begin{cases} 0 ~~~ \text{if } u \le 0 \\ u ~~~ \text{if } 0 < u < 1 \\ 1 ~~~ \text{if } u \ge 1 \end{cases}
$$
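As a quick check of this formula: the piecewise cdf is just $u$ clipped to the interval $[0, 1]$, which `np.clip` computes directly. The sketch below compares it with SciPy's `stats.uniform.cdf` (whose default parameters give the uniform $(0, 1)$ distribution) and computes $P(u_1 < U < u_2)$ as $F_U(u_2) - F_U(u_1)$ for an arbitrary pair of our choosing. The function name `cdf_uniform01` is hypothetical.

```python
import numpy as np
from scipy import stats

def cdf_uniform01(u):
    """cdf of uniform (0, 1): 0 for u <= 0, u on (0, 1), and 1 for u >= 1."""
    return np.clip(u, 0, 1)

grid = np.array([-0.5, 0.0, 0.25, 0.5, 0.9, 1.0, 1.7])
print(cdf_uniform01(grid))
print(stats.uniform.cdf(grid))   # scipy's default uniform distribution is on (0, 1)

# The probability of an interval inside (0, 1) is a difference of cdf values.
u1, u2 = 0.2, 0.7                # an arbitrary pair with 0 <= u1 < u2 <= 1
print(cdf_uniform01(u2) - cdf_uniform01(u1))
```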

The expectation $E(U)$ doesn’t require an integral either. It’s the balance point of the density “curve”, which is $1/2$. But if you insist, you can integrate:

$$
E(U) ~ = ~ \int_0^1 u\cdot 1du ~ = ~ \frac{1}{2}
$$

For the variance, you do have to integrate. By the formula for expectation given at the start of this section,

$$
E(U^2) ~ = ~ \int_0^1 u^2\cdot 1du ~ = ~ \frac{1}{3} ~~~~~~~~~~~~~~~ Var(U) ~ = ~ \frac{1}{3} - \big(\frac{1}{2}\big)^2 ~ = ~ \frac{1}{12}
$$
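The same integrals can be checked numerically with `scipy.integrate.quad`. This is only a sketch: because the density is 1 on $(0, 1)$ and 0 elsewhere, it integrates over $(0, 1)$ alone and then applies $Var(U) = E(U^2) - (E(U))^2$.

```python
from scipy import integrate

# The uniform (0, 1) density equals 1 on (0, 1), so integrate over (0, 1) only.
mean_u, _ = integrate.quad(lambda u: u * 1, 0, 1)
second_moment_u, _ = integrate.quad(lambda u: u**2 * 1, 0, 1)
var_u = second_moment_u - mean_u**2   # Var(U) = E(U^2) - (E(U))^2

print(mean_u, second_moment_u, var_u, 1/12)
```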

15.3.2 Uniform $(a, b)$

Fix $a < b$. The uniform distribution on $(a, b)$ is flat over the interval $(a, b)$ and 0 elsewhere. Since its graph is a rectangle with base $b - a$ and the total area must be 1, the height of the rectangle is $\frac{1}{b-a}$.

So if $X$ has the uniform $(a, b)$ distribution, then the density of $X$ is

$$
f_X(x) ~ = ~ \frac{1}{b-a}, ~~~~ a < x < b
$$

and 0 elsewhere. Probabilities are still relative lengths, so the cdf of $X$ is

$$
F_X(x) = \begin{cases} 0 ~~~ \text{if } x \le a \\ \frac{x - a}{b - a} ~~~ \text{if } a < x < b \\ 1 ~~~ \text{if } x \ge b \end{cases}
$$
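Here is a sketch of this density and cdf as Python functions, with illustrative endpoints $a = 2$ and $b = 5$ that are not from the text (the function names are hypothetical). It checks numerically that the total area is 1 and that integrating the density from $a$ to $x$ reproduces $\frac{x-a}{b-a}$.

```python
import numpy as np
from scipy import integrate

a, b = 2.0, 5.0   # illustrative endpoints, not from the text

def density_uniform(x, a, b):
    # Height 1/(b - a) on (a, b), zero elsewhere.
    return 1 / (b - a) if a < x < b else 0.0

def cdf_uniform(x, a, b):
    # (x - a)/(b - a) on (a, b), clipped to 0 below a and to 1 above b.
    return float(np.clip((x - a) / (b - a), 0, 1))

total_area, _ = integrate.quad(density_uniform, a, b, args=(a, b))
print(total_area)
for x in [2.5, 3.0, 4.2]:
    area_to_x, _ = integrate.quad(density_uniform, a, x, args=(a, b))
    print(x, area_to_x, cdf_uniform(x, a, b))   # area up to x vs. the cdf formula
```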

The expectation and variance of $X$ can be derived with little calculation once you notice that $X$ can be created by starting with a uniform $(0, 1)$ random variable $U$.

  • Step 1: $U$ is uniform on $(0, 1)$.

  • Step 2: $(b-a)U$ is uniform on $(0, b-a)$.

  • Step 3: $X = a + (b-a)U$ is uniform on $(a, b)$.
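The three steps translate directly into a simulation. In this sketch the endpoints $a = 2$ and $b = 5$, the seed, and the sample size are arbitrary choices. The simulated values should all lie in $(a, b)$, and the proportion below 3 should be close to $F_X(3) = 1/3$.

```python
import numpy as np

rng = np.random.default_rng(1)       # fixed seed for reproducibility
a, b = 2.0, 5.0                      # illustrative endpoints, not from the text

u = rng.random(100_000)              # Step 1: uniform on (0, 1)
scaled = (b - a) * u                 # Step 2: uniform on (0, b - a)
x = a + scaled                       # Step 3: uniform on (a, b)

print(x.min(), x.max())              # all values lie between a and b
print(np.mean(x < 3.0), (3.0 - a) / (b - a))   # empirical proportion vs. F_X(3)
```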

Now $X$ is a linear transformation of $U$, so

$$
E(X) ~ = ~ a + (b-a)E(U) ~ = ~ a + \frac{b-a}{2} ~ = ~ \frac{a+b}{2}
$$

which is the midpoint of $(a, b)$. Also, adding the constant $a$ does not change the variance, so

$$
Var(X) ~ = ~ (b-a)^2 Var(U) ~ = ~ \frac{(b-a)^2}{12}
$$
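SciPy parametrizes the uniform distribution by `loc` and `scale`, so uniform $(a, b)$ corresponds to `stats.uniform(loc=a, scale=b-a)`. This short sketch, again with illustrative endpoints $a = 2$ and $b = 5$, compares SciPy's mean and variance with the formulas above.

```python
from scipy import stats

a, b = 2.0, 5.0                          # illustrative endpoints, not from the text
X = stats.uniform(loc=a, scale=b - a)    # uniform on (a, b) in scipy's parametrization

print(X.mean(), (a + b) / 2)             # E(X) = (a + b)/2
print(X.var(), (b - a)**2 / 12)          # Var(X) = (b - a)^2 / 12
```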

15.3.3 Example: Random Discs

A screen saver chooses a random radius uniformly in the interval $(0, 2)$ centimeters and draws a disc with that radius. Then it chooses another radius in the same way, independently of the first, and draws another disc. And so on.

Question 1. Let $S$ be the area of the first disc. Find $E(S)$.

Answer. Let $R$ be the radius of the first disc. Then $R$ has the uniform $(0, 2)$ distribution, so $E(R) = 1$ and $Var(R) = 2^2/12 = 4/12$. Also $S = \pi R^2$, so

$$
E(S) ~ = ~ \pi E(R^2) ~ = ~ \pi\big(Var(R) + (E(R))^2\big) ~ = ~ \pi\big( \frac{4}{12} + 1^2\big) ~ \approx ~ 4.19 ~ \text{cm}^2
$$
```python
import numpy as np
np.pi * (4/12 + 1)   # E(S) in square centimeters
```

4.1887902047863905
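Two independent checks on this answer, as a sketch: an exact integral of $\pi r^2$ against the uniform $(0, 2)$ density $\frac{1}{2}$, using the formula from the start of the section, and a simulation of many radii. The integral works out to $\frac{4\pi}{3}$, which agrees with $\pi(\frac{4}{12} + 1)$, and the simulated mean area should be close to it. The seed and sample size are arbitrary.

```python
import numpy as np
from scipy import integrate

# Exact: integrate pi * r^2 against the uniform (0, 2) density, which is 1/2 on (0, 2).
exact, _ = integrate.quad(lambda r: np.pi * r**2 * 0.5, 0, 2)

# Simulation: many independent radii, each uniform on (0, 2).
rng = np.random.default_rng(2)
radii = rng.uniform(0, 2, size=200_000)
areas = np.pi * radii**2

print(exact, areas.mean())
```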

Question 2. Let $\bar{R}$ be the average radius of the first 100 discs. Find a number $c$ so that $P(\lvert \bar{R} - 1 \rvert < c) \approx 99\%$.

Answer. Let $R_1, R_2, \ldots , R_{100}$ be the first 100 radii. These are i.i.d. random variables, each with mean 1 and variance $4/12$. So $E(\bar{R}) = 1$ and

$$
SD(\bar{R}) ~ = ~ \frac{\sqrt{4/12}}{\sqrt{100}} ~ \approx ~ 0.0577 ~ \text{cm}
$$
```python
sd_rbar = ((4/12)**0.5)/(100**0.5)   # SD of the mean of 100 i.i.d. radii
sd_rbar
```

0.057735026918962574
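A simulation sketch of $\bar{R}$: repeat the experiment of drawing 100 radii many times and look at the spread of the averages. The number of repetitions and the seed are arbitrary choices; the empirical mean and SD of the averages should be close to 1 and 0.0577.

```python
import numpy as np

rng = np.random.default_rng(3)
# Each row is one run of the experiment: 100 independent radii, uniform on (0, 2).
radii = rng.uniform(0, 2, size=(20_000, 100))
rbar = radii.mean(axis=1)            # one sample mean per run

print(rbar.mean(), rbar.std())       # compare with E(R-bar) = 1 and SD(R-bar) = 0.0577
```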

By the Central Limit Theorem, the distribution of $\bar{R}$ is approximately normal. Let’s draw it using `Plot_norm`.

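```python
import matplotlib.pyplot as plt
from prob140 import Plot_norm   # normal-curve plotting helper from the prob140 library

Plot_norm((0.8, 1.2), 1, sd_rbar)
plt.xlabel('Radius in Centimeters')
plt.title('Approximate Distribution of Sample Mean Radius');
```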

We are looking for $c$ such that there is about 99% chance that $\bar{R}$ is in the interval $(1-c, 1+c)$. That leaves about 0.5% in each tail, so $1 + c$ is the 99.5th (not 99th) percentile of the curve above, from which you can find $c$.
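The step from 99% to the 99.5th percentile can be written in standard units. Here $Z$ denotes a standard normal random variable and $\Phi$ its cdf (notation introduced for this derivation); since $E(\bar{R}) = 1$, the Central Limit Theorem says $(\bar{R} - 1)/SD(\bar{R})$ is approximately distributed like $Z$:

$$
P(\lvert \bar{R} - 1 \rvert < c) ~ \approx ~ P\Big(\lvert Z \rvert < \frac{c}{SD(\bar{R})}\Big) ~ = ~ 2\Phi\Big(\frac{c}{SD(\bar{R})}\Big) - 1
$$

Setting this equal to 0.99 gives $\Phi\big(c/SD(\bar{R})\big) = 0.995$, so

$$
c ~ = ~ \Phi^{-1}(0.995) \cdot SD(\bar{R})
$$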

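```python
from scipy import stats
z = stats.norm.ppf(0.995)   # 99.5th percentile of the standard normal curve
z
```

2.5758293035489004

```python
c = z*sd_rbar
c
```

0.14871557417904838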

We can now get the endpoints of the interval. The graph below shades the corresponding area, which is approximately 99%.

```python
1-c, 1+c
```

(0.8512844258209517, 1.1487155741790485)

```python
Plot_norm((0.8, 1.2), 1, sd_rbar, left_end = 1-c, right_end = 1+c)
plt.xticks([1-c, 1, 1+c])
plt.xlabel('Radius in Centimeters')
plt.title('Gold Area is Approximately 99%');
```
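As a final check on the normal approximation, here is a simulation sketch: draw many batches of 100 radii and record how often $\lvert \bar{R} - 1 \rvert < c$. The block recomputes `sd_rbar` and `c` so that it runs on its own; the seed and number of repetitions are arbitrary. The proportion should be close to 99%.

```python
import numpy as np
from scipy import stats

sd_rbar = (4/12)**0.5 / 100**0.5          # SD of the mean of 100 uniform (0, 2) radii
c = stats.norm.ppf(0.995) * sd_rbar       # half-width of the approximate 99% interval

rng = np.random.default_rng(4)
rbar = rng.uniform(0, 2, size=(20_000, 100)).mean(axis=1)   # 20,000 simulated averages
coverage = np.mean(np.abs(rbar - 1) < c)  # fraction of averages within c of 1
print(coverage)
```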