# HIDDEN
import warnings
warnings.filterwarnings('ignore')
from datascience import *
from prob140 import *
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline
from scipy import stats
from myst_nb import glue
from IPython.display import YouTubeVideo
YouTubeVideo('gJkb-5YXly4')
Let X have density f. Let g be a real valued function on the real line, and suppose you want to find E(g(X)). Then you can follow a procedure analogous to the non-linear function rule we developed for finding expectations of functions of discrete random variables.
Write a generic value of X: that’s x.
Apply the function g to get g(x).
Weight g(x) by the chance that X is just around x, resulting in the product g(x)⋅f(x)dx.
“Sum” over all x, that is, integrate.
The expectation is
$$E(g(X)) = \int_{-\infty}^{\infty} g(x) \cdot f(x) \, dx$$
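As a rough numerical illustration of this formula, the sketch below integrates $g(x) \cdot f(x)$ with `scipy.integrate.quad`. The helper name `expectation` and the choice of a standard normal $X$ with $g(x) = x^2$ are assumptions made only for the example; they are not part of the text.

```python
import numpy as np
from scipy import integrate, stats

def expectation(g, density, lower=-np.inf, upper=np.inf):
    """Numerically integrate g(x) * density(x) over (lower, upper).

    Hypothetical helper for illustration; it does not check that the
    integral exists.
    """
    value, _abs_error = integrate.quad(lambda x: g(x) * density(x), lower, upper)
    return value

# Illustrative choice: X standard normal and g(x) = x**2,
# so E(g(X)) = Var(X) + (E(X))**2 = 1.
second_moment = expectation(lambda x: x**2, stats.norm.pdf)
print(second_moment)
```

The numerical answer should agree with the exact value 1 up to the integrator's tolerance.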
Technical Note: We must be careful here as g is an arbitrary function and the integral above need not exist. If g is non-negative, then the integral is either finite or diverges to +∞, but it doesn’t oscillate. So if g is non-negative, define
$$E(g(X)) = \int_{-\infty}^{\infty} g(x) \cdot f(x) \, dx$$
provided the integral is finite.
For a general g, first check whether E(∣g(X)∣) is finite, that is, whether
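$$\int_{-\infty}^{\infty} |g(x)| \cdot f(x) \, dx < \infty$$
If it is finite, then there is a theorem that says $\int_{-\infty}^{\infty} g(x) \cdot f(x) \, dx$ exists, so it makes sense to define
$$E(g(X)) = \int_{-\infty}^{\infty} g(x) \cdot f(x) \, dx$$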
Non-technical Note: In almost all of our examples, we will not be faced with questions about the existence of integrals. For example, if the set of possible values of g(X) is bounded, then its expectation exists. But we will see a few examples of random variables that don’t have expectations. Such random variables have “heavy tails” and are important in many applications.
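To get a feel for what can go wrong with heavy tails, here is a small sketch that uses the standard Cauchy density as an illustrative heavy-tailed example (this choice is an assumption; the text does not name a specific distribution here). It computes the integral of $|x| \cdot f(x)$ over $(-T, T)$ for growing cutoffs $T$. If the expectation existed, these values would settle down; instead they keep growing, roughly like $\log(1 + T^2)/\pi$.

```python
import numpy as np
from scipy import integrate, stats

def truncated_abs_moment(T):
    """Integral of |x| * f(x) over (-T, T) for the standard Cauchy density f."""
    # The integrand is symmetric about 0, so integrate over (0, T) and double.
    value, _abs_error = integrate.quad(
        lambda x: x * stats.cauchy.pdf(x), 0, T, limit=200
    )
    return 2 * value

cutoffs = [1, 10, 100, 1000]  # illustrative cutoffs
truncated = [truncated_abs_moment(T) for T in cutoffs]

for T, value in zip(cutoffs, truncated):
    print(T, value, np.log(1 + T**2) / np.pi)
```

The third column is the closed-form value of the truncated integral, which grows without bound as $T$ increases.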
All the properties of means, variances, and covariances that we proved for discrete variables are still true. The proofs need to be rewritten for random variables with densities, but we won’t take the time to do that. Just use the properties as you did before. The Central Limit Theorem holds as well.
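Let $U$ be uniform on the unit interval $(0, 1)$, so that its density $f_U$ equals 1 on $(0, 1)$ and 0 elsewhere. The region under $f_U$ over an interval is a rectangle, so the probability of an interval is its length relative to the total length of the unit interval, which is 1. For example, for every pair $u_1$ and $u_2$ with $0 \le u_1 < u_2 \le 1$,
$$P(u_1 < U < u_2) = u_2 - u_1$$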
The expectation E(U) doesn’t require an integral either. It’s the balance point of the density “curve”, which is 1/2. But if you insist, you can integrate:
$$E(U) = \int_0^1 u \cdot 1 \, du = \frac{1}{2}$$
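For the variance, you do have to integrate. By the formula for expectation given at the start of this section,
$$E(U^2) = \int_0^1 u^2 \cdot 1 \, du = \frac{1}{3}, \qquad Var(U) = E(U^2) - (E(U))^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$$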
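A quick numerical check of the uniform $(0, 1)$ mean and variance, sketched with `scipy.integrate.quad`. The variable names are chosen only for this illustration.

```python
from scipy import integrate

# The density of U is 1 on (0, 1), so E(g(U)) is the integral of g(u) * 1 over (0, 1).
mean_u, _ = integrate.quad(lambda u: u * 1, 0, 1)
second_moment_u, _ = integrate.quad(lambda u: u**2 * 1, 0, 1)

# Var(U) = E(U^2) - (E(U))^2
var_u = second_moment_u - mean_u**2

print(mean_u, second_moment_u, var_u)
```

The three values should be close to 1/2, 1/3, and 1/12 respectively.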
Fix $a < b$. The uniform distribution on $(a, b)$ is flat over the interval $(a, b)$ and 0 elsewhere. Since its graph is a rectangle and the total area must be 1, the height of the rectangle is $\frac{1}{b-a}$.
So if $X$ has the uniform $(a, b)$ distribution, then the density of $X$ is
$$f_X(x) = \frac{1}{b-a}, \quad a < x < b$$
and 0 elsewhere. Probabilities are still relative lengths, so the cdf of $X$ is
$$F_X(x) = \frac{x-a}{b-a}, \quad a < x < b$$
The expectation and variance of X can be derived with little calculation once you notice that X can be created by starting with a uniform (0,1) random variable U.
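One way to carry this out is to write $X = a + (b-a)U$ and apply the linear function rules, which give $E(X) = a + (b-a)/2 = (a+b)/2$ and $Var(X) = (b-a)^2/12$. The sketch below checks these against `scipy.stats.uniform`, which is parametrized by `loc` and `scale` so that the distribution lives on `(loc, loc + scale)`. The endpoints 3 and 7 are arbitrary values picked for illustration.

```python
from scipy import stats

a, b = 3, 7  # illustrative endpoints, not from the text

# Linear function rules applied to X = a + (b - a) * U,
# using E(U) = 1/2 and Var(U) = 1/12.
mean_x = a + (b - a) * (1 / 2)
var_x = (b - a) ** 2 * (1 / 12)

# Cross-check against scipy's uniform distribution on (a, b).
dist = stats.uniform(loc=a, scale=b - a)
print(mean_x, dist.mean())
print(var_x, dist.var())
```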
A screen saver chooses a random radius uniformly in the interval (0,2) centimeters and draws a disc with that radius. Then it chooses another radius in the same way, independently of the first, and draws another disc. And so on.
Question 1. Let S be the area of the first disc. Find E(S).
Answer. Let $R$ be the radius of the first disc. Then $S = \pi R^2$. So
$$E(S) = \pi E(R^2) = \pi\big(Var(R) + (E(R))^2\big) = \pi\left(\frac{4}{12} + 1^2\right) \approx 4.19 \text{ cm}^2$$
np.pi * (4/12 + 1)
4.1887902047863905
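As a cross-check, $E(S)$ can also be found directly from the expectation formula at the start of the section, by integrating $\pi r^2$ against the uniform $(0, 2)$ density, which is $1/2$ on that interval. This is a sketch using `scipy.integrate.quad`; the variable name `expected_area` is chosen for the illustration.

```python
import numpy as np
from scipy import integrate

# R is uniform on (0, 2), so its density is 1/2 on (0, 2).
# E(S) = integral of (pi * r^2) * (1/2) over (0, 2).
expected_area, _ = integrate.quad(lambda r: np.pi * r**2 * 0.5, 0, 2)
print(expected_area)
```

The result should agree with $\pi(4/12 + 1) = 4\pi/3$.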
Question 2. Let $\bar{R}$ be the average radius of the first 100 discs. Find a number $c$ so that $P(|\bar{R} - 1| < c) \approx 99\%$.
Answer. Let $R_1, R_2, \ldots, R_{100}$ be the first 100 radii. These are i.i.d. random variables, each with mean 1 and variance 4/12. So $E(\bar{R}) = 1$ and
$$SD(\bar{R}) = \frac{\sqrt{4/12}}{\sqrt{100}} = 0.0577 \text{ cm}$$
sd_rbar = ((4/12)**0.5)/(100**0.5)
sd_rbar
0.057735026918962574
By the Central Limit Theorem, the distribution of $\bar{R}$ is approximately normal. Let’s draw it using Plot_norm.
Plot_norm((0.8, 1.2), 1, sd_rbar)
plt.xlabel('Radius in Centimeters')
plt.title('Approximate Distribution of Sample Mean Radius');
We are looking for $c$ such that there is about a 99% chance that $\bar{R}$ is in the interval $(1-c, 1+c)$. The remaining 1% is split equally between the two tails, 0.5% in each. Therefore $1+c$ is the 99.5th (not 99th) percentile of the curve above, from which you can find $c$.
z = stats.norm.ppf(0.995)
z
2.5758293035489004
c = z*sd_rbar
c
0.14871557417904838
We can now get the endpoints of the interval. The graph below shows the corresponding area of 99%.
1-c, 1+c
(0.8512844258209517, 1.1487155741790485)
Plot_norm((0.8, 1.2), 1, sd_rbar, left_end = 1-c, right_end = 1+c)
plt.xticks([1-c, 1, 1+c])
plt.xlabel('Radius in Centimeters')
plt.title('Gold Area is Approximately 99%');
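The normal approximation can be checked by simulation. The sketch below is not part of the original calculation: it repeatedly generates 100 i.i.d. uniform $(0, 2)$ radii, computes each sample mean, and finds the proportion of means within $c$ of 1. The seed and the number of repetitions are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

n_discs = 100
n_repetitions = 50_000  # arbitrary; larger values give a more stable estimate

# Each row is one run of the screen saver: 100 i.i.d. uniform (0, 2) radii.
radii = rng.uniform(0, 2, size=(n_repetitions, n_discs))
sample_means = radii.mean(axis=1)

# Same c as in the calculation in the text.
sd_rbar = np.sqrt(4 / 12) / np.sqrt(n_discs)
c = stats.norm.ppf(0.995) * sd_rbar

# Empirical proportion of sample means within c of the true mean 1.
proportion_within_c = np.mean(np.abs(sample_means - 1) < c)
print(proportion_within_c)
```

The printed proportion should be close to 0.99, though it will vary a little with the seed and the number of repetitions.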