# HIDDEN
import warnings
warnings.filterwarnings('ignore')
from datascience import *
from prob140 import *
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline
from scipy import stats
If you know E(X) and SD(X) you can get some idea of how much probability there is in the tails of the distribution of X.
In this section we are going to get upper bounds on probabilities such as the gold area in the graph below. That’s P(X≥20) for the random variable X whose distribution is displayed in the histogram.
# NO CODE
x = np.arange(1, 26)
probs = (1/x)/sum(1/x)
dist = Table().values(x).probabilities(probs)
Plot(dist, event=np.arange(20, 26), show_ev=True)
plt.xlim(0, 25);
from IPython.display import YouTubeVideo
YouTubeVideo('WIokPScne_8')
The function h is the indicator defined by h(x)=I(x≥c). So h(X)=I(X≥c) and E(h(X))=P(X≥c).
The function g is constructed so that the graph of g is a straight line that is at or above the graph of h on [0,∞), with the two graphs meeting at x=0 and x=c. The equation of the straight line is g(x)=x/c.
Thus g(X)=X/c and hence E(g(X))=E(X/c)=E(X)/c.
By construction, g(x)≥h(x) for x≥0. Since X is a non-negative random variable, P(g(X)≥h(X))=1, and therefore E(g(X))≥E(h(X)). That is, E(X)/c≥P(X≥c).
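The domination of the indicator by the straight line can be checked numerically. The following is a minimal sketch, assuming a hypothetical threshold `c = 20` and an arbitrary grid of non-negative values; neither choice comes from the derivation itself.

```python
import numpy as np

c = 20.0                          # hypothetical threshold; any c > 0 behaves the same way
x = np.linspace(0, 3 * c, 601)    # grid of non-negative values (illustrative choice)

h = (x >= c).astype(float)        # indicator h(x) = I(x >= c)
g = x / c                         # straight line g(x) = x / c

# The line should never fall below the indicator on [0, infinity)
dominates = bool(np.all(g >= h))
print(dominates)
```

Because g(X)≥h(X) with probability 1 when X is non-negative, taking expectations on both sides preserves the inequality.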
Let X be a non-negative random variable. Then for any c>0,
$$
P(X \ge c) \le \frac{E(X)}{c}
$$
This result is called a “tail bound” because it puts an upper limit on how big the right tail at c can be. It is worth noting that P(X>c)≤P(X≥c)≤E(X)/c by Markov’s bound.
In the figure below, E(X)=6.5 and c=20. Markov’s inequality says that the gold area is at most
$$
\frac{6.5}{20} = 0.325
$$
You can see that the bound is pretty crude. The gold area is clearly quite a bit less than 0.325.
# NO CODE
x = np.arange(1, 26)
probs = (1/x)/sum(1/x)
dist = Table().values(x).probabilities(probs)
Plot(dist, event=np.arange(20, 26), show_ev=True)
plt.xlim(0, 25);
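To see how crude the bound is for this distribution, the exact tail probability can be computed directly and compared with Markov's bound. This sketch uses only NumPy and rebuilds the same distribution as the plotting cell. The variable names `ev`, `markov_bound`, and `exact_tail` are illustrative.

```python
import numpy as np

# Same distribution as in the figure: values 1..25 with P(X = x) proportional to 1/x
x = np.arange(1, 26)
probs = (1 / x) / np.sum(1 / x)

c = 20
ev = np.sum(x * probs)               # E(X), about 6.55 (rounded to 6.5 in the text)
markov_bound = ev / c                # upper bound on P(X >= 20)
exact_tail = np.sum(probs[x >= c])   # exact P(X >= 20), the gold area

print(round(ev, 2), round(markov_bound, 3), round(exact_tail, 3))
```

The exact gold area comes out at roughly 0.07, far below the bound.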
Another way to think of Markov’s bound is that if X is a non-negative random variable with expectation μX, then
$$
P(X \ge k\mu_X) \le \frac{1}{k} \quad \text{for all } k > 0
$$
That is, P(X≥2μX)≤1/2, P(X≥5μX)≤1/5, and so on. The chance that a non-negative random variable is at least k times the mean is at most 1/k.
Notes:
k need not be an integer. For example, the chance that a non-negative random variable is at least 3.8 times the mean is at most 1/3.8.
If k≤1, the inequality doesn’t tell you anything you didn’t already know. If k≤1 then Markov’s bound is 1 or greater. All probabilities are bounded above by 1, so the inequality is true but useless for k≤1.
When k is large, the bound does tell you something. You are looking at a probability quite far out in the tail of the distribution, and Markov’s bound is 1/k which is small.
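The notes above can be illustrated with a concrete non-negative distribution. The sketch below assumes an exponential random variable with a hypothetical mean of 2, for which the exact tail is P(X≥kμX)=e^(−k). The exponential distribution is chosen only for illustration and is not part of the text.

```python
import numpy as np
from scipy import stats

mu = 2.0                                   # hypothetical mean of an exponential random variable
ks = np.array([0.5, 1, 2, 3.8, 5, 10])     # multiples of the mean; k need not be an integer

exact = stats.expon.sf(ks * mu, scale=mu)  # exact P(X >= k * mu), equal to exp(-k)
bound = 1 / ks                             # Markov's bound 1/k

for k, e, b in zip(ks, exact, bound):
    print(f"k={k:>4}: exact={e:.4f}  Markov bound={b:.4f}")
```

For k=0.5 the bound is 2, which says nothing, while for large k the bound is small but still far above the exact exponential tail.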
Markov’s bound only uses E(X), not SD(X). To get bounds on tails it seems better to use SD(X) if we can. Chebyshev’s Inequality does just that. It provides a bound on the two tails outside an interval that is symmetric about E(X) as in the following graph.
# NO CODE
x = np.arange(31)
poi = stats.poisson.pmf(x, 9)
dist2 = Table().values(x).probabilities(poi)
Plot(dist2, event=np.append(np.arange(4), np.arange(15, 31, 1)), show_ev=True, show_sd=True)
YouTubeVideo('n6DilL4PzAQ')
The red arrow marks μX as usual, and now the two blue arrows are at a distance of SD(X) on either side of the mean. The gold tails start at the same distance c on either side of μX. We will get an upper bound on the gold area by applying Markov’s Inequality to the non-negative random variable (X−μX)².
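$$
\begin{aligned}
P(|X - \mu_X| \ge c) &= P\big((X - \mu_X)^2 \ge c^2\big) \\
&\le \frac{E\big[(X - \mu_X)^2\big]}{c^2} \quad \text{(Markov's Inequality)} \\
&= \frac{\sigma_X^2}{c^2} \quad \text{(definition of variance)}
\end{aligned}
$$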
The figure below is analogous to the figure drawn earlier to illustrate the derivation of Markov’s inequality.
The graph of the quadratic function g(x)=(x−μX)²/c² is always at or above the graph of the indicator function h(x)=I(∣x−μX∣≥c).
Chebyshev’s Inequality is just a restatement of the fact that E(g(X))≥E(h(X))=P(∣X−μX∣≥c).
It is important to remember that Chebyshev’s Inequality just provides an upper bound on the total of two tail probabilities. It is not an exact probability or an approximation. The same upper bound applies for a single tail:
$$
P(X - \mu_X \ge c) \le P(|X - \mu_X| \ge c) \le \frac{\sigma_X^2}{c^2}
$$
Don’t yield to the temptation of dividing the bound by 2. The two tails need not be equal. There is no assumption of symmetry.
Answer
Both upper bounds are 1/36
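Chebyshev's bound can be compared with exact tail probabilities for the Poisson (9) distribution plotted earlier, whose gold tails correspond to c=6. This is a minimal sketch using `scipy.stats.poisson`. The variable names are illustrative, and the exact tails are computed for the full Poisson distribution rather than the truncated range shown in the plot.

```python
from scipy import stats

mu = 9     # mean of the Poisson (9) distribution
var = 9    # for a Poisson distribution the variance equals the mean
c = 6      # tails are the events X <= 3 and X >= 15

chebyshev_bound = var / c**2                    # sigma^2 / c^2 = 0.25
left_tail = stats.poisson.cdf(mu - c, mu)       # P(X <= 3)
right_tail = stats.poisson.sf(mu + c - 1, mu)   # P(X >= 15) = P(X > 14)
two_tails = left_tail + right_tail

print(chebyshev_bound, left_tail, right_tail, two_tails)
```

The two tails are not equal here, which is why halving the bound to handle a single tail is not justified.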
12.3.6 Another Way of Writing Chebyshev’s Inequality
It is often going to be convenient to think of E(X) as “the origin” and to measure distances in units of SDs on either side.
Thus we can think of the two tails as the event “X is at least z SDs away from μX”, for some positive z. Chebyshev’s Inequality says
$$
P(|X - \mu_X| \ge z\sigma_X) \le \frac{\sigma_X^2}{z^2\sigma_X^2} = \frac{1}{z^2}
$$
This is the form in which you saw Chebyshev’s Inequality in Data 8.
Chebyshev’s Inequality makes no assumptions about the shape of the distribution. It implies that no matter what the distribution of X looks like,
P(μX−2σX<X<μX+2σX)≥1−1/4=75%
P(μX−3σX<X<μX+3σX)≥1−1/9=88.88...%
P(μX−4σX<X<μX+4σX)≥1−1/16=93.75%
P(μX−5σX<X<μX+5σX)≥1−1/25=96%
That is, no matter what the shape of the distribution, the bulk of the probability is in the interval “expected value plus or minus a few SDs”.
This is one reason why the SD is a good measure of spread. No matter what the distribution, if you know the expectation and the SD then you have a pretty good sense of where the bulk of the probability is located.
If you happen to know more about the distribution then of course you can do better than Chebyshev’s bound. But in general Chebyshev’s bound is about as good as you can do without making further assumptions.
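As an illustration of doing better with more information, the sketch below compares Chebyshev's lower bounds 1−1/z² with the exact probabilities for a normal distribution. The normal distribution is an assumption made only for this comparison. Chebyshev's bounds themselves hold for every distribution with a finite SD.

```python
import numpy as np
from scipy import stats

zs = np.arange(2, 6)                                      # z = 2, 3, 4, 5 SDs
chebyshev_lower = 1 - 1 / zs**2                           # lower bound on P(|X - mu| < z * sigma)
normal_exact = stats.norm.cdf(zs) - stats.norm.cdf(-zs)   # exact value if X is normal

for z, lower, exact in zip(zs, chebyshev_lower, normal_exact):
    print(f"z={z}: Chebyshev lower bound={lower:.4f}  normal exact={exact:.6f}")
```

The normal probabilities are well above the Chebyshev bounds at every z, which is the sense in which the bound is conservative.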
To formalize the notion of “setting μX as the origin and measuring distances in units of σX”, we define a random variable Z called “X in standard units” as follows:
$$
Z = \frac{X - \mu_X}{\sigma_X}
$$
Z measures how far X is above its mean, relative to its SD. In other words, X is Z SDs above the mean:
X=ZσX+μX
It is important to learn to go back and forth between these two scales of measurement, as we will be using standard units quite frequently. Note that by the linear function rules,
$$
E(Z) = 0 \quad \text{and} \quad SD(Z) = 1
$$
no matter what the distribution of X is.
Also note that because Var(Z)=1, we have
$$
E(Z^2) = Var(Z) + (E(Z))^2 = 1 + 0^2 = 1
$$
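These facts about standard units can be verified exactly for a discrete distribution. The sketch below reuses the distribution on 1, 2, ..., 25 with probabilities proportional to 1/x from earlier in the section. The choice of distribution is only for illustration, since the result holds for any X with a positive, finite SD.

```python
import numpy as np

# Distribution on 1..25 with P(X = x) proportional to 1/x
x = np.arange(1, 26)
probs = (1 / x) / np.sum(1 / x)

mu = np.sum(x * probs)                            # E(X)
sigma = np.sqrt(np.sum((x - mu) ** 2 * probs))    # SD(X)

z = (x - mu) / sigma                              # possible values of X in standard units

ev_z = np.sum(z * probs)                          # E(Z), should be 0
sd_z = np.sqrt(np.sum((z - ev_z) ** 2 * probs))   # SD(Z), should be 1
ev_z_squared = np.sum(z ** 2 * probs)             # E(Z^2), should be 1

print(ev_z, sd_z, ev_z_squared)
```

Up to floating-point error, the printed values are 0, 1, and 1. Converting back with z * sigma + mu recovers the original values of X.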
Chebyshev’s Inequality says
$$
P(|X - \mu_X| \ge z\sigma_X) \le \frac{1}{z^2}
$$
which is the same as saying
$$
P(|Z| \ge z) \le \frac{1}{z^2}
$$
So if you have converted a random variable to standard units, the overwhelming majority of the values of the standardized variable should be in the range -5 to 5. It is possible that there are values outside that range, but it is not likely.