
If the form of a distribution is intractable, in the sense that it is difficult to find exact probabilities by integration, then good estimates and bounds become important. Bounds on the tails of the distribution of a random variable help us quantify roughly how close to the mean the random variable is likely to be.

We already know two such bounds. Let $X$ be a random variable with expectation $\mu$ and SD $\sigma$.

19.4.1 Markov's Bound on the Right Hand Tail

If $X$ is non-negative,

$$
P(X \ge c) ~ \le ~ \frac{\mu}{c}
$$

This bound depends only on the first moment of $X$ (and the fact that $X$ is non-negative).
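
For example, here is a quick numerical check of the bound using `scipy.stats`, with the exponential (1) distribution as an illustrative choice of non-negative random variable (not one from the text):

```python
from scipy import stats

X = stats.expon()        # exponential (1): non-negative, with mu = 1
c = 4
exact = X.sf(c)          # P(X >= c) = e^{-4}, about 0.018
markov = X.mean() / c    # mu / c = 0.25
print(exact, markov)     # the exact tail is well below Markov's bound
```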

19.4.2 Chebychev's Bound on Two Tails

$$
P(\lvert X - \mu \rvert \ge c) ~ \le ~ \frac{\sigma^2}{c^2}
$$

This bound depends on the first and second moments of $X$ since $\sigma^2 = E(X^2) - (E(X))^2$.
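
Here is the corresponding check for the exponential (1) distribution, which has $\mu = \sigma = 1$ (again just an illustrative choice):

```python
from scipy import stats

X = stats.expon()                     # exponential (1): mu = sigma = 1
mu, sigma, c = 1, 1, 3
exact = X.sf(mu + c) + X.cdf(mu - c)  # P(|X - mu| >= c); left piece is 0 here
cheby = sigma**2 / c**2               # Chebychev's bound, 1/9
print(exact, cheby)
```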

In cases where both bounds apply, Chebychev often does better than Markov because it uses two moments instead of one. So it is reasonable to think that the more moments you know, the closer you can get to the tail probabilities.
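
For instance, for the exponential (1) distribution above with $c = 4$, Markov's bound on $P(X \ge 4)$ is $1/4$, while Chebychev applied to the larger event $\{\lvert X - 1 \rvert \ge 3\}$ gives the sharper bound $1/9$.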

Moment generating functions can help get good bounds on tail probabilities. In what follows, we will assume that the moment generating function of $X$ is finite over the whole real line. If it is finite only on a smaller interval around 0, the mgf calculations below should be confined to that interval.


19.4.3 Exponential Bounds on Tails

Let $X$ be a random variable with mgf $M_X$. We are going to find an upper bound for the right hand tail probability $P(X \ge c)$ for a fixed $c$.

To see how the moment generating function comes in, fix $t > 0$. The function defined by $g(x) = e^{tx}$ is increasing as well as non-negative. Because it is increasing,

$$
X \ge c ~ \iff ~ e^{tX} \ge e^{tc}
$$

Since $e^{tX}$ is a non-negative random variable, we can apply Markov's inequality as follows.

$$
\begin{align*}
P(X \ge c) ~ &= ~ P(e^{tX} \ge e^{tc}) \\
&\le ~ \frac{E(e^{tX})}{e^{tc}} ~~~~ \text{(Markov's bound)} \\
&= ~ \frac{M_X(t)}{e^{tc}} \\
&= ~ M_X(t)e^{-tc}
\end{align*}
$$

Since $t$ is fixed, $M_X(t)$ is a constant. So we have shown that $P(X \ge c)$ falls at least exponentially as a function of $c$.
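
To see the bound in action, here is a sketch assuming $X$ has the Poisson (3) distribution, an illustrative choice whose mgf $M_X(t) = e^{3(e^t - 1)}$ is finite everywhere:

```python
import numpy as np
from scipy import stats

lam, c = 3, 8
X = stats.poisson(lam)
exact = X.sf(c - 1)            # P(X >= c) for integer-valued X

def bound(t):
    # M_X(t) e^{-tc}, with M_X(t) = exp(lam(e^t - 1)) for Poisson (lam)
    return np.exp(lam * (np.exp(t) - 1)) * np.exp(-t * c)

for t in [0.5, 1.0, 1.5]:
    print(t, bound(t), exact)  # bound(t) >= exact for every t > 0
```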

19.4.4 Chernoff Bound on the Right Tail

The calculation above is the first step in developing a Chernoff bound on the right hand tail probability $P(X \ge c)$ for a fixed $c$.

For the next step, notice that you can choose $t$ to be any positive number. For our fixed $c$, some choices of $t$ will give sharper upper bounds than others. The sharpest among all of the bounds will correspond to the value of $t$ that minimizes the right hand side. So the Chernoff bound has an optimized form:

$$
P(X \ge c) ~ \le ~ \min_{t > 0} M_X(t)e^{-tc}
$$
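
The minimization can be done numerically. Continuing the illustrative Poisson (3) example, here is a sketch using `scipy.optimize.minimize_scalar`; minimizing the log of the bound is equivalent and numerically more stable:

```python
import numpy as np
from scipy.optimize import minimize_scalar

lam, c = 3, 8

def log_bound(t):
    # log of M_X(t) e^{-tc} for X Poisson (lam)
    return lam * (np.exp(t) - 1) - t * c

res = minimize_scalar(log_bound, bounds=(1e-6, 10), method='bounded')
print(res.x, np.exp(res.fun))   # optimal t = log(c/lam), about 0.98; bound about 0.058
```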

19.4.5 Application to the Normal Distribution

Suppose $X$ has the normal $(\mu, \sigma^2)$ distribution and we want to get a sense of how far $X$ can be above the mean. Fix $c > 0$. The exact chance that the value of $X$ is at least $c$ above the mean is

$$
P(X - \mu \ge c) ~ = ~ 1 - \Phi(c/\sigma)
$$

because the distribution of $X - \mu$ is normal $(0, \sigma^2)$. This exact answer looks neat and tidy, but the standard normal cdf $\Phi$ is not easy to work with analytically. Sometimes we can gain more insight from a good bound.

The optimized Chernoff bound is

$$
\begin{align*}
P(X - \mu \ge c) ~ &\le ~ \min_{t > 0} M_{X-\mu}(t)e^{-tc} \\
&= ~ \min_{t > 0} e^{\sigma^2t^2/2} \cdot e^{-tc} \\
&= ~ \min_{t > 0} e^{-ct + \sigma^2t^2/2}
\end{align*}
$$

The curve below is the graph of $\exp(-ct + \sigma^2t^2/2)$ as a function of $t$, in the case $\sigma = 2$ and $c = 5$. The flat line is the exact probability $P(X - \mu \ge c)$. The curve is always above the flat line: no matter what $t$ is, the bound is an upper bound. The sharpest of all the upper bounds corresponds to the minimizing value $t^*$, which is somewhere in the 1.2 to 1.3 range.

[Figure: the bound $\exp(-ct + \sigma^2t^2/2)$ as a function of $t$ for $\sigma = 2$ and $c = 5$, with the exact probability $P(X - \mu \ge c)$ as a flat line]

To find the minimizing value of $t$ analytically, we will use the standard calculus method of minimization. But first we will simplify our calculations by observing that finding the point at which a positive function is minimized is the same as finding the point at which the log of the function is minimized. This is because $\log$ is an increasing function.

So the problem reduces to finding the value of $t$ that minimizes the function $h(t) = -ct + \sigma^2t^2/2$. By differentiation, the minimizing value of $t$ solves

$$
c ~ = ~ \sigma^2 t^*
$$

and hence

$$
t^* ~ = ~ \frac{c}{\sigma^2}
$$

So the Chernoff bound is

$$
P(X - \mu \ge c) ~ \le ~ e^{-ct^* + \sigma^2{t^*}^2/2} ~ = ~ e^{-c^2/2\sigma^2}
$$
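
As a check, in the case $\sigma = 2$ and $c = 5$ graphed above, $t^* = 5/4 = 1.25$, consistent with the figure, and the bound is $e^{-25/8} \approx 0.044$; the exact probability is $1 - \Phi(2.5) \approx 0.006$.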

Compare this with the bounds we already have. Markov's bound can't be applied directly as $X - \mu$ can have negative values. Because the distribution of $X - \mu$ is symmetric about 0, Chebychev's bound becomes

$$
P(X - \mu \ge c) ~ \le ~ \frac{\sigma^2}{2c^2}
$$

When $c$ is large, the optimized Chernoff bound is quite a bit sharper than Chebychev's. In the case $\sigma = 2$, the graph below shows the exact value of $P(X - \mu \ge c)$ as a function of $c$, along with the Chernoff and Chebychev bounds.

[Figure: exact value of $P(X - \mu \ge c)$ as a function of $c$ for $\sigma = 2$, with the Chernoff and Chebychev bounds]
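
A sketch that reproduces the comparison numerically (matplotlib is assumed to be available; the log scale on the vertical axis is a choice made here to make the exponential decay visible):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

sigma = 2
c = np.linspace(2, 10, 100)

exact = stats.norm.sf(c / sigma)             # P(X - mu >= c) = 1 - Phi(c/sigma)
chernoff = np.exp(-c**2 / (2 * sigma**2))    # optimized Chernoff bound
chebychev = sigma**2 / (2 * c**2)            # Chebychev's bound, using symmetry

plt.plot(c, exact, label='exact')
plt.plot(c, chernoff, label='Chernoff')
plt.plot(c, chebychev, label='Chebychev')
plt.yscale('log')
plt.xlabel('$c$')
plt.legend()
plt.show()
```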

19.4.6 Chernoff Bound on the Left Tail

By an analogous argument we can derive a Chernoff bound on the left tail of a distribution. For a fixed $t > 0$, the function $g(x) = e^{-tx}$ is decreasing and non-negative. So for $t > 0$ and any fixed $c$,

$$
P(X \le c) ~ = ~ P(e^{-tX} \ge e^{-tc}) ~ \le ~ \frac{E(e^{-tX})}{e^{-tc}} ~ = ~ \frac{M_X(-t)}{e^{-tc}}
$$

and therefore

$$
P(X \le c) ~ \le ~ \min_{t > 0} \frac{M_X(-t)}{e^{-tc}}
$$
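
For example, if $X$ is normal $(\mu, \sigma^2)$, the same minimization as in the right tail calculation gives $P(X - \mu \le -c) \le e^{-c^2/2\sigma^2}$ for $c > 0$, as you would also expect from the symmetry of the normal curve.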

19.4.7 Sums of Independent Random Variables

The Chernoff bound is often applied to sums of independent random variables. Let $X_1, X_2, \ldots, X_n$ be independent and let $S_n = X_1 + X_2 + \cdots + X_n$. Fix any number $c$. For every $t > 0$,

$$
P(S_n \ge c) ~ \le ~ \frac{M_{S_n}(t)}{e^{tc}} ~ = ~ \frac{\prod_{i=1}^n M_{X_i}(t)}{e^{tc}}
$$

This result is useful for finding bounds on binomial tails because the moment generating function of a Bernoulli random variable has a straightforward form. It is also used for bounding tails of sums of independent indicators with possibly different success probabilities. We will leave all this for a subsequent course.
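
As a preview, here is a sketch of the bound for a binomial tail, using the Bernoulli $(p)$ mgf $M_X(t) = 1 - p + pe^t$, so that $M_{S_n}(t) = (1 - p + pe^t)^n$; the values $n = 100$, $p = 1/2$, and $c = 65$ are illustrative choices, not from the text:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

n, p, c = 100, 0.5, 65

def log_bound(t):
    # log of (M_X(t))^n e^{-tc}, with Bernoulli (p) mgf M_X(t) = 1 - p + p e^t
    return n * np.log(1 - p + p * np.exp(t)) - t * c

res = minimize_scalar(log_bound, bounds=(1e-6, 10), method='bounded')
exact = stats.binom.sf(c - 1, n, p)   # exact P(S_n >= c)
print(np.exp(res.fun), exact)         # bound about 0.010, exact about 0.0018
```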