
Define the deviation from the mean to be $X - \mu_X$. Let’s see what we expect that to be. By the linear function rule,

$$
E(X - \mu_X) = E(X) - \mu_X = \mu_X - \mu_X = 0
$$

For every random variable, the expected deviation from the mean is 0. The positive deviations exactly cancel out the negative ones.
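This is easy to check numerically. Here is a quick sketch in plain NumPy (not the prob140 library used later in this section), with a small made-up distribution:

```python
import numpy as np

# A hypothetical distribution: possible values and their probabilities
x = np.array([3, 4, 5])
probs = np.array([0.2, 0.5, 0.3])

mu = np.sum(x * probs)                    # E(X) = 4.1
expected_dev = np.sum((x - mu) * probs)   # E(X - mu_X)
print(expected_dev)                       # 0, up to floating-point rounding
```

The probability-weighted positive and negative deviations cancel exactly, whatever the distribution.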

This cancellation hides how big the deviations are. But the typical size of a deviation, regardless of its sign, is exactly what we need if we want to measure the distance between the random variable $X$ and its expectation $\mu_X$.

We have to get rid of the sign of the deviation somehow. One time-honored way of getting rid of the sign of a number is to take the absolute value. The other is to square the number. That’s the method we will use. As you will see, it results in a measure of spread that is crucial for understanding the sums and averages of large samples.

Working with the squared deviations avoids cancellation between positive and negative deviations. The disadvantage is that squared deviations are in squared units, which are hard to interpret. The measure of spread that we are about to define takes care of this problem.


12.1.1 Standard Deviation

Let $X$ be a random variable with expectation $\mu_X$. The standard deviation of $X$, denoted $SD(X)$ or $\sigma_X$, is the root mean square (rms) of deviations from the mean:

$$
SD(X) = \sigma_X = \sqrt{ E\big( (X-\mu_X)^2 \big) }
$$

$SD(X)$ has the same units as $X$ and $E(X)$. In this chapter we will make precise the sense in which the standard deviation measures the spread of the distribution of $X$ about the center $\mu_X$.

The quantity inside the square root is called the variance of $X$ and has better computational properties than the SD. This turns out to be closely connected to the fact that, by Pythagoras’ Theorem, squares of distances combine in useful ways.

$$
Var(X) = \sigma_X^2 = E\big( (X-\mu_X)^2 \big)
$$

Almost invariably, we will calculate standard deviations by first finding the variance and then taking the square root.

Let’s try out the definition of the SD on a random variable $X$ that has the distribution defined below.

```python
from datascience import *
from prob140 import *
import numpy as np

x = make_array(3, 4, 5)
probs = make_array(0.2, 0.5, 0.3)
dist_X = Table().values(x).probability(probs)
dist_X
```

```python
dist_X.ev()
```

```
4.1
```

Here are the squared deviations from the expectation $E(X) = 4.1$.

```python
sd_table = Table().with_columns(
    'x', dist_X.column(0),
    '(x - 4.1)**2', (dist_X.column(0) - 4.1)**2,
    'P(X = x)', dist_X.column(1)
)
sd_table
```

The standard deviation of $X$ is the square root of the mean squared deviation. The calculation below shows that its numerical value is $SD(X) = 0.7$.

```python
sd_X = np.sqrt(sum(sd_table.column(1) * sd_table.column(2)))
sd_X
```

```
0.7
```

The prob140 method `sd`, applied to a distribution object, returns the standard deviation, saving you the calculation above.

```python
dist_X.sd()
```

```
0.7
```

We now know how to calculate the SD. But we don’t yet have a good understanding of what it does. Let’s start developing a few properties that it ought to have. Then we can check if it has them.

First, the SD of a constant should be 0. You should check that this is indeed what the definition implies.
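For instance, here is a quick NumPy sketch (a hypothetical check, separate from the prob140 code above) with a constant that takes the value 7 with probability 1:

```python
import numpy as np

# A "constant" random variable: X = 7 with probability 1
x = np.array([7.0])
probs = np.array([1.0])

mu = np.sum(x * probs)                        # E(X) = 7
sd = np.sqrt(np.sum((x - mu) ** 2 * probs))   # every deviation is 0
print(sd)  # 0.0
```

A constant never deviates from its mean, so its SD is 0.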

12.1.2 Shifting and Scaling

The SD is a measure of spread. It’s natural to want measures of spread to remain unchanged if we just shift a probability histogram to the left or right. Such a shift occurs when we add a constant to a random variable. The figure below shows the distribution of the same $X$ as above, along with the distribution of $X+5$. It is clear that $X+5$ should have the same SD as $X$.

```python
dist2 = Table().values(x + 5).probability(probs)
Plots('X', dist_X, 'X+5', dist2)
```

On the other hand, multiplying $X$ by a constant results in a distribution that should have a different spread. Here is the distribution of $X$ along with the distribution of $4X$. The spread of the distribution of $4X$ appears to be four times as large as that of $X$.

```python
import matplotlib.pyplot as plt

dist3 = Table().values(4 * x).probability(probs)
Plots('X', dist_X, '4X', dist3)
plt.xlim(0, 40);
```

Multiplying by -4 should have the same effect on the spread as multiplying by 4, as the figure below shows. One histogram is just the mirror image of the other about the vertical axis at 0. There is no change in spread.

```python
dist4 = Table().values(-4 * x).probability(probs)
Plots('-4X', dist4, '4X', dist3)
```

12.1.3 Linear Functions

The graphs above help us visualize what happens to the SD when the random variable is transformed linearly, for example when changing units of measurement. Let $Y = aX + b$. Then

$$
\begin{align*}
Var(Y) = E\big[ (Y-\mu_Y)^2 \big] &= E\big[ (aX + b - a\mu_X - b)^2 \big] \\
&= a^2 E\big[ (X - \mu_X)^2 \big] \\
&= a^2 \sigma_X^2
\end{align*}
$$

Notice that the shift $b$ has no effect on the variance. This is consistent with what we saw in the first visualization above.

The calculation shows that $Var(Y)$ is $a^2$ times the variance of $X$. Since variance is in squared units, taking the square root gives the SD in the same units as $X$. That is,

$$
Var(aX + b) = a^2 Var(X), \qquad SD(aX + b) = |a|\sigma_X
$$

Notice that you get the same answer when the multiplicative constant is $a$ as when it is $-a$. That is what the two “mirror image” histograms had shown.

In particular, it is very handy to remember that $SD(X) = SD(-X)$.
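Both facts are easy to verify numerically. The sketch below uses plain NumPy with a small made-up distribution; the helper `sd` is a hypothetical function written for this check, not part of prob140:

```python
import numpy as np

# Hypothetical distribution: values and their probabilities
x = np.array([3, 4, 5])
probs = np.array([0.2, 0.5, 0.3])

def sd(values, p):
    """Root mean squared deviation from the mean."""
    mu = np.sum(values * p)
    return np.sqrt(np.sum((values - mu) ** 2 * p))

a, b = -4, 5
# A linear transformation changes the values; the probabilities are unchanged.
assert np.isclose(sd(a * x + b, probs), abs(a) * sd(x, probs))
assert np.isclose(sd(-x, probs), sd(x, probs))
print(sd(x, probs), sd(a * x + b, probs))  # approximately 0.7 and 2.8
```

The shift $b$ drops out, and the scale factor comes through as $|a|$.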

12.1.4 “Computational” Formula for Variance

An algebraic simplification of the formula for variance turns out to be very useful.

$$
\begin{align*}
\sigma_X^2 &= E\big( (X-\mu_X)^2 \big) \\
&= E(X^2 - 2X\mu_X + \mu_X^2) \\
&= E(X^2) - 2\mu_X E(X) + \mu_X^2 \\
&= E(X^2) - 2\mu_X^2 + \mu_X^2 \\
&= E(X^2) - \mu_X^2
\end{align*}
$$

Thus the variance is the “mean of the square minus the square of the mean.”

Apart from giving us an alternative way of calculating variance, the formula tells us something about the relation between $E(X^2)$ and $\mu_X^2 = (E(X))^2$. Since variance is non-negative, the formula shows that

$$
E(X^2) ~ \ge ~ (E(X))^2
$$

with equality only when $X$ is a constant.

The formula is often called the “computational” formula for variance. But it can be numerically inaccurate if the possible values of $X$ are large and numerous, because it takes the difference of two large, nearly equal quantities. For algebraic computation, however, it is very useful, as you will see in the calculations below.
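A small NumPy illustration of both points (hypothetical helper functions, written just for this check): the two formulas agree on ordinary values, but when the values are huge and the spread is tiny, $E(X^2) - \mu_X^2$ loses its accuracy to cancellation while the definition-based calculation does not.

```python
import numpy as np

def var_definition(x, probs):
    """Variance as the expected squared deviation from the mean."""
    mu = np.sum(x * probs)
    return np.sum((x - mu) ** 2 * probs)

def var_computational(x, probs):
    """Variance as E(X^2) minus the square of the mean."""
    mu = np.sum(x * probs)
    return np.sum(x ** 2 * probs) - mu ** 2

x = np.array([3.0, 4.0, 5.0])
probs = np.array([0.2, 0.5, 0.3])
# On small values the two formulas agree (both approximately 0.49).
print(var_definition(x, probs), var_computational(x, probs))

# Shift the same distribution up by a billion: the spread is unchanged,
# but E(X^2) and mu^2 are each about 1e18, so their difference loses
# almost all of its significant digits to cancellation.
x_big = x + 1e9
print(var_definition(x_big, probs))     # still approximately 0.49
print(var_computational(x_big, probs))  # badly off
```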


12.1.5 Indicator

The values of an indicator random variable are 0 and 1. Each of those two numbers is equal to its square. So if $I$ is an indicator, then $I^2 = I$, and thus

$$
Var(I) = E(I^2) - [E(I)]^2 = E(I) - [E(I)]^2 = p - p^2 = p(1-p)
$$

You should check that this variance is largest when $p = 0.5$. Take the square root to get

$$
SD(I) = \sqrt{p(1-p)}
$$
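A plain NumPy check (hypothetical, separate from the prob140 code) confirms the formula for several values of $p$, along with the claim that the variance is largest at $p = 0.5$:

```python
import numpy as np

# Check Var(I) = p(1-p) directly from the definition, for several p
for p in [0.1, 0.25, 0.5, 0.9]:
    values = np.array([0.0, 1.0])
    probs = np.array([1 - p, p])
    mu = np.sum(values * probs)                   # E(I) = p
    var_def = np.sum((values - mu) ** 2 * probs)  # definition of variance
    assert np.isclose(var_def, p * (1 - p))

# p(1-p) is maximized at p = 0.5, where it equals 0.25
p_grid = np.linspace(0, 1, 101)
assert np.isclose(p_grid[np.argmax(p_grid * (1 - p_grid))], 0.5)
```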

12.1.6 Uniform

Let $U$ be uniform on $1, 2, 3, \ldots, n$. Then

$$
\begin{align*}
E(U^2) &= \sum_{k=1}^n k^2 \cdot \frac{1}{n} \\
&= \frac{1}{n} \sum_{k=1}^n k^2 \\
&= \frac{1}{n} \cdot \frac{n(n+1)(2n+1)}{6} \\
&= \frac{(n+1)(2n+1)}{6}
\end{align*}
$$

In the last-but-one step above, we used the formula for the sum of the first $n$ squares.

We know that $E(U) = (n+1)/2$, so

$$
Var(U) = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n+1}{2} \Big( \frac{2n+1}{3} - \frac{n+1}{2} \Big) = \frac{n^2-1}{12}
$$

and

$$
SD(U) = \sqrt{\frac{n^2-1}{12}}
$$

By shifting, this is the same as the SD of the uniform distribution on any $n$ consecutive integers.
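As a numerical check (plain NumPy, not prob140): since all $n$ values are equally likely, `np.std` applied to the list of values gives exactly the population SD, and it matches the formula; shifting the values leaves it unchanged.

```python
import numpy as np

n = 10
u = np.arange(1, n + 1)     # 1, 2, ..., n, each with probability 1/n
sd_direct = np.std(u)       # population SD of equally likely values
sd_formula = np.sqrt((n ** 2 - 1) / 12)
assert np.isclose(sd_direct, sd_formula)

# Shifting to any n consecutive integers leaves the SD unchanged
assert np.isclose(np.std(u + 100), sd_formula)
print(sd_formula)  # approximately 2.87
```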

12.1.7 Poisson

Let $X$ have the Poisson $(\mu)$ distribution. In Chapter 8 we showed that

$$
E(X^2) = \mu^2 + \mu
$$

We also know that $E(X) = \mu$. Thus

$$
Var(X) = \mu^2 + \mu - \mu^2 = \mu
$$

and

$$
SD(X) = \sqrt{\mu}
$$

So for example if $X$ has the Poisson $(5)$ distribution, then $E(X) = 5$ and $SD(X) = \sqrt{5} \approx 2.24$. In this chapter and the next, we will try to figure out what that means.
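Here is a quick check in plain NumPy (not prob140), truncating the Poisson support at 60, beyond which the probabilities are negligible for $\mu = 5$:

```python
import numpy as np
from math import exp, factorial

mu = 5.0
k = np.arange(0, 61)    # truncated support; P(X > 60) is negligible for mu = 5
pmf = np.array([exp(-mu) * mu ** i / factorial(int(i)) for i in k])

ev = np.sum(k * pmf)                  # should be approximately mu = 5
var = np.sum((k - ev) ** 2 * pmf)     # should be approximately mu = 5
print(ev, np.sqrt(var))               # approximately 5 and sqrt(5)
```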