14.4.1 Plotting Normal Curves

The prob140 function Plot_norm takes three arguments and displays the corresponding normal curve. The arguments are:

  • the interval over which to draw the curve, as a list or array with the two endpoints

  • the mean

  • the SD

Plot_norm([-4, 4], 0, 1)
<Figure size 432x288 with 1 Axes>

You can shade all the area to the left of a point $x$ by providing the point $x$ as the right_end of the interval $(-\infty, x]$.

Plot_norm([-4, 4], 0, 1, right_end=1.5)
<Figure size 432x288 with 1 Axes>

All the area to the right of a point:

Plot_norm([-4, 4], 0, 1, left_end=1.5)
<Figure size 432x288 with 1 Axes>

The area between two points:

Plot_norm([-4, 4], 0, 1, left_end=-1, right_end=1.5)
<Figure size 432x288 with 1 Axes>

14.4.2 $\Phi$ and $\Phi^{-1}$

All the areas displayed above can be expressed in terms of the standard normal cdf $\Phi$.
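For instance, the three shaded areas in the plots above can be written as follows. The second line uses the fact that the total area under the curve is 1.

```latex
\begin{align*}
\text{area to the left of } 1.5 &= \Phi(1.5) \\
\text{area to the right of } 1.5 &= 1 - \Phi(1.5) \\
\text{area between } -1 \text{ and } 1.5 &= \Phi(1.5) - \Phi(-1)
\end{align*}
```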

Recall that the standard normal cdf $\Phi$ is the function defined by

$$
\Phi(x) = \int_{-\infty}^x \phi(z)dz ~, ~~~~ -\infty < x < \infty
$$

where $\phi$ is the standard normal curve.

For each $x$, the value of $\Phi(x)$ is an area under the standard normal curve. The function $\Phi$ takes a real number $x$ as its argument and returns a proportion $p$, which is all the area to the left of $x$ under the standard normal curve.

<Figure size 432x288 with 1 Axes>
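To make the definition concrete, here is a minimal sketch that computes the area to the left of $x$ by numerical integration of the standard normal curve. The helper names phi and Phi_by_integration are made up for this illustration, and the use of scipy.integrate.quad is our choice of integrator, not something the text prescribes.

```python
import numpy as np
from scipy import integrate

def phi(z):
    """Standard normal curve (density)."""
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

def Phi_by_integration(x):
    """Area under phi to the left of x, found by numerical quadrature."""
    area, _ = integrate.quad(phi, -np.inf, x)   # quad also returns an error estimate
    return area

for x in [-1.5, 0, 1.5]:
    print(x, Phi_by_integration(x))
```

By symmetry of the curve, the value at 0 should be very close to 0.5, and the values at -1.5 and 1.5 should add up to about 1.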

It will also be helpful to go the other way, and identify the $x$ such that $\Phi(x)$ is a specified value $p$. In other words, we will need $\Phi^{-1}$, the inverse of $\Phi$, which is determined by

$$
\Phi(z) ~ = ~ p ~~ \iff ~~ \Phi^{-1}(p) = z
$$

For each $p$ in the interval $(0, 1)$, the value of $\Phi^{-1}(p)$ is a point on the horizontal axis of the graph of the standard normal curve.

<Figure size 432x288 with 1 Axes>

14.4.3 $\Phi$ and $\Phi^{-1}$ in SciPy

As we noted in the previous section, there is no closed form formula for $\Phi$. So there also isn’t one for $\Phi^{-1}$. But most computational systems provide excellent numerical approximations.
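To give a feel for what a numerical approximation can look like, here is a rough sketch that uses only the Python standard library. It relies on the identity $\Phi(x) = \frac{1}{2}\big(1 + \text{erf}(x/\sqrt{2})\big)$ and finds the inverse by bisection. The function names Phi and Phi_inverse, the search interval, and the tolerance are choices made for this illustration; this is not a description of how SciPy or any other library does the computation internally.

```python
import math

def Phi(x):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def Phi_inverse(p, lo=-10.0, hi=10.0, tol=1e-12):
    """Solve Phi(z) = p by bisection, for p strictly between 0 and 1.

    The bracket [-10, 10] is an assumption that covers all but the most
    extreme values of p.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid      # the solution is to the right of mid
        else:
            hi = mid      # the solution is at or to the left of mid
    return (lo + hi) / 2

print(Phi(1))            # about 0.8413
print(Phi_inverse(0.9))  # about 1.2816
```

Bisection works here because $\Phi$ is continuous and strictly increasing, so each value $p$ in $(0, 1)$ corresponds to exactly one point on the horizontal axis.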

In SciPy the approximations are in the familiar stats module. For the standard normal cdf, use stats.norm.cdf just as you used stats.binom.cdf and so on. By default, stats.norm.cdf is based on the standard normal curve.

The area to the left of 1 under the standard normal curve:

stats.norm.cdf(1)
0.8413447460685429

The area between -1 and 1 under the standard normal curve can be found by using the cdf and subtraction in a familiar way:

stats.norm.cdf(1) - stats.norm.cdf(-1)
0.6826894921370859
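The same subtraction works for any symmetric interval. As a small sketch, the loop below repeats the calculation for 1, 2, and 3 units on either side of 0; the dictionary name areas is just a label chosen for this example.

```python
from scipy import stats

# Area between -k and k under the standard normal curve, for k = 1, 2, 3
areas = {}
for k in [1, 2, 3]:
    areas[k] = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(k, round(areas[k], 4))
```

The three areas are roughly 68%, 95%, and 99.7%.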

In both examples above, we started with a point or points on the horizontal axis and used the cdf $\Phi$ to find a related area. We can also go backwards, by specifying an area and using $\Phi^{-1}$ to find a related point on the horizontal axis.

For example, if you want $x$ such that $\Phi(x) = 0.9$, you can use the percent point function stats.norm.ppf. The name comes from the expression “90% point” of the distribution, or equivalently, the 90th percentile.

stats.norm.ppf(0.9)
1.2815515655446004
<Figure size 432x288 with 1 Axes>

By the definition of an inverse, we should have $\Phi(\Phi^{-1}(0.9)) = 0.9$. Let’s check that.

stats.norm.cdf(stats.norm.ppf(0.9))
0.8999999999999999
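The result differs from 0.9 only in the last decimal place, which is most likely the effect of floating-point rounding. As a quick sketch, you can run the same round trip over a grid of proportions and compare up to a small tolerance instead of testing for exact equality. The grid p and the name round_trip are choices made for this illustration.

```python
import numpy as np
from scipy import stats

p = np.arange(0.05, 1, 0.05)                     # proportions 0.05, 0.10, ..., 0.95
round_trip = stats.norm.cdf(stats.norm.ppf(p))   # Phi(Phi^{-1}(p))

print(np.max(np.abs(round_trip - p)))   # largest discrepancy, expected to be tiny
print(np.allclose(round_trip, p))       # comparison up to a small tolerance
```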

14.4.4 Example

Suppose the weights of a sample of 100 people are i.i.d. with a mean of 150 pounds and an SD of 20 pounds. Then the total weight of the sampled people is roughly normal with mean $100 \times 150 = 15,000$ pounds and SD $\sqrt{100} \times 20 = 200$ pounds.
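The mean and SD of the total come from the usual rules for sums of i.i.d. random variables. As a sketch, write $X_1, \ldots, X_n$ for the individual weights (notation introduced here for illustration), with $n = 100$, $\mu = 150$, and $\sigma = 20$, and let $S = X_1 + \cdots + X_n$. The variance line uses independence.

```latex
\begin{align*}
E(S) &= E(X_1) + \cdots + E(X_n) = n\mu = 100 \times 150 = 15000 \\
Var(S) &= Var(X_1) + \cdots + Var(X_n) = n\sigma^2 = 100 \times 20^2 = 40000 \\
SD(S) &= \sqrt{n}\,\sigma = \sqrt{100} \times 20 = 200
\end{align*}
```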

Who cares about the total weight of a random group of people? Ask those who construct stadiums, elevators, and airplanes.

# Approximate distribution of total weight

n = 100
mu = 150
sigma = 20

mean = n*mu
sd = (n**0.5)*sigma

plot_interval = make_array(mean-4*sd, mean+4*sd)

Plot_norm(plot_interval, mean, sd)
<Figure size 432x288 with 1 Axes>

The chance that the total weight of the sampled people is less than 15,100 pounds is approximately the gold area below. The CLT allows us to use the normal curve as an approximation to the unknown exact distribution of the total weight.

Plot_norm(plot_interval, mean, sd, right_end=15100)
<Figure size 432x288 with 1 Axes>

The function stats.norm.cdf takes the mean and SD as optional arguments. Remember that the names mean and sd were assigned in an earlier cell. Also remember that the answer below is not exact but an approximation based on the CLT.

stats.norm.cdf(15100, mean, sd)
0.6914624612740131
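In SciPy the two optional arguments are named loc (the mean) and scale (the SD), so they can also be passed by keyword. The sketch below repeats the calculation that way and uses the cdf and subtraction to get two related chances. The names below, above, and within_one_sd are labels chosen for this example, and all the answers are still CLT approximations.

```python
from scipy import stats

mean, sd = 15000, 200   # mean and SD of the total weight, from the example above

# P(S < 15100), with the mean and SD passed by keyword
below = stats.norm.cdf(15100, loc=mean, scale=sd)

# P(S > 15100) is the remaining area under the curve
above = 1 - below

# P(14800 < S < 15200): the area within one SD of the mean
within_one_sd = (stats.norm.cdf(mean + sd, loc=mean, scale=sd)
                 - stats.norm.cdf(mean - sd, loc=mean, scale=sd))

print(below, above, within_one_sd)
```

The last value agrees with the area between -1 and 1 under the standard normal curve, because 14,800 and 15,200 are one SD below and above the mean.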

To find the approximate 90th percentile of the distribution of the total weight, you can use stats.norm.ppf with the mean and SD as arguments.

stats.norm.ppf(0.9, mean, sd)
15256.31031310892

The conclusion is $P(S \le 15256) \approx 0.9$ where $S$ denotes the total weight.
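One way to get a feel for the quality of these approximations is simulation. The example does not say how individual weights are distributed, so the sketch below assumes, purely for illustration, that they follow a gamma distribution with mean 150 and SD 20. The seed, the number of repetitions, and all variable names are also choices made here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, mu, sigma = 100, 150, 20

# ASSUMPTION: gamma weights. The text fixes only the mean and SD.
shape = (mu / sigma) ** 2    # shape * scale = mu
scale = sigma ** 2 / mu      # shape * scale**2 = sigma**2

reps = 50_000
totals = rng.gamma(shape, scale, size=(reps, n)).sum(axis=1)   # one total per repetition

empirical_chance = np.mean(totals < 15100)
empirical_90th = np.percentile(totals, 90)

print(empirical_chance, stats.norm.cdf(15100, n * mu, n**0.5 * sigma))
print(empirical_90th, stats.norm.ppf(0.9, n * mu, n**0.5 * sigma))
```

Under this assumption the empirical values should land close to the normal approximations. A different distribution of individual weights could give somewhat different agreement.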

14.4.5 Using Standard Units

While it is convenient to be able to enter the mean and SD as arguments to stats.norm.cdf and stats.norm.ppf, the fundamental curve is the standard normal curve. All the others are obtained by linear transformations.

Therefore all the calculations above can be done in terms of the standard normal cdf by standardizing. As a result, all normal approximations can (and will) be written in terms of the standard normal cdf $\Phi$. We don’t need to use a different cdf for each mean and SD.
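To see why standardizing works, here is a short sketch in notation introduced only for this illustration. Let $Z$ be standard normal and let $X = \sigma Z + \mu$ with $\sigma > 0$, so that $X$ is normal with mean $\mu$ and SD $\sigma$. Then for every $x$,

```latex
P(X \le x) ~ = ~ P\Big( Z \le \frac{x - \mu}{\sigma} \Big) ~ = ~ \Phi \Big( \frac{x - \mu}{\sigma} \Big)
```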

For example, we can redo the two calculations above as follows.

To find the approximate chance that the total weight is less than 15100 pounds, first standardize 15100 and then use the standard normal cdf:

$$
P(S < 15100) ~ \approx ~ \Phi \big( \frac{15100 - 15000}{200} \big)
$$

The calculation gives the same answer as before.

z = (15100 - mean)/sd

stats.norm.cdf(z) 
0.6914624612740131
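The same two steps can be wrapped in a small helper and checked at several points at once. This is only a sketch: the function name normal_cdf_standard_units and the test points are made up for this illustration.

```python
import numpy as np
from scipy import stats

def normal_cdf_standard_units(x, mean, sd):
    """Normal (mean, sd) cdf at x, computed using only the standard normal cdf."""
    z = (np.asarray(x) - mean) / sd    # convert to standard units
    return stats.norm.cdf(z)

mean, sd = 15000, 200
x = np.array([14700, 15000, 15100, 15400])   # total weights in pounds

via_standard_units = normal_cdf_standard_units(x, mean, sd)
direct = stats.norm.cdf(x, mean, sd)

print(via_standard_units)
print(np.allclose(via_standard_units, direct))
```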

To find the 90th percentile of the approximate distribution of $S$, first find the 90th percentile of the standard normal curve. This value is the 90th percentile of any normal curve, measured in standard units.

z = stats.norm.ppf(0.9)
z
1.2815515655446004

Now convert the standard units back to pounds. The 90th percentile of the distribution of $S$ is approximately $\Phi^{-1}(0.9)\cdot200 + 15000$. The numerical answer is the same as before.

x = z*sd + mean
x
15256.31031310892
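The conversion from standard units back to pounds works for any percentile, not just the 90th. As a sketch, the hypothetical helper below applies it to several proportions and compares the results with stats.norm.ppf called with the mean and SD as arguments.

```python
import numpy as np
from scipy import stats

def normal_ppf_standard_units(p, mean, sd):
    """Percentile of the normal (mean, sd) curve, found via the standard normal curve."""
    z = stats.norm.ppf(p)     # percentile in standard units
    return z * sd + mean      # convert back to the original units

mean, sd = 15000, 200
p = np.array([0.1, 0.25, 0.5, 0.75, 0.9])

via_standard_units = normal_ppf_standard_units(p, mean, sd)
direct = stats.norm.ppf(p, mean, sd)

print(via_standard_units)
print(np.allclose(via_standard_units, direct))
```

The middle entry is the mean itself, since the 50th percentile of the standard normal curve is 0.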