This section consists of examples based on one important fact:

The sum of independent normal variables is normal.

We will prove the fact in a later section using moment generating functions. For now, we will just run a quick simulation and then see how to use the fact in examples.

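# Imports assumed for this cell; they are not shown in the original
from datascience import Table
from scipy import stats
import matplotlib.pyplot as plt

mu_X = 10
sigma_X = 2
mu_Y = 15
sigma_Y = 3
# Draw 10,000 independent values of X and of Y
x = stats.norm.rvs(mu_X, sigma_X, size=10000)
y = stats.norm.rvs(mu_Y, sigma_Y, size=10000)
s = x+y
Table().with_column('S = X+Y', s).hist(bins=20)
plt.title('$X$ is normal (10, $2^2$); $Y$ is normal (15, $3^2$) independent of $X$');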
[Figure: histogram of the 10,000 simulated values of S = X+Y]

The simulation above generates 10,000 copies of $X+Y$ where $X$ has the normal distribution with mean 10 and SD 2 and $Y$ is independent of $X$ and has the normal distribution with mean 15 and SD 3. The distribution of the sum is clearly normal. You can vary the parameters and check that the distribution of the sum has the same shape, though with different labels on the axes.

To identify which normal, you have to find the mean and variance of the sum. Just use properties of the mean and variance:

If $X$ has the normal $(\mu_X, \sigma_X^2)$ distribution, and $Y$ independent of $X$ has the normal $(\mu_Y, \sigma_Y^2)$ distribution, then the distribution of $X+Y$ is normal with mean $\mu_X + \mu_Y$ and variance $\sigma_X^2 + \sigma_Y^2$.

This means that we don’t need the joint density of $X$ and $Y$ to find probabilities of events determined by $X+Y$.
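As a rough illustration of the point above, the sketch below finds a probability about $X+Y$ directly from the normal distribution of the sum and compares it with a simulated proportion. The parameters are those of the simulation earlier in the section; the threshold 30, the seed, and the variable names are arbitrary choices made for this example.

```python
import numpy as np
from scipy import stats

mu_X, sigma_X = 10, 2
mu_Y, sigma_Y = 15, 3

# Means add; variances (not SDs) add because X and Y are independent.
mu_S = mu_X + mu_Y
sigma_S = np.sqrt(sigma_X**2 + sigma_Y**2)

# P(X + Y > 30) from the normal distribution of the sum.
# The threshold 30 is an arbitrary value chosen for illustration.
exact = 1 - stats.norm.cdf(30, mu_S, sigma_S)

# Empirical check by simulation (seed chosen arbitrarily).
rng = np.random.default_rng(0)
s = rng.normal(mu_X, sigma_X, 100_000) + rng.normal(mu_Y, sigma_Y, 100_000)
empirical = np.mean(s > 30)

print(exact, empirical)
```

The two printed values should be close, though the simulated one will vary with the seed.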

18.2.1 Sums of IID Normal Variables

Let $X_1, X_2, \ldots, X_n$ be i.i.d. normal with mean $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + X_2 + \cdots + X_n$. Then the distribution of $S_n$ is normal with mean $n\mu$ and variance $n\sigma^2$.

This looks rather like the Central Limit Theorem but notice that there is no assumption that $n$ is large, and no approximation.

If the underlying distribution is normal, then the distribution of the i.i.d. sample sum is normal regardless of the sample size.
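One way to see this numerically is to simulate sums with a deliberately small sample size and compare them with the claimed normal distribution. The sketch below is only a sanity check, not a proof; the values of `mu`, `sigma`, and `n`, the seed, and the use of the Kolmogorov-Smirnov distance as a measure of closeness are all assumptions made for this example.

```python
import numpy as np
from scipy import stats

mu, sigma, n = 5, 2, 4   # hypothetical values; n is deliberately small

rng = np.random.default_rng(1)
# Each row is one i.i.d. sample of size n; summing across a row gives S_n.
samples = rng.normal(mu, sigma, size=(50_000, n))
s_n = samples.sum(axis=1)

# Claimed distribution of S_n: normal with mean n*mu and variance n*sigma^2.
mean_S = n * mu
sd_S = np.sqrt(n) * sigma

print(s_n.mean(), mean_S)
print(s_n.std(), sd_S)

# Kolmogorov-Smirnov distance between the simulated sums and the claimed
# normal distribution; a small value indicates a close match.
ks = stats.kstest(s_n, 'norm', args=(mean_S, sd_S))
print(ks.statistic)
```

Even with $n = 4$, the empirical mean and SD should be close to $n\mu$ and $\sqrt{n}\sigma$, and the distance should be small.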

18.2.2 The Difference of Two Independent Normal Variables

If $Y$ is normal, then so is $-Y$. So if $X$ and $Y$ are independent normal variables then $X-Y$ is normal with mean $\mu_X - \mu_Y$ and variance given by

$$
Var(X - Y) ~ = ~ Var(X) + Var(-Y) ~ = ~ \sigma_X^2 + (-1)^2\sigma_Y^2 ~ = ~ \sigma_X^2 + \sigma_Y^2
$$

For example, let the heights of Persons A and B be $H_A$ and $H_B$ respectively, and suppose $H_A$ and $H_B$ are i.i.d. normal with mean 66 inches and SD 3 inches. Then the chance that Person A is more than 2 inches taller than Person B is

$$
P(H_A > H_B + 2) = P(H_A - H_B > 2) = 1 - \Phi\big(\frac{2 - 0}{\sqrt{18}}\big)
$$

because $H_A - H_B$ is normal with mean 0 and SD $\sqrt{3^2 + 3^2} = \sqrt{18} \approx 4.24$ inches.

mu = 0
sigma = 18**0.5
1 - stats.norm.cdf(2, mu, sigma)
0.31867594411696853
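The height calculation can also be checked by simulating the two heights directly and counting how often Person A is more than 2 inches taller. This is a minimal sketch; the seed, the number of repetitions, and the names `h_A`, `h_B`, and `proportion` are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)   # seed chosen arbitrarily
reps = 200_000

# Independent heights, each normal with mean 66 inches and SD 3 inches.
h_A = rng.normal(66, 3, reps)
h_B = rng.normal(66, 3, reps)

# Proportion of simulated pairs in which A is more than 2 inches taller than B.
proportion = np.mean(h_A > h_B + 2)
print(proportion)
```

The simulated proportion should land near the value computed from the normal distribution of $H_A - H_B$.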

18.2.3 Comparing Two Sample Proportions

A candidate is up for election. In State 1, 50% of the voters favor the candidate. In State 2, only 27% of the voters favor the candidate. A simple random sample of 1000 voters is taken in each state. You can assume that the samples are independent of each other and that there are millions of voters in each state.

Question. Approximately what is the chance that in the sample from State 1, the proportion of voters who favor the candidate is more than twice as large as the proportion in the State 2 sample?

Answer. For $i = 1, 2$, let $X_i$ be the proportion of voters who favor the candidate in the sample from State $i$. We want the approximate value of $P(X_1 > 2X_2)$. By the Central Limit Theorem, both $X_1$ and $X_2$ are approximately normal. Because the two samples are independent, $X_1 - 2X_2$ is also approximately normal.

Now it’s just a matter of figuring out the mean and the SD.

$$
E(X_1 - 2X_2) ~ = ~ 0.5 - 2\times 0.27 = -0.04
$$

$$
Var(X_1) = \frac{0.5 \times 0.5}{1000} = 0.00025, ~~~~~~ Var(X_2) = \frac{0.27 \times 0.73}{1000} \approx 0.000197
$$

$$
Var(X_1 - 2X_2) = Var(X_1) + 4Var(X_2) \approx 0.00104, ~~~~~~ SD(X_1 - 2X_2) \approx 0.03222
$$

So

$$
P(X_1 > 2X_2) ~ = ~ P(X_1 - 2X_2 > 0) ~ \approx ~ 1 - \Phi \big( \frac{0 - (-0.04)}{0.03222} \big) ~ \approx ~ 10.7\%
$$
mu = 0.5 - 2*0.27
var = (0.5*0.5/1000) + 4*(0.27*.73/1000)
sigma = var**0.5
1 - stats.norm.cdf(0, mu, sigma)
0.1072469993885582
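As a rough check on the normal approximation, the two samples can be simulated directly. The sketch below assumes that, with millions of voters in each state, the number of favorable voters in a simple random sample of 1000 is well approximated by a binomial count; that modeling choice, the seed, and the variable names are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(3)   # seed chosen arbitrarily
reps = 200_000
n = 1000

# Counts of favorable voters in each state's sample.
# Binomial draws stand in for simple random sampling from a very large population.
c1 = rng.binomial(n, 0.5, reps)
c2 = rng.binomial(n, 0.27, reps)

# The event X_1 > 2 X_2 is the same as c1 > 2*c2 because both samples have size n.
# Comparing integer counts avoids floating-point ties.
proportion = np.mean(c1 > 2 * c2)
print(proportion)
```

Because the counts are discrete, the simulated proportion may differ slightly from the normal approximation, but it should be of the same order.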