When the joint distribution of $X$ and $Y$ is bivariate normal, the regression line of the previous section does even better than just being the best among all linear predictors of $Y$ based on $X$. In this section we will construct a bivariate normal pair $(X, Y)$ from i.i.d. standard normal variables. In the next section, we will identify the main property of the regression line for bivariate normal $(X, Y)$.

The multivariate normal distribution is defined in terms of a mean vector and a covariance matrix. As you know, normalizing the covariance makes it easier to interpret. You have seen in exercises that for jointly distributed random variables $X$ and $Y$ the correlation between $X$ and $Y$ is defined as

$$
r_{X,Y} ~ = ~ \frac{Cov(X, Y)}{\sigma_X\sigma_Y} ~ = ~ E\Big( \frac{X-\mu_X}{\sigma_X} \cdot \frac{Y-\mu_Y}{\sigma_Y} \Big) ~ = ~ E(X_{su}Y_{su})
$$

where $X_{su}$ is $X$ in standard units and $Y_{su}$ is $Y$ in standard units.
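As a rough numerical illustration of the definition, the sketch below computes the sample version of the correlation in three ways: as the average product of standard units, as covariance divided by the product of the SDs, and with NumPy's `np.corrcoef`. The data, the seed, and the helper name `standard_units` are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Hypothetical data: y depends linearly on x, plus independent noise
x = rng.normal(10, 2, size=100_000)
y = 3 * x + rng.normal(0, 5, size=100_000)

def standard_units(v):
    """Convert an array to standard units: subtract the mean, divide by the SD."""
    return (v - np.mean(v)) / np.std(v)

# Correlation as the average product of standard units
r_su = np.mean(standard_units(x) * standard_units(y))

# Correlation as covariance divided by the product of the SDs
cov_xy = np.mean((x - np.mean(x)) * (y - np.mean(y)))
r_cov = cov_xy / (np.std(x) * np.std(y))

# NumPy's built-in sample correlation, for comparison
r_np = np.corrcoef(x, y)[0, 1]

print(r_su, r_cov, r_np)
```

The three values agree up to floating point error, because each is the same quantity written differently.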

24.2.1 Properties of Correlation

You showed all of these in exercises.

  • $r_{X,Y}$ depends only on standard units and hence is a pure number with no units

  • $r_{X,Y} = r_{Y,X}$

  • $-1 \le r_{X,Y} \le 1$

  • If $Y = aX + b$ with $a \ne 0$, then $r_{X,Y}$ is 1 or $-1$ according to whether the sign of $a$ is positive or negative.

We say that $r_{X,Y}$ measures the linear association between $X$ and $Y$.
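These properties can be spot-checked on simulated data. The following is a minimal sketch, not a proof; the sample size, the constants in the linear functions, and the helper `corr` are all choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
x = rng.normal(size=10_000)
w = rng.normal(size=10_000)

def corr(u, v):
    """Sample correlation between two arrays."""
    return np.corrcoef(u, v)[0, 1]

r_pos = corr(x, 2.5 * x + 7)    # Y = aX + b with a > 0
r_neg = corr(x, -0.4 * x + 7)   # Y = aX + b with a < 0

r_xw = corr(x, w)               # some correlation between -1 and 1
r_wx = corr(w, x)               # symmetry: same value with the roles swapped
r_units = corr(100 * x, w)      # rescaling x (a change of units) has no effect

print(r_pos, r_neg, r_xw, r_wx, r_units)
```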

24.2.2 Correlation as a Cosine

Rewrite the formula for correlation to see that

$$
Cov(X, Y) ~ = ~ r_{X,Y}\sigma_X\sigma_Y
$$

So the variance of $X+Y$ is

$$
\sigma_{X+Y}^2 ~ = ~ \sigma_X^2 + \sigma_Y^2 + 2r_{X,Y}\sigma_X\sigma_Y
$$

Notice the parallel with the formula for the length of the sum of two vectors, with correlation playing the role of the cosine of the angle between two vectors. If the angle is 90 degrees, then the cosine is 0. This corresponds to correlation being zero and hence the random variables being uncorrelated.
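The parallel can be made concrete with data. In the sketch below, the variance formula is checked on a simulated sample, and the sample correlation is compared with the cosine of the angle between the two centered data vectors. The simulated data and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
x = rng.normal(0, 3, size=50_000)
y = 0.5 * x + rng.normal(0, 2, size=50_000)

sd_x, sd_y = np.std(x), np.std(y)
r = np.corrcoef(x, y)[0, 1]

# Variance of the sum, computed directly and by the formula
var_sum_direct = np.var(x + y)
var_sum_formula = sd_x**2 + sd_y**2 + 2 * r * sd_x * sd_y

# Cosine of the angle between the centered data vectors
xc, yc = x - np.mean(x), y - np.mean(y)
cos_angle = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(var_sum_direct, var_sum_formula)
print(r, cos_angle)
```

For a sample, both comparisons are exact identities up to rounding, so the printed pairs should match closely.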

Later in this section, we will visualize this idea in the case where the joint distribution of $X$ and $Y$ is bivariate normal.

24.2.3 Constructing the Standard Bivariate Normal

The goal is to construct $X$ and $Y$ that have the multivariate normal distribution with mean vector $[0 ~ 0]^T$ and covariance matrix $\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$ for some $\rho$ such that $-1 < \rho < 1$. We will say that $X$ and $Y$ have the standard bivariate normal distribution with correlation $\rho$.

Any bivariate normal vector is a linear transformation of an i.i.d. standard normal vector. Start with two i.i.d. standard normal random variables $X$ and $Z$. We will construct the required bivariate normal random vector $[X ~ Y]^T$ as a linear transformation of the random vector $[X ~ Z]^T$.

First note that since all three of $X$, $Y$, and $Z$ must have mean 0, the linear transformation has no shift term. We just need to identify numbers $a$, $b$, $c$, and $d$ such that

$$
\begin{bmatrix} X \\ Y \end{bmatrix} ~ = ~ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} X \\ Z \end{bmatrix}
$$

Taking $a=1$ and $b=0$ is a good start because it gives us the right first coordinate.

$$
\begin{bmatrix} 1 & 0 \\ c & d \end{bmatrix} \begin{bmatrix} X \\ Z \end{bmatrix} ~=~ \begin{bmatrix} X \\ cX+dZ \end{bmatrix}
$$

Since both $X$ and $Y$ must have variance 1, the covariance of $X$ and $Y$ is equal to the correlation. So, by the independence of $X$ and $Z$,

$$
\rho ~ = ~ Cov(X, cX+dZ) ~ = ~ cVar(X) = c
$$

So now we have

$$
\begin{bmatrix} 1 & 0 \\ \rho & d \end{bmatrix} \begin{bmatrix} X \\ Z \end{bmatrix} ~=~ \begin{bmatrix} X \\ \rho X+dZ \end{bmatrix} ~ = ~ \begin{bmatrix} X \\ Y \end{bmatrix}
$$

Since $Var(Y) = 1$, the final condition is $1 = Var(\rho X + dZ) = \rho^2Var(X) + d^2Var(Z) = \rho^2 + d^2$. So $d = \sqrt{1 - \rho^2}$ will work, and we have the following result.

  • Let $X$ be standard normal.

  • Let $Z$ be standard normal, independent of $X$.

  • Let $Y = \rho X + \sqrt{1-\rho^2}Z$.

  • Then $X$ and $Y$ have the standard bivariate normal distribution with correlation $\rho$.

It is also true that if $X$ and $Y$ are standard bivariate normal with correlation $\rho$, then there is a standard normal $Z$ independent of $X$ such that $Y = \rho X + \sqrt{1-\rho^2}Z$. The proof is an exercise.
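The converse statement can be explored numerically, though a simulation is not a proof. The sketch below draws $(X, Y)$ directly with NumPy's `multivariate_normal`, solves the equation for $Z$, and checks that the resulting $Z$ has mean near 0, variance near 1, and sample correlation near 0 with $X$. The seed, the sample size, and $\rho = 0.6$ are choices made for this example. A correlation near 0 is consistent with independence but does not by itself establish it.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
rho = 0.6
n = 200_000

# Draw (X, Y) directly from the standard bivariate normal with correlation rho
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
x, y = xy[:, 0], xy[:, 1]

# Candidate for Z: solve Y = rho*X + sqrt(1 - rho^2)*Z for Z
z = (y - rho * x) / np.sqrt(1 - rho**2)

mean_z = np.mean(z)
var_z = np.var(z)
corr_xz = np.corrcoef(x, z)[0, 1]

print(mean_z, var_z, corr_xz)
```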

The graph below shows the empirical distribution of 1000 $(X, Y)$ points in the case $\rho = 0.6$. You can change the value of $\rho$ and see how the scatter diagram changes. It will remind you of numerous such simulations in Data 8.

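# Imports needed to run this cell on its own
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Plotting parameters
plt.figure(figsize=(5, 5))
plt.axes().set_aspect('equal')
plt.xlabel('$X$')
plt.ylabel('$Y$', rotation=0)
plt.xticks(np.arange(-4, 4.1))
plt.yticks(np.arange(-4, 4.1))

# X, Z, and Y
x = stats.norm.rvs(0, 1, size=1000)   # i.i.d. standard normal X
z = stats.norm.rvs(0, 1, size=1000)   # i.i.d. standard normal Z, independent of X
rho = 0.6
y = rho*x + np.sqrt((1-rho**2))*z
plt.scatter(x, y, color='darkblue', s=10);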
<Figure size 360x360 with 1 Axes>

24.2.4 Representations of the Bivariate Normal

When we are working with just two variables $X$ and $Y$, matrix representations are usually unnecessary. We will use the following three representations interchangeably.

  • $X$ and $Y$ are bivariate normal with parameters $(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$

  • The standardized variables $X_{su}$ and $Y_{su}$ are standard bivariate normal with correlation $\rho$. Then $Y_{su} = \rho X_{su} + \sqrt{1-\rho^2}Z$ for some standard normal $Z$ that is independent of $X_{su}$. This follows from Definition 2 of the multivariate normal.

  • $X$ and $Y$ have the multivariate normal distribution with mean vector $[\mu_X ~ \mu_Y]^T$ and covariance matrix

$$
\begin{bmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}
$$
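To see how the representations fit together, here is a minimal sketch that starts from the standard-units representation, converts back to the original units, and compares the sample covariance matrix with the matrix above. The parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed

# Hypothetical parameters (mu_X, mu_Y, sigma_X, sigma_Y, rho)
mu_x, mu_y = 67.0, 150.0
sd_x, sd_y = 3.0, 20.0
rho = 0.5
n = 200_000

# Standard bivariate normal pair in standard units
x_su = rng.standard_normal(n)
z = rng.standard_normal(n)
y_su = rho * x_su + np.sqrt(1 - rho**2) * z

# Convert from standard units back to the original units
x = mu_x + sd_x * x_su
y = mu_y + sd_y * y_su

# Covariance matrix from the third representation
cov_target = np.array([[sd_x**2,           rho * sd_x * sd_y],
                       [rho * sd_x * sd_y, sd_y**2]])
cov_sample = np.cov(x, y)

print(cov_target)
print(cov_sample)
```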

24.2.5 Standard Bivariate Normal: Matrix Approach

In lab, you used a matrix approach to constructing standard bivariate normal $X$ and $Y$ with correlation $\rho$. Here is a summary of the construction. The end result is the same as what we developed above.

Let $X$ and $Z$ be independent standard normal variables, that is, bivariate normal random variables with mean vector $\mathbf{0}$ and covariance matrix equal to the identity. Now fix a number $\rho$ (that’s the Greek letter rho, the lower case r) so that $-1 < \rho < 1$, and let

$$
\mathbf{A} ~ = ~ \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1 - \rho^2} \end{bmatrix}
$$

Define a new random variable $Y = \rho X + \sqrt{1-\rho^2}Z$, and notice that

$$
\begin{bmatrix} X \\ Y \end{bmatrix} ~ = ~ \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1 - \rho^2} \end{bmatrix} \begin{bmatrix} X \\ Z \end{bmatrix} ~ = ~ \mathbf{A} \begin{bmatrix} X \\ Z \end{bmatrix}
$$

So $X$ and $Y$ have the bivariate normal distribution with mean vector $\mathbf{0}$ and covariance matrix

$$
\mathbf{AIA}^T ~ = ~ \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1 - \rho^2} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ 0 & \sqrt{1 - \rho^2} \end{bmatrix} ~ = ~ \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}
$$
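The matrix product can be verified numerically for a particular $\rho$. The sketch below uses $\rho = 0.6$ as an example value. It also compares $\mathbf{A}$ with the lower triangular Cholesky factor returned by `np.linalg.cholesky`, which is an observation added here and not part of the lab construction.

```python
import numpy as np

rho = 0.6  # example value; any number strictly between -1 and 1 works

A = np.array([[1.0, 0.0],
              [rho, np.sqrt(1 - rho**2)]])
I = np.eye(2)

# Covariance matrix of A [X Z]^T when [X Z]^T has identity covariance
cov = A @ I @ A.T
target = np.array([[1.0, rho],
                   [rho, 1.0]])

# Lower triangular Cholesky factor of the target covariance matrix
L = np.linalg.cholesky(target)

print(cov)
print(L)
```

For this covariance matrix, the Cholesky factor `L` coincides with `A`.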

24.2.6 Correlation as a Cosine: Geometry in the Bivariate Normal Case

We have defined

$$
Y ~ = ~ \rho X + \sqrt{1 - \rho^2} Z
$$

where $X$ and $Z$ are i.i.d. standard normal.

Let’s understand this construction geometrically. A good place to start is the joint density of $X$ and $Z$, which has circular symmetry.

[Figure: joint density surface of $X$ and $Z$, showing circular symmetry]

The $X$ and $Z$ axes are orthogonal. Let’s see what happens if we twist them.

Take any positive angle of $\theta$ degrees and draw a new axis at angle $\theta$ to the original $X$ axis. Every point $(X, Z)$ has a projection onto this axis.

The figure below shows the projection of the point $(X, Z) = (1, 2)$ onto the gold axis, which is at an angle of $\theta$ degrees to the $X$ axis. The blue segment is the value of $X$. You get that by dropping the perpendicular from $(1, 2)$ to the horizontal axis. That’s called projecting $(1, 2)$ onto the horizontal axis.

The red segment is the projection of $(1, 2)$ onto the gold axis, obtained by dropping the perpendicular from $(1, 2)$ to the gold axis.

Vary the values of $\theta$ in the cell below to see how the projection changes as the gold axis rotates.

theta = 20
projection_1_2(theta)
<Figure size 432x432 with 1 Axes>

Let $Y$ be the length of the red segment, and remember that $X$ is the length of the blue segment. When $\theta$ is very small, $Y$ is almost equal to $X$. When $\theta$ approaches 90 degrees, $Y$ is almost equal to $Z$.

A little trigonometry shows that $Y ~ = ~ X \cos(\theta) + Z\sin(\theta)$.

projection_trig()
<Figure size 576x576 with 1 Axes>

Thus

$$
Y ~ = ~ X\cos(\theta) + Z\sin(\theta) ~ = ~ \rho X + \sqrt{1 - \rho^2}Z
$$

where $\rho = \cos(\theta)$.
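As a small check of the trigonometry, the sketch below projects the point $(X, Z) = (1, 2)$ onto an axis at 30 degrees and compares the result with the formula in terms of $\rho$. The variable names are assumptions made for this example, and it uses NumPy's radian-based functions instead of the plotting helpers used elsewhere in this section.

```python
import numpy as np

theta_deg = 30
theta = np.deg2rad(theta_deg)   # NumPy trig functions expect radians
rho = np.cos(theta)

point = np.array([1.0, 2.0])                      # the point (X, Z) = (1, 2)
unit = np.array([np.cos(theta), np.sin(theta)])   # unit vector along the rotated axis

# Signed length of the projection of the point onto the rotated axis
y_proj = point @ unit

# The same quantity written in terms of rho
y_formula = rho * point[0] + np.sqrt(1 - rho**2) * point[1]

print(y_proj, y_formula)
```

The two expressions agree because $\sin(\theta) = \sqrt{1 - \cos^2(\theta)}$ for angles between 0 and 180 degrees.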

The sequence of graphs below illustrates the transformation for $\theta = 30$ degrees.

theta = 30
projection_1_2(theta)
<matplotlib.figure.Figure at 0x1a1302f748>

The bivariate normal distribution is the joint distribution of the blue and red lengths $X$ and $Y$ when the original point $(X, Z)$ has i.i.d. standard normal coordinates. This transforms the circular contours of the joint density surface of $(X, Z)$ into the elliptical contours of the joint density surface of $(X, Y)$.

cos(theta), (3**0.5)/2
(0.8660254037844387, 0.8660254037844386)
rho = cos(theta)
Plot_bivariate_normal([0, 0], [[1, rho], [rho, 1]])
plt.title('Standard Bivariate Normal Distribution, Correlation = '+str(round(rho, 2)));
<matplotlib.figure.Figure at 0x1a128d2630>

24.2.7 Small $\theta$

As we observed earlier, when $\theta$ is very small there is hardly any change in the position of the axis. So $X$ and $Y$ are almost equal.

theta = 2
projection_1_2(theta)
<Figure size 432x432 with 1 Axes>

The bivariate normal density of $X$ and $Y$, therefore, is essentially confined to the $X = Y$ line. The correlation $\cos(\theta)$ is large because $\theta$ is small; it is more than 0.999.

You can see the plotting function having trouble rendering this joint density surface.

rho = cos(theta)
rho
0.99939082701909576
Plot_bivariate_normal([0, 0], [[1, rho], [rho, 1]])
<Figure size 864x576 with 1 Axes>
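The numbers behind this case can be computed directly. The sketch below finds $\rho$ for $\theta = 2$ degrees, the determinant of the covariance matrix, and the SD of $Y - X$, using $Var(Y - X) = 2 - 2\rho$ for a standard bivariate normal pair. That the nearly singular covariance matrix contributes to the rendering trouble is an assumption here, not something established in this section.

```python
import numpy as np

theta_deg = 2
rho = np.cos(np.deg2rad(theta_deg))

cov = np.array([[1.0, rho],
                [rho, 1.0]])

# Determinant is 1 - rho^2 = sin(theta)^2, which is close to 0 for small theta
det_cov = np.linalg.det(cov)

# SD of Y - X, since Var(Y - X) = Var(X) + Var(Y) - 2 Cov(X, Y) = 2 - 2*rho
sd_diff = np.sqrt(2 - 2 * rho)

print(rho, det_cov, sd_diff)
```

A small SD for $Y - X$ is another way of saying that the distribution is concentrated near the $X = Y$ line.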

24.2.8 Orthogonality and Independence

When $\theta$ is 90 degrees, the gold axis is orthogonal to the $X$ axis and $Y$ is equal to $Z$, which is independent of $X$.

theta = 90
projection_1_2(theta)
<Figure size 432x432 with 1 Axes>

When $\theta = 90$ degrees, $\cos(\theta) = 0$. The joint density surface of $(X, Y)$ is the same as that of $(X, Z)$ and has circular symmetry.

If you think of $\rho X$ as a “signal” and $\sqrt{1-\rho^2}Z$ as “noise”, then $Y$ can be thought of as an observation whose value is “signal plus noise”. In the rest of the chapter we will see if we can separate the signal from the noise.
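One way to get a feel for the signal and noise description is to simulate both pieces and look at their variances. This is a minimal sketch with $\rho = 0.6$ chosen as an example; the names `signal` and `noise` are labels for this illustration only. Since $Var(\rho X) = \rho^2$ and $Var(\sqrt{1-\rho^2}Z) = 1 - \rho^2$, the two sample variances should be near 0.36 and 0.64 and add up to about 1.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
rho = 0.6
n = 200_000

x = rng.standard_normal(n)
z = rng.standard_normal(n)

signal = rho * x                    # the part of Y determined by X
noise = np.sqrt(1 - rho**2) * z     # the part of Y independent of X
y = signal + noise

var_signal = np.var(signal)         # should be near rho**2
var_noise = np.var(noise)           # should be near 1 - rho**2
corr_signal_noise = np.corrcoef(signal, noise)[0, 1]   # should be near 0

print(var_signal, var_noise, np.var(y), corr_signal_noise)
```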