
The equation of the regression line for predicting $Y$ based on $X$ can be written in several equivalent ways. The regression equation, and the error in the regression estimate, are best understood in standard units. All the other representations follow by straightforward algebra.

Let $X$ and $Y$ be bivariate normal with parameters $(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$. Then, as we have seen, the best predictor $E(Y \mid X)$ is a linear function of $X$, and hence the formula for $E(Y \mid X)$ is also the equation of the regression line.

24.4.1 In Standard Units

Let $X_{su}$ be $X$ in standard units and $Y_{su}$ be $Y$ in standard units. The regression equation is

$$
E(Y_{su} \mid X_{su}) ~ = ~ \rho X_{su}
$$

and the amount of error in the prediction is measured by

$$
SD(Y_{su} \mid X_{su}) ~ = ~ \sqrt{1 - \rho^2}
$$

The conditional SD is in the same units as the prediction. The conditional variance is

$$
Var(Y_{su} \mid X_{su}) ~ = ~ 1 - \rho^2
$$

We know more than just the conditional expectation and conditional variance. We know that the conditional distribution of $Y_{su}$ given $X_{su}$ is normal. This allows us to find conditional probabilities given $X_{su}$ by the usual normal curve methods. For example,

$$
P(Y_{su} < y_{su} \mid X_{su} = x_{su}) ~ = ~ \Phi \big( \frac{y_{su} - \rho x_{su}}{\sqrt{1-\rho^2}} \big)
$$

In one of Galton’s famous data sets, the distribution of the heights of father-son pairs was roughly bivariate normal with a correlation of 0.5. Of the fathers whose heights were 2 SDs above average, about what percent had sons whose heights were more than 2 SDs above average?

By the regression effect, you know this answer has to be less than 50%. If $Y_{su}$ denotes the height of a randomly picked son in standard units, and $X_{su}$ the height of his father in standard units, then the proportion is approximately

$$
P(Y_{su} > 2 \mid X_{su} = 2) ~ = ~ 1 - \Phi \big( \frac{2 - 0.5 \times 2}{\sqrt{1 - 0.5^2}} \big)
$$

which is approximately 12.4%.

```python
import numpy as np
from scipy import stats

# P(Y_su > 2 | X_su = 2): conditional mean 0.5*2, conditional SD sqrt(1 - 0.5^2)
1 - stats.norm.cdf(2, 0.5*2, np.sqrt(1-0.5**2))
```
0.12410653949496186
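
As a quick sanity check, here is a sketch (not from the original text) that estimates the same proportion by simulation: generate many standard-unit pairs with correlation 0.5, restrict to fathers near 2 SDs above average, and take the empirical proportion of their sons above 2 SDs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard-unit heights: bivariate normal, zero means, unit SDs, correlation 0.5
rho = 0.5
pairs = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=1_000_000)
fathers, sons = pairs[:, 0], pairs[:, 1]

# Fathers within a narrow window around 2 SDs above average
near_two = np.abs(fathers - 2) < 0.05

# Empirical proportion of their sons above 2 SDs; should be near 0.124
np.mean(sons[near_two] > 2)
```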

24.4.2 In the Original Units

Usually, you want to make predictions in the units in which the data were measured. Before changing units in the formulas above, keep in mind that conditioning on $X$ is equivalent to conditioning on $X_{su}$: if you know the value of either one, you also know the other.

The regression equation is

$$
\begin{align*}
E(Y \mid X) ~ &= ~ E(Y \mid X_{su}) \\
&= ~ E(\sigma_Y Y_{su} + \mu_Y \mid X_{su}) \\
&= ~ \sigma_Y E(Y_{su} \mid X_{su}) + \mu_Y \\
&= ~ \sigma_Y \rho \big( \frac{X - \mu_X}{\sigma_X} \big) + \mu_Y \\
&= ~ \rho \frac{\sigma_Y}{\sigma_X} X + \big( \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X \big)
\end{align*}
$$

which is the same as the equation of the least squares line that we derived earlier without any assumptions about the joint distribution of $X$ and $Y$. This confirms our observation that if $X$ and $Y$ are bivariate normal, the best linear predictor is the best among all predictors.

The amount of error in the prediction is measured by $SD(Y \mid X)$, which is the same as

$$
SD(Y \mid X_{su}) ~ = ~ SD(\sigma_Y Y_{su} + \mu_Y \mid X_{su}) ~ = ~ \sigma_Y SD(Y_{su} \mid X_{su}) ~ = ~ \sqrt{1-\rho^2}\sigma_Y
$$

and

$$
Var(Y \mid X) = (1 - \rho^2)\sigma_Y^2
$$

The conditional distribution of YY given XX is normal with the mean and variance calculated above.
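
As a concrete sketch (the parameter values here are made up for illustration, not taken from the text), you can compute a conditional probability in the original units by plugging the conditional mean and SD into the normal CDF:

```python
import numpy as np
from scipy import stats

# Hypothetical parameters, in inches (illustration only)
mu_X, mu_Y = 68, 69
sigma_X, sigma_Y = 2.7, 2.7
rho = 0.5

x = 72  # given value of X

# Conditional distribution of Y given X = x is normal with:
cond_mean = rho * (sigma_Y / sigma_X) * (x - mu_X) + mu_Y
cond_sd = np.sqrt(1 - rho**2) * sigma_Y

# P(Y > 72 | X = 72)
1 - stats.norm.cdf(72, cond_mean, cond_sd)
```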


24.4.3 Regression Equation: Alternative Form I

Regardless of the joint distribution of $X$ and $Y$, the regression equation is

$$
\hat{Y} ~ = ~ a^*X + b^* ~~~ \text{where } a^* = \rho \frac{\sigma_Y}{\sigma_X} \text{ and } b^* = \mu_Y - a^*\mu_X
$$

This is equivalent to

$$
\hat{Y} ~ = ~ a^*(X - \mu_X) + \mu_Y
$$

This form shows that the regression line passes through the point $(\mu_X, \mu_Y)$ and that $E(\hat{Y}) = \mu_Y$. The predicted values and the actual values are the same on average.
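
To see why $E(\hat{Y}) = \mu_Y$, take expectations of both sides of the form above:

$$
E(\hat{Y}) ~ = ~ a^*(E(X) - \mu_X) + \mu_Y ~ = ~ a^* \cdot 0 + \mu_Y ~ = ~ \mu_Y
$$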


24.4.4 Regression Equation: Alternative Form II

When there are just two variables, matrix formulations are hardly necessary. But it is worth writing the regression estimate and the conditional variance using only the mean vector and covariance matrix, and replacing division with multiplication by an inverse. This effort will be rewarded in the next chapter because exactly analogous formulas will work for multiple regression.

Define $\sigma_{X,Y} = Cov(X, Y) = \sigma_{Y,X}$. Then $X$ and $Y$ have mean vector $[\mu_X, ~ \mu_Y]^T$ and covariance matrix

$$
\begin{bmatrix}
\sigma_X^2 & \sigma_{Y,X} \\
\sigma_{X,Y} & \sigma_Y^2
\end{bmatrix}
$$

We know that

$$
\rho ~ = ~ \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y}
$$

The regression equation can therefore be written as

$$
\begin{align*}
E(Y \mid X) ~ &= ~ \sigma_Y \rho \big( \frac{X - \mu_X}{\sigma_X} \big) + \mu_Y \\
&= ~ \frac{\sigma_{X,Y}}{\sigma_X^2}(X - \mu_X) + \mu_Y \\
&= ~ \sigma_{Y,X}(\sigma_X^2)^{-1} (X - \mu_X) + \mu_Y
\end{align*}
$$
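
Here is a minimal sketch of the last line in code, assuming a made-up mean vector and covariance matrix (illustration only); notice that only their entries are needed:

```python
import numpy as np

# Hypothetical mean vector and covariance matrix for (X, Y)
mu = np.array([68, 69])
Sigma = np.array([[7.3, 3.6],
                  [3.6, 7.3]])

sigma_X2 = Sigma[0, 0]   # Var(X)
sigma_YX = Sigma[1, 0]   # Cov(Y, X)

x = 72
# E(Y | X = x) = sigma_{Y,X} (sigma_X^2)^{-1} (x - mu_X) + mu_Y
sigma_YX * (1 / sigma_X2) * (x - mu[0]) + mu[1]
```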

Also

$$
\rho^2 ~ = ~ \frac{\sigma_{X,Y}^2}{\sigma_X^2 \sigma_Y^2}
$$

so the variance of the error is

$$
Var(Y \mid X) ~ = ~ (1 - \rho^2)\sigma_Y^2 ~ = ~ \sigma_Y^2 - \sigma_{X,Y}^2 (\sigma_X^2)^{-1} ~ = ~ \sigma_Y^2 - \sigma_{Y,X} (\sigma_X^2)^{-1} \sigma_{X,Y}
$$
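
The same hypothetical covariance matrix from the sketch above gives the error variance, and you can check numerically that the two expressions agree:

```python
import numpy as np

# Same hypothetical covariance matrix as above (illustration only)
Sigma = np.array([[7.3, 3.6],
                  [3.6, 7.3]])
sigma_X2, sigma_XY = Sigma[0]
sigma_YX, sigma_Y2 = Sigma[1]

# Var(Y | X) = sigma_Y^2 - sigma_{Y,X} (sigma_X^2)^{-1} sigma_{X,Y}
cond_var = sigma_Y2 - sigma_YX * (1 / sigma_X2) * sigma_XY

# Agrees with (1 - rho^2) sigma_Y^2
rho = sigma_XY / np.sqrt(sigma_X2 * sigma_Y2)
assert np.isclose(cond_var, (1 - rho**2) * sigma_Y2)
cond_var
```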