In this section we are going to see if we can identify the best among all linear predictors of one numerical variable based on another, regardless of the joint distribution of the two variables.
For jointly distributed random variables $X$ and $Y$, you know that $E(Y \mid X)$ is the least squares predictor of $Y$ based on functions of $X$. We will now restrict the allowed functions to linear functions and see if we can find the best among those. In later sections we will see the connection between this best linear predictor, the best among all predictors, and the bivariate normal distribution.
from IPython.display import YouTubeVideo
YouTubeVideo('p-Dmvjh7MP4')
The derivative with respect to $a$ is $-2Cov(X, Y) + 2aVar(X)$. Thus the minimizing value of $a$ is
$$
a^* ~ = ~ \frac{Cov(X, Y)}{Var(X)}
$$
At this point we should check that what we have is a minimum, not a maximum, but based on your experience with prediction you might just be willing to accept that we have a minimum. If you're not, then differentiate again and look at the sign of the resulting function.
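The minimization can also be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes that MSE(a, b) means $E((Y - (aX+b))^2)$ and that, for a given slope $a$, the intercept is chosen so the line passes through $(E(X), E(Y))$. The simulated sample and the helper names `mean`, `cov`, and `mse` are hypothetical, chosen only for illustration.

```python
import random

random.seed(0)

# Hypothetical sample standing in for a joint distribution of (X, Y).
# The relation is deliberately not linear.
xs = [random.uniform(-1, 2) for _ in range(5000)]
ys = [x ** 2 + random.gauss(0, 0.5) for x in xs]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Covariance under the empirical distribution (divides by n)."""
    mu, mv = mean(u), mean(v)
    return mean([(a - mu) * (b - mv) for a, b in zip(u, v)])

def mse(a, b):
    """Empirical version of E((Y - (aX + b))^2)."""
    return mean([(y - (a * x + b)) ** 2 for x, y in zip(xs, ys)])

a_star = cov(xs, ys) / cov(xs, xs)      # Cov(X, Y) / Var(X)
b_star = mean(ys) - a_star * mean(xs)   # line goes through (E(X), E(Y))

best = mse(a_star, b_star)
for da in (-0.5, -0.1, 0.1, 0.5):
    a = a_star + da
    b = mean(ys) - a * mean(xs)
    print(f"a = a* {da:+.1f}: MSE = {mse(a, b):.4f}  (at a*: {best:.4f})")
```

Every perturbed slope should report a larger MSE than $a^*$, which is what a positive second derivative predicts.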
24.1.2 Slope and Intercept of the Regression Line
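The least squares straight line is called the regression line. You now have a proof of its equation, familiar to you from Data 8. Let $r_{X,Y}$ be the correlation between $X$ and $Y$, and let $\sigma_X$ and $\sigma_Y$ be the standard deviations of $X$ and $Y$ respectively. As you know, $r_{X,Y} = \frac{Cov(X, Y)}{\sigma_X\sigma_Y}$. So the slope and intercept are given by
$$
\text{slope} ~ = ~ \frac{Cov(X, Y)}{Var(X)} ~ = ~ \frac{r_{X,Y}\sigma_X\sigma_Y}{\sigma_X^2} ~ = ~ r_{X,Y}\frac{\sigma_Y}{\sigma_X}
$$
$$
\text{intercept} ~ = ~ E(Y) - \text{slope} \cdot E(X)
$$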
If both $X$ and $Y$ are measured in standard units, then the slope of the regression line is the correlation $r_{X,Y}$ and the intercept is 0.
In other words, given that $X = x$ standard units, the predicted value of $Y$ is $r_{X,Y}x$ standard units. When $r_{X,Y}$ is positive but not 1, this result is called the regression effect: the predicted value of $Y$ is closer to 0 than the given value of $X$.
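The standard-units statement can be illustrated with a short simulation. This is a sketch under assumed inputs: the sample below is hypothetical, standard deviations divide by $n$, and the helpers `mean`, `sd`, `cov`, and `standard_units` are names introduced here, not from the text.

```python
import random

random.seed(1)

# Hypothetical positively correlated sample (illustrative only).
xs = [random.gauss(10, 3) for _ in range(5000)]
ys = [2 * x + random.gauss(0, 8) for x in xs]

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return mean([(a - m) ** 2 for a in v]) ** 0.5

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return mean([(a - mu) * (b - mv) for a, b in zip(u, v)])

def standard_units(v):
    m, s = mean(v), sd(v)
    return [(a - m) / s for a in v]

# Correlation computed on the original scale.
r = cov(xs, ys) / (sd(xs) * sd(ys))

# Regression line computed after converting both variables to standard units.
x_su, y_su = standard_units(xs), standard_units(ys)
slope = cov(x_su, y_su) / cov(x_su, x_su)
intercept = mean(y_su) - slope * mean(x_su)

print(f"r = {r:.4f}, slope = {slope:.4f}, intercept = {intercept:.4f}")

# Regression effect: the prediction is closer to 0 than the given x.
x_given = 2.0
print(f"given x = {x_given} SU, predicted y = {slope * x_given:.4f} SU")
```

Up to floating-point error, the slope should agree with `r` and the intercept should be 0, so a point 2 standard units above average in $X$ is predicted to be fewer than 2 standard units above average in $Y$.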
The calculations above show that regardless of the joint distribution of $X$ and $Y$, that is, regardless of the relation between $X$ and $Y$,

- The equation of the regression line holds.
- The regression line goes through the point $(E(X), E(Y))$.
- There is a unique best straight line predictor among all straight lines. If the relation between $X$ and $Y$ isn't roughly linear then you won't want to use the best straight line for predictions, because the best straight line is only the best among a bad class of predictors. But it exists.
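To see the last point concretely, here is a small exact example in which the best line exists but predicts poorly. The joint distribution is hypothetical, chosen for illustration: $X$ is uniform on $\{-2, -1, 0, 1, 2\}$ and $Y = X^2$. The helper `expect` is likewise a name introduced for this sketch.

```python
from fractions import Fraction

# Hypothetical joint distribution: X uniform on {-2, -1, 0, 1, 2}, Y = X^2.
points = [(Fraction(x), Fraction(x * x)) for x in (-2, -1, 0, 1, 2)]
p = Fraction(1, len(points))   # each point has probability 1/5

def expect(f):
    """E(f(X, Y)) under the uniform distribution on `points`."""
    return sum(p * f(x, y) for x, y in points)

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - ex) * (y - ey))
var_x = expect(lambda x, y: (x - ex) ** 2)

slope = cov_xy / var_x
intercept = ey - slope * ex

# Mean squared error of the best line versus the predictor x^2.
mse_line = expect(lambda x, y: (y - (slope * x + intercept)) ** 2)
mse_square = expect(lambda x, y: (y - x ** 2) ** 2)

print(slope, intercept)                  # 0 2
print(slope * ex + intercept == ey)      # True: line passes through (E(X), E(Y))
print(mse_line, mse_square)              # 14/5 0
```

Here $Cov(X, Y) = 0$, so the regression line is the flat line at height $E(Y) = 2$. It is still the best among all straight lines, but its mean squared error is $14/5$ while the non-linear predictor $x^2$ makes no error at all.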
In Data 8, the setting for simple linear regression was that we had a deterministic set of points $\{(x_i, y_i): 1 \le i \le n\}$ and we were using a line of the form $y = ax + b$ as our predictor.
The equation of the regression line based on the data is a special case of the random variable calculations of this section. The mean squared error of the prediction is easily seen to be equal to MSE(a,b) as defined in this section for a randomly picked point: