In data science, regression models are widely used for prediction. This chapter examines linear least squares from a probabilistic perspective. The focus is on simple regression, that is, prediction based on one numerical attribute.

When the joint distribution of the attribute $X$ and the response $Y$ is bivariate normal, the empirical distribution of $(X, Y)$ has the football shape so familiar from Data 8. We will start with a geometric interpretation of correlation, as that is helpful for understanding both regression and the bivariate normal. The equation of the regression line, which we will derive, can be written in several ways; by the end of the chapter we will have written it in the way that is most easily extended to multiple regression.
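As a preview, the following is a minimal sketch of the setting, not the chapter's own derivation. It assumes standard units and builds $Y = rX + \sqrt{1 - r^2}\,Z$ from independent standard normal $X$ and $Z$, which is one common way to get a bivariate normal pair with correlation $r$. The function names, the value $r = 0.6$, and the sample size are illustrative choices only.

```python
import numpy as np

def simulate_bivariate_normal(r, n, seed=0):
    """Draw n points (X, Y) from a standard bivariate normal with correlation r.

    Assumed construction: Y = r*X + sqrt(1 - r^2)*Z with X, Z independent
    standard normal.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    z = rng.standard_normal(n)
    y = r * x + np.sqrt(1 - r**2) * z
    return x, y

def least_squares_line(x, y):
    """Return (slope, intercept) of the least squares line for predicting y from x."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# Illustrative values, not taken from the text
x, y = simulate_bivariate_normal(r=0.6, n=10_000)
slope, intercept = least_squares_line(x, y)
sample_r = np.corrcoef(x, y)[0, 1]

print("sample correlation:", sample_r)
print("least squares slope:", slope)
print("r * SD(y) / SD(x):  ", sample_r * y.std() / x.std())
```

A scatter plot of `x` against `y` should show the football shape, and the last two printed values should agree up to floating-point error, since the least squares slope equals the sample correlation times the ratio of the standard deviations.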