1. Let $X$ and $Y$ be jointly distributed random variables and let $\hat{Y}$ be the linear regression estimate of $Y$ based on $X$. Show that the mean squared error of this estimate is $(1 - r^2)Var(Y)$ where $r$ is the correlation between $X$ and $Y$. This leads to the Data 8 formula for the SD of the residuals in simple linear regression.

[Use Alternative Form I of the regression equation, and preserve deviations as we did here.]
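A numerical sanity check can be useful before or after attempting the proof. The sketch below is not a derivation: it fits the least squares line to simulated data and compares the mean squared residual with $(1 - r^2)$ times the variance of the $y$ values. The function names, the seed, and the simulated relationship $y = 2x + \text{noise}$ are assumptions made only for illustration, and all variances use the divisor $n$.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def regression_mse_and_formula(x, y):
    """Return (mean squared residual, (1 - r^2) * Var(y)) for paired data."""
    mx, my = mean(x), mean(y)
    var_x = mean([(xi - mx) ** 2 for xi in x])
    var_y = mean([(yi - my) ** 2 for yi in y])
    cov = mean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])
    r = cov / math.sqrt(var_x * var_y)   # empirical correlation
    slope = cov / var_x                  # least squares slope
    intercept = my - slope * mx          # least squares intercept
    mse = mean([(yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y)])
    return mse, (1 - r ** 2) * var_y

# Hypothetical data: any paired sample with non-constant x would do.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
mse, formula = regression_mse_and_formula(x, y)
print(mse, formula)  # the two numbers should agree up to rounding error
```

Agreement on one sample does not replace the proof, but a mismatch would signal an algebra slip.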

2. Let $X$ and $Y$ be standard bivariate normal with correlation $\rho$.

(a) Suppose I ask you for the least squares estimate of $Y$ based on $X$, but I don’t tell you $X$. What is your estimate, and what is its mean squared error?

(b) Suppose I now show you $X$. Now what is your least squares estimate of $Y$, and what is its mean squared error?

(c) What is your least squares estimate of $Y$ based only on linear functions of $X$, and what is its mean squared error?

3. Let $(X, Y)$ be the weight and height of a person picked at random from a population, and suppose the distribution of $(X, Y)$ is bivariate normal with correlation 0.6. Suppose also that

  • $X$ has mean 150 pounds and SD 25 pounds

  • $Y$ has mean 68 inches and SD 3 inches

Sketch the conditional density of $Y$ given $X = 170$ pounds. Mark the numerical values of the conditional mean and SD appropriately on your sketch.

4. Let $X$ and $Y$ have a bivariate normal distribution (not necessarily standard) with correlation $\rho \in (0, 1)$. Suppose you are given that $X$ is on the 30th percentile.

(a) Pick the right option for the least squares estimate of $Y$, and explain.

(i) Below the 30th percentile

(ii) On the 30th percentile

(iii) Above the 30th percentile

(b) Write a single math expression for the percentile rank corresponding to the least squares estimate of $Y$. Your answer can involve $\rho$ and the standard normal cdf $\Phi$.

5. Let $X$ and $Y$ be standard bivariate normal with correlation $\rho \in (0, 1)$.

(a) Without calculation, pick the right option and explain. $P(X > 0, Y < 0)$ is

(i) less than 0.25

(ii) equal to 0.25

(iii) greater than 0.25

(b) Now find $P(X > 0, Y < 0)$ in terms of $\rho$.

[No integration is needed. Write $Y$ in terms of $X$ and standard normal $Z$ independent of $X$, sketch the region, and use what you know about the joint density of $(X, Z)$.]
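The construction in the hint can also be used to check an answer by simulation. The sketch below assumes the standard representation $Y = \rho X + \sqrt{1 - \rho^2}\,Z$ with $X$ and $Z$ independent standard normal, and estimates the probability empirically. The function names, the choice $\rho = 0.6$, the sample size, and the seed are illustrative assumptions, and the output is only a Monte Carlo estimate, not the closed form the exercise asks for.

```python
import math
import random

def simulate_standard_bivariate_normal(rho, n, seed=0):
    """Return n pairs (x, y) built as y = rho*x + sqrt(1 - rho^2)*z."""
    rng = random.Random(seed)
    scale = math.sqrt(1 - rho ** 2)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = rng.gauss(0, 1)  # drawn independently of x
        pairs.append((x, rho * x + scale * z))
    return pairs

def empirical_quadrant_probability(pairs):
    """Fraction of pairs with x > 0 and y < 0."""
    hits = sum(1 for x, y in pairs if x > 0 and y < 0)
    return hits / len(pairs)

# Illustrative values only; try several values of rho.
pairs = simulate_standard_bivariate_normal(rho=0.6, n=100_000)
print(empirical_quadrant_probability(pairs))
```

Comparing the estimate for a few values of $\rho$ with your formula from part (b) is a quick way to catch a sign or factor error.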

6. Let $X$ and $Y$ be standard bivariate normal with correlation $\rho$. Find $E(\max(X, Y))$. The easiest way is to use the fact that for any two numbers $a$ and $b$, $\max(a, b) = (a + b + \vert a - b \vert)/2$. Check the fact first, and then use it.
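Before proving the identity for the maximum, it can help to spot-check it numerically. The sketch below compares $(a + b + \vert a - b \vert)/2$ with Python's built-in `max` on randomly chosen pairs. The helper name `max_via_identity`, the sampling range, and the seed are assumptions for illustration, and a numerical check is not a substitute for the two-case argument ($a \ge b$ and $a < b$).

```python
import random

def max_via_identity(a, b):
    """Compute max(a, b) using (a + b + |a - b|) / 2."""
    return (a + b + abs(a - b)) / 2

rng = random.Random(0)
largest_gap = 0.0
for _ in range(10_000):
    a = rng.uniform(-10, 10)
    b = rng.uniform(-10, 10)
    gap = abs(max_via_identity(a, b) - max(a, b))
    largest_gap = max(largest_gap, gap)

print(largest_gap)  # should be zero or on the order of floating-point rounding
```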

7. Suppose that $X$ is normal $(\mu_X, \sigma_X^2)$, $Y$ is normal $(\mu_Y, \sigma_Y^2)$, and the two random variables are independent. Let $S = X+Y$.

(a) Find the conditional distribution of $X$ given $S=s$.

(b) Find the least squares predictor of $X$ based on $S$ and provide its mean squared error.

(c) Find the least squares linear predictor of $X$ based on $S$ and provide its mean squared error.