
Vectors and matrices give us a compact way of referring to random sequences like $X_1, X_2, \ldots, X_n$. The algebra of vectors and matrices gives us powerful tools for studying linear combinations of random variables.

In this section we will develop matrix notation for random sequences and then express familiar consequences of linearity of expectation and bilinearity of covariance in matrix notation. The probability theory in this section is not new – it consists of expectation and covariance facts that you have known for some time. But the representation is new and leads us to new insights.

A vector valued random variable, or more simply, a random vector, is a list of random variables defined on the same space. We will think of it as an $n \times 1$ column vector.

$$
\mathbf{X} ~ = ~ \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}
$$

For ease of display, we will sometimes write $\mathbf{X} = [X_1 ~ X_2 ~ \ldots ~ X_n]^T$ where $\mathbf{M}^T$ is notation for the transpose of the matrix $\mathbf{M}$.

The mean vector of $\mathbf{X}$ is $\boldsymbol{\mu} = [\mu_1 ~ \mu_2 ~ \ldots ~ \mu_n]^T$ where $\mu_i = E(X_i)$.

The covariance matrix of $\mathbf{X}$ is the $n \times n$ matrix $\boldsymbol{\Sigma}$ whose $(i, j)$ element is $Cov(X_i, X_j)$.

The $i$th diagonal element of $\boldsymbol{\Sigma}$ is the variance of $X_i$. The matrix is symmetric because of the symmetry of covariance.
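To make these definitions concrete, here is a minimal NumPy sketch that estimates the mean vector and covariance matrix of a random vector from simulated draws. The choice of i.i.d. uniform $(0, 1)$ coordinates is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 100,000 draws of a random vector X = [X_1 X_2 X_3]^T
# whose coordinates are i.i.d. uniform on (0, 1); each row is one draw.
n_draws = 100_000
draws = rng.uniform(0, 1, size=(n_draws, 3))

mu = draws.mean(axis=0)               # empirical mean vector, close to [0.5, 0.5, 0.5]
Sigma = np.cov(draws, rowvar=False)   # empirical covariance matrix, close to (1/12) * I

print(mu)
print(Sigma)
```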


23.1.1 Linear Transformation: Mean Vector

Let $\mathbf{A}$ be an $m \times n$ numerical matrix and $\mathbf{b}$ an $m \times 1$ numerical vector. Consider the $m \times 1$ random vector $\mathbf{Y} = \mathbf{AX} + \mathbf{b}$.

We will call this a “linear transformation” of $\mathbf{X}$ though in fact it is an affine transformation, that is, a composition of the linear transformation $\mathbf{AX}$ and the translation by $\mathbf{b}$.

This representation gives us a compact way to describe multiple linear combinations of $\mathbf{X}$ simultaneously. For example, if $\mathbf{b} = [0 ~~ 0 ~~ 0]^T$ and

$$
\mathbf{A} ~ = ~ \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 1 & -1 & 0 & 0 & \cdots & 0 \\ \frac{1}{n} & \frac{1}{n} & \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n} \end{bmatrix}
$$

then

$$
\mathbf{AX} + \mathbf{b} ~ = ~ \begin{bmatrix} X_1 \\ X_1 - X_2 \\ \bar{X}_n \end{bmatrix}
$$
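A quick NumPy sketch of this example, with the illustrative choices $n = 4$ and i.i.d. standard normal coordinates for $\mathbf{X}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.normal(size=n)              # one draw of X, i.i.d. standard normal entries

A = np.array([
    [1,    0,    0,    0  ],        # picks out X_1
    [1,   -1,    0,    0  ],        # forms X_1 - X_2
    [1/n,  1/n,  1/n,  1/n],        # forms the sample mean X-bar_n
])
b = np.zeros(3)

Y = A @ X + b
print(Y)                            # [X_1, X_1 - X_2, sample mean]
print(X[0], X[0] - X[1], X.mean())  # agrees with Y, entry by entry
```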

In general, if $\mathbf{Y} = \mathbf{AX} + \mathbf{b}$ then the $i$th element of $\mathbf{Y}$ is

$$
Y_i ~ = ~ \mathbf{A}_{i*}\mathbf{X} + \mathbf{b}(i)
$$

where $\mathbf{A}_{i*}$ denotes the $i$th row of $\mathbf{A}$ and $\mathbf{b}(i)$ denotes the $i$th element of $\mathbf{b}$. Written longhand,

$$
Y_i ~ = ~ a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{in}X_n + b_i
$$

where $a_{ij}$ is the $(i, j)$ entry of $\mathbf{A}$ and $b_i = \mathbf{b}(i)$.

Thus $Y_i$ is a linear combination of the elements of $\mathbf{X}$, plus a constant. Therefore by linearity of expectation,

$$
E(Y_i) ~ = ~ \mathbf{A}_{i*} \boldsymbol{\mu} + \mathbf{b}(i)
$$

Let $\boldsymbol{\mu}_\mathbf{Y}$ be the mean vector of $\mathbf{Y}$. Then by the calculation above,

$$
\boldsymbol{\mu}_\mathbf{Y} ~ = ~ \mathbf{A} \boldsymbol{\mu} + \mathbf{b}
$$
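Here is a simulation check of this formula; the particular $\mathbf{A}$, $\mathbf{b}$, mean vector, and distribution of $\mathbf{X}$ below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, 2.0, 3.0])                # mean vector of X
draws = mu + rng.normal(size=(100_000, 3))    # i.i.d. draws of X with mean mu

A = np.array([[1.0,  0.0, 0.0],
              [1.0, -1.0, 0.0]])
b = np.array([0.5, -0.5])

Y_draws = draws @ A.T + b                     # each row is one draw of Y = AX + b

print(Y_draws.mean(axis=0))                   # empirical mean vector of Y
print(A @ mu + b)                             # A mu + b: approximately equal
```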

23.1.2 Linear Transformation: Covariance Matrix

$Cov(Y_i, Y_j)$ can be calculated using bilinearity of covariance.

$$
\begin{align*}
Cov(Y_i, Y_j) ~ &= ~ Cov(\mathbf{A}_{i*}\mathbf{X}, \mathbf{A}_{j*}\mathbf{X}) \\
&= ~ Cov\big( \sum_{k=1}^n a_{ik}X_k, \sum_{l=1}^n a_{jl}X_l \big) \\
&= ~ \sum_{k=1}^n \sum_{l=1}^n a_{ik}a_{jl}Cov(X_k, X_l) \\
&= ~ \sum_{k=1}^n \sum_{l=1}^n a_{ik}Cov(X_k, X_l)t_{lj} ~~~~~ \text{where } t_{lj} = \mathbf{A}^T(l, j)
\end{align*}
$$

This is the $(i, j)$ element of $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T$. So if $\boldsymbol{\Sigma}_\mathbf{Y}$ denotes the covariance matrix of $\mathbf{Y}$, then

$$
\boldsymbol{\Sigma}_\mathbf{Y} ~ = ~ \mathbf{A} \boldsymbol{\Sigma} \mathbf{A}^T
$$
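As a sanity check, the sketch below compares $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T$ with an empirical covariance matrix. The specific $\boldsymbol{\Sigma}$, $\mathbf{A}$, and the use of multivariate normal draws are illustrative choices, not part of the result, which holds for any distribution with covariance matrix $\boldsymbol{\Sigma}$.

```python
import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])    # a positive definite covariance matrix

# Draws of X with covariance matrix Sigma (multivariate normal for convenience).
draws = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=100_000)

A = np.array([[1.0, -1.0, 0.0],
              [0.0,  1.0, 1.0]])

Y_draws = draws @ A.T                  # draws of Y = AX

print(np.cov(Y_draws, rowvar=False))   # empirical covariance matrix of Y
print(A @ Sigma @ A.T)                 # A Sigma A^T: approximately equal
```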

Let us see what this formula implies for the variance of a single component of $\mathbf{Y}$.

Any component of $\mathbf{Y}$ is a linear combination of the elements of $\mathbf{X}$, plus a constant, and hence can be written as $\mathbf{aX} + b$ for some $1 \times n$ vector $\mathbf{a}$ and some real number $b$.

The variance of this component of $\mathbf{Y}$ is a diagonal element of $\boldsymbol{\Sigma}_\mathbf{Y}$. By our calculation above, that diagonal element is equal to $\mathbf{a}\boldsymbol{\Sigma}\mathbf{a}^T$. Notice that the additive constant $b$ does not appear: shifting by a constant does not change the variance.
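A small worked example of this quantity, with an illustrative $2 \times 2$ covariance matrix:

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # covariance matrix of X = [X_1 X_2]^T
a = np.array([3.0, -1.0])        # coefficients of the linear combination 3X_1 - X_2

var = a @ Sigma @ a              # a Sigma a^T, a single number
print(var)                       # 9*Var(X_1) + 2*3*(-1)*Cov(X_1, X_2) + Var(X_2)
                                 # = 18 - 3 + 1 = 16.0
```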

23.1.3 Constraints on $\boldsymbol{\Sigma}$

We know that $\boldsymbol{\Sigma}$ has to be symmetric. Also, no matter what $\mathbf{A}$ is, the elements on the main diagonal of $\boldsymbol{\Sigma}_\mathbf{Y}$ must all be non-negative as they are the variances of the elements of $\mathbf{Y}$.

By the observation above, this implies

$$
\mathbf{a} \boldsymbol{\Sigma} \mathbf{a}^T ~ \ge ~ 0 ~~~~ \text{for all } 1 \times n \text{ vectors } \mathbf{a}
$$

That is, $\boldsymbol{\Sigma}$ must be positive semidefinite.

Usually, we will be working with covariance matrices that are positive definite, defined by

$$
\mathbf{a} \boldsymbol{\Sigma} \mathbf{a}^T ~ > ~ 0 ~~~~ \text{for all nonzero } 1 \times n \text{ vectors } \mathbf{a}
$$

The reason is that if $\mathbf{a} \boldsymbol{\Sigma} \mathbf{a}^T = 0$ for some nonzero $\mathbf{a}$, then the linear combination $\mathbf{aX}$ has variance 0 and hence is a constant. In that case you can write some of the elements of $\mathbf{X}$ as linear combinations of the others and just study a reduced set of elements.
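The sketch below illustrates the boundary case: a covariance matrix with a zero eigenvalue is positive semidefinite but not positive definite, and the corresponding eigenvector gives a linear combination with variance 0. The matrix is an illustrative choice corresponding to $Var(X_1) = Var(X_2) = Cov(X_1, X_2) = 1$.

```python
import numpy as np

Sigma = np.array([[1.0, 1.0],
                  [1.0, 1.0]])     # correlation 1: X_2 is a linear function of X_1

eigenvalues = np.linalg.eigvalsh(Sigma)
print(eigenvalues)                 # [0., 2.]: non-negative, so positive semidefinite,
                                   # but not positive definite since 0 is an eigenvalue

# The eigenvector for eigenvalue 0 gives a nonzero a with a Sigma a^T = 0.
a = np.array([1.0, -1.0])
print(a @ Sigma @ a)               # 0.0: X_1 - X_2 has variance 0, i.e. it is a constant
```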