
The state space of a process is the set of possible values of the random variables in the process. We will often denote the state space by $S$.

For example, consider a random walk where a gambler starts with a fortune of $a$ dollars for some positive integer $a$, and bets on successive tosses of a fair coin. If the coin lands heads he gains a dollar, and if it lands tails he loses a dollar.

Let $X_0 = a$, and for $n \ge 1$ let $X_n = X_{n-1} + I_n$, where $I_1, I_2, \ldots$ is an i.i.d. sequence of increments, each taking the value $+1$ or $-1$ with chance $1/2$. The state space of this random walk $X_0, X_1, X_2, \ldots$ is the set of all integers. In this course we will restrict the state space to be discrete and typically finite.
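To make this concrete, here is a minimal NumPy simulation of the gambler's walk (a sketch of our own; the starting fortune and number of steps are arbitrary choices):

import numpy as np

# start at a = 10 and take 20 fair +/-1 steps
rng = np.random.default_rng()
a, n = 10, 20
increments = rng.choice([-1, 1], size=n)                 # I_1, ..., I_n
path = np.concatenate([[a], a + np.cumsum(increments)])  # X_0, ..., X_n
path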


10.1.1 Markov Property

Consider a stochastic process $X_0, X_1, X_2, \ldots$. The Markov property formalizes the idea that the future of the process depends only on where the process is at present, not on how it got there.

  • For each $n \ge 1$, the conditional distribution of $X_{n+1}$ given $X_0, X_1, \ldots, X_n$ depends only on $X_n$.

That is, for every sequence of possible values $i_0, i_1, \ldots, i_n, i_{n+1}$,

$$
P(X_{n+1} = i_{n+1} \mid X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}, X_n = i_n) = P(X_{n+1} = i_{n+1} \mid X_n = i_n)
$$

The Markov property holds for the random walk described above. Given the gambler's fortune at time $n$, the distribution of his fortune at time $n+1$ doesn't depend on his fortune before time $n$. So the process $X_0, X_1, X_2, \ldots$ is a Markov chain representing the evolution of the gambler's fortune over time.

Conditional Independence

Recall that two random variables $X$ and $Y$ are independent if the conditional distribution of $X$ given $Y$ is just the unconditional distribution of $X$.

Random variables $X$ and $Y$ are said to be conditionally independent given $Z$ if the conditional distribution of $X$ given both $Y$ and $Z$ is just the conditional distribution of $X$ given $Z$ alone. That is, if you know $Z$, then additional knowledge about $Y$ doesn't change your opinion about $X$.

In a Markov chain, if you define time $n$ to be the present, time $n+1$ to be the future, and times 0 through $n-1$ to be the past, then the Markov property says that the past and future are conditionally independent given the present.
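You can check this empirically for the random walk. The following sketch (our own code, reusing np from the simulation above) estimates the chance that $X_3 = 1$ given $X_2 = 0$, separately for the two possible values of $X_1$. By the Markov property, both estimates should be close to 0.5.

# simulate 100,000 three-step walks started at X_0 = 0
rng = np.random.default_rng()
steps = rng.choice([-1, 1], size=(100000, 3))
X = np.cumsum(steps, axis=1)    # columns are X_1, X_2, X_3

for x1 in [-1, 1]:
    given = (X[:, 0] == x1) & (X[:, 1] == 0)
    print(x1, np.mean(X[given, 2] == 1))    # both close to 0.5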

10.1.2 Initial Distribution and Transition Probabilities

Let $X_0, X_1, X_2, \ldots$ be a Markov chain with state space $S$. The distribution of $X_0$ is called the initial distribution of the chain.

A trajectory, or path, is a sequence of states visited by the process. Let $i_0 i_1 \ldots i_n$ denote a path of finite length, with $i_j$ representing the value of $X_j$. By the Markov property, the probability of this path is

$$
\begin{align*}
& P(X_0 = i_0, X_1 = i_1, X_2 = i_2, \ldots, X_n = i_n) \\
&= ~ P(X_0 = i_0)P(X_1 = i_1 \mid X_0 = i_0)P(X_2 = i_2 \mid X_1 = i_1) \cdots P(X_n = i_n \mid X_{n-1} = i_{n-1})
\end{align*}
$$

The conditional probabilities in the product are called transition probabilities. For states $i$ and $j$, the conditional probability $P(X_{n+1} = j \mid X_n = i)$ is called a one-step transition probability at time $n$.

10.1.3 Stationary Transition Probabilities

For many chains, such as the random walk, the one-step transition probabilities depend only on the states $i$ and $j$, not on the time $n$. For example, for the random walk,

$$
P(X_{n+1} = j \mid X_n = i) =
\begin{cases}
\frac{1}{2} & \text{if } j = i-1 \text{ or } j = i+1 \\
0 & \text{otherwise}
\end{cases}
$$

for every $n$. When the one-step transition probabilities don't depend on $n$, they are called stationary or time-homogeneous. All the Markov chains that we will study in this course have time-homogeneous transition probabilities.

For such a chain, define the one-step transition probability

$$
P(i, j) ~ = ~ P(X_{n+1} = j \mid X_n = i) ~ = ~ P(X_1 = j \mid X_0 = i)
$$

Then the probability of every path of finite length is the product of a term from the initial distribution and a sequence of one-step transition probabilities:

$$
P(X_0 = i_0, X_1 = i_1, X_2 = i_2, \ldots, X_n = i_n) ~ = ~ P(X_0 = i_0)P(i_0, i_1)P(i_1, i_2) \cdots P(i_{n-1}, i_n)
$$
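This product is easy to compute in code. Here is a small helper of our own (not part of any library), illustrated on the fair-coin random walk:

def path_probability(initial_prob, transition, path):
    # P(X_0 = i_0) P(i_0, i_1) ... P(i_{n-1}, i_n)
    prob = initial_prob
    for i, j in zip(path[:-1], path[1:]):
        prob *= transition(i, j)
    return prob

# fair-coin walk: move up or down 1 with chance 1/2
rw = lambda i, j: 0.5 if abs(i - j) == 1 else 0

# start at 10 with probability 1, then follow the path 10, 11, 10, 9
path_probability(1, rw, [10, 11, 10, 9])    # 0.5 ** 3 = 0.125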

10.1.4 One-Step Transition Matrix

The one-step transition probabilities can be represented as elements of a matrix. This isn’t just for compactness of notation – it leads to a powerful theory.

The one-step transition matrix of the chain is the matrix $\mathbb{P}$ whose $(i, j)$th element is $P(i, j) = P(X_1 = j \mid X_0 = i)$.

Often, $\mathbb{P}$ is just called the transition matrix for short. Note two important properties:

  • $\mathbb{P}$ is a square matrix: its rows as well as its columns are indexed by the state space.

  • Each row of $\mathbb{P}$ is a distribution: for each state $i$ and each $n$, Row $i$ of the transition matrix is the conditional distribution of $X_{n+1}$ given that $X_n = i$. Because each of its rows adds up to 1, $\mathbb{P}$ is called a stochastic matrix.

Let’s see what the transition matrix looks like in an example.

10.1.5 Sticky Reflecting Random Walk

Often, the transition behavior of a Markov chain is easier to describe with a transition diagram than with a matrix. Here is such a diagram for a chain on the states 1, 2, 3, 4, and 5. The diagram shows the one-step transition probabilities.

  • No matter which state the chain is in, it stays there with chance 0.5.

  • If the chain is at state 2, 3, or 4, it moves to each of its two adjacent states with chance 0.25.

  • If the chain is at state 1 or 5, it moves to its adjacent state with chance 0.5.

(Transition diagram: Reflecting Lazy Walk)

We say that there is reflection at states 1 and 5. The walk is sticky because of the positive chance of staying in place.

Transition diagrams are great for understanding the rules by which a chain moves. For calculations, however, the transition matrix is more helpful.

To start constructing the matrix, we define the array s to contain the states, and the transition function refl_walk_probs to take arguments $i$ and $j$ and return $P(i, j)$.

s = np.arange(1, 6)

def refl_walk_probs(i, j):
    # staying in the same state
    if i-j == 0:
        return 0.5
    
    # moving left or right
    elif 2 <= i <= 4:
        if abs(i-j) == 1:
            return 0.25
        else:
            return 0
        
    # moving right from 1
    elif i == 1:
        if j == 2:
            return 0.5
        else:
            return 0
    
    # moving left from 5
    elif i == 5:
        if j == 4:
            return 0.5
        else:
            return 0
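Before using this function, it's worth a quick sanity check of our own: by the stochastic matrix property noted above, the outgoing probabilities from each state should sum to 1.

# each row of the implied transition matrix is a distribution
for i in s:
    assert sum(refl_walk_probs(i, j) for j in s) == 1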

You can use the prob140 library to construct MarkovChain objects. The from_transition_function method takes two arguments:

  • an array of the states

  • a transition function

and returns a MarkovChain object, which is displayed as its one-step transition matrix.

from prob140 import MarkovChain

reflecting_walk = MarkovChain.from_transition_function(s, refl_walk_probs)
reflecting_walk

Compare the transition matrix $\mathbb{P}$ with the transition diagram, and confirm that they contain the same information about transition probabilities.

To find the chance that the chain moves to $j$ given that it is at $i$, go to Row $i$ and pick out the probability in Column $j$.

If you know the starting state, you can use $\mathbb{P}$ to find the probability of any finite path. For example, given that the walk starts at 1, the probability that it then has the path [2, 2, 3, 4, 3] is

$$
P(1, 2)P(2, 2)P(2, 3)P(3, 4)P(4, 3) \approx 0.4\%
$$
0.5 * 0.5 * 0.25 * 0.25 * 0.25
0.00390625

The MarkovChain method prob_of_path saves you the trouble of doing the multiplication. It takes as its arguments the starting state and the rest of the path (in a list or array), and returns the probability of the path given the starting state.

reflecting_walk.prob_of_path(1, [2, 2, 3, 4, 3])
0.00390625
reflecting_walk.prob_of_path(1, [2, 2, 3, 4, 3, 5])
0.0

The answer is 0 because the final step of this path goes from 3 to 5, and $P(3, 5) = 0$: the walk can't move by two states in one step.

You can simulate paths of the chain using the simulate_path method. It takes two arguments: the starting state and the number of steps of the path. By default it returns an array consisting of the sequence of states in the path. The optional argument plot_path=True plots the simulated path. Run the cells below a few times and see how the output changes.

reflecting_walk.simulate_path(1, 7)
array([1, 1, 1, 2, 1, 1, 2, 1])
reflecting_walk.simulate_path(1, 10, plot_path=True)
(plot of the simulated path)

10.1.6 $n$-Step Transition Matrix

For states $i$ and $j$, the chance of getting from $i$ to $j$ in $n$ steps is called the $n$-step transition probability from $i$ to $j$. Formally, the $n$-step transition probability is

$$
P_n(i, j) ~ = ~ P(X_n = j \mid X_0 = i)
$$

In this notation, the one-step transition probability $P(i, j)$ can also be written as $P_1(i, j)$.

The $n$-step transition probability $P_n(i, j)$ can be represented as the $(i, j)$th element of a matrix called the $n$-step transition matrix. For each state $i$, Row $i$ of the $n$-step transition matrix contains the distribution of $X_n$ given that the chain starts at $i$.

The MarkovChain method transition_matrix takes $n$ as its argument and displays the $n$-step transition matrix. Here is the 2-step transition matrix of the reflecting walk defined earlier in this section.

reflecting_walk.transition_matrix(2)

You can calculate the individual entries easily by hand. For example, the $(1, 1)$ entry is the chance of going from state 1 to state 1 in 2 steps. There are two paths that make this happen:

  • [1, 1, 1]

  • [1, 2, 1]

Given that 1 is the starting state, the total chance of the two paths is $(0.5 \times 0.5) + (0.5 \times 0.25) = 0.375$.
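Here is the same arithmetic as a quick check:

(0.5 * 0.5) + (0.5 * 0.25)
0.375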

Because of the Markov property, the one-step transition probabilities are all you need to find the 2-step transition probabilities.

In general, we can find $P_2(i, j)$ by conditioning on where the chain was at time 1.

$$
\begin{align*}
P_2(i, j) ~ &= ~ P(X_2 = j \mid X_0 = i) \\
&= ~ \sum_k P(X_1 = k, X_2 = j \mid X_0 = i) \\
&= ~ \sum_k P(X_1 = k \mid X_0 = i)P(X_2 = j \mid X_1 = k) \\
&= ~ \sum_k P(i, k)P(k, j)
\end{align*}
$$

That's the $(i, j)$th element of the matrix product $\mathbb{P} \times \mathbb{P} = \mathbb{P}^2$. Thus the 2-step transition matrix of the chain is $\mathbb{P}^2$.
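Here is a quick numerical confirmation for the reflecting walk, writing out $\mathbb{P}$ by hand from the transition diagram (a sketch of our own; array index 0 corresponds to state 1):

P = np.array([
    [0.5 , 0.5 , 0.  , 0.  , 0.  ],
    [0.25, 0.5 , 0.25, 0.  , 0.  ],
    [0.  , 0.25, 0.5 , 0.25, 0.  ],
    [0.  , 0.  , 0.25, 0.5 , 0.25],
    [0.  , 0.  , 0.  , 0.5 , 0.5 ]
])

P2 = P @ P
P2[0, 0]    # chance of going from state 1 back to state 1 in 2 steps: 0.375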

By induction, you can show that the $n$-step transition matrix of the chain is $\mathbb{P}^n$. That is,

$$
P_n(i, j) ~ = ~ P(X_n = j \mid X_0 = i) ~ = ~ (i, j) \text{ element of } \mathbb{P}^n
$$

Here is a display of the 5-step transition matrix of the reflecting walk.

reflecting_walk.transition_matrix(5)

This is a display, but to work with the matrix we have to represent it in a form that Python recognizes as a matrix. The method get_transition_matrix does this for us. It takes the number of steps $n$ as its argument and returns the $n$-step transition matrix as a NumPy array.

For the reflecting walk, we will start by extracting $\mathbb{P}$ as the matrix refl_walk_P.

refl_walk_P = reflecting_walk.get_transition_matrix(1)
refl_walk_P
array([[0.5 , 0.5 , 0.  , 0.  , 0.  ],
       [0.25, 0.5 , 0.25, 0.  , 0.  ],
       [0.  , 0.25, 0.5 , 0.25, 0.  ],
       [0.  , 0.  , 0.25, 0.5 , 0.25],
       [0.  , 0.  , 0.  , 0.5 , 0.5 ]])

Let's check that the 5-step transition matrix displayed earlier is the same as $\mathbb{P}^5$. You can use np.linalg.matrix_power to raise a matrix to a non-negative integer power. The first argument is the matrix, the second is the power.

np.linalg.matrix_power(refl_walk_P, 5)
array([[0.24609375, 0.41015625, 0.234375  , 0.08984375, 0.01953125],
       [0.20507812, 0.36328125, 0.25      , 0.13671875, 0.04492188],
       [0.1171875 , 0.25      , 0.265625  , 0.25      , 0.1171875 ],
       [0.04492188, 0.13671875, 0.25      , 0.36328125, 0.20507812],
       [0.01953125, 0.08984375, 0.234375  , 0.41015625, 0.24609375]])

This is indeed the same as the matrix displayed by transition_matrix, though it is harder to read.

When we want to use $\mathbb{P}$ in computations, we will use this matrix representation. For displays, transition_matrix is better.

10.1.7 The Long Run

To understand the long run behavior of the chain, let $n$ be large and let's examine the distribution of $X_n$ for each value of the starting state. That's contained in the $n$-step transition matrix $\mathbb{P}^n$.

Here is the display of $\mathbb{P}^n$ for the reflecting walk, for $n = 25, 50$, and $100$. Keep your eyes on the rows of the matrices as $n$ changes.

reflecting_walk.transition_matrix(25)
reflecting_walk.transition_matrix(50)
reflecting_walk.transition_matrix(100)

The rows of $\mathbb{P}^{100}$ are all the same! That means that for the reflecting walk, the distribution at time 100 doesn't depend on the starting state. The chain has forgotten where it started.

You can increase $n$ and see that the $n$-step transition matrix stays the same. By time 100, this chain has reached stationarity.
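You can confirm this numerically with the matrix representation (a check of our own; the tolerance is an arbitrary choice):

P_100 = np.linalg.matrix_power(refl_walk_P, 100)

# all five rows agree to within a small tolerance
np.allclose(P_100, P_100[0], atol=1e-5)    # True

# the common row: approximately [0.125, 0.25, 0.25, 0.25, 0.125]
P_100[0].round(3)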

Stationarity is a remarkable property of many Markov chains, and is the main topic of this chapter.