To understand the relation between two variables we must examine the conditional behavior of each of them given the value of the other. Towards this goal, we will start by examining the example of the previous section and then develop the general theory.


In our example, the joint distribution of $X$ and $Y$ is given by joint_table. Here we also display the marginal distribution of $X$.

joint_table.marginal('X')

Now suppose we know that $Y = 3$. Then the outcome space is reduced to just the cells in the row labeled Y=3.

def indicator_Y_equals_3(i, j):
    return j == 3

joint_table.event(indicator_Y_equals_3, 'X', 'Y')
P(Event) = 0.3125

Of course, the probabilities along this row don’t sum to 1. Their sum is $P(Y = 3) = 0.3125$.

By the division rule, for each $x = 0, 1, 2$ we have

$$
P(X = x \mid Y = 3) ~ = ~ \frac{P(X = x, Y = 3)}{P(Y = 3)}
$$

By normalizing all the probabilities in the row by their sum, we get the conditional distribution of $X$ given $Y = 3$.

$$
\begin{align*}
P(X = 0 \mid Y = 3) ~ &= ~ \frac{0.03125}{0.3125} = 0.1 \\ \\
P(X = 1 \mid Y = 3) ~ &= ~ \frac{0.1875}{0.3125} = 0.6 \\ \\
P(X = 2 \mid Y = 3) ~ &= ~ \frac{0.09375}{0.3125} = 0.3
\end{align*}
$$
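The normalization can also be checked in plain Python, without the `joint_table` object. The sketch below uses only the three joint probabilities quoted above for the row Y=3; the variable names are illustrative and not part of any library.

```python
# Joint probabilities P(X = x, Y = 3) for x = 0, 1, 2, copied from the text above.
row_y3 = {0: 0.03125, 1: 0.1875, 2: 0.09375}

# P(Y = 3) is the sum of the entries in the row.
p_y3 = sum(row_y3.values())

# Division rule: divide each joint probability in the row by P(Y = 3).
cond_x_given_y3 = {x: p / p_y3 for x, p in row_y3.items()}

print("P(Y = 3) =", p_y3)
for x, p in cond_x_given_y3.items():
    print(f"P(X = {x} | Y = 3) = {p:.4f}")
```

Dividing by the row sum is what turns a row of joint probabilities into a distribution.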

Compare this conditional distribution to the unconditional distribution of $X$:

$$
P(X = 0) ~ = ~ 0.25, ~~~~~ P(X = 1) ~ = ~ 0.5, ~~~~~ P(X = 2) ~ = ~ 0.25
$$

The two distributions are different. Given $Y = 3$, the chance that $X$ is large is higher than it is if we don’t have that condition.
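One way to make "the chance that the variable is large" concrete is to compare tail probabilities under the two distributions. The sketch below copies both distributions from the text; the helper `prob_at_least` is a hypothetical name introduced only for this illustration.

```python
# Conditional distribution of X given Y = 3, and unconditional distribution of X,
# both copied from the text above.
cond_x_given_y3 = {0: 0.1, 1: 0.6, 2: 0.3}
marginal_x = {0: 0.25, 1: 0.5, 2: 0.25}

def prob_at_least(dist, k):
    """Return P(X >= k) under `dist`, a dict mapping values to probabilities."""
    return sum(p for x, p in dist.items() if x >= k)

for k in (1, 2):
    given = prob_at_least(cond_x_given_y3, k)
    unconditional = prob_at_least(marginal_x, k)
    print(f"P(X >= {k} | Y = 3) = {given:.2f}   P(X >= {k}) = {unconditional:.2f}")
```

For both thresholds the conditional chance exceeds the unconditional one.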

This shows that $X$ and $Y$ are dependent. We will define dependence and independence formally in the next section.

4.4.1 Conditional Distribution of $X$ given $Y = y$

The conditional_dist method operates on a joint distribution object and displays conditional distributions, as follows.

# conditional distribution of X given each different value of Y
joint_table.conditional_dist('X', 'Y')

To understand this table, start with the row labeled Y=3. The entries are the probabilities in the conditional distribution of $X$ given $Y = 3$.

In the row labeled Y=1, the entries are the probabilities in the conditional distribution of $X$ given $Y = 1$. Notice that if $Y = 1$ then $X$ can’t be 2. You can go back and confirm that in the joint distribution table, $P(X = 2, Y = 1) = 0$.

All the other rows can be understood in the same way. In row $y$, the given condition is $Y = y$, and the entries are the probabilities in the conditional distribution of $X$ given $Y = y$.

It is easy to see why each row in the table of conditional distributions sums to 1. The value in each cell in the row is obtained from the joint distribution table by taking the corresponding cell in that table and dividing its entry by the sum of the entries in the row.
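The row-by-row normalization can be sketched without the `joint_table` object. The full joint table is not reproduced in this section, so the code below assumes a model that is consistent with every number quoted above: the first variable counts heads in the first two of five fair coin tosses, and the second counts heads in all five. If the previous section's example differs, only the construction of `joint` changes.

```python
from itertools import product

# ASSUMPTION: X = heads in the first 2 of 5 fair coin tosses, Y = heads in all 5.
# This reproduces the probabilities quoted in the text but is not taken from it.
joint = {}
for tosses in product([0, 1], repeat=5):
    x, y = sum(tosses[:2]), sum(tosses)
    joint[(x, y)] = joint.get((x, y), 0) + 1 / 32  # each outcome has chance 1/32

x_values = sorted({x for x, _ in joint})
y_values = sorted({y for _, y in joint})

# One row per value y: divide each joint probability in the row by the row sum.
conditional = {}
for y in y_values:
    row = [joint.get((x, y), 0) for x in x_values]
    row_sum = sum(row)  # this is P(Y = y)
    conditional[y] = [p / row_sum for p in row]

# Print the rows from the largest value of Y down to the smallest.
for y in reversed(y_values):
    print(f"Y={y}", [round(p, 4) for p in conditional[y]])
```

Under this assumed model, the row for Y=3 matches the conditional distribution computed earlier, and the last entry of the row for Y=1 is 0.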

4.4.2 The Theory

We can now generalize the calculations we did in the example above.


Let $X$ and $Y$ be two random variables defined on the same space. If $x$ is a possible value of $X$, and $y$ a possible value of $Y$, then

$$
P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}
$$

The conditional probability $P(X = x \mid Y = y)$ is displayed in the $(x, y)$ cell of the table of conditional distributions above.

For a fixed value $y^*$ of $Y$, the conditional distribution of $X$ given $Y = y^*$ is the collection of probabilities

$$
P(X = x \mid Y = y^*) = \frac{P(X = x, Y = y^*)}{P(Y = y^*)}
$$

where $x$ ranges over all the values of $X$. Keep in mind that $x$ represents the values of the variable here. The value $y^*$ is the particular value of $Y$ that was observed, so it is a constant.

4.4.3 The Probabilities in a Conditional Distribution Sum to 1

In a distribution, the probabilities have to sum to 1. To see that this is true for the conditional distribution defined above, start by using the fundamental rule.

Find $P(Y = y^*)$ by partitioning the event $\{ Y = y^* \}$ according to the values of $X$:

$$
P(Y = y^*) = \sum_{\text{all }x} P(X = x, Y = y^*)
$$

Now sum the probabilities in the conditional distribution of $X$ given $Y = y^*$:

$$
\begin{align*}
\sum_{\text{all }x} P(X = x \mid Y = y^*) ~ &= ~ \sum_{\text{all }x} \frac{P(X = x, Y = y^*)}{P(Y = y^*)} \\ \\
&= ~ \frac{1}{P(Y = y^*)} \sum_{\text{all }x} P(X = x, Y = y^*) \\ \\
&= ~ \frac{1}{P(Y = y^*)} \cdot P(Y = y^*) \\ \\
&= ~ 1
\end{align*}
$$

Thus the conditional distribution is just an ordinary probability distribution: a set of values with probabilities that sum to 1.
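The two steps of this argument, partitioning and then dividing, can be mirrored in a short function. This is a minimal sketch, not the `conditional_dist` method used earlier: the function name `conditional_dist_given` and the small joint table are hypothetical, made up for this illustration. Exact fractions are used so that the sum is exactly 1 rather than 1 up to rounding.

```python
from fractions import Fraction

def conditional_dist_given(joint, y_star):
    """Conditional distribution of X given Y = y_star.

    `joint` maps (x, y) pairs to P(X = x, Y = y).
    """
    # Partition {Y = y_star} by the values of X to get P(Y = y_star).
    p_y_star = sum(p for (x, y), p in joint.items() if y == y_star)
    if p_y_star == 0:
        raise ValueError("P(Y = y_star) is 0, so the conditional distribution is undefined")
    # Division rule, applied to every x with y held fixed at y_star.
    return {x: p / p_y_star for (x, y), p in joint.items() if y == y_star}

# Hypothetical joint distribution, made up for this illustration.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 4),
    (0, 1): Fraction(3, 8), (1, 1): Fraction(1, 4),
}

for y_star in (0, 1):
    dist = conditional_dist_given(joint, y_star)
    print(f"given Y = {y_star}:", dist, "sum =", sum(dist.values()))
```

The function refuses to condition on a value with probability 0, since the division rule requires a positive denominator.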