{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"# HIDDEN\n",
"from datascience import *\n",
"from prob140 import *\n",
"import numpy as np\n",
"from scipy import stats\n",
"from myst_nb import glue\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('fivethirtyeight')\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"# HIDDEN\n",
"\n",
"# X = number of heads in first two tosses; Y = number of heads in first five tosses of a fair coin\n",
"def joint_probability(x, y):\n",
"    # {X = x, Y = y} means x heads in tosses 1-2 and y - x heads in tosses 3-5\n",
"    if y >= x:\n",
"        return stats.binom.pmf(x, 2, 1/2) * stats.binom.pmf(y-x, 3, 1/2)\n",
"    else:\n",
"        return 0\n",
"\n",
"k_x = np.arange(3)\n",
"k_y = np.arange(6)\n",
"\n",
"joint_table = Table().values('X', k_x, 'Y', k_y).probability_function(joint_probability)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Marginal Distributions ##"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What does the joint distribution of $X$ and $Y$ tell us about the distribution of $X$ alone?\n",
"\n",
"Everything, of course. Let's see how."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# VIDEO: Marginal Distribution\n",
"from IPython.display import YouTubeVideo\n",
"\n",
"vid_marginal_dist = YouTubeVideo(\"E-m1o0bxFzo\")\n",
"glue(\"vid_marginal_dist\", vid_marginal_dist)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{dropdown} See More\n",
":icon: video\n",
"{glue:}`vid_marginal_dist`\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is the joint distribution table of two random variables $X$ and $Y$."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" X=0 X=1 X=2\n",
"Y=5 0.00000 0.0000 0.03125\n",
"Y=4 0.00000 0.0625 0.09375\n",
"Y=3 0.03125 0.1875 0.09375\n",
"Y=2 0.09375 0.1875 0.03125\n",
"Y=1 0.09375 0.0625 0.00000\n",
"Y=0 0.03125 0.0000 0.00000"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"joint_table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To find the distribution of $X$ we need the possible values of $X$ and all their probabilities.\n",
"\n",
"At a glance, you can see that the possible values of $X$ are 0, 1, and 2.\n",
"\n",
"Let's look at the event $\\{ X = 0 \\}$. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"P(Event) = 0.25\n"
]
},
{
"data": {
"text/plain": [
" X=0 X=1 X=2\n",
"Y=5 0.00000 \n",
"Y=4 0.00000 \n",
"Y=3 0.03125 \n",
"Y=2 0.09375 \n",
"Y=1 0.09375 \n",
"Y=0 0.03125 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Indicator of the event {X = 0}: i is a value of X and j is a value of Y\n",
"def indicator_X_equals_0(i, j):\n",
"    return i == 0\n",
"\n",
"joint_table.event(indicator_X_equals_0, 'X', 'Y')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These are the cells in the column labeled `X=0`. The sum of the probabilities in those cells is $P(X = 0) = 0.25$."
]
},
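{
"cell_type": "markdown",
"metadata": {},
"source": [
"Written out, the column sum for `X=0`, with terms listed from the row `Y=5` down to the row `Y=0`, is\n",
"\n",
"$$\n",
"P(X = 0) = 0 + 0 + 0.03125 + 0.09375 + 0.09375 + 0.03125 = 0.25\n",
"$$"
]
},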
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Partitioning $\\{X = x \\}$ According to $Y$ ###\n",
"In every cell of the column labeled `X=0`, the value of $X$ is 0 and the value of $Y$ is some possible value of $Y$. So the column `X=0` partitions the event $\\{X = 0\\}$ according to the value of $Y$, and displays the probability of each piece of the partition.\n",
"\n",
"In other words, for every $x$ we have\n",
"$$\n",
"\\{X = x \\} = \\bigcup_{\\text{all } y} \\{X = x, Y = y\\}\n",
"$$\n",
"and this is a disjoint union. So by the addition rule,\n",
"\n",
"$$\n",
"P(X = x) = \\sum_{\\text{all } y} P(X = x, Y = y)\n",
"$$\n",
"\n",
"That is, $P(X = x)$ is the sum of the probabilities in the column `X=x`. Because $P(X = x)$ is the generic term in the distribution of $X$, this shows that the distribution of $X$ can be derived from the joint distribution of $X$ and $Y$."
]
},
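{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a check on the formula, here is a minimal sketch that computes each $P(X = x)$ by summing $P(X = x, Y = y)$ over all $y$. It rebuilds the joint probabilities directly with `scipy.stats` instead of going through the `joint_table` object, assuming the setup of this example: $X$ is the number of heads in the first two tosses of a fair coin and $Y$ is the number of heads in the first five tosses. The helper `joint_pmf` is a hypothetical name used only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"# Hypothetical helper: P(X = x, Y = y) for X = heads in the first two tosses\n",
"# and Y = heads in the first five tosses of a fair coin. The event means\n",
"# x heads in tosses 1-2 and y - x heads in tosses 3-5; binom.pmf is 0 when\n",
"# y - x is outside 0..3, so impossible cells get probability 0.\n",
"def joint_pmf(x, y):\n",
"    return stats.binom.pmf(x, 2, 1/2) * stats.binom.pmf(y - x, 3, 1/2)\n",
"\n",
"x_values = np.arange(3)  # possible values of X\n",
"y_values = np.arange(6)  # possible values of Y\n",
"\n",
"# Addition rule: P(X = x) is the sum over all y of P(X = x, Y = y)\n",
"marginal_x = np.array([sum(joint_pmf(x, y) for y in y_values) for x in x_values])\n",
"print(marginal_x)  # expected to be close to 0.25, 0.5, 0.25"
]
},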
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{admonition} Quick Check\n",
"In the numerical example above, consider the probabilities $P(X=2, Y=y)$ for $0 \\le y \\le 5$. \n",
"\n",
"(a) How many of those probabilities are positive?\n",
"\n",
"(b) Fill in the blank with an event: The sum of the positive probabilities in Part (a) equals $P( \\underline{~~~~~~~~~~~~~~~~~~~} )$.\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{admonition} Answer\n",
":class: dropdown\n",
"(a) 4 $~~~~$ (b) $X=2$\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Marginal Distribution of $X$ ###\n",
"\n",
"To find the numerical values of the distribution of $X$, we will use a method called `marginal` that operates on a joint distribution object and takes the variable name as its argument. The reason for using the word \"marginal\" will become clear as soon as we see the output."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" X=0 X=1 X=2\n",
"Y=5 0.00000 0.0000 0.03125\n",
"Y=4 0.00000 0.0625 0.09375\n",
"Y=3 0.03125 0.1875 0.09375\n",
"Y=2 0.09375 0.1875 0.03125\n",
"Y=1 0.09375 0.0625 0.00000\n",
"Y=0 0.03125 0.0000 0.00000\n",
"Sum: Marginal of X 0.25000 0.5000 0.25000"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"joint_table.marginal('X')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the bottom row of the table contains all the column sums. These are the probabilities in the distribution of $X$.\n",
"\n",
"Because the sums appear in the margin of the table, the distribution is called *marginal*. It's a bit silly, but \"marginal\" is a commonly used term for the distribution of $X$ when it has been derived from a joint distribution.\n",
"\n",
"You should recognize that $X$ has the same distribution as the number of heads in two tosses of a fair coin, that is, the binomial $(2, 1/2)$ distribution."
]
},
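{
"cell_type": "markdown",
"metadata": {},
"source": [
"The margin can also be computed in one step by storing the joint probabilities in a two-dimensional array and summing each column. The sketch below assumes the same coin-tossing setup and uses NumPy's `meshgrid` and `sum(axis=0)`. It then checks the column sums against the binomial $(2, 1/2)$ probabilities from `scipy.stats`, which are the chances of 0, 1, and 2 heads in two tosses of a fair coin."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"x_values = np.arange(3)  # possible values of X\n",
"y_values = np.arange(6)  # possible values of Y\n",
"\n",
"# 2-D grids with rows indexed by y and columns indexed by x\n",
"xx, yy = np.meshgrid(x_values, y_values)\n",
"\n",
"# Joint probabilities P(X = x, Y = y); binom.pmf is 0 when y - x is outside 0..3\n",
"joint = stats.binom.pmf(xx, 2, 1/2) * stats.binom.pmf(yy - xx, 3, 1/2)\n",
"\n",
"# Column sums, i.e. the bottom margin of the table: the distribution of X\n",
"marginal_x = joint.sum(axis=0)\n",
"\n",
"# Compare with the chances of 0, 1, 2 heads in two tosses of a fair coin\n",
"print(np.allclose(marginal_x, stats.binom.pmf(x_values, 2, 1/2)))  # True"
]
},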
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Both Marginals ###\n",
"What you can do for $X$, you can also do for $Y$, by summing along the rows instead of down the columns."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"