{ "cells": [
{ "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove_cell" ] }, "outputs": [], "source": [ "# HIDDEN\n", "from datascience import *\n", "from prob140 import *\n", "import numpy as np\n", "from scipy import stats\n", "from myst_nb import glue\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "%matplotlib inline" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "remove_cell" ] }, "outputs": [], "source": [ "# HIDDEN\n", "\n", "# X = number of heads in the first two tosses of a fair coin\n", "# Y = number of heads in the first five tosses\n", "def joint_probability(x, y):\n", "    # {X = x, Y = y} means x heads in the first two tosses and y - x heads\n", "    # in the next three; the two groups of tosses are independent\n", "    if y >= x:\n", "        return stats.binom.pmf(x, 2, 1/2) * stats.binom.pmf(y-x, 3, 1/2)\n", "    else:\n", "        return 0\n", "\n", "k_x = np.arange(3)\n", "k_y = np.arange(6)\n", "\n", "joint_table = Table().values('X', k_x, 'Y', k_y).probability_function(joint_probability)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Marginal Distributions ##" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "What does the joint distribution of $X$ and $Y$ tell us about the distribution of $X$ alone?\n", "\n", "Everything, of course. Let's see how." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# VIDEO: Marginal Distribution\n", "from IPython.display import YouTubeVideo\n", "\n", "vid_marginal_dist = YouTubeVideo(\"E-m1o0bxFzo\")\n", "glue(\"vid_marginal_dist\", vid_marginal_dist)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "```{dropdown} See More\n", ":icon: video\n", "{glue:}`vid_marginal_dist`\n", "```" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Here is the joint distribution table of two random variables $X$ and $Y$."
] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "         X=0     X=1      X=2\n", "Y=5  0.00000  0.0000  0.03125\n", "Y=4  0.00000  0.0625  0.09375\n", "Y=3  0.03125  0.1875  0.09375\n", "Y=2  0.09375  0.1875  0.03125\n", "Y=1  0.09375  0.0625  0.00000\n", "Y=0  0.03125  0.0000  0.00000" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joint_table" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "To find the distribution of $X$, we need the possible values of $X$ and all their probabilities.\n", "\n", "At a glance, you can see that the possible values of $X$ are 0, 1, and 2.\n", "\n", "Let's look at the event $\\{ X = 0 \\}$." ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P(Event) = 0.25\n" ] }, { "data": { "text/plain": [ "         X=0  X=1  X=2\n", "Y=5  0.00000\n", "Y=4  0.00000\n", "Y=3  0.03125\n", "Y=2  0.09375\n", "Y=1  0.09375\n", "Y=0  0.03125" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def indicator_X_equals_0(i, j):\n", "    return i == 0\n", "\n", "joint_table.event(indicator_X_equals_0, 'X', 'Y')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "These are the cells in the column labeled `X=0`. The sum of the probabilities in those cells is $P(X = 0) = 0.03125 + 0.09375 + 0.09375 + 0.03125 = 0.25$." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Partitioning $\\{X = x \\}$ According to $Y$ ###\n", "In every cell of the column labeled `X=0`, the value of $X$ is 0 and the value of $Y$ is one of the possible values of $Y$. So the column `X=0` partitions the event $\\{X = 0\\}$ according to the value of $Y$, and displays the probability of each piece of the partition.\n", "\n", "In other words, for every $x$ we have\n", "$$\n", "\\{X = x \\} = \\bigcup_{\\text{all } y} \\{X = x, Y = y\\}\n", "$$\n", "and the union is disjoint, because $Y$ cannot take two different values at once. So by the addition rule,\n", "\n", "$$\n", "P(X = x) = \\sum_{\\text{all } y} P(X = x, Y = y)\n", "$$\n", "\n", "That is, $P(X = x)$ is the sum of the probabilities in the column `X=x`. Because $P(X = x)$ is the generic term in the distribution of $X$, this shows that the entire distribution of $X$ can be derived from the joint distribution of $X$ and $Y$." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Quick Check\n", "In the numerical example above, consider the probabilities $P(X=2, Y=y)$ for $0 \\le y \\le 5$.\n", "\n", "(a) How many of those probabilities are positive?\n", "\n", "(b) Fill in the blank with an event: The sum of the positive probabilities in Part (a) equals $P( \\underline{~~~~~~~~~~~~~~~~~~~} )$.\n", "\n", "```" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Answer\n", ":class: dropdown\n", "(a) 4 $~~~~$ (b) $X=2$\n", "\n", "```" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Marginal Distribution of $X$ ###\n", "\n", "To find the numerical values of the distribution of $X$, we will use a method called `marginal` that operates on a joint distribution object and takes the variable name as its argument. The reason for using the word \"marginal\" will become clear as soon as we see the output." ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "                        X=0     X=1      X=2\n", "Y=5                 0.00000  0.0000  0.03125\n", "Y=4                 0.00000  0.0625  0.09375\n", "Y=3                 0.03125  0.1875  0.09375\n", "Y=2                 0.09375  0.1875  0.03125\n", "Y=1                 0.09375  0.0625  0.00000\n", "Y=0                 0.03125  0.0000  0.00000\n", "Sum: Marginal of X  0.25000  0.5000  0.25000" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joint_table.marginal('X')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The bottom row of the table now contains the column sums, which are the probabilities in the distribution of $X$.\n", "\n", "Because the sums appear in the margin of the table, the distribution is called *marginal*. It's a bit silly, but \"marginal\" is the commonly used term for the probability distribution of $X$ when the distribution has been derived from a joint distribution.\n", "\n", "You should recognize that $X$ has the same distribution as the number of heads in two tosses of a fair coin: the probabilities $1/4$, $1/2$, $1/4$ are the binomial $(2, 1/2)$ probabilities." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Both Marginals ###\n", "What you can do for $X$, you can also do for $Y$, by summing along the rows." ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "         X=0     X=1      X=2  Sum: Marginal of Y\n", "Y=5  0.00000  0.0000  0.03125             0.03125\n", "Y=4  0.00000  0.0625  0.09375             0.15625\n", "Y=3  0.03125  0.1875  0.09375             0.31250\n", "Y=2  0.09375  0.1875  0.03125             0.31250\n", "Y=1  0.09375  0.0625  0.00000             0.15625\n", "Y=0  0.03125  0.0000  0.00000             0.03125" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joint_table.marginal('Y')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "You can also get both marginals at once:" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "                        X=0     X=1      X=2  Sum: Marginal of Y\n", "Y=5                 0.00000  0.0000  0.03125             0.03125\n", "Y=4                 0.00000  0.0625  0.09375             0.15625\n", "Y=3                 0.03125  0.1875  0.09375             0.31250\n", "Y=2                 0.09375  0.1875  0.03125             0.31250\n", "Y=1                 0.09375  0.0625  0.00000             0.15625\n", "Y=0                 0.03125  0.0000  0.00000             0.03125\n", "Sum: Marginal of X  0.25000  0.5000  0.25000             1.00000" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joint_table.both_marginals()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The bottom right corner cell is the sum of all the probabilities in the table, and also the sum of the probabilities in each of the margins. Reassuringly, it's 1." ] }
], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Tags", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.4" } }, "nbformat": 4, "nbformat_minor": 1 }