
In the Bayesian world, unknown parameters are random variables, not constants. Bayesians describe their degree of uncertainty about an unknown quantity by specifying a probability distribution for that quantity.

For example, if we are tossing a coin that has an unknown probability of landing heads, we can think of that unknown probability as a random variable with possible values in the unit interval, instead of an unknown but fixed number.

This change of paradigm leads to an entirely different approach to inference, for which we need some new techniques.

20.2.1 Conditioning on a Continuous Variable

Let’s take a moment for a general discussion about conditioning on a continuous variable. Our observations will parallel discussions in an earlier chapter where we found conditional densities.

Suppose $X$ is a random variable and $A$ is an event that depends on $X$.

If $X$ is a discrete random variable, then for any possible value $x$ of $X$ the quantity $P(A \mid X = x)$ has a clear definition by the division rule:

$$
P(A \mid X = x) ~ = ~ \frac{P(A, X = x)}{P(X = x)}
$$
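As a quick sketch of the division rule in action (a hypothetical example, not from the text), we can enumerate outcomes directly. Here $X$ is the number of heads in two fair tosses and $A$ is the event that the first toss lands heads:

```python
from itertools import product

# Hypothetical discrete example: two fair tosses.
# X = number of heads, A = "first toss lands heads".
outcomes = list(product('HT', repeat=2))   # 4 equally likely outcomes

x = 1                                      # condition on X = 1
p_joint = sum(1 for o in outcomes if o[0] == 'H' and o.count('H') == x) / 4
p_x = sum(1 for o in outcomes if o.count('H') == x) / 4

print(p_joint / p_x)                       # P(A | X = 1) = 0.5
```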

When $X$ has a density, the denominator is 0. In this case there is one main idea to keep in mind:

  • $P(A \mid X \in dx)$ is essentially constant regardless of exactly where the infinitesimal interval $dx$ is placed relative to $x$. This constant value will be denoted $P(A \mid X = x)$.

So for continuous $X$, we will define

$$
P(A \mid X = x) ~ = ~ P(A \mid X \in dx) ~ \sim ~ \frac{P(A, X \in dx)}{P(X \in dx)}
$$

We are assuming that the limit of the right hand side as $dx$ goes to 0 exists and doesn't depend on exactly how $dx$ is defined: around $x$, or to the left of $x$, or to the right, and so on. This will be true under regularity conditions. You can just assume it works.
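To see the ratio stabilize numerically, here is a minimal sketch under assumed choices: $X$ uniform on $(0, 1)$ and $A = \{U \le X\}$ for an independent uniform $U$, so the ratio should settle at $x$ as the interval shrinks.

```python
from scipy import integrate

# X uniform on (0, 1); A = {U <= X} for an independent uniform U,
# so P(A, X in (x, x+h)) is the integral of p over (x, x+h).
x = 0.7
for h in [0.1, 0.01, 0.001]:
    p_joint, _ = integrate.quad(lambda p: p, x, x + h)   # P(A, X in dx)
    p_dx = h                                             # P(X in dx) for uniform X
    print(h, p_joint / p_dx)                             # 0.75, 0.705, 0.7005 -> 0.7
```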

We can now talk about tossing a coin that has a random probability of landing heads.

Suppose a coin lands heads with probability $X$ where $X$ has density $f_X$ on the unit interval. This means that conditionally given $X = p$, the tosses are i.i.d. Bernoulli $(p)$ random variables.

A good mental image is of picking a value of $p$ according to the density $f_X$, then repeatedly tossing a coin that lands heads with that given probability $p$. Keep in mind that $p$ is chosen once, and then the same coin is tossed repeatedly.

  • Let $A_{1,1}$ be the event that the first toss lands heads. Then by our definition, $P(A_{1,1} \mid X = p) = p$. Notice that this is the conditional chance of $A_{1,1}$ given the observed value of the random probability. It is not the unconditional chance of heads. That requires a calculation that we will do shortly.

  • Let $A_{2,2}$ be the event that the first two tosses land heads. Then $P(A_{2,2} \mid X = p) = p^2$.

  • In general, let $A_{k,n}$ be the event that $k$ out of the first $n$ tosses land heads. Then $P(A_{k,n} \mid X = p) = \binom{n}{k}p^k(1-p)^{n-k}$.

Our familiar binomial probabilities are now conditional probabilities given the chance of heads.
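As a sketch with arbitrarily chosen values of $k$, $n$, and $p$, these conditional binomial chances can be evaluated with `scipy.stats`:

```python
from scipy import stats
from math import comb

# P(A_{k,n} | X = p) for arbitrarily chosen k, n, p.
n, k, p = 5, 3, 0.6
print(stats.binom.pmf(k, n, p))                  # 0.3456
print(comb(n, k) * p**k * (1 - p)**(n - k))      # same value, by the formula
```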

We can find the unconditional probabilities as weighted averages of these conditional probabilities, as follows.

20.2.2 Average Conditional Probabilities

Let $X$ have density $f_X$ and let $A$ be an event. Then

$$
P(A, X \in dx) ~ = ~ P(X \in dx)P(A \mid X = x) ~ \sim ~ f_X(x)\,dx\,P(A \mid X = x)
$$

So

$$
P(A) ~ = ~ \int_{\text{all } x} P(A, X \in dx) ~ = ~ \int_{\text{all } x} P(A \mid X = x)f_X(x)\,dx
$$

In more compact notation, $P(A) = E(P(A \mid X))$. This is an example of finding expectation by conditioning.
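For a concrete check of this weighted average, here is a minimal numerical sketch assuming $X$ is uniform on $(0, 1)$ and $A = A_{k,n}$; the integral comes out to $1/(n+1)$ for every $k$, a consequence of the beta integral.

```python
from scipy import integrate, stats

# P(A_{k,n}) = integral of P(A_{k,n} | X = p) f_X(p) dp, with f_X = 1 on (0, 1).
n = 5
for k in range(n + 1):
    prob, _ = integrate.quad(lambda p: stats.binom.pmf(k, n, p), 0, 1)
    print(k, round(prob, 4))    # each value is 1/(n+1) = 1/6
```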

20.2.3 Example: One Toss of a Random Coin

Let $X$ have any density on the unit interval $(0, 1)$. Think of the value of $X$ as the probability that a coin lands heads. Toss the coin once. Recall that our definition of "given $X = p$" means that

$$
P(\text{coin lands heads} \mid X = p) = p
$$

Let $X$ have density $f_X$. Then

$$
P(\text{coin lands heads}) ~ = ~ \int_0^1 p \cdot f_X(p)\,dp ~ = ~ E(X)
$$

Thus if $X$ is uniform on $(0, 1)$, then the chance that the coin lands heads is $1/2$. If $X$ has the beta $(r, s)$ distribution then the chance that the coin lands heads is $r/(r+s)$.
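A quick numerical check of the beta case (with $r$ and $s$ chosen arbitrarily):

```python
from scipy import integrate, stats

# P(heads) = E(X) when X has the beta (r, s) density.
r, s = 3, 2
prob, _ = integrate.quad(lambda p: p * stats.beta.pdf(p, r, s), 0, 1)
print(prob, r / (r + s))    # both 0.6
```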

20.2.4 Example: Two Tosses of a Random Coin

Let $X$ be uniform on $(0, 1)$. Given $X = p$, toss a $p$-coin twice and observe the results of the tosses.

We have just observed that $P(\text{first toss is a head}) = 1/2$. The first toss behaves like the toss of a fair coin. The same calculation shows that the chance that the second toss is a head (based on no knowledge of the first toss) is also $1/2$.

Now let's figure out the chance that both the tosses land heads. We know that $P(\text{both tosses are heads} \mid X = p) = p^2$. So

$$
P(\text{both tosses are heads}) ~ = ~ \int_0^1 p^2 \cdot 1\,dp ~ = ~ \frac{1}{3}
$$

That's greater than $1/4$, which is the chance of two heads given that you are tossing a fair coin twice. The results of the two tosses are not independent.

Let’s see what’s going on here. We know that

$$
\begin{align*}
P(\text{both tosses are heads}) ~ &= ~ P(\text{first toss is a head}) P(\text{second toss is a head} \mid \text{first toss is a head}) \\
&= ~ \frac{1}{2} P(\text{second toss is a head} \mid \text{first toss is a head})
\end{align*}
$$

Therefore

$$
P(\text{second toss is a head} \mid \text{first toss is a head}) ~ = ~ \frac{2}{3} ~ > ~ \frac{1}{2}
$$
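Before interpreting this, a quick Monte Carlo sketch confirms both numbers: pick $p$ once per trial, then toss a $p$-coin twice.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 1_000_000
p = rng.uniform(0, 1, trials)                  # X, chosen once per trial
first = rng.uniform(0, 1, trials) < p          # first toss lands heads
second = rng.uniform(0, 1, trials) < p         # second toss lands heads

print((first & second).mean())                 # about 1/3
print((first & second).sum() / first.sum())    # P(second | first), about 2/3
```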

Knowing that the first toss is a head is telling us something about $X$. Our updated opinion about $X$ is no longer uniform: we now lean towards higher values of $X$, which is then reflected in the chance that the second toss is also a head. We will quantify this in the next section.