In the Bayesian world, unknown parameters are random variables, not constants. Bayesians describe their degree of uncertainty about an unknown quantity by specifying a probability distribution for that quantity.
For example, if we are tossing a coin that has an unknown probability of landing heads, we can think of that unknown probability as a random variable with possible values in the unit interval, instead of an unknown but fixed number.
This change of paradigm leads to an entirely different approach to inference, for which we need some technique.
20.2.1 Conditioning on a Continuous Variable
Let’s take a moment for a general discussion about conditioning on a continuous variable. Our observations will parallel discussions in an earlier chapter where we found conditional densities.
Suppose $X$ is a random variable and $A$ is an event that depends on $X$.

If $X$ is a discrete random variable, then for any possible value $x$ of $X$, the quantity $P(A \mid X = x)$ has a clear definition by the division rule:

$$
P(A \mid X = x) = \frac{P(A, X = x)}{P(X = x)}
$$

When $X$ has a density, the denominator is 0. In this case there is one main idea to keep in mind:

For an infinitesimal interval $dx$ around the value $x$, the quantity $P(A \mid X \in dx)$ is essentially constant regardless of exactly where the infinitesimal interval $dx$ is placed relative to $x$. This constant value will be denoted $P(A \mid X = x)$.

So for continuous $X$, we will define

$$
P(A \mid X = x) = \lim_{dx \to 0} P(A \mid X \in dx)
$$

We are assuming that the limit of the right hand side as $dx$ goes to 0 exists and doesn't depend on exactly how $dx$ is defined: an interval around $x$, or to the left of $x$, or to the right, and so on. This will be true under regularity conditions. You can just assume it works.
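To see the definition in action numerically, here is a minimal sketch. The setup is assumed purely for illustration: $X$ is uniform on $(0, 1)$ and, given $X$, the event $A$ occurs with chance $X$, so $P(A \mid X = x) = x$. Conditioning on a narrow interval around $x$ should produce roughly that value.

```python
import numpy as np

# Illustrative sketch (assumed setup): X is uniform on (0, 1) and, given X,
# the event A occurs with chance X, so P(A | X = x) = x.

rng = np.random.default_rng(0)
n = 10**6
X = rng.uniform(0, 1, size=n)        # draws of X from its density
A = rng.uniform(0, 1, size=n) < X    # given X, A occurs with chance X

x, dx = 0.7, 0.01
near_x = np.abs(X - x) < dx / 2      # the event {X in dx}: an interval of width dx around x
print(A[near_x].mean())              # roughly 0.7, approximating P(A | X = x)
```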
We can now talk about tossing a coin that has a random probability of landing heads.
Suppose a coin lands heads with probability $X$, where $X$ has density $f_X$ on the unit interval $(0, 1)$. This means that conditionally given $X = p$, the tosses are i.i.d. Bernoulli $(p)$ random variables.

A good mental image is of picking a value $p$ according to the density $f_X$, then repeatedly tossing a coin that lands heads with that given probability $p$. Keep in mind that $p$ is chosen once, and then the same coin is tossed repeatedly.
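Here is a minimal simulation sketch of that mental image, assuming for concreteness that $X$ is uniform on $(0, 1)$.

```python
import numpy as np

# Sketch of the two-stage picture, assuming X is uniform on (0, 1).
rng = np.random.default_rng(1)

p = rng.uniform(0, 1)                        # p is chosen once ...
tosses = rng.uniform(0, 1, size=10000) < p   # ... then the same p-coin is tossed repeatedly

print(p, tosses.mean())                      # the long-run proportion of heads is close to p
```

The long-run proportion of heads settles near whichever $p$ was drawn, not near a fixed number such as $1/2$.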
Let $H_1$ be the event that the first toss lands heads. Then by our definition, $P(H_1 \mid X = p) = p$. Notice that this is the conditional chance of heads given the observed value $p$ of the random probability. It is not the unconditional chance of heads. That requires a calculation that we will do shortly.

Let $H_1H_2$ be the event that the first two tosses land heads. Then $P(H_1H_2 \mid X = p) = p^2$.
In general, let $B_{k,n}$ be the event that $k$ out of the first $n$ tosses land heads. Then

$$
P(B_{k,n} \mid X = p) = \binom{n}{k} p^k (1-p)^{n-k}
$$
Our familiar binomial probabilities are now conditional probabilities given the chance of heads.
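As a quick numerical check, here is the conditional probability above computed for assumed values $n = 5$, $k = 3$, and $p = 0.6$, once from the formula and once using SciPy's binomial pmf.

```python
from math import comb
from scipy.stats import binom

# Conditional chance of 3 heads in 5 tosses given X = p, for an assumed p = 0.6.
n, k, p = 5, 3, 0.6

print(comb(n, k) * p**k * (1 - p)**(n - k))  # 0.3456, from the formula above
print(binom.pmf(k, n, p))                    # the same value from SciPy's binomial pmf
```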
We can find the unconditional probabilities as weighted averages of these conditional probabilities, as follows.
20.2.2 Average Conditional Probabilities
Let $X$ have density $f_X$ and let $A$ be an event. Then

$$
P(A, X \in dx) = P(X \in dx)P(A \mid X \in dx) \sim P(A \mid X = x)f_X(x)dx
$$

So

$$
P(A) = \int_{\text{all } x} P(A \mid X = x)f_X(x)dx
$$

In more compact notation, $P(A) = E\big(P(A \mid X)\big)$. This is an example of finding expectation by conditioning.
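The weighted average can be computed numerically. Here is a sketch under assumed conditions: $X$ is uniform on $(0, 1)$, so $f_X(x) = 1$, and $A$ is the event that 3 of the first 5 tosses land heads, so $P(A \mid X = x)$ is the binomial probability from above.

```python
from math import comb
from scipy import integrate

# Sketch: X uniform on (0, 1), so f_X(x) = 1 on the unit interval;
# A is the event that 3 of the first 5 tosses land heads.

def p_A_given_x(x, n=5, k=3):
    # the binomial conditional probability from the previous section
    return comb(n, k) * x**k * (1 - x)**(n - k)

P_A, _ = integrate.quad(lambda x: p_A_given_x(x) * 1, 0, 1)  # integrand times f_X(x) = 1
print(P_A)  # 1/6, about 0.1667
```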
20.2.3 Example: One Toss of a Random Coin
Let $X$ have any density on the unit interval $(0, 1)$. Think of the value of $X$ as the probability that a coin lands heads. Toss the coin once. Recall that our definition of “given $X = p$” means that

$$
P(\text{heads} \mid X = p) = p
$$

Let $X$ have density $f_X$. Then

$$
P(\text{heads}) = \int_0^1 p \cdot f_X(p)dp = E(X)
$$

Thus if $X$ is uniform on $(0, 1)$, then the chance that the coin lands heads is $E(X) = 1/2$. If $X$ has the beta $(r, s)$ distribution, then the chance that the coin lands heads is $r/(r+s)$.
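Here is a simulation sketch of the beta case, with assumed parameters $r = 3$ and $s = 7$: in each replication, draw a fresh value of $X$ from the beta $(3, 7)$ density and toss that coin once. The overall proportion of heads should come out close to $r/(r+s) = 0.3$.

```python
import numpy as np

# Sketch: many replications of "draw X from beta(3, 7), then toss the X-coin once".
rng = np.random.default_rng(2)
reps = 10**6

X = rng.beta(3, 7, size=reps)              # a fresh random probability for each replication
heads = rng.uniform(0, 1, size=reps) < X   # one toss of each X-coin

print(heads.mean())                        # close to 3/(3 + 7) = 0.3
```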
20.2.4 Example: Two Tosses of a Random Coin
Let $X$ be uniform on $(0, 1)$. Given $X = p$, toss a $p$-coin twice and observe the results of the tosses.

We have just observed that $P(H_1) = 1/2$. The first toss behaves like the toss of a fair coin. The same calculation shows that the chance that the second toss is a head (based on no knowledge of the first toss) is also $1/2$.

Now let's figure out the chance that both the tosses land heads. We know that $P(H_1H_2 \mid X = p) = p^2$. So

$$
P(H_1H_2) = \int_0^1 p^2 \cdot 1dp = \frac{1}{3}
$$

That's greater than $1/4$, which is the chance of two heads given that you are tossing a fair coin twice. The results of the two tosses are not independent.
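As a quick numerical check of the integral above:

```python
from scipy import integrate

# P(H1 H2): integrate p^2 against the uniform density 1 over (0, 1).
P_both, _ = integrate.quad(lambda p: p**2, 0, 1)
print(P_both)  # 1/3, greater than the 1/4 we would get from a fair coin
```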
Let's see what's going on here. We know that

$$
P(H_1H_2) = P(H_1)P(H_2 \mid H_1)
$$

Therefore

$$
P(H_2 \mid H_1) = \frac{P(H_1H_2)}{P(H_1)} = \frac{1/3}{1/2} = \frac{2}{3} > \frac{1}{2}
$$

Knowing that the first toss is a head is telling us something about $X$. Our updated opinion about $X$ is no longer uniform: we now lean towards higher values of $X$, which is then reflected in the chance that the second toss is also a head. We will quantify this in the next section.
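Meanwhile, the dependence already shows up in a simulation sketch. Assuming as above that $X$ is uniform on $(0, 1)$, generate many pairs of tosses and look at the second toss only on the replications where the first toss landed heads.

```python
import numpy as np

# Sketch: X uniform on (0, 1); given X, toss the same X-coin twice in each replication.
rng = np.random.default_rng(3)
reps = 10**6

X = rng.uniform(0, 1, size=reps)
toss1 = rng.uniform(0, 1, size=reps) < X
toss2 = rng.uniform(0, 1, size=reps) < X

print(toss1.mean())            # about 1/2: P(H1)
print((toss1 & toss2).mean())  # about 1/3: P(H1 H2)
print(toss2[toss1].mean())     # about 2/3: P(H2 | H1), larger than 1/2
```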