
As in the previous section, let $X$ have the beta $(r, s)$ prior, and given $X = p$ let $S_n$ be the number of heads in the first $n$ tosses of a $p$-coin.

All the calculations we carried out in the previous section were under the condition that $S_n = k$, but we never needed to find the probability of this event. It was part of the constant that made the posterior density of $X$ integrate to 1.

We can now find $P(S_n = k)$ by writing the posterior density in two ways:

  • By recalling that it is the beta $(r+k, s+n-k)$ density:

$$
f_{X \vert S_n=k} (p) ~ = ~ C(r+k, s+n-k)p^{r+k-1}(1-p)^{s+n-k-1}, ~~~~ 0 < p < 1
$$

  • By using Bayes’ Rule:

$$
f_{X \vert S_n=k} (p) ~ = ~ \frac{C(r, s) p^{r-1}(1-p)^{s-1} \cdot \binom{n}{k} p^k (1-p)^{n-k}}{P(S_n = k)}, ~~~~ 0 < p < 1
$$

Now equate constants:

$$
\frac{C(r, s) \binom{n}{k}}{P(S_n = k)} ~ = ~ C(r+k, s+n-k)
$$

21.2.1 Beta-Binomial Probabilities

So for $k$ in the range 0 through $n$,

$$
P(S_n = k) ~ = ~ \binom{n}{k} \frac{C(r, s)}{C(r+k, s+n-k)}
$$

where $C(r, s)$ is the constant in the beta $(r, s)$ density, given by

$$
C(r, s) ~ = ~ \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}
$$

That’s not as awful as it looks. A better way to think of the formula is

$$
P(S_n = k) ~ = ~ \binom{n}{k} \frac{\text{constant in the prior beta}}{\text{constant in the posterior beta given }k \text{ heads in } n \text{ tosses}}
$$
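As a numerical sanity check, the formula can be evaluated directly. The following is a minimal Python sketch, assuming real parameters $r, s > 0$; the helper names `beta_const` and `beta_binomial_pmf` are hypothetical. The constants are computed on the log scale with `math.lgamma` so that large parameters do not overflow.

```python
from math import comb, exp, lgamma


def beta_const(r, s):
    """C(r, s) = Gamma(r + s) / (Gamma(r) * Gamma(s)), for real r, s > 0."""
    # Work with log-gamma so that large parameters do not overflow.
    return exp(lgamma(r + s) - lgamma(r) - lgamma(s))


def beta_binomial_pmf(k, n, r, s):
    """P(S_n = k) = (n choose k) * C(r, s) / C(r + k, s + n - k)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * beta_const(r, s) / beta_const(r + k, s + n - k)


# Example: beta (2, 3) prior, n = 5 tosses.
probs = [beta_binomial_pmf(k, 5, 2, 3) for k in range(6)]
print(probs)
print(sum(probs))  # 1 up to floating-point rounding
```

For the beta $(2, 3)$ prior and $n = 5$, the probabilities work out to $1/6, 5/21, 5/21, 4/21, 5/42, 1/21$, which sum to 1.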

This discrete distribution is called the beta-binomial distribution with parameters $r$, $s$, and $n$. It is the distribution of the number of heads in $n$ tosses of a coin that lands heads with a probability picked according to the beta $(r, s)$ distribution.
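The two-stage description (pick $p$ from the beta prior, then toss a $p$-coin $n$ times) translates directly into a simulation. This is a sketch using Python's standard `random` module; `simulate_heads` and `empirical_distribution` are hypothetical helper names, and the number of repetitions and the seed are arbitrary choices. With many repetitions, the empirical frequencies should be close to the exact probabilities given by the beta-binomial formula.

```python
import random
from collections import Counter


def simulate_heads(n, r, s, rng):
    """One draw of S_n: pick p from the beta (r, s) prior, then toss a p-coin n times."""
    p = rng.betavariate(r, s)
    return sum(rng.random() < p for _ in range(n))


def empirical_distribution(n, r, s, reps=100_000, seed=0):
    """Relative frequency of each k in 0..n over `reps` independent simulations."""
    rng = random.Random(seed)
    counts = Counter(simulate_heads(n, r, s, rng) for _ in range(reps))
    return [counts[k] / reps for k in range(n + 1)]


freqs = empirical_distribution(n=5, r=2, s=3)
print([round(f, 3) for f in freqs])
```

For $r = 2$, $s = 3$, $n = 5$ the printed frequencies should be near $1/6, 5/21, 5/21, 4/21, 5/42, 1/21$, up to simulation error.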


One $(r, s)$ pair is particularly interesting: $r = s = 1$. That’s the case when $X$ has the uniform prior. The distribution of $S_n$ reduces to

$$
P(S_n = k ) ~ = ~ \frac{n!}{k!(n-k)!} \cdot \frac{1!}{0!0!} \cdot \frac{k!(n-k)!}{(n+1)!} ~ = ~ \frac{1}{n+1}
$$

There’s no $k$ in the answer! The conclusion is that if you choose $p$ uniformly between 0 and 1 and toss a $p$-coin $n$ times, the distribution of the number of heads is uniform on $\{ 0, 1, 2, \ldots, n\}$.

If you choose $p$ uniformly between 0 and 1, then the conditional distribution of $S_n$ given that $p$ was the selected value is binomial $(n, p)$. But the unconditional distribution of $S_n$ is uniform.
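Both claims can be checked exactly with rational arithmetic. The sketch below assumes integer $r, s \ge 1$, so that $C(r, s) = \frac{(r+s-1)!}{(r-1)!(s-1)!}$; the helpers `beta_const_int` and `beta_binomial_pmf_exact` are hypothetical names, and $n = 7$ and $p = 3/10$ are arbitrary illustrative values. Python's `fractions.Fraction` keeps every probability exact, so "uniform" means literally equal, not just close.

```python
from fractions import Fraction
from math import comb, factorial


def beta_const_int(r, s):
    """C(r, s) for positive integers r, s: (r + s - 1)! / ((r - 1)! (s - 1)!)."""
    return Fraction(factorial(r + s - 1), factorial(r - 1) * factorial(s - 1))


def beta_binomial_pmf_exact(k, n, r, s):
    """Exact P(S_n = k) = (n choose k) C(r, s) / C(r + k, s + n - k) as a Fraction."""
    return comb(n, k) * beta_const_int(r, s) / beta_const_int(r + k, s + n - k)


n = 7
# Unconditional distribution under the uniform (beta (1, 1)) prior: every entry is 1/8.
print([str(beta_binomial_pmf_exact(k, n, 1, 1)) for k in range(n + 1)])

# Conditional distribution given p = 3/10 is binomial (7, 3/10), which is not flat.
p = Fraction(3, 10)
print([str(comb(n, k) * p**k * (1 - p) ** (n - k)) for k in range(n + 1)])
```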

21.2.2 Checking by Integration

If you prefer, you can find the distribution of $S_n$ directly, by conditioning on $X$.

\begin{align*}
P(S_n = k) ~ &= \int_0^1 P(S_n = k \mid X = p)f_X(p)dp \\
&= ~ \int_0^1 \binom{n}{k} p^k(1-p)^{n-k}C(r, s)p^{r-1}(1-p)^{s-1}dp \\
&= ~ \binom{n}{k} C(r, s) \int_0^1 p^{r+k-1}(1-p)^{s+n-k-1} dp \\
&= ~ \binom{n}{k} C(r, s) \frac{1}{C(r+k, s+n-k)}
\end{align*}

The last step holds because the beta $(r+k, s+n-k)$ density integrates to 1, so $\int_0^1 p^{r+k-1}(1-p)^{s+n-k-1} dp = 1/C(r+k, s+n-k)$.
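The integral in the first line of the derivation can also be approximated numerically and compared with the closed form in the last line. This sketch uses a plain midpoint rule rather than any particular library integrator, and assumes $r, s \ge 1$ so the integrand is bounded on $[0, 1]$; `pmf_by_integration`, `pmf_closed_form`, and the step count are hypothetical choices for illustration.

```python
from math import comb, exp, lgamma


def beta_const(r, s):
    """C(r, s) = Gamma(r + s) / (Gamma(r) * Gamma(s))."""
    return exp(lgamma(r + s) - lgamma(r) - lgamma(s))


def pmf_by_integration(k, n, r, s, steps=20_000):
    """Midpoint-rule approximation of the integral of P(S_n = k | X = p) f_X(p) over (0, 1).

    Assumes r, s >= 1 so the integrand is bounded on [0, 1].
    """
    h = 1.0 / steps
    c = beta_const(r, s)
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h  # midpoint of the i-th subinterval
        likelihood = comb(n, k) * p**k * (1 - p) ** (n - k)  # binomial (n, p) probability of k
        prior = c * p ** (r - 1) * (1 - p) ** (s - 1)  # beta (r, s) density at p
        total += likelihood * prior
    return total * h


def pmf_closed_form(k, n, r, s):
    """The last line of the derivation: (n choose k) C(r, s) / C(r + k, s + n - k)."""
    return comb(n, k) * beta_const(r, s) / beta_const(r + k, s + n - k)


for k in range(6):
    print(k, pmf_by_integration(k, 5, 2, 3), pmf_closed_form(k, 5, 2, 3))
```

The two columns should agree to many decimal places; the agreement improves as `steps` grows.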

21.2.3 Expectation

Given $X = p$, the conditional distribution of $S_n$ is binomial $(n, p)$. Therefore

$$
E(S_n \mid X = p) ~ = ~ np
$$

or, equivalently,

$$
E(S_n \mid X) ~ = ~ nX
$$

By iteration,

$$
E(S_n) ~ = ~ E(nX) ~ = ~ nE(X) ~ = ~ n\frac{r}{r+s}
$$

The expected proportion of heads in $n$ tosses is

$$
E\big( \frac{S_n}{n} \big) ~ = ~ \frac{r}{r+s}
$$

which is the expectation of the prior distribution of $X$.
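As a check on the iterated-expectation argument, $E(S_n)$ can be computed directly as $\sum_k k P(S_n = k)$ from the beta-binomial probabilities. The sketch below assumes integer $r, s \ge 1$ and uses exact fractions; `beta_const_int`, `beta_binomial_pmf_exact`, and `mean_from_pmf` are hypothetical helper names, and $n = 10$, $r = 2$, $s = 3$ are arbitrary example values.

```python
from fractions import Fraction
from math import comb, factorial


def beta_const_int(r, s):
    """C(r, s) for positive integers r, s: (r + s - 1)! / ((r - 1)! (s - 1)!)."""
    return Fraction(factorial(r + s - 1), factorial(r - 1) * factorial(s - 1))


def beta_binomial_pmf_exact(k, n, r, s):
    """Exact beta-binomial probability P(S_n = k)."""
    return comb(n, k) * beta_const_int(r, s) / beta_const_int(r + k, s + n - k)


def mean_from_pmf(n, r, s):
    """E(S_n) computed directly as the sum of k P(S_n = k)."""
    return sum(k * beta_binomial_pmf_exact(k, n, r, s) for k in range(n + 1))


n, r, s = 10, 2, 3
print(mean_from_pmf(n, r, s), Fraction(n * r, r + s))  # both equal n r / (r + s)
```

Computing the mean from the distribution requires the whole formula for $P(S_n = k)$, whereas conditioning on $X$ gives the answer in one line.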

In the next section we will examine the long-run behavior of this random proportion.

21.2.4 Endnote

The unconditional probability $P(S_n = k)$ appeared in the denominator of our calculation of the posterior density of $X$ given $S_n$. Because of the simplifications that result from using conjugate priors, we were able to calculate the denominator in a couple of different ways. But often the calculation can be intractable, especially in high-dimensional settings. Methods of dealing with this problem are covered in more advanced courses.