Suppose $N_H$ is the number of heads in 100 tosses of a coin, and $N_T$ the number of tails. Then $N_H$ and $N_T$ are far from independent. They are linear functions of each other because $N_T = 100 - N_H$.

The same is true of any fixed number of tosses: if you know the number of heads, then you also know the number of tails.

In any fixed number of Bernoulli trials, the number of successes and the number of failures are as dependent as it gets. If you know one, you know the other.

However, something remarkable happens when the number of trials is itself random and has a Poisson distribution. After we see what happens, we will be able to understand why it matters.

7.2.1 Randomizing the Number of Bernoulli Trials

Let $N$ have the Poisson $(\mu)$ distribution, and let $S$ be the number of successes in $N$ i.i.d. Bernoulli $(p)$ trials. More formally:

  • Given $N = 0$, define $S$ to be 0 with probability 1. Given that there are no trials, there are also no successes.

  • For $n \ge 1$, let the conditional distribution of $S$ given $N = n$ be binomial $(n, p)$.

Then the joint distribution of $N$ and $S$ is given by:

P(N=n,S=s) = eμμnn!n!s!(ns)!ps(1p)ns,  0snP(N=n, S=s) ~ = ~ e^{-\mu}\frac{\mu^n}{n!} \cdot \frac{n!}{s!(n-s)!} p^s(1-p)^{n-s}, ~~ 0 \le s \le n

You should check that the formula is correct when $n = 0$.
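
As a quick numerical sanity check, here is a minimal sketch (assuming SciPy and NumPy are available; the values of $\mu$ and $p$ are arbitrary choices for illustration) that evaluates the joint probabilities as a Poisson factor times a binomial factor, confirms the $n = 0$ case, and checks that the probabilities sum to 1.

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative parameters
mu, p = 5, 0.3

# Joint pmf: P(N = n, S = s) = P(N = n) * P(S = s | N = n)
def joint_pmf(n, s):
    return stats.poisson.pmf(n, mu) * stats.binom.pmf(s, n, p)

# The n = 0 case: only s = 0 is possible, and the formula reduces to P(N = 0) = e^(-mu)
print(joint_pmf(0, 0), np.exp(-mu))

# Summing the joint probabilities over a large grid gives (essentially) 1
print(sum(joint_pmf(n, s) for n in range(80) for s in range(n + 1)))
```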

We can sum the terms in this joint distribution appropriately to get the marginal distribution of $S$.


7.2.2 A Poisson Number of Successes

The possible values of $S$ are $0, 1, 2, \ldots$ with no upper limit because there is no upper limit on the possible values of $N$. For $s \ge 0$,

$$
\begin{align*}
P(S = s) &= \sum_{n=s}^\infty P(N=n, S=s) \\ \\
&= \sum_{n=s}^\infty e^{-\mu}\frac{\mu^n}{n!} \cdot \frac{n!}{s!(n-s)!} p^s q^{n-s} ~~~~ \text{where } q = 1-p \\ \\
&= e^{-\mu} \frac{\mu^s p^s}{s!} \sum_{n=s}^\infty \frac{\mu^{n-s}q^{n-s}}{(n-s)!} \\ \\
&= e^{-\mu} \frac{(\mu p)^s}{s!} \sum_{n=s}^\infty \frac{(\mu q)^{n-s}}{(n-s)!} \\ \\
&= e^{-\mu} \frac{(\mu p)^s}{s!} \sum_{j=0}^\infty \frac{(\mu q)^j}{j!} \\ \\
&= e^{-\mu} \frac{(\mu p)^s}{s!} e^{\mu q} \\ \\
&= e^{-\mu p} \frac{(\mu p)^s}{s!} ~~ \text{because } \mu p + \mu q = \mu
\end{align*}
$$

Thus the distribution of $S$ is Poisson with parameter $\mu p$.
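
Here is a small numerical check of this identity, a sketch assuming SciPy and NumPy with arbitrary illustrative values of $\mu$ and $p$: it sums the joint distribution over $n$ (truncated where the Poisson tail is negligible) and compares the result with the Poisson $(\mu p)$ pmf.

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative parameters
mu, p = 5, 0.3

# Marginal P(S = s) obtained by summing the joint distribution over n >= s,
# truncating where the Poisson(mu) tail is negligible
s_vals = np.arange(15)
marginal = [sum(stats.poisson.pmf(n, mu) * stats.binom.pmf(s, n, p)
                for n in range(s, 150))
            for s in s_vals]

# The sums agree with the Poisson(mu * p) pmf
print(np.allclose(marginal, stats.poisson.pmf(s_vals, mu * p)))   # True
```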

Notice what we have just proved.

  • If the number of trials $n$ is fixed, you know that the distribution of the number of successes is binomial $(n, p)$.

  • But if the number of trials is random with a Poisson $(\mu)$ distribution, then the distribution of the number of successes is Poisson $(\mu p)$.

This is a major step in Poissonizing the binomial.

The best is yet to come, but let’s take a moment to look at the result numerically. Suppose you run a Poisson $(12)$ number of i.i.d. Bernoulli $(1/3)$ trials. Then the number of trials is most likely to be somewhere around 12, but you can’t say exactly what it will be because it’s random. What we have shown is that the number of successes is Poisson with parameter $12 \times \frac{1}{3} = 4$.

The parameter 4 is not hard to understand intuitively. You’re most likely to see around 12 trials, and about 1/3 of them are going to be successes, so you’re most likely to see around 4 successes.
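
The claim can also be checked by simulation. The sketch below (assuming NumPy and SciPy; the seed and the number of repetitions are arbitrary choices) draws a Poisson $(12)$ number of Bernoulli $(1/3)$ trials many times and compares the empirical distribution of the number of successes with the Poisson $(4)$ pmf.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # arbitrary seed
reps = 100_000

# Draw N ~ Poisson(12), then S ~ binomial(N, 1/3) given N, many times
N = rng.poisson(12, size=reps)
S = rng.binomial(N, 1/3)

# Empirical P(S = s) versus the Poisson(4) pmf
for s in range(9):
    print(s, round(np.mean(S == s), 4), round(stats.poisson.pmf(s, 4), 4))
```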


7.2.3 Successes and Failures are Independent

Yes, you read that right. If you run a Poisson number of i.i.d. Bernoulli trials, then the number of successes and the number of failures are independent.

Randomizing parameters (in this case the number of trials) can have a dramatic effect on the relations between random variables.

Let’s prove our result, and then we will see a way in which it is used.

Suppose as before that we are running $N$ i.i.d. Bernoulli $(p)$ trials, where $N$ has the Poisson $(\mu)$ distribution independent of the results of the trials. Also as before, let $S$ be the number of successes.

Now let $F$ be the number of failures. Then the distribution of $F$ is Poisson $(\mu q)$ where $q = 1-p$. This follows by redefining “success” as “failure” in our previous argument.

The joint distribution of $S$ and $F$ is

$$
\begin{align*}
P(S = s, F = f) &= P(N = s+f, S = s) \\ \\
&= e^{-\mu} \frac{\mu^{s+f}}{(s+f)!} \cdot \frac{(s+f)!}{s!f!} p^s q^f \\ \\
&= \big( e^{-\mu p} \frac{(\mu p)^s}{s!} \big) \big( e^{-\mu q} \frac{(\mu q)^f}{f!} \big) \\ \\
&= P(S = s)P(F = f)
\end{align*}
$$

This shows that $S$ and $F$ are independent.
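
Independence can also be seen in simulation. Below is a sketch (assuming NumPy and SciPy; the parameters, seed, and the particular pair $(s, f)$ are arbitrary choices for illustration) that compares an empirical joint probability $P(S = s, F = f)$ with the product of the two Poisson marginal probabilities.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # arbitrary seed
mu, p = 12, 1/3                  # same illustrative parameters as above
reps = 500_000

# Simulate N, then the numbers of successes and failures
N = rng.poisson(mu, size=reps)
S = rng.binomial(N, p)
F = N - S

# Empirical joint probability versus the product of the two Poisson marginals
s, f = 4, 8
empirical = np.mean((S == s) & (F == f))
product = stats.poisson.pmf(s, mu * p) * stats.poisson.pmf(f, mu * (1 - p))
print(empirical, product)   # the two should be close
```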

7.2.4 Summary: Poissonization of the Binomial

Suppose you run $N$ i.i.d. Bernoulli $(p)$ trials, where $N$ has the Poisson $(\mu)$ distribution independent of the results of the trials. Let $S$ be the number of successes and $F$ the number of failures, and let $q = 1-p$. Then:

  • $S$ has the Poisson $(\mu p)$ distribution

  • $F$ has the Poisson $(\mu q)$ distribution

  • $S$ and $F$ are independent

For example, suppose 90% of the individuals in a population are of Class A and 10% are of Class B. Suppose you draw $N$ times at random with replacement from the population, where $N$ has the Poisson $(20)$ distribution independent of the results of your draws. Then in your sample,

  • the number of people in Class A has the Poisson $(18)$ distribution,

  • the number in Class B has the Poisson $(2)$ distribution,

  • and the counts in the two classes are independent.

Thus for example the chance that each class appears at least five times in your sample is

$$
\big( \sum_{i=5}^\infty e^{-18} \frac{18^i}{i!} \big) \big( \sum_{j=5}^\infty e^{-2} \frac{2^j}{j!} \big) ~ = ~ \big(1 - \sum_{i=0}^4 e^{-18} \frac{18^i}{i!} \big) \big(1 - \sum_{j=0}^4 e^{-2} \frac{2^j}{j!} \big)
$$

This is just over 5%.

from scipy import stats

# P(each class appears at least five times)
(1 - stats.poisson.cdf(4, 18)) * (1 - stats.poisson.cdf(4, 2))
0.052648585218160585