In the previous section we learned how to work with joint densities, but many of the joint density functions seemed to appear out of nowhere. For example, we checked that the function

$$
f(x, y) = 120x(y-x)(1-y), ~~~~ 0 < x < y < 1
$$

is a joint density, but there was no clue where it came from. In this section we will find its origin and go on to develop an important family of densities on the unit interval.

17.4.1 Order Statistics of IID Uniform $(0, 1)$ Variables

Let $U_1, U_2, \ldots, U_n$ be i.i.d. uniform on $(0, 1)$. Imagine each $U_i$ as the position of a dart thrown at the unit interval. The graph below shows the positions of five such darts, each shown as a star.

<Figure size 432x288 with 1 Axes>

Based on the graph above, can you tell which star corresponds to $U_1$? You can’t, because $U_1$ could be any of the five stars. For the same reason, you can’t identify any of the five variables $U_1, U_2, U_3, U_4, U_5$.

What you can see, however, is the list of $U_i$’s sorted in increasing order. You can see the value of the minimum, the second on the sorted list, the third, the fourth, and finally the fifth, which is the maximum.

These are called the order statistics of $U_1, U_2, U_3, U_4, U_5$, and are denoted $U_{(1)}, U_{(2)}, U_{(3)}, U_{(4)}, U_{(5)}$.

Remember that because the $U_i$’s are independent random variables with densities, there can’t be ties: the chance that two of them are equal is 0.

<Figure size 432x288 with 1 Axes>

In general for $1 \le k \le n$, the $k$th order statistic of $U_1, U_2, \ldots, U_n$ is the $k$th value when the $U_i$’s are sorted in increasing order. This can also be thought of as the $k$th ranked value when the minimum has rank 1. It is denoted $U_{(k)}$.
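As a minimal sketch of this definition, the order statistics of a sample can be computed by sorting it. The seed, the variable names, and the choice $k = 2$ below are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for reproducibility

n = 5
u = rng.uniform(0, 1, size=n)    # U_1, ..., U_n in the order they were drawn
order_stats = np.sort(u)         # U_(1) < U_(2) < ... < U_(n)

k = 2
u_k = order_stats[k - 1]         # kth order statistic (arrays are 0-indexed)

print("draws:           ", u)
print("order statistics:", order_stats)
print("U_(2):           ", u_k)
```

Sorting discards the labels $1, \ldots, n$ of the original draws, which is exactly why the stars in the graph cannot be matched to individual $U_i$’s.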

17.4.2 Joint Density of Two Order Statistics

Let $n = 5$ as above and let’s try to work out the joint density of $U_{(2)}$ and $U_{(4)}$. That’s the joint density of the second and fourth values on the sorted list.

The graph below shows the event $\{U_{(2)} \in dx, U_{(4)} \in dy\}$ for values $x$ and $y$ such that $0 < x < y < 1$.

<Figure size 432x288 with 1 Axes>

To find $P(U_{(2)} \in dx, U_{(4)} \in dy)$, notice that for this event to occur:

  • one of $U_1, U_2, U_3, U_4, U_5$ must be in $(0, x)$

  • one must be in $dx$

  • one must be in $(x, y)$

  • one must be in $dy$

  • one must be in $(y, 1)$

You can think of each of the five independent uniform $(0, 1)$ variables as a multinomial trial. It can land in any of the five intervals above, independently of the others and with the same chance as the others.

The chances are given by

$$
\begin{align*}
&P(U \in (0, x)) = x, ~~ P(U \in dx) \sim 1dx, ~~ P(U \in (x, y)) = (y-x)\\
&P(U \in dy) \sim 1dy, ~~ P(U \in (y, 1)) = 1-y
\end{align*}
$$

where $U$ is any uniform $(0, 1)$ random variable.

Apply the multinomial formula to get

$$
\begin{align*}
P(U_{(2)} \in dx, U_{(4)} \in dy) ~ &\sim ~ \frac{5!}{1!1!1!1!1!} x^1 (1dx)^1 (y-x)^1 (1dy)^1 (1-y)^1 \\
&\sim ~ 120x(y-x)(1-y)dxdy
\end{align*}
$$

and therefore the joint density of $U_{(2)}$ and $U_{(4)}$ is given by

$$
f(x, y) = 120x(y-x)(1-y), ~~~ 0 < x < y < 1
$$

This solves the mystery of how the formula arises.
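As a quick numerical sanity check, the sketch below integrates this function over the region $0 < x < y < 1$ and confirms that the total is 1. It assumes SciPy is available; the function name `joint_density` is illustrative.

```python
from scipy import integrate

def joint_density(y, x):
    """Joint density of U_(2) and U_(4) for n = 5, on 0 < x < y < 1."""
    return 120 * x * (y - x) * (1 - y)

# dblquad expects func(y, x): x runs over (0, 1) and, for each x, y runs over (x, 1)
total, abs_err = integrate.dblquad(joint_density, 0, 1, lambda x: x, lambda x: 1)
print(total)
```

Restricting the inner integral to $(x, 1)$ encodes the constraint $x < y$, so no indicator function is needed in the integrand.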

But it also does much more. The marginal densities of the order statistics of i.i.d. uniform $(0, 1)$ variables form a family that is important in data science.

17.4.3 The Density of $U_{(k)}$

Let $U_{(k)}$ be the $k$th order statistic of $U_1, U_2, \ldots, U_n$. We will find the density of $U_{(k)}$ by following the same general process that we followed to find the joint density above.

The graph below displays the event $\{ U_{(k)} \in dx \}$. For the event to occur,

  • One of the variables $U_1, U_2, \ldots, U_n$ has to be in $dx$.

  • Of the remaining $n-1$ variables, $k-1$ must have values in $(0, x)$ and the rest in $(x, 1)$.

<Figure size 432x288 with 1 Axes>

Apply the multinomial formula again.

$$
P(U_{(k)} \in dx) ~ \sim ~ \frac{n!}{(k-1)! 1! (n-k)!} x^{k-1} (1dx)^1 (1-x)^{n-k}
$$

Therefore the density of $U_{(k)}$ is given by

$$
f_{U_{(k)}} (x) = \frac{n!}{(k-1)!(n-k)!} x^{k-1}(1-x)^{n-k}, ~~~ 0 < x < 1
$$

For consistency, let’s rewrite the exponents slightly so that each ends with $-1$:

$$
f_{U_{(k)}} (x) = \frac{n!}{(k-1)!((n-k+1)-1)!} x^{k-1}(1-x)^{(n-k+1)-1}, ~~~ 0 < x < 1
$$

Because $1 \le k \le n$, we know that $n-k+1$ is a positive integer. Since $n$ is an arbitrary positive integer, so is $n-k+1$.
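The density can be checked against simulation. The sketch below, with illustrative choices $n = 5$, $k = 2$ and cutoff $c = 0.5$, compares the simulated proportion of samples with $U_{(k)} \le c$ to the integral of the density over $(0, c)$. The helper name `order_stat_density`, the seed, and the sample size are assumptions made for the example.

```python
import numpy as np
from math import factorial
from scipy import integrate

def order_stat_density(x, k, n):
    """Density of the kth order statistic of n i.i.d. uniform (0, 1) variables."""
    const = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return const * x**(k - 1) * (1 - x)**(n - k)

n, k, c = 5, 2, 0.5
rng = np.random.default_rng(0)

# Each row is one sample of n uniforms; sorting a row gives its order statistics
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)
u_k = samples[:, k - 1]

empirical = np.mean(u_k <= c)   # simulated P(U_(k) <= c)
exact, _ = integrate.quad(order_stat_density, 0, c, args=(k, n))
print(empirical, exact)
```

The two numbers should agree to about two decimal places; the remaining gap is simulation error, which shrinks as the number of simulated samples grows.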

17.4.4 Beta Densities

We have shown that if $r$ and $s$ are any two positive integers, then the function

$$
f(x) ~ = ~ \frac{(r+s-1)!}{(r-1)!(s-1)!} x^{r-1}(1-x)^{s-1}, ~~~ 0 < x < 1
$$

is a probability density function. This is called the beta density with parameters $r$ and $s$.

By the derivation above, the $k$th order statistic $U_{(k)}$ of $n$ i.i.d. uniform $(0, 1)$ random variables has the beta density with parameters $k$ and $n-k+1$.
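The factorial formula can be compared with the beta density as implemented in SciPy. This is a small sketch in which `beta_density` is an illustrative helper and the values $n = 5$, $k = 2$ are arbitrary.

```python
import numpy as np
from math import factorial
from scipy import stats

def beta_density(x, r, s):
    """Beta (r, s) density for positive integers r and s, via factorials."""
    const = factorial(r + s - 1) / (factorial(r - 1) * factorial(s - 1))
    return const * x**(r - 1) * (1 - x)**(s - 1)

n, k = 5, 2
x = np.linspace(0.01, 0.99, 99)

by_formula = beta_density(x, k, n - k + 1)      # density of U_(k) from the derivation
by_scipy = stats.beta.pdf(x, k, n - k + 1)      # beta density with parameters k and n-k+1

print(np.allclose(by_formula, by_scipy))
```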

The shape of the density is determined by the two factors that involve $x$. All the factorials are just parts of the constant that make the density integrate to 1.

Notice that the uniform $(0, 1)$ density is the same as the beta density with parameters $r = 1$ and $s = 1$. The uniform $(0, 1)$ density is a member of the beta family.

The graph below shows some beta density curves. As you would expect, the beta $(3, 3)$ density is symmetric about 0.5.

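import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
x = np.arange(0, 1.01, 0.01)
for i in np.arange(1, 6, 1):  # i = 1, ..., 5 keeps both parameters i and 6-i positive
    plt.plot(x, stats.beta.pdf(x, i, 6-i), lw=2)
plt.title(r'Beta $(i, 6-i)$ densities for $1 \leq i \leq 5$');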
<Figure size 432x288 with 1 Axes>

By choosing the parameters appropriately, you can create beta densities that put much of their mass near a prescribed value. That is one of the reasons beta densities are used to model random proportions. For example, if you think that the probability that an email is spam is most likely in the 60% to 90% range, but might be lower, you might model your belief by choosing the density that peaks at around 0.75 in the graph above.
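To see which curve that is, one rough approach is to locate the peak of each plotted density on a grid. The sketch below does this for the five beta $(i, 6-i)$ densities; the grid resolution and the dictionary name `peaks` are illustrative choices.

```python
import numpy as np
from scipy import stats

x = np.linspace(0, 1, 101)

# For each beta (i, 6-i) density, record the grid point where the curve is highest
peaks = {}
for i in range(1, 6):
    density = stats.beta.pdf(x, i, 6 - i)
    peaks[(i, 6 - i)] = x[np.argmax(density)]

for params, peak in peaks.items():
    print(params, peak)
```

On this grid the beta $(4, 2)$ density is the one that peaks at 0.75, so it is the natural candidate for the spam example.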

The calculation below shows you how to get started on the process of picking parameters so that the beta density with those parameters has properties that reflect your beliefs.

17.4.5 The Beta Integral

The beta density integrates to 1, and hence for all positive integers $r$ and $s$ we have

$$
\int_0^1 x^{r-1}(1-x)^{s-1}dx ~ = ~ \frac{(r-1)!(s-1)!}{(r+s-1)!}
$$
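The identity can be checked numerically for a few parameter pairs. In the sketch below the two helper functions and the particular pairs $(r, s)$ are illustrative assumptions; the integral is computed with `scipy.integrate.quad`.

```python
from math import factorial
from scipy import integrate

def beta_integral_numeric(r, s):
    """Numerically integrate x^(r-1) (1-x)^(s-1) over (0, 1)."""
    value, _ = integrate.quad(lambda x: x**(r - 1) * (1 - x)**(s - 1), 0, 1)
    return value

def beta_integral_formula(r, s):
    """Closed form (r-1)! (s-1)! / (r+s-1)! for positive integers r and s."""
    return factorial(r - 1) * factorial(s - 1) / factorial(r + s - 1)

for r, s in [(1, 1), (2, 3), (4, 2), (5, 7)]:
    print((r, s), beta_integral_numeric(r, s), beta_integral_formula(r, s))
```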

Thus probability theory makes short work of an otherwise laborious integral. Also, we can now find the expectation of a random variable with a beta density.

Let $X$ have the beta $(r, s)$ density for two positive integer parameters $r$ and $s$. Then

$$
\begin{align*}
E(X) &= \int_0^1 x \frac{(r+s-1)!}{(r-1)!(s-1)!} x^{r-1}(1-x)^{s-1}dx \\ \\
&= \frac{(r+s-1)!}{(r-1)!(s-1)!} \int_0^1 x^r(1-x)^{s-1}dx \\ \\
&= \frac{(r+s-1)!}{(r-1)!(s-1)!} \cdot \frac{r!(s-1)!}{(r+s)!} ~~~~~~~ \text{(beta integral for parameters } r+1 \text{ and } s\text{)}\\ \\
&= \frac{r}{r+s}
\end{align*}
$$

You can follow the same method to find $E(X^2)$ and hence $Var(X)$.
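As a sketch of how that calculation goes, apply the beta integral with parameters $r+2$ and $s$:

$$
\begin{align*}
E(X^2) &= \frac{(r+s-1)!}{(r-1)!(s-1)!} \int_0^1 x^{r+1}(1-x)^{s-1}dx \\ \\
&= \frac{(r+s-1)!}{(r-1)!(s-1)!} \cdot \frac{(r+1)!(s-1)!}{(r+s+1)!} \\ \\
&= \frac{r(r+1)}{(r+s)(r+s+1)}
\end{align*}
$$

Subtracting the square of the expectation then gives

$$
Var(X) ~ = ~ \frac{r(r+1)}{(r+s)(r+s+1)} - \frac{r^2}{(r+s)^2} ~ = ~ \frac{rs}{(r+s)^2(r+s+1)}
$$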

The formula for the expectation allows you to pick parameters corresponding to your belief about the random proportion being modeled by $X$. For example, if you think the proportion is likely to be somewhere around 0.4, you might start by trying out a beta prior with $r = 2$ and $s = 3$.
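Many parameter pairs share the same expectation, so the expectation alone does not pin down the prior. The sketch below compares a few illustrative candidates with $r/(r+s) = 0.4$, using SciPy's `stats.beta.mean` and `stats.beta.std`; the candidate list is an assumption made for the example, not a recommendation.

```python
from scipy import stats

target = 0.4
candidates = [(2, 3), (4, 6), (8, 12)]   # illustrative pairs, all with r / (r + s) = 0.4

summary = []
for r, s in candidates:
    mean = stats.beta.mean(r, s)
    sd = stats.beta.std(r, s)
    summary.append((r, s, mean, sd))
    print(f"beta({r}, {s}): mean = {mean:.3f}, SD = {sd:.3f}")
```

All three densities are centered at 0.4, but the SD shrinks as $r + s$ grows, so larger parameters express a more concentrated belief about the proportion.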

You will have noticed that the form of the beta density looks rather like the binomial formula. Indeed, we derived the beta density from the multinomial formula, of which the binomial formula is a special case. Later in the course you will see another close relation between the beta and the binomial. These properties make the beta family one of the most widely used families of densities in machine learning.