When we work with a discrete random variable $X$, a natural component of our calculations is the chance that $X$ has a particular value $k$. That’s the probability we denote by $P(X = k)$.

What is the analog of $P(X = k)$ when $X$ has a density? If your answer is $P(X = x)$ for any number $x$, prepare to be disconcerted by the next paragraph.

15.2.0.1 If $X$ Has a Density, Each Individual Value Has Probability 0

If $X$ has a density, then probabilities are defined as areas under the density curve. The area of a line is zero. So if $X$ has a density, then for every $x$,

$$P(X = x) ~ = ~ 0$$

“But $X$ has to be some value!” is a natural reaction to this. Take a moment now to reflect on the wonders of adding uncountably many zeros. On the real line, each point has length zero but intervals have positive length. On the plane, each line has area zero but rectangles have positive area. Calculus is powerful.

The fact that the chance of any single value is 0 actually reduces some bookkeeping. When we are calculating probabilities involving random variables that have densities, we don’t have to worry about whether we should or should not include endpoints of intervals. The chance of each endpoint is 0, so for example,

$$F(x) ~ = ~ P(X \le x) ~ = ~ P(X < x) ~~~ \text{for all } x$$

Being able to drop the equal sign like this is a major departure from calculations involving discrete random variables; $P(X = k)$ has disappeared. But it does have an analog if we think in terms of infinitesimals.
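To see numerically why a single point carries no probability, here is a minimal sketch. It assumes, purely as an example, the density $f(x) = 6x(1-x)$ on $(0, 1)$, whose cdf $F(x) = 3x^2 - 2x^3$ follows by integration. The function names `cdf` and `prob_within` are illustrative and not part of the text.

```python
def cdf(x):
    """CDF of the example density f(x) = 6x(1-x) on (0, 1): F(x) = 3x^2 - 2x^3."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return 3 * x**2 - 2 * x**3


def prob_within(x, h):
    """P(x - h <= X <= x + h), computed as a difference of cdf values."""
    return cdf(x + h) - cdf(x - h)


# Shrink the interval around x = 0.5 and watch the probability shrink with it.
for h in (0.1, 0.01, 0.001, 0.0001):
    print(f"h = {h:<7} P(|X - 0.5| <= h) = {prob_within(0.5, h):.6f}")
```

The printed probabilities shrink roughly in proportion to `h`, which is consistent with $P(X = 0.5)$ being 0 in the limit.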


15.2.1 An Infinitesimal Calculation

In the theory of Riemann integration, the area under a curve is calculated by discrete approximation. The interval on the horizontal axis is divided into tiny little segments. Each segment becomes the base of a very narrow rectangle with a height determined by the curve. The total area of all these rectangular slivers is an approximation to the integral. As you make the slivers narrower, the sum approaches the area under the curve.

Let’s examine this in the case of the density we used as our example in the previous section:

$$f(x) ~ = ~ 6x(1-x), ~~~ 0 < x < 1$$

Here is one of those narrow slivers.

[Figure: graph of the density $f$ with one narrow gold sliver shaded]
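The sliver approximation can be tried out directly. The sketch below sums the areas of `n` narrow rectangles under $f(x) = 6x(1-x)$. Taking each rectangle's height at the midpoint of its base is a choice made here for illustration; the text only says the height is determined by the curve. The name `riemann_area` is hypothetical.

```python
def f(x):
    """The example density f(x) = 6x(1-x) on (0, 1)."""
    return 6 * x * (1 - x)


def riemann_area(a, b, n):
    """Approximate the area under f over [a, b] using n equal-width slivers.

    Each sliver is a rectangle of width dx whose height is f evaluated at
    the midpoint of its base (an illustrative choice).
    """
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))


# As the slivers get narrower, the total approaches the area under f, which is 1.
for n in (10, 100, 1000):
    print(n, riemann_area(0, 1, n))
```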

We will now set up some notation that will be used repeatedly in the course.

  • $x$ is a point on the horizontal axis

  • $dx$ stands for two things (this considerably simplifies writing):

    • a tiny interval around $x$

    • the length of the tiny interval

Now $\{X \in dx\}$ is notation for “$X$ is in a tiny interval of length $dx$ around the point $x$”. Don’t worry about exactly what “around” means. It won’t matter as we’ll be taking limits as $dx$ goes to 0.

In this notation, the area of the gold sliver is essentially that of a rectangle with height $f(x)$ and width $dx$. We write

$$P(X \in dx) ~ \sim ~ f(x)dx$$

where as usual $\sim$ means that the ratio of the two sides goes to 1 as $dx$ goes to 0.

We have seen that $f(x)$ is not a probability. But for a tiny $dx$, the product $f(x)dx$ is essentially the probability that “$X$ is just around $x$”.
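A quick numerical check of this ratio, as a sketch: it takes the tiny interval to be $[x, x + dx]$ (one possible reading of "around"), uses the example density $f(x) = 6x(1-x)$, and computes the exact probability from the cdf $F(x) = 3x^2 - 2x^3$ obtained by integrating $f$. The helper name `ratio` is illustrative.

```python
def f(x):
    """The example density f(x) = 6x(1-x) on (0, 1)."""
    return 6 * x * (1 - x)


def cdf(x):
    """F(x) = 3x^2 - 2x^3 for 0 <= x <= 1, the integral of f from 0 to x."""
    return 3 * x**2 - 2 * x**3


def ratio(x, dx):
    """Exact P(x <= X <= x + dx) divided by the rectangle area f(x) * dx."""
    return (cdf(x + dx) - cdf(x)) / (f(x) * dx)


# The ratio should get closer to 1 as dx shrinks.
for dx in (0.1, 0.01, 0.001, 0.0001):
    print(f"dx = {dx:<7} ratio = {ratio(0.3, dx):.6f}")
```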

This gives us an important analogy. When $X$ is discrete, then

$$P(a \le X \le b) ~ = ~ \sum_{k=a}^b P(X = k)$$

When $X$ has density $f$, then

$$P(a \le X \le b) ~ = ~ \int_a^b f(x)dx$$

The calculus notation is clever as well as powerful. It involves two analogies:

  • $f(x)dx$ is the chance that $X$ is just around $x$

  • the integral is a continuous version of the sum
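As a worked instance of the integral formula, take the density $f(x) = 6x(1-x)$ on $(0, 1)$ with endpoints $a = 0.2$ and $b = 0.5$, chosen here only for illustration:

$$
P(0.2 \le X \le 0.5) ~ = ~ \int_{0.2}^{0.5} 6x(1-x)\,dx ~ = ~ \Big[ 3x^2 - 2x^3 \Big]_{0.2}^{0.5} ~ = ~ 0.5 - 0.104 ~ = ~ 0.396
$$

Because each endpoint has chance 0, the same value is obtained for $P(0.2 < X < 0.5)$.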

15.2.2 Probability Density

We can rewrite $P(X \in dx) \sim f(x)dx$ as

$$f(x) ~ \sim ~ \frac{P(X \in dx)}{dx}$$

The function $f$ represents probability per unit length. That is why $f$ is called a probability density function.

Let’s take another look at the graph of $f$.

[Figure: graph of the density $f$]

If you simulate multiple independent copies of a random variable that has this density (exactly how to do that will be the subject of the next lab), then for example the simulated values will be more crowded around 0.5 than around 0.2.
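The crowding claim can be checked with a small simulation. This is only a sketch, not the method of the lab: it relies on the fact that $6x(1-x)$ is the beta $(2, 2)$ density and draws values with the standard library's `random.betavariate`. The names `simulate_values` and `proportion_near`, the seed, and the window half-width of 0.05 are all assumptions made for illustration.

```python
import random


def simulate_values(n, seed=0):
    """Draw n independent values with density 6x(1-x), i.e. beta(2, 2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [rng.betavariate(2, 2) for _ in range(n)]


def proportion_near(values, center, half_width=0.05):
    """Proportion of values within half_width of center."""
    return sum(abs(v - center) <= half_width for v in values) / len(values)


values = simulate_values(10000)
near_half = proportion_near(values, 0.5)
near_fifth = proportion_near(values, 0.2)
print("proportion within 0.05 of 0.5:", near_half)
print("proportion within 0.05 of 0.2:", near_fifth)
```

Integrating $f$ over the two windows gives about 0.1495 near 0.5 and about 0.0955 near 0.2, so the simulated proportions should land close to those values.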

The function simulate_f takes the number of copies as its argument and displays a histogram of the simulated values overlaid with the graph of $f$.

simulate_f(10000)
[Figure: histogram of 10,000 simulated values overlaid with the graph of $f$]

The distribution of 10,000 simulated values follows $f$ pretty closely.

Compare the vertical scale of the histogram above with the vertical scale of the graph of $f$ that we drew earlier. You can see that they are the same apart from a conversion of proportions to percents.

Now you have a better understanding of why all histograms in Data 8 are drawn to the density scale, with heights calculated as

$$\text{height of bar} ~ = ~ \frac{\text{percent in bin}}{\text{width of bin}}$$

so that the units of height are “percent per unit on the horizontal axis”.

Not only does this way of drawing histograms allow you to account for bins of different widths, as discussed in Data 8, it also leads directly to probability densities of random variables. You can think of the density curve as what the empirical histogram of the simulated values would look like if you had infinitely many simulations and infinitely narrow bins.
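The height formula above can be sketched in a few lines. The function `density_heights`, the sample values, and the bin edges below are all hypothetical and chosen only to show unequal bin widths; this is not the Data 8 implementation.

```python
from bisect import bisect_right


def density_heights(values, edges):
    """Bar heights on the density scale: (percent in bin) / (width of bin).

    Bins are [edges[i], edges[i+1]), except that the last bin also
    includes its right endpoint. Values outside the edges are not counted.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            i = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    n = len(values)
    return [100 * c / n / (hi - lo) for c, lo, hi in zip(counts, edges, edges[1:])]


# Ten illustrative values and three bins of different widths.
values = [0.05, 0.1, 0.25, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.9]
edges = [0, 0.2, 0.5, 1.0]
heights = density_heights(values, edges)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# Height times width recovers the percent in each bin, so the areas total 100.
total_area = sum(h * w for h, w in zip(heights, widths))
print(heights, total_area)
```

Here the second and third bins each hold 40 percent of the values, but the narrower second bin gets the taller bar, which is the point of dividing by the width.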