Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Working with densities involves calculus which can sometimes be time-consuming. This course gives you two ways of reducing the amount of calculus involved.

  • Probabilistic methods can help reduce algebra and calculus. You’ve seen this with algebra in the discrete case. You’ll see it with calculus as we learn more about densities.

  • Python has a symbolic math module called SymPy that does algebra, calculus, and much other symbolic math. In this section we will show you how to do calculus using SymPy.

We will demonstrate the methods in the context of an example. Suppose XX has density given by

f(x)={105x2(1x)4   if 0x10         otherwisef(x) = \begin{cases} 105x^2(1-x)^4 ~~~ \text{if } 0 \le x \le 1 \\ 0 ~~~~~~~~~ \text{otherwise} \end{cases}

As you can see from its graph below, ff could be used to model the distribution of a random proportion that you think is likely to be somewhere between 0.2 and 0.4.

<matplotlib.figure.Figure at 0x1a187a5128>

The density ff is a polynomial on the unit interval, and in principle the algebra and calculus involved in integrating it are straightforward. But they are tedious. So let’s get SymPy to do the work.

First, we will import all the functions in SymPy and set up some printing methods that make the output look nicer than the retro typewritten pgf output you saw in a previous section. In future sections of this text, you can assume that this importing and initialization will have been done at the start.

from sympy import *
init_printing()

Next, we have to create tell Python that an object is symbolic. In our example, the variable xx is the natural candidate to be a symbol. You can use Symbol for this, by using the argument 'x'. We have assinged the symbol to the name x.

x = Symbol('x')

Now we will assign the name density to the expression that defines ff. The expression looks just like a numerical calculation, but the output is algebraic!

density = 105 * x**2 * (1-x)**4
density
Loading...

That’s the expression for f(x)f(x) defined by the equation at the start of the section. Notice that what we naturally think of as 1x1 - x is expressed as x+1-x + 1. That’s because SymPy is writing the polynomial leading with the term of highest degree.

Let’s not simply accept that this function is a density. Let’s check that it is a density by integrating it from 0 to 1. To display this, we use the method Integral that takes the name of a function and a tuple (a sequence in parentheses) consisting of the variable of integration and the lower and upper limits of integration. We have assigned this integral to the name total_area.

total_area = Integral(density, (x, 0, 1))
total_area
Loading...

The output of displays the integral, which is nice, but what we really want is its numerical value. In SymPy, this is achieved by abruptly instructing the method to doit().

total_area.doit()
Loading...

This confirms that the function ff is a density.

We can use Integral to find the chance that XX is in any interval. Here is P(0.2<X<0.4)P(0.2 < X < 0.4).

prob_02_04 = Integral(density, (x, 0.2, 0.4)).doit()
prob_02_04
Loading...

For xx in the unit interval, the cdf of XX is

F(x) = P(Xx) = 0xf(s)ds = I(s) 0x = I(x)I(0)F(x) ~ = ~ P(X \le x) ~ = ~ \int_0^x f(s)ds ~ = ~ I(s)~ \Big\rvert_0^x ~ = ~ I(x) - I(0)

where II is the indefinite integral of ff.

To get the indefinite integral, simply ask SymPy to integrate the density; there are no limits of integration.

indefinite = Integral(density).doit()
indefinite
Loading...

Now F(x)=I(x)I(0)F(x) = I(x) - I(0). You can see at a glance that I(0)=0I(0) = 0 but here is how SymPy would figure that out.

To evaluate I(0)I(0), SymPy must substitute xx with 0 in the expression for II. This is achieved by the method subs that takes the variable as its first argument and the specified value as the second.

I_0 = indefinite.subs(x, 0)
I_0
Loading...
cdf = indefinite - I_0
cdf
Loading...

To find the value of the cdf at a specified point, say 0.4, we have to substitute xx with 0.4 in the formula for the cdf.

cdf_at_04 = cdf.subs(x, 0.4)
cdf_at_04
Loading...

Thus P(X0.4)P(X \le 0.4) is roughly 58%. Earlier we calulated P(0.2<X<0.4)=43.2%P(0.2 < X < 0.4) = 43.2\%, which we can confirm by using the cdf:

cdf_at_02 = cdf.subs(x, 0.2)
cdf_at_04 - cdf_at_02
Loading...

The expectation E(X)E(X) is a definite integral from 0 to 1:

expectation = Integral(x*density, (x, 0, 1)).doit()
expectation
Loading...

Notice how simple the answer is. Later in the course, you will see why.

Here is E(X2)E(X^2), which turns out to be another simple fraction. Clearly, the density ff has interesting properties. We will study them later. For now, let’s just get the numerical answers.

expected_square = Integral((x**2)*density, (x, 0, 1)).doit()
expected_square
Loading...

Now you can find SD(X)SD(X).

sd = (expected_square - expectation**2)**0.5
sd
Loading...

15.5.1SymPy and the Exponential Density

One of the primary distributions in probability theory, the exponential distribution has a positive parameter λ\lambda known as the “rate”, and density given by

f(t) =λeλt,   t0f(t) ~ = \lambda e^{-\lambda t}, ~~~ t \ge 0

The density is 0 on the negative numbers. Here is its graph when λ=3\lambda = 3.

<matplotlib.figure.Figure at 0x1a20cb2588>

To check that ff is a density, we have to confirm that its integral is 1. We will start by constructing two symbols, t and lamda. Notice the incorrectly spelled lamda instead of lambda. That is because lambda has another meaning in Python, as some of you might know.

Note the use of positive=True to specify that the symbol can take on only positive values.

t = Symbol('t', positive=True)
lamda = Symbol('lamda', positive=True)

Next we construct the expression for the density. Notice the use of exp for the exponential function.

expon_density = lamda * exp(-lamda * t)
expon_density
Loading...

To see that the function is a density, we can check that its integral from 0 to \infty is 1. The symbol that SymPy uses for \infty is oo, a double lower case o. It looks very much like \infty.

Integral(expon_density, (t, 0, oo)).doit()
Loading...

Suppose TT has the exponential (λ)(\lambda) density. Then for t0t \ge 0 the cdf of TT is

FT(t) = P(Tt) = 0tλeλsdsF_T(t) ~ = ~ P(T \le t) ~ = ~ \int_0^t \lambda e^{-\lambda s}ds

This is a straightforward integral that you can probably do in your head. However, let’s get some more practice using SymPy to find cdf’s. We will use the same method that we used to find the cdf in the previous example.

0tλeλsds = I(t)I(0)\int_0^t \lambda e^{-\lambda s}ds ~ = ~ I(t) - I(0)

where II is the indefinite integral of the density. To get this indefinite integral we will use Integral as before, except that this time we must specify t as the variable of integration. That is because SymPy sees two symbols t and lamda in the density, and doesn’t know which one is the variable unless we tell it.

indefinite = Integral(expon_density, t).doit()
indefinite
Loading...

Now use FT(t)=I(t)I(0)F_T(t) = I(t) - I(0):

I_0 = indefinite.subs(t, 0)
I_0
Loading...
cdf = indefinite - I_0
cdf
Loading...

Thus the cdf of TT is

FT(t) = 1eλtF_T(t) ~ = ~ 1 - e^{-\lambda t}

The expectation of TT is

E(T) = 0tλeλtdt = 1λE(T) ~ = ~ \int_0^\infty t \lambda e^{-\lambda t} dt ~ = ~ \frac{1}{\lambda}

which you can check by integration by parts. But SymPy is faster:

expectation = Integral(t*expon_density, (t, 0, oo)).doit()
expectation
Loading...

Calculating E(T2)E(T^2) is just as easy.

expected_square = Integral(t**2 * expon_density, (t, 0, oo)).doit()
expected_square
Loading...

The variance and SD follow directly.

variance = expected_square - (expectation ** 2)
variance
Loading...
sd = variance ** 0.5
sd
Loading...

That’s a pretty funny way of writing 1λ\frac{1}{\lambda} but we’ll take it. It’s a small price to pay for not having to do all the integrals by hand.