# HIDDEN
import warnings
warnings.filterwarnings('ignore')
from datascience import *
from prob140 import *
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline
from scipy import stats
from IPython.display import YouTubeVideo
YouTubeVideo('YU3ShHl3wE8')
Let f be a non-negative function on the real number line and suppose
$$
\int_{-\infty}^{\infty} f(x)\,dx = 1
$$
Then f is called a probability density function or just density for short.
In the next section we will discuss the reason behind the name. For now, imagine the graph of f as a kind of continuous probability histogram. We will soon make that precise, but notice that by definition the total area under a density curve has to be 1.
In short, a density has two defining properties: it is non-negative, and the total area under its graph is 1.
As an example, the function f defined by
$$
f(x) =
\begin{cases}
0 & \text{if } x \le 0 \\
6x(1-x) & \text{if } 0 < x < 1 \\
0 & \text{if } x \ge 1
\end{cases}
$$
is a density. It is easy to check by calculus that it integrates to 1.
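For readers who want the check written out: because f is 0 outside the unit interval, the integral over the whole line reduces to an integral from 0 to 1.

$$
\int_{-\infty}^{\infty} f(x)\,dx
= \int_0^1 6x(1-x)\,dx
= \int_0^1 (6x - 6x^2)\,dx
= \Big[\, 3x^2 - 2x^3 \,\Big]_0^1
= 3 - 2
= 1
$$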
Note: The calculus used in this text is very straightforward. You should be able to do it easily by hand. Later in this chapter we will give you some Python tools for calculus. We will also show how understanding probability can help us do calculus quickly.
Here is a graph of the function f. The density puts all the probability on the unit interval.
In the example above, f(0.5)=6/4=1.5>1. Indeed, there are many values of x for which f(x)>1. So the values of f are clearly not probabilities.
Then what are they? We’ll study that in the next section. In this section we will see that we can work with densities just as we did with the normal curve.
First, a labor-saving device: If f is positive only on a subinterval of the line, then usually we will just write its definition on the interval where it is positive. It will be assumed to be 0 elsewhere.
$$
f(x) = 6x(1-x), \quad 0 < x < 1
$$
And we will draw the graph of f only over the region where it is positive:
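# NO CODE
# Assumed definitions (the cell that originally defined x and f is not shown):
# a grid on the unit interval and the example density f(x) = 6x(1-x)
x = np.arange(0, 1.01, 0.01)
f = lambda x: 6 * x * (1 - x)
plt.plot(x, f(x), color='darkblue', lw=2)
plt.xlabel('$x$')
plt.ylabel('$f(x)$', rotation=0);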
A random variable X is said to have density f if for every pair a<b,
$$
P(a < X \le b) = \int_a^b f(x)\,dx
$$
This integral is the area between a and b under the density curve. The graph below shows the area corresponding to P(0.6<X≤0.8) for a random variable X that has the density in our example.
# NO CODE
plt.plot(x, f(x), color='darkblue', lw=2)
w = np.arange(0.6, 0.801, 0.01)
plt.fill_between(w, f(w), color='gold')
plt.xlabel('$x$')
plt.ylabel('$f(x)$', rotation=0)
plt.title(r'Gold Area = $P(0.6 < X \leq 0.8)$');
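The gold area can also be checked numerically. The sketch below is not part of the original notebook: it assumes SciPy's `scipy.integrate.quad` for numerical integration, and it redefines the example density under the hypothetical name `density` so that the block is self-contained.

```python
from scipy import integrate

def density(x):
    """Example density f(x) = 6x(1-x) on (0, 1), and 0 elsewhere."""
    return 6 * x * (1 - x) if 0 < x < 1 else 0.0

# quad returns the integral estimate and an estimate of the absolute error
prob, abs_err = integrate.quad(density, 0.6, 0.8)
print(round(prob, 6))

# Sanity check: the total area under the density should be 1
total, _ = integrate.quad(density, 0, 1)
print(round(total, 6))
```

The first value should agree, up to numerical error, with the exact value of the integral, which is 0.248.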
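The cumulative distribution function (cdf) of X is the function F defined by

$$
F(x) = P(X \le x) = \int_{-\infty}^x f(s)\,ds
$$

In terms of the graph of the density, F(x) is all the area to the left of x under the density curve. The graph below shows the area corresponding to F(0.8).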
# NO CODE
plt.plot(x, f(x), color='darkblue', lw=2)
w = np.arange(0, 0.801, 0.01)
plt.fill_between(w, f(w), color='gold')
plt.xlabel('$x$')
plt.ylabel('$f(x)$', rotation=0)
plt.title(r'Gold Area $= P(X \leq 0.8) = F(0.8)$');
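For the density in our example, integrating from 0 to x gives F(x) = 3x^2 − 2x^3 for 0 < x < 1, so

$$
P(X \le 0.8) = F(0.8) = 3 \cdot 0.8^2 - 2 \cdot 0.8^3 = 0.896
$$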
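The calculation of F(0.8) can be reproduced in a few lines. This is only a sketch: the function name `cdf` is hypothetical, and the formula is the antiderivative 3x^2 − 2x^3 of the example density, with F taken to be 0 to the left of the unit interval and 1 to the right of it.

```python
import numpy as np

def cdf(x):
    """CDF of the example density f(x) = 6x(1-x) on (0, 1)."""
    x = np.clip(x, 0, 1)   # F is 0 to the left of 0 and 1 to the right of 1
    return 3 * x**2 - 2 * x**3

# F(0.8), the area to the left of 0.8
print(np.round(cdf(0.8), 6))

# The cdf also accepts arrays, including points outside the unit interval
print(cdf(np.array([-1.0, 0.0, 0.5, 1.0, 2.0])))
```

Clipping the argument is one simple way to extend the formula to the whole real line; an explicit piecewise definition would work equally well.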
As before, the cdf can be used to find probabilities of intervals. For every pair a<b,
$$
P(a < X \le b) = F(b) - F(a)
$$
# NO CODE
plt.plot(x, f(x), color='darkblue', lw=2)
w = np.arange(0, 0.601, 0.01)
plt.fill_between(w, f(w), color='darkblue')
w = np.arange(0.6, 0.801, 0.01)
plt.fill_between(w, f(w), color='gold')
plt.xlabel('$x$')
plt.ylabel('$f(x)$', rotation=0)
plt.title(r'Gold Area $= P(0.6 < X \leq 0.8) = F(0.8) - F(0.6)$');
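As a rough check of the picture, the difference of cdf values can be compared with a direct approximation of the gold area. The sketch below is an illustration, not the text's own code: the names `density`, `cdf`, and `interval_prob` are hypothetical, and the midpoint Riemann sum is just one convenient way to approximate the area.

```python
import numpy as np

def density(x):
    """Example density f(x) = 6x(1-x) on (0, 1), and 0 elsewhere."""
    return np.where((x > 0) & (x < 1), 6 * x * (1 - x), 0.0)

def cdf(x):
    """Closed-form cdf of the example density."""
    x = np.clip(x, 0, 1)
    return 3 * x**2 - 2 * x**3

def interval_prob(a, b):
    """P(a < X <= b) computed as F(b) - F(a)."""
    return cdf(b) - cdf(a)

a, b = 0.6, 0.8
exact = interval_prob(a, b)

# Midpoint Riemann sum approximating the gold area between a and b
n = 1000
width = (b - a) / n
midpoints = a + width * (np.arange(n) + 0.5)
approx = np.sum(density(midpoints)) * width

print(np.round(exact, 6), np.round(approx, 6))
```

The two numbers should agree to several decimal places, since both measure the area under the density curve between 0.6 and 0.8.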