Calculating expectation by plugging into the definition works in simple cases, but often it can be cumbersome or lack insight. The most powerful result for calculating expectation turns out not to be the definition. It looks rather innocuous:
8.4.1 Additivity of Expectation
Let $X$ and $Y$ be two random variables defined on the same probability space. Then
$$
E(X+Y) = E(X) + E(Y)
$$
Before we look more closely at this result, note that we are assuming that all the expectations exist; we will do this throughout this course.
And now note that there are no assumptions about the relation between $X$ and $Y$. They could be dependent or independent. Regardless, the expectation of the sum is the sum of the expectations. This makes the result powerful.
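Additivity follows easily from the definition of $X+Y$ and the definition of expectation on the domain space. First note that the random variable $X+Y$ is the function defined by
$$
(X+Y)(\omega) = X(\omega) + Y(\omega) \text{ for all } \omega \in \Omega
$$
Thus a “value of $X+Y$ weighted by the probability” can be written as
$$
(X+Y)(\omega) \cdot P(\omega) = X(\omega)P(\omega) + Y(\omega)P(\omega)
$$
Sum the two sides over all $\omega \in \Omega$ to prove additivity of expectation.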
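The outcome-by-outcome argument can be checked exactly on a small example. The sketch below assumes a hypothetical outcome space (two rolls of a fair die) and two deliberately dependent variables: `X`, the first roll, and `Y`, the larger of the two rolls. The names `omega`, `prob`, and `expectation` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

# Hypothetical outcome space: two rolls of a fair die, all 36 pairs equally likely.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
prob = Fraction(1, len(omega))

def X(w):
    return w[0]        # the first roll

def Y(w):
    return max(w)      # the larger roll; clearly dependent on X

def expectation(f):
    # Expectation on the domain space: sum of f(w) * P(w) over all outcomes w.
    return sum(f(w) * prob for w in omega)

e_x = expectation(X)
e_y = expectation(Y)
e_sum = expectation(lambda w: X(w) + Y(w))

print(e_x, e_y, e_sum)
print(e_sum == e_x + e_y)
```

Exact fractions are used instead of floats so that the equality check is not affected by rounding.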
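By induction, additivity extends to any finite number of random variables. If $X_1, X_2, \ldots, X_n$ are random variables defined on the same probability space, then
$$
E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)
$$
regardless of the dependence structure of $X_1, X_2, \ldots, X_n$.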
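The induction step can be sketched as follows. Here $X_1, \ldots, X_n$ are the random variables, and $S_{n-1} = X_1 + \cdots + X_{n-1}$ is a label introduced only for this sketch. Treat $S_{n-1}$ as a single random variable and apply the two-variable result to $S_{n-1}$ and $X_n$:

$$
E(X_1 + \cdots + X_n) = E(S_{n-1} + X_n) = E(S_{n-1}) + E(X_n) = E(X_1) + \cdots + E(X_{n-1}) + E(X_n)
$$

The last equality uses the induction hypothesis for $n-1$ variables. No independence is needed at any step.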
If you are trying to find an expectation, then the way to use additivity is to write your random variable as a sum of simpler variables whose expectations you know or can calculate easily.
8.4.2 $E(X^2)$ for a Poisson Variable
Let $X$ have the Poisson $(\mu)$ distribution. In earlier sections we showed that $E(X) = \mu$ and $E(X(X-1)) = \mu^2$.
Now $X^2 = X(X-1) + X$. The random variables $X(X-1)$ and $X$ are both functions of $X$, so they are not independent of each other. But additivity of expectation doesn’t require independence, so we can use it to see that
$$
E(X^2) = E(X(X-1)) + E(X) = \mu^2 + \mu
$$
We will use this fact later when we study the variability of $X$.
It is worth noting that it is not easy to calculate $E(X^2)$ directly, since
$$
E(X^2) = \sum_{k=0}^\infty k^2 e^{-\mu} \frac{\mu^k}{k!}
$$
is not an easy sum to simplify.
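Although the sum is hard to simplify by hand, it is easy to approximate numerically. The sketch below compares a partial sum of $k^2 P(X = k)$ with the closed form $\mu^2 + \mu$ for a Poisson variable with parameter $\mu$. The value `mu = 3.5`, the cutoff of 200 terms, and the function name are arbitrary choices made for illustration.

```python
import math

def poisson_second_moment(mu, terms=200):
    # Partial sum of k^2 * P(X = k) for k = 1, ..., terms - 1.
    # The pmf is updated iteratively to avoid computing large factorials.
    pmf = math.exp(-mu)       # P(X = 0); the k = 0 term contributes nothing
    total = 0.0
    for k in range(1, terms):
        pmf *= mu / k         # P(X = k) obtained from P(X = k - 1)
        total += k * k * pmf
    return total

mu = 3.5
approx = poisson_second_moment(mu)
exact = mu**2 + mu            # value obtained from additivity
print(approx, exact)
```

The two printed numbers should agree up to floating-point error, since the neglected tail of the series is negligible for this `mu`.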
8.4.3 Sample Sum
Let $X_1, X_2, \ldots, X_n$ be a sample drawn at random from a numerical population that has mean $\mu$, and let the sample sum be
$$
S_n = X_1 + X_2 + \cdots + X_n
$$
Then, regardless of whether the sample was drawn with or without replacement, each $X_i$ has the same distribution as the population. This is clearly true if the sampling is with replacement, and it is true by symmetry if the sampling is without replacement as we saw in an earlier chapter.
So, regardless of whether the sample is drawn with or without replacement, $E(X_i) = \mu$ for each $i$, and hence
$$
E(S_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = n\mu
$$
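The claim that the expected sample sum is $n$ times the population mean under both sampling schemes can be verified by brute force on a tiny population. The sketch below uses a made-up population and enumerates every equally likely ordered sample; `population`, `n`, and `expected_sum` are illustrative names.

```python
from fractions import Fraction
from itertools import permutations, product

population = [2, 3, 3, 7, 10]      # hypothetical population
n = 3                               # sample size
mu = Fraction(sum(population), len(population))

def expected_sum(index_samples):
    # Every listed sample is equally likely, so the expectation is a plain average.
    sums = [sum(population[i] for i in s) for s in index_samples]
    return Fraction(sum(sums), len(sums))

# Sample by index so that repeated values count as distinct individuals.
idx = range(len(population))
without_repl = expected_sum(permutations(idx, n))   # draws without replacement
with_repl = expected_sum(product(idx, repeat=n))    # draws with replacement

print(without_repl, with_repl, n * mu)
```

Both enumerations give the same expectation even though the draws are dependent in the first case and independent in the second.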
We can use this to estimate a population mean based on a sample mean.
8.4.4 Unbiased Estimator
Suppose a random variable $X$ is being used to estimate a fixed numerical parameter $\theta$. Then $X$ is called an estimator of $\theta$.
The bias of $X$ is the difference $E(X) - \theta$. The bias measures the amount by which the estimator exceeds the parameter, on average. The bias can be negative if the estimator tends to underestimate the parameter.
If the bias of an estimator is 0 then the estimator is called unbiased. So $X$ is an unbiased estimator of $\theta$ if $E(X) = \theta$.
If an estimator is unbiased, and you use it to generate estimates repeatedly and independently, then in the long run the average of all the estimates is equal to the parameter being estimated. On average, the unbiased estimator is neither higher nor lower than the parameter. That’s usually considered a good quality in an estimator.
In practical terms, if a data scientist wants to estimate an unknown parameter $\theta$ based on a random sample $X_1, X_2, \ldots, X_n$, the data scientist has to come up with a statistic to use as the estimator.
Recall from Data 8 that a statistic is a number computed from the sample. In other words, a statistic is a numerical function of $X_1, X_2, \ldots, X_n$.
Constructing an unbiased estimator of a parameter $\theta$ therefore amounts to finding a statistic $T = g(X_1, X_2, \ldots, X_n)$ for a function $g$ such that $E(T) = \theta$.
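8.4.5 Unbiased Estimators of a Population Mean
As in the sample sum example above, let $S_n$ be the sum of a sample $X_1, X_2, \ldots, X_n$ drawn at random from a population that has mean $\mu$. The standard statistical notation for the average of $X_1, X_2, \ldots, X_n$ is $\bar{X}_n$. So
$$
\bar{X}_n = \frac{S_n}{n}
$$
Then, regardless of whether the draws were made with replacement or without,
$$
E(\bar{X}_n) = \frac{E(S_n)}{n} = \frac{n\mu}{n} = \mu
$$
Thus the sample mean is an unbiased estimator of the population mean.
It is worth noting that $X_1$ is also an unbiased estimator of $\mu$, since $E(X_1) = \mu$. So is $X_j$ for any $j$, and so is the average of any two sampled elements, for example $(X_1 + X_2)/2$, or any linear combination of the sample if the coefficients add up to 1.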
But it seems clear that using the sample mean as the estimator is better than using just one sampled element, even though both are unbiased. This is true, and is related to how variable the estimators are. We will address this later in the course.
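The statement about linear combinations follows in one line from additivity together with the fact that constants factor out of expectations. As a sketch, let $X_1, \ldots, X_n$ be the sample from a population with mean $\mu$, and let $c_1, \ldots, c_n$ (notation introduced here) be constants with $c_1 + \cdots + c_n = 1$. Then

$$
E\Big(\sum_{i=1}^n c_i X_i\Big) = \sum_{i=1}^n c_i E(X_i) = \mu \sum_{i=1}^n c_i = \mu
$$

The sample mean is the special case $c_i = 1/n$ for every $i$, and a single sampled element is the case where one coefficient is 1 and the rest are 0.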
8.4.6 First Unbiased Estimator of a Maximum Possible Value
Suppose we have a sample $X_1, X_2, \ldots, X_n$ drawn at random from $1, 2, \ldots, N$ for some fixed $N$, and we are trying to estimate $N$.
How can we use the sample to construct an unbiased estimator of $N$? By definition, such an estimator must be a function of the sample and its expectation must be $N$.
In other words, we have to construct a statistic that has expectation $N$.
Each $X_i$ has the uniform distribution on $1, 2, \ldots, N$. This is true for sampling with replacement as well as for simple random sampling, by symmetry.
The expectation of each of the uniform variables is $(N+1)/2$, as we have seen earlier. So if $\bar{X}_n$ is the sample mean, then
$$
E(\bar{X}_n) = \frac{N+1}{2}
$$
Clearly, $\bar{X}_n$ is not an unbiased estimator of $N$. That’s not surprising because $N$ is the maximum possible value of each observation and $\bar{X}_n$ should be somewhere in the middle of all the possible values.
But because $E(\bar{X}_n)$ is a linear function of $N$, we can figure out how to create an unbiased estimator of $N$.
Remember that our job is to create a function of the sample in such a way that the expectation of that function is .
Start by inverting the linear function, that is, by isolating $N$ in the equation above.
$$
2E(\bar{X}_n) - 1 = N
$$
This tells us what we have to do to the sample to get an unbiased estimator of $N$.
We should just use the statistic $T_1 = 2\bar{X}_n - 1$ as the estimator. It is unbiased because $E(T_1) = 2E(\bar{X}_n) - 1 = N$ by the calculation above.
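The unbiasedness of "twice the sample mean, minus 1" as an estimator of the largest value $N$ can be confirmed exactly for small cases by enumerating every simple random sample from $1, 2, \ldots, N$. In the sketch below, the function name `expected_T1` and the values `N = 10`, `n = 3` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def expected_T1(N, n):
    # Simple random sampling: every n-element subset of 1..N is equally likely.
    samples = list(combinations(range(1, N + 1), n))
    # Value of the estimator 2 * (sample mean) - 1 on each sample.
    values = [2 * Fraction(sum(s), n) - 1 for s in samples]
    return sum(values) / len(values)

print(expected_T1(10, 3))   # compare with N = 10
```

Individual estimates can land well above or below $N$; it is only their average over all possible samples that equals $N$.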
8.4.7 Second Unbiased Estimator of the Maximum Possible Value
The calculation above stems from a problem the Allied forces faced in World War II. Germany had a seemingly never-ending fleet of Panzer tanks, and the Allies needed to estimate how many they had. They decided to base their estimates on the serial numbers of the tanks that they saw.
Here is a picture of one from Wikipedia.

Notice the serial number on the top left. When tanks were disabled or destroyed, it was discovered that their parts had serial numbers too. The ones from the gear boxes proved very useful.
The idea was to model the observed serial numbers as random draws from $1, 2, \ldots, N$ and then estimate $N$. This is of course a very simplified model of reality. But estimates based on even such simple probabilistic models proved to be quite a bit more accurate than those based on the intelligence gathered by the Allies. For example, in August 1942, intelligence estimates were that Germany was producing 1,550 tanks per month. The prediction based on the probability model was 327 per month. After the war, German records showed that the actual production rate was 342 per month.
The model was that the draws were made at random without replacement from the integers 1 through $N$.
In the example above, we constructed the random variable $T_1$ to be an unbiased estimator of $N$ under this model.
The Allied statisticians instead started with $M$, the sample maximum:
$$
M = \max\{X_1, X_2, \ldots, X_n\}
$$
The sample maximum $M$ is a biased estimator of $N$, because we know that its value is always less than or equal to $N$. Its average value therefore will be somewhat less than $N$.
To correct for this, the Allied statisticians imagined a row of $N$ spots for the serial numbers 1 through $N$, with marks at the spots corresponding to the observed serial numbers. The visualization below shows an outcome for one choice of $N$, with sample size $n = 3$.

There are $N$ spots in all.
From these, we take a simple random sample of size $n = 3$. Those are the gold spots.
The remaining $N - n$ spots are colored blue.
The $n = 3$ sampled spots create $n + 1 = 4$ blue “gaps” between sampled values: one before the leftmost gold spot, two between successive gold spots, and one after the rightmost gold spot that is at position $M$.
A key observation is that because of the symmetry of simple random sampling, the lengths of all four gaps have the same distribution.
But of course we don’t get to see all the gaps. In the sample, we can see all but the last gap, as in the figure below. The red question mark reminds you that the gap to the right of $M$ is invisible to us.

If we could see the gap to the right of $M$, we would see $N$. But we can’t. So we can try to do the next best thing, which is to augment $M$ by the estimated size of that gap.
Since we can see all of the spots and their colors up to and including $M$, we can see $n$ out of the $n+1$ gaps. The lengths of the gaps all have the same distribution by symmetry, so we can estimate the length of a single gap by the average length of all the gaps that we can see.
We can see $M$ spots, of which $n$ are the sampled values. So the total length of all $n$ visible gaps is $M - n$. Therefore
$$
\text{estimated length of one gap} = \frac{M-n}{n}
$$
So the Allied statisticians decided to improve upon $M$ by using the augmented maximum as their estimator:
$$
T_2 = M + \frac{M-n}{n}
$$
By algebra, this estimator can be rewritten as
$$
T_2 = M \cdot \frac{n+1}{n} - 1
$$
Is $T_2$ an unbiased estimator of $N$? To answer this, we have to find its expectation. Since $T_2$ is a linear function of $M$, we’ll find the expectation of $M$ first.
Here once again is the visualization of what’s going on.

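Let $G$ be the length of the last gap. Then $M = N - G$.
There are $n+1$ gaps, made up of the $N-n$ unsampled values. Since they all have the same expected length,
$$
E(G) = \frac{N-n}{n+1}
$$
So
$$
E(M) = N - \frac{N-n}{n+1} = (N+1)\frac{n}{n+1}
$$
Recall that the Allied statisticians’ estimate of $N$ is
$$
T_2 = M \cdot \frac{n+1}{n} - 1
$$
Now
$$
E(T_2) = E(M) \cdot \frac{n+1}{n} - 1 = (N+1)\frac{n}{n+1} \cdot \frac{n+1}{n} - 1 = N
$$
Thus the augmented maximum $T_2$ is an unbiased estimator of $N$.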
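The gap argument can be double-checked by exact enumeration. The sketch below lists every simple random sample of size $n$ from $1, 2, \ldots, N$, then averages the sample maximum $M$ and the augmented maximum $M(n+1)/n - 1$ over all of them. The function name and the values `N = 20`, `n = 3` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def expected_max_and_augmented(N, n):
    # Every n-element subset of 1..N is an equally likely simple random sample.
    samples = list(combinations(range(1, N + 1), n))
    count = len(samples)
    # Average of the sample maximum M over all samples.
    e_max = Fraction(sum(max(s) for s in samples), count)
    # Average of the augmented maximum M * (n + 1) / n - 1, computed sample by sample.
    e_aug = sum(Fraction(max(s) * (n + 1), n) - 1 for s in samples) / count
    return e_max, e_aug

e_max, e_aug = expected_max_and_augmented(20, 3)
print(e_max, e_aug)
```

With these values the first number should equal $(N+1)\,n/(n+1) = 63/4$, which is below $N$, while the second should equal $N = 20$ exactly.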
8.4.8 Which Estimator to Use?
The Allied statisticians thus had two unbiased estimators of $N$ from which to choose. They went with $T_2$ instead of $T_1$ because $T_2$ has less variability.
We will quantify this later in the course. For now, here is a simulation of distributions of the two estimators in the case $N = 300$ and $n = 30$. The simulation is based on 5000 repetitions of drawing a simple random sample of size 30 from the integers 1 through 300.
compare_T1_T2(300, 30, 5000)
You can see why $T_2$ is a better estimator than $T_1$.
Both are unbiased. So both the empirical histograms are balanced at around 300, the true value of $N$.
The empirical distribution of $T_2$ is clustered much closer to the true value 300 than the empirical distribution of $T_1$.
For a recap, take another look at the accuracy table of the Allied statisticians’ estimator $T_2$. Not bad for an estimator based on a model that assumes nothing more complicated than simple random sampling!
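The definition of `compare_T1_T2` is not shown here, so the following is only a rough sketch of a simulation in the same spirit, not the function used above. The helper `simulate_T1_T2` and its `seed` argument are hypothetical. It draws repeated simple random samples and summarizes the two estimators (twice the sample mean minus 1, and the augmented maximum) numerically instead of plotting histograms.

```python
import random
import statistics

def simulate_T1_T2(N, n, repetitions, seed=0):
    # Hypothetical helper: returns simulated values of both estimators.
    rng = random.Random(seed)
    t1_values, t2_values = [], []
    for _ in range(repetitions):
        # Simple random sample (without replacement) from 1, 2, ..., N.
        sample = rng.sample(range(1, N + 1), n)
        t1_values.append(2 * sum(sample) / n - 1)        # based on the sample mean
        t2_values.append(max(sample) * (n + 1) / n - 1)  # augmented maximum
    return t1_values, t2_values

t1, t2 = simulate_T1_T2(300, 30, 5000)
print("T1 mean and SD:", statistics.mean(t1), statistics.stdev(t1))
print("T2 mean and SD:", statistics.mean(t2), statistics.stdev(t2))
```

Both empirical means should be close to 300, and the standard deviation of the second estimator should be noticeably smaller than that of the first.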