Distribution of Sample Means
Sampling with Replacement
- Example 1: The population from which samples are selected is {1,2,3,4,5,6}. This population has a mean of 3.5 and a standard deviation of 1.70783 (computed by dividing by the population size, N = 6). The next display shows a histogram of the population.
Histogram of Population {1,2,3,4,5,6}
A computer was programmed to take all samples of size 4 with replacement from this population. Counting the order in which values are drawn, there are 6^4 = 1296 such samples. A few of them are {1,1,1,1}, {1,1,1,2}, {1,1,1,3}, {1,1,1,4}, ..., {6,6,6,3}, {6,6,6,4}, {6,6,6,5}, and {6,6,6,6}.
For each of these samples a statistic, the sample mean (i.e., the average of the numbers in the sample), was computed. The sample means for the samples shown above are 1, 1.25, 1.5, 1.75, ..., 5.25, 5.5, 5.75, and 6. A histogram of all 1296 sample means is shown next.
Histogram of All Sample Means for Samples of Size 4 with Replacement Taken from Population {1,2,3,4,5,6}
The mean of these 1296 sample means is 3.5 and the standard deviation of these 1296 sample means is 0.853913.
From the histogram of sample means it appears that the sample means for
samples of size 4 taken with replacement from the population {1,2,3,4,5,6} are
normally distributed, at least approximately.
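The enumeration in this example can be reproduced with a short Python sketch. It assumes samples are ordered tuples (so {1,1,1,2} and {1,1,2,1} are counted separately, which is how the total reaches 1296) and that standard deviations divide by N, as the figures above appear to; `statistics.pstdev` computes exactly that.

```python
from itertools import product
from statistics import fmean, pstdev

population = [1, 2, 3, 4, 5, 6]
n = 4

# Every ordered sample of size n drawn with replacement: 6**4 = 1296 tuples.
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

print(len(samples))                      # 1296
print(round(fmean(sample_means), 6))     # 3.5
print(round(pstdev(sample_means), 6))    # 0.853913
print(round(pstdev(population), 5))      # 1.70783 (divisor N)
```

The last two lines show the relationship summarized later in this section: 0.853913 is 1.70783 divided by Sqrt(4).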
- Example 2: The population from which samples are selected is {1,2,3,3,3,10}. The patterns observed in Example 1 might hold only because that population has a uniform, symmetric shape. This example uses a population that is neither uniform nor symmetric.
Histogram of Population {1,2,3,3,3,10}
This population has a mean of 3.66667 and a standard deviation of 2.92499. A computer then found all 1296 samples of size 4 taken with replacement from this population and calculated the mean of each sample. The mean of these 1296 sample means is 3.66667 and their standard deviation is 1.46249. A histogram of these sample means is shown next.
Histogram of All 1296 Sample Means for Samples of Size 4 Taken with Replacement from Population {1,2,3,3,3,10}
This histogram resembles a normal curve, but it has some gaps and is skewed to the right. With a larger sample size the histogram would look more like a normal curve. This is suggested by the following histogram, which shows 400 sample means for samples of size 36 taken with replacement from the same population. There are 6^36 (about 1.03 × 10^28) possible samples of size 36, far too many to compute, so only 400 samples were drawn and the mean of each was computed.
Histogram of 400 Sample Means for Samples of Size 36 Taken with Replacement from Population {1,2,3,3,3,10}
The mean of the 400 sample means is 3.65278 and their standard deviation is 0.498121. The mean of these sample means is very close to the population mean, 3.66667, and their standard deviation is close to 2.92499/Sqrt(36) = 2.92499/6 = 0.487498.
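A Python sketch of Example 2, under the same assumptions as before (ordered samples, standard deviations with divisor N). The first part enumerates all 1296 samples of size 4; the second draws 400 random samples of size 36 with replacement. The seed is arbitrary, so the simulated figures will not match 3.65278 and 0.498121 exactly, but they should land near the same targets.

```python
import random
from itertools import product
from statistics import fmean, pstdev

population = [1, 2, 3, 3, 3, 10]
sigma = pstdev(population)  # population SD (divisor N), about 2.92499

# Exact: all 6**4 = 1296 ordered samples of size 4 with replacement.
exact_means = [fmean(s) for s in product(population, repeat=4)]
print(round(fmean(exact_means), 5), round(pstdev(exact_means), 5))  # 3.66667 1.46249

# Simulation: 400 samples of size 36 with replacement (6**36 is too many to list).
rng = random.Random(2024)  # arbitrary seed, chosen only for reproducibility
sim_means = [fmean(rng.choices(population, k=36)) for _ in range(400)]
print(round(fmean(sim_means), 3), round(pstdev(sim_means), 3))

# Theoretical standard deviation of the sample means for n = 36.
print(round(sigma / 36 ** 0.5, 6))  # 0.487498
```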
These examples suggest the following about the sampling distribution of sample means, that is, the collection of means of all random samples of size n taken from a population:
- In sampling with replacement, the mean of all sample means equals the population mean: $\mu_{\bar{x}} = \mu$.
- In sampling with replacement, the standard deviation of all sample means equals the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
- Whatever the shape of the population distribution, the distribution of sample means is approximately normal, and the approximation improves as the sample size, n, increases.
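The third point can be checked numerically without drawing any random samples. The sketch below, which assumes sampling with replacement, builds the exact distribution of the sample mean for the skewed population {1,2,3,3,3,10} by repeatedly adding one more draw, then reports the skewness of that distribution (a normal distribution has skewness 0). For independent draws the skewness of the mean equals the population skewness divided by Sqrt(n), so the printed values shrink toward 0 as n grows; vanishing skewness is one symptom of the approach to normality, not a full proof of it. The function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

population = [1, 2, 3, 3, 3, 10]

def sum_counts(pop, n):
    """Number of ordered with-replacement samples of size n giving each possible sum."""
    counts = {0: 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for s, c in counts.items():
            for x in pop:          # extend every partial sample by one more draw
                nxt[s + x] += c
        counts = dict(nxt)
    return counts

def skewness_of_mean(pop, n):
    """Exact skewness of the sample mean (dividing the sum by n leaves skewness unchanged)."""
    counts = sum_counts(pop, n)
    total = len(pop) ** n          # 6**n ordered samples in all
    mu = Fraction(sum(s * c for s, c in counts.items()), total)
    m2 = sum((s - mu) ** 2 * c for s, c in counts.items()) / total
    m3 = sum((s - mu) ** 3 * c for s, c in counts.items()) / total
    return float(m3) / float(m2) ** 1.5

for n in (1, 4, 16, 36):
    print(n, round(skewness_of_mean(population, n), 4))
```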
Sampling without Replacement
- Example 1: The population from which samples are selected is
{1,2,3,4,5,6}.
A computer selected all samples of size 4 without replacement from this population. Counting order, there are 6 × 5 × 4 × 3 = 360 such samples. The mean of each sample was then computed. The mean of all of these sample means is 3.5, and their standard deviation is 0.540062. So the mean of the sample means still equals the mean of the population from which the samples are selected. However, the standard deviation does not follow the rule stated above. Dividing the population standard deviation, 1.70783 (found in Example 1 of the section on sampling with replacement), by the square root of the sample size, Sqrt(4) = 2, gives 0.853915, which is not the standard deviation of the sample means, 0.540062.
In sampling without replacement, the formula for the standard deviation of all sample means for samples of size n must be modified by including a finite population correction. The formula becomes:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
where N is the population size (N = 6 in this example) and n is the sample size (n = 4 here). The finite population correction is the second square root in this formula. Using this formula gives the correct standard deviation for the population of 360 sample means: $\frac{1.70783}{2}\sqrt{\frac{2}{5}} \approx 0.54006$, in agreement with 0.540062.
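The comparison between the two formulas can be checked by enumeration. This sketch assumes ordered samples, which `itertools.permutations` produces (6 × 5 × 4 × 3 = 360 of them), and compares the standard deviation of the 360 means with the formula that includes the finite population correction.

```python
from itertools import permutations
from math import sqrt
from statistics import fmean, pstdev

population = [1, 2, 3, 4, 5, 6]
N, n = len(population), 4

# All ordered samples of size 4 without replacement: 6*5*4*3 = 360 tuples.
means = [fmean(s) for s in permutations(population, n)]

sigma = pstdev(population)          # divisor N, about 1.70783
naive = sigma / sqrt(n)             # with-replacement formula, about 0.853913
fpc = sqrt((N - n) / (N - 1))       # finite population correction
print(len(means), round(pstdev(means), 6), round(naive * fpc, 6))  # 360 0.540062 0.540062
```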
Most of the time sampling is done without replacement. However, when the sample size n is at most 0.05 times the population size N, the finite population correction is close to 1 and can be dropped. For example, if N = 1000, then 0.05N = 0.05 × 1000 = 50, so if the sample size is 50 or less, the finite population correction can be dropped.
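A minimal sketch of the rule of thumb, with hypothetical helper names `fpc` and `can_drop_fpc`. It also shows why dropping the correction is harmless at the cutoff: for N = 1000 and n = 50 the factor is about 0.975, so ignoring it changes the standard deviation by roughly 2.5%.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor Sqrt((N - n)/(N - 1))."""
    return sqrt((N - n) / (N - 1))

def can_drop_fpc(N, n, threshold=0.05):
    """Rule of thumb from the text: drop the correction when n <= 0.05 * N."""
    return n <= threshold * N

print(round(fpc(1000, 50), 4), can_drop_fpc(1000, 50))   # 0.9752 True
print(round(fpc(6, 4), 4), can_drop_fpc(6, 4))           # 0.6325 False
```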
The histogram of the 360 sample means is shown next:
Histogram of All 360 Sample Means for Samples of Size 4 Taken without Replacement from Population {1,2,3,4,5,6}
The distribution of sample means is still approximately
normal.
- Example 2: The population from which samples are selected is {1,2,3,3,3,10}.
As shown in Example 2 under Sampling with Replacement, this population has a mean of 3.66667 and a standard deviation of 2.92499. A computer found all 360 samples of size 4 taken without replacement from this population (treating the three 3s as different members of the population) and calculated the mean of each sample. The mean of these 360 sample means is 3.66667 and their standard deviation is 0.924962. This standard deviation agrees with the formula that includes the finite population correction:
$$\sigma_{\bar{x}} = \frac{2.92499}{\sqrt{4}}\sqrt{\frac{6-4}{6-1}} = \frac{2.92499}{2}\sqrt{\frac{2}{5}} \approx 0.924962.$$
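The same enumeration works for this population. The sketch relies on `itertools.permutations` treating elements by position, so the three 3s count as different members and there are still 360 ordered samples; that counting convention is an assumption consistent with the total quoted above.

```python
from itertools import permutations
from math import sqrt
from statistics import fmean, pstdev

population = [1, 2, 3, 3, 3, 10]
N, n = len(population), 4

# permutations() works by position, so duplicate values are distinct members:
# 6*5*4*3 = 360 ordered samples of size 4 without replacement.
means = [fmean(s) for s in permutations(population, n)]

sigma = pstdev(population)                              # about 2.92499
predicted = sigma / sqrt(n) * sqrt((N - n) / (N - 1))   # with finite population correction
print(len(means), round(fmean(means), 5), round(pstdev(means), 6), round(predicted, 6))
# 360 3.66667 0.924962 0.924962
```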
A histogram of these 360 sample means is shown next.
Histogram of All 360 Sample Means for Samples of Size 4 Taken without Replacement from Population {1,2,3,3,3,10}
This distribution is clearly not normal, but it can be shown that when larger samples are taken without replacement from a population, the distribution of sample means more closely approximates a normal distribution. This is illustrated by the next two graphs. The first graph shows a histogram of a population of size 300 that clearly appears to be non-normal.
Four hundred samples of size 40 were taken from this
population (population mean = 1.21986 and population standard deviation =
1.2654), and the mean of each sample was calculated. A histogram of
these 400 sample means is shown next.
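The mean of these sample means is 1.21703, near the population mean 1.21986, and their standard deviation is 0.196691. Without the correction, the formula gives 1.2654/Sqrt(40) ≈ 0.2001. Because the sample size, 40, exceeds 0.05 × 300 = 15, the finite population correction should not be dropped here: the correction factor is Sqrt((300-40)/(300-1)) ≈ 0.93, which gives a predicted standard deviation of about 0.2001 × 0.9325 ≈ 0.187.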
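The population of size 300 used above is not listed, so the following sketch imitates the experiment with a hypothetical stand-in: 300 values drawn from an exponential distribution, which is strongly right-skewed. It then takes 400 samples of size 40 without replacement (`random.Random.sample`) and compares the standard deviation of the sample means with the formula both without and with the finite population correction. Its numbers are illustrative only and will not reproduce 1.21703 or 0.196691.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(7)  # arbitrary seed for reproducibility
# Hypothetical skewed population of size 300 (NOT the population described above).
population = [rng.expovariate(1.0) for _ in range(300)]
N, n, reps = len(population), 40, 400

# 400 samples of size 40, each drawn without replacement.
means = [fmean(rng.sample(population, n)) for _ in range(reps)]

sigma = pstdev(population)            # divisor N
fpc = sqrt((N - n) / (N - 1))         # about 0.93 for N = 300, n = 40
print("population mean:", round(fmean(population), 4))
print("mean of sample means:", round(fmean(means), 4))
print("SD of sample means:", round(pstdev(means), 4))
print("sigma/sqrt(n):", round(sigma / sqrt(n), 4))
print("sigma/sqrt(n) * FPC:", round(sigma / sqrt(n) * fpc, 4))
```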