
Copyright 2010 Pearson Education, Inc.

Chapter 18
Sampling Distribution Models
Take out a piece of paper and be ready to write
down answers to these questions as you watch the
video.
1. What is the difference between parameters and statistics?
2. Does statistical process control inspect all the items produced after they are
finished?
3. The inspector samples five circuit boards at regular intervals and finds the
mean solder quality score $\bar{x}$ for these five boards. Do we expect $\bar{x}$ to be exactly
100 if the soldering process is functioning properly?
4. If the quality of individual boards varies according to a normal distribution with
mean $\mu = 100$ and standard deviation $\sigma = 4$, what will be the distribution of the
sample averages, $\bar{x}$? (Recall the sample size is n = 5.)
5. In general, is the mean of several observations more or less variable than
single observations from a population? Explain.
6. The distribution of call lengths to a call center is strongly skewed. What does
the Central Limit Theorem say about the distribution of the mean call length $\bar{x}$
from large samples of calls?
http://www.learner.org/courses/againstallodds/unitpages/unit22.html

Answers
1. Parameters describe an entire population and are generally
unknown. Statistics are computed from samples.
2. No, it might be too expensive to inspect them all. In addition, it is
too costly to wait until the end to examine the finished product. If
somewhere in the process things are out of control, it doesn't make
sense to put in additional money to finish a defective product.
3. No, there will be variability in the mean scores. Sometimes the
mean score will be above 100 and sometimes below 100.
4. The distribution of $\bar{x}$ will be normal with mean 100 and standard
deviation $4/\sqrt{5} \approx 1.79$.
5. The sample mean is less variable than individual observations.
That's because when averaging, high observations will balance out
low observations, which makes the mean less variable.
6. The sampling distribution of the sample mean is approximately
normally distributed if the sample size is sufficiently large.
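Answer 4 can be checked numerically. The following is a minimal sketch, not part of the original slides: it uses only Python's standard library, and the seed, the number of repeated samples (20,000), and the variable names are arbitrary assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable (an assumption)

mu, sigma, n = 100, 4, 5               # values stated in question 4
theoretical_sd = sigma / math.sqrt(n)  # 4 / sqrt(5)

# Draw many samples of n = 5 boards and record each sample mean.
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(20_000)
]
simulated_sd = statistics.stdev(sample_means)

print(f"theoretical SD of the sample mean: {theoretical_sd:.3f}")
print(f"simulated SD of the sample mean:   {simulated_sd:.3f}")
```

The two printed values should be close, which also illustrates answer 5: the sample means vary less than the individual scores (SD of 4).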
Aim
How do we solve problems involving sampling
distribution models?
Read Chapter 18.
Homework due tomorrow!

The Central Limit Theorem for Sample
Proportions
Rather than showing real repeated samples,
imagine what would happen if we were to actually
draw many samples.
Now imagine what would happen if we looked at
the sample proportions for these samples.
The histogram we'd get if we could see all the
proportions from all possible samples is called
the sampling distribution of the proportions.
What would the histogram of all the sample
proportions look like?
Modeling the Distribution of
Sample Proportions (cont.)
We would expect the histogram of the sample
proportions to center at the true proportion, p, in
the population.
As far as the shape of the histogram goes, we
can simulate a bunch of random samples that we
didn't really draw.
It turns out that the histogram is unimodal,
symmetric, and centered at p.
More specifically, it's an amazing and fortunate
fact that a Normal model is just the right one for
the histogram of sample proportions.

Modeling the Distribution of
Sample Proportions (cont.)
Modeling how sample proportions vary from
sample to sample is one of the most powerful
ideas we'll see in this course.
A sampling distribution model for how a
sample proportion varies from sample to sample
allows us to quantify that variation and how likely
it is that we'd observe a sample proportion in any
particular interval.
To use a Normal model, we need to specify its
mean and standard deviation. We'll put $\mu$, the
mean of the Normal, at p.
Modeling the Distribution of
Sample Proportions (cont.)
When working with proportions, knowing the
mean automatically gives us the standard
deviation as well. The standard deviation we will
use is
$$\sqrt{\frac{pq}{n}}$$
where $q = 1 - p$.
So, the distribution of the sample proportions is
modeled with a probability model that is
$$N\left(p, \sqrt{\frac{pq}{n}}\right)$$
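As a small illustration of the formula, the sketch below computes the standard deviation of the sampling distribution of a proportion. The function name and the example values (p = 0.5, n = 100) are hypothetical and not taken from the slides.

```python
import math


def sd_of_sample_proportion(p: float, n: int) -> float:
    """Return sqrt(p * q / n), where q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)


# Hypothetical example: true proportion 0.5, samples of size 100.
example_sd = sd_of_sample_proportion(0.5, 100)
print(round(example_sd, 4))
```

Because the standard deviation depends only on p and n, specifying the mean p of the Normal model fixes its spread as well.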
Modeling the Distribution of
Sample Proportions (cont.)
A picture of what we just discussed is as follows:

The Central Limit Theorem for Sample
Proportions (cont.)
Because we have a Normal model, for example,
we know that about 95% of Normally distributed values
are within two standard deviations of the mean.
So we should not be surprised if 95% of various
polls gave results that were near the mean but
varied above and below that by no more than two
standard deviations.
This is what we mean by sampling error. It's not
really an error at all, but just variability you'd
expect to see from one sample to another. A
better term would be sampling variability.
How Good Is the Normal Model?
The Normal model becomes a better model
for the distribution of sample proportions as the
sample size gets bigger.
Just how big of a sample do we need? This will
soon be revealed...
Assumptions and Conditions
Most models are useful only when specific
assumptions are true.
There are two assumptions in the case of the
model for the distribution of sample proportions:
1. The Independence Assumption: The sampled
values must be independent of each other.
2. The Sample Size Assumption: The sample
size, n, must be large enough.
Assumptions and Conditions (cont.)
Assumptions are hard, often impossible, to
check. That's why we assume them.
Still, we need to check whether the assumptions
are reasonable by checking conditions that
provide information about the assumptions.
The corresponding conditions to check before
using the Normal to model the distribution of
sample proportions are the Randomization
Condition, the 10% Condition and the
Success/Failure Condition.
Assumptions and Conditions (cont.)
1. Randomization Condition: The sample should
be a simple random sample of the population.
2. 10% Condition: The sample size, n, must be no
larger than 10% of the population.
3. Success/Failure Condition: The sample size
has to be big enough so that both np (the expected number of
successes) and nq (the expected number of failures) are at
least 10.
So, we need a large enough sample that is not
too large.
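The two numeric conditions can be checked mechanically; the Randomization Condition cannot, because it depends on how the data were collected. The sketch below is illustrative only: the function name, the dictionary keys, and the population size of 10,000 are assumptions, not values from the slides.

```python
def check_proportion_conditions(n: int, p: float, population_size: int) -> dict:
    """Check the 10% Condition and the Success/Failure Condition.

    Randomization is not checked here; it has to be judged from
    how the sample was actually drawn.
    """
    q = 1 - p
    return {
        "ten_percent": n <= 0.10 * population_size,
        "success_failure": n * p >= 10 and n * q >= 10,
    }


# Hypothetical population of 10,000 individuals.
large_enough = check_proportion_conditions(n=200, p=0.22, population_size=10_000)
too_small = check_proportion_conditions(n=50, p=0.10, population_size=10_000)
print(large_enough)
print(too_small)
```

In the second call np is only 5, so the Success/Failure Condition fails even though the sample is well under 10% of the population.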
A Sampling Distribution Model
for a Proportion
A proportion is no longer just a computation from
a set of data.
It is now a random quantity that has a
probability distribution.
This distribution is called the sampling
distribution model for proportions.
Even though we depend on sampling distribution
models, we never actually get to see them.
We never actually take repeated samples from
the same population and make a histogram. We
only imagine or simulate them.
A Sampling Distribution Model
for a Proportion (cont.)
Still, sampling distribution models are important
because
they act as a bridge from the real world of data
to the imaginary world of the statistic and
enable us to say something about the
population when all we have is data from the
real world.
The Sampling Distribution Model
for a Proportion (cont.)
Provided that the sampled values are
independent and the sample size is large
enough, the sampling distribution of $\hat{p}$ is
modeled by a Normal model with
Mean: $\mu(\hat{p}) = p$
Standard deviation: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$
Independent Practice
The Centers for Disease Control and Prevention reports that
22% of 18-year-old women in the United States have a
body mass index of 25 or more, a value associated with
increased health risk. As part of a routine health check, a
college requires students to be measured and weighed.
This year, the department tries out a self-reporting system
where 200 female students reported their height and
weight. Only 31 of these students had a BMI greater than
25.
Question: Is this proportion of high-BMI students
unusually small?
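One way to approach the question is to compute a z-score for the observed proportion under the Normal model. This is a sketch under stated assumptions, not an official solution: it assumes the 200 students can be treated as an independent random sample and uses the CDC figure of 22% as p. The variable names are made up for illustration.

```python
import math
from statistics import NormalDist

p = 0.22           # population proportion stated in the problem
n = 200            # students who self-reported
observed = 31 / n  # observed sample proportion

# Success/Failure Condition: both expected counts should be at least 10.
expected_successes = n * p
expected_failures = n * (1 - p)

sd = math.sqrt(p * (1 - p) / n)     # SD of the sampling distribution
z = (observed - p) / sd             # how many SDs below p the sample is
prob_this_low = NormalDist().cdf(z) # chance of a proportion this low or lower

print(f"SD = {sd:.4f}, z = {z:.2f}, probability = {prob_this_low:.4f}")
```

A z-score more than two standard deviations below the mean would be unusual under the model, which is the kind of evidence the question asks about.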
Independent Practice
Suppose that 13% of the population is left-handed.
A 200-seat school auditorium has been
built with 15 "lefty seats" that have a built-in desk
on the left rather than the right arm.
Question: In a class of 90 students, what is the
probability that there will be enough seats for
lefty students?
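The question can be sketched with the Normal model for a proportion: there are enough seats when at most 15 of the 90 students are left-handed. The code below is an approximation under stated assumptions (independent students, no continuity correction), and its variable names are hypothetical. Note that np = 11.7 only just meets the Success/Failure Condition, so the result should be read as rough.

```python
import math
from statistics import NormalDist

p = 0.13    # proportion of left-handers stated in the problem
n = 90      # class size
seats = 15  # lefty seats available

sd = math.sqrt(p * (1 - p) / n)  # SD of the sample proportion
cutoff = seats / n               # largest proportion the seats can handle
z = (cutoff - p) / sd
prob_enough_seats = NormalDist().cdf(z)

print(f"cutoff = {cutoff:.4f}, z = {z:.2f}, probability = {prob_enough_seats:.3f}")
```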
Aim
How do we solve problems involving sampling
distribution models?
Check homework
Read Chapter 18.
Tomorrow's HW: Chapter 18, 17-31 ODD

What About Quantitative Data?
Proportions summarize categorical variables.
The Normal sampling distribution model looks like
it will be very useful.
Can we do something similar with quantitative
data?
We can indeed. Even more remarkable, not only
can we use all of the same concepts, but almost
the same model.
Simulating the Sampling Distribution of a Mean
Like any statistic computed from a random
sample, a sample mean also has a sampling
distribution.
We can use simulation to get a sense as to what
the sampling distribution of the sample mean
might look like.
Means: The Average of One Die
Let's start with a simulation of 10,000 tosses of a
die. A histogram of the results is:
Means: Averaging More Dice
Looking at the average of
two dice after a simulation
of 10,000 tosses:
The average of three dice
after a simulation of
10,000 tosses looks like:
Means: Averaging Still More Dice
The average of 5 dice
after a simulation of
10,000 tosses looks like:
The average of 20 dice
after a simulation of
10,000 tosses looks like:
Means: What the Simulations Show
As the sample size (number of dice) gets larger,
each sample average is more likely to be closer
to the population mean.
So, we see the shape continuing to tighten
around 3.5.
And, it probably does not shock you that the
sampling distribution of a mean becomes Normal.
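The dice simulations described on these slides can be reproduced with a short script. This is a minimal sketch, not the code used to make the original histograms: the seed and the function name are assumptions, and it prints summary numbers rather than drawing the plots.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable (an assumption)


def simulate_dice_averages(num_dice: int, num_trials: int = 10_000) -> list:
    """Average `num_dice` fair dice, repeated `num_trials` times."""
    return [
        statistics.fmean(random.randint(1, 6) for _ in range(num_dice))
        for _ in range(num_trials)
    ]


spread_by_dice = {}
for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_dice_averages(num_dice)
    spread_by_dice[num_dice] = statistics.stdev(averages)
    print(
        f"{num_dice:>2} dice: mean of averages = {statistics.fmean(averages):.2f}, "
        f"SD of averages = {spread_by_dice[num_dice]:.2f}"
    )
```

The mean of the averages stays near 3.5 in every case, while the standard deviation shrinks as more dice are averaged.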
CLT Video
Gather information from your video, and with your
group you will summarize the information learned.
Pay attention to important concepts in the video
and be ready to share!
The Fundamental Theorem of Statistics
The sampling distribution of any mean becomes
more nearly Normal as the sample size grows.
All we need is for the observations to be
independent and collected with randomization.
We don't even care about the shape of the
population distribution!
The Fundamental Theorem of Statistics is called
the Central Limit Theorem (CLT).
The Fundamental Theorem of Statistics (cont.)
The CLT is surprising and a bit weird:
Not only does the histogram of the sample
means get closer and closer to the Normal
model as the sample size grows, but this is
true regardless of the shape of the population
distribution.
The CLT works better (and faster) the closer the
population model is to a Normal itself. It also
works better for larger samples.
The Fundamental Theorem of Statistics (cont.)
The Central Limit Theorem (CLT)
The mean of a random sample is a random
variable whose sampling distribution can be
approximated by a Normal model. The larger the
sample, the better the approximation will be.
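To see the theorem at work on a population that is far from Normal, the sketch below draws samples from a strongly right-skewed exponential distribution and measures the skewness of the resulting sample means. The choice of population, the seed, the sample sizes, and the helper names are all assumptions made for illustration.

```python
import random
import statistics

random.seed(2)  # arbitrary seed so the run is repeatable (an assumption)


def skewness(values: list) -> float:
    """Average cubed z-score; near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def simulate_sample_means(sample_size: int, num_samples: int = 5_000) -> list:
    """Means of repeated samples from an exponential population with mean 1."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


skew_by_n = {n: skewness(simulate_sample_means(n)) for n in (1, 5, 50)}
for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {skew:.2f}")
```

The skewness moves toward 0 as n grows, which is what "the larger the sample, the better the approximation" looks like in practice.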
Assumptions and Conditions
The CLT requires essentially the same
assumptions we saw for modeling proportions:
Independence Assumption: The sampled
values must be independent of each other.
Sample Size Assumption: The sample size
must be sufficiently large.
Assumptions and Conditions (cont.)
We can't check these directly, but we can think about
whether the Independence Assumption is plausible.
We can also check some related conditions:
Randomization Condition: The data values must
be sampled randomly.
10% Condition: When the sample is drawn without
replacement, the sample size, n, should be no
more than 10% of the population.
Large Enough Sample Condition: The CLT doesn't
tell us how large a sample we need. For now, you
need to think about your sample size in the context
of what you know about the population.

But Which Normal?
The CLT says that the sampling distribution of
any mean or proportion is approximately Normal.
But which Normal model?
For proportions, the sampling distribution is
centered at the population proportion.
For means, it's centered at the population
mean.
But what about the standard deviations?
But Which Normal? (cont.)
The Normal model for the sampling distribution of
the mean has a standard deviation equal to
$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the population standard deviation.
But Which Normal? (cont.)
The Normal model for the sampling distribution of
the proportion has a standard deviation equal to
$$SD(\hat{p}) = \sqrt{\frac{pq}{n}}$$
Independent Practice
A college physical education department asked 200
students for their self-reported height and weight, but the
percentage of students with BMIs over 25 was suspiciously
low. One possibility is that the respondents shaved down
their weight. The CDC reports that the mean weight of an
18-year-old woman is 143.74 pounds, with a standard deviation
of 51.43 pounds, but these 200 randomly selected women
had a mean weight of only 140 pounds.
Question: Based on the Central Limit Theorem and the
68-95-99.7 Rule, does the mean weight in this sample
seem exceptionally low, or might this just be random
sample-to-sample variation?
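A sketch of the calculation the question calls for: find the standard deviation of the sample mean and see how many of those standard deviations the observed mean lies from the population mean. This is not an official solution; the variable names are assumptions, and the model treats the 200 women as an independent random sample, as the problem states.

```python
import math

mu = 143.74            # population mean weight (pounds), from the problem
sigma = 51.43          # population standard deviation (pounds)
n = 200                # sample size
observed_mean = 140.0  # sample mean weight (pounds)

sd_of_mean = sigma / math.sqrt(n)     # SD of the sampling distribution
z = (observed_mean - mu) / sd_of_mean # distance from mu in SD units

print(f"SD of the sample mean = {sd_of_mean:.2f} pounds, z = {z:.2f}")
```

Under the 68-95-99.7 Rule, a sample mean within about two standard deviations of the population mean is consistent with ordinary sample-to-sample variation.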
Independent Practice
The Centers for Disease Control and Prevention
reports that the mean weight of adult men in the
United States is 190 pounds with a standard
deviation of 59 pounds.
Question: An elevator in the building has a
weight limit of 10 persons or 2500 pounds.
What is the probability that if 10 men get on
the elevator, they will overload the weight
limit?
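A total weight above 2500 pounds for 10 men is the same event as a mean weight above 250 pounds, so the question can be sketched with the sampling distribution of the mean. This is an approximation under stated assumptions: the men are treated as an independent random sample, and with n = 10 the Normal model may be rough if individual weights are skewed. The variable names are hypothetical.

```python
import math
from statistics import NormalDist

mu = 190.0      # mean weight of adult men (pounds), from the problem
sigma = 59.0    # standard deviation (pounds)
n = 10          # men on the elevator
limit = 2500.0  # total weight limit (pounds)

mean_cutoff = limit / n              # 250 pounds per person on average
sd_of_mean = sigma / math.sqrt(n)    # SD of the sampling distribution
z = (mean_cutoff - mu) / sd_of_mean
prob_overload = 1 - NormalDist().cdf(z)

print(f"SD of the mean = {sd_of_mean:.2f}, z = {z:.2f}, P(overload) = {prob_overload:.5f}")
```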
About Variation
The standard deviation of the sampling
distribution declines only with the square root of
the sample size (the denominator contains the
square root of n).
Therefore, the variability decreases as the sample
size increases.
While we'd always like a larger sample, the
square root limits how much we can make a
sample tell about the population. (This is an
example of the Law of Diminishing Returns.)
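The effect of the square root is easy to tabulate. In the sketch below the population standard deviation of 10 is a hypothetical value chosen only to make the arithmetic clean; each time the sample size is quadrupled, the standard deviation of the sample mean is cut in half.

```python
import math

sigma = 10.0  # hypothetical population standard deviation

sd_by_n = {n: sigma / math.sqrt(n) for n in (25, 100, 400, 1600)}
for n, sd in sd_by_n.items():
    print(f"n = {n:>4}: SD of the sample mean = {sd:.2f}")
```

Going from 25 to 100 observations removes a full unit of standard deviation here, but going from 400 to 1600 removes only a quarter of a unit.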
The Real World and the Model World
Be careful! Now we have two distributions to deal
with.

The first is the real world distribution of the sample,
which we might display with a histogram.
The second is the math world sampling distribution
of the statistic, which we model with a Normal
model based on the Central Limit Theorem.

Don't confuse the two!

Sampling Distribution Models
Always remember that the statistic itself is a
random quantity.
We can't know what our statistic will be
because it comes from a random sample.
Fortunately, for the mean and proportion, the CLT
tells us that we can model their sampling
distribution directly with a Normal model.
Sampling Distribution Models (cont.)
There are two basic truths about sampling
distributions:
1. Sampling distributions arise because
samples vary. Each random sample will have
different cases and, so, a different value of
the statistic.
2. Although we can always simulate a sampling
distribution, the Central Limit Theorem saves
us the trouble for means and proportions.
The Process Going Into the Sampling
Distribution Model
What Can Go Wrong?
Don't confuse the sampling distribution with the
distribution of the sample.
When you take a sample, you look at the
distribution of the values, usually with a
histogram, and you may calculate summary
statistics.
The sampling distribution is an imaginary
collection of the values that a statistic might
have taken for all random samples: the one
you got and the ones you didn't get.
What Can Go Wrong? (cont.)
Beware of observations that are not independent.
The CLT depends crucially on the assumption
of independence.
You can't check this with your data. You have
to think about how the data were gathered.
Watch out for small samples from skewed
populations.
The more skewed the distribution, the larger
the sample size we need for the CLT to work.
What have we learned?
Sample proportions and means will vary from
sample to sample. That's sampling error
(sampling variability).
Sampling variability may be unavoidable, but it is
also predictable!
What have we learned? (cont.)
We've learned to describe the behavior of sample
proportions when our sample is random and large
enough to expect at least 10 successes and
failures.
We've also learned to describe the behavior of
sample means (thanks to the CLT!) when our
sample is random (and larger if our data come
from a population that's not roughly unimodal and
symmetric).
