Statistics
– Describing Variation
• Frequency Distribution & Histogram
• Numerical Summary of Data
• Probability Distribution
– Important Distributions
– Some Useful Approximations
Need for Statistics
How to do it?
Populations, Samples and Branches of Statistics
[Figure: Population, Sample, Inferential Statistics]
Graphically Describing Variation
Method 1: Frequency Distribution & Histogram
An Example:
Forged Piston Rings for Engines
• Variable & Data:
– The inside diameter (the quality characteristic, Q.C.) of forged piston rings (mm)
– 125 observations: 25 samples of 5 observations each.
[Figure: Population, Sample, Observation]
Frequency Table & Frequency Histogram
Histograms – Useful for large data sets
Interpretation Based on the Frequency Histogram
Visual Display of Three Properties of Sample Data
• Shape:
– roughly symmetric and unimodal
• Central tendency or location:
– the points tend to cluster near 450
• Scatter or spread:
– the data range from 413 to 487
• Outliers
The Box Plot (or Box-and-Whisker Plot)
Comparative Box Plots
Method 2: Numerical Summary of Data
• Definition of Statistic:
– Let x1, …, xn be a random sample of size n from a population, and let T(x1, …, xn) be a real-valued or vector-valued function whose domain includes the sample space of (x1, …, xn). Then the random variable or random vector Y = T(x1, …, xn) is called a statistic.
• In short: a statistic is a random variable (or random vector) computed as a function of a sample of data.
• Central tendency: sample average/mean
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Scatter/variability: sample variance or sample standard deviation
$\hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}; \qquad \hat{\sigma} = S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
• Median: A value such that at least 50% of the data values are at or below this value and at least 50% of the data values are at or above this value.
Example 2-1
Calculate the sample mean, median, variance, and standard
deviation of a sample of observations: x1=1, x2=3, x3=5.
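By hand: $\bar{x} = (1 + 3 + 5)/3 = 3$, the median is 3, $S^2 = [(1-3)^2 + (3-3)^2 + (5-3)^2]/(3-1) = 8/2 = 4$, and $S = 2$. The short sketch below applies the sample formulas directly. `sample_summary` is a hypothetical helper name. For an even n, it uses the midpoint of the two middle values as the median, which is one common convention that satisfies the 50%/50% definition.

```python
import math

def sample_summary(xs):
    """Return (mean, median, sample variance, sample standard deviation)."""
    n = len(xs)
    mean = sum(xs) / n
    ordered = sorted(xs)
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]
    else:
        # any value between the two middle values meets the 50%/50% definition;
        # the midpoint is the usual convention
        median = (ordered[mid - 1] + ordered[mid]) / 2
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)  # divisor n - 1
    return mean, median, variance, math.sqrt(variance)

mean, median, variance, std_dev = sample_summary([1, 3, 5])
print(mean, median, variance, std_dev)  # 3.0 3 4.0 2.0
```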
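Method 3: Probability Distribution
[Figure: left, a continuous density f(x) with $\int_{-\infty}^{\infty} f(x)\,dx = 1$ and points a and b marked on the x axis; right, a discrete distribution with probabilities p(x1), …, p(x7) at x1, …, x7 and $\sum_i p(x_i) = 1$]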
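Review of Probability Distribution Calculation
Quantity | Continuous distribution | Discrete distribution
Probability | $P\{a \le x \le b\} = \int_a^b f(x)\,dx$ | $P\{x = x_i\} = p(x_i)$
Distribution mean | $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$ | $\mu = \sum_{i=1}^{\infty} x_i\, p(x_i)$
Distribution variance | $\sigma^2 = V(x) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$ | $\sigma^2 = V(x) = \sum_{i=1}^{\infty} (x_i - \mu)^2 p(x_i)$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Sample variance: $\hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$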
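For a discrete random variable, the distribution mean and variance weight each possible value by its probability p(x_i). The sample mean and sample variance, by contrast, are computed from observed data. The sketch below computes μ and σ² for a discrete distribution and first checks that the probabilities are non-negative and sum to 1. The helper name `discrete_moments` and the fair-die example are hypothetical and serve only as illustrations.

```python
def discrete_moments(values, probs):
    """Distribution mean and variance of a discrete random variable."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("not a valid p.m.f.")
    mu = sum(x * p for x, p in zip(values, probs))               # sum of x_i p(x_i)
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # sum of (x_i - mu)^2 p(x_i)
    return mu, var

# Hypothetical example: a fair six-sided die, p(x_i) = 1/6 for x_i = 1, ..., 6
mu, var = discrete_moments([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
print(mu, var)  # approximately 3.5 and 35/12 = 2.9167
```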
Probability Density (Mass) Function
• A function f(x) (or p(xi)) is a p.d.f. (or p.m.f.) of a random variable x if and only if:
– $f(x) \ge 0$ for all $x \in \mathbb{R}$, or $p(x_i) \ge 0$ for all possible values $x_i$
– $\int_{-\infty}^{\infty} f(x)\,dx = 1$, or $\sum_i p(x_i) = 1$
Hypergeometric Distribution
• Suppose that there is a FINITE population consisting of N items. Some
number, say D (D ≤ N), of these items fall into a class of interest. A
random sample of n items is selected from the population without
replacement, and the number of items in the sample that fall into the
class of interest, say x, is observed.
[Figure: a population of N items (total # of items), D of which are items of interest; a sample of n items contains x items of interest]
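For reference, the standard hypergeometric probability mass function for this setup, together with its mean and variance, is:

```latex
p(x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}, \qquad x = 0, 1, 2, \ldots, \min(n, D)

\mu = \frac{nD}{N}, \qquad
\sigma^2 = \frac{nD}{N}\left(1 - \frac{D}{N}\right)\left(\frac{N-n}{N-1}\right)
```

The factor (N − n)/(N − 1) is the finite-population correction. It approaches 1 when the sample is a small fraction of the population.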
Example: A lot of size N = 30 contains five
nonconforming units. What is the probability that a
sample of five units selected at random contains exactly
one nonconforming unit? What is the probability that it
contains one or more nonconforming units?
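A minimal sketch of this example, assuming the standard hypergeometric p.m.f. with N = 30, D = 5, and n = 5. The event "one or more" is computed as 1 − p(0). The helper name `hypergeom_pmf` is hypothetical. `math.comb` (Python 3.8+) supplies the binomial coefficients.

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(x items of interest) in a sample of n drawn without replacement
    from N items, D of which are of interest."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

N, D, n = 30, 5, 5
p_one = hypergeom_pmf(1, N, D, n)                # exactly one nonconforming unit
p_one_or_more = 1 - hypergeom_pmf(0, N, D, n)    # complement of "none"
print(f"P(x = 1)  = {p_one:.4f}")
print(f"P(x >= 1) = {p_one_or_more:.4f}")
```

Here p(1) = (5 × 12650)/142506 ≈ 0.4438 and p(0) = 53130/142506 ≈ 0.3728, so P(x ≥ 1) ≈ 0.6272.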
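Binomial Distribution
• Bernoulli trial: an experiment with two and ONLY two possible outcomes, either a “success” (1) or a “failure” (0):
$Y = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}, \qquad 0 \le p \le 1$
• Binomial p.m.f. for the number of successes x in n independent Bernoulli trials:
$p(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, 2, \ldots, n, \qquad 0 \le p \le 1$
• The sample fraction $\hat{p} = x/n$ is a random variable with
$\mu_{\hat{p}} = p, \qquad \sigma^2_{\hat{p}} = \frac{p(1 - p)}{n}$
• The probability distribution of $\hat{p}$ is obtained from the binomial:
$P\{\hat{p} \le a\} = P\left\{\frac{x}{n} \le a\right\} = P\{x \le na\} = \sum_{x=0}^{[na]} \binom{n}{x} p^x (1 - p)^{n - x}$
where [na] denotes the largest integer less than or equal to na.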
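The sketch below implements the binomial p.m.f. and P{p̂ ≤ a} by summing the p.m.f. from 0 to [na]. It also checks numerically that p̂ = x/n has mean p and variance p(1 − p)/n. The function names and the values n = 10, p = 0.3 are hypothetical. The small epsilon added before `floor` is a design choice to guard against floating-point products such as 100 × 0.29 that land just below an integer.

```python
from math import comb, floor

def binom_pmf(x, n, p):
    """Binomial p.m.f.: probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def prob_phat_at_most(a, n, p):
    """P{p_hat <= a} = P{x <= [na]}, summing the binomial p.m.f. from 0 to [na]."""
    # [na] is the largest integer <= n*a; epsilon guards against float round-off
    upper = min(floor(n * a + 1e-9), n)
    return sum(binom_pmf(x, n, p) for x in range(upper + 1))

# Hypothetical illustration: n = 10 trials, success probability p = 0.3
n, p = 10, 0.3
mean_phat = sum((x / n) * binom_pmf(x, n, p) for x in range(n + 1))
var_phat = sum((x / n - mean_phat) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))
print(mean_phat, var_phat)            # close to p = 0.3 and p(1 - p)/n = 0.021
print(prob_phat_at_most(0.2, n, p))   # P{x <= 2}
```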
Example: Sixty percent of pulleys are produced using
Lathe #1 and 40% using Lathe #2. What is
the probability that exactly three out of a random
sample of four production parts will come from Lathe #1?
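Treating "comes from Lathe #1" as a success with p = 0.6, and assuming the four parts are independent draws, the count from Lathe #1 is binomial with n = 4. The answer is C(4, 3)(0.6)³(0.4) = 4 × 0.216 × 0.4 = 0.3456:

```python
from math import comb

n, p = 4, 0.6                                   # 4 parts, P(Lathe #1) = 0.6
prob = comb(n, 3) * p**3 * (1 - p) ** (n - 3)   # binomial p(3)
print(f"{prob:.4f}")  # 0.3456
```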
Example: A production process operates with 2% nonconforming
output. Every hour a sample of 50 units of product is taken, and the
number of nonconforming units counted. If one or more
nonconforming units are found, the process is stopped and the
quality control technician must search for the cause of
nonconforming production. Evaluate this decision rule.
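Under a binomial model with n = 50, the rule stops the process whenever x ≥ 1. Its probability is 1 − (1 − p)^50. At the stated 2% nonconforming this is 1 − 0.98^50 ≈ 0.6358. So if 2% is the normal level for this process, the rule would stop it in roughly 64% of the hourly samples. The values p = 0.05 and 0.10 in the sketch are hypothetical and are included only for comparison.

```python
def prob_stop(p, n=50):
    """Probability that a sample of n units contains one or more nonconforming units."""
    return 1 - (1 - p) ** n

print(f"p = 0.02: P(stop) = {prob_stop(0.02):.4f}")  # 0.6358
# Hypothetical higher fractions nonconforming, for comparison only
for p in (0.05, 0.10):
    print(f"p = {p:.2f}: P(stop) = {prob_stop(p):.4f}")
```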
Example: A firm claims that 99% of their products meet
specifications. To support this claim, an inspector draws a random
sample of 20 items and ships the lot if the entire sample is in
conformance. Find the probability of committing each of the
following errors:
(1) Refusing to ship a lot even though 99% of the items are in
conformance.
(2) Shipping a lot even though only 95% of the items are
conforming.
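A sketch assuming each sampled item conforms independently, which is a binomial model that is reasonable when the lot is large relative to the sample. The lot ships only if all 20 items conform. Error (1) is therefore 1 − 0.99^20 ≈ 0.1821, and error (2) is 0.95^20 ≈ 0.3585.

```python
n = 20
# (1) Refuse to ship (not all 20 sampled items conform) although 99% of items conform
p_refuse_good_lot = 1 - 0.99 ** n
# (2) Ship (all 20 sampled items conform) although only 95% of items conform
p_ship_bad_lot = 0.95 ** n
print(f"(1) {p_refuse_good_lot:.4f}")  # 0.1821
print(f"(2) {p_ship_bad_lot:.4f}")     # 0.3585
```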
Example: A random sample of 100 units is drawn from a production
process every half hour. The fraction of nonconforming product
manufactured is 0.03. What is the probability that p̂ ≤ 0.04 if the
fraction nonconforming is actually 0.03?
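Using P{p̂ ≤ a} = P{x ≤ [na]} with n = 100, p = 0.03, and a = 0.04 gives [na] = 4, so the answer is the binomial sum of p(0) through p(4), roughly 0.818. This sketch reads the comparison as "≤":

```python
from math import comb, floor

n, p, a = 100, 0.03, 0.04
upper = floor(n * a + 1e-9)  # [na] = 4; epsilon guards against float round-off
prob = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(upper + 1))
print(f"P(p_hat <= {a}) = P(x <= {upper}) = {prob:.4f}")
```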
Poisson Distribution
Poisson distribution: the number of random events that occur during a specific “time” period when the average occurrence rate λ is known:
$p(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, \ldots$
$\mu = \lambda, \qquad \sigma^2 = \lambda$
Examples:
• A: number of random occurrences per unit of time: number of arrivals at a McDonald’s drive-through window from 12:00 to 1:00 pm
• B: number of “defects” per unit of area: number of typographical errors on a page
• C: number of “defects” per unit: number of dents on a car
Assumptions:
• The average occurrence rate (per unit) is known and constant
• Occurrences are equally likely to occur within any unit of time/area
• Occurrences are statistically independent
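The sketch below evaluates the Poisson p.m.f. and checks numerically that its mean and variance both equal λ. The rate λ = 4 is a hypothetical value. The sum stops at x = 59, where the remaining tail probability is negligible for this λ.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Poisson p.m.f.: e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

lam = 4.0  # hypothetical average number of occurrences per unit
probs = [poisson_pmf(x, lam) for x in range(60)]  # truncated; tail beyond 59 is negligible
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
print(f"P(x = 0) = {probs[0]:.4f}, mean = {mean:.4f}, variance = {var:.4f}")
```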
Exercises on Discrete Distributions (1)
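Normal Distribution
[Figure: normal density f(x) plotted against x]
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty$
$E(x) = \mu, \qquad V(x) = \sigma^2$
$x \sim N(\mu, \sigma^2); \qquad z = \frac{x - \mu}{\sigma} \sim N(0, 1)$
$\Pr\{x \le a\} = \Pr\left\{z \le \frac{a - \mu}{\sigma}\right\} = \Phi\left(\frac{a - \mu}{\sigma}\right)$
$\Pr(\mu - \sigma \le x \le \mu + \sigma) = 68.26\%$
$\Pr(\mu - 2\sigma \le x \le \mu + 2\sigma) = 95.46\%$
$\Pr(\mu - 3\sigma \le x \le \mu + 3\sigma) = 99.73\%$
If x1, x2 are independent, normally distributed variables, then y = x1 + x2 also follows the normal distribution, i.e. $y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.
The Central Limit Theorem: if x1, x2, …, xn are independent random variables with mean μi and variance σi², and y = x1 + x2 + … + xn, then the distribution of
$z = \frac{y - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$
approaches the N(0, 1) distribution as n approaches infinity.
Excel function: NORMDIST(x, μ, σ, TRUE)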
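Outside Excel, Φ can be computed from the error function through Φ(z) = ½[1 + erf(z/√2)], which Python's standard `math.erf` provides. The sketch below recovers the ±1σ, ±2σ, and ±3σ coverages. The helper name `normal_cdf` is hypothetical. The printed values (0.6827, 0.9545, 0.9973) agree with the tabulated percentages above to within table rounding.

```python
from math import erf, sqrt

def normal_cdf(a, mu=0.0, sigma=1.0):
    """Pr{x <= a} for x ~ N(mu, sigma^2), i.e. Phi((a - mu) / sigma)."""
    z = (a - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

for k in (1, 2, 3):
    coverage = normal_cdf(k) - normal_cdf(-k)  # Pr(mu - k sigma <= x <= mu + k sigma)
    print(f"k = {k}: {coverage:.4f}")
```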
Example 3-3
$x \sim N(40, 5^2)$
$p(x > 42.1) = 1 - p(x \le 42.1) = 1 - \Phi\left(\frac{42.1 - 40}{5}\right) = 1 - \Phi(0.42) \approx 1 - 0.6628 = 0.3372$
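The same calculation as a short sketch: standardize 42.1 with μ = 40 and σ = 5 to get z = 0.42, then take the upper tail. Φ is computed from `math.erf`, and the helper name `phi` is hypothetical.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal c.d.f. Phi(z), written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 40.0, 5.0
z = (42.1 - mu) / sigma          # (42.1 - 40) / 5 = 0.42
p_above = 1.0 - phi(z)           # p(x > 42.1) = 1 - Phi(0.42)
print(f"z = {z:.2f}, p(x > 42.1) = {p_above:.4f}")
```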
Example 3-6: Three shafts are made and assembled in a linkage. The length of
each shaft, in centimeters, is distributed as follows:
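Chi-Squared Distribution (with n degrees of freedom)
$f(y) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, y^{(n/2) - 1} e^{-y/2}, \qquad y > 0$
where $\Gamma(n/2) = (n/2 - 1)(n/2 - 2)\cdots 3 \cdot 2 \cdot 1$ for even n, and $\Gamma(n/2) = (n/2 - 1)(n/2 - 2)\cdots \frac{5}{2} \cdot \frac{3}{2} \cdot \frac{1}{2}\sqrt{\pi}$ for odd n
$E(y) = n, \qquad V(y) = 2n$
If x1, x2, …, xn are normally and independently distributed random variables with mean 0 and variance 1, then
$y = x_1^2 + x_2^2 + \cdots + x_n^2$
follows the chi-square distribution with n degrees of freedom, $\chi^2_n$.
The most popular use of this distribution is for testing hypotheses
about variances of samples from normal distributions.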
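A Monte Carlo sketch of the construction above draws y = x1² + … + xn² from independent standard normals and compares the sample mean and variance of y with n and 2n. The choices n = 5, 100,000 trials, and the seed are arbitrary. The seeded `random.Random` instance only makes the run reproducible.

```python
import random

def simulate_chi_square(n, trials, rng):
    """Draw `trials` values of y = x1^2 + ... + xn^2 with xi ~ N(0, 1) independent."""
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]

rng = random.Random(12345)  # fixed seed so the run is reproducible
n = 5
ys = simulate_chi_square(n, 100_000, rng)
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
print(f"sample mean {mean:.3f} (theory {n}), sample variance {var:.3f} (theory {2 * n})")
```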
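Student t Distribution (with ν degrees of freedom)
$f(x) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu + 1)/2}, \qquad -\infty < x < \infty$
$E(x) = 0 \ (\nu > 1), \qquad V(x) = \frac{\nu}{\nu - 2} \ (\nu > 2)$
Skewness $\beta_1 = 0$; kurtosis $\beta_2 = 3 + \frac{6}{\nu - 4}$ for ν > 4
where $\Gamma(\nu/2) = (\nu/2 - 1)(\nu/2 - 2)\cdots 3 \cdot 2 \cdot 1$ for even ν, and $\Gamma(\nu/2) = (\nu/2 - 1)(\nu/2 - 2)\cdots \frac{5}{2} \cdot \frac{3}{2} \cdot \frac{1}{2}\sqrt{\pi}$ for odd ν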
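F Distribution
$f(x) = \frac{\Gamma\left(\frac{u + v}{2}\right)\left(\frac{u}{v}\right)^{u/2} x^{(u/2) - 1}}{\Gamma\left(\frac{u}{2}\right)\Gamma\left(\frac{v}{2}\right)\left[\left(\frac{u}{v}\right)x + 1\right]^{(u + v)/2}}, \qquad 0 < x < \infty$
If w and y are two independent chi-square random variables with u and v degrees of freedom, respectively, then the ratio
$F_{u,v} = \frac{w/u}{y/v}$
is distributed as F with u numerator degrees of freedom and v denominator degrees of freedom.
Used for testing hypotheses about two population variances.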
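A simulation sketch of the F construction builds w and y as sums of squared independent standard normals, forms (w/u)/(y/v), and compares the sample mean with the known mean of F_{u,v}, which is v/(v − 2) for v > 2. The choices u = 5, v = 20, 50,000 draws, and the seed are arbitrary. The helper names are hypothetical.

```python
import random

def chi_square_draw(k, rng):
    """One chi-square(k) value: sum of k squared independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def f_ratio_draw(u, v, rng):
    """One draw of (w/u) / (y/v) with independent w ~ chi2(u), y ~ chi2(v)."""
    w = chi_square_draw(u, rng)
    y = chi_square_draw(v, rng)
    return (w / u) / (y / v)

rng = random.Random(2024)  # fixed seed so the run is reproducible
u, v = 5, 20
draws = [f_ratio_draw(u, v, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)
print(f"sample mean of F draws: {mean:.3f} (theory v/(v-2) = {v / (v - 2):.3f})")
```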
Useful Results on Mean and Variance
If x is a random variable and a is a constant, then
$E(a + x) = a + E(x)$
$E(ax) = aE(x)$
$V(a + x) = V(x)$
$V(ax) = a^2 V(x)$
If x1, x2, …, xn are random variables,
$E(x_1 + \cdots + x_n) = E(x_1) + \cdots + E(x_n)$
If they are mutually independent and a1, …, an are constants,
$V(a_1 x_1 + \cdots + a_n x_n) = a_1^2 V(x_1) + \cdots + a_n^2 V(x_n)$
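A small exact check of the last two results builds the distribution of y = a1·x1 + a2·x2 for two independent discrete variables. The joint probability is the product of the marginals because the variables are independent. The mean and variance of y are then compared with the formulas. The fair die, fair coin, and constants a1 = 2, a2 = −3 are hypothetical choices. `Fraction` keeps the arithmetic exact, so the comparisons can use `==`.

```python
from fractions import Fraction
from itertools import product

def moments(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

die = {k: Fraction(1, 6) for k in range(1, 7)}   # hypothetical x1: fair die
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # hypothetical x2: fair coin
a1, a2 = 2, -3

combo = {}
for (x1, p1), (x2, p2) in product(die.items(), coin.items()):
    y = a1 * x1 + a2 * x2
    combo[y] = combo.get(y, 0) + p1 * p2  # independence: joint prob = product

mu_y, var_y = moments(combo)
mu1, var1 = moments(die)
mu2, var2 = moments(coin)
print(mu_y == a1 * mu1 + a2 * mu2, var_y == a1**2 * var1 + a2**2 * var2)  # True True
```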
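INTERRELATIONSHIPS BETWEEN DISTRIBUTIONS
Hypergeometric, Binomial, Poisson, Normal
• Hypergeometric: sampling without replacement from a finite population (N: population size, n: sample size)
– approximated by the Binomial with p = D/N and n trials if n/N ≤ 0.1
• Binomial: the sum of a sequence of n Bernoulli trials in an infinite population, with probability of success p
– approximated by the Poisson with λ = np if n is large and p < 0.1 (or, if n is large and p > 0.9, with p′ = 1 − p in place of p)
– approximated by the Normal with μ = np, σ² = np(1 − p), if np > 10 and 0.1 ≤ p ≤ 0.9
• Poisson: number of defects per unit
– approximated by the Normal with μ = λ, σ² = λ, if λ ≥ 15
• Normal approximation to the binomial (with continuity correction):
$\Pr(x = a) \approx \Phi\left(\frac{a + 0.5 - np}{\sqrt{np(1 - p)}}\right) - \Phi\left(\frac{a - 0.5 - np}{\sqrt{np(1 - p)}}\right)$
$\Pr(a \le x \le b) \approx \Phi\left(\frac{b + 0.5 - np}{\sqrt{np(1 - p)}}\right) - \Phi\left(\frac{a - 0.5 - np}{\sqrt{np(1 - p)}}\right)$
• Normal approximation for the sample fraction:
$\Pr(a \le \hat{p} \le b) \approx \Phi\left(\frac{b - p}{\sqrt{p(1 - p)/n}}\right) - \Phi\left(\frac{a - p}{\sqrt{p(1 - p)/n}}\right)$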
Example: An electronic component for a laser range-finder is produced in lots of size N =
25. An acceptance testing procedure is used by the purchaser to protect against lots that
contain too many nonconforming components. The procedure consists of selecting five
components at random from the lot (without replacement) and testing them. If none of the
components is nonconforming, the lot is accepted.
a. If the lot contains three nonconforming components, what is the probability of lot
acceptance?
b. Calculate the probability in (a) using the binomial approximation. Is this
approximation satisfactory? Why or why not?
c. Suppose the lot size was N=150. Would the binomial approximation be satisfactory in
this case?
d. Suppose that the purchaser will reject the lot with the decision rule of finding one or
more nonconforming components in a sample of size n, and wants the lot to be rejected
with probability at least 0.95 if the lot contains five or more nonconforming components.
How large should the sample size n be?
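A sketch of parts (a) to (d), using the exact hypergeometric probabilities and the binomial approximation with p = D/N. For (c), it assumes the larger lot still contains three nonconforming components; the question does not say this explicitly. For (d), D = 5 is the worst case, because a larger D only raises the chance of finding at least one nonconforming unit. The helper names are hypothetical.

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(x items of interest) in a sample of n drawn without replacement from N items, D of interest."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """Binomial p.m.f. used as the approximation, with p = D/N."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# (a) Lot accepted only if the sample of 5 has no nonconforming units (N = 25, D = 3)
p_accept_exact = hypergeom_pmf(0, 25, 3, 5)
# (b) Binomial approximation with p = 3/25; here n/N = 5/25 = 0.2 > 0.1
p_accept_binom = binom_pmf(0, 5, 3 / 25)
# (c) N = 150, assuming D = 3 still; now n/N = 5/150 < 0.1
p_accept_exact_150 = hypergeom_pmf(0, 150, 3, 5)
p_accept_binom_150 = binom_pmf(0, 5, 3 / 150)
# (d) Smallest n with P(reject) = 1 - P(x = 0) >= 0.95 when D = 5 (worst case)
n_required = next(n for n in range(1, 26) if 1 - hypergeom_pmf(0, 25, 5, n) >= 0.95)

print(f"(a) {p_accept_exact:.4f}  (b) {p_accept_binom:.4f}")
print(f"(c) exact {p_accept_exact_150:.4f} vs binomial {p_accept_binom_150:.4f}")
print(f"(d) n = {n_required}")
```

For N = 25, the binomial value (about 0.528) noticeably overstates the exact 0.496, which is consistent with n/N = 0.2 exceeding the 0.1 rule of thumb. For N = 150, the two values (about 0.903 and 0.904) nearly coincide. The smallest sample size for (d) comes out as n = 11.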
Example: A textbook has 500 pages on which typographical errors could
occur. Suppose that there are exactly 10 such errors randomly located on
those pages. Find the probability that a random selection of 50 pages will
contain no errors. Find the probability that 50 randomly selected pages will
contain at least two errors.
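One way to model this problem is to assume that no page holds more than one error (an assumption; the problem does not say so). Then 10 of the 500 pages contain an error, and the number of error pages among 50 sampled pages is hypergeometric with N = 500, D = 10, n = 50. The sketch compares that exact model with the Poisson approximation λ = nD/N = 1. Under these assumptions, P(no errors) is about 0.345 exactly versus e^(−1) ≈ 0.368, and P(at least two) is about 0.263 versus 1 − 2e^(−1) ≈ 0.264.

```python
from math import comb, exp

N, D, n = 500, 10, 50  # pages, pages with an error (assumes <= 1 error per page), pages sampled

def hyper(x):
    """Hypergeometric probability of x error pages among the n sampled pages."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

p0 = hyper(0)
p_two_or_more = 1 - hyper(0) - hyper(1)

lam = n * D / N  # Poisson approximation: expected errors in 50 pages = 1
p0_poisson = exp(-lam)
p_two_or_more_poisson = 1 - exp(-lam) - lam * exp(-lam)

print(f"P(no errors):  exact {p0:.4f}, Poisson {p0_poisson:.4f}")
print(f"P(>= 2 errors): exact {p_two_or_more:.4f}, Poisson {p_two_or_more_poisson:.4f}")
```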
Example: A sample of 100 units is selected from a production process that is
2% nonconforming. What is the probability that p̂ will exceed the true
fraction nonconforming by k standard deviations, where k = 1, 2, and 3?
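A sketch using the exact binomial distribution with n = 100 and p = 0.02. Here σ(p̂) = √(p(1 − p)/n) = 0.014. The event p̂ > p + kσ is the same as x > n(p + kσ), so for each k the sketch sums the binomial tail from the first integer above that threshold: x ≥ 4, 5, and 7 for k = 1, 2, 3. The resulting probabilities are about 0.141, 0.051, and 0.004.

```python
from math import comb, floor, sqrt

n, p = 100, 0.02
sigma = sqrt(p * (1 - p) / n)  # standard deviation of p_hat = 0.014

def binom_tail(k_min):
    """P(x >= k_min) for x ~ Binomial(n, p)."""
    return 1 - sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k_min))

results = {}
for k in (1, 2, 3):
    threshold = n * (p + k * sigma)  # p_hat > p + k*sigma  <=>  x > n(p + k*sigma)
    x_min = floor(threshold) + 1     # smallest integer count strictly above the threshold
    results[k] = binom_tail(x_min)
    print(f"k = {k}: P(p_hat > {p + k * sigma:.3f}) = P(x >= {x_min}) = {results[k]:.4f}")
```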