Statistics Modules 8-11
Normal Probability
Distributions
October
7: Normal Probability Distributions 1
21
Normal Probability Density Function
• Continuous random
variables are described
using probability
density function
(pdfs) curves
• Normal pdfs are
characterized by their
typical bell-shape
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
• Bottom Figure: shaded
95%
… we can easily determine
the AUC in tails
Example: Male Height
• Male height: Normal with μ = 70.0˝ and σ = 2.8˝
• 68% within μ ± σ = 70.0 ± 2.8 = 67.2˝ to 72.8˝
• 32% in tails (below 67.2˝ and above 72.8˝)
• 16% below 67.2˝ and 16% above 72.8˝ (symmetry)
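The 68% figure and the 16% tails can be checked numerically. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for the printed tables; the variable names are illustrative only.

```python
from statistics import NormalDist

# Male height model from the example: mean 70.0 in, standard deviation 2.8 in
height = NormalDist(mu=70.0, sigma=2.8)

# Area within one standard deviation of the mean (67.2 to 72.8)
within = height.cdf(72.8) - height.cdf(67.2)

# Area in each tail; the two are equal by symmetry
below = height.cdf(67.2)
above = 1 - height.cdf(72.8)

print(round(within, 4), round(below, 4), round(above, 4))
```

The exact areas are about 0.6827 and 0.1587, which the slide rounds to 68% and 16%.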
Characteristics of the Standard
Normal Distribution
• Mean = 0
• Standard deviation = 1
• Total Area under the curve =1=Probability
• X ~ N(0, 1)
MCC 202, SPUP, 1st Trimester 2021-2022
Table A.0
Table A.1. Used to determine the area under the curve for Z≤ z.
You are actually taking the cumulative probability up to the z-score
Determining Normal Probabilities
Step 1: State the Problem
• What percentage of gestations are less
than 40 weeks?
• Let X ≡ gestational length
• We know from prior research:
X ~ N(39, 2) weeks (μ = 39, σ = 2)
• Pr(X ≤ 40) = ?
Step 2: Standardize
• Standard Normal
variable ≡ “Z” ≡ a
Normal random variable
with μ = 0 and σ = 1,
• Z ~ N(0,1)
• Use Table to look up
cumulative
probabilities for Z
Example: A Z
variable of 1.96 has
cumulative
probability 0.9750.
Step 2 (cont.)
Turn value into z score: $z = \frac{x - \mu}{\sigma}$
z-score = no. of σ-units above (positive z) or
below (negative z) distribution mean μ
For the gestation example: z = (40 − 39) / 2 = 0.5
Steps 3 & 4: Sketch & Table
3. Sketch
4. Use Table A.1 to look up Pr(Z ≤ 0.5) = 0.6915
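Steps 1 to 4 can be sketched in a few lines. This assumes the standard-library `statistics.NormalDist` in place of the printed table, and reads N(39, 2) as mean 39 and standard deviation 2; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 39, 2      # gestational length: X ~ N(39, 2), sigma = 2 weeks
x = 40                 # question: Pr(X <= 40)?

# Step 2: standardize
z = (x - mu) / sigma

# Step 4: cumulative probability of the standard Normal at z
p = NormalDist().cdf(z)

print(z, round(p, 4))  # 0.5 0.6915
```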
Probabilities Between Points
a represents a lower boundary
b represents an upper boundary
Pr(a ≤ Z ≤ b) = Pr(Z ≤ b) - Pr(Z ≤ a)
Between Two Points
Pr(-2 ≤ Z ≤ 0.5) = Pr(Z ≤ 0.5) − Pr(Z ≤ -2)
0.6687 = 0.6915 − 0.0228
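The subtraction rule for probabilities between two points can be written as a small helper. This is a minimal sketch using the standard-library `statistics.NormalDist`; the function name `prob_between` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

def prob_between(a, b):
    """Pr(a <= Z <= b) = Pr(Z <= b) - Pr(Z <= a)."""
    return Z.cdf(b) - Z.cdf(a)

p = prob_between(-2, 0.5)
print(round(Z.cdf(0.5), 4), round(Z.cdf(-2), 4), round(p, 4))
# 0.6915 0.0228 0.6687
```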
Values Corresponding to Normal
Probabilities
1. State the problem
2. Find Z-score corresponding to percentile (Table B)
3. Sketch
4. Unstandardize:
$x = \mu + z_p \sigma$
z percentiles
e.g., What is the 97.5th
percentile on the
Standard Normal curve?
z.975 = 1.96
Notation: Let z_p
represent the z score
with cumulative
probability p,
e.g., z_.975 = 1.96
Step 1: State Problem
Question: What gestational length is smaller than 97.5% of
gestations?
• Let X represent gestational length
• We know from prior research that
X ~ N(39, 2)
−1.9 .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233
Unstandardize and sketch
$x = \mu + z_p \sigma = 39 + (-1.96)(2) = 35.08 \approx 35$
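The reverse lookup (probability to value) can be sketched the same way. A length smaller than 97.5% of gestations is the 2.5th percentile, so the cumulative probability used is 0.025. The sketch assumes the standard-library `NormalDist.inv_cdf` in place of the percentile table.

```python
from statistics import NormalDist

mu, sigma = 39, 2   # X ~ N(39, 2), sigma = 2 weeks
p = 0.025           # smaller than 97.5% of gestations -> 2.5th percentile

# Step 2: z score with cumulative probability p
z_p = NormalDist().inv_cdf(p)

# Step 4: unstandardize
x = mu + z_p * sigma

print(round(z_p, 2), round(x, 2))  # -1.96 35.08
```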
Negative skew shows upward curve on Q-Q plot
Positive Skew
The log transform normalizes the skew
Leptokurtic
Leptokurtic distributions show an S-shape on the Q-Q plot
Measures of Skewness and
Kurtosis
Skewness
Figure 1. Sketches showing general position of mean, median, and mode in a population.
- Fisher-Pearson’s Coefficient of Skewness
$g_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3 / n}{s^3}$
where: s has divisor n
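A minimal sketch of the g1 computation follows, with s computed using divisor n as stated above. The function name and the two small data sets are hypothetical and serve only to illustrate the sign of the coefficient.

```python
def fisher_pearson_g1(data):
    """Fisher-Pearson coefficient of skewness, with s using divisor n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # s^2 (divisor n)
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5                         # m3 / s^3

# Hypothetical data: one large value pulls the right tail out
right_skewed = fisher_pearson_g1([1, 2, 3, 4, 10])
symmetric = fisher_pearson_g1([1, 2, 3, 4, 5])

print(round(right_skewed, 4), symmetric)  # 1.1384 0.0
```

A positive g1 indicates positive (right) skew, a negative g1 indicates negative (left) skew, and a symmetric sample gives 0.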
Pearson’s Second Coefficient of
Skewness
• $Sk_2 = \frac{3(\bar{X} - \text{Median})}{s}$
Figure 6. 90% expected range for Pearson 2 skewness coefficient Sk2.
Galton Skewness Formula
• Galton skewness $= \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1}$
• where 𝑄1 is the lower quartile, 𝑄3 is the upper
quartile, and 𝑄2 is the median.
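The Galton formula needs only the three quartiles. The sketch below takes them as inputs rather than computing them, since quartile conventions differ between textbooks and software; the function name and the quartile values are hypothetical.

```python
def galton_skewness(q1, q2, q3):
    """Galton (quartile) skewness: (Q1 + Q3 - 2*Q2) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("Q3 and Q1 must differ")
    return (q1 + q3 - 2 * q2) / (q3 - q1)

# Hypothetical quartiles: median closer to Q1 means a longer right tail
right_skewed = galton_skewness(2, 3, 6)
# Median midway between Q1 and Q3
symmetric = galton_skewness(2, 4, 6)

print(right_skewed, symmetric)  # 0.5 0.0
```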
Measures of Kurtosis
High vs Low Kurtosis
Types of Kurtosis
Types of Kurtosis
Modules 9-11
Parametric Test vs. Non-Parametric Test
                              Parametric test                            Non-parametric test
Assumptions                   Yes, distributional assumptions are made   No distributional assumptions are made
Value for central tendency    The mean is the measure used               The median is the measure used
Used for                      Interval data                              Nominal data
However…
Tests of Hypothesis
Hypothesis
• A statement or tentative theory which aims to
explain facts about the real world
• An educated guess
• It is subject to testing. If it is found to be
statistically true, it is accepted. Otherwise, it is
rejected.
Kinds of Hypotheses
1. Null Hypothesis (Ho)
• It serves as the working hypothesis
• It is that which one hopes to accept or reject
• It must always express the idea of no
significant difference or relationship
Type I and Type II Errors
Level of Significance
Critical Region
The critical region (or rejection region) is the set of all values
of the test statistic that cause us to reject the null hypothesis.
Region of
rejection
Region of
acceptance
Critical Value
P - Value
The P-value (probability value) is the probability of
getting a value of the test statistic that is at least as
extreme as the one representing the sample data,
assuming that the null hypothesis is true. The null
hypothesis is rejected if the P-value is very small,
such as 0.05 or less.
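For a z test statistic, the P-value is a tail area of the standard Normal curve. The sketch below assumes the standard-library `statistics.NormalDist`; the function name `p_value` and its `tail` argument are hypothetical.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """Tail area at least as extreme as z under the standard Normal curve."""
    Z = NormalDist()
    if tail == "two":
        return 2 * (1 - Z.cdf(abs(z)))   # both tails
    if tail == "right":
        return 1 - Z.cdf(z)              # upper tail only
    if tail == "left":
        return Z.cdf(z)                  # lower tail only
    raise ValueError("tail must be 'two', 'right', or 'left'")

p_two = p_value(1.96)             # about 0.05
p_right = p_value(1.96, "right")  # about 0.025
print(round(p_two, 4), round(p_right, 4))
```

With a 0.05 significance level, z = 1.96 sits right at the boundary for a two-tailed test, which is why 1.96 appears as the critical value in the tables.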
Two-tailed, Right-tailed and
Left-tailed Tests
Two-tailed Tests
Given:
H0: μ = μ0 ; H1: μ ≠ μ0
Right – tailed Tests
Given:
H0: μ = μ0 ; H1: μ > μ0
Left – tailed Tests
Given:
H0: μ = μ0 ; H1: μ < μ0
Steps in Hypothesis
Testing
1. Formulate the null hypothesis (Ho) that there is no
significant difference between the items compared. State
the alternative hypothesis (Ha) which is used in case Ho
is rejected.
Steps in Hypothesis
Testing
5. Compute for z or t as needed. Vary your solutions using
the formulas:
➢ For z – test
i. Sample mean compared with a population mean
ii. Comparing two sample means
iii. Comparing two sample proportions
➢ For t – test
i. Sample mean compared with a population mean
ii. Comparing two sample means
Steps in Hypothesis Testing
6. Compare the computed value with its
corresponding tabular value, then state your
conclusions based on the following guidelines:
Decision Criterion
Traditional Method:
***Reject H0 (Accept H1 ) if the test
statistic falls within the critical region.
***Fail to reject H0 (Accept Ho) if the
test statistic does not fall within the critical
region.
Decision Criterion
P - value method:
Decision Criterion
Another option:
Instead of using a significance level
such as 0.05, simply identify the P-value and
leave the decision to the reader.
Z - TEST
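1. Sample Mean (X̄) Compared with a Population Mean (μ)
$Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma}$
Where:
X̄ – sample mean
μ – population mean
n – number of items in the sample
σ – population standard deviation
Comparing two sample means:
$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Where: $n_1$, $n_2$ – number of items in the first and second samples
Sample mean compared with a population mean (t-test):
$t = \frac{(\bar{x} - \mu)\sqrt{n - 1}}{s}$
Where:
x̄ – sample mean
μ – population mean
n – number of items in the sample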
T- TEST
5. Comparing Two Sample Means
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Where:
x̄1 – mean of the first sample
x̄2 – mean of the second sample
• $s^2 = \frac{\sum (d - \bar{d})^2}{n - 1}$
• $d = V_{after} - V_{before}$
• $n$ – no. of pairs
Example 1
Example 2
Example 6
It is known from the records of the city
schools that the standard deviation of math test
scores on ABC test is 5. A sample of 200 students
from the system was taken and it was found out that
the sample mean is 75. Previous tests showed the
population mean to be 70. Is it safe to conclude that
the sample is significantly different from the
population at 0.01 level?
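The computation for this example can be sketched as follows. It uses the one-sample Z formula from the Z-test section and assumes a two-tailed test, since the question asks whether the sample is "significantly different"; the critical value comes from the standard-library `NormalDist.inv_cdf` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, pop_mean = 75, 70
sigma, n = 5, 200
alpha = 0.01   # level of significance given in the example

# Z = (sample mean - population mean) * sqrt(n) / sigma
z = (sample_mean - pop_mean) * sqrt(n) / sigma

# Two-tailed critical value (assumption: H1 is "not equal")
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z) > z_crit
print(round(z, 2), round(z_crit, 3), reject_h0)  # 14.14 2.576 True
```

Because the computed z falls far inside the critical region, H0 is rejected under the traditional method.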
Example 7
• Two types of rice varieties are being considered for
yield and a comparison is needed. Thirty hectares were
planted with the rice varieties exposed to fairly uniform
conditions. The results are tabulated below:
                   Variety A      Variety B
Average yield      80 sack/hec    85 sack/hec
Sample variance    5.90           12.10
Is there a significant difference in the yield of the two
varieties at the 0.05 level of significance?
Example 8
A manufacturer of flashlight batteries claims
that the average life of his product will exceed 40
hours. A company is willing to buy a very large
shipment of batteries provided the claim is true. A
random sample of 36 batteries is tested, and it was
found out that the sample mean is 45 hours. If the
population of batteries has a standard deviation of 5
hours, is it likely that the batteries will be bought?
Example 9
A company is trying to decide which of two brands of tires
to buy for their trucks. They would like to adopt Brand C unless
there is some evidence that Brand D is better. An experiment was
conducted where 16 tires from each brand were used. The tires were
run under uniform conditions until they wore out. The results
are:
Brand C: X̄1 = 40,000 km    s1 = 5,400 km
Brand D: X̄2 = 38,000 km    s2 = 3,200 km
One-Way Analysis of Variance
Steps:
1. Compute for the sum of squares
$TSS = \sum x^2 - \frac{(\sum x)^2}{N}$
$SSB = \frac{(\sum x_1)^2}{n_1} + \frac{(\sum x_2)^2}{n_2} + \dots + \frac{(\sum x_r)^2}{n_r} - \frac{(\sum x)^2}{N}$
One-Way Analysis of Variance
$df_T = N - 1$
$df_B = v_1 = k - 1$
$df_W = v_2 = df_T - df_B = N - k$
One-Way Analysis of Variance
3. Compute for the mean sum of squares
$MSSB = \frac{SSB}{df_B}$
$MSSW = \frac{SSW}{df_W}$
4. Compute for the F – Ratio
$F = \frac{MSSB}{MSSW}$
Contingency Table for ANOVA
Sources of Variation    Sum of Squares    Degrees of Freedom (df)    Mean Sum of Squares    F – Ratio
Between Column          SSB               df_B = k − 1               MSSB
Exercise
1. The weights in kilograms of three groups of 5 members
each are shown in the table below. Is there unusual
variation among the groups? ( use 𝛼 = 0.05)
           Group
Members    A     B     C
1          50    60    53
2          48    40    55
3          55    50    40
4          50    60    40
5          46    52    47
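The one-way ANOVA steps can be sketched on the weights tabulated above. The sketch follows the TSS and SSB shortcut formulas from the steps; SSW is taken as TSS − SSB, which is the usual identity but is an assumption here because the text does not state the SSW formula. Variable names are illustrative.

```python
# Weights (kg) of the three groups from the exercise table
groups = {
    "A": [50, 48, 55, 50, 46],
    "B": [60, 40, 50, 60, 52],
    "C": [53, 55, 40, 40, 47],
}

all_values = [x for g in groups.values() for x in g]
N = len(all_values)   # total number of observations
k = len(groups)       # number of groups

# Step 1: sums of squares
correction = sum(all_values) ** 2 / N                  # (sum x)^2 / N
tss = sum(x * x for x in all_values) - correction
ssb = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
ssw = tss - ssb                                        # assumed identity

# Step 2: degrees of freedom
df_b = k - 1
df_w = N - k

# Steps 3 and 4: mean sums of squares and F ratio
mssb = ssb / df_b
mssw = ssw / df_w
f_ratio = mssb / mssw

print(round(tss, 2), round(ssb, 2), round(ssw, 2), round(f_ratio, 4))
```

The F ratio comes out at roughly 0.84, which would then be compared with the tabular F for (2, 12) degrees of freedom at α = 0.05, about 3.89 in standard F tables.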
Exercise
2. The following are the mileage obtained after several road tests were
run using 5 different kinds of gasoline on a Toyota Car.
           Group
Members    A      B      C      D
1          98     100    87     90
2          78     95     92     93
3          95     90     105    95
4          110    85     88     97
Chi – Square Test (𝜒2)
- Used to test significant difference or relationship
- Used if data are in frequencies (enumeration data)
USES:
1. to test the goodness of fit of a normal curve; that is, to
find out whether or not a sample distribution conforms
with the hypothetical normal distribution
2. to find out whether or not an observed proportion is
equal to some given ideal or expected proportion
3. to test the independence of one variable from another
variable.
Formulas:
i. For a 2 x 2 table (with Yates’ correction for continuity)
$\chi^2 = \sum \frac{(|OF - EF| - 0.5)^2}{EF}$
ii. Without the continuity correction
$\chi^2 = \sum \frac{(OF - EF)^2}{EF}$
$df = (r - 1)(c - 1)$
OF = Observed Frequencies
EF = Expected Frequencies
Expected Frequency:
         C                   D                   Total
A        (T_A × T_C)/Total   (T_A × T_D)/Total   T_A
B        (T_B × T_C)/Total   (T_B × T_D)/Total   T_B
Total    T_C                 T_D                 Total
Exercise
1. Test the hypothesis that educational attainment does not
depend on socio – economic status for the following 100
persons in a particular community.
Middle Class 28 25
Rich 14 5
Exercise
2. At 1% significance level, does college academic grade
depend on the high school NSAT results for the following
200 students?
Academic      NSAT Rating
Grade         Low    Average    High
Above 85      13     25         21
75 – 85       18     31         38
Below 75      14     20         20
Exercise
3. At ABC Company, there are 28 males and 32
females. Out of the 28 males, 10 hold executive
posts and the others do clerical work. Of the 32
females, only 5 hold executive positions and the
others do clerical work. Prepare a contingency
table, then test the hypothesis that position is
independent of sex.
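As a sketch of the mechanics, the 2 x 2 contingency table for this exercise can be built from the counts given (18 = 28 − 10 males and 27 = 32 − 5 females in clerical work), with expected frequencies from row total × column total / grand total and the Yates-corrected formula for a 2 x 2 table. Variable names are illustrative.

```python
# Rows: male, female; columns: executive, clerical
observed = [
    [10, 18],   # 28 males
    [5, 27],    # 32 females
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
expected = []
for i, row in enumerate(observed):
    expected_row = []
    for j, of in enumerate(row):
        ef = row_totals[i] * col_totals[j] / total   # expected frequency
        expected_row.append(ef)
        # Yates' correction for a 2 x 2 table
        chi2 += (abs(of - ef) - 0.5) ** 2 / ef
    expected.append(expected_row)

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, df, round(chi2, 4))
```

The statistic is about 2.23 with df = 1. The exercise does not state a significance level; assuming 0.05, the tabular value is 3.841, so the computed value would not fall in the critical region.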
Exercise
• 4. To determine whether type of personality is related
to academic performance, a random sample of 180
high school students from a certain college was taken
and the data are as follows:
             Low Average    Average    High Average
Introvert    35             30         25
Extrovert    31             23         36
Regression Analysis
- concerned with the problem of estimation and
forecasting
$y = a + bx$
Where:
𝑦 → predicted score
𝑎 → y – intercept
𝑏 → slope of the line
Regression Analysis
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
$a = \bar{y} - b\bar{x}$
Pearson correlation coefficient:
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
Range of Values: r = [-1, 1]
Interpretation:
Pearson r Qualitative Description
±1 Perfect Correlation
0 – ± 0.20 Negligible
Testing the Significance of r
$t = r\sqrt{\frac{n - 2}{1 - r^2}}$
Exercise
1. It is generally known that the number of road accidents is inversely
proportional to road width. The following data show the result of
a study indicating the number of accidents occurring per hundred
thousand vehicles.
Algebra (x) 75 80 93 65 87 71
Statistics (y) 82 78 86 72 91 80
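The regression and correlation formulas can be sketched on the Algebra (x) and Statistics (y) scores tabulated above. The sketch applies the computational formulas for b, a, and r and the t statistic for testing the significance of r; variable names are illustrative.

```python
from math import sqrt

x = [75, 80, 93, 65, 87, 71]   # Algebra scores
y = [82, 78, 86, 72, 91, 80]   # Statistics scores
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

# Slope and intercept of the regression line y = a + bx
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Pearson correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# t statistic for the significance of r, with n - 2 degrees of freedom
t = r * sqrt((n - 2) / (1 - r ** 2))

print(round(b, 4), round(a, 3), round(r, 4), round(t, 3))
```

For these six pairs the slope is about 0.52, the intercept about 40.67, r about 0.82, and t about 2.86 with 4 degrees of freedom.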
The End!!!