LESSON 7: Non-Parametric Statistics: Tests of Association & Test of Homogeneity
PRESENTATION E-MODULE
Learning Outcomes
After completing this lesson, you will be able to explore the commonly used
nonparametric tests, specifically to:
Introduction
Most of the statistical inference procedures we have discussed up to this point are classified
as parametric statistics. One exception is our use of chi-square—as a test of goodness-of-fit
and as a test of independence. These uses of chi-square come under the heading of
nonparametric statistics. The obvious question now is, “What is the difference?” To answer,
let us recall the nature of the inferential procedures that we have categorized as parametric.
In each case, our interest was focused on estimating or testing a hypothesis about one or
more population parameters. Furthermore, central to these procedures was a knowledge of
the functional form of the population from which were drawn the samples providing the basis
for the inference.
Strictly speaking, only those procedures that test hypotheses that are not statements about
population parameters are classified as nonparametric, while those that make no
assumption about the sampled population are called distribution-free procedures. Despite
this distinction, it is customary to use the terms nonparametric and distribution free
interchangeably and to discuss the various procedures of both types under the heading
nonparametric statistics.
Lecture Notes
Parametric Statistics
o It is an approach that assumes a random sample from a normal distribution and
involves hypothesis testing about a population parameter.
o The variables being analyzed must be measured in at least the interval scale so that
the four arithmetic operations can be applied without problems.
o Assumptions:
The samples are from normally distributed populations.
The samples are from populations with equal variances (homoscedasticity).
The observations are sampled randomly from clearly defined populations.
Nonparametric Statistics
o These are procedures that test hypotheses that are not statements about
population parameters.
o These are distribution-free procedures: those that make no assumption about the
sampled population.
o They are generally used with data measured on at least an ordinal scale.
o Note: The terms nonparametric and distribution-free are used interchangeably to
discuss the various procedures of both types under the heading nonparametric
statistics.
o Advantages:
They allow for the testing of hypotheses that are not statements about
population parameter values (for example, chi-square tests of independence).
These tests may be used when the form of the sampled population is
unknown.
These procedures may be applied when the data being analyzed consist
merely of ranking or classifications. (The data may not be based on a
measurement scale strong enough to allow the arithmetic operations
necessary for carrying out parametric procedures.)
Methods are available for nominal and ordinal data.
They may be the only type of test applicable when the sample size is very small.
They make fewer assumptions about the data.
They are generally easier to apply, and the interpretation of results is more
direct than with parametric tests.
o Disadvantages:
The use of nonparametric procedures with data that can be handled with a
parametric procedure results in a waste of data.
The application of some of the nonparametric tests may be laborious for
large samples.
I. Chi-square Test
It is used when variables of interest are qualitative variables with mutually exclusive and
collectively exhaustive categories.
The qualitative data used are the frequencies (observed and expected frequencies)
associated with each category of the variables under study.
It compares the observed frequency of elements falling in different categories with the
expected frequency if the null hypothesis were true.
It is only applicable when the data in the contingency table has sufficiently large expected
frequencies.
o For a 2 x 2 table, the requirement is for all expected frequencies to be greater than
or equal to 5.
o If this requirement is not met, Fisher’s Exact Probability Test should be used.
The χ² Test of Homogeneity is used to find whether two or more populations have
the same proportions for the different categories of another variable.
When there are only two populations and the variable of interest has only two
categories, the χ² Test of Homogeneity may be used interchangeably with the z-test
for two proportions.
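The equivalence with the z-test can be checked numerically. The sketch below is only an illustration: the counts are hypothetical, the function names are made up for this example, and the chi-square is the uncorrected (no continuity correction) version. Under those assumptions, the square of the pooled z statistic equals the chi-square statistic of the corresponding 2 x 2 table.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (illustrative helper)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts: 30 of 100 in population 1 and 45 of 100 in population 2.
x1, n1, x2, n2 = 30, 100, 45, 100
z = two_proportion_z(x1, n1, x2, n2)
chi2 = chi_square_2x2(x1, n1 - x1, x2, n2 - x2)
print(z ** 2, chi2)  # the two values agree up to rounding error
```

Because the two statistics carry the same information, the two-sided z-test and the 1-degree-of-freedom chi-square test lead to the same p-value.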
Test Statistic
Problem 1
Kodama et al. (1991) studied the relationship between age and several prognostic
factors in squamous cell carcinoma of the cervix. The following data were collected.
May we conclude that the populations represented by the four age-group samples are
not homogeneous with respect to cell type? Use α = 0.05.
Null and alternative hypothesis
Ho: The four age-group populations are homogeneous with respect to cell type.
Ho | Ha
There is no association between the first and second variable. | There is an association between the first and second variable.
The first variable is not associated with the second variable. | The first variable and second variable are associated.
The first and second variable are independent. | The first and second variable are not independent.
Data Structure
o two variables with at least two categories each
o categories of one variable as rows and the other one as columns
Test Statistic
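A minimal sketch of how the chi-square statistic can be computed from a contingency table: each expected frequency is (row total x column total) / grand total, and the statistic sums (observed - expected)^2 / expected over all cells. The table below is hypothetical and the function names are illustrative; the p-value helper only covers the 2 x 2 case (1 degree of freedom).

```python
import math


def chi_square_test(table):
    """Return the Pearson chi-square statistic, its df, and the expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency of a cell = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [[10, 20], [20, 10]]  # hypothetical 2 x 2 table
chi2, df, expected = chi_square_test(observed)
# Flag tables that violate the "all expected frequencies >= 5" requirement
has_small_cells = any(e < 5 for row in expected for e in row)
print(chi2, df, p_value_df1(chi2), has_small_cells)
```

When `has_small_cells` is true for a 2 x 2 table, the notes above recommend Fisher's Exact Probability Test instead.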
Problem 1
Null and alternative hypothesis
Ho: There is no association between smoking status and type of school. / Smoking status is not associated with the type of school.
Ha: There is an association between smoking status and type of school. / Smoking status and type of school are associated.
Level of significance: α = 0.01
Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$ (Chi-square Test of Association), where O and E are the observed and expected frequencies
Critical region: Not applicable for computer output.
Computations
Statistical decision: Since the p-value (<0.00001) is less than α (0.01), reject Ho.
Conclusion: There is an association between smoking status and type of school. / Smoking status and type of school are associated.
Fisher's Exact Test may be used when the sample size requirements of the
chi-square test are not met.
The data are arranged in the form of a 2 x 2 contingency table.
Assumptions:
Sample Problem 1
Data on the preference of 11 female and 8 male office workers are shown below. Is there a
difference between female and male office workers with respect to jogging as the preferred
exercise? Let α = 0.05.
. tabi 4 7 \ 6 2, exact

           |          col
       row |     1      2 |  Total
-----------+--------------+-------
         1 |     4      7 |     11
         2 |     6      2 |      8
-----------+--------------+-------
     Total |    10      9 |     19
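The exact probabilities for this table can be obtained from the hypergeometric distribution. The following is a sketch rather than a reproduction of the software's algorithm: the function name is illustrative, and the two-sided p-value uses one common convention (summing the probabilities of all tables that are no more likely than the observed one, with the margins held fixed).

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Exact p-values for the 2 x 2 table [[a, b], [c, d]] with fixed margins."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)  # number of ways to fill column 1
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Unnormalized hypergeometric weight for each possible value of cell a
    weights = {
        k: comb(row1, k) * comb(row2, col1 - k)
        for k in range(lowest, highest + 1)
    }
    observed = weights[a]
    two_sided = sum(w for w in weights.values() if w <= observed) / total
    lower_tail = sum(w for k, w in weights.items() if k <= a) / total
    upper_tail = sum(w for k, w in weights.items() if k >= a) / total
    return two_sided, lower_tail, upper_tail


two_sided, lower_tail, upper_tail = fisher_exact_2x2(4, 7, 6, 2)
print(round(two_sided, 4))  # 0.1698
```

Comparing integer weights rather than floating-point probabilities avoids rounding problems when deciding which tables are "as extreme" as the observed one.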
The sign test gets its name from the fact that pluses and minuses, rather than
numerical values, provide the raw data used in the calculations.
When to use: to test a hypothesis about a population median, that is, to know whether
sample measurements tend to fall above or below the hypothesized median.
Data: consist of a single random sample X1, X2, …Xn of size n from a population with unknown
median, M.
Assumption: The distribution of the variable of interest is continuous, measured on at least
an ordinal scale. This rules out the use of nominal data.
Test statistic
o It is either the observed number of plus signs or the observed number of minus
signs.
o The nature of Ha determines which of these test statistics is appropriate.
o In a given test, any one of the following Ha is possible:
Ha: P(+) > P(-) one-sided alternative
Ha: P(+) < P(-) one-sided alternative
Ha: P(+) ≠ P(-) two-sided alternative
Problem 1
A random sample of 15 student nurses was given a test to measure their level of
authoritarianism with the following results. The investigators believed that the scores
achieved the level of an ordinal scale. Assume that the measurements are taken on a
continuous variable. Test at the 0.05 level of significance the H0 that the median score of the
sampled population is 100.
Sign test

        sign |  observed  expected
    positive |         6       7.5
    negative |         9       7.5
        zero |         0         0
         all |        15        15
One-sided tests:
Ho: median of amscore - 100 = 0 vs.
Ha: median of amscore - 100 > 0
Pr(#positive >= 6) =
Binomial(n = 15, x >= 6, p = 0.5) = 0.8491
Two-sided test:
Ho: median of amscore - 100 = 0 vs.
Ha: median of amscore - 100 != 0
Pr(#positive >= 9 or #negative >= 9) =
min(1, 2*Binomial(n = 15, x >= 9, p = 0.5)) = 0.6072
5. Statistical Decision: Since the p-value (0.6072) is greater than α (0.05), do not reject H0.
6. Conclusion: There is insufficient evidence to say that the median authoritarianism
score for the sampled population is not 100.
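The p-values in the output above come from the binomial distribution with p = 0.5. A small sketch, assuming zero differences are dropped before counting and using illustrative function names, reproduces them from the counts of plus and minus signs:

```python
from math import comb


def binomial_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def sign_test(n_positive, n_negative):
    """Return p-values for Ha: median > M0, Ha: median < M0, and Ha: median != M0."""
    n = n_positive + n_negative  # zero differences are assumed to be excluded
    p_greater = binomial_upper_tail(n, n_positive)
    p_less = binomial_upper_tail(n, n_negative)
    p_two_sided = min(1.0, 2 * binomial_upper_tail(n, max(n_positive, n_negative)))
    return p_greater, p_less, p_two_sided


# Counts from the sign test output: 6 positive and 9 negative differences
p_greater, p_less, p_two_sided = sign_test(6, 9)
print(round(p_greater, 4), round(p_two_sided, 4))  # 0.8491 0.6072
```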
When to use
o for analysis when data are measured on at least an interval scale
o makes use of the magnitude of the differences between measurements and a
hypothesized location parameter rather than just the signs of the differences
Assumptions
o The sample is random.
o The variable is continuous.
o The population is symmetrically distributed about its mean.
o The measurement scale is at least interval.
Test statistic
o It is either T+ (sum of ranks with positive signs) or T- (sum of ranks with negative
signs), depending on the nature of Ha.
o When Ha is two-sided (µ ≠ µ0), either a sufficiently small value of T+ or a sufficiently
small value of T- will cause us to reject H0: µ = µ0.
Problem 1
. signrank cardiaco=5.05

        sign |  obs  sum ranks  expected
    positive |   10         86        60
    negative |    5         34        60
        zero |    0          0         0
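The rank sums T+ and T- are obtained by ranking the absolute differences from the hypothesized value and then adding the ranks separately for positive and negative differences. The sketch below assumes two common conventions that the notes do not spell out: zero differences are dropped, and tied absolute differences receive the average of the ranks they occupy. The data are hypothetical; only the hypothesized value 5.05 is taken from the command above.

```python
def signed_rank_sums(values, hypothesized):
    """Return (T+, T-) for the Wilcoxon signed-rank test."""
    # Differences from the hypothesized value; zeros are dropped (assumption)
    diffs = [v - hypothesized for v in values if v != hypothesized]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus


# Hypothetical measurements tested against the hypothesized value 5.05
sample = [4.91, 4.10, 6.74, 7.27, 7.42, 7.50, 6.56, 4.64]
t_plus, t_minus = signed_rank_sums(sample, 5.05)
n_used = sum(v != 5.05 for v in sample)
print(t_plus, t_minus, t_plus + t_minus == n_used * (n_used + 1) / 2)
```

A quick sanity check is that T+ and T- always add up to n(n + 1)/2, where n is the number of nonzero differences.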
When to use
o It may be used with paired data under circumstances in which it is not
appropriate to use the paired-comparisons test.
o In such cases obtain each of the n di values, the difference between each of
the n pairs of measurements.
If we let µD denote the mean of a population of such differences, we may follow the
procedure described above to test any of the following H0:
o H0 : µD = 0
o H0 : µD ≥ 0
o H0 : µD ≤ 0
Problem 1
In a study by Davis et al. (1988), maternal language directed toward children with mental
retardation and children matched either for language ability or chronological age was
compared in free-play and instruction situations. Results were consistent with the hypothesis
that mothers of children with retardation match their verbal behavior to their children’s
language ability. Among the data collected were the measurements on the number of
utterances per minute during free-play by mothers of children with mental retardation (A)
and mothers of age-matched children who were not mentally retarded (B).
The number of utterances per minute during free-play by mothers with mentally retarded
children (A) and mothers of age-matched children who were not mentally retarded (B) are
shown below. Can we conclude on the basis of these data that there is a difference in the
median number of utterances per minute during free-play of the two groups of mothers? Let
α = 0.01.
        sign |  obs  sum ranks  expected
    positive |   10         55      27.5
    negative |    0          0      27.5
        zero |    0          0         0
         all |   10         55        55
5. Statistical Decision: Since the p-value (0.0051) is less than α (0.01), reject H0.
6. Conclusion: There is a difference in the median number of utterances per
minute during free-play by mothers with mentally retarded
children and mothers with age-matched children who are not
mentally retarded. Mothers of children with mental retardation
had a higher median number of utterances per minute.
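The reported p-value is consistent with the large-sample normal approximation to the signed-rank statistic. The sketch below assumes no adjustment for ties or zero differences and uses an illustrative function name; under H0 the expected value of T+ is n(n + 1)/4 and its variance is n(n + 1)(2n + 1)/24.

```python
import math


def signed_rank_normal_approx(t_plus, n):
    """Two-sided p-value for T+ using the normal approximation (no tie adjustment)."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    z = (t_plus - mean) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return mean, z, p_two_sided


# Values from the output: all 10 differences are positive, so T+ = 55
mean, z, p_two_sided = signed_rank_normal_approx(t_plus=55, n=10)
print(mean, round(z, 3), round(p_two_sided, 4))
```

With n = 10 the expected rank sum is 27.5, which matches the "expected" column of the output. For samples this small an exact p-value based on the permutation distribution of the ranks is also commonly reported.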
When to use: may be used to test the H0 that two independent samples have been
drawn from populations with equal medians
Assumptions
o The samples are selected independently and at random from their respective
populations.
o The populations are of the same form, differing only in location.
o The variable of interest is continuous.
The level of measurement must be at least ordinal.
The two samples do not have to be of equal size.
Test statistic
o It is χ² as computed for a 2 x 2 contingency table where a, b, c and d are the
observed cell frequencies.
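The 2 x 2 table is built by pooling the two samples, finding the median of the combined data, and counting how many observations in each sample fall above it. A minimal sketch with hypothetical scores follows; the function name is illustrative, and values equal to the combined median are counted as "not greater", which is one of several possible conventions.

```python
from statistics import median


def median_test_table(sample1, sample2):
    """Return [[not_above_1, not_above_2], [above_1, above_2]]."""
    combined_median = median(sample1 + sample2)
    above1 = sum(x > combined_median for x in sample1)
    above2 = sum(x > combined_median for x in sample2)
    return [
        [len(sample1) - above1, len(sample2) - above2],  # row "no"
        [above1, above2],                                # row "yes"
    ]


# Hypothetical scores for two independent samples
group1 = [12, 15, 11, 18, 14, 20]
group2 = [22, 19, 25, 17, 24, 21]
table = median_test_table(group1, group2)
print(table)
```

The cell frequencies a, b, c and d of the resulting table are then passed to the chi-square formula for a 2 x 2 table.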
Median test

   Greater |
  than the |       hosp
    median |     1      2 |  Total
-----------+--------------+-------
        no |     2     13 |     15
       yes |    13      2 |     15
-----------+--------------+-------
     Total |    15     15 |     30
Continuity corrected:
Pearson chi2(1) = 13.3333 Pr = 0.000
5. Statistical Decision: Since the p-value (<0.0001) is less than α (0.05), reject H0.
6. Conclusion: The population median scores on the level of care of the two
hospitals are not equal / are different.
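The continuity-corrected statistic in the output can be reproduced from the four cell frequencies. This sketch uses the usual Yates-corrected formula for a 2 x 2 table, n(|ad - bc| - n/2)² / [(a + b)(c + d)(a + c)(b + d)]; the function name is illustrative, and the p-value line applies only to 1 degree of freedom.

```python
import math


def corrected_chi_square_2x2(a, b, c, d):
    """Continuity-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # The correction is capped at zero so it never inflates the statistic
    adjusted = max(0.0, abs(a * d - b * c) - n / 2)
    return n * adjusted**2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Cell frequencies from the median test table
chi2_corrected = corrected_chi_square_2x2(2, 13, 13, 2)
p_value = math.erfc(math.sqrt(chi2_corrected / 2))  # chi-square upper tail, df = 1
print(round(chi2_corrected, 4), p_value < 0.0001)
```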
When to use
o It is used for testing the H0 of equal population location parameters for two
independent samples.
o It is based on the ranks of observations.
o It is sometimes called Mann-Whitney-Wilcoxon test or Wilcoxon Rank Sum test.
Assumptions
Test statistic
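$T = S - \frac{n(n + 1)}{2}$
where S is the sum of the ranks assigned to the observations of the first sample
and n is the number of observations in that sample.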
Problem 1
       group |  obs  rank sum  expected
           1 |   30       577       840
           2 |   25       963       700
5. Statistical Decision: Since the p-value (<0.0001) is less than α (0.05), reject H0.
6. Conclusion: There is a difference in the population median hemoglobin
levels of the cadmium oxide exposed animals and unexposed
animals. Those animals exposed to cadmium oxide had lower
hemoglobin levels.
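The decision above can be checked against the rank sums with the large-sample normal approximation. The sketch assumes no adjustment for ties and uses an illustrative function name; under H0 the rank sum of group 1 has expected value n1(N + 1)/2 and variance n1 n2 (N + 1)/12, where N = n1 + n2.

```python
import math


def rank_sum_normal_approx(rank_sum_1, n1, n2):
    """Expected rank sum, z statistic, and two-sided p-value (no tie adjustment)."""
    total = n1 + n2
    expected = n1 * (total + 1) / 2
    variance = n1 * n2 * (total + 1) / 12
    z = (rank_sum_1 - expected) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return expected, z, p_two_sided


# Rank sum and sample sizes from the output: group 1 has 30 obs, group 2 has 25
expected_sum, z_value, p_two_sided = rank_sum_normal_approx(577, 30, 25)
print(expected_sum, round(z_value, 2), p_two_sided < 0.0001)
```

The expected rank sum of 840 agrees with the "expected" column of the output, and the negative z reflects that group 1 ranks lower than expected under H0.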
When to use: when the assumptions underlying one-way analysis of variance are
not met
Assumptions
o The samples are independent random samples from their respective
populations.
o The measurement scale employed is at least ordinal.
Test Statistic
Hypotheses
o H0: All of the k population medians are equal.
o Ha: At least one of k population medians is different.
Problem 1
In a study of pulmonary effects on guinea pigs, Lacroix et al. (2002) exposed ovalbumin
(OA)-sensitized guinea pigs to regular air, benzaldehyde or acetaldehyde. At the end of
exposure, the guinea pigs were anesthetized and allergic responses were assessed in
bronchoalveolar lavage (BAL). One of the outcome variables examined was count of
eosinophil cells. Can we conclude that the three populations represented by three samples
differ with respect to eosinophil cell count?
       group |  obs  rank sum
           1 |    5     47.00
           2 |    5     16.00
           3 |    5     57.00
5. Statistical Decision: Since the p-value (0.0104) is less than α (0.05), reject H0.
6. Conclusion: There is a difference in the population median eosinophil
counts of ovalbumin-sensitized guinea pigs exposed to regular
air, benzaldehyde and acetaldehyde.
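The p-value above can be reproduced from the rank sums with the usual Kruskal-Wallis statistic, H = 12 / (N(N + 1)) x Σ(R_j² / n_j) - 3(N + 1), referred to a chi-square distribution with k - 1 degrees of freedom. This is a sketch under stated assumptions: no tie correction is applied, the function name is illustrative, and the closed-form p-value used here holds only for 2 degrees of freedom (three groups).

```python
import math


def kruskal_wallis_from_rank_sums(rank_sums, sizes):
    """Return (H, df) from group rank sums and group sizes (no tie correction)."""
    n = sum(sizes)
    weighted = sum(r * r / k for r, k in zip(rank_sums, sizes))
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h, len(sizes) - 1


# Rank sums from the output: three groups of 5 guinea pigs each
h, df = kruskal_wallis_from_rank_sums([47, 16, 57], [5, 5, 5])
# For df = 2 the chi-square upper tail is exp(-H / 2)
p_value = math.exp(-h / 2)
print(round(h, 2), df, round(p_value, 4))  # 9.14 2 0.0104
```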
References
Asaad, AS. (2011). Simplified Biostatistics. Manila: REX Book Store
Biostat 201 Lecture Notes 1st Sem, SY 2017-2018
Biostat 301 Lecture Notes 1st Sem, SY 2019-2020 by Tolabing, MC, Asaad, AS &
Mira, NC