One Way ANOVA
By
Mohammed Nawaiseh
One-Way ANOVA ("analysis of variance") compares the means of two or more independent groups in order to determine whether there is
statistical evidence that the associated population means are significantly different.
● One-Way ANOVA is a parametric test.
● It is used for testing whether 3(+) population means are all equal.
This test is also known as:
● One-Factor ANOVA /// One-Way Analysis of Variance /// Between Subjects ANOVA
The variables used in this test are known as:
● Dependent variable
● Independent variable (also known as the grouping variable, or factor)
● This variable divides cases into two or more mutually exclusive levels, or groups
The One-Way ANOVA is commonly used to test the following:
● Statistical differences among the means of two or more groups or two or more interventions or two or more change scores
Note: Both the One-Way ANOVA and the Independent Samples t Test can compare the means for two groups. However, only the One-Way
ANOVA can compare the means across three or more groups.
Note: If the grouping variable has only two groups, then the results of a one-way ANOVA and the independent samples t test will be equivalent. In
fact, if you run both an independent samples t test and a one-way ANOVA in this situation, you should be able to confirm that t² = F.
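The t² = F identity can be checked directly. Below is a minimal sketch in pure Python with made-up data (not SPSS output): a pooled-variance t test and a one-way ANOVA computed from scratch on the same two groups.

```python
# Sketch: for two groups, the squared independent-samples t statistic
# equals the one-way ANOVA F statistic. Data are invented for illustration.
group_a = [5.0, 6.1, 7.2, 5.8, 6.4]
group_b = [7.9, 8.3, 7.1, 8.8, 7.6]

def mean(xs):
    return sum(xs) / len(xs)

def pooled_t(a, b):
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)            # pooled variance
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def one_way_f(groups):
    all_x = [x for g in groups for x in g]
    grand = mean(all_x)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_x) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

t = pooled_t(group_a, group_b)
F = one_way_f([group_a, group_b])
print(abs(t ** 2 - F) < 1e-9)  # True: t^2 = F when there are two groups
```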
Notes
● (B) Factor: The independent variable. The categories (or groups) of the independent variable will
define which samples will be compared. The independent variable must have at least two
categories (groups), but usually has three or more groups when used in a One-Way ANOVA.
● (C) Contrasts: (Optional) Specify contrasts, or planned comparisons, to be conducted after the
overall ANOVA test.
a. When the initial F test indicates that significant differences exist between group means,
contrasts are useful for determining which specific means are significantly different when
you have specific hypotheses that you wish to test. Contrasts are decided before
analyzing the data (i.e., a priori).
● (E) Options: Clicking Options will produce a window where you can specify which Statistics to
include in the output (Descriptive, Fixed and random effects, Homogeneity of variance test,
Brown-Forsythe, Welch), whether to include a Means plot, and how the analysis will address
Missing Values (i.e., Exclude cases analysis by analysis or Exclude cases listwise). Click
Continue when you are finished making specifications.
Post Hoc
● Equal Variances Assumed → Tukey (preferred), LSD, or Bonferroni
● Equal Variances Not Assumed → Games-Howell
Notes
● (D) Post Hoc: (Optional) Request post hoc (also known as multiple comparisons) tests.
Specific post hoc tests can be selected by checking the associated boxes.
○ (1) Equal Variances Assumed: Multiple comparisons options that assume
homogeneity of variance (each group has equal variance).
○ (2) Test: By default, a 2-sided hypothesis test is selected. Alternatively, a
directional, one-sided hypothesis test can be specified if you choose to use a
Dunnett post hoc test.
■ Click the box next to Dunnett and then specify whether the Control
Category is the Last or First group, numerically, of your grouping
variable. In the Test area, click either “< Control” or “> Control”.
■ The one-tailed options require that you specify whether you predict that
the mean for the specified control group will be less than (> Control) or
greater than (< Control) another group.
○ (3) Equal Variances Not Assumed: Multiple comparisons options that do not
assume equal variances.
○ (4) Significance level: The desired cutoff for statistical significance. By default,
significance is set to 0.05.
● When the initial F test indicates that significant differences exist between group means,
post hoc tests are useful for determining which specific means are significantly different
when you do not have specific hypotheses that you wish to test. Post hoc tests compare
each pair of means (like t-tests), but unlike t-tests, they correct the significance estimate to
account for the multiple comparisons.
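The need for this correction is easy to see with a little arithmetic. A quick sketch (the group count k = 4 matches the medicine example; the rest is generic):

```python
# Sketch: why post hoc tests must correct for multiple comparisons.
# With k groups there are k*(k-1)/2 pairwise tests; running each at
# alpha = 0.05 inflates the family-wise error rate well above 0.05.
k = 4                                   # number of groups
n_comparisons = k * (k - 1) // 2        # 6 distinct pairs
alpha = 0.05
# If the tests were independent, the chance of at least one false positive:
familywise_error = 1 - (1 - alpha) ** n_comparisons
bonferroni_alpha = alpha / n_comparisons  # adjusted per-test cutoff

print(n_comparisons)                # 6
print(round(familywise_error, 3))   # 0.265
print(round(bonferroni_alpha, 4))   # 0.0083
```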
OUTPUT
● The Means plot is a visual representation of what we saw in the
Compare Means output. The points on the chart are the average of
each group. It's much easier to see from this graph that the current
smokers had the slowest mean sprint time, while the nonsmokers
had the fastest mean sprint time.
● Means plot is displayed.
○ Graph → chart builder → line → simple lines
ANOVA Assumptions
● Independent observations often holds if each case (row of cells in SPSS) represents a unique person or other statistical unit. That is, we usually don't
want more than one row of data for one person, which holds for our data;
● Normally distributed variables in the population seems reasonable if we look at the histograms we inspected earlier. Besides, violation of the normality
assumption is no real issue for larger sample sizes due to the central limit theorem.
○ Skewness and kurtosis (the values should be between −1 and +1 for the data to be considered approximately normally distributed)
○ Our data meet this criterion
● Homogeneity means that the population variances of BDI in each medicine group are all equal, reflected in roughly equal sample variances. Again, our
split histogram suggests this is the case but we'll try and confirm this by including Levene's test when running our ANOVA.
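The skewness/kurtosis screen mentioned above can be reproduced by hand. Below is a minimal sketch with illustrative data, using the plain moment-based formulas (SPSS reports slightly different bias-adjusted estimators, so values will not match SPSS output exactly):

```python
# Sketch: a quick normality screen via skewness and excess kurtosis.
# Rule of thumb: values between -1 and +1 suggest approximate normality.
# Data are illustrative, not from the BDI example.
data = [3.1, 3.4, 3.3, 3.6, 3.2, 3.5, 3.0, 3.7, 3.4, 3.3]

n = len(data)
m = sum(data) / n
m2 = sum((x - m) ** 2 for x in data) / n   # second central moment (variance)
m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
m4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment

skewness = m3 / m2 ** 1.5          # 0 for a symmetric distribution
excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution

print(-1 < skewness < 1 and -1 < excess_kurtosis < 1)  # True for this sample
```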
Running ANOVA
● There are many ways to run the exact same ANOVA in SPSS. Today,
we'll go for General Linear Model because it creates nicely detailed
output.
● The post hoc test we'll run is Tukey’s HSD (Honestly Significant
Difference), denoted as “Tukey”.
Options
● “Estimates of effect size” refers to partial eta squared. “Homogeneity tests” includes
Levene’s test for equal variances in our output.
SPSS ANOVA Output - Levene’s Test
● Levene’s Test checks if the population variances of BDI for the four medicine groups are all equal, which is
a requirement for ANOVA.
● As a rule of thumb, we reject the null hypothesis if p (or “Sig.”) < 0.05.
● In our case, p = 0.949, so we do not reject the null hypothesis of equal variances (or homogeneity). We
assume the population variances are all equal, so this ANOVA assumption is met by our data.
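Levene's test itself is nothing more than a one-way ANOVA run on the absolute deviations of each score from its group mean (the Brown-Forsythe variant uses group medians instead). A sketch with made-up data; the p value, which requires the F distribution, is omitted:

```python
# Sketch: Levene's test statistic = one-way ANOVA F computed on the
# absolute deviations from each group's mean. Data are invented.
groups = [
    [20.1, 21.3, 19.8, 20.5],
    [22.0, 21.5, 22.4, 21.9],
    [18.9, 19.4, 19.1, 18.7],
]

def mean(xs):
    return sum(xs) / len(xs)

def one_way_f(samples):
    all_x = [x for g in samples for x in g]
    grand = mean(all_x)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in samples)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in samples)
    df_b = len(samples) - 1
    df_w = len(all_x) - len(samples)
    return (ss_between / df_b) / (ss_within / df_w)

# Replace each observation with its absolute deviation from the group mean,
# then run an ordinary one-way ANOVA on those deviations:
abs_dev = [[abs(x - mean(g)) for x in g] for g in groups]
levene_W = one_way_f(abs_dev)
print(levene_W)  # compare to an F(k-1, N-k) critical value for a p value
```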
SPSS ANOVA Output - Between Subjects Effects
1. Our null hypothesis is that the population means are equal for all medicines administered. P (“Sig.”) is reported as .000, i.e. p < 0.001, well
below 0.05, so we reject this hypothesis: the population means are not all equal. Some medicines result in lower mean BDI scores
than other medicines.
2. The different medicines administered account for some 39% of the variance in the BDI scores. This is the effect size as
indicated by partial eta squared.
3. Partial Eta Squared is the Sum of Squares for medicine divided by the corrected total sum of squares (5): 2780 / 7071 =
0.39.
4. Sum of Squares Error represents the variance in BDI scores not accounted for by medicine. Note that (3) + (4) = (5).
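The effect-size arithmetic from these points can be checked directly (the sums of squares below are the values quoted in the example):

```python
# Sketch: partial eta squared from the Between-Subjects Effects table.
# In a one-factor design, SS_effect + SS_error = corrected total SS,
# so partial eta squared equals SS_effect / corrected total SS.
ss_medicine = 2780.0            # effect sum of squares (from the text)
ss_total = 7071.0               # corrected total sum of squares (from the text)
ss_error = ss_total - ss_medicine

partial_eta_sq = ss_medicine / (ss_medicine + ss_error)
print(round(partial_eta_sq, 2))  # 0.39 -> medicine explains ~39% of variance
```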
SPSS ANOVA Output - Multiple Comparisons
● Tukey’s HSD
● Right, now comparing 4 means results in (4 − 1) × 4 × 0.5 = 6
distinct comparisons, each of which is listed twice in this
table. There are three ways to tell which means are likely to
be different:
● (1) Statistically significant mean differences are flagged with
an asterisk (*). For instance, the very first line tells us that
“None” has a mean BDI score 6.7 points higher than the
placebo, which is quite a lot since BDI scores can
range from 0 through 63.
● (2) As a rule of thumb, “Sig.” < 0.05 indicates a statistically
significant difference between two means.
● (3) A confidence interval not including zero means that a
zero difference between these means in the population is
unlikely.
Results
● Only one pair of means does not differ significantly.
○ Homeopathic with placebo
● That means that the Homeopathic Treatment acts like a
placebo, because both decreased the mean of the
depression score similarly.
Tutorial (3)
● File → weights.sav
● Medical statistics book → P 113 to 1
● The spreadsheet weights.sav contains the data from a population sample of 550 term babies who had their weight recorded
at 1 month of age. The babies also had their parity recorded, that is their birth order in their family
● Q → Are the weights of babies related to their parity?
● Variables
○ Outcome (DV) variable = weight (continuous)
○ Explanatory (IV) variable = parity (categorical, four groups)
Analysis
● Descriptive statistics
● Assumptions
○ Normality
● Running the one-way ANOVA
○ homogeneity of variances
● Post-hoc tests
● Means plot
● Reporting the results
Descriptive statistics
● Analyze →Descriptive Statistics →Frequencies
● Sample size assumption
○ The Frequency table shows that the sample size of each group is large in that all cells have more than 30 participants. The cell
size ratio (smallest cell to largest cell) is 62:192, or about 1:3, and does not violate the ANOVA assumptions
○ ANOVA will be robust to some degree of non-normality, outliers and unequal variances
Normality Assumption
● Analyze→Descriptive Statistics→ Explore
● The Descriptives table shows that means and medians for weight in each group are approximately equal and the values for skewness and
kurtosis are all between +1 and −1 suggesting that the data are close to normally distributed. The variances in each group are 0.384,
0.351, 0.366 and 0.287 respectively. The variance ratio between the lowest and highest values is 0.287:0.384 which is 1:1.3.
● Shapiro–Wilk → only the one-sibling group does not conform to normality (P < 0.05)
● Histograms
○ confirm the tests of normality and show that the distribution for babies with one sibling has slightly spread tails
● Normal Q–Q plots
○ have small deviations at the extremities
○ normal Q–Q plot for babies with one sibling deviates slightly from normality at both extremities. Although the histogram for babies
with three or more siblings is not classically bell shaped, the normal Q–Q plot suggests that this distribution conforms to
normality
Outliers Assumption
● there are two outlying values, one in the group of babies with one sibling and one in the group of babies with two siblings.
● It is unlikely that these outlying values, which are also univariate outliers, will have a large influence on the summary statistics and ANOVA result
because the sample size of each group is large. However, the outliers should be confirmed as correct values and not data entry or data recording
errors.
● Once they are verified as correctly recorded data points, the decision to include or omit outliers from the analyses is the same as for any other statistical
tests. In a study with a large sample size, it is expected that there will be a few outliers.
● In this data set, the outliers will be retained in the analyses and the extreme residuals will be examined to ensure that these values do not have undue
influence on the results
Running the one-way ANOVA
Test of Homogeneity of Variances table
● the P value of 0.590 in the significance column, which is larger
than the critical value of 0.05, indicates that the variance of each
group is not significantly different from one another.
○ ⇒ Assumption has been met
ANOVA table
● The degrees of freedom for the between-group sum of squares is the number of groups minus 1, that is 4 − 1 = 3, and for the
within-group sum of squares is the number of cases in the total sample minus the number of groups, that is 550 − 4 = 546.
● In this model, the F value, which is the between-group mean square divided by the within-group mean square, is large at 3.239
and is significant at P = 0.022. This indicates that there is a significant difference in the mean values of the four parity groups.
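The F value can be reproduced from the sums of squares and degrees of freedom quoted above:

```python
# Sketch: recomputing the F statistic for the weights.sav example from
# the quantities reported in the text.
ss_between = 3.477      # between-group sum of squares (from the text)
ss_total = 198.842      # total sum of squares (from the text)
k, n = 4, 550           # number of groups, total cases

df_between = k - 1              # 4 - 1 = 3
df_within = n - k               # 550 - 4 = 546
ms_between = ss_between / df_between
ms_within = (ss_total - ss_between) / df_within

F = ms_between / ms_within
print(round(F, 2))  # 3.24, matching the reported F = 3.239
```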
Effect size
● First method
○ The amount of variation in weight that is explained by parity can be calculated as the between-group sum of squares
divided by the total sum of squares to provide a statistic that is called eta squared as follows:
○ Eta2 = Between-group sum of squares/Total sum of squares
■ = 3.477/198.842
■ = 0.017
○ This statistic indicates that only 1.7% of the variation in weight is explained by parity.
● Second method
○ Alternatively, eta2 can be obtained using the commands Analyze→ Compare Means→Means, clicking on Options and requesting
ANOVA table and eta. This will produce the same ANOVA table as above and include eta2 but does not include a test of homogeneity
or allow for post-hoc testing.
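Either method boils down to the same division, which can be checked by hand (values from the ANOVA table above):

```python
# Sketch: eta squared = between-group SS / total SS for the weights example.
eta_sq = 3.477 / 198.842
print(round(eta_sq, 3))  # 0.017 -> parity explains ~1.7% of the variance
```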
Post-hoc tests
● Although the ANOVA statistics show that there is a significant difference in mean weights between parity groups, they do not indicate
which groups are significantly different from one another
○ To know that, post hoc tests are used
● planned and post-hoc tests should only be requested after the main ANOVA has shown that there is a statistically significant difference
between groups.
● When the F test is not significant, it is unwise to explore whether there are any between-group differences
● The choice of post-hoc test should be determined by equality of the variances, equality of group sizes and by the acceptability of the test in a particular
research discipline. For example, Scheffe is often used in psychological medicine, Bonferroni in clinical applications and Duncan in epidemiological
studies. The advantages of using a conservative post-hoc test have to be balanced against the probability of type II errors, that is missing real
differences
● The LSD test is the most liberal post-hoc test because it performs all possible tests between means. This test is not normally recommended when more
than three groups are being compared or when there are unequal variances or cell sizes. With no adjustments made for multiple tests or comparisons,
the results of the LSD test amount to multiple t-testing
● The Bonferroni post-hoc comparison is a conservative test in which the critical P value of 0.05 is divided by the number of comparisons made. Thus, if
five comparisons are made, the critical value of 0.05 is divided by 5 and the adjusted new critical value is P = 0.01. In SPSS the P levels in the Multiple
Comparisons table have already been adjusted for the number of multiple comparisons. Therefore, each P level obtained from a Bonferroni test in the
Multiple Comparisons table should be evaluated at the critical level of 0.05.
● By using the Bonferroni test, which is a conservative test, the significant differences between some groups identified by the LSD test are now
nonsignificant. The mean values are identical but the confidence intervals are adjusted so that they are wider
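The two ways of applying a Bonferroni correction described above (dividing the critical value yourself, or letting SPSS multiply the p values) are equivalent. A sketch with made-up raw p values:

```python
# Sketch: Bonferroni correction, two equivalent routes.
# Route 1: divide the critical value by the number of comparisons.
# Route 2 (what SPSS does in the Multiple Comparisons table): multiply each
# raw p by the number of comparisons (capped at 1) and keep alpha = 0.05.
n_comparisons = 5
alpha = 0.05

adjusted_critical = alpha / n_comparisons        # 0.01, as in the text
raw_p = [0.004, 0.030, 0.200]                    # invented raw p values
adjusted_p = [min(p * n_comparisons, 1.0) for p in raw_p]

by_critical = [p < adjusted_critical for p in raw_p]
by_adjusted = [p < alpha for p in adjusted_p]
print(by_critical == by_adjusted)  # True: both routes flag the same tests
```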
● LSD → just one significant difference
● Bonferroni → just one significant difference
● The Duncan test shown in the Homogeneous Subsets table is one of the more liberal
post-hoc tests.
○ Under this test, there is a progressive comparison between the largest and smallest
mean values until a difference that is not significant at the P < 0.05 level is found
and the comparisons are stopped. In this way, the number of comparisons is limited.
○ The output from this test is presented as subsets of groups that are not significantly
different from one another.
○ The between-group P value (0.05) is shown in the top row of the Homogeneous
Subsets table and the within-group P values at the foot of the columns.
○ Thus in the table, the mean values for groups of singletons and babies with one
sibling are not significantly different from one another with a P value of 0.104.
○ Similarly, the mean values of groups with one sibling, two siblings, or three or more
siblings are not significantly different from one another with a P value of 0.403.
○ Singletons do not appear in the same subset as babies with two siblings or with
three or more siblings which indicates that the mean weight of singletons is
significantly different from these two groups at the P < 0.05 level.
Means plot
● The plot shows a trend for weight to increase with increasing parity and helps in the interpretation of the post-hoc tests.
● It also shows why the group with one sibling is not significantly different from singletons or babies with two siblings or with three or
more siblings, and why singletons are significantly different from the groups with two siblings or with three or more siblings
Trend test
● Polynomial option → The increase in weight with increasing parity suggests that it is appropriate to test whether there is a significant
linear trend for weight to increase across the groups within this factor. A trend test can be performed by re-running the one-way ANOVA
and ticking the Polynomial option in the Contrasts box with the Degree: Linear (default) option used.
● If each of the parity cells had the same number of cases then the unweighted linear term would be used to assess the significance of the
trend. However, the cell sizes are unequal and therefore the weighted linear term is used. The table shows that the weighted linear term
sum of squares is significant at the P = 0.006 level indicating that there is a significant trend for mean weight to increase as parity or the
number of siblings increases
Reporting the results
● Weight was approximately normally distributed in each group, and the group sizes were all large (minimum 62) with a cell size ratio of 1:3 and a
variance ratio of 1:1.3. The significant difference in weight at 1 month between children with different parities can be described as F = 3.24, df = 3, 546,
P = 0.022 with a significant linear trend for weight to increase with increasing parity (P = 0.006). The degrees of freedom are conventionally shown as
the between-group and within-group degrees of freedom separated with a comma.
● Post hoc
○ If the Bonferroni post-hoc test had been conducted, it could be reported that the only significant difference in mean weights was between
singletons and babies with two siblings (P = 0.029) with no significant differences between any other groups.
○ If Duncan’s post-hoc test had been conducted, it could be reported that babies with two siblings and babies with three or more siblings were
significantly heavier than singletons (P < 0.05).
■ However, babies with one sibling did not have a mean weight that was significantly different from either singletons (P = 0.104) or from
babies with two siblings, or with three or more siblings (P = 0.403)
References
● https://libguides.library.kent.edu/SPSS/OneWayANOVA
● https://www.spss-tutorials.com/spss-one-way-anova-with-post-hoc-tests-example/
● Medical statistics book