The Impact of Web Page Text-Background Colour Combinations On Readability, Retention, Aesthetics and Behavioural Intention



3, 183–195

The impact of web page text-background colour combinations on readability, retention, aesthetics and behavioural intention
University of Missouri – Rolla, Missouri, USA; e-mail: [email protected]
Matrikon Corporation, USA

Abstract. The purpose of this experiment was to examine the effect of web page text/background colour combination on readability, retention, aesthetics, and behavioural intention. One hundred and thirty-six participants studied two Web pages, one with educational content and one with commercial content, in one of four colour-combination conditions. Major findings were: (a) Colours with greater contrast ratio generally lead to greater readability; (b) colour combination did not significantly affect retention; (c) preferred colours (i.e., blues and chromatic colours) led to higher ratings of aesthetic quality and intention to purchase; and (d) ratings of aesthetic quality were significantly related to intention to purchase.

1. Introduction

The flexibility of the World Wide Web has made it very simple for developers to create text and background combinations of a variety of differing colours, not to mention background textures. Luckily the use of textured backgrounds has, for the most part, come and gone, most likely driven by popular demand (and empirical evidence, Hill and Scharff 1999). However, a myriad of different text-background colour combinations still proliferate.

Web design guidelines often include recommendations for appropriate colour combinations, many of which recommend high contrast between text and background with particular emphasis on the traditional black on white. ‘Web gurus’ are quick to make definitive statements about design and readable text, as exemplified by Jakob Nielsen (Nielsen 2000):

    Use colours with high contrast between the text and the background. Optimal legibility requires black text on white background (so-called positive text). White text on a black background (negative text) is almost as good. Although the contrast ratio is the same as for positive text, the inverted colour scheme throws people off a little and slows their reading slightly. Legibility suffers much more for colour schemes that make the text any lighter than pure black, especially if the background is made any darker than pure white.

Unfortunately, Nielsen does not offer any references for this statement. In fact, an examination of the research that exists on this topic indicates that the relationship between text-background colour combinations and readability is not as clear as it might seem, though it is generally true that a strong contrast leads to more readable text. In addition, colours are used on web pages for purposes other than maximizing readability. These colours enhance the aesthetics of the page, which can potentially impact the user. This will also be addressed below. We will begin with a discussion of the effect of page colour on readability.

1.1. Readability

A great deal of research on readability of text on a computer screen pre-dates the World Wide Web and, thus, was conducted with monitors that were less effective in terms of luminance and luminance contrast, which turn out to be important factors in mediating the effect of font/background colour combinations (Bouma 1980, Mills and Weldon 1987). However, this research

Behaviour & Information Technology

ISSN 0144-929X print/ISSN 1362-3001 online # 2004 Taylor & Francis Ltd
DOI: 10.1080/01449290410001669932
184 R. H. Hall and P. Hanna

provides a useful background, and results are largely consistent with more recent studies.

Much of the early work on text-background combinations failed to identify specific colour combinations that were the most readable (Radl 1980). For example, one study failed to find any significant difference among 24 different colour combinations on performance with a text search task (Pace 1984). On the other hand, regardless of the specific colour combination, higher levels of contrast generally lead to greater readability (Radl 1980, Bruce and Foster 1982).

More recent research supports the contention that contrast is an important predictor of readability. For example, Shieh and Lin (2000) compared the impact of 12 different colour combinations on participants’ ability to perform a basic visual identification task. In addition to colour combination they considered screen type (LCD vs. CRT) and ambient illumination. First of all, colour combination had a greater impact on performance than the other factors, indicating the importance of colour combinations. Blue and yellow combinations led to the best performance and purple and red the worst. Consistent with previous research, blue and yellow also had the greatest luminance contrast and red and purple the least. In general, the trend across all colour combinations was the higher the luminance contrast, the better the performance. The Shieh and Lin (2000) study also included a measure of subjective preference, and the results with respect to colour combinations paralleled the readability results to a surprising degree. This is discussed in more detail in the preference and aesthetics section below.

Recent research also indicates that inconsistency in studies of readability as a function of font/background colour combinations may be due to the confounding of contrast of hue with luminance contrast. Hue is the dimension that we normally think of as colour, which is defined by wavelength, while luminance is the ‘brightness’ of a colour as defined by wave height. Colours not only differ from one another in hue, but they also differ to some degree in luminance. Lin (2003) conducted a series of three experiments where chromatic (i.e., not black/white) colours were placed on a grey and luminance of colours was systematically varied. Readability performance in most cases could be accounted for by luminance contrast, not hue (colour). The one exception was at very low levels of luminance contrast. In this case, purple and cyan resulted in better performance than yellow, despite equivalent luminance contrast of the colour and the background.

There are few empirical studies on readability and text/colour combinations specifically aimed at web pages (Hill and Scharff 1997). Studies specifically aimed at the web are important since the web has come to play such an important role in information distribution and communication. Displays on the web are unique in that a designer cannot be very certain about the browser, system, resolution, or other factors that may affect a given display. Further, many different types of multimedia devices can be, and are, used via the web, which allows for factors such as text dynamics to play a role in impacting colour perception. One interesting study, which relates to the latter, was recently conducted by Wang and colleagues (Wang et al. 2003), where scrolling text was examined. They varied a number of factors associated with the scrolling text. Among these factors was text-background colour combination. They found that combinations with positive polarity (that is, dark text on light background) resulted in better performance, and, as with studies mentioned previously, the greater the contrast between colour combinations the better the performance. It should be noted that a similar positive polarity effect on readability performance was found in the Shieh study discussed above (Shieh and Lin 2000).

A series of two experiments conducted by Hill and Scharff (1997, 1999) focused specifically on web pages, consisting of text presented via a web browser. In the most recent study Hill and Scharff (1999) varied the background texture, colour, and saturation/lightness of a given page. Participants were required to search for specific objects within the page and reaction time for completion of the search was thought to be indicative of readability. In this study they used only black text, but varied background colours (blue, grey, and yellow). They found a significant main effect for colour with better performance for the grey and yellow backgrounds than with the blue, again consistent with better performance for higher contrast.

In an earlier study (Hill and Scharff 1997) six colour combinations were varied in addition to font type and word style (italicized vs. plain). Participants searched web sites to find a target word and, again, reaction time represented readability. A main effect for colour was found with the best performance for green text on a yellow background and the worst for red on green. This poor performance for red and green was likely due to more than just lower contrast ratio, in that opponent colours such as these often appear to ‘vibrate’ when placed side by side (Clarke 2002). Though this finding appears to be consistent with the high contrast effect, it should be noted that black on white was one of the six combinations tested, and performance was better for green text on the yellow background. The finding that performance with black on white was not as good as a chromatic colour combination is inconsistent with the contrast effect and clearly inconsistent with Nielsen’s recommendation in the quote above. This inconsistency with the contrast effect may be due to the fact that
Web text-background colour 185

luminescence was not controlled, which is representative of the fact that colours on the web cannot be well controlled, since they vary with the user’s browser and computer system. In addition, the study found that the colour effect was often mediated by other factors, such as font type. More specifically, the better performance for green on yellow was due to performance with Times New Roman font, while the performance was much worse for this colour combination when Arial font was used.

The 1997 study (Hill and Scharff 1997) also included a comparison of grey and white backgrounds, which was motivated by the fact that most web browsers at the time had grey backgrounds as a default. Due to the contrast effect one would expect that a white background would result in better readability. Therefore, they replicated the method of the first experiment with the exception that only black text and three different background colours (light grey, dark grey, and white) were used. Surprisingly, they found better performance with the grey backgrounds than with the white background, a finding, again, inconsistent with the contrast effect. (Ironically, despite these findings, the default background in web browsers these days is, of course, white.)

In April of 2000 the World Wide Web consortium (w3c) published a working draft of a document for ‘Techniques for Accessibility Evaluation and Repair Tools’ (…). This included an algorithm for determining the brightness (luminescence) contrast and colour (hue) contrast between two colours based on the standard method of assigning RGB (red, green and blue) values to colours (http://…). The author of this technical document and colleagues also carried out an initial evaluation study of the algorithms (http://…). In this study, 42 different web pages were created that represented different levels of contrast based on a combined score from the two w3c recommended algorithms. These pages included short text passages. In a within subject design, 50 participants were asked to rate each of the pages using a sliding scale that ranged from ‘impossible to read’ to ‘effortless to read’. Although the relationship between contrast and readability ratings was not perfect, and outliers were noted, a strong and significant relationship was found, adding further support to the importance of contrast as affecting readability, and also supporting the validity of the algorithm.

1.2. Affect, aesthetics, and preference

Experts such as Nielsen have long expressed the importance of design simplicity and de-emphasized the importance of aesthetics as a component in usable designs (Nielsen 2000). However, Web design, like most design endeavours, is a balance between the functional and aesthetic. Factors such as aesthetically pleasing colour combinations can play an important role in generating positive affect, which may be particularly important for a commercial web site where a company is trying to encourage users to associate a given company brand with positive feelings. Leaders in the HCI field, such as Don Norman, have recently focused on the need to consider aesthetics and emotion in design (Norman 2002). Aesthetic factors may serve to affect behavioural intention, which could presumably lead to behaviours that would be especially important for commercial sites, in particular purchasing.

There is a long history of research on the impact of colours on emotions independent of computer displays. One consistent finding is that people in general tend to find short wavelength colours (blues and greens) more pleasant than long wavelength colours (reds and yellows). For example, Guilford and Smith (Guilford 1959) asked participants to rate colours based on preference, which resulted in the following rank ordering from most to least preferred: blue, green, purple, violet, red, orange and yellow. A similar result emerged from a very different study (Osgood et al. 1957), in which participants across a number of cultures were asked to rate colour words (e.g., ‘red’, ‘green’) using a semantic differential scale. In this study participants associated blue and green with ‘good’. However, there was some indication that the relationship between wavelength and preference was not the only relevant dimension. Though yellow was associated with ‘bad’ and ‘weak’, red was rated as ‘strong’ and ‘active’, which cannot be conceived as the opposite end of the preference dimension. Thus there appears to be another, somewhat orthogonal dimension to preference, which is arousal. In fact, studies of the autonomic nervous system response to colours have also found that longer wavelength colours elicit higher levels of autonomic arousal than short wavelength colours (Wilson 1966, Jacobs and Hustmyer 1974). This arousal can, however, be negative or positive depending on context. For example, in contrast to the relatively positive ‘strong’ and ‘active’ associated with red mentioned above, another study found that long wavelength colours can also elicit higher levels of state anxiety (Jacobs and Suess 1975).

In a more recent examination of colours on emotions, Valdez and Mehrabian (1995) systematically controlled hue, saturation, and brightness and utilized a pleasure-arousal-dominance emotion model for conceptualizing user responses. Users rated colours using a semantic differential scale. In one experiment participants rated
various colours within a given hue and in a second experiment participants rated different hues. Overall, the expected relationship between pleasure and wavelength was found – short wavelength colours were preferred. However, the effects for arousal were not consistent with previous research, in that the most arousing colours included green and even blue (green-yellow, blue-green, and green), which are short wavelength colours. The authors point out that they also found a strong positive relationship between saturation (i.e., a colour’s ‘vividness’) and arousal, while they controlled carefully for saturation in comparing colours (hues). Thus, the highly arousing effect of red found in previous studies may have been the result of the fact that samples of red tend to be highly saturated, so the high levels of arousal attributed to red may have been due to the confounding of hue with saturation in these previous studies.

There has been an increased interest in emotion as it relates to computers in the form of ‘affective computing’, which is an area that has become popular in the last decade (Picard 1997). Emotional responses have been identified and are related to characteristics of the interface and computer system. For example, Riseberg and colleagues (Riseberg et al. 1998) purposely created frustration in users by offering them a cash reward for performance on a video game, and then purposely creating a ‘stuck mouse’ effect during the game. Physiological measures of autonomic arousal differentiated between frustrated and non-frustrated states in users. Similarly, in a recent study, increased autonomic arousal was found in response to video and audio that was not properly synchronized (Ali and Marsden 2003). However, none of the studies that have emerged within the affective computing research area have examined the impact of colour on perception of computer displays in general, or web pages in particular. This topic is particularly important since colour and aesthetics can be a very important part of web design as mentioned above.

In general there are few studies that have examined the impact of computer display colour combinations on user emotions. One exception was a study conducted by Pastoor (1990), which included two experiments. In experiment 1, participants viewed a set of nouns on coloured backgrounds in 792 different colour combinations. The participants used a six step scale to rate the words. They were instructed (Pastoor 1990) to ‘read some of the displayed words and to emphasize the aesthetic appearance of the screen pages in forming their ratings’. In experiment 2 a greatly reduced set of 18 colour combinations was used and the outcome measures included a reading task, a search task, and subjective ratings of aesthetics, power, legibility, and strain. Summarizing the results of both of these studies, the author points out that, although there were a number of colour effects, there was no consistent effect for hue on ratings or performance. The only exception was that short wavelength colours are preferred for combinations with negative polarity (light on dark). Thus, the only clear finding was consistent with the research on colours cited above, that blues and greens are preferred, but, in general, there was minimal effect for colour combinations on subjective measures of preference/aesthetics/affect.

Lastly, we mentioned above that we would revisit the Shieh and Lin (2000) study in which subjective preference was examined. In this study, the measure of preference partially included affect. Participants were asked to rate the different colour combinations on a 10 point scale with 1 representing ‘very poor’ and 10 representing ‘excellent’. In their ratings, users were asked to emphasize ‘clearness’, ‘aesthetic appearance’, and ‘visual comfort’ in making an overall preference rating. Thus this question combined subjective rating of readability with affect/aesthetics. In fact, as mentioned above, the preference ratings strongly paralleled the readability performance. Blue and yellow combinations were rated the highest on preference, while purple and red were rated the lowest.

1.3. Extension of previous research

The current experiment extends the research discussed above in two basic ways. First, this experiment will examine the affective impact of text-colour combinations as they are presented on web pages, and the associated impact on behavioural intention. As mentioned, an emphasis has been placed on the role of affect and aesthetics in web design recently. Market researchers have recognized for some time the importance that aesthetic factors can play on consumer behaviour, and this almost surely should impact on web design. Among the factors that have been emphasized in this context are more aesthetic visual displays, in which colour will certainly come to play an important role (Jennings 2000). The second basic way in which this research will extend the research reviewed is that a measure of retention will be included as an outcome. All of the studies cited above use basic measures of readability, which usually consist of some variation on a single-word-search task. Though this is informative with respect to basic processing, it does not address higher-level outcomes of readability such as retention. Retention is a very important factor for the large number of information-based web sites that exist. It is, of course, an important factor for e-learning applications, since the user’s goal is usually to retain the information beyond
the time the page is being read. This also applies to information included in e-commerce sites, since the users’ tasks are often facilitated when they can retain information from page to page. Therefore, measures of higher level processing, such as retention, are an important next step in examining the impact of text-background colour combinations.

2. Research model and hypotheses

Figure 1 is a graphical depiction of the framework that guided the current research, and represents the relationship between font colour, outcomes, and content.

Figure 1. Research model.

The model is based on the contention that contrast factors will impact readability and retention, and preference will impact aesthetics and intention in a fairly straightforward manner. These are represented in the first four hypotheses presented below. Similarly, we propose that these consequent measures, readability with retention and aesthetics with intention, will be related to one another, though in a more indirect fashion. Finally, it’s important to point out that this is a preliminary and exploratory framework for describing these relations and is principally provided here as an organizational guide for a series of hypotheses and analyses that will follow, not as a representation of a structural statistical model to be tested as a whole.

The hypotheses derived from this model and explanations follow. Note that, in the following hypotheses, when we use the term contrast, we are referring to contrast of brightness and hue combined.

Hypothesis 1: Colour combinations with higher levels of contrast will receive higher ratings in readability.

The colour combinations used in this study varied along two dimensions: contrast and preference. The latter is discussed below. The four colour combinations (font/background) used were black/white, white/black, light blue/dark blue, and cyan/black. Both the black on white and white on black colour combinations represented maximal contrast. We also used a combination of light and dark blue, and cyan (blue-green) on black. The former represented a greater degree of both brightness and colour contrast. The contrast ratios for all colours, based on the w3c recommended algorithm discussed above, are presented in table 1.

Table 1. Colour combinations and contrast.

Colour combination      Brightness*   Colour**
Black/white             255           765
White/black             255           765
Light blue/dark blue    210           588
Cyan/black              178           510

*Range from 0 – 255, w3c recommended minimum = 125.
**Range from 0 – 765, w3c recommended minimum = 500.

There is a large body of research, reviewed above, that indicates that high levels of contrast lead to better readability. Though most of this was not specifically aimed at web pages, we expect this effect will extend to the web. Specifically, the black/white combinations should result in the highest levels of readability, followed by the dark/light blue combination, followed by the cyan/black combination.

Hypothesis 2: Colour combinations with higher levels of contrast will lead to greater retention than colour combinations with lower levels of contrast.

There is a logical connection between the readability of text materials and the retention of the material, since the latter is not possible without the former. It follows that contrast should also positively impact retention. As with readability, we predict that the black/white combinations should result in the highest levels of retention, followed by the dark/light blue combination, followed by the cyan/black combination.

Hypothesis 3: Preferred colours will lead to higher ratings of aesthetics.

The second dimension that the colours represent is preference. With respect to the colours we selected we conceive the dark and light blue combination as ranking highest on this dimension, since blues are
consistently preferred across the colour studies reviewed. The cyan and black combination is second on this dimension, since the cyan is a combination of green and blue, which are low-wavelength colours. This is balanced out by the presence of a black background. Although most of the studies reviewed did not examine achromatic colours (black and white), those that did indicate that achromatic colours are less preferred. For example, in the Osgood cross cultural study on colour names (Osgood et al. 1957), black and grey were associated with ‘bad’, and, though white was associated with ‘good’, it was also associated with ‘weak’. In the Pastoor (1990) study discussed above, in experiment 2 achromatic colour combinations were included. Participants rated the colour combinations that included blue and cyan higher than the achromatic combinations in 15 of 16 combination comparisons on subjective ratings of aesthetics and power; though achromatic colours were preferred in readability and eye strain (achromatic colours were rated as causing less eye strain). We propose that these findings from previous research will extend to the web, such that the preferred colours will be rated as the most aesthetically pleasing. More specifically, we predict that the dark and light blue combination will lead to the highest ratings in aesthetics and behavioural intention, followed by cyan and black, and this will then be followed by the achromatic (black and white) colour combinations.

Hypothesis 4: Preferred colours will lead to higher ratings of behavioural intention.

It is our contention that colours that are preferred will generate positive affect, which will, in turn, lead to a greater intention to purchase a given product. Therefore, we also predict that preference will impact behavioural intention, such that these same preferred colours will have a significant impact on behavioural intention, in the same order as presented above with aesthetic ratings.

Hypothesis 5: Ratings of readability will be significantly related to retention.

Unlike most of the experiments reviewed above, readability in this experiment was rated via participants’ subjective ratings. In the studies that used subjective ratings, such as the Ridpath et al. study (http://…), results were similar to those that used objective measures of readability, such as search tasks, in that contrast was predictive of readability. Retention, on the other hand, was an objective measure in our study consisting of a quiz over participants’ retention of information contained on the web pages they viewed, and the other three measures were subjective self-report measures, in which they were asked to rate statements that referred to the pages. We assume that readability will be a basic prerequisite to accurate retention, since information cannot be retained if it is not acquired. As a consequence, a significant relationship between readability and retention is predicted.

Hypothesis 6: Ratings of aesthetics will be significantly related to behavioural intention.

Advertisers in print and television media have long known that the aesthetics of the media can impact buying behaviour (Jennings 2000). Though the web is a different medium, where interactivity plays a much more important role, aesthetics should still have an important impact on behaviour. E-commerce researchers have suggested that we need to think of users as actors in a play as opposed to observers, as would be the case with traditional media (Laurel 1993). Jennings (2000) argues that principles of aesthetics in design focus principally on visual perception, and that ‘pleasing visuals are important because they create first impressions which result in a desire to explore further’. He also notes (Jennings 2000) that many web sites do not take this into account and for such sites ‘visual improvements should be made before considering more subtle issues’. Therefore, a significant relationship between ratings of aesthetics and behavioural intention is predicted.

2.1. Content

As noted in the model above, we used two different types of content: educational and commercial. We do not propose any specific hypotheses associated with the different content, since we anticipated that the same relationships among colour combinations and outcome measures will be found across content areas. We used these two different content areas for a number of reasons. First, we wanted to examine the generalizability of the results. Second, many web design texts make a distinction among basic types of web sites, and these two types of sites represent two of the basic categories (Lazar 2001, Farkas and Farkas 2002). Third, we propose that the focus of these two types of sites represents well the different types of outcomes proposed in our model. With education the focus is more on retention, while, with commercial sites, the focus is more on behavioural intention. Of course, aesthetic factors are important in education and retention plays an important role in
commerce. However, the primary goal of education (1) The colour combination made the text easy to
oriented sites is to provide the user with information and read;
this often involves encouraging the user to retain the (2) The colour combination made the text easy to
information after they leave the sites. On the other hand, study;
the bottom line for most commercial sites is to increase (3) I found the colour combination pleasing to look
sales by directly or indirectly encouraging the user to at;
purchase something, and this is often done by focusing (4) I found the colour combination stimulating to
on the users’ affective states, encouraging them to become excited about a product or service.

3. Research methodology

3.1. Participants

One hundred and thirty-six students enrolled in General Psychology classes at the University of Missouri – Rolla participated in this experiment as partial fulfillment of a research participation requirement for the class.

3.2. Materials

3.2.1. Stimulus materials – web pages: Two different web pages were used as stimulus material for this experiment. One of these web pages covered information on the Neuron of the kind used in an introductory level neuroscience class. The other page advertised the ‘Hallaview 3000’, which was a fictional TV/DVD player. This content was created from information gathered from a number of technology and entertainment web sites. The passages were relatively short; the Neuron page consisted of 338 words and the Hallaview page of 279 words.

Four different font-background colour combinations were used for each of these sites: black text on white background (BW); white text on black background (WB); light blue text on dark blue background (B); and cyan text on black background (CB). The hexadecimal codes for these colours were: black (000000); white (FFFFFF); light blue (DED9FB); dark blue (000066); cyan (00FFFF). The materials used in this experiment can be viewed on the web at font_color.

3.2.2. Outcome measures: A 10-question, multiple-choice quiz was developed covering information on both web pages (Neuron and Hallaview). In addition, surveys were developed for both of the web pages. Students responded to questions on a 10-point Likert scale with 1 labelled ‘strongly disagree’ and 10 labelled ‘strongly agree’. Both surveys included the following five items:

the eye;
(5) I found the colour combination to be professional looking.

The following two items were also added to the Hallaview survey:

(1) If I had available funds, I would like to buy this product;
(2) The colour combination made me want to buy this product.

This questionnaire was designed for this experiment. We did not use the same preference measures as the experiments reviewed in the introduction because in some cases they confounded readability and aesthetics, and/or they asked a single question (Shieh and Lin 2000), which would negatively impact reliability. Further, we developed questions based on the model we posed. Within our questionnaire, items 1 and 2 were intended as measures of readability; items 3–5 were intended to measure aesthetics; and items 6 and 7 were measures of behavioural intention. We conducted a factor analysis to assure the proper classification of the measures, as well as coefficient alpha analyses in order to assure adequate reliability (see Results section).

3.3. Procedure

This experiment took place in 10 experimental sessions, made up of groups of 10–30 students, over the course of two semesters. For each session, students were randomly assigned to one of four colour conditions: BW, WB, B, or CB (see section on web pages above for description of colours). When students arrived, an introductory web site was displayed on their computers with written directions. The entire experiment was on-line and time was strictly controlled, so that students did not proceed to the first study page until told to do so. They then viewed the page for 10 min, after which they were required to go to the quiz/questionnaire page for 10 min, etc. The content areas were counterbalanced so that, in every other experimental session, students studied the commercial page first, while in the other sessions they studied the educational page first. The experimental session schedule is displayed in table 2.
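The hexadecimal codes listed in section 3.2.1 are enough to compare the luminance contrast of the four conditions. The sketch below uses the WCAG 2.x relative-luminance and contrast-ratio formulas. That choice is an assumption made for illustration, since this section does not say which contrast measure the study itself relied on, and the function names are hypothetical.

```python
def linearise(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_code: str) -> float:
    """Relative luminance of a six-digit hex colour such as 'DED9FB'."""
    red, green, blue = (int(hex_code[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * linearise(red)
            + 0.7152 * linearise(green)
            + 0.0722 * linearise(blue))


def contrast_ratio(text_hex: str, background_hex: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05); ranges from 1 to 21."""
    lum_a = relative_luminance(text_hex)
    lum_b = relative_luminance(background_hex)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Text/background pairs for the four conditions, using the codes quoted above.
CONDITIONS = {
    "BW": ("000000", "FFFFFF"),
    "WB": ("FFFFFF", "000000"),
    "B": ("DED9FB", "000066"),
    "CB": ("00FFFF", "000000"),
}

if __name__ == "__main__":
    for label, (text_hex, background_hex) in CONDITIONS.items():
        print(f"{label}: {contrast_ratio(text_hex, background_hex):.2f}:1")
```

Because the ratio is symmetric in its two arguments, BW and WB receive the same value under this formula, which is why any readability difference between them has to be attributed to polarity or familiarity rather than to contrast.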
190 R. H. Hall and P. Hanna

Table 2. Experimental session schedule.

Time          Activity
0 – :10       Introduction, consent
:10 – :20     Study content 1
:20 – :30     Quiz and questionnaire 1
:30 – :40     Study content 2
:40 – :50     Quiz and questionnaire 2

4. Results

4.1. Classification of measures

Two factor analyses were conducted, one for the neuron outcomes and one for the Hallaview outcomes. In both cases a principal components analysis with a Varimax rotation was used. In the first analysis a two-factor solution was forced to represent readability and aesthetics (there were no behavioural intention items in the first post-questionnaire). The items loaded consistent with expectations, with the exception of the professional looking item, which loaded on the readability factor. These loadings are displayed in table 3. The rotated solution accounted for 86% of the variance, and the aesthetics and readability factors accounted for 45% (Eigenvalue = 2.25) and 41% (Eigenvalue = 2.05) of the variance respectively.

Table 3. Factor loadings for neuron outcomes (rotated

                          Factor
Items                     Aesthetics    Readability
Easy to read              (0.52)        0.76
Easy to study             (0.55)        0.72
Pleasing to look at       0.91          (0.27)
Stimulating to the eye    0.93          (0.12)
Professional looking      (0.01)        0.92

In the second, Hallaview, analysis a three-factor solution was selected to represent readability, aesthetics, and behavioural intention. Again, the items loaded logically as anticipated, with the exception that the ‘professional looking’ item again loaded on the readability factor. The items and loadings are displayed in table 4. The rotated solution accounted for 78% of the variance, and the aesthetics, readability, and behavioural intention factors accounted for 30% (Eigenvalue = 2.07), 28% (Eigenvalue = 1.97), and 21% (Eigenvalue = 1.45) of the variance respectively.

Table 4. Factor loadings for Hallaview outcomes (rotated

                            Factor
Items                       Aesthetics    Readability    Intention
Easy to read                (0.49)        0.78           (−0.02)
Easy to study               (0.52)        0.73           (0.06)
Pleasing to look at         0.88          (0.22)         (0.16)
Stimulating to the eye      0.86          (0.15)         (0.22)
Professional looking        (−0.02)       0.84           (0.23)
Like to buy                 (0.16)        (−0.03)        0.85
Colours made me want to buy (0.13)        (0.26)         0.78

Five factor scores were created for further analyses, consisting of aesthetics and readability scales for both the neuron and Hallaview questionnaires, and a behavioural intention scale for the Hallaview questionnaire. These measures were constructed by averaging the items that primarily loaded on a given factor (the loadings not in parentheses in tables 3 and 4). To assess the reliability of these newly created scales, coefficient alphas were computed at the item level; these were α = 0.85, α = 0.89, α = 0.80, α = 0.85, and α = 0.55 for the neuron-aesthetics, neuron-readability, Hallaview-aesthetics, Hallaview-readability, and Hallaview-behavioural intention scales respectively. Despite the low alpha for the behavioural intention scale, we decided to use the scale in subsequent analyses. The decision was based on the identification of the scale in the factor analysis, and on our reluctance to divide the scale into single-item measures. Further, the low alpha is most likely partly attributable to the small number of items in the scale (2), since alpha is known to decrease as the number of items decreases (Nunnaly 1978).

4.2. Hypotheses 1 and 2: Impact of colour-combinations on readability and retention

In order to address the first two hypotheses, that colours with higher contrast would have a greater impact on readability and retention, a one-way between-subjects multivariate analysis of variance (MANOVA) was computed with experimental group (BW vs WB vs B vs CB) as the independent variable and neuron readability, Hallaview readability, neuron quiz score, and Hallaview quiz score as the dependent variables. The numbers of participants per group were 29, 31, 39, and 35 for the BW, WB, B, and CB groups respectively. The MANOVA was significant, Λ(12,336) = 0.771, p < 0.001. Due to the significant MANOVA, a series of four univariate ANOVAs were

conducted, one for each of the four dependent variables. The two readability ANOVAs were statistically significant, while the two retention ANOVAs were not. Tukey’s post hoc tests were then computed for both of the readability ANOVAs. For both ANOVAs the CB group scored significantly lower than all other groups. In addition, for the neuron ANOVA, the BW group was marginally significantly higher (p = 0.062) than the WB group. For the Hallaview ANOVA, the BW group was also significantly higher than the B group and marginally higher (p = 0.062) than the WB group. No other mean comparisons were significant. The readability and retention descriptive statistics are displayed in table 5.

4.3. Hypotheses 3 and 4: Impact of colour-combinations on aesthetics and behavioural intention

In order to address the third and fourth hypotheses, a one-way between-subjects multivariate analysis of variance (MANOVA) was computed with experimental group (BW vs. WB vs. B vs. CB) as the independent variable and neuron aesthetics, Hallaview aesthetics, and Hallaview behavioural intention as the dependent variables. The numbers of participants per group were 30, 32, 39, and 35 for the BW, WB, B, and CB groups respectively. The MANOVA was marginally significant, Λ(9,316) = 0.889, p = 0.08. Due to the marginally significant MANOVA, a series of three univariate ANOVAs were performed on neuron aesthetics, Hallaview aesthetics, and Hallaview behavioural intention. The neuron aesthetics ANOVA was statistically significant but neither of the Hallaview ANOVAs was significant. Tukey’s post hoc tests were conducted to compare the means for the neuron aesthetics ANOVA, and the mean difference between the blue and black/white group means was marginally significant (p = 0.058). The descriptive statistics associated with these ANOVAs are presented in table 6.

Table 6. Aesthetics and behavioural intention scores for the neuron and Hallaview page as a function of colour. Mean (standard deviation).

                          Neuron           Hallaview
Font/background colour    Aesthetics       Aesthetics        Behaviour
Black/white               5.53 (2.54)      5.47 (2.23)       4.43 (2.10)
White/black               5.70 (2.58)      6.08 (2.44)       3.98 (2.16)
Light blue/dark blue      6.97 (1.86)      6.60 (2.08)       4.94 (2.30)
Cyan/black                6.06 (2.39)      6.13 (2.17)       4.87 (2.33)
F (degrees of freedom)    2.72 (3,132)*    1.48 (3,132)ns    1.33 (3,132)ns
*p < 0.05; ns = not significant.

4.4. Hypotheses 5 and 6: Readability-retention relationship and aesthetic-intention relationship

In order to address hypothesis 5, Pearson correlations between readability and retention were computed for both the neuron and Hallaview sites. The readability and retention (quiz) scores were significantly related for the neuron page but not for the Hallaview page. To address hypothesis 6, a correlation between aesthetics and behavioural intention was computed for the Hallaview page (there was not a behavioural intention factor for the Neuron page). This correlation was statistically significant. The correlations and significance/probability levels for these analyses are displayed in table 7.

5. Discussion

5.1. Hypotheses 1 and 2: Impact of colour-combinations on readability and retention

According to hypothesis 1, colours with higher levels of contrast were expected to lead to higher

Table 5. Readability and retention scores for the neuron and Hallaview page as a function of colour. Mean (standard deviation).

                          Neuron                              Hallaview
Font/background colour    Readability       Retention          Readability        Retention
Black/white               7.63 (2.20)       8.93 (1.51)        7.66 (2.02)        8.45 (1.76)
White/black               6.25 (2.19)       8.29 (1.44)        6.43 (1.93)        8.06 (1.61)
Light blue/dark blue      6.47 (1.99)       9.00 (1.36)        6.25 (1.84)        8.08 (1.53)
Cyan/black                5.05 (1.96)       8.49 (1.63)        5.03 (1.88)        8.06 (1.37)
F (degrees of freedom)    8.52 (3,132)**    1.975 (3,131)ns    10.36 (3,131)**    0.497 (3,132)ns
**p < 0.001; ns = not significant.
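The F ratios in table 5 can be approximated from the reported means and standard deviations alone. The sketch below does this for neuron readability. The group sizes (30, 32, 39, 35) are an assumption: they are the sizes reported for the second MANOVA and are used here because they match the degrees of freedom (3,132) shown in the table. The helper name is hypothetical, and because the published means and standard deviations are rounded, the result should only come close to the reported value of 8.52.

```python
def f_from_summary(means, sds, ns):
    """One-way between-subjects ANOVA F ratio from group means, SDs and sizes.

    Returns (F, df_between, df_within).
    """
    total_n = sum(ns)
    k = len(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-groups sum of squares, recovered from each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = k - 1
    df_within = total_n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Neuron readability row values from table 5 (order: BW, WB, B, CB).
NEURON_READABILITY_MEANS = [7.63, 6.25, 6.47, 5.05]
NEURON_READABILITY_SDS = [2.20, 2.19, 1.99, 1.96]
ASSUMED_GROUP_SIZES = [30, 32, 39, 35]  # assumption, see the text above

if __name__ == "__main__":
    f_ratio, df_b, df_w = f_from_summary(
        NEURON_READABILITY_MEANS, NEURON_READABILITY_SDS, ASSUMED_GROUP_SIZES
    )
    print(f"F({df_b},{df_w}) = {f_ratio:.2f}")
```

The same function can be pointed at any other column of tables 5 or 6, provided the group sizes for that analysis are known or assumed.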

readability ratings and retention (quiz) scores. This hypothesis was largely supported with respect to participants’ perceived readability. For both types of material, the means were significantly different, and were in the correct order, with the exception that the mean for the light blue on dark blue rating was higher than the white on black rating with the educational page. The traditional black on white page was clearly the most readable based on participant ratings. Tukey’s post hoc tests indicated that the black on white page was significantly or marginally significantly higher than all other colours. Surprisingly, the white on black and light blue on dark blue pages were largely equivalent on readability ratings, despite the fact that the white on black page represents maximum contrast. Two potential factors could be responsible for this unexpected result. First, users are more familiar with black on white, which may in turn have a positive impact on readability. This would be partially consistent with the Nielsen quote that begins this paper (Nielsen 2000), though white on black was not found to be ‘almost as good’ as black on white, as stated in the quote. Another factor that may have influenced the high rating of the blue page is that previous research has found a significant relationship between readability and subjective preference (Shieh and Lin 2000), and the blue page was the most preferred page as predicted. However, it is important to note that we cannot say whether the readability led to the preference or vice versa.

The second hypothesis was not supported. Retention scores did not differ significantly as a function of colour for either type of content. Further, the order of the means was not even as anticipated. Though those in the black on white group scored higher than other groups with the commercial content, a lower contrast colour combination (light blue/dark blue) resulted in a slightly higher score than the black on white with the educational content. It may simply be that colours do not affect retention the way they impact readability. The relationship between these two factors, though significant with one passage, was moderate at best, as indicated by the correlational analysis. It is also possible that the difference in contrast ratio for the different colour combinations was not great enough to have an impact. Note that all of the colour combinations that were used in this experiment were above the minimum based on W3C recommendations (see table 1). There is some evidence that contrast ratio only has an impact on readability performance when the contrast ratio for some colours is below a minimum baseline (Lin 2003). Though this minimum contrast finding refers to readability, and we did find a significant contrast effect on readability in this study, it is possible that this minimum baseline effect is even stronger for higher level processes such as retention.

5.2. Hypotheses 3 and 4: Impact of colour-combinations on aesthetics and behavioural intention

The third and fourth hypotheses were partially supported in that, overall, differences among colour groups were marginally significant with respect to measures of aesthetics and behavioural intention. Further, for the educational passage the mean aesthetic ratings differed significantly. Moreover, the order of the means was consistent with expectations in that the blue group was highest on aesthetics and behavioural intention scores, followed by the cyan on black group (table 6). These results also substantially contrast with the readability and retention outcomes, since learners consistently viewed the combinations that included chromatic colours as more pleasing, stimulating, and more likely to lead them to buy the product in the case of the commercial site.

It is somewhat surprising that the white on black colour (negative polarity) was rated higher on aesthetics than the black on white (positive polarity). As noted above, black often has negative associations (Osgood et al. 1957) and, when a difference is found, users generally prefer positive polarity (dark on light) (Shieh and Lin 2000, Wang et al. 2003). Though this is a difficult finding to explain, one possible explanation is that the novelty of the white/black combination somehow affects aesthetic ratings in comparison to the traditional black/white. Two caveats worth noting about this unexpected effect are that these two colour combinations did not significantly differ, and that the white/black combination was rated lowest in the degree to which participants were encouraged to buy the product (behavioural intention) based on colour (perhaps reflecting the negative connotations of the black colour).

Table 7. Readability/retention and aesthetics/behaviour correlations for neuron and Hallaview.

                          Neuron                   Hallaview
Measures                  Readability/retention    Readability/retention    Aesthetics/behaviour
r (degrees of freedom)    0.211*                   0.134ns                  0.340***
*p < 0.05 (2-tailed); ***p < 0.001; ns = not significant.
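The significance flags attached to the correlations in table 7 follow from the standard t test for a Pearson correlation, t = r·sqrt(df) / sqrt(1 − r²). The sketch below applies it to the three reported coefficients. Table 7 as reproduced here does not show the degrees of freedom, so df = 134 (136 participants minus 2) is an assumption, as are the function name and the approximate two-tailed critical values quoted in the comments.

```python
import math


def t_from_r(r: float, df: int) -> float:
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


ASSUMED_DF = 134  # assumption: 136 participants minus 2

# Correlations as reported in table 7.
TABLE_7 = {
    "neuron readability/retention": 0.211,
    "Hallaview readability/retention": 0.134,
    "Hallaview aesthetics/behaviour": 0.340,
}

# Approximate two-tailed critical t values for df near 134:
# about 1.98 for p < 0.05 and about 3.37 for p < 0.001.
if __name__ == "__main__":
    for label, r in TABLE_7.items():
        print(f"{label}: r = {r:.3f}, t({ASSUMED_DF}) = {t_from_r(r, ASSUMED_DF):.2f}")
```

Under these assumptions the first coefficient clears the 0.05 threshold, the second does not, and the third clears the 0.001 threshold, which agrees with the pattern of asterisks in the table.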

5.3. Hypotheses 5 and 6: Readability-retention relationship and aesthetic-intention relationship

The fifth hypothesis, that readability would be significantly related to retention, was supported for the educational site, but not for the commercial site (table 7). The correlation was also relatively low (0.21) even for the educational site. It may simply be that low level processes of readability are not as strongly related to retention as was anticipated. It may also be due to the fact that the measure of readability was a subjective rating, while the measure of retention was objective recall.

The aesthetic factor score proved to be significantly related to behavioural intention, which is consistent with the sixth hypothesis. It appears, then, that the degree to which the participants saw the pages as pleasing and stimulating was linked with the degree to which they intended to purchase a given product. This effect is not surprising given the fact that aesthetics had been identified with other media as being an important factor in influencing consumer behaviour. However, this relationship is relatively unexplored with respect to web pages. This supports the view expressed by Jennings (2000) that visual aesthetics are a fundamental component in determining the effectiveness of e-commerce sites.

5.4. Classification of measures

When the questionnaire was designed it was anticipated that outcome scores would fall into two factors for the neuron questionnaire (readability and aesthetics) and three factors for the commercial page (readability, aesthetics, and behavioural intention). For the most part measures loaded as anticipated, with the exception that the item asking participants to rate the degree to which the page was professional looking loaded most strongly on the readability factor rather than the aesthetics factor. This finding is interesting, though not too surprising: it suggests that readers view the professional nature of a site as tied more to its function than to its appearance.

It is also interesting that the ‘easy to study’ and ‘easy to read’ items had relatively large loadings on the aesthetics factor, indicating that aesthetics and readability were not completely independent. In fact, this result is consistent with the Shieh and Lin (2000) study reviewed in the introduction, where preference for colours paralleled users’ performance on readability measures in both studies. Thus, while this factor analysis indicates that it is reasonable to conceive of aesthetics and readability as different outcomes, they are certainly related.

5.5. Implications for designers

As stated in the introduction, one of the primary purposes of this experiment was to provide a systematic and empirical investigation of the impact of colour combinations on outcomes, in order to provide designers with practical evidence-based guidelines. It is important to keep in mind that this is a single controlled experiment, conducted with college students, and the results should be interpreted accordingly. Despite these constraints, we do feel confident that there are a number of guidelines that can be derived from these results that can aid the designer in selecting background/text colour combinations.

• For educational sites, where retention and readability, especially readability, are a major concern, black on white or a closely related combination should be used. This advantage appears to be the result of both the contrast ratio of black and white and the convention or familiarity, since white on black text (equivalent contrast, but much less common) was rated much lower on readability. Therefore, if other colour combinations are the convention for a given context, then the convention should weigh as heavily in the decision as contrast.
• A site that is viewed as readable is also viewed as professional, so these same readability guidelines should be applied if ‘professional’ is an important part of the image to be projected.
• For commercial sites, where aesthetic and purchasing behaviour factors are a major concern, chromatic (coloured) text/background combinations should be used. Chromatic colours are more likely to lead the viewer to see a site as more visually pleasing and stimulating. Most importantly, these colours are more likely to lead a viewer to the intention to purchase products advertised on the site. Combinations involving the colour blue and including two chromatic colours (e.g., light blue on dark blue) appear to be preferable to a combination with less contrast and including a chromatic colour (e.g., cyan on black) for promoting positive affect and behavioural intention.

5.6. Limitations

Though the experiment was systematic and controlled, it is important to keep in mind that this was an initial exploratory experiment on the impact of web page colour combinations on a number of outcomes. It is important to note

limitations to better provide a context for interpretation. First, we used a relatively small set of colour combinations as a starting point, and purposely selected colours that varied on a number of dimensions in an effort to gather as much initial information as possible. As a consequence, this does not allow for the specific isolation of the impact of individual factors. Second, colour preference can certainly be influenced by experience and culture (Morton 1997), and the sample of participants consisted of college students at a technology oriented school in the USA Midwest, which is a relatively restricted sample. Third, due to time constraints we did not include any pre-tests for determining participants’ prior knowledge and skills. We could have used this information to remove variance associated with these individual differences and/or examined the impact of these factors in mediating outcomes. Fourth, we did not include behavioural intention measures for the educational site outcomes, since it did not explicitly involve the possibility of product purchase. However, we could have included intention measures in the form of intention or motivation to use or study the educational information, which would have provided additional information on the relationship between cognitive outcomes and behavioural intention. Fifth, due to the limited and focused nature of an experiment such as this, our test stimuli could only consist of small amounts of material on single web pages, whereas most web users form impressions based on experience with sites consisting of a number of linked pages. Despite these limitations there are a number of interesting findings that emerged, raising a number of important issues to be addressed in future research.

5.7. Future research

This research could be extended in a number of directions. First, a more controlled systematic study of colour combinations could be conducted. Hues could be selected to better represent wavelengths across the spectrum – in particular including long wavelength colours. Further, these different colour combinations could be presented more systematically using a fully crossed factorial design. Second, a number of alternative outcomes could be explored. Objective measures of readability could be utilized, as in most previous studies, and retention measures could be expanded to include even more complex learning measures such as problem solving and structural knowledge. Physiological measures of affect, which are popular within the area of affective computing, could be used. Third, a more applied direction could be pursued. More realistic and detailed e-learning or e-commerce prototypes could be created and examined, or existing sites could be evaluated in an applied context. Finally, a more general examination of the impact of colours in other web-based contexts would be interesting, and more complex measures of aesthetic and affective qualities such as flow could be considered.

Acknowledgements

This research was supported in part by the Instructional Software Development Center at the University of Missouri – Rolla.

References

ALI, A. N. and MARSDEN, P. H. 2003, Affective multi-modal interfaces: The case of McGurk effect. Proceedings of the Intelligent User Interfaces Conference, pp. 224–226.
BOUMA, H. 1980, Visual reading processes and the quality of text displays. In E. Grandjean and E. Vigliani (eds) Ergonomic Aspects of Visual Display Terminals (London: Taylor & Francis), pp. 101–114.
BRUCE, M. and FOSTER, J. J. 1982, The visibility of colored characters on colored backgrounds in viewdata displays. Visible Language, 16, 382–390.
CLARKE, J. 2002, Building accessible web sites (Boston, MA: New Riders).
FARKAS, D. K. and FARKAS, J. B. 2002, Principles of web design (New York: Longman).
GUILFORD, J. P. 1959, A system of color preferences. American Journal of Psychology, 72, 487–502.
HILL, A. L. and SCHARFF, L. V. 1997, Readability of screen displays with various foreground/background color combinations, font styles, and font types. Proceedings of the Eleventh National Conference on Undergraduate Research, pp. 742–746.
HILL, A. L. and SCHARFF, L. V. 1999, Legibility of computer displays as a function of colour, saturation, and texture backgrounds. In D. Harris (ed) Engineering Psychology and Cognitive Ergonomics (Sydney: Ashgate), pp. 123–130.
JACOBS, K. W. and HUSTMYER, F. E. 1974, Effects of four psychological primary colors on GSR, heart rate, and respiration rate. Perceptual and Motor Skills, 38, 763–766.
JACOBS, K. W. and SUESS, J. F. 1975, Effects of four psychological primary colors on anxiety state. Perceptual and Motor Skills, 41, 207–210.
JENNINGS, M. 2000, Theory and models for creating engaging and immersive e-commerce websites. Proceedings of the ACM Computer Personnel Conference, pp. 77–85.
LAUREL, B. 1993, Computers as Theater (Reading, MA: Addison-Wesley).
LAZAR, J. 2001, User-centered Web Development (Sudbury, MA: Jones and Bartlett).
LIN, C. 2003, Effects of contrast ratio and text color on visual performance with TFT-LCD. International Journal of Industrial Ergonomics, 31, 65–72.
MILLS, C. B. and WELDON, L. J. 1987, Reading text from computer screens. ACM Computing Surveys, 19, 329–358.

MORTON, J. 1997, Guide to color symbolism (Manoa, HI: Colorcom).
NIELSEN, J. 2000, Designing Web Usability: The Practice of Simplicity (Indianapolis, IN: New Riders Publishing).
NORMAN, D. A. 2002, Emotions & design: Attractive things work better. Interactions Magazine, ix, 36–42.
NUNNALY, J. 1978, Psychometric Theory (New York: McGraw-Hill).
OSGOOD, C. E., SUCI, G. J. and TANNENBAUM, P. H. 1957, The Measurement of Meaning (Urbana, IL: University of Illinois Press).
PACE, B. J. 1984, Color combinations and contrast reversals on visual display units. Proceedings of the Human Factors Society 28th Annual Meeting, pp. 326–331.
PASTOOR, S. 1990, Legibility and subjective preference for color combinations in text. Human Factors, 32, 157–171.
PICARD, R. 1997, Affective Computing (Cambridge, MA: M.I.T. Press).
RADL, G. W. 1980, Experimental investigations for optimal presentation-mode and colours of symbols on the CRT-screen. In E. Grandjean and E. Vigliani (eds) Ergonomic Aspects of Visual Display Terminals (London: Taylor & Francis), pp. 127–136.
RISEBERG, J., KLEIN, J., FERNANDEZ, R. and PICARD, R. 1998, Frustrating the user on purpose: Using biosignals in a pilot study to detect the user’s emotional state. Proceedings of the ACM Special Interest Group on Computer-Human Interactions, pp. 227–228.
SHIEH, K. and LIN, C. 2000, Effects of screen type, ambient illumination, and color combination on VDT visual performance and subjective preference. International Journal of Industrial Ergonomics, 26, 527–536.
VALDEZ, P. and MEHRABIAN, A. 1995, Effects of color on emotions. Journal of Experimental Psychology, 123, 394–409.
WANG, A., FANG, J. and CHEN, C. 2003, Effects of VDT leading-display design on visual performance of users in handling static and dynamic display information dual-tasks. International Journal of Industrial Ergonomics, 32, 93–104.
WILSON, G. D. 1966, Arousal properties of red versus green. Perceptual and Motor Skills, 23, 942–949.
