Taking The Fear Out of Data Analysis: Completely Revised, Significantly Extended and Still Fun 2nd Edition Adamantios Diamantopoulos
TAKING THE FEAR
OUT OF DATA
ANALYSIS
…to our families.
SECOND EDITION
ADAMANTIOS DIAMANTOPOULOS
Professor of International Marketing, Department of Marketing and
International Business, University of Vienna, Austria
BODO B. SCHLEGELMILCH
Professor of International Marketing Management,
Department of Marketing, WU Vienna, Austria
GEORGIOS HALKIAS
Associate Professor of Marketing and Behavioral Research,
Department of Marketing, Copenhagen Business School, Denmark
Published by
Edward Elgar Publishing Limited
The Lypiatts
15 Lansdown Road
Cheltenham
Glos GL50 2JA
UK
CONTENTS IN BRIEF
Full contents
To the reader
Instead of a preface
About the authors
Pre-publication reviews from around the world
Introduction to Taking the Fear out of Data Analysis
4 Have you cleaned your data and found the mistakes you made?
5 Why do you need to know your objective before you fail to achieve it?
7 Can you use few numbers in place of many to summarize your data?
8 What about using estimation to see what the population looks like?
Index
FULL CONTENTS
To the reader
Instead of a preface
About the authors
Pre-publication reviews from around the world
Introduction to Taking the Fear out of Data Analysis
4 Have you cleaned your data and found the mistakes you made?
The role of data cleaning
The role of data coding
Finding your mistakes
Transforming variables
Summary
Questions and problems
Further reading
5 Why do you need to know your objective before you fail to achieve it?
The need for analysis objectives
Setting analysis objectives
The question of focus
Choosing the method of analysis
Summary
Questions and problems
Further reading
7 Can you use few numbers in place of many to summarize your data?
Characterizing frequency distributions
Measuring central location
The mode
The median
The mean
Measuring variability
The index of diversity
The range and interquartile range
The variance and standard deviation
8 What about using estimation to see what the population looks like?
The nature of estimation
Setting confidence intervals
Estimating the population proportion
Estimating the population mean
Estimating other population parameters
Summary
Questions and problems
Further reading
Summary
Questions and problems
Further reading
Index
TO THE READER
Effective data analysis requires the effective use of statistics. Unfortunately, statistics is boring.
It is boring to learn, it is boring to teach, and it is usually boring people who actually like sta-
tistics. Indeed, the comment that a statistician is ‘a person who didn’t have enough charisma
to be a cost accountant’ says it all!
Statistics is also hard. It is hard to learn, it is hard to teach (properly), and it is even harder to
remember what little you may have learned. In short, statistics is rarely fun. But it can be – as
you will soon find out. Trust us.
INSTEAD OF A PREFACE
The first edition of this text goes back to 1997. Cars had just been invented, and computers
were still coal fired – at least from the perspective of our new generation of students. There was
no Facebook, no Amazon, and no iPhone. And, no, you could not watch videos on your mobile
phone or surf the Internet on it. Times were clearly (very) different. Most people learning statis-
tics were still blissfully unaware of some powerful techniques now described in this book, such
as the much-feared multivariate analyses. It was also an innocent time! Questionnaires could
still request the respondents’ ‘sex’ in a dichotomous (male/female) fashion, and researchers did
not even think about being politically incorrect for omitting dozens of other gender options.
However, as unlikely as it sounds, even in these innocent times, the two original authors of this
text, Adamantios Diamantopoulos and Bodo Schlegelmilch, in the profession widely referred
to as the terrible twins (or more unkindly as the ‘gruesome twosome’), still managed to get
into trouble.
Upon seeing a couple of draft chapters of the first edition of this book, reviewers took
exception to several of our jokes, found some of our examples politically incorrect, and
queried the wisdom of not being ‘sufficiently serious’. Having said that, one reviewer openly
admitted that he/she might be getting to be an ‘old sourpuss’(!), with which we politely agreed.
Notwithstanding these obstacles, through a magic mix of constant encouragement and occa-
sional threats of physical violence from the commissioning editor, we somehow managed to
finish the book and get it published! Since then, astonishingly, the original version has been
reprinted no fewer than six times, and we received numerous pieces of fan mail (well, we can
at least recall two positive ones in the late 90s!).
So why a new, completely revised, and significantly extended version of the book? First,
statistical software has progressed enormously and is much easier to handle; we now include
up-to-date applications to illustrate the various techniques discussed. Second, partly as
a result of the ‘software revolution’, sophisticated analytical techniques have become more
accessible; we now include a discussion of the most important ones. And yes, we feel that we
have a responsibility to our readers here: the mere mention of terms like ‘non-hierarchical
clustering’ or ‘orthogonal rotation’ will substantially improve your chances on the job market
and the dating scene! Finally, we now live in different times, so we had to (reluctantly) sanitize
some non-PC jokes.
All these goodies – up-to-date software applications, new statistical tests, and (somewhat!)
sanitized jokes – were largely made possible by taking a new and substantially younger
co-author on board, Georgios Halkias. He is the one you should blame for any comments
that may be (wrongly!) construed to be still not 100% PC (despite our best efforts). Any other
mistakes or omissions you may find are clearly and squarely the responsibility of our won-
derful support team, namely Martina Roth, Doris Lehdorfer, and Thomas Winter from the
University of Vienna, as well as Hanife Özdemir, Erin Silangil, Sarina Mansour Fallah, Sanem
Öztürk, Stephie Habicher, and Reiko Domai from WU Vienna – thank you all for doing a fan-
tastic job! While the attribution of blame to others may contradict the spirit of this preface,
there is an important reason for it. The two senior authors are striving for a second career in
the clergy after retirement and, hence, we cannot possibly be responsible for anything bad!
And speaking of our future careers, if anyone knows of a mixed-gender monastery with
basic amenities such as an infinity pool, a nice sea view, and 24/7 room service, where we do
not have to get up in the middle of the night to pray, please let us know (and you’ll get a gen-
erous 2% discount off the price of this book!).
We hope you enjoy reading the book as much as we enjoyed writing it!
Adamantios Diamantopoulos
Bodo B. Schlegelmilch
Georgios Halkias
DISCLAIMER
All examples, numerical figures, data, names, characters, places, products, and incidents men-
tioned in this book are fictitious and only reflect the authors’ imagination. Any resemblance
to actual people, elements, and events is coincidental. No such identification is intended or
should anyhow be inferred.
ABOUT THE AUTHORS
Georgios acts as a reviewer for several top-tier journals and sits on the Editorial Review Board
of the Journal of International Marketing (American Marketing Association). He has received
various international distinctions and awards, including the Outstanding Reviewer Award
2019 and the Outstanding Paper Award 2020 from the International Marketing Review in the
Emerald Literati Awards of Excellence. He has been repeatedly nominated for teaching awards
in courses on quantitative research methods, consumer behavior, and branding, and is the
recipient of the Best Teaching Award 2017 granted for outstanding teaching at the graduate
level (University of Vienna).
When not engaged in highly scientific activities, Georgios practices the guitar because he
wants to form a progressive metal band when he grows up. His wife desperately tries to make
him realize that this ain’t happening.
PRE-PUBLICATION REVIEWS FROM AROUND THE WORLD
‘Written with wry wit and incredible clarity, the authors provide the reader a detailed under-
standing of seminal issues in data analysis. A masterful work that truly does “take the fear out of
data analysis” – this book is a rare treat indeed.’
– David A. Griffith, Mays Business School, Texas A&M University, USA
‘Written by a proficient team of authors, Taking the Fear Out of Data Analysis is a fascinating
… ah, forget the marketing blurb. This is a great text, you should read it! And your family too,
provided that they want to learn about the basics of data analysis in the most entertaining way.
There is no doubt that you will devour this book in no time and learn a lot about statistics on
the way.’
– Marko Sarstedt, Ludwig-Maximilians-University (LMU), Germany
‘[H]idden behind some bizarrely memorable examples and illustrations is a very fine introduc-
tion to data analysis. This book will be of value to those approaching quantitative analysis for
the first time, and should be on the reading list for project-based courses and for new research
students.’
– Richard Speed, La Trobe University, Australia
‘In the age of big data, at least a rudimentary understanding of data analysis is a must. Business
students need to know about data analytics, but they are often intimidated by statistics and thus
fail to appreciate the value of data-based and model-supported decision making. Even seasoned
researchers are sometimes uncomfortable with conducting statistical analyses of their data
because they lack the confidence to apply methods that are perceived as abstract and confusing.
In this entertaining book, the authors gently guide the reader through the steps of the data anal-
ysis process and they brilliantly succeed in explaining difficult topics in an engaging, witty, and
highly informative manner.’
– Hans Baumgartner, Smeal College of Business, Penn State University, USA
‘Statistics. I know – you hate it. It’s hard and confusing. Students of all levels find the topic
hard. I tell them to get this book. And no! They cannot borrow mine, I don’t want to lose it.
Diamantopoulos, Schlegelmilch and Halkias knock another one out of the park with this excel-
lent introduction to a great array of statistical issues. They start right at the beginning – which is
always a good place to start if you’re a beginner – and gently, often hilariously, and successfully
guide the reader through the various learning moments that need to be negotiated if one is to
become fearless in the face of columns of data. Priceless.’
– John Cadogan, School of Business and Economics, Loughborough University, UK
‘What happens when three applied researchers write a book on data analysis? Well, as
a minimum, you get a resource book written for the user, and not the statistician. In this
revised edition of the popular book Taking the Fear out of Data Analysis, Diamantopoulos,
Schlegelmilch and Halkias provide a highly practical and helpful book for the applied social
scientist. Their writing (considerably more accessible than the spelling of their surnames) is plain,
straightforward and fun to read. In an era when aspiring researchers will remain a one-legged
scholar unless they master the foundational skills of data analysis and research methodology, this
book is considered an essential read and frequent reference.’
– S. Tamer Cavusgil, Georgia State University, USA
‘Taking the Fear Out of Data Analysis is one of the few books to provide a comprehensive,
conceptually solid yet accessible overview of the theory and practice of data analysis. Building
on their own extensive research experience, the authors use a pragmatic, elegantly ironic and
competent style to “translate” the complex science of managing data and involve readers in an
enjoyable learning journey. All the concepts and techniques are presented in a manner that is
easy to read and understand and several (and mostly humorous) examples and illustrations have
been integrated throughout the text. Definitely a must-have for anybody who is interested in
discovering not only how to deal with data but also how pleasant it can be to learn it.’
– Alessandro De Nisco, UNINT, Rome, Italy
‘These guys never give me any tips and now they have threatened to buy their bento boxes else-
where if I don’t give them a review. So here it is: I like the book; it has a lot of words!’
– Sōta Tuna, Head Waiter, Harakiri Sushi Shop, Vienna, Austria
‘This book is a real page-turner! Why? Because the book strikes an exceptionally good balance
between fun (funny examples, witty remarks) and a sound introduction to statistics and data
analysis. Thus, the book encourages its readers in an entertaining way to delve into the statis-
tical concepts and methods covered. Despite its reassuring title, the book also does not conceal
the mysterious “dark side of empirical data analysis”, such as p-hacking or HARKing. But this,
of course, makes perfect sense since you can only defeat your fear if you know the danger. The
upshot is that this work definitely delivers on its promises.’
– Dirk Temme, Schumpeter School of Business and Economics, University of Wuppertal,
Germany
‘Read this book! It should not only be on the prescribed list of any student of the social sciences,
it should also be compulsory reading for journalists and the media.’
– Leyland Pitt, Simon Fraser University, Canada
‘It’s been tried and tested and we can now reject the null hypothesis that “this book is no different
to other data analysis books”. It is, and in ways that make it an easier read and an easier ride for
those starting out on their analysis journey. Like any great product, it lives up to its name and
really does help to take some of the fear out of analysing research data. Congratulations to the
authors for updating this now classic text.’
– Vince Mitchell, The University of Sydney Business School, Australia
‘The authors gave me a choice of either giving a testimonial or reading the book cover to cover
… you know the rest.’
– Anonymous PhD Student, Anonymous University, Anonymous Country
‘The new edition of this book provides excellent guidance to data knowledge and competence
using a problem-solving approach. With the digital becoming increasingly important, analytical
skills should be key competencies in everybody’s daily life. To achieve this goal, Taking the Fear
Out of Data Analysis is highly recommended.’
– Zhongming Wang, Zhejiang University, China
‘Taking the Fear Out of Data Analysis is the best book for someone who has heard a lot of buzz
about data analysis but doesn’t have a firm grasp of the subject. The book is an eye-opening read
for anyone who wishes to learn about data analysis: understanding of data, preparing the data
for analysis and different analysis techniques. If I had to pick one book for an absolute newbie to
the field of data analysis, it would be this one.’
– Manish Gangwar, Associate Dean, Research and RCI Management and Executive Director,
ISB Institute of Data Science, Telangana, India
‘When I began my academic career as a Research Assistant, the first edition of this book enjoyed
a prominent position on the shelves in the office I shared with three other early career researchers.
We liked this book because it unpacked the complicated, procedure-heavy world of quantitative
methods in a user-friendly way. It poked fun at the subject matter in a manner that was so dis-
arming that even our office’s hard-core qualitative researcher loved it. The significantly extended
new edition is increasingly relevant as the world of quantitative methods has kept on expanding,
in part due to an explosion in software programs that scholars can use seemingly without much
understanding. Do not let the light-hearted nature of this book fool you. It is a statistics book
that carefully leads readers through all the necessary stages of analysis. It effortlessly explains
the analysis details and assumptions that PhD examiners, journal reviewers, and conference
presentation audience members insist on raising. This excellent new edition is destined to be very
well thumbed.’
– Matthew Robson, Cardiff Business School, UK
NOTE
Many testimonials were quite lengthy. While our tolerance for receiving praise tends to be
alarmingly high, we felt it would be in the best interests of our readers to shorten the testimo-
nials so that the part of the book talking about statistics would be slightly longer than the part
devoted to testimonials. An extended list can be found on the publisher’s website for this book.
INTRODUCTION TO TAKING THE FEAR OUT OF DATA ANALYSIS
The trouble with numbers is that they frighten a lot of people.
– Leslie W. Rodger, Statistics for Marketing
This book has been written for people who do not like data analysis, who do not want to
become analysts or statisticians, but who have to learn about data analysis for whatever reason
(e.g., to pass a course at college or university, complete a dissertation, get/keep a job, or to
impress a new date). It has also been written for people who think they cannot understand data
analysis and statistics, having had bad experiences with textbooks full of formulae and little
substance, and teachers full of confidence and no humor. In fact, this book has been written
for anybody suffering from the ‘I hate numbers’ syndrome – and we know there are plenty of
you out there.
What we tried to do in this book is quite simple: take your hand and lead you through the
entire data analysis process without boring you to death along the way. Our specific aims have
been threefold:
1. To provide a comprehensive but digestible introduction to the strange world of data
analysis, assuming no prior knowledge on your part.
2. To indicate the linkages among the various stages of the data analysis process and highlight
the implications of good/bad early decisions on subsequent ones.
3. To demonstrate that learning about data analysis can be an enjoyable experience; hope-
fully, after you have finished reading this book, you will feel that way too.
Our philosophy behind the content and structure of the book is based on a few basic premises.
First and foremost, our main concern is with understanding rather than memorizing. Thus,
we urge you to channel your learning effort towards grasping and digesting the various con-
cepts/techniques of data analysis rather than mechanically reproducing a bunch of formulae.
Moreover, at this introductory level, we feel that your attention should be directed towards key
issues and major building blocks rather than statistical refinements and details. Consistent with
this view, we keep the number of formulae down to an absolute minimum and do not bother
with providing mathematical proofs (the latter being a sure way of sending you to sleep!). We
also firmly believe that a point is best driven home by means of an example. Consequently, we
make liberal use of (mostly silly) examples and illustrations to show how the various concepts
and techniques can be applied. Lastly, we see no harm in making you smile from time to time
– a hefty dosage of humor is often the only way to keep one sane while learning/doing data
analysis!
You can use this book in a number of ways, depending upon your background and objec-
tives. For those of you with no prior experience of data analysis and little, if any, statistical
knowledge, we strongly recommend that you go through the book chapter by chapter. In this
way, you will be introduced to more complex material in a gradual manner and should not
feel lost at any time. On the other hand, those of you with some previous exposure to research
methods and/or statistics may opt for a more flexible approach. For example, you may wish to
concentrate on Chapter 4 (dealing with data preparation and coding) onwards and refer back
to Chapters 1 to 3 (which are really background chapters) on an ‘as required’ basis. Finally, for
those of you who are particularly interested in specific types of analysis, you should primarily
focus on Chapters 7 to 14, which cover different analytical techniques (and presuppose famil-
iarity with basic data analysis principles).
There are also some key chapters that everyone should read. These include Chapter 5 (on
setting analysis objectives), Chapter 8 (on the nature of statistical estimation), Chapter 9 (on
the principles of hypothesis testing), and Chapter 15 (on evaluating and presenting the anal-
ysis). Moreover, all readers would do well to heed the numerous HINTS and WARNINGS
dispersed throughout the various chapters. These serve to emphasize key points, awareness of
which can prevent problems and make life easier for you.
The Further Reading section at the end of each chapter should also be consulted. Rather
than provide a long list of references, we have intentionally limited ourselves to a selection of
a few key sources which we feel best amplifies and complements the material covered in the
chapter. Each suggested reading has been briefly annotated, and there is no duplication of
sources in the Further Reading sections of the various chapters. Having said that, many of the
suggested readings may also be useful for issues discussed in the other chapters – so keep an
open mind and be flexible.
The book is organized into three main parts, containing a total of fifteen chapters.
Part I, Understanding Data, provides the necessary background by looking at the nature of
data, the sampling process, and the notion of measurement. These are essential building blocks
underpinning data analysis and a prerequisite for understanding the application of statistical
techniques.
Part II, Preparing Data for Analysis, focuses on the various tasks associated with converting
raw data into a form that can be analyzed and on setting objectives for the analysis. Careful
attention to the issues raised here will prevent a lot of problems at the actual analysis stage.
Part III, Carrying out the Analysis, examines the rationale behind different types of analysis
and introduces a wide variety of analytical methods appropriate for different circumstances.
Starting with simple approaches to describing and summarizing data, a number of techniques
are considered, which, if properly applied, will enable you to get the most out of your data.
A good taste of multivariate analysis is also provided in this edition, navigating the interested
reader through more complex analytical techniques. Finally, several issues are raised relating
to the evaluation and presentation of a data analysis project and the preparation of written and
oral research reports.
By the time you finish reading this book, you should be in a position to know what analysis
to apply, for which purpose, to what kind of data. The chapter-by-chapter overview that follows
should help clarify what all this means.
Chapter 1 lays the groundwork for the rest of the book by introducing you to the data
matrix, which is the raw material you will work with whenever you do data analysis. Here
you will get acquainted with units of analysis, variables, and values, which are the essential
ingredients of data. You will also learn about different types of data and about the distinction
between data and information. By the time you finish this chapter, you should be able to talk
about data as if you knew something about it!
Chapter 2 looks at sampling to demonstrate how units of analysis may come about in
a particular project. You will understand the rationale for taking a sample rather than stud-
ying the whole population and the different sampling methods that you can use. You will
also encounter the concept of sampling error, which, no doubt, will haunt you forever after!
Finally, the numerous considerations for determining sample size will be piled upon you – if
only to confuse you even further!
Chapter 3 describes what measurement is, why it is important and how it can be done. You
will recognize the advantages and disadvantages of different measurement scales and get to
know a variety of scaling formats. Following this, you will encounter a second kind of error,
the notorious measurement error, which will also haunt you forever after (either with the
sampling error or on its own). By the end of this chapter, you should have a firm grasp of the
options available for measuring your variables and interpreting the resulting values, as well as
for assessing the validity and reliability of measurement.
Chapter 4 moves into the practicalities of preparing data for analysis, focusing on the process
of data editing. You will learn how to detect many errors you may make when processing data
and how to deal with ambiguous, inconsistent, and missing data. You will be shown how to
properly code your data, and, lastly, you will see how easy it is to perform variable transfor-
mations in order to take advantage of the full potential of your data set.
Chapter 5 emphasizes the need for having clear analysis objectives before embarking on the
actual analysis (otherwise, you will not know how to start or when to stop). These will ensure
that your analysis is relevant, comprehensive, and efficient. You will be introduced to different
analytical perspectives, namely description, estimation, and hypothesis-testing (which will
be dealt with in more detail in later chapters). The chapter will conclude with a discussion of
the factors governing the choice of the method of analysis (which will undoubtedly become
the subject of your worst nightmares for years to come!).
Chapter 6 focuses on data description, which is usually the first type of analysis that you will
want to do. You will get to know the various forms of frequency distributions and the steps
to take in grouping data. Your artistic talents will also be thoroughly stimulated by an exami-
nation of different types of graphical representations. At this stage, you will wonder what the
purpose of life would be without percentiles, true class limits, and ogives!
Chapter 7 soldiers on with data description and introduces you to different summary meas-
ures that can be used to capture typical or average responses as well as the extent of variability
in your data. By using different summary measures in conjunction with one another, you will
be able to identify the shape of a frequency distribution and compare it to known forms. In this
context, the famous normal distribution will serve as a useful point of reference.
Chapter 8 deals with the process of estimation and shows how you can talk about the
population when you only have a sample. First, you will become familiar with the concept of
a sampling distribution and then learn how to set confidence intervals for different popula-
tion parameters that you may wish to estimate. Being able to make inferences from a sample
to a population will surely set you apart from the uninitiated punter!
Chapter 9 ventures into the mystical world of hypothesis-testing and provides you with an
understanding of the basic principles associated with developing and testing specific research
propositions. You will learn about different types of hypotheses and the rationale behind sig-
nificance testing, which, like estimation, also enables you to make inferences from a sample to
a population. Concepts like ‘null hypothesis’, ‘p-values’, and ‘regions of rejection’ will become
second nature to you and a topic of conversation at every possible opportunity! Finally, in
this chapter, you will be introduced to the concepts of effect size and statistical power and be
alerted to issues that go beyond statistical significance.
Chapter 10 deals with the simplest type of hypotheses, namely those involving a single
variable and a single sample. Here you will be shown how to examine the fit of your frequency
distributions against prior expectations or against a theoretical distribution. In addition, you
will be able to determine whether your sample is likely to have been drawn from a population
with known parameter values, including tests for central location, proportions, and variabil-
ity, as well as whether your sample is, in fact, random. By the time you complete the chapter,
you should feel confident enough to face more complex hypotheses (involving more than one
sample or more than one variable).
Chapter 11 extends your journey into hypothesis-testing by concentrating on comparisons.
First, you will learn how to compare two or more groups on the same variable – what is known
as an independent measures (or samples) comparison. Next, you will address the issue of
comparing two or more responses from the same group, involving a related measures (or
samples) comparison. In both cases, you will be testing to see whether significant differences
exist, that is, whether any observed differences in your sample results are likely to reflect ‘true’
differences in the population. By becoming an expert on comparisons, you will be able to
impress your friends with profound statements of the sort: ‘Basketball players are, on average,
significantly taller than jockeys’ and ‘There is no significant difference between the propor-
tions of male and female construction workers with a passion for ornithology.’
Chapter 12 shows you how to investigate relationships between two variables. Here you
will be exposed to measures of association that can tell you whether two different variables
are related to one another and which can be subjected to significance tests to see whether any
observed relationship (based on your sample data) is also likely to hold in the population. Once
you have established a significant relationship, your association measure will also enable you
to assess its strength and directionality (i.e., positive or negative). This chapter will also intro-
duce you to fundamental techniques that can not only help you identify a relationship between
two variables, but also allow you to make specific predictions about the expected value of an
outcome variable based on a given level of a predictor variable. The chapter concludes with an
important note on the distinction between correlation and causality – make sure you don’t
miss this!
In Chapter 13, we show you that there is much more to data analysis – multivariate analysis
procedures open up a whole lot of new opportunities for extracting information from your
data and answering more complex research questions. We start by making a basic distinction
between dependence and interdependence methods and proceed by giving you a good taste
of the techniques included in the former. Here you will learn a great deal about complex com-
parisons and relationships that involve multiple variables at the same time. Advanced data
analysis using multivariate procedures cannot be done justice in a single chapter – that is why
we added another!
Chapter 14 deals with the remaining group of techniques in the multivariate universe, i.e.,
the interdependence methods. In this chapter, you will become acquainted with the most
fundamental techniques of identifying structures in the data both in terms of variables and
the units of analysis involved. If you reach this point of the book, you should feel rather com-
fortable in dealing with a wide range of complex analytical techniques.
Chapter 15 is an oasis of sanity at the end of your data analysis odyssey. Unbelievable as
it may sound, in this chapter, you will not have to grasp new theoretical concepts, learn yet
another technique, or interpret more statistical results. Responding to your pleas for mercy,
the purpose of this final chapter is to make you sit back and think about the ‘Now what?’
question. Here we talk you through how to present your analysis to your audience(s): Is it too
technical? Are the practical implications clear? Is the presentation attractive? Unless the pres-
entation of your analysis (written and/or oral) is effective for the specific audience involved, all
your efforts will have been in vain. Not only must you do the right thing and do it right: you
must convince others that you have done so.
Have fun.
PART I
UNDERSTANDING DATA
1
What is data (and can you do it in
your sleep)?
In today’s digital economy, it is virtually impossible not to produce data. Automatic data
capture through, for example, digital TVs, refrigerators or washing machines that read RFID
tags embedded in your T-shirt or yogurt cup, traffic cameras (you should always look your
best in public), and temperature sensors (China and some other countries want to ensure that
new arrivals are healthy) constantly produces data. Tracking their shoppers’ purchasing habits,
supermarkets often learn earlier about their female customers’ pregnancies than the respective
fathers (of course, some fathers never learn about their fatherhood – but that is a different
story). When you drive a car, automatic measurement devices can let insurance companies
know how safe or unsafe your driving style is (perhaps this explains why your insurance
premium quadrupled last year). When you take your dog for a walk, holistic dog food bakeries and
other stores can detect you via location-based services as a potential customer and send you
irresistible coupons on your smartphone, enticing you to visit. When you are surfing the web,
cookies trace your behavior or misbehavior (couples beware when visiting dating sites). Even
when you are sleeping, wearable devices can let your doctor, pharmacist, or manufacturers of
anti-snoring aids know whether you urgently need their products or services.
Although we are swamped with data, sometimes market researchers still need more or
different data. Most of us have already been accosted in the street by extremely ‘friendly’ (and
sometimes obnoxious) characters who conduct personal interviews and ask to ‘please answer
a few questions’ regarding our opinions on certain stores (e.g., ‘Should El-Cheapo supermar-
ket also offer Japanese bubble tea?’), product preferences (e.g., ‘Do you prefer Superclean
over Ultrasteril washing powder?’), voting intentions (e.g., ‘If a general election were called
tomorrow, would you vote for (a) The Conversation Party, (b) The Favor Party, or (c) The
Anti-Everything Party?’), or opinions regarding the European Union (e.g., ‘Do you feel that
Greek producers of eucalyptus-flavored dog biscuits have benefited from EU membership?’).
To top it all off, even the privacy of our own home is not enough to prevent the hungry
information-seekers from reaching us. How many times have we had to miss a crucial part
of our favorite Netflix series in order to answer the phone, only to find out that the caller is
Belinda Pain, from Persistence Research, Inc., who was wondering whether we would participate
in a survey regarding time-share holidays in Tirana?
Market researchers also like to make use of web-based survey instruments. Having just
been to a hotel or restaurant or completed your weekly visit to a spiritual healer, you receive
an email requesting you to fill in a questionnaire on various aspects of your experience.
Countless university-based researchers have also discovered crowdsourcing systems (like
Amazon’s Mechanical Turk), where herds of volunteers complete questionnaires in exchange for
micro-payments. Finally, very often, we have to provide details about ourselves (e.g., our age,
income, place of residence) when applying for a driving license, passport, or bank account, or
when filing a tax return (very painful) or booking a flight or holiday online.
While the approaches used to obtain data vary in each instance, the objective is always to
learn about individuals with regard to certain characteristics of interest. Now, in statistical
jargon (yes, you do have to learn some of it), the individuals are called units of analysis (or
sometimes ‘observations’, ‘cases’, or ‘subjects’), the characteristics studied are termed var-
iables, and the responses linking the individuals to the characteristics are known as values.
Together, units of analysis, variables, and values make up what we call ‘data’. Thus when
we refer to data, we implicitly address three distinct issues, notably (a) the respondents (as
indicated by the units of analysis), (b) the topic of interest (as described by the set of variables),
and (c) the responses of the respondents in relation to the topic of interest (as reflected in the values
of the variables). Variables can assume different values for different units of analysis; if this is
not the case (i.e., when all units of analysis have the same value), then we are not dealing with
a variable but with (surprise, surprise) a constant.
The examples at the beginning of this chapter will have made you realize that units of anal-
ysis do not have to be individuals (although in a great deal of social research, they are). They
can be ordinary objects (e.g., nasal-hair removers, vodka brands, or horsewhips), time-periods
(e.g., months, years, decades, centuries), events (e.g., strikes, accidents, shareholder meetings,
visits to the dental hygienist), or other entities (e.g., firms, cities, nations, zoos). Neither do
variables have to refer to human properties; they can be product features (e.g., speed, dura-
bility, color), organizational dimensions (e.g., centralization, formalization, span of control),
or national characteristics (e.g., inflation rates, government spending, interest rates) – in fact,
anything that can be used to characterize the particular unit of analysis.
When we have a number of units of analysis (e.g., 200 first-year parapsychology students),
a number of variables (e.g., age, gender, parents’ income, preferred method for reaching out
to spirits), and a set of values linking the units of analysis to the variables, the result is a data
set. This can be best visualized as a matrix, the rows of which represent the units of analysis,
the columns the variables, and the matrix cells the relevant values; for our example, the idea is
shown in Table 1.1. By the way, if you are afraid of numbers, you can give names to our tables.
For example, Table 1.1 could become ‘Sissi’ or ‘Rudolph’.
If there are n units of analysis (respondents, objects, events, etc.) and m variables, the data
matrix looks like Table 1.2, where Rij is the response that unit i gives to variable j (in other
words, Rij is the value for unit i on variable j). The subscripts i and j are simply used for count-
ing purposes; in other words, i = 1,2, …, n and j = 1,2, …, m. Obviously, depending upon how
many units of analysis and variables we have, n and m will vary; for example, in Table 1.1, n =
200 and m = 4.
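To make the matrix idea a little more tangible, here is a minimal Python sketch. The four students and all of their values are invented purely for illustration (they are not taken from Table 1.1), and the helper names `value` and `is_constant` are our own assumptions rather than anything standard.

```python
# A tiny, made-up data matrix: rows are units of analysis, columns are variables.
variables = ["age", "gender", "parents_income", "preferred_method"]

data_matrix = [
    [19, "female", 42000, "ouija board"],
    [21, "male", 38000, "seance"],
    [20, "female", 55000, "seance"],
    [19, "male", 61000, "crystal ball"],
]

n = len(data_matrix)  # number of units of analysis (rows)
m = len(variables)    # number of variables (columns)


def value(i, j):
    """Return Rij, the value of unit i on variable j (both counted from 1)."""
    return data_matrix[i - 1][j - 1]


def is_constant(j):
    """True if all units share one value on variable j, i.e. it is a constant."""
    return len({row[j - 1] for row in data_matrix}) == 1


print(n, m)            # 4 4
print(value(2, 4))     # seance
print(is_constant(1))  # False
```

Storing the data as a list of rows mirrors the layout described above: picking a row gives you everything known about one unit, while picking a column position gives you one variable across all units.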
The main benefit of arranging data in matrix form is that the threefold nature of data (i.e.,
units of analysis, variables, and values) becomes immediately visible; in fact, most data sets
can be represented in the form of Table 1.2. There are some exceptions (e.g., multivariate
time-series data) but these need not concern us at this point.
The data matrix is the starting point for analysis, and its structure determines the kind of
analysis that can be legitimately carried out. Specifically, the number of rows, n, indicates how
many units are being studied; that is, the sample or population size (we will have much more
to say about samples and populations in Chapter 2). The number of columns, m, on the other
hand, indicates how many variables are used to characterize the units of analysis; depending
upon the number of variables, k (k ≤ m), that are simultaneously manipulated by applying
statistical techniques, we talk about univariate (k = 1), bivariate (k = 2), and multivariate (k >
2) data analysis, respectively (we shall return to this issue in Chapter 5). Finally, the nature
of the values, Rij, reflects the level of measurement of the variables and thus indicates what can
and cannot be said about the units of analysis (measurement will be dealt with in some detail
in Chapter 3, so don’t worry if you feel totally confused at the moment!).
WARNING 1.1 Questions and variables may not be the same: beware of
multi-response questions.
Now, here’s something you must always watch out for when you are dealing with a data set
that is based on questioning respondents: a question and a variable are not necessarily the
same thing. Very often, answering what appears to be a single question in a questionnaire
may, in fact, require multiple responses and will thus result in a number of variables. Table 1.3
illustrates this point: irrespective of whether Question A or Question B is asked, the response
alternatives are identical. However, the nature and number of the resulting variables are not.
If Question A is asked, then only one option needs to be ticked to answer it. Consequently, to
capture all possible responses, a single variable would be sufficient; this would be called some-
thing like ‘most preferred publication’ and would take five values, one for each publication
involved (e.g., 1 = Wall Street Journal, 2 = The Times, …, 5 = Unspeakable Acts). If Question
B is asked instead, a single variable is not sufficient to capture all possible responses, because
a respondent may wish to subscribe to more than one publication (e.g., The Wall Street Journal
and Unspeakable Acts) and thus legitimately tick multiple options. In this case, to ensure that
all possible responses are captured, five variables would be needed (i.e., one for each
publication). The reason for this is that Question B is not really a single question but a series of
questions of the form: ‘Would you subscribe to the Wall Street Journal?’, ‘Would you subscribe
to The Times?’, and so on. The response to each of these questions would be of the ‘yes/no’
variety, resulting in five variables, each taking two possible values (e.g., 1 = would subscribe, 2
= would not subscribe).
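The difference between the two questions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text names three of the five publications, so the other two appear here as placeholders, and the function names are hypothetical.

```python
# The five response alternatives. Only three titles are named in the text;
# the remaining two are placeholders, not real entries from Table 1.3.
publications = [
    "Wall Street Journal",
    "The Times",
    "Publication 3 (placeholder)",
    "Publication 4 (placeholder)",
    "Unspeakable Acts",
]


def code_question_a(ticked):
    """Question A (tick one only): a single variable taking a value from 1 to 5."""
    return publications.index(ticked) + 1


def code_question_b(ticked):
    """Question B (tick all that apply): one yes/no variable per publication.

    Coding follows the text: 1 = would subscribe, 2 = would not subscribe.
    """
    return [1 if pub in ticked else 2 for pub in publications]


print(code_question_a("Unspeakable Acts"))
# 5
print(code_question_b({"Wall Street Journal", "Unspeakable Acts"}))
# [1, 2, 2, 2, 1]
```

One respondent therefore contributes one cell to the data matrix under Question A but five cells under Question B, which is exactly why the number of questions and the number of variables can differ.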
HINT 1.1 If a question does not specify to ‘tick one option only’, chances are it is
a multi-response question and cannot be represented by a single variable.
TYPES OF DATA
Let us now move on to the different types of data that one may come across in a data analysis
project. While there are many data classification schemes, at this stage we shall briefly look
at different types of data according to (a) their meaning, (b) their source, and (c) their time
dimension; in Chapter 3, a fourth classification of data will be introduced based upon their
measurement properties.
Focusing initially on different kinds of data according to their meaning, there are data that
refer to facts; that is, characteristics or situations that exist or have existed in the past. Things
such as age, gender, income, church membership, 1973 sales of Wartburg automobiles, and
number of drunken Taiwanese visitors at last year’s Oktoberfest in Munich are all examples
of facts. Descriptions of individuals’ present behavior (e.g., current shopping habits or using
smartphones during sex) and past behavior (e.g., historical voting patterns in Azerbaijan) also
fall into this category.
Secondly, there are data that refer to awareness or knowledge of some object or phenom-
enon. Typical examples here are brand awareness (e.g., ‘Which of the following toothpaste
brands do you recognize: (a) Draculadent, (b) Vampirmed, (c) Ghoulshine?’), knowledge of
important events (e.g., ‘When did the German chancellor first yodel in public?’), and mastery
of a certain subject or topic (e.g., ‘Who wrote Das Kapital? (a) Woody Allen, (b) John Grisham,
(c) Stephen King, (d) Karl Marx, or (e) Karl Marx and Woody Allen together?’).
Thirdly, there are data representing intentions; that is, acts that people have in mind to do
(i.e., their anticipated or planned behavior). Such intentions can relate to future purchasing
behavior (e.g., ‘Having tried your free sample of Explosion laxative, do you intend to buy it in
the future?’), social behavior (e.g., ‘Mrs. Corleone, you will be sorry to hear that your son has
failed all his final exams. Do you intend to (a) let him get away with it, (b) cut his pocket-money
by 96%, or (c) confiscate his Ferrari?’), and personal behavior (e.g., ‘Do you seek to enter the
United States to engage in (a) export control violations, (b) subversive or terrorist activities, (c)
any other unlawful activity, or (d) golf with the president?’).
Fourthly, there are attitudes and opinions data, which indicate people’s views, preferences,
inclinations, or feelings toward some object or phenomenon. Examples of attitude/opinions
data are product or service evaluations (e.g., ‘Do you think that reporting by the New York
Times (a) is brilliant, (b) is fake news, or (c) does not adequately reflect the views of the lesbian,
gay, bisexual, and transgender community?’), political beliefs (e.g., ‘In your view, does the
Favor or the Conversation Party have the better policy for providing employment opportuni-
ties for single teenage mothers in the Outer Hebrides?’), and views on social issues (e.g., ‘What
is the single most important problem facing humanity today: (a) poverty, (b) drugs, (c) global
warming, (d) the Welsh rugby team’s recent bad streak?’).
Lastly, there are data relating to the motives of individuals. Motives are internal forces (i.e.,
desires, wishes, needs, urges, impulses) that channel behavior in a particular way. Although
motivations may be complex and difficult to articulate, data of this kind are quite important
because they can tell us why people behave in the way they do. Examples include reasons
given for doing or not doing a certain thing (e.g., ‘I go to aerobics so that my fantastic body
becomes even more irresistible’ or ‘I don’t do any exercise whatsoever because I’m a lazy slob’),
explanations for preferring something over something else (e.g., ‘I prefer a wall between the US
and Canada to a wall between Mexico and the US’), and rationales for holding certain views
or opinions (e.g., ‘I think that memorizing complex statistical formulae is a total waste of time
because you can look them up in a book or use a computer to do the work instead’).
Turning our attention to types of data according to their source, a broad distinction can be
drawn between primary and secondary data. Primary data are data collected with a specific
purpose in mind; that is, for a particular research project. The researcher usually gathers
such data via surveys (conducted face to face, by telephone, or through the web), experiments
(carried out in the laboratory or a ‘natural’ setting), or observation methods (using automatic
data capturing or humans to record observed behavior). In contrast, secondary data are
data that have not been gathered expressly for the immediate study at hand but for some
other purpose; such data, however, might be of relevance for a particular research project
(in other words, somebody else has done the work but you may be lucky enough to be able
to use it without getting your own hands dirty). A wealth of secondary data can be found in
published statistics (by government departments, trade associations, chambers of commerce,
and research foundations), annual reports (published by business firms as well as non-profit
organizations), and abstracting and index services (covering thousands of periodicals, aca-
demic journals, and newspapers). For those who can afford to pay for them (and beware,
because they don’t come cheap!), there are also syndicated services (providing regular detailed
information on a particular country, industry, or product group) and database services (allow-
ing fast access to digital information sources worldwide, or enabling ‘electronic’ transfer of
data sets from one location to another).
The final classification of data to be considered has to do with the time dimension and
distinguishes between data relating to a single point in time and data relating to a number
of time-periods. Data of the former type are known as cross-sectional data, while those of the latter
type are commonly referred to as longitudinal data. From an analysis point of view, the distinction
between the two is quite important because it determines whether inferences regarding change
can be made. To illustrate this, consider for a moment the three data sets displayed in Table 1.4;
the units of analysis in all cases consist of 600 insomniacs living in Auchtermuchty, Scotland.
Data sets A and B are examples of cross-sectional data. Each provides a snapshot of the
variables of interest at a particular point in time; in this instance, data set A informs us about whisky
and lager purchases in February 2020, while data set B does the same thing for March 2020.
One could compare purchases across product types within each data set and reach conclusions
regarding the most popular drink in a particular time-period.
Data set C, on the other hand, is an example of longitudinal data and involves repeated
measurements over time on the variables of interest; data set C informs us about whisky and
lager purchases in February and March 2020. As a result, one can compare not only purchases
across product types but also purchases of the same product over time; thus, in addition to
being able to draw the kind of conclusions data sets A and B enable, conclusions regarding
changes in the relative popularity of the three drinks are now possible.
Now, having just distinguished between cross-sectional and longitudinal data, we shall
immediately confuse you by suggesting that one can do longitudinal analysis by using
cross-sectional data! This is not as impossible as it first sounds. Take, for example, data sets
A and B and combine them; that is, imagine they are parts of the same study. Piecing the two
data sets together provides information on the same variables (here whisky and lager pur-
chases) of different (but comparable) units of analysis (here two lots of 600 Auchtermuchty
insomniacs) at different points in time (here February and March 2020, respectively). Data
of this kind are known as trend data and enable inferences to be drawn regarding changes
in aggregate behavior, attitudes, and so on. Election polls provide a good illustration in this
context: ‘… a poll commissioned by the Daily Polygraph, in which the voting intentions of
a nationally representative sample of 11,893 bird-watchers were obtained yesterday, shows the
Anti-Everything Party standing at 23%, the Conversation Party at 37%, and the Pro-Birds Party
at 40%. A similar poll, conducted two weeks ago by the Financial Crimes, had Anti-Everything
neck and neck with the Conversationists at 44.3% and 44.1%, respectively, with the Pro-Birds
trailing at an appalling 11.6%!’
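As a rough sketch of what trend data can and cannot tell us, the few lines of Python below simply subtract the earlier poll's shares from yesterday's. The percentages are those quoted in the example; the variable names are our own.

```python
# Party shares (in %) from the two polls quoted above.
poll_two_weeks_ago = {"Anti-Everything": 44.3, "Conversation": 44.1, "Pro-Birds": 11.6}
poll_yesterday = {"Anti-Everything": 23.0, "Conversation": 37.0, "Pro-Birds": 40.0}

# Aggregate change per party, in percentage points.
change = {
    party: round(poll_yesterday[party] - poll_two_weeks_ago[party], 1)
    for party in poll_yesterday
}

for party, points in change.items():
    print(f"{party}: {points:+.1f} percentage points")
```

Because the two polls questioned different bird-watchers, these differences describe aggregate shifts only; nothing in the calculation reveals which individual voters changed their minds.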
WARNING 1.2 Unless you are certain that the same units of analysis
have been measured over time on the same variable, then
conclusions regarding change at the individual level are not
possible.
Trend data should be distinguished from true longitudinal data (such as those provided by
data set C), where the same units of analysis are studied at different points in time. The latter
are sometimes also referred to as panel data, and in addition to capturing aggregate changes
over time, they enable inferences to be drawn as to changes in individual behavior. A typical
example of this kind of data is provided by consumer panels, in which a number of individuals
or families (usually balanced on such variables as age, income, and geography) record their
purchases of a number of products at regular intervals (e.g., monthly). Their records are
subsequently used, among other things, to determine the degree of brand loyalty (i.e., the pro-
portion of those buying brand X in period 1 who also bought brand X in period 2) and degree
of brand switching (i.e., the proportion of those buying brand X in period 1 who bought some
other brand in period 2). As you have probably fallen asleep by now, have a look at WARNING
1.2 to wake you up.
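The two panel measures just defined can be worked out as follows. This is a minimal sketch with an invented six-household panel; it assumes each household buys exactly one brand per period, and the function name `loyalty_and_switching` is hypothetical.

```python
# Hypothetical panel: the brand each household bought in period 1 and in period 2.
panel = {
    "household_1": ("X", "X"),
    "household_2": ("X", "Y"),
    "household_3": ("Y", "Y"),
    "household_4": ("X", "X"),
    "household_5": ("Y", "X"),
    "household_6": ("X", "Z"),
}


def loyalty_and_switching(panel, brand):
    """Return (loyalty, switching) as proportions of the brand's period-1 buyers."""
    second_purchases = [p2 for p1, p2 in panel.values() if p1 == brand]
    if not second_purchases:
        raise ValueError(f"nobody bought brand {brand!r} in period 1")
    loyal = sum(1 for p2 in second_purchases if p2 == brand)
    loyalty = loyal / len(second_purchases)
    # With one purchase per period, everyone who is not loyal has switched.
    return loyalty, 1 - loyalty


loyalty, switching = loyalty_and_switching(panel, "X")
print(loyalty, switching)  # 0.5 0.5
```

The calculation only works because each row pairs two purchases by the same household. With trend data the pairs would not exist, and only the overall share of brand X in each period could be compared.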
In general, given a set of variables, four basic kinds of studies can be distinguished according
to (a) whether the variables concerned are measured once or repeatedly and (b) whether the
same or different units are studied in each case (Table 1.5).
As you might have astutely suspected, there are more types of data sets than those we have
discussed. The good news is that, for the most part, they represent different combinations or
variations of the basic types shown in Table 1.5. For example, the well-known experimental
design of the ‘before–after’ variety with a control group is essentially a combination of two
panels, one of which has been unlucky enough to be exposed to the experimental treatment
(e.g., subjects were forced to watch 43 TV adverts of Loch-Ness-Café Gold Bland) and one
of which has been spared the torture (i.e., subjects were left in peace). Similarly, an omnibus
panel (no, this is not the rear section of a London double-decker bus) is essentially a series of
cross-sectional studies, in which the same group of individuals (i.e., the panel members) is
measured on different variables at different points in time (e.g., at one time, panel members
may be asked to evaluate the aesthetic appeal of alternative packages for cat food and, at
another time, to indicate their attitudes toward a new type of heavy-duty flea spray). Often,
a sub-group of the total panel is selected for a particular purpose; for example, if one wanted
to study consumer reactions to a new type of push-up bra, only those panel members that are
female and over a certain age would (usually) be surveyed.
Before we move on to even more exciting stuff, we need to look briefly at the relationship
between data and information. In everyday language, the two are usually taken to be synony-
mous; however, one can distinguish between them in at least two senses.
Firstly, one can look at information as the product of data; that is, information as data
that have been digested and analyzed. In other words, information is the knowledge obtained
and conclusions arrived at after appropriate analytical techniques have been applied to the
data matrix in Table 1.2. Arguably, a mass of raw data as described at the beginning of this
chapter (i.e., unsummarized/unstructured) is of little informational value in itself. Imagine, for
example, that cookies traced the web-surfing habits of 20 million Russian consumers during
the last three months and you have access to these data. Even if you are Ms. Bigbrains in person,
you need to analyze these data in order to extract managerially useful information, such as on
which sites to place banner ads for your new brand of egg-timers that play the national anthem
when the eggs have been boiled to perfection.
A second distinction between data and information can be drawn on the basis of relevance;
under this view, information is data relevant for a particular decision. To illustrate this, take
again the example of deciding where to place banner ads for the new egg-timer. Assume that,
in addition to the data on web-surfing habits of Russian consumers, you are also given the
following: (a) price lists for banner ads of major websites, (b) a list of websites where your
competitors place their banner ads, and (c) the shoe size of your boss’s secretary. While most
of us would agree that (a), (b), and (c) all constitute some kind of data, virtually no one (sober,
at least) would consider (c) as relevant information. Of course, for a different decision, (c) may
be a perfectly legitimate informational input (e.g., if the objective is to order sports shoes for
the office rugby team).
SUMMARY
Let us take a deep breath and try to summarize what we have learned so far. We first
looked at the nature of data, distinguishing between units of analysis, variables, and val-
ues; we then put the three together to create a data matrix. Next, we warned against
confusing questions with variables and discussed different kinds of data according to their
meaning, source, and time dimension. Finally, we considered the link between data and
information. It is now time to make a cup of coffee and mentally prepare for the things to
come.
FURTHER READING
Bryman, A. (2016). Social Research Methods, 5th edition. Oxford: Oxford University Press. An
excellent introduction to all aspects of the research process.
Creswell, J. W. & Creswell, J. D. (2017). Research Design: Qualitative, Quantitative and Mixed
Method Approaches. London: Sage Publications. All you need to know regarding thinking
about, designing, and implementing different kinds of research projects.
Sheehan, K. B. & Pittman, M. (2016). Amazon’s Mechanical Turk for Academics: The HIT
Handbook for Social Science Research. Irvine, CA: Melvin & Leigh. A comprehensive guide to
collecting data via Mechanical Turk.
2
Does sampling have a purpose
other than providing employment
for statisticians?
Having taught you all we know about data sets, we shall ignore your cries for mercy and
proceed to do exactly the same with regard to sampling. Crudely speaking, a sample is a part
of something larger, called a population (or ‘universe’); the latter is the totality of entities in
which we have an interest – that is, the collection of individuals, objects, or events about which
we want to make inferences. For example, in Table 1.4 earlier, the population implied was ‘all
insomniacs living in Auchtermuchty, Scotland’. Other examples of populations include ‘all
countries on the planet Earth’, ‘all furniture stores in the UK’, ‘all strikes at Italian knitting
factories during 1975–79’, and ‘all vegetarian chiropodists based in Greater Manchester’. We
can now define a sample as a subset of a given population. Going back again to Table 1.4, the
sample consisted of 600 insomniacs living in Auchtermuchty, who were fortunate enough to
have their purchasing behavior studied. The Scottish Tourist Board assured us that the total
number of insomniacs living in Auchtermuchty is much higher! Possible samples for the other
population examples mentioned above are ‘member countries of NATO’, ‘28 furniture stores
located anywhere in Somerset and Devon’, ‘strikes at Italian knitting factories during the first
six months of 1977’, and ‘13 vegetarian chiropodists with city center practices in Manchester’
(note that these are merely possible samples that could be drawn from the above populations,
not necessarily good samples – an issue to be addressed later).
The rationale for sampling is really very simple: by checking out part of a whole, we can say
something about the whole. However, you may ask, ‘But why bother with a sample in the first
place? Why not study the population instead?’ Well, in some instances, the entire population is
indeed studied as, for example, with the population census, which is conducted every few years
by the government. However, in most cases, undertaking a census is not possible or desirable
for reasons summarized in Table 2.1.
Despite the many advantages of sampling over a census, it is important to keep in mind
that there is a price to pay, namely deciding whether the picture painted by the sample can
be generalized to the population of interest. To illustrate this problem, we have applied our
magnificent artistic talents and creatively depicted the relationship between population and
sample in graphical form; the relevant masterpiece is shown in Figure 2.1.
The first thing to realize is that, given a population containing a fixed number of elements
(what is known among statistical geniuses as a finite population), a number of possible
samples could be drawn. Just to put it into perspective (and have you faint in the process),
a total of 1,099,511,627,776 different samples could potentially be drawn from the population
shown in Figure 2.1 (which only has 40 elements in the first place!). In general, if there are N
elements in the population and the size of the sample is not fixed, one has a choice between
$2^N$ possible samples (including taking no sample at all and conducting a census of the
population). Of course, the choice of possible samples is drastically reduced if a certain sample size
is pre-specified; that is, if the number of elements to be included in the sample is fixed before-
hand (not an easy matter, either, as will be further discussed below). Going back to Figure 2.1,
and assuming that we wish a sample of size 10 (such as sample C), our options are reduced to
‘only’ 847,660,528 possible samples! In general, given a desired sample size n (where n ≤ N),
$$\binom{N}{n} = \frac{N!}{n!\,(N - n)!}$$
different samples could be drawn from a population with N elements. Thus,
whichever way you look at it, one has a lot of choices when it comes to sampling.
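If you would rather not take such numbers on trust, both counts can be checked with Python's standard library. The sketch below uses the population size of 40 from Figure 2.1 and a sample size of 10; it prints exact counts, so rounded figures quoted elsewhere may differ in the last few digits.

```python
from math import comb

N = 40  # population size in Figure 2.1
n = 10  # desired sample size

# Sample size not fixed: every subset counts, including the empty one
# (no sample at all) and the full one (a census).
all_subsets = 2 ** N

# Sample size fixed at n: N! / (n! (N - n)!)
fixed_size = comb(N, n)

print(f"{all_subsets:,}")  # 1,099,511,627,776
print(f"{fixed_size:,}")   # 847,660,528

# Sanity check: adding up the counts for every possible sample size gives 2**N.
assert sum(comb(N, k) for k in range(N + 1)) == all_subsets
```

The final check simply confirms that the two formulas agree with each other: the samples of size 0, 1, 2, …, N together make up all possible samples.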
SAMPLE SELECTION
Given the above, the key question becomes how to select the n sample elements (i.e., members
of the sample) so that they are representative of the N population elements (i.e., members
of the population). Unfortunately, the answer to this is just as difficult as selecting a ‘good’
husband/wife/partner (in other words, there is no easy answer!). However, one important con-
sideration is the effect that excluded population elements are likely to have on the quality of the
sample. Sampling, by definition, means that certain population elements will be excluded from
the sample. This exclusion causes what is known as sampling error: the difference between
a result based on a sample and that which would have been obtained if the entire population
was studied (i.e., the ‘true’ value). Sampling error is generated whenever a sample is drawn by
whatever sampling procedure and is a function of sample size (i.e., as the sample size increases,
sampling error decreases).
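The link between sample size and sampling error can be illustrated without any simulation. The sketch below assumes a hypothetical population of 40 elements, 30 of which have some characteristic of interest, and assumes simple random sampling without replacement; it then averages the absolute sampling error over all possible samples of a given size. The function name and the chosen sample sizes are our own.

```python
from math import comb

N = 40  # population size (assumed)
K = 30  # population elements with the characteristic of interest (assumed)
true_proportion = K / N


def expected_abs_error(n):
    """Average |sample proportion - true proportion| over all samples of size n."""
    total = 0.0
    # k = number of sampled elements that have the characteristic.
    for k in range(max(0, n - (N - K)), min(n, K) + 1):
        # Share of all size-n samples containing exactly k such elements
        # (the hypergeometric distribution).
        prob = comb(K, k) * comb(N - K, n - k) / comb(N, n)
        total += prob * abs(k / n - true_proportion)
    return total


errors = {n: expected_abs_error(n) for n in (5, 10, 20, 40)}
for n, err in errors.items():
    print(f"n = {n:2d}: average absolute sampling error = {err:.4f}")
```

Note that this is a statement about averages: any single sample may land closer to or further from the true value, but the typical error shrinks as n grows and vanishes when n = N, which is a census.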
To get a better feel of the concept of sampling error (while avoiding nasty technicalities),
imagine that the population in Figure 2.1 consists of 30 female and 10 male students enrolled
in an advanced entomology class at Mosquito State University, and you have just picked at
TRUTH
THE DESTINY OF MAN
THE DESTINY OF MAN
IN THE BOAT
THE POET'S SONG
Li-Tai-Pe.
Li-Tai-Pe.
Thu-Fu.
(After Bethge.)
Cecco Angiolieri.
(Adaptation.)
Heinrich Heine.
I.
To the executioner the crowned one
speaks: "The priest has begun the mass,
soon the wedding will be over;
keep your axe at the ready."
Pale with dread and trembling
is the fair young princess;
reckless Olaf has no care,
on his red lips is a smile.
A smile on his red lips,
thus he speaks to the king:
"Good morning, father-in-law,
today you will sever my neck.
II.
This is the wedding of Sir Olaf; now he drinks his last goblet. And now
he presses his beloved to his breast; behind the door
the executioner stands.
III.
Georg Herwegh.
Charles Baudelaire.
Charles Baudelaire.
Charles Baudelaire.