TAKING THE FEAR
OUT OF DATA
ANALYSIS
…to our families.

Seriousness is the only refuge of the shallow. –Oscar Wilde
TAKING THE FEAR OUT
OF DATA ANALYSIS
COMPLETELY REVISED, SIGNIFICANTLY EXTENDED
AND STILL FUN

SECOND EDITION

ADAMANTIOS DIAMANTOPOULOS
Professor of International Marketing, Department of Marketing and
International Business, University of Vienna, Austria

BODO B. SCHLEGELMILCH
Professor of International Marketing Management,
Department of Marketing, WU Vienna, Austria

GEORGIOS HALKIAS
Associate Professor of Marketing and Behavioral Research,
Department of Marketing, Copenhagen Business School, Denmark

Cheltenham, UK • Northampton, MA, USA


© Adamantios Diamantopoulos, Bodo B. Schlegelmilch and
Georgios Halkias 2023

All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system or transmitted in any form or by any
means, electronic, mechanical or photocopying, recording, or
otherwise without the prior permission of the publisher.

Published by
Edward Elgar Publishing Limited
The Lypiatts
15 Lansdown Road
Cheltenham
Glos GL50 2JA
UK

Edward Elgar Publishing, Inc.
William Pratt House
9 Dewey Court
Northampton
Massachusetts 01060
USA

A catalogue record for this book
is available from the British Library

Library of Congress Control Number: 2022950118

This book is available electronically in the
Business subject collection
http://dx.doi.org/10.4337/9781803929842

ISBN 978 1 80392 983 5 (cased)
ISBN 978 1 80392 984 2 (eBook)
ISBN 978 1 80392 985 9 (paperback)

CONTENTS IN BRIEF

Full contents
To the reader
Instead of a preface
About the authors
Pre-publication reviews from around the world
Introduction to Taking the Fear out of Data Analysis

PART I UNDERSTANDING DATA

1 What is data (and can you do it in your sleep)?

2 Does sampling have a purpose other than providing employment for statisticians?

3 Why should you be concerned about different types of measurement?

PART II PREPARING DATA FOR ANALYSIS

4 Have you cleaned your data and found the mistakes you made?

5 Why do you need to know your objective before you fail to achieve it?

PART III CARRYING OUT THE ANALYSIS

6 Why not take it easy initially and describe your data?

7 Can you use few numbers in place of many to summarize your data?

8 What about using estimation to see what the population looks like?

9 How about sitting back and hypothesizing?

10 Simple things first: One variable, one sample

11 Getting experienced: Making comparisons

12 Getting adventurous: Searching for relationships

13 Getting hooked: A look into multivariate analysis

14 Getting obsessed: A further look into multivariate analysis

15 It’s all over … or is it?

Index
FULL CONTENTS

To the reader
Instead of a preface
About the authors
Pre-publication reviews from around the world
Introduction to Taking the Fear out of Data Analysis

PART I UNDERSTANDING DATA

1 What is data (and can you do it in your sleep)?
The nature of data
Types of data
Data and information
Summary
Questions and problems
Further reading

2 Does sampling have a purpose other than providing employment for statisticians?
The nature of sampling
Sample selection
Sample size determination
The sampling process
Summary
Questions and problems
Further reading

3 Why should you be concerned about different types of measurement?
The nature of measurement
Measurement scales
Scaling formats
Measurement error
Assessing validity and reliability
Summary
Questions and problems
Further reading

PART II PREPARING DATA FOR ANALYSIS

4 Have you cleaned your data and found the mistakes you made?
The role of data cleaning
The role of data coding
Finding your mistakes
Transforming variables
Summary
Questions and problems
Further reading

5 Why do you need to know your objective before you fail to achieve it?
The need for analysis objectives
Setting analysis objectives
The question of focus
Choosing the method of analysis
Summary
Questions and problems
Further reading

PART III CARRYING OUT THE ANALYSIS

6 Why not take it easy initially and describe your data?
Purposes of data description
Frequency distributions
Grouped frequency distributions
Graphical representation of frequency distributions
Summary
Questions and problems
Further reading

7 Can you use few numbers in place of many to summarize your data?
Characterizing frequency distributions
Measuring central location
The mode
The median
The mean
Measuring variability
The index of diversity
The range and interquartile range
The variance and standard deviation
Measuring skewness and kurtosis
Chebyshev’s theorem and the normal distribution
Summary
Questions and problems
Further reading

8 What about using estimation to see what the population looks like?
The nature of estimation
Setting confidence intervals
Estimating the population proportion
Estimating the population mean
Estimating other population parameters
Summary
Questions and problems
Further reading

9 How about sitting back and hypothesizing?
The nature and role of hypotheses
A general approach to hypothesis-testing
Step 1: Formulation of null and alternative hypotheses
Step 2: Specification of significance level
Step 3: Selection of an appropriate statistical test
Step 4: Identification of the probability distribution of the test statistic and definition of the region of rejection
Step 5: Computation of the test statistic and rejection or non-rejection of the null hypothesis
Hypothesis-testing and confidence intervals
Statistical and substantive significance
Statistical power revisited
Beyond statistical significance: read at own risk
Summary
Questions and problems
Further reading

10 Simple things first: One variable, one sample
Single-sample hypotheses
Assessing fit
The one-sample chi-square (χ²) test
The one-sample Kolmogorov–Smirnov (K–S) test
Testing for location
The one-sample sign test
The one-sample t-test
Testing for variability
Testing for proportions
Testing for randomness
Summary
Questions and problems
Further reading

11 Getting experienced: Making comparisons
The pleasure of comparing
Independent measures: comparing groups
The two-sample chi-square (χ²) test
The k-sample chi-square test
The Mann–Whitney U test
The Kruskal–Wallis (K–W) one-way analysis of variance (ANOVA)
The two-sample t-test
One-way analysis of variance (ANOVA)
Related measures: comparing variables
The McNemar test
The Cochran Q test
The paired-samples sign test
Friedman’s analysis of variance (ANOVA)
Repeated-measures ANOVA
Summary
Questions and problems
Further reading

12 Getting adventurous: Searching for relationships
The mystique of relationships
Measures of association
Cramer’s V
Spearman’s rank-order correlation
Pearson’s product moment correlation
Simple linear regression
Logistic regression
Correlation and causality
Summary
Questions and problems
Further reading

13 Getting hooked: A look into multivariate analysis
The nature of multivariate analysis
Types of multivariate techniques
Dependence methods I: making (more complex) comparisons
Analysis of covariance (ANCOVA)
Factorial ANOVA
Multivariate analysis of variance (MANOVA)
Dependence methods II: investigating (more complex) relationships
Partial and semi-partial correlation analysis
Multiple linear regression analysis
Multiple logistic regression analysis
Canonical correlation analysis
Summary
Questions and problems
Further reading

14 Getting obsessed: A further look into multivariate analysis
Interdependence methods: identifying structures in variables
Factor analysis
Principal component analysis (PCA)
Interdependence methods: identifying structures in objects
Cluster analysis
Summary
Questions and problems
Further reading

15 It’s all over … or is it?
The written research report
The oral presentation
Summary
Questions and problems
Further reading

Index
TO THE READER
Effective data analysis requires the effective use of statistics. Unfortunately, statistics is boring.
It is boring to learn, it is boring to teach, and it is usually boring people who actually like statistics.
Indeed, the comment that a statistician is ‘a person who didn’t have enough charisma
to be a cost accountant’ says it all!
Statistics is also hard. It is hard to learn, it is hard to teach (properly), and it is even harder to
remember what little you may have learned. In short, statistics is rarely fun. But it can be – as
you will soon find out. Trust us.
INSTEAD OF A PREFACE
The first edition of this text goes back to 1997. Cars had just been invented, and computers
were still coal fired – at least from the perspective of our new generation of students. There was
no Facebook, no Amazon, and no iPhone. And, no, you could not watch videos on your mobile
phone or surf the Internet on it. Times were clearly (very) different. Most people learning statistics
were still blissfully unaware of some powerful techniques now described in this book, such
as the much-feared multivariate analyses. It was also an innocent time! Questionnaires could
still request the respondents’ ‘sex’ in a dichotomous (male/female) fashion, and researchers did
not even think about being politically incorrect for omitting dozens of other gender options.
However, as unlikely as it sounds, even in these innocent times, the two original authors of this
text, Adamantios Diamantopoulos and Bodo Schlegelmilch, in the profession widely referred
to as the terrible twins (or more unkindly as the ‘gruesome twosome’), still managed to get
into trouble.
Upon seeing a couple of draft chapters of the first edition of this book, reviewers took
exception to several of our jokes, found some of our examples politically incorrect, and
queried the wisdom of not being ‘sufficiently serious’. Having said that, one reviewer openly
admitted that he/she might be getting to be an ‘old sourpuss’(!), with which we politely agreed.
Notwithstanding these obstacles, through a magic mix of constant encouragement and occasional
threats of physical violence from the commissioning editor, we somehow managed to
finish the book and get it published! Since then, astonishingly, the original version has been
reprinted no fewer than six times, and we received numerous pieces of fan mail (well, we can
at least recall two positive ones in the late 90s!).
So why a new, completely revised, and significantly extended version of the book? First,
statistical software has progressed enormously and is much easier to handle; we now include
up-to-date applications to illustrate the various techniques discussed. Second, partly as
a result of the ‘software revolution’, sophisticated analytical techniques have become more
accessible; we now include a discussion of the most important ones. And yes, we feel that we
have a responsibility to our readers here: the mere mention of terms like ‘non-hierarchical
clustering’ or ‘orthogonal rotation’ will substantially improve your chances on the job market
and the dating scene! Finally, we now live in different times, so we had to (reluctantly) sanitize
some non-PC jokes.
All these goodies – up-to-date software applications, new statistical tests, and (somewhat!)
sanitized jokes – were largely made possible by taking a new and substantially younger
co-author on board, Georgios Halkias. He is the one you should blame for any comments
that may be (wrongly!) construed to be still not 100% PC (despite our best efforts). Any other
mistakes or omissions you may find are clearly and squarely the responsibility of our wonderful
support team, namely Martina Roth, Doris Lehdorfer, and Thomas Winter from the
University of Vienna, as well as Hanife Özdemir, Erin Silangil, Sarina Mansour Fallah, Sanem
Öztürk, Stephie Habicher, and Reiko Domai from WU Vienna – thank you all for doing a fantastic
job! While the attribution of blame to others may contradict the spirit of this preface,
there is an important reason for it. The two senior authors are striving for a second career in
the clergy after retirement and, hence, we cannot possibly be responsible for anything bad!
And speaking of our future careers, if anyone knows of a mixed-gender monastery with
basic amenities such as an infinity pool, a nice sea view, and 24/7 room service, where we do
not have to get up in the middle of the night to pray, please let us know (and you’ll get a generous
2% discount off the price of this book!).
We hope you enjoy reading the book as much as we enjoyed writing it!

Adamantios Diamantopoulos
Bodo B. Schlegelmilch
Georgios Halkias

DISCLAIMER
All examples, numerical figures, data, names, characters, places, products, and incidents mentioned
in this book are fictitious and only reflect the authors’ imagination. Any resemblance
to actual people, elements, and events is coincidental. No such identification is intended or
should be inferred.
ABOUT THE AUTHORS

Adamantios Diamantopoulos (BA, MSc, PhD, DLitt) is Chaired Professor of International
Marketing and Head of the Marketing and International Business Department at the
University of Vienna, Austria. He is also Visiting Professor at the University of Ljubljana,
Slovenia, and Senior Fellow at the Dr. Theo and Friedl Schoeller Research Center for Business
and Society, Germany. Previous full-time appointments include the Chair of Marketing and
Business Research at Loughborough University and the Chair of International Marketing at
the University of Wales, as well as positions at the University of Strathclyde and the University
of Edinburgh. He has held several visiting professorships in the USA and Europe, including
the Joseph A. Schumpeter Fellowship at Harvard University and the Nestlé Visiting Research
Professorship of Consumer Marketing at Lund University, Sweden. He has taught at various
university institutions in some 20 countries and has collaborated with several international
companies. He is an elected Fellow of the European Marketing Academy and the British
Academy of Management, and a recipient of the JIBS Silver Medal.
His main research interests are in international marketing and research methodology, and he
is the author of some 200 publications in these areas with over 40,000 citations. His work has
appeared, among others, in the Journal of Marketing Research, Journal of International Business
Studies, Journal of the Academy of Marketing Science, International Journal of Research in
Marketing, Journal of Service Research, Journal of International Marketing, Journal of Retailing,
MIS Quarterly, Organizational Research Methods, Psychological Methods, Information Systems
Research, British Journal of Management, and International Journal of Forecasting.
He has been the recipient of several best paper awards, including the Hans B. Thorelli Award
for an article published in the Journal of International Marketing that has made significant and
long-term contributions to international marketing theory or practice. He sits on the editorial
review boards of several academic journals, and acts as a referee for various professional
associations and funding bodies.
When not working, he likes skiing, riding big motorcycles (he has two of them), and playing
the drums (but not well enough to give up the day job). For some reason, his wife and son both
think he will never grow up.

Bodo B. Schlegelmilch heads the Institute for International Marketing Management at
WU Vienna University of Economics and Business and is Chair of the Association of
MBAs and Business Graduates Association (AMBA and BGA). For more than 10 years he
served as founding Dean of the WU Executive Academy.
Initially educated in Germany, he obtained two doctorates (a PhD in International Marketing
and a DLitt in Corporate Social Responsibility) from the University of Manchester (UK) and
an honorary PhD from Thammasat University (Thailand). Starting at Deutsche Bank and
Procter & Gamble in Germany, he continued his career at the University of Edinburgh and
the University of California, Berkeley. Appointments as British Rail Chair of Marketing at the
University of Wales (UK) and Professor of International Business at Thunderbird School of
Global Management (USA) followed.
Bodo serves on several business school advisory boards in Europe and Asia. He holds or has held
visiting appointments at, for example, the Universities of Minnesota (USA), Keio (Japan), Leeds
(UK), Sun Yat-sen (China), and Cologne (Germany), as well as the Indian School of Business
(India), and has taught in over 30 countries.
He has received numerous teaching and research awards, including the ‘Significant
Contributions to Global Marketing’ award of the American Marketing Association, as well
as fellowships from the Academy of International Business, the Academy of Marketing
Science, and the Chartered Institute of Marketing. His research interests span from international
marketing strategy to corporate social responsibility. He has published more than
a dozen books in English, German, and Mandarin, and his work has appeared in journals
such as the Strategic Management Journal, Journal of International Business Studies, Journal
of the Academy of Marketing Science, and Journal of World Business. He was also the first
European editor-in-chief of the Journal of International Marketing, published by the American
Marketing Association.
When not researching, writing, or teaching, Bodo enjoys sailing, hanging out at nice beaches,
eating Japanese meals, and writing long and complicated autobiographical statements. His
wife knows he will never grow up.

Georgios Halkias is Associate Professor of Marketing at the Department of Marketing,
Copenhagen Business School (Denmark) and a Visiting Professor at the University of Vienna
(Austria). Prior to joining CBS, he was Associate Professor at the TUM School of Management,
Technical University of Munich (Germany), while in the past he has held resident and/or
visiting faculty positions at the University of York (UK), the University of Vienna and the
WU Vienna University of Economics and Business (Austria), the University of Ljubljana
(Slovenia), and the Athens University of Economics and Business (Greece). Georgios has also
gained industry experience with multinational firms such as Société Générale and Procter &
Gamble in Greece and the UK.
Georgios has received academic qualifications in three different countries. He holds a PhD
Habil. from the University of Vienna (Austria), a PhD from the Athens University of Economics
and Business (Greece), and an MSc from the University of Warwick, Warwick Business School
(UK).
His main research interests lie in the areas of consumer psychology, branding, and research
methods. His work has attracted funding from several national and European Union sources and has been
published in leading academic journals, including the Journal of International Business Studies,
International Journal of Research in Marketing, British Journal of Management, Journal of
Advertising, Journal of Business Research, International Marketing Review, and International
Journal of Advertising. He has also made multiple contributions to academic books and international
conferences.
Georgios acts as a reviewer for several top-tier journals and sits on the editorial review board
of the Journal of International Marketing (American Marketing Association). He has received
various international distinctions and awards, including the Outstanding Reviewer Award
2019 and the Outstanding Paper Award 2020 from the International Marketing Review in the
Emerald Literati Awards of Excellence. He has been repeatedly nominated for teaching awards
in courses on quantitative research methods, consumer behavior, and branding, and is the
recipient of the Best Teaching Award 2017 granted for outstanding teaching at the graduate
level (University of Vienna).
When not engaged in highly scientific activities, Georgios practices the guitar because he
wants to form a progressive metal band when he grows up. His wife desperately tries to make
him realize that this ain’t happening.
PRE-PUBLICATION REVIEWS FROM AROUND THE WORLD
‘Written with wry wit and incredible clarity, the authors provide the reader a detailed understanding
of seminal issues in data analysis. A masterful work that truly does “take the fear out of
data analysis” – this book is a rare treat indeed.’
– David A. Griffith, Mays Business School, Texas A&M University, USA

‘Written by a proficient team of authors, Taking the Fear Out of Data Analysis is a fascinating
… ah, forget the marketing blurb. This is a great text, you should read it! And your family too,
provided that they want to learn about the basics of data analysis in the most entertaining way.
There is no doubt that you will devour this book in no time and learn a lot about statistics on
the way.’
– Marko Sarstedt, Ludwig-Maximilians-University (LMU), Germany

‘Awww … da da da da da da daayyyyy … boo boo boo brrrrrrr …’
– Penelope, 10-month-old daughter and Research Assistant of Georgios Halkias, Austria

‘[H]idden behind some bizarrely memorable examples and illustrations is a very fine introduction
to data analysis. This book will be of value to those approaching quantitative analysis for
the first time, and should be on the reading list for project-based courses and for new research
students.’
– Richard Speed, La Trobe University, Australia

‘In the age of big data, at least a rudimentary understanding of data analysis is a must. Business
students need to know about data analytics, but they are often intimidated by statistics and thus
fail to appreciate the value of data-based and model-supported decision making. Even seasoned
researchers are sometimes uncomfortable with conducting statistical analyses of their data
because they lack the confidence to apply methods that are perceived as abstract and confusing.
In this entertaining book, the authors gently guide the reader through the steps of the data analysis
process and they brilliantly succeed in explaining difficult topics in an engaging, witty, and
highly informative manner.’
– Hans Baumgartner, Smeal College of Business, Penn State University, USA

‘Statistics. I know – you hate it. It’s hard and confusing. Students of all levels find the topic
hard. I tell them to get this book. And no! They cannot borrow mine, I don’t want to lose it.
Diamantopoulos, Schlegelmilch and Halkias knock another one out of the park with this excellent
introduction to a great array of statistical issues. They start right at the beginning – which is
always a good place to start if you’re a beginner – and gently, often hilariously, and successfully
guide the reader through the various learning moments that need to be negotiated if one is to
become fearless in the face of columns of data. Priceless.’
– John Cadogan, School of Business and Economics, Loughborough University, UK

‘What happens when three applied researchers write a book on data analysis? Well, as
a minimum, you get a resource book written for the user, and not the statistician. In this
revised edition of the popular book Taking the Fear out of Data Analysis, Diamantopoulos,
Schlegelmilch and Halkias provide a highly practical and helpful book for the applied social
scientist. Their writing (considerably more accessible than the spelling of their surnames) is plain,
straightforward and fun to read. In an era when aspiring researchers will remain a one-legged
scholar unless they master the foundational skills of data analysis and research methodology, this
book is considered an essential read and frequent reference.’
– S. Tamer Cavusgil, Georgia State University, USA

‘Taking the Fear Out of Data Analysis is one of the few books to provide a comprehensive,
conceptually solid yet accessible overview of the theory and practice of data analysis. Building
on their own extensive research experience, the authors use a pragmatic, elegantly ironic and
competent style to “translate” the complex science of managing data and involve readers in an
enjoyable learning journey. All the concepts and techniques are presented in a manner that is
easy to read and understand and several (and mostly humorous) examples and illustrations have
been integrated throughout the text. Definitely a must-have for anybody who is interested in
discovering not only how to deal with data but also how pleasant it can be to learn it.’
– Alessandro De Nisco, UNINT, Rome, Italy

‘These guys never give me any tips and now they have threatened to buy their bento boxes else-
where if I don’t give them a review. So here it is: I like the book; it has a lot of words!’
– Sōta Tuna, Head Waiter, Harakiri Sushi Shop, Vienna, Austria

‘This book is a real page-turner! Why? Because the book strikes an exceptionally good balance
between fun (funny examples, witty remarks) and a sound introduction to statistics and data
analysis. Thus, the book encourages its readers in an entertaining way to delve into the statistical
concepts and methods covered. Despite its reassuring title, the book also does not conceal
the mysterious “dark side of empirical data analysis”, such as p-hacking or HARKing. But this,
of course, makes perfect sense since you can only defeat your fear if you know the danger. The
upshot is that this work definitely delivers on its promises.’
– Dirk Temme, Schumpeter School of Business and Economics, University of Wuppertal,
Germany

‘Read this book! It should not only be on the prescribed list of any student of the social sciences,
it should also be compulsory reading for journalists and the media.’
– Leyland Pitt, Simon Fraser University, Canada

‘It’s been tried and tested and we can now reject the null hypothesis that “this book is no different
to other data analysis books”. It is, and in ways that make it an easier read and an easier ride for
those starting out on their analysis journey. Like any great product, it lives up to its name and
really does help to take some of the fear out of analysing research data. Congratulations to the
authors for updating this now classic text.’
– Vince Mitchell, The University of Sydney Business School, Australia

‘The authors gave me a choice of either giving a testimonial or reading the book cover to cover
… you know the rest.’
– Anonymous PhD Student, Anonymous University, Anonymous Country

‘The new edition of this book provides excellent guidance to data knowledge and competence
using a problem-solving approach. With the digital becoming increasingly important, analytical
skills should be key competencies in everybody’s daily life. To achieve this goal, Taking the Fear
Out of Data Analysis is highly recommended.’
– Zhongming Wang, Zhejiang University, China

‘Taking the Fear Out of Data Analysis is the best book for someone who has heard a lot of buzz
about data analysis but doesn’t have a firm grasp of the subject. The book is an eye-opening read
for anyone who wishes to learn about data analysis: understanding of data, preparing the data
for analysis and different analysis techniques. If I had to pick one book for an absolute newbie to
the field of data analysis, it would be this one.’
– Manish Gangwar, Associate Dean, Research and RCI Management and Executive Director,
ISB Institute of Data Science, Telangana, India

‘When I began my academic career as a Research Assistant, the first edition of this book enjoyed
a prominent position on the shelves in the office I shared with three other early career researchers.
We liked this book because it unpacked the complicated, procedure-heavy world of quantitative
methods in a user-friendly way. It poked fun at the subject matter in a manner that was so disarming
that even our office’s hard-core qualitative researcher loved it. The significantly extended
new edition is increasingly relevant as the world of quantitative methods has kept on expanding,
in part due to an explosion in software programs that scholars can use seemingly without much
understanding. Do not let the light-hearted nature of this book fool you. It is a statistics book
that carefully leads readers through all the necessary stages of analysis. It effortlessly explains
the analysis details and assumptions that PhD examiners, journal reviewers, and conference
presentation audience members insist on raising. This excellent new edition is destined to be very
well thumbed.’
– Matthew Robson, Cardiff Business School, UK

NOTE

Many testimonials were quite lengthy. While our tolerance for receiving praise tends to be
alarmingly high, we felt it would be in the best interests of our readers to shorten the testimo-
nials so that the part of the book talking about statistics would be slightly longer than the part
devoted to testimonials. An extended list can be found on the publisher’s website for this book.
INTRODUCTION TO TAKING THE FEAR OUT OF DATA ANALYSIS
The trouble with numbers is that they frighten a lot of people.
–Leslie W. Rodger, Statistics for Marketing

This book has been written for people who do not like data analysis, who do not want to
become analysts or statisticians, but who have to learn about data analysis for whatever reason
(e.g., to pass a course at college or university, complete a dissertation, get/keep a job, or to
impress a new date). It has also been written for people who think they cannot understand data
analysis and statistics, having had bad experiences with textbooks full of formulae and little
substance, and teachers full of confidence and no humor. In fact, this book has been written
for anybody suffering from the ‘I hate numbers’ syndrome – and we know there are plenty of
you out there.
What we tried to do in this book is quite simple: take your hand and lead you through the
entire data analysis process without boring you to death along the way. Our specific aims have
been threefold:

1. To provide a comprehensive but digestible introduction to the strange world of data
analysis, assuming no prior knowledge on your part.
2. To indicate the linkages among the various stages of the data analysis process and highlight
the implications of good/bad early decisions on subsequent ones.
3. To demonstrate that learning about data analysis can be an enjoyable experience; hope-
fully, after you have finished reading this book, you will feel that way too.

Our philosophy behind the content and structure of the book is based on a few basic premises.
First and foremost, our main concern is with understanding rather than memorizing. Thus,
we urge you to channel your learning effort towards grasping and digesting the various con-
cepts/techniques of data analysis rather than mechanically reproducing a bunch of formulae.
Moreover, at this introductory level, we feel that your attention should be directed towards key
issues and major building blocks rather than statistical refinements and details. Consistent with
this view, we keep the number of formulae down to an absolute minimum and do not bother
with providing mathematical proofs (the latter being a sure way of sending you to sleep!). We
also firmly believe that a point is best driven home by means of an example. Consequently, we
make liberal use of (mostly silly) examples and illustrations to show how the various concepts
and techniques can be applied. Lastly, we see no harm in making you smile from time to time
– a hefty dosage of humor is often the only way to keep one sane while learning/doing data
analysis!
You can use this book in a number of ways, depending upon your background and objec-
tives. For those of you with no prior experience of data analysis and little, if any, statistical
knowledge, we strongly recommend that you go through the book chapter by chapter. In this
way, you will be introduced to more complex material in a gradual manner and should not
feel lost at any time. On the other hand, those of you with some previous exposure to research
methods and/or statistics may opt for a more flexible approach. For example, you may wish to
concentrate on Chapter 4 (dealing with data preparation and coding) onwards and refer back
to Chapters 1 to 3 (which are really background chapters) on an ‘as required’ basis. Finally, for
those of you who are particularly interested in specific types of analysis, you should primarily
focus on Chapters 7 to 14, which cover different analytical techniques (and presuppose famil-
iarity with basic data analysis principles).
There are also some key chapters that everyone should read. These include Chapter 5 (on
setting analysis objectives), Chapter 8 (on the nature of statistical estimation), Chapter 9 (on
the principles of hypothesis testing), and Chapter 15 (on evaluating and presenting the anal-
ysis). Moreover, all readers would do well to heed the numerous HINTS and WARNINGS
dispersed throughout the various chapters. These serve to emphasize key points, awareness of
which can prevent problems and make life easier for you.
The Further Reading section at the end of each chapter should also be consulted. Rather
than provide a long list of references, we have intentionally limited ourselves to a selection of
a few key sources which we feel best amplify and complement the material covered in the
chapter. Each suggested reading has been briefly annotated, and there is no duplication of
sources in the Further Reading sections of the various chapters. Having said that, many of the
suggested readings may also be useful for issues discussed in the other chapters – so keep an
open mind and be flexible.
The book is organized into three main parts, containing a total of fifteen chapters.
Part I, Understanding Data, provides the necessary background by looking at the nature of
data, the sampling process, and the notion of measurement. These are essential building blocks
underpinning data analysis and a prerequisite for understanding the application of statistical
techniques.
Part II, Preparing Data for Analysis, focuses on the various tasks associated with converting
raw data into a form that can be analyzed and on setting objectives for the analysis. Careful
attention to the issues raised here will prevent a lot of problems at the actual analysis stage.
Part III, Carrying out the Analysis, examines the rationale behind different types of analysis
and introduces a wide variety of analytical methods appropriate for different circumstances.
Starting with simple approaches to describing and summarizing data, a number of techniques
are considered, which, if properly applied, will enable you to get the most out of your data.
A good taste of multivariate analysis is also provided in this edition, navigating the interested
reader through more complex analytical techniques. Finally, several issues are raised relating
to the evaluation and presentation of a data analysis project and the preparation of written and
oral research reports.
By the time you finish reading this book, you should be in a position to know what analysis
to apply, for which purpose, to what kind of data. The chapter-by-chapter overview that follows
should help clarify what all this means.
Chapter 1 lays the groundwork for the rest of the book by introducing you to the data
matrix, which is the raw material you will work with whenever you do data analysis. Here
you will get acquainted with units of analysis, variables, and values, which are the essential
ingredients of data. You will also learn about different types of data and about the distinction
between data and information. By the time you finish this chapter, you should be able to talk
about data as if you knew something about it!
Chapter 2 looks at sampling to demonstrate how units of analysis may come about in
a particular project. You will understand the rationale for taking a sample rather than stud-
ying the whole population and the different sampling methods that you can use. You will
also encounter the concept of sampling error, which, no doubt, will haunt you forever after!
Finally, the numerous considerations for determining sample size will be piled upon you – if
only to confuse you even further!
Chapter 3 describes what measurement is, why it is important and how it can be done. You
will recognize the advantages and disadvantages of different measurement scales and get to
know a variety of scaling formats. Following this, you will encounter a second kind of error,
the notorious measurement error, which will also haunt you forever after (either with the
sampling error or on its own). By the end of this chapter, you should have a firm grasp of the
options available for measuring your variables and interpreting the resulting values, as well as
for assessing the validity and reliability of measurement.
Chapter 4 moves into the practicalities of preparing data for analysis, focusing on the process
of data editing. You will learn how to detect many errors you may make when processing data
and how to deal with ambiguous, inconsistent, and missing data. You will be shown how to
properly code your data, and, lastly, you will see how easy it is to perform variable transfor-
mations in order to take advantage of the full potential of your data set.
Chapter 5 emphasizes the need for having clear analysis objectives before embarking on the
actual analysis (otherwise, you will not know how to start or when to stop). These will ensure
that your analysis is relevant, comprehensive, and efficient. You will be introduced to different
analytical perspectives, namely description, estimation, and hypothesis-testing (which will
be dealt with in more detail in later chapters). The chapter will conclude with a discussion of
the factors governing the choice of the method of analysis (which will undoubtedly become
the subject of your worst nightmares for years to come!)
Chapter 6 focuses on data description, which is usually the first type of analysis that you will
want to do. You will get to know the various forms of frequency distributions and the steps
to take in grouping data. Your artistic talents will also be thoroughly stimulated by an exami-
nation of different types of graphical representations. At this stage, you will wonder what the
purpose of life would be without percentiles, true class limits, and ogives!
Chapter 7 soldiers on with data description and introduces you to different summary meas-
ures that can be used to capture typical or average responses as well as the extent of variability
in your data. By using different summary measures in conjunction with one another, you will
be able to identify the shape of a frequency distribution and compare it to known forms. In this
context, the famous normal distribution will serve as a useful point of reference.
Chapter 8 deals with the process of estimation and shows how you can talk about the
population when you only have a sample. First, you will become familiar with the concept of
a sampling distribution and then learn how to set confidence intervals for different popula-
tion parameters that you may wish to estimate. Being able to make inferences from a sample
to a population will surely set you apart from the uninitiated punter!
Chapter 9 ventures into the mystical world of hypothesis-testing and provides you with an
understanding of the basic principles associated with developing and testing specific research
propositions. You will learn about different types of hypotheses and the rationale behind
significance testing, which, like estimation, also enables you to make inferences from a sample to
a population. Concepts like ‘null hypothesis’, ‘p-values’, and ‘regions of rejection’ will become
second nature to you and a topic of conversation at every possible opportunity! Finally, in
this chapter, you will be introduced to the concepts of effect size and statistical power and be
alerted to issues that go beyond statistical significance.
Chapter 10 deals with the simplest type of hypotheses, namely those involving a single
variable and a single sample. Here you will be shown how to examine the fit of your frequency
distributions against prior expectations or against a theoretical distribution. In addition, you
will be able to determine whether your sample is likely to have been drawn from a population
with known parameter values, including tests for central location, proportions, and variabil-
ity, as well as whether your sample is, in fact, random. By the time you complete the chapter,
you should feel confident enough to face more complex hypotheses (involving more than one
sample or more than one variable).
Chapter 11 extends your journey into hypothesis-testing by concentrating on comparisons.
First, you will learn how to compare two or more groups on the same variable – what is known
as an independent measures (or samples) comparison. Next, you will address the issue of
comparing two or more responses from the same group, involving a related measures (or
samples) comparison. In both cases, you will be testing to see whether significant differences
exist, that is, whether any observed differences in your sample results are likely to reflect ‘true’
differences in the population. By becoming an expert on comparisons, you will be able to
impress your friends with profound statements of the sort: ‘Basketball players are, on average,
significantly taller than jockeys’ and ‘There is no significant difference between the propor-
tions of male and female construction workers with a passion for ornithology.’
Chapter 12 shows you how to investigate relationships between two variables. Here you
will be exposed to measures of association that can tell you whether two different variables
are related to one another and which can be subjected to significance tests to see whether any
observed relationship (based on your sample data) is also likely to hold in the population. Once
you have established a significant relationship, your association measure will also enable you
to assess its strength and directionality (i.e., positive or negative). This chapter will also intro-
duce you to fundamental techniques that can not only help you identify a relationship between
two variables, but also allow you to make specific predictions about the expected value of an
outcome variable based on a given level of a predictor variable. The chapter concludes with an
important note on the distinction between correlation and causality – make sure you don’t
miss this!
In Chapter 13, we show you that there is much more to data analysis – multivariate analysis
procedures open up a whole lot of new opportunities for extracting information from your
data and answering more complex research questions. We start by making a basic distinction
between dependence and interdependence methods and proceed by giving you a good taste
of the techniques included in the former. Here you will learn a great deal about complex com-
parisons and relationships that involve multiple variables at the same time. Advanced data
analysis using multivariate procedures cannot be done justice in a single chapter – that is why
we added another!
Chapter 14 deals with the remaining group of techniques in the multivariate universe, i.e.,
the interdependence methods. In this chapter, you will become acquainted with the most
fundamental techniques of identifying structures in the data both in terms of variables and
the units of analysis involved. If you reach this point of the book, you should feel rather com-
fortable in dealing with a wide range of complex analytical techniques.
Chapter 15 is an oasis of sanity at the end of your data analysis odyssey. Unbelievable as
it may sound, in this chapter, you will not have to grasp new theoretical concepts, learn yet
another technique, or interpret more statistical results. Responding to your pleas for mercy,
the purpose of this final chapter is to make you sit back and think about the ‘Now what?’
question. Here we talk you through how to present your analysis to your audience(s): Is it too
technical? Are the practical implications clear? Is the presentation attractive? Unless the pres-
entation of your analysis (written and/or oral) is effective for the specific audience involved, all
your efforts will have been in vain. Not only must you do the right thing and do it right: you
must convince others that you have done so.

Have fun.
PART I
UNDERSTANDING DATA
1
What is data (and can you do it in your sleep)?

THE NATURE OF DATA

In today’s digital economy, it is virtually impossible not to produce data. Automatic data
capture through, for example, digital TVs, refrigerators or washing machines that read RFID
tags embedded in your T-shirt or yogurt cup, traffic cameras (you should always look your
best in public), and temperature sensors (China and some other countries want to ensure that
new arrivals are healthy) constantly produce data. By tracking their shoppers’ purchasing habits,
supermarkets often learn about their female customers’ pregnancies before the respective
fathers do (of course, some fathers never learn about their fatherhood – but that is a different
story). When you drive a car, automatic measurement devices can let insurance companies
know how safe or unsafe your driving style is (perhaps this explains why your insurance
premium quadrupled last year). When you take your dog for a walk, holistic dog food bakeries and
other stores can detect you via location-based services as potential customers and send you
irresistible coupons on your smartphone, enticing you to visit. When you are surfing the web,
cookies trace your behavior or misbehavior (couples beware when visiting dating sites). Even
when you are sleeping, wearable devices can let your doctor, pharmacist, or manufacturers of
anti-snoring aids know whether you urgently need their products or services.
Although we are swamped with data, sometimes market researchers still need more or
different data. Most of us have already been accosted in the street by extremely ‘friendly’ (and
sometimes obnoxious) characters who conduct personal interviews and ask to ‘please answer
a few questions’ regarding our opinions on certain stores (e.g., ‘Should El-Cheapo supermar-
ket also offer Japanese bubble tea?’), product preferences (e.g., ‘Do you prefer Superclean
over Ultrasteril washing powder?’), voting intentions (e.g., ‘If a general election were called
tomorrow, would you vote for (a) The Conversation Party, (b) The Favor Party, or (c) The
Anti-Everything Party?’), or opinions regarding the European Union (e.g., ‘Do you feel that
Greek producers of eucalyptus-flavored dog biscuits have benefited from EU membership?’).
To top it all off, even the privacy of our own home is not enough to prevent the hungry
information-seekers from reaching us. How many times have we had to miss a crucial part
of our favorite Netflix series in order to answer the phone, only to find out that the caller was
Belinda Pain, from Persistence Research, Inc., wondering whether we would participate
in a survey regarding time-share holidays in Tirana?
Market researchers also like to make use of web-based survey instruments. Having just
been to a hotel or restaurant or completed your weekly visit to a spiritual healer, you receive
an email requesting you to fill in a questionnaire on various aspects of your experience.
Countless university-based researchers have also discovered crowdsourcing systems (like
Amazon’s Mechanical Turk), where herds of volunteers complete questionnaires against
micro-payments. Finally, very often, we have to provide details about ourselves (e.g., our age,
income, place of residence) when applying for a driving license, passport, or bank account, or
when filing a tax return (very painful) or booking a flight or holiday online.
While the approaches used to obtain data vary in each instance, the objective is always to
learn about individuals with regard to certain characteristics of interest. Now, in statistical
jargon (yes, you do have to learn some of it), the individuals are called units of analysis (or
sometimes ‘observations’, ‘cases’, or ‘subjects’), the characteristics studied are termed var-
iables, and the responses linking the individuals to the characteristics are known as values.
Together, units of analysis, variables, and values make up what we call ‘data’. Thus, when
we refer to data, we implicitly address three distinct issues, namely (a) the respondents (as
indicated by the units of analysis), (b) the topic of interest (as described by the set of variables),
and (c) the respondents’ responses in relation to the topic of interest (as reflected in the values
of the variables). Variables can assume different values for different units of analysis; if this is
not the case (i.e., when all units of analysis have the same value), then we are not dealing with
a variable but with (surprise, surprise) a constant.
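The variable-versus-constant distinction boils down to one check: does the column take more than one distinct value across the units of analysis? Below is a minimal Python sketch of that check. The function name `is_variable` and the two example columns are hypothetical illustrations (the `course` column assumes, as in the parapsychology example below, that every student is enrolled on the same course).

```python
def is_variable(values):
    """Return True if the column takes at least two distinct values
    across the units of analysis; otherwise it is a constant."""
    return len(set(values)) > 1


# Hypothetical columns for three units of analysis (students)
genders = ["Male", "Non-binary", "Female"]
course = ["Parapsychology", "Parapsychology", "Parapsychology"]

print(is_variable(genders))  # True: the values differ between students
print(is_variable(course))   # False: everyone has the same value, so it is a constant
```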
The examples at the beginning of this chapter will have made you realize that units of anal-
ysis do not have to be individuals (although in a great deal of social research, they are). They
can be ordinary objects (e.g., nasal-hair removers, vodka brands, or horsewhips), time-periods
(e.g., months, years, decades, centuries), events (e.g., strikes, accidents, shareholder meetings,
visits to the dental hygienist), or other entities (e.g., firms, cities, nations, zoos). Neither do
variables have to refer to human properties; they can be product features (e.g., speed, dura-
bility, color), organizational dimensions (e.g., centralization, formalization, span of control),
or national characteristics (e.g., inflation rates, government spending, interest rates) – in fact,
anything that can be used to characterize the particular unit of analysis.
When we have a number of units of analysis (e.g., 200 first-year parapsychology students),
a number of variables (e.g., age, gender, parents’ income, preferred method for reaching out
to spirits), and a set of values linking the units of analysis to the variables, the result is a data
set. This can be best visualized as a matrix, the rows of which represent the units of analysis,
the columns the variables, and the matrix cells the relevant values; for our example, the idea is
shown in Table 1.1. By the way, if you are afraid of numbers, you can give names to our tables.
For example, Table 1.1 could become ‘Sissi’ or ‘Rudolph’.
If there are n units of analysis (respondents, objects, events, etc.) and m variables, the data
matrix looks like Table 1.2, where Rij is the response that unit i gives to variable j (in other
words, Rij is the value for unit i on variable j). The subscripts i and j are simply used for count-
ing purposes; in other words, i = 1,2, …, n and j = 1,2, …, m. Obviously, depending upon how
many units of analysis and variables we have, n and m will vary; for example, in Table 1.1, n =
200 and m = 4.
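To make the row/column idea concrete, here is a minimal Python sketch that stores the four students actually shown in Table 1.1 as a list of rows (units of analysis), each holding one value per column (variable). The names `variables`, `units`, `matrix`, and the helper `R(i, j)` are our own illustrative choices, not part of any statistics package; with all 200 students entered, n would be 200 rather than 4.

```python
# Columns: the m variables (age in years, parents' income in euros)
variables = ["Age", "Gender", "Parents' income", "Preferred method"]

# Rows: the units of analysis shown in Table 1.1 (only 4 of the 200)
units = ["Student 1", "Student 2", "Student 3", "Student 200"]
matrix = [
    [17, "Male", 56000, "Chanting"],
    [18, "Non-binary", 92000, "A séance"],
    [20, "Female", 85500, "Ouija board"],
    [19, "Female", 77000, "Live chicken"],
]

n = len(matrix)     # number of rows = units of analysis
m = len(variables)  # number of columns = variables


def R(i, j):
    """Value of unit i on variable j, counting from 1 as in the text."""
    return matrix[i - 1][j - 1]


print(n, m)     # 4 4
print(R(2, 4))  # A séance
```

Note that `R(2, 4)` refers to the second row in this shortened matrix; the 1-based counting simply mirrors the i = 1, 2, …, n and j = 1, 2, …, m convention above.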

Table 1.1 An example of a data matrix


Units of analysis   Variables
                    Age        Gender       Parents’ income   Preferred method
Student 1           17 years   Male         €56,000           Chanting
Student 2           18 years   Non-binary   €92,000           A séance
Student 3           20 years   Female       €85,500           Ouija board
•                   •          •            •                 •
•                   •          •            •                 •
•                   •          •            •                 •
Student 200         19 years   Female       €77,000           Live chicken

Table 1.2 General form of a data matrix


Units of analysis Variables
V1 V2 V3 • • • • Vj • • • • Vm
01 R11 R12 R13 • • • • R1j • • • • R1m
02 R21 R22 R23 • • • • R2j • • • • R2m
03 R31 R32 R33 • • • • R3j • • • • R3m
• • • • • •
• • • • • •
• • • • • •
0i Ri1 Ri2 Ri3 • • • • Rij • • • • Rim
• • • • • •
• • • • • •
• • • • • •
0n Rn1 Rn2 Rn3 • • • • Rnj • • • • Rnm

The main benefit of arranging data in matrix form is that the threefold nature of data (i.e.,
units of analysis, variables, and values) becomes immediately visible; in fact, most data sets
can be represented in the form of Table 1.2. There are some exceptions (e.g., multivariate
time-series data), but these need not concern us at this point.
The data matrix is the starting point for analysis, and its structure determines the kind of
analysis that can be legitimately carried out. Specifically, the number of rows, n, indicates how
many units are being studied; that is, the sample or population size (we will have much more
to say about samples and populations in Chapter 2). The number of columns, m, on the other
hand, indicates how many variables are used to characterize the units of analysis; depending
upon the number of variables, k (k ≤ m), that are simultaneously manipulated by applying
statistical techniques, we talk about univariate (k = 1), bivariate (k = 2), and multivariate (k >
2) data analysis, respectively (we shall return to this issue in Chapter 5). Finally, the nature
of the values, Rij, reflects the level of measurement of the variables and thus indicates what can
and cannot be said about the units of analysis (measurement will be dealt with in some detail
in Chapter 3, so don’t worry if you feel totally confused at the moment!).
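The univariate/bivariate/multivariate labels depend only on k, the number of variables analyzed simultaneously, which can never exceed m. A minimal Python sketch of the rule follows; the function name `analysis_type` is a hypothetical illustration.

```python
def analysis_type(k, m):
    """Label an analysis by k, the number of variables analyzed
    simultaneously, out of the m variables in the data matrix."""
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= m")
    if k == 1:
        return "univariate"
    if k == 2:
        return "bivariate"
    return "multivariate"


# With the m = 4 variables of Table 1.1:
for k in range(1, 5):
    print(k, analysis_type(k, 4))
# 1 univariate
# 2 bivariate
# 3 multivariate
# 4 multivariate
```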

Table 1.3 Single- and multi-response questions


Question A: If you only had enough money to subscribe to just one of the following publications, which one
would you choose?
Question B: If money were not a problem, which of the following publications would you subscribe to?
(Please tick)
Wall Street Journal []
The Times []
Journal of the Mathematically Insane []
Mongolian Economic Review []
Unspeakable Acts []

WARNING 1.1 Questions and variables may not be the same: beware of
multi-response questions.

Now, here’s something you must always watch out for when you are dealing with a data set
that is based on questioning respondents: a question and a variable are not necessarily the
same thing. Very often, answering what appears to be a single question in a questionnaire
may, in fact, require multiple responses and will thus result in a number of variables. Table 1.3
illustrates this point: irrespective of whether Question A or Question B is asked, the response
alternatives are identical. However, the nature and number of the resulting variables are not.
If Question A is asked, then only one option needs to be ticked to answer it. Consequently, to
capture all possible responses, a single variable would be sufficient; this would be called
something like ‘most preferred publication’ and would take five values, one for each publication
involved (e.g., 1 = Wall Street Journal, 2 = The Times, …, 5 = Unspeakable Acts). If Question
B is asked instead, a single variable is not sufficient to capture all possible responses, because
a respondent may wish to subscribe to more than one publication (e.g., The Wall Street Journal
and Unspeakable Acts) and thus legitimately tick multiple options. In this case, to ensure that
all possible responses are captured, five variables would be needed (i.e., one for each
publication). The reason for this is that Question B is not really a single question but a series of
questions of the form: ‘Would you subscribe to the Wall Street Journal?’, ‘Would you subscribe
to The Times?’, and so on. The response to each of these questions would be of the ‘yes/no’
variety, resulting in five variables, each taking two possible values (e.g., 1 = would subscribe, 2
= would not subscribe).
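The difference between the two questions shows up clearly when you code the answers. Below is a minimal Python sketch, assuming the codes given in the text (1 to 5 for Question A; 1 = would subscribe, 2 = would not subscribe for each publication in Question B). The function names `code_question_a` and `code_question_b` are hypothetical.

```python
PUBLICATIONS = [
    "Wall Street Journal",
    "The Times",
    "Journal of the Mathematically Insane",
    "Mongolian Economic Review",
    "Unspeakable Acts",
]


def code_question_a(choice):
    """Question A (one tick only): a single variable coded 1 to 5."""
    return PUBLICATIONS.index(choice) + 1


def code_question_b(ticked):
    """Question B (multiple ticks allowed): five yes/no variables,
    one per publication, coded 1 = would subscribe, 2 = would not."""
    return {pub: (1 if pub in ticked else 2) for pub in PUBLICATIONS}


print(code_question_a("Unspeakable Acts"))  # 5
answers = code_question_b({"Wall Street Journal", "Unspeakable Acts"})
print(list(answers.values()))  # [1, 2, 2, 2, 1]
```

One respondent therefore contributes one value under Question A but five values (one per variable) under Question B.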

HINT 1.1 If a question does not instruct respondents to ‘tick one option only’, chances are it is
a multi-response question and cannot be represented by a single variable.

TYPES OF DATA

Let us now move on to the different types of data that one may come across in a data analysis
project. While there are many data classification schemes, at this stage we shall briefly look
at different types of data according to (a) their meaning, (b) their source, and (c) their time
dimension; in Chapter 3, a fourth classification of data will be introduced based upon their
measurement properties.
Focusing initially on different kinds of data according to their meaning, there are data that
refer to facts; that is, characteristics or situations that exist or have existed in the past. Things
such as age, gender, income, church membership, 1973 sales of Wartburg automobiles, and
number of drunken Taiwanese visitors at last year’s Oktoberfest in Munich are all examples
of facts. Descriptions of individuals’ present behavior (e.g., current shopping habits or using
smartphones during sex) and past behavior (e.g., historical voting patterns in Azerbaijan) also
fall into this category.
Secondly, there are data that refer to awareness or knowledge of some object or phenom-
enon. Typical examples here are brand awareness (e.g., ‘Which of the following toothpaste
brands do you recognize: (a) Draculadent, (b) Vampirmed, (c) Ghoulshine?’), knowledge of
important events (e.g., ‘When did the German chancellor first yodel in public?’), and mastery
of a certain subject or topic (e.g., ‘Who wrote Das Kapital? (a) Woody Allen, (b) John Grisham,
(c) Stephen King, (d) Karl Marx, or (e) Karl Marx and Woody Allen together?’).
Thirdly, there are data representing intentions, namely acts that people have in mind to do
(i.e., their anticipated or planned behavior). Such intentions can relate to future purchasing
behavior (e.g., ‘Having tried your free sample of Explosion laxative, do you intend to buy it in
the future?’), social behavior (e.g., ‘Mrs. Corleone, you will be sorry to hear that your son has
failed all his final exams. Do you intend to (a) let him get away with it, (b) cut his pocket-money
by 96%, or (c) confiscate his Ferrari?’), and personal behavior (e.g., ‘Do you seek to enter the
United States to engage in (a) export control violations, (b) subversive or terrorist activities, (c)
any other unlawful activity, or (d) golf with the president?’).
Fourthly, there are attitudes and opinions data, which indicate people’s views, preferences,
inclinations, or feelings toward some object or phenomenon. Examples of attitude/opinion
data are product or service evaluations (e.g., ‘Do you think that reporting by the New York
Times (a) is brilliant, (b) is fake news, or (c) does not adequately reflect the views of the lesbian,
gay, bisexual, and transgender community?’), political beliefs (e.g., ‘In your view, does the
Favor or the Conversation Party have the better policy for providing employment opportuni-
ties for single teenage mothers in the Outer Hebrides?’), and views on social issues (e.g., ‘What
is the single most important problem facing humanity today: (a) poverty, (b) drugs, (c) global
warming, (d) the Welsh rugby team’s recent bad streak?’).
Lastly, there are data relating to the motives of individuals. Motives are internal forces (i.e.,
desires, wishes, needs, urges, impulses) that channel behavior in a particular way. Although
motivations may be complex and difficult to articulate, data of this kind are quite important
because they can tell us why people behave in the way they do. Examples include reasons
given for doing or not doing a certain thing (e.g., ‘I go to aerobics so that my fantastic body
becomes even more irresistible’ or ‘I don’t do any exercise whatsoever because I’m a lazy slob’),
explanations for preferring something over something else (e.g., ‘I prefer a wall between the US
and Canada to a wall between Mexico and the US’), and rationales for holding certain views
or opinions (e.g., ‘I think that memorizing complex statistical formulae is a total waste of time
because you can look them up in a book or use a computer to do the work instead’).

Table 1.4 Cross-sectional and longitudinal data sets


Variables Data set
A B C
February 2020 purchases of Glennfiddle scotch 15 12
February 2020 purchases of Johnny Stalker scotch 10 8
February 2020 purchases of Castledrain XXXX lager 5 7
March 2020 purchases of Glennfiddle scotch 18 14
March 2020 purchases of Johnny Stalker scotch 11 10
March 2020 purchases of Castledrain XXXX lager 8 6

Note: All purchases are measured in 1,000 liters!

Turning our attention to types of data according to their source, a broad distinction can be
drawn between primary and secondary data. Primary data are data collected with a specific
purpose in mind; that is, for a particular research project. The researcher usually gathers
such data via surveys (conducted face to face, by telephone, or through the web), experiments
(carried out in the laboratory or a ‘natural’ setting), or observation methods (using automatic
data capturing or humans to record observed behavior). In contrast, secondary data are
data that have not been gathered expressly for the immediate study at hand but for some
other purpose; such data, however, might be of relevance for a particular research project
(in other words, somebody else has done the work but you may be lucky enough to be able
to use it without getting your own hands dirty). A wealth of secondary data can be found in
published statistics (by government departments, trade associations, chambers of commerce,
and research foundations), annual reports (published by business firms as well as non-profit
organizations), and abstracting and index services (covering thousands of periodicals, aca-
demic journals, and newspapers). For those who can afford to pay for them (and beware,
because they don’t come cheap!), there are also syndicated services (providing regular detailed
information on a particular country, industry, or product group) and database services (allow-
ing fast access to digital information sources worldwide, or enabling ‘electronic’ transfer of
data sets from one location to another).
The final classification of data to be considered has to do with the time dimension and
distinguishes between data relating to a single point in time and data relating to a number
of time-periods. Data of the former type are known as cross-sectional data, while the latter
are commonly referred to as longitudinal data. From an analysis point of view, the distinction
between the two is quite important because it determines whether inferences regarding change
can be made. To illustrate this, consider for a moment the three data sets displayed in Table 1.4;
the units of analysis in all cases consist of 600 insomniacs living in Auchtermuchty, Scotland.
Data sets A and B are examples of cross-sectional data. Each provides a snapshot of the
variables of interest at a particular point in time; in this instance, data set A informs us about whisky
and lager purchases in February 2020, while data set B does the same thing for March 2020.
One could compare purchases across product types within each data set and reach conclusions
regarding the most popular drink in a particular time-period.
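To make the within-data-set comparison concrete, here is a minimal Python sketch. It assumes that data set A's February figures are 15, 10, and 5 and data set B's March figures are 18, 11, and 8 (our reading of Table 1.4); the dictionary names and the `most_popular` helper are hypothetical and serve only as illustration.

```python
# Purchases in 1,000 liters, as we read them from Table 1.4 (assumption).
data_set_a = {  # February 2020 snapshot
    "Glennfiddle scotch": 15,
    "Johnny Stalker scotch": 10,
    "Castledrain XXXX lager": 5,
}
data_set_b = {  # March 2020 snapshot
    "Glennfiddle scotch": 18,
    "Johnny Stalker scotch": 11,
    "Castledrain XXXX lager": 8,
}


def most_popular(snapshot: dict[str, int]) -> str:
    """Return the drink with the largest purchases within ONE cross-sectional data set."""
    return max(snapshot, key=snapshot.get)


for label, snapshot in [("A (February 2020)", data_set_a), ("B (March 2020)", data_set_b)]:
    print(label, "->", most_popular(snapshot))
```

Each comparison stays inside a single snapshot, which is all a cross-sectional data set supports on its own.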
8 TAKING THE FEAR OUT OF DATA ANALYSIS

Data set C, on the other hand, is an example of longitudinal data and involves repeated
measurements over time on the variables of interest; data set C informs us about whisky and
lager purchases in February and March 2020. As a result, one can compare not only purchases
across product types but also purchases of the same product over time; thus, in addition to
being able to draw the kind of conclusions data sets A and B enable, conclusions regarding
changes in the relative popularity of the three drinks are now possible.
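Because data set C follows the same respondents over two months, the relative popularity of each drink can be tracked. The sketch below is a minimal illustration. It assumes data set C's figures are 12, 8, and 7 for February and 14, 10, and 6 for March (our reading of Table 1.4), and it uses a hypothetical `market_shares` helper.

```python
# Data set C (assumed reading of Table 1.4): the same respondents measured twice.
data_set_c = {
    "Glennfiddle scotch":     {"2020-02": 12, "2020-03": 14},
    "Johnny Stalker scotch":  {"2020-02": 8,  "2020-03": 10},
    "Castledrain XXXX lager": {"2020-02": 7,  "2020-03": 6},
}


def market_shares(data, period):
    """Share of total purchases accounted for by each drink in one period."""
    total = sum(by_period[period] for by_period in data.values())
    return {drink: by_period[period] / total for drink, by_period in data.items()}


feb = market_shares(data_set_c, "2020-02")
mar = market_shares(data_set_c, "2020-03")
for drink in data_set_c:
    change = mar[drink] - feb[drink]
    print(f"{drink:24s} Feb {feb[drink]:.1%}  Mar {mar[drink]:.1%}  change {change:+.1%}")
```

Under these figures, the lager's share falls while both whiskies gain. That conclusion about change requires the repeated measurements that data set C provides.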
Now, having just distinguished between cross-sectional and longitudinal data, we shall
immediately confuse you by suggesting that one can do longitudinal analysis by using
cross-sectional data! This is not as impossible as it first sounds. Take, for example, data sets
A and B and combine them; that is, imagine they are parts of the same study. Piecing the two
data sets together provides information on the same variables (here whisky and lager pur-
chases) of different (but comparable) units of analysis (here two lots of 600 Auchtermuchty
insomniacs) at different points in time (here February and March 2020, respectively). Data
of this kind are known as trend data and enable inferences to be drawn regarding changes
in aggregate behavior, attitudes, and so on. Election polls provide a good illustration in this
context: ‘… a poll commissioned by the Daily Polygraph, in which the voting intentions of
a nationally representative sample of 11,893 bird-watchers were obtained yesterday, shows the
Anti-Everything Party standing at 23%, the Conversation Party at 37%, and the Pro-Birds Party
at 40%. A similar poll, conducted two weeks ago by the Financial Crimes, had Anti-Everything
neck and neck with the Conversationists at 44.3% and 44.1%, respectively, with the Pro-Birds
trailing at an appalling 11.6%!’
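The two polls quoted above were taken from different samples, so the only thing they support is the net, aggregate shift for each party. A minimal sketch using the figures from the quotation (variable names are illustrative):

```python
# Voting intentions (percent) from two independent polls quoted in the text.
poll_two_weeks_ago = {"Anti-Everything": 44.3, "Conversation": 44.1, "Pro-Birds": 11.6}
poll_yesterday = {"Anti-Everything": 23.0, "Conversation": 37.0, "Pro-Birds": 40.0}

# Net change in percentage points. Because the two polls use different respondents,
# these figures say nothing about how many individual voters changed their minds.
net_change = {
    party: round(poll_yesterday[party] - poll_two_weeks_ago[party], 1)
    for party in poll_yesterday
}
print(net_change)
```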

WARNING 1.2 Unless you are certain that the same units of analysis
have been measured over time on the same variable,
conclusions regarding change at the individual level are not
possible.

Trend data should be distinguished from true longitudinal data (such as those provided by
data set C), where the same units of analysis are studied at different points in time. Such data
are sometimes also referred to as panel data; in addition to capturing aggregate changes
over time, they enable inferences to be drawn as to changes in individual behavior. A typical
example of this kind of data is provided by consumer panels, in which a number of individuals
or families (usually balanced on such variables as age, income, and geography) record their
purchases of a number of products at regular intervals (e.g., monthly). Their records are
subsequently used, among other things, to determine the degree of brand loyalty (i.e., the pro-
portion of those buying brand X in period 1 who also bought brand X in period 2) and degree
of brand switching (i.e., the proportion of those buying brand X in period 1 who bought some
other brand in period 2). As you have probably fallen asleep by now, have a look at WARNING
1.2 to wake you up.
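The brand loyalty and brand switching proportions defined above can be computed directly from panel records. The sketch below uses a small, entirely hypothetical panel of ten households, each reporting the brand bought in period 1 and in period 2. It also shows why WARNING 1.2 matters: aggregate brand shares are identical in both periods even though four households switched brands.

```python
from collections import Counter

# Hypothetical consumer panel: (brand bought in period 1, brand bought in period 2)
# for the SAME ten households.
panel = [
    ("X", "X"), ("X", "Y"), ("X", "X"), ("Y", "X"), ("Y", "Y"),
    ("X", "Z"), ("Z", "X"), ("Y", "Y"), ("X", "X"), ("Z", "Z"),
]


def loyalty_and_switching(panel, brand):
    """Among households buying `brand` in period 1, return the proportion buying it
    again in period 2 (brand loyalty) and the proportion buying another brand (switching)."""
    later_choices = [second for first, second in panel if first == brand]
    if not later_choices:
        raise ValueError(f"no panel member bought brand {brand!r} in period 1")
    loyal = sum(1 for second in later_choices if second == brand) / len(later_choices)
    return loyal, 1 - loyal


loyalty_x, switching_x = loyalty_and_switching(panel, "X")
print(f"Brand X: loyalty {loyalty_x:.0%}, switching {switching_x:.0%}")

# Aggregate (trend-style) view: the brand counts are the same in both periods...
period1 = Counter(first for first, _ in panel)
period2 = Counter(second for _, second in panel)
print("Aggregate shares unchanged:", period1 == period2)

# ...yet individual households did change brands, which only panel data can reveal.
switchers = sum(1 for first, second in panel if first != second)
print("Households that switched:", switchers)
```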
In general, given a set of variables, four basic kinds of studies can be distinguished according
to (a) whether the variables concerned are measured once or repeatedly and (b) whether the
same or different units are studied in each case (Table 1.5).

Table 1.5 Basic types of studies

| Units studied | Observed at one point in time | Observed at many points in time |
|---|---|---|
| Same | Cross-sectional study | Panel study |
| Different | Cross-sectional replication | Trend study |

As you might have astutely suspected, there are more types of data sets than those we have
discussed. The good news is that, for the most part, they represent different combinations or
variations of the basic types shown in Table 1.5. For example, the well-known experimental
design of the ‘before–after’ variety with a control group is essentially a combination of two
panels, one of which has been unlucky enough to be exposed to the experimental treatment
(e.g., subjects were forced to watch 43 TV adverts of Loch-Ness-Café Gold Bland) and one
of which has been spared the torture (i.e., subjects were left in peace). Similarly, an omnibus
panel (no, this is not the rear section of a London double-decker bus) is essentially a series of
cross-sectional studies, in which the same group of individuals (i.e., the panel members) is
measured on different variables at different points in time (e.g., at one time, panel members
may be asked to evaluate the aesthetic appeal of alternative packages for cat food and, at
another time, to indicate their attitudes toward a new type of heavy-duty flea spray). Often,
a sub-group of the total panel is selected for a particular purpose; for example, if one wanted
to study consumer reactions to a new type of push-up bra, only those panel members that are
female and over a certain age would (usually) be surveyed.
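The two criteria behind Table 1.5 (same or different units; one or many points in time) can be encoded as a small lookup. In the minimal sketch below, the names `TABLE_1_5` and `study_type` are hypothetical. The sketch also represents the 'before–after' experiment with a control group as a combination of two panels, as described above.

```python
# Table 1.5 as a lookup keyed on the two classification criteria.
TABLE_1_5 = {
    # (same units studied?, observed at many points in time?): study type
    (True, False): "cross-sectional study",
    (True, True): "panel study",
    (False, False): "cross-sectional replication",
    (False, True): "trend study",
}


def study_type(same_units: bool, many_time_points: bool) -> str:
    """Return the basic study type from Table 1.5 for the given design."""
    return TABLE_1_5[(same_units, many_time_points)]


# A 'before-after' experiment with a control group combines two panels:
# each group is measured before and after the (possible) treatment.
before_after_with_control = {
    "treatment group": study_type(same_units=True, many_time_points=True),
    "control group": study_type(same_units=True, many_time_points=True),
}
print(before_after_with_control)
```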

DATA AND INFORMATION

Before we move on to even more exciting stuff, we need to look briefly at the relationship
between data and information. In everyday language, the two are usually taken to be synony-
mous; however, one can distinguish between them in at least two senses.
Firstly, one can look at information as the product of data; that is, information as data
that have been digested and analyzed. In other words, information is the knowledge obtained
and conclusions arrived at after appropriate analytical techniques have been applied to the
data matrix in Table 1.2. Arguably, a mass of raw data as described at the beginning of this
chapter (i.e., unsummarized/unstructured) is of little informational value in itself. Imagine, for
example, that cookies traced the web-surfing habits of 20 million Russian consumers during
the last three months, and you have access to these data. Even if you are Ms. Bigbrains in person,
you need to analyze these data in order to extract managerially useful information, such as on
which sites to place banner ads for your new brand of egg-timers that play the national anthem
when the eggs have been boiled to perfection.
A second distinction between data and information can be drawn on the basis of relevance;
under this view, information is data relevant for a particular decision. To illustrate this, take
again the example of deciding where to place banner ads for the new egg-timer. Assume that,
in addition to the data on web-surfing habits of Russian consumers, you are also given the
following: (a) price lists for banner ads of major websites, (b) a list of websites where your
competitors place their banner ads, and (c) the shoe size of your boss’s secretary. While most
of us would agree that (a), (b), and (c) all constitute some kind of data, virtually no one (sober,
at least) would consider (c) as relevant information. Of course, for a different decision, (c) may
be a perfectly legitimate informational input (e.g., if the objective is to order sports shoes for
the office rugby team).

SUMMARY
Let us take a deep breath and try to summarize what we have learned so far. We first
looked at the nature of data, distinguishing between units of analysis, variables, and val-
ues; we then put the three together to create a data matrix. Next, we warned against
confusing questions with variables and discussed different kinds of data according to their
meaning, source, and time dimension. Finally, we considered the link between data and
information. It is now time to make a cup of coffee and mentally prepare for the things to
come.

QUESTIONS AND PROBLEMS


1. What is the difference between a variable and a value?
2. Give two examples of different units of analysis and two examples of variables that
could be used to characterize them.
3. What determines the size of a data matrix (i.e., the number of rows and columns)?
4. Give three examples of multi-response questions. How many variables would you
need to capture the answers to them?
5. Distinguish between facts, awareness, intentions, opinions, and motives, and give an
example of each. Why is it important to distinguish between these types of data?
6. What would you consider to be the main advantages and disadvantages of primary
versus secondary data?
7. What type of data do you need in order to make inferences regarding change?
8. What is the key difference between trend data and panel data?
9. It has been argued that ‘students often wonder what statistics is and why they should
bother to study the subject’. What is your view on this?
10. What is your favorite dish? (We like to get to know our audience.)

FURTHER READING
Bryman, A. (2016). Social Research Methods, 5th edition. Oxford: Oxford University Press. An
excellent introduction to all aspects of the research process.
Creswell, J. W. & Creswell, J. D. (2017). Research Design: Qualitative, Quantitative and Mixed
Methods Approaches. London: Sage Publications. All you need to know regarding thinking
about, designing, and implementing different kinds of research projects.
Sheehan, K. B. & Pittman, M. (2016). Amazon’s Mechanical Turk for Academics: The HIT
Handbook for Social Science Research. Irvine, CA: Melvin & Leigh. A comprehensive guide to
collecting data via Mechanical Turk.
2
Does sampling have a purpose
other than providing employment
for statisticians?

THE NATURE OF SAMPLING

Having taught you all we know about data sets, we shall ignore your cries for mercy and
proceed to do exactly the same with regard to sampling. Crudely speaking, a sample is a part
of something larger, called a population (or ‘universe’); the latter is the totality of entities in
which we have an interest – that is, the collection of individuals, objects, or events about which
we want to make inferences. For example, in Table 1.4 earlier, the population implied was ‘all
insomniacs living in Auchtermuchty, Scotland’. Other examples of populations include ‘all
countries on the planet Earth’, ‘all furniture stores in the UK’, ‘all strikes at Italian knitting
factories during 1975–79’, and ‘all vegetarian chiropodists based in Greater Manchester’. We
can now define a sample as a subset of a given population. Going back again to Table 1.4, the
sample consisted of 600 insomniacs living in Auchtermuchty, who were fortunate enough to
have their purchasing behavior studied. The Scottish Tourist Board assured us that the total
number of insomniacs living in Auchtermuchty is much higher! Possible samples for the other
population examples mentioned above are ‘member countries of NATO’, ‘28 furniture stores
located anywhere in Somerset and Devon’, ‘strikes at Italian knitting factories during the first
six months of 1977’, and ‘13 vegetarian chiropodists with city center practices in Manchester’
(note that these are merely possible samples that could be drawn from the above populations,
not necessarily good samples – an issue to be addressed later).
The rationale for sampling is really very simple: by checking out part of a whole, we can say
something about the whole. However, you may ask, ‘But why bother with a sample in the first
place? Why not study the population instead?’ Well, in some instances, the entire population is
indeed studied as, for example, with the population census, which in many countries (e.g., the UK
and the US) is conducted by the government every ten years. However, in most cases, undertaking a census is not possible or desirable
for reasons summarized in Table 2.1.

Table 2.1 Reasons for preferring a sample over a census

| Reason | Rationale | Example |
|---|---|---|
| Cost | A census is almost always more expensive than taking a sample; sometimes, the cost of a census is simply prohibitive. | Interviewing all car owners in all European Union member countries as to the color of windscreen washer liquid they prefer would cost millions of euros; the value of the information obtained to a car accessories manufacturer would not outweigh the cost of commissioning such a census. |
| Time | A census may take more time to conduct and analyze than is available for making the decision involved. In other words, doing a census may simply take too long. | The time needed to complete a census of all ski-enthusiasts in North America regarding their ski resort preferences would be unacceptably long if the research related to updating a Snob Skiers Guide, which must hit the bookshops by September (and it’s already May!). |
| Destruction/contamination of population members | A complete census may result in destruction or contamination of the entire population; ‘destructive testing’ procedures must, by necessity, be limited to a sample only. | Torture-testing each and every condom coming off the production line for durability, tensile strength, and friction resistance would result in no products reaching the shops (and a lot of pregnant people). |
| Decision importance | A minor decision would not normally justify the hassle of a census; a ‘quick and dirty’ sample may be all that is needed. | Deciding where to place a pink peach bubble tea dispenser in a tractor factory with 600 employees would not justify a census of ‘perceived optimal locations’; on the other hand, whether to move the factory to a new location 50 miles away is an issue for which a census of the workforce’s intentions to stay with the firm may be called for. |
| Confidentiality | A census is more likely to be noticed by interested parties than a sample study; with a census, the chances that competition will get wind of what’s going on are much higher. | Test-marketing a new family-size package of beetroot-banana pickles by placing it in all major supermarket chain outlets (e.g., Aladi, Strangeway, Fresco, and Painsbury) is much more likely to invite competitive retaliation than if only a few carefully selected stores are used for test-marketing purposes. |
| Accuracy | Unbelievable as it may sound, a census may be less accurate than a sample; the latter’s sampling error may be outweighed by the former’s non-sampling error, resulting in a greater total error in the case of a census. | Sending out 75 hastily recruited part-time interviewers to manually record the views of the entire student body of Edinburgh University on the introduction of a mandatory ‘tartan dress code’, and using another 20 semi-qualified, bored-to-death typists to input the data to a computer, is likely to result in many non-sampling inaccuracies (e.g., interviewer variability and bias and/or transcription and typing mistakes). |

Despite the many advantages of sampling over a census, it is important to keep in mind
that there is a price to pay, namely deciding whether the picture painted by the sample can
be generalized to the population of interest. To illustrate this problem, we have applied our
magnificent artistic talents and creatively depicted the relationship between population and
sample in graphical form; the relevant masterpiece is shown in Figure 2.1.
DOES SAMPLING HAVE A PURPOSE OTHER THAN PROVIDING EMPLOYMENT? 13

Figure 2.1 Population and sample

The first thing to realize is that, given a population containing a fixed number of elements
(what is known among statistical geniuses as a finite population), a number of possible
samples could be drawn. Just to put it into perspective (and have you faint in the process),
a total of 1,099,511,627,776 (i.e., $2^{40}$) different samples could potentially be drawn from the
population shown in Figure 2.1 (which only has 40 elements in the first place!). In general, if there are N
elements in the population and the size of the sample is not fixed, one has a choice among
$2^N$ possible samples (including taking no sample at all and conducting a census of the popula-
tion). Of course, the choice of possible samples is drastically reduced if a certain sample size
is pre-specified; that is, if the number of elements to be included in the sample is fixed before-
hand (not an easy matter, either, as will be further discussed below). Going back to Figure 2.1,
and assuming that we wish a sample of size 10 (such as sample C), our options are reduced to
‘only’ 847,660,528 possible samples! In general, given a desired sample size n (where n ≤ N),

$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$

different samples could be drawn from a population with N elements. Thus,
whichever way you look at it, one has a lot of choices when it comes to sampling.

SAMPLE SELECTION

Given the above, the key question becomes how to select the n sample elements (i.e., members
of the sample) so that they are representative of the N population elements (i.e., members
of the population). Unfortunately, the answer to this is just as difficult as selecting a ‘good’
husband/wife/partner (in other words, there is no easy answer!). However, one important con-
sideration is the effect that excluded population elements are likely to have on the quality of the
sample. Sampling, by definition, means that certain population elements will be excluded from
the sample. This exclusion causes what is known as sampling error: the difference between
a result based on a sample and that which would have been obtained if the entire population
were studied (i.e., the ‘true’ value). Sampling error arises whenever a sample is drawn,
whatever the sampling procedure, and is a function of sample size (i.e., other things being equal,
sampling error tends to decrease as sample size increases, vanishing altogether when the entire population is studied).
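The link between sample size and sampling error can be seen in a small simulation. The following is a minimal sketch, not a formal derivation. It assumes a hypothetical population of 40 elements (30 coded 1 and 10 coded 0, so the true proportion is 0.75), draws many simple random samples without replacement at each sample size, and averages the absolute difference between each sample proportion and the true value. The helper name `mean_sampling_error` is illustrative.

```python
import random
import statistics

# Hypothetical finite population of 40 elements: 30 coded 1, 10 coded 0.
population = [1] * 30 + [0] * 10
true_value = sum(population) / len(population)  # 0.75


def mean_sampling_error(n: int, replications: int = 2000, seed: int = 1) -> float:
    """Average absolute gap between the sample proportion and the true proportion
    over many simple random samples of size n, drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the results are reproducible
    errors = []
    for _ in range(replications):
        sample = rng.sample(population, n)
        errors.append(abs(sum(sample) / n - true_value))
    return statistics.mean(errors)


for n in (5, 10, 20, 30, 40):
    print(n, round(mean_sampling_error(n), 3))
```

The average error shrinks as n grows. At n = 40, every "sample" is the whole population, so the error is exactly zero.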
To get a better feel of the concept of sampling error (while avoiding nasty technicalities),
imagine that the population in Figure 2.1 consists of 30 female and 10 male students enrolled
in an advanced entomology class at Mosquito State University, and you have just picked at