The Nature of Statistics (Statistics - A Universal Guide To The Unknown Book 1)
HEINZ KOHLER
Willard Long Thorp Professor of Economics, Emeritus
Amherst College
Copyright © 2021 by Heinz Kohler
Contents
FOREWORD
PREVIEW
INTRODUCTION
COLLECTING DATA
ANALYZING DATA
SUMMARY
KEY TERMS
PRACTICE PROBLEMS
SELF-EXAMS
TRUE/FALSE TEST
RECOGNIZING KEY TERMS
MULTIPLE-CHOICE TEST
PROBLEMS
SOLUTIONS TO SELF-EXAMS
TRUE/FALSE TEST
RECOGNIZING KEY TERMS
MULTIPLE-CHOICE TEST
PROBLEMS
2. get a preview of the kinds of problems you will learn to solve and, in the
process, come to see why statistics is said to facilitate wise decision-making
in the face of uncertainty and is viewed as a "universal guide to the
unknown",
You just received Fortune magazine's latest Global 500 report, which
provides information on the world's 500 largest corporations. Besides each
company's name, country, industry code, and number of employees, the
report includes dollar amounts and rankings of each firm's revenues, profits,
assets, and stockholders' equity.
b. How many variables can you find in this report? Which are they?
e. Can you find examples of nominal, ordinal, interval, and ratio data in this
report? Why would you care?
Preview
You are a film producer and your studio has just spent millions of dollars
on creating a new soap opera that seems destined to be shown on television.
Naturally, you want to make as much money as possible. Thus, it is time to
devise a marketing strategy. Several possibilities come to mind:
1. All rights to the new series could be sold to a distributor who is willing to
pay $125 million right now, and that could be the end of the story as far as
you are concerned.
3. You could hire a consulting firm, which is willing to offer advice on the
network's likely reaction for a $1 million fee. The consulting firm's "track
record" is given in Table 1.1 on the following page.
Table 1.1 Film Consultant's Track Record
During the past decade, the consulting firm has issued numerous reports
in similar situations. An ultimate rejection – an event here designated as E1 –
was preceded by a report predicting rejection, R1, 80 percent of the time, and
by a report predicting a contract offer, R2, 20 percent of the time. On the
other hand, an eventual contract offer – an event here designated as E2 – was
preceded by a report predicting rejection 30 percent of the time and by a
report predicting a contract offer 70 percent of the time.
What then is your optimal strategy? Should you grasp the sure thing and
pocket $125 million now by selling the rights? Should you take the –$30
million versus +$300 million gamble by showing the pilot to network
executives? Should you buy the advice and then take the action that
maximizes your likely revenue (sell the rights if report R1 is received; offer
the film to the network if report R2 is received)? Or should you buy the
advice now and, after having received your report, rethink the whole matter
in light of Table 1.1?
Surely, you can picture yourself now, with report R1 in hand, about to
make that $125 million deal with the distributor, yet thinking about the lost
chance of making $300 million if the report is wrong. Surely, you can see
yourself fretting all night, with report R2 in hand, about to contact the
network and collect that $300 million prize, while thinking about the very
real chance, if the report is wrong, of ending up $30 million in the red instead
of pocketing $125 million for sure.
All sorts of executives face decision-making problems like this every
day. Long before you have studied the last book of this series, you will be
able to solve this particular problem in no time. And you will acquire similar
skills in every book in between.
Introduction
Ask anyone to define the nature of statistics and, just as in the
dictionary, you are likely to get one of three answers. Some, like you perhaps,
are about to take a course in the subject. They will naturally think of statistics
as a field of study that somehow deals with the collection, presentation, and
interpretation of numerical data. There will be others, the vast majority of
people no doubt, who will instantly think of masses of data, seemingly
infinite in number, that constantly bombard us in our daily lives. Just think of
all those numbers ceaselessly spewing forth from television sets, radios,
newspapers, and sites on the World Wide Web: data about the weather and
sports events; election results and opinion polls; prices of bonds, stocks,
foreign monies, and commodity futures; rates of inflation, unemployment,
and economic growth . . . Finally, a few other people, already trained in the
discipline, will conjure up a highly technical meaning of the term that we,
too, will meet in later books. The term statistics can refer to summary
measures, such as sample averages and sample proportions, that have been
computed from relatively few data gathered by sampling a much larger
collection of data called a population.
In fact, these three definitions are linked. Statistics, viewed as a
scientific discipline, inevitably uses as raw material those very masses of data
that most people associate with the term. Indeed, statistics courses used to
have an ugly reputation precisely because they involved endless, boring hours
of manipulating masses of data. But such number crunching is no more.
Sophisticated computer software, such as Microsoft's Excel or the Minitab
program, can perform powerful magic. As Book 2 illustrates,
upon starting Excel, you encounter a screen that is nicely divided into a series
of columns and rows and invites you to enter data, masses of them, if you
want. Indeed, in the case of Excel, a single Workbook contains 16 blank
spreadsheets. Each of these measures 256 columns by 65,536 rows. That
comes to 16,777,216 cells, which could be printed on a sheet of paper 19 feet
wide and 1,300 feet long! In the case of Minitab's Professional Version, an
eager user can fill in 150 million of those pretty cells (provided there is
sufficient computer memory), but even the lower limit of a mere 5,000 entries
in the Student Version can keep you busy for quite some time. Relax! You
won't have to do that in this series of books.
Having entered your data into Excel, for example, you must specify an
appropriate statistical technique. A few well-chosen keystrokes will do, and
wham! In a fraction of a second, you have your result. Times have certainly
changed. Not so many years ago, some of these calculations might have taken
weeks and even months or years of work.
Collecting Data
Finding Existing Data
Generating New Data
Any practical statistical work requires data, data, and more data. A
first branch of the discipline of statistics, therefore, focuses on the careful
collection of this crucial type of raw material. Such collection can proceed in
one of three ways: 1) A would-be investigator can look for data that already
exist because others have gathered them in the past. 2) Brand-new data can
be generated with the help of so-called observational studies that involve
census taking or sampling. 3) Brand-new data can be generated by
conducting carefully controlled experiments. These three approaches are
explained in Books 3, 4, and 5 of this series, respectively.
Looking Ahead
The importance of inferential statistics is illustrated by the rich array of
examples found throughout this series of books. Statistical techniques help
firms screen job applicants, budget research and development expenditures,
determine the quality of raw materials received or of output produced, and
decide whether sales personnel are better motivated by salary or commission.
Statistical techniques can, similarly, help firms choose the best one among
several product designs, leasing arrangements, oil‑drilling sites, fertilizer
types, and advertising media. And inferential statistics can tell firms precisely
how the quantity of their product that is demanded relates to the product's
price, to the prices of substitutes and complements, to consumer income, and,
perhaps, even to the consumer's sex.
Government officials are equally avid users of what this series has to
teach. Inferential statistics plays an important role in assuring the reliability
of space missions and of more mundane airport lighting systems. And
statistical techniques help answer questions such as these: Do motorcycle
helmets really reduce accident fatalities? Do nursing homes discriminate
against Medicaid recipients? Do the boxes of raisins marketed by this firm
truly contain 15 ounces as claimed? Do the firms in this state meet anti-
pollution standards? When is this recession likely to end? What is next year's
probable rate of inflation? This list, too, can be expanded at will.
Here, book-by-book, are some specific problems you will encounter
and learn to solve. Naturally, the list contains terms you haven't yet met.
Don't fret. You will learn about them in due course.
Note: As you will learn in Book 4, many types of samples exist. Not all of
them are equally likely to reflect the makeup of the sampled population. For
example, the group of three salaries found in the column 6 box here might be
a convenience sample that was selected merely for the ease of illustration. It
might also be a simple random sample that was selected by some procedure
such as writing the nine salaries on slips of paper, mixing the slips in a bowl,
and pulling out three. In the latter case, as you will learn in Book 8, it is
possible to select 84 different samples of 3 out of 9. That would give us 1
chance in 84 of selecting the particular sample shown here.
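The count of 84 can be checked with a few lines of Python. What follows is only a minimal sketch: the nine salary figures are hypothetical placeholders (the actual values from the table are not reproduced here), and only their number matters for the count.

```python
from itertools import combinations
from math import comb

# Hypothetical salaries standing in for the nine in the table (assumption).
salaries = [21_000, 24_000, 27_000, 30_000, 33_000,
            36_000, 39_000, 42_000, 45_000]

# Number of distinct samples of 3 that can be drawn from 9 items,
# ignoring the order in which the slips are pulled.
n_samples = comb(len(salaries), 3)

# Listing every possible sample explicitly gives the same count.
all_samples = list(combinations(salaries, 3))

print(n_samples)         # 84
print(len(all_samples))  # 84
print(1 / n_samples)     # chance of drawing any one particular sample
```

Because a simple random sample makes every one of these 84 combinations equally likely, the chance of any particular one is 1 in 84.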
Nominal Data
Suppose you were working, as we will later in this series, with an
alphabetic list of the 100 largest multinational firms that maintained
headquarters in the United States. (Table 4.1 contains such a list.)
Continually referring to the actual company names, such as Goodyear Tire &
Rubber or Minnesota Mining and Manufacturing, may soon become
awkward and unwieldy. So you decide to substitute numbers for those
company names, ranging from 00 for Abbott Laboratories to 99 for Xerox.
These numbers are nominal data. They merely name or label differences in
kind. Thus, they serve the purpose of classifying observations about
qualitative variables into mutually exclusive groups where the numbers in
each group can then be counted. (Numbers between 00 and 99, for example,
might refer to multinational companies; numbers between 100 and 159 to
other types of firms, and so on.)
In fact, we meet nominal data every day. House numbers provide a good
example. The green house at the corner might be assigned the number 1, the
yellow house across the street a 2, the white house in the middle of the street
a 6, and so on, until the brick house at the end is labeled with a 12. Similarly,
a statistician working with Table 1.9 above might code "male" as 0 and
"female" as 1 for the sake of mere convenience, but alternative labels of
"male" = 100 and "female" = 50 would serve as well.
Invariably, nominal data provide the weakest level of measurement in
the sense that they contain only the tiniest amount of useful information.
More importantly, as the slightest bit of thought about these examples can
confirm, it never makes sense to add, subtract, multiply, divide, rank,
average, or otherwise manipulate nominal data arithmetically. We can merely
count them. The presence of 12 numbers on a street denotes the existence of
12 different houses. Five 1s, according to one of the above codes, indicates
the presence of five females. And that is all.
Consider how adding all the house numbers on our street would yield a
meaningless number 78. Summing six 0s and three 1s to a total of 3 (because,
say, six men and three women are working in a firm) would be equally silly.
Ordering nominal numbers by size, or ranking them, would be senseless
as well. Although 2 is smaller than 6, in what sense is the yellow house
numbered 2 smaller than the white house numbered 6? Although 1 is greater
than 0, in what sense is "female" greater than "male"? Nor could we assume
that equal differences or intervals between nominal data carry any meaning at
all: Just because 12 - 10 = 2 and 10 - 8 = 2 as well, could we assume that the
distance between house #12 and house #10 is the same as that between house
#10 and house #8? Hardly. And dividing one house number by another would
be pointless, too. True enough, the ratio of 12/6 is 2, but can we say that
house #12 is somehow twice as large or otherwise more important than house
#6 down the street?
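A short Python sketch may help make the point concrete. It uses the six men and three women from the example above together with the two alternative gender codes mentioned earlier; the variable names are merely illustrative.

```python
from collections import Counter

# Six men and three women, as in the example above.
workers = ["male"] * 6 + ["female"] * 3

# Two equally acceptable nominal codings mentioned in the text.
coding_a = {"male": 0, "female": 1}
coding_b = {"male": 100, "female": 50}

codes_a = [coding_a[w] for w in workers]
codes_b = [coding_b[w] for w in workers]

# Sums depend entirely on the arbitrary labels chosen.
sum_a = sum(codes_a)  # 3
sum_b = sum(codes_b)  # 750

# Counts per category are the same under either coding.
females_a = Counter(codes_a)[coding_a["female"]]  # 3
females_b = Counter(codes_b)[coding_b["female"]]  # 3

print(sum_a, sum_b)
print(females_a, females_b)
```

The sums change when the labels change, so they tell us nothing about the workers. The counts do not change, which is why counting is the one operation that makes sense for nominal data.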
Application 1.1 The Chinese Calendar
The Chinese calendar provides a good example of the use of nominal
data. Years are named after animals (real or imaginary). The list repeats after
twelve years. Table 1.10 explains.
The new time starts at an arbitrary zero point and beats 000 at midnight
over the Swatch building in Biel, Switzerland. It also divides the day into
1,000 swatch beats, each equivalent to 86.4 good old seconds. So, if it's 3 PM
local time, or 15:00 hours, by the old clock, you are at 625 universal time, as
the display shows.
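The conversion just described is simple arithmetic, sketched below in Python. The function name to_beats is hypothetical, and the sketch assumes, as the 3 PM example does, that the clock time is measured from the same midnight zero point as the beats.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
BEATS_PER_DAY = 1000             # so one beat lasts 86.4 seconds


def to_beats(hours: int, minutes: int = 0, seconds: int = 0) -> float:
    """Convert a clock time, counted from the midnight zero point, to beats."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    return elapsed * BEATS_PER_DAY / SECONDS_PER_DAY


print(to_beats(15))  # 625.0, matching the 3 PM example
print(to_beats(12))  # 500.0, noon is halfway through the day
```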
As you might have guessed, Swatch is selling an Internet watch around
the world ($70 at the time of this writing) and you can even download
software at its site to teach your computer a trick or two about the meaning of
Internet time.
[Source: "Dick Tracy's Cellular Swatch Watch," The New York Times, June
26, 2000, p. C8.]
Ratio Data
2. Masses of data are, indeed, the statistician's raw material, and a first branch
of the discipline of statistics focuses on the careful collection of data. Such
collection can proceed in one of three ways:
a. A would-be investigator can look for data that already exist because others
have gathered them in the past.
b. Brand-new data can be generated with the help of observational studies
that involve census taking or sampling.
c. Brand-new data can be generated by conducting carefully controlled
experiments.
3. A second branch of the discipline of statistics, known as descriptive
statistics, is concerned with developing and utilizing techniques for the
effective presentation of numerical information so as to highlight patterns
otherwise hidden in a data set.
5. In the end, the discipline of statistics is, perhaps, best viewed as a branch
of mathematics that develops and utilizes techniques for the careful
collection, effective presentation, and proper analysis of numerical
information. As such it facilitates wise decision-making in the face of
uncertainty and becomes a universal guide to the unknown.
Introduction
1. This is a fun question that elaborates on this book’s Preview. It challenges
you to do some serious thinking, but you can also learn much by merely
looking up the answers later in this book. First, study the solution that is
given here to this book’s Preview problem; then consider the questions that
follow.
If you forgo buying the consulting firm's advice, the optimal action is to sell
the rights for $125 million, which is illustrated in Figure 1.2.
FIGURE 1.2 The Soap Opera Decision Without Advice
Note: The 60 percent chance of rejection and the 40 percent chance of a
contract offer are indicated as probability of event E1 being .6 and probability
of event E2 being .4.
If you do buy the consulting firm's advice, the optimal action, now illustrated
in Figure 1.3 on the following page, is this: If R1 is received, sell the rights
and take the $125 million minus the $1 million fee. If R2 is received, offer the
film to the network and earn an expected $200 million.
FIGURE 1.3 The Soap Opera Decision With Advice
Note: You need not fully understand all of the entries in Figure 1.3 at this
point.
a. Can you guess the meaning of the $102 million number at point b in Figure
1.2?
b. Can you guess the meaning of the $154.4 million number at point b in
Figure 1.3?
c. Can you guess what strategy the filmmaker would be well advised to
follow: forgoing the advice or buying the advice?
d. What do you think is the maximum amount the filmmaker could be made
to pay for the (admittedly imperfect) advice?
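For readers who like to check such figures by computer, here is a minimal Python sketch of one way in which the $102 million and $154.4 million numbers could arise. It assumes that these numbers are expected values, uses the payoffs and track record from the Preview along with the probabilities noted under Figure 1.2, and updates those probabilities with each report by the standard rule for conditional probability. All names in the code are illustrative; none of them come from the figures.

```python
# Payoffs in millions of dollars, as given in the Preview.
SELL, REJECT, CONTRACT, FEE = 125.0, -30.0, 300.0, 1.0

# Probabilities of rejection (E1) and contract offer (E2), from Figure 1.2.
P_E1, P_E2 = 0.6, 0.4

# Consultant's track record: P(report | event), from the Preview.
P_R1_E1, P_R2_E1 = 0.8, 0.2
P_R1_E2, P_R2_E2 = 0.3, 0.7

# Without advice: expected payoff of offering the film to the network.
ev_offer = P_E1 * REJECT + P_E2 * CONTRACT


def after_report(p_r_e1: float, p_r_e2: float) -> tuple[float, float]:
    """Return (probability of the report, best expected payoff net of fee)."""
    p_r = p_r_e1 * P_E1 + p_r_e2 * P_E2
    post_e1 = p_r_e1 * P_E1 / p_r   # P(E1 | report)
    post_e2 = p_r_e2 * P_E2 / p_r   # P(E2 | report)
    ev_offer_given_r = post_e1 * REJECT + post_e2 * CONTRACT
    return p_r, max(SELL, ev_offer_given_r) - FEE


p_r1, best_r1 = after_report(P_R1_E1, P_R1_E2)
p_r2, best_r2 = after_report(P_R2_E1, P_R2_E2)

# With advice: weight the best action after each report by that report's chance.
ev_with_advice = p_r1 * best_r1 + p_r2 * best_r2

print(round(ev_offer, 1))        # 102.0
print(round(best_r1, 1))         # 124.0 (sell after R1)
print(round(best_r2, 1))         # 200.0 (offer after R2)
print(round(ev_with_advice, 1))  # 154.4
```

The rounding in the print statements only hides tiny floating-point differences; the underlying arithmetic is exact in principle.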
The Collection of Data
2. This problem provides a preview of the type of material to be discussed at
length in Book 3. If you are connected to the Internet, visit
https://www.usa.gov/statistics, a site maintained by the U.S. federal
government. Click on Agencies and explore the manifold sources of U.S.
federal government statistics. Make a list of five agencies that supply data to
this site.
8. The Collecting Data section above briefly anticipates issues that will be
discussed at length in later books. One of these issues is the difference
between surveys and experiments and the significance of exercising control
over elementary units whose characteristics are being scrutinized. Imagine
annual salaries of workers in a firm to equal $10,000 for everyone, plus
$1,000 for every year of work experience. Salaries are, thus, totally unrelated
to race. Then imagine that most of a firm's black workers are young (and,
therefore, have had little work experience), while most of its white workers
are older (and have had many years of experience on the job). Someone
merely surveying salaries might find an average salary of $15,000 a year
among blacks and of $28,000 a year among whites and might conclude, quite
incorrectly, that the firm's management is discriminating based on race.
In contrast, a controlled study would divide the firm's workers into
groups according to work experience and would compare salaries within each
group. Such a study would find identical salaries between (1) the few white
and (2) the many black workers in the younger and less experienced group.
And it would find identical but higher salaries between (1) the many white
and (2) the few black workers in the older and more experienced group. Thus,
the controlled study would avoid the false racial-discrimination charge.
Make up a detailed numerical example to corroborate the story told
by the numbers given here.
Basic Statistical Concepts
9. Consider Table 1.12, which contains selected data found in Fortune
magazine's 1999 Global 500 report.
Table 1.12
Selected Characteristics of the World's Largest Corporations in 1999
Table 1.15
Best Picture Nominees for the 1998 Academy Awards
Table 1.16
Best Picture Winners at 1993 to 1997 Academy Awards
Table 1.19
Best U.S. Business Software Sales (Windows and DOS), December 1998.
Table 1.20
Best U.S. Business Software Sales (Macintosh), December 1998.
b. A table contains data on quantity produced and total cost for 7 factories.
b. A table contains data on the ask and bid prices for 25 different corporate
bonds.
d. A table contains data on 100 incoming airline passengers who have been
questioned about the reason for their trip (10 categories, ranging from
business to honeymoon), the likely length of their stay, their likely
expenditures in town, and their type of accommodation (6 categories, ranging
from hotel to own home).
33. Review each of the cases in Practice Problem 19 and identify the data
types involved.
34. Review each of the cases in Practice Problem 20 and identify the data
types involved.
35. Review each of the cases in Practice Problem 21 and identify the data
types involved.
36. Review each of the cases in Practice Problem 22 and identify the data
types involved.
37. Review each of the cases in Practice Problem 23 and identify the data
types involved.
38. Among numbers describing the following, which are nominal data?
a. distances traveled
b. student I.D. numbers
c. net assets
d. room numbers
e. sound levels inside different airplanes
f. drivers' ratings of the handling characteristics of cars
g. football jersey numbers
a. ratings of colleges
b. temperature readings at the airport
c. the daily receipts of a supermarket
d. consumer brand preferences concerning types of coffee
e. army ranks
f. a corporate hierarchy from president to janitor
g. calendar years
41. A product is produced in six alternative colors: blue, brown, green, red,
yellow, and white.
43. Make a list of 6 data types that are clearly nominal in nature.
44. Make a list of 6 data types that are clearly interval in nature.
45. Make a list of 6 data types that are clearly ordinal in nature.
46. Consider the following situations; identify the types of data involved:
b. A hotel manager has labeled rooms on the first, second, or third floors by
numbers in the 100s, 200s, or 300s, respectively, while also designating
rooms on the north or south side of the building by even or odd last digits.
Thus, 102, 104, 106 stand for first‑floor rooms to the north; 301, 303, 305 for
third‑floor rooms facing south.
47. Someone claims that coding Olympic "gold," "silver," and "bronze" as 3,
2, and 1 amounts to creating interval data. What do you think?
48. The Fujita or F scale measures the intensity of tornadoes, as follows:
0. Wind velocity 40-72 mph; damages chimneys, tree limbs, and sign boards.
1. Wind velocity 73-112 mph; flips cars, mobile homes, peels roofing.
2. Wind velocity 113-157 mph; tears roofs off houses, splinters mobile
homes.
3. Wind velocity 158-206 mph; tears roofs and walls off houses, uproots
trees.
4. Wind velocity 207-260 mph; levels frame houses, generates missiles.
5. Wind velocity 261-318 mph; hurls houses and cars long distances.
50. Identify the types of data created when the following are coded from 1-5:
b. The largest energy companies on Fortune's 1999 Global 500 list: Suez
Lyonnaise des Eaux, Enron, RAO Gazprom, Dynegy, Transcanada Pipelines
d. The largest banks on Fortune's 1999 Global 500 list: Bank of America
Corporation, Credit Suisse, Deutsche Bank, HSBC Holdings, ABN AMRO
Holding
e. The largest chemicals companies on Fortune's 1999 Global 500 list: E.I.
Du Pont de Nemours, Bayer, BASF, Hoechst, Dow Chemical
Self-Exams
True/False Test
____9. A multivariate data set is one that contains information on more than
two variables.
____15. Interval data are numbers that label differences in kind, as nominal
data do, but that, in addition, by their very size also order or rank
observations on the basis of importance.
____28. A bivariate data set is a quantitative variable that can assume values
at all points on a scale of values, with no breaks between possible values.
____36. Ratio data are numbers that possess all the characteristics of ordinal
data and, in addition, relate to one another by meaningful intervals or
distances, because all numbers are referenced to a common (although
admittedly arbitrary) zero point.
____37. Elementary units are numbers that possess all the characteristics of
interval data and, in addition, have meaningful ratios because they are
referenced to an absolute or natural zero point that denotes the complete
absence of the characteristic being measured.
Recognizing Key Terms
In each of the following sections identify the Key Term that is being defined.
1.________________________________
a qualitative variable about which observations can be made in only two
categories
2.________________________________
a data set containing information on two variables
3.________________________________
a quantitative variable that can assume values at all points on a scale of
values, with no breaks between possible values
4.________________________________
the collection of data about persons or objects by deliberately exposing them
to some kind of change, while leaving all else unchanged, and subsequently
recording how identical persons or objects respond to different types of
change, or how different types of persons or objects respond to identical
change
5.________________________________
any collection of observations about one or more characteristics of interest
possessed by one or more elementary units
6.________________________________
any single observation about a specified characteristic of interest possessed
by an elementary unit; the basic unit of the statistician's raw material
7.________________________________
drawing inferences about an unknown part from a known whole
8.________________________________
a branch of the discipline that is concerned with developing and utilizing
techniques for effectively presenting numerical information so as to highlight
patterns otherwise hidden in data sets
9.________________________________
a quantitative variable that can assume values only at specific points on a
scale of values, with inevitable gaps between them
10.________________________________
persons or objects that have characteristics of interest to statisticians
11.________________________________
a complete listing of all elementary units relevant to a statistical investigation
12.________________________________
drawing inferences about an unknown whole from a known part
13.________________________________
a branch of the discipline that is concerned with developing and utilizing
techniques for properly analyzing (or drawing inferences from) numerical
information
14.________________________________
numbers that possess all the characteristics of ordinal data and, in addition,
relate to one another by meaningful intervals or distances, because all
numbers are referenced to a common (although admittedly arbitrary) zero
point
15.________________________________
the assignment of numbers to characteristics that are being observed
16.________________________________
a qualitative variable about which observations can be made in more than two
categories
17.________________________________
a data set containing information on more than two variables
18.________________________________
numbers that merely name or label differences in kind and, thus, can serve
the purpose of classifying observations about qualitative variables into
mutually exclusive groups where the numbers in each group can then be
counted
19.________________________________
the collection of data about persons or objects by merely recording
information about selected characteristics of interest (such as A or B), while
paying no attention to possibly widely diverging other characteristics (such as
C or D) that may affect the chosen characteristics
20.________________________________
numbers that label differences in kind, as nominal data do, but that, in
addition, by their very size also order or rank observations on the basis of
importance
21.________________________________
the set of all possible observations about a specified characteristic of interest
22.________________________________
a variable that is normally described in words rather than numerically
(because it differs in kind rather than degree among elementary units)
23.________________________________
a variable that is normally expressed numerically (because it differs in degree
rather than kind among the elementary units under study)
24.________________________________
numbers that possess all the characteristics of interval data and, in addition,
have meaningful ratios because they are referenced to an absolute or natural
zero point that denotes the complete absence of the characteristic being
measured
25.________________________________
a subset of a statistical population or of the frame from which it is derived
26.________________________________
a branch of mathematics that is concerned with facilitating wise decision
making in the face of uncertainty and that, therefore, develops and utilizes
techniques for the careful collection, effective presentation, and proper
analysis of numerical information
27.________________________________
a data set containing information on one variable only
28.________________________________
characteristics possessed by elementary units
Multiple-Choice Test
Circle the letter of the one answer that you think is correct or closest to
correct.
1. Ask anyone to define the nature of statistics and, just as in the dictionary,
you are likely to hear it defined as
a. a field of study that somehow deals with the collection, presentation, and
interpretation of numerical data.
a. a controlled experiment.
b. an observational study.
c. a sample survey.
a. descriptive statistics.
b. external statistics.
c. inferential statistics.
d. internal statistics.
a. deductive reasoning.
b. inductive reasoning.
c. census taking.
d. a sample survey.
a. data sets.
b. elementary units.
c. inferential statistics.
d. variables.
6. Characteristics possessed by elementary units are called
a. data sets.
b. descriptive statistics.
c. internal data.
d. variables.
a. bivariate.
b. multivariate.
c. univariate.
a. binomial.
b. multinomial.
c. qualitative.
d. quantitative.
9. The set of all possible observations about a specified characteristic of
interest is
a. a frame.
c. an observational study.
d. a population.
a. a datum.
b. an elementary unit.
c. a sample.
11. Four types of data exist. In order of increasing sophistication, they are:
a. Adding them.
b. Subtracting them.
c. Multiplying them.
a. nominal data.
b. ordinal data.
c. interval data.
14. Dividing ratio data produces meaningful results because such data
a. getting data that already exist because others have gathered them in the
past.
a. census taking.
b. sampling.
d. controlled experiments.
19. Which of the following are not internal data from the point of view of a
business administrator?
b. The firm's employee records providing names, addresses, job titles, years
of service, salaries, social security numbers, and even numbers of sick days
used.
d. The databases held by the Bureau of the Census, the Department of Labor,
the Federal Reserve Board, and the Office of Management and Budget.
a. a controlled experiment.
b. an observational study.
c. a sample survey.
a. tables.
b. graphs.
a. deductive reasoning.
b. inductive reasoning.
c. census taking.
d. a sample survey.
b. frame.
d. population.
26. Which of the following might constitute the elementary units of a
statistical investigation?
a. bivariate.
b. multivariate.
c. univariate.
a. bivariate.
b. multivariate.
c. univariate.
d. multinomial.
29. Observations about a binomial qualitative variable
c. differ in degree rather than kind among the elementary units under study.
a. binomial or multinomial.
b. discrete or continuous.
c. binomial or continuous.
d. discrete or multinomial.
a. Business type.
b. Gender.
c. Job title.
d. Race.
a. a datum.
b. an elementary unit.
c. a sample.
a. analytical statistics.
b. deductive reasoning.
c. measurement.
39. An alphabetic list of all Fortune 500 company names, even if encoded
numerically, is best viewed as a set of
a. nominal data.
b. ordinal data.
c. interval data.
d. ratio data.
40. With respect to nominal data, which of the following makes sense?
a. Adding them.
b. Averaging them.
c. Counting them.
a. nominal data.
b. ordinal data.
c. interval data.
d. ratio data.
42. With respect to ordinal data, which of the following makes sense?
a. Adding them.
b. Multiplying them.
c. Dividing them.
43. Scales of calendar time, clock time, and temperatures provide good
examples of
a. nominal data.
b. ordinal data.
c. interval data.
d. ratio data.
44. With respect to interval data, which of the following makes sense?
a. Adding them.
b. Multiplying them.
c. Dividing them.
a. nominal data.
b. ordinal data.
c. interval data.
d. ratio data.
a. nominal data.
b. ordinal data.
c. interval data.
d. ratio data.
47. Coded army ranks (private = 1, corporal = 2, etc.) are best viewed as
a. nominal data.
b. ordinal data.
c. interval data.
d. ratio data.
48. Qualitative variables are usually described verbally. When coded, these
verbal descriptions turn into numbers that are
a. nominal data.
b. ordinal data.
d. interval data.
a. nominal data.
b. ordinal data.
d. ratio data.
50. Quantitative variables are always described numerically, either by
b. Highway numbers.
b. Highway numbers.
b. A listing of all the Years of the Dragon in the Chinese calendar (1904,
1916, 1928, 1940, . . . 1988, 2000, 2012, 2024).
b. A listing of all the Years of the Monkey in the Chinese calendar (1908,
1920, 1932, 1944, . . . 1992, 2004, 2016, 2028).
c. A listing of all the Years of the Rat in the Chinese calendar (1900, 1912,
1924, 1936, . . . 1996, 2008, 2020, 2032).
1. Here is a preview of matters you will learn much more about in Book 3.
Visit https://www.usa.gov/statistics, a site maintained by the U.S. federal
government. Click on MapStats to get a profile of your state or county. Find
information on labor force, employment, and unemployment.
2. Consider Table 1.21 on the following page, which contains selected data
about the ten largest U.S. private companies.
a. Identify the elementary units.
b. How many variables can you find in this table? Which are they?
c. Identify the variables as quantitative or qualitative.
d. Identify variables as discrete/continuous or binomial/multinomial.
e. What kind of data set does this table contain?
Table 1.21
Selected Characteristics of Largest U.S. Private Companies in 1999
e. A table contains data for all the houses currently for sale, including price,
type, lot size, number of rooms, availability of garage or swimming pool, and
age.
4. Classify the following variables, first, as qualitative or quantitative and,
second, as binomial/multinomial or discrete/continuous:
a. the weights of cows at an auction.
b. the dollar figures listed on a sheet of paper.
c. the genders of airline pilots.
d. the styles of houses (1-story, 2-story, split level, etc.).
e. the grades of meat (prime, choice, good, utility).
f. the credit limits of customers.
7. Here is a preview of matters you will learn much more about in Book 3.
Visit https://www.usa.gov/statistics, a site maintained by the U.S. federal
government. Click on Regional Statistics > Agriculture > Rankings by
State and Commodity > Crop Rankings by State to find data about your
state.
8. Here is a preview of matters you will learn much more about in Book 3.
Visit https://www.statcan.gc.ca, a site maintained by Statistics Canada. Find
employment data by industry after clicking on Canadian statistics >
Labour, employment, and unemployment.
9. Here is a preview of matters you will learn much more about in Book 3.
Visit https://www.ine.cl, a site maintained by Chile's National Institute of
Statistics. Click on Indice de Precios to find the latest data on consumer
prices.
10. Here is a preview of matters you will learn much more about in Book 4.
Visit http://www.gallup.com/home.aspx, a site maintained by the Gallup
Organization. Click Business & the Economy > Business & Industry and
find the results of the latest poll on corporations, business leaders, and
industries.
11. Here is a preview of matters you will learn much more about in Book 4.
Visit a site maintained by the Harris organization at
https://theharrispoll.com/category/theharrispoll. Click International and
find out about the nature of the company's Global Network.
Table 1.23
Best U.S. Video Rentals, October 30-November 5, 2000
14. Consider the data of Table 1.24.
Table 1.24
Best U.S. Business Software Sales (Windows and DOS), September 2000
15. In each of the following cases, determine whether the data set is
univariate, bivariate, or multivariate:
e. A table contains data about cabin airflow (feet per minute), bacteria count,
mold count, and respirable-particulates count on 11 sampled airline flights.
b. Telephone numbers.
c. Birth dates.
d. Computer models.
20. Provide illustrations of the fact that arithmetic operations with ordinal
data make no sense.
Solutions to Practice Problems
1.
a. The $102 million represents the amount of money you would earn on
average, in the long run, if you continued to make films, repeatedly found
yourself in the same situation, and always offered your films to the network
for review. For example, given the assumed probabilities involved, 60 of the
next 100 films would be rejected and saddle you with a loss of 60 times $30
million = $1,800 million. However, 40 of these films would be accepted and
bring you a profit of 40 times $300 million = $12,000 million. Altogether, the
100 films would bring in –$1,800 million + $12,000 million = $10,200
million, or the indicated $102 million per film. Why then does the no-advice
strategy counsel selling the rights at once? It does so because that approach
brings $125 million per film.
b. The $154.4 million represents the amount of money you would earn on
average, in the long run, if you continued to make films, repeatedly found
yourself in the same situation, always hired the consulting firm, and then
followed its advice. For example, given the assumed probabilities involved,
60 of the next 100 films would receive a report predicting rejection;
therefore, you would sell them to the distributor and take in $125 million
minus the $1 million fee, or a total of $7,440 million. However, 40 of these
films would receive a report predicting a contract offer and you would show
them to the network. Sadly, given the consulting firm's track record, 30
percent of these films would, nevertheless, be rejected, bringing you a loss of
12 times $31 million (your $30 million of filming expenses plus the $1
million consulting fee) = $372 million. Finally, given the consulting firm's
track record, 70 percent of these films would be accepted, allowing you to
take in 28 times $299 million (the $300 million network fee minus the $1
million consulting fee) = $8,372 million. Altogether, the 100 films would
bring in $7,440 million –$372 million + $8,372 million = $15,440 million, or
the indicated $154.4 million per film.
c. It should buy the advice. In the long run, it would earn $154.4 million
instead of $125 million per film.
d. Given average earnings of $125 million per film without the advice and of
$154.4 million per film after having spent $1 million on advice, an additional
$154.4 million – $125 million = $29.4 million per film could be extracted from
the filmmaker by a shrewd consulting firm. Added to the $1 million already
charged, this makes the maximum fee $30.4 million per film.
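The long-run averages in parts (a), (b), and (d) can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are invented here, and the counts per 100 films (60 rejected and 40 accepted without advice; 60 unfavorable reports, then 12 rejections and 28 acceptances among the 40 favorable reports) are the ones assumed in the solution text above.

```python
# All payoffs are in millions of dollars, as stated in the solution text.
SELL_NOW = 125      # distributor's offer per film
NETWORK_FEE = 300   # profit per film the network accepts
FILM_LOSS = 30      # loss per film the network rejects
ADVICE_FEE = 1      # consulting fee per film
FILMS = 100         # number of films considered in the long-run argument

# (a) Always offer the film to the network, without advice.
total_network = 40 * NETWORK_FEE - 60 * FILM_LOSS
per_film_network = total_network / FILMS

# (b) Always buy the advice and follow it.
total_advice = (
    60 * (SELL_NOW - ADVICE_FEE)       # report predicts rejection: sell the rights
    - 12 * (FILM_LOSS + ADVICE_FEE)    # favorable report, but the network rejects
    + 28 * (NETWORK_FEE - ADVICE_FEE)  # favorable report and the network accepts
)
per_film_advice = total_advice / FILMS

# (d) Largest fee the advice is worth: the extra earnings plus the fee already paid.
total_without_advice = max(SELL_NOW * FILMS, total_network)
max_fee = (total_advice - total_without_advice + ADVICE_FEE * FILMS) / FILMS

print(per_film_network, per_film_advice, max_fee)
```

Working with totals over 100 films, as the text does, keeps the intermediate figures in whole numbers and makes each term easy to compare with the prose.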
2. Answers can vary. Among the many agencies listed, you will find the
Bureau of Economic Analysis, the Bureau of Transportation Statistics, the
Environmental Protection Agency, the National Center for Education
Statistics, and the Small Business Administration.
3. Answers can vary. At the time of this writing, the latest available figure,
for November 1999, was $31.351 billion.
4. Answers can vary. At the time of this writing, the latest available figure,
for October 1999, was 20.42%.
9.
a. The 15 company names are the elementary units here.
b. There are 5 variables, noted in the headings of columns (2) to (6).
c. Quantitative: cols. (4) - (6); qualitative: cols. (2) and (3).
d. Quantitative and continuous: cols. (4) and (5); quantitative and discrete:
col. (6); qualitative and multinomial: cols. (2) and (3).
11.
13.
15.
17.
20.
22.
Qualitative: c, e, f.
Quantitative: a, b, d.
23.
Qualitative: e, f.
Quantitative: a, b, c, d.
24. Answers can vary. Here is one possibility: the recording of quiz answers
as true or false; a listing of product quality as satisfactory or defective; a
notice showing airway VOR beacons as on or off; a notice listing low-level
military flight routes as hot or cold (that is, active or not in use); a listing of
airport control towers as being active or closed; a record of Sigmets
(significant meteorological warnings issued to pilots) as being valid or
expired.
25. Answers can vary. Here is one possibility: a list of more than two options
in a computer dialog box; a list of 19 aircraft types; a list of 3 types of
businesses; a list of the 10 hottest media stocks; a list of the 5 hottest TV
shows; a list of the 10 best-selling music albums.
26. Answers can vary. Here is one possibility: the number of people who
have access to the Internet; this year's number of perfect SAT scores; the
number of times a pilot changes the transponder code during a given flight;
the number of shoppers who, when asked, recall a certain ad; the number of
wins a sports team has this season; the number of papers sold today by The
New York Times.
27. Answers can vary. Here is one possibility: the percentage change in the
assets of fastest growing mutual fund companies; the tar content per cigarette;
the height above ground shown by an aircraft's radar altimeter; an aircraft's
true airspeed; the gallons of remaining fuel indicated on an aircraft's fuel
gauge; the time an aircraft requires on the average to fly from A to B.
28.
a. nominal: cols. 2 and 3; ratio: cols. 4-6.
b. nominal: col. 3; ordinal: col. 2; ratio: col. 4.
29.
a. ordinal: col. 2; ratio: col. 3.
b. nominal: col. 2; ratio: col. 3.
30.
31.
a. nominal: col. 3; ordinal: col. 2; ratio: col. 4.
b. nominal: col. 3; ordinal: col. 2; ratio: col. 4.
c. nominal: col. 3; ordinal: col. 2; ratio: col. 4.
32. Nominal data: management style, job category, gender. Ordinal data:
performance indexes. Ratio data: all others.
33. Nominal data: geographic location, trip reasons, accommodation types,
most favorite music type. Ordinal data: refrigerator quality ratings. Ratio
data: all others.
34.
35.
36.
Nominal data: religious affiliations, brands of gasoline, list of states.
38. Nominal: b, d, g
39.
Nominal: a, d, f
Ordinal: c
Ratio: b, e
40.
Ordinal: a, d, e, f
Interval: b, g
Ratio: c
41.
a. Answers can vary. One possibility:
blue = 1; brown = 2; green = 3; red = 4; yellow = 5; white = 6.
b.
Consider adding: Although 1 + 2 = 3, does blue + brown = green?
Consider subtracting: Although 5 – 4 = 1, does yellow minus red equal blue?
Consider multiplying: 2 × 5 = 10, but there is no code 10.
Consider dividing: Although 6 ÷ 2 = 3, is white divided by brown equal to
green?
Consider ranking: Although 5 > 4, in what sense is yellow larger than red?
Consider averaging 4 and 6, which is (4 + 6) ÷ 2 = 5. Is the average of red
and white equal to yellow?
The answer is always the same: Arithmetic operations with nominal data
make no sense at all.
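The checks in part (b) can be replayed on a computer. The following is a minimal sketch, assuming the color coding of part (a); the class and variable names are hypothetical. It shows that plain integer codes are added without complaint, that storing the codes in Python's standard `enum.Enum` makes the meaningless addition fail, and that counting remains a sensible operation.

```python
from collections import Counter
from enum import Enum

# Nominal codes stored as plain integers, as in part (a).
codes = {"blue": 1, "brown": 2, "green": 3, "red": 4, "yellow": 5, "white": 6}

# The computer adds the codes without complaint; the result happens to be
# the code for green, which means nothing.
meaningless_sum = codes["blue"] + codes["brown"]


class Color(Enum):
    """The same codes, stored as a type that does not support arithmetic."""
    BLUE = 1
    BROWN = 2
    GREEN = 3
    RED = 4
    YELLOW = 5
    WHITE = 6


# A plain Enum member has no addition operator, so this raises TypeError.
try:
    Color.BLUE + Color.BROWN
    addition_allowed = True
except TypeError:
    addition_allowed = False

# Counting how often each category occurs is still meaningful.
observed = [Color.RED, Color.BLUE, Color.RED, Color.WHITE]
counts = Counter(observed)

print(meaningless_sum, addition_allowed, counts[Color.RED])
```

Choosing a non-arithmetic type for nominal codes is one possible safeguard, not the only one; the point is simply that the software will not object on its own.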
Comment: Alert! Alert! The sum of these nominal data is totally meaningless.
One should never try to sum nominal data. Yet, a calculator or computer will
do so when employed for the purpose.
42.
b. The sum equals 1,275 and the average is 25.5. It means absolutely nothing.
In what sense is the average of all states somewhere between Missouri = 25
and Montana = 26?
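The two figures in this answer are easy to reproduce. The sketch below assumes that the 50 states are coded 1 to 50 in alphabetical order, which is what Missouri = 25 and Montana = 26 suggest; the variable names are illustrative only.

```python
# Alphabetical state codes 1 through 50 (an assumption based on the answer text).
state_codes = range(1, 51)

total = sum(state_codes)            # sum of all codes
average = total / len(state_codes)  # arithmetic mean of the codes

# The average is not even a valid code, let alone a meaningful "state".
average_is_a_code = average in state_codes

print(total, average, average_is_a_code)
```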
43. Answers can vary. Here is one possibility: the grades (A to F) of all
students in a class; a list of all the products sold by Kmart; a list of all the
firms named in today's issue of the Wall Street Journal; the occupational
codes used by the U.S. Internal Revenue Service (fishing = 114110; dentistry
= 621210; religious organizations = 813000); airway numbers used by
aircraft throughout the world; bank account numbers.
44. Answers can vary. Here is one possibility: the ancient Chinese calendar
that reckoned days and years in cycles of sixty; the Julian calendar prescribed
by Julius Caesar; the Gregorian calendar now in general use in most parts of
the world (first prescribed in 1582 by Pope Gregory XIII to correct the Julian
year to the astronomical year); the Hebrew calendar which reckons the year
of creation as 3,761 B.C.; the Moslem calendar generally used in Moslem
countries and reckoning time from July 16, 622 A.D., the day following
Mohammed's flight from Mecca to Medina; the Republican calendar
instituted on October 5, 1793 by the first French republic.
45. Answers can vary. Here is one possibility: faculty ranks (Professor,
Associate Professor, Assistant Professor, Instructor); Standard and Poor's
bond ratings (AAA, AA, A, BBB, BB, B … DDD, DD, D); restaurant quality
ratings (five-star to one-star); the Beaufort wind scale (0 = calm, with a
wind speed of less than 1 mile per hour; 6 = strong breeze, with a wind speed
of 25-31 mph; 12 = hurricane, with a wind speed above 75 mph); the Richter
earthquake scale (ranging from 0 to 8.9, with each whole number representing
a tenfold increase in earthquake magnitude); the decibel scale (a measure of
sound intensity, being the logarithm to the base 10 of the ratio of two
amounts of power). In all of these cases, the differences between any two
adjacent ratings cannot be assumed to have identical meanings; therefore,
these ratings are not interval data.
Note: A quick review of logarithms can show why scales based on
logarithms cannot be interval data. Consider logarithms to the base of 10.
Write down one 10 or two 10s or three 10s, as in column (1) and put
multiplication signs between numbers. The results of this multiplication,
shown in column (2), might represent the values to be recorded, such as
earthquake intensity or noise level. A different way of showing these values
is given in column (3). The exponents shown in column (3) become the
logarithms of the original values, as in column (4). Clearly, if logarithms,
such as 1, 2, and 3, are used in a data set (as they are on the Richter and
decibel scales), equal intervals between these logarithm numbers (here always
1) in no way assure us of equal intervals between the original numbers
(which, in fact, are 90 and 900 here). Logarithms, thus, are not interval data.
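The point of the note can be verified numerically. This is a minimal sketch using the three values described above (one, two, and three 10s multiplied together); the variable names are made up for illustration.

```python
import math

# Column (2): the original values obtained by multiplying one, two, or three 10s.
values = [10, 10 * 10, 10 * 10 * 10]

# Column (4): their logarithms to the base 10.
logs = [math.log10(v) for v in values]

# Gaps between neighboring logarithms versus gaps between the original values.
log_gaps = [b - a for a, b in zip(logs, logs[1:])]
value_gaps = [b - a for a, b in zip(values, values[1:])]

print(logs)        # logarithms of 10, 100, and 1000
print(log_gaps)    # equal steps on the logarithmic scale
print(value_gaps)  # unequal steps in the original values
```

Equal steps of 1 on the logarithmic scale correspond to steps of 90 and 900 in the original values, which is why equal differences between such ratings cannot be read as equal differences in the quantity measured.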
46.
a. Nominal data.
b. Ordinal data.
47. More likely, they are ordinal data. A coding of "gold," "silver," and
"bronze" as 3, 2, and 1 does imply that, in some sense, an Olympic gold
medal winner is more important than a silver or bronze medal winner, but no
one would be prepared to say that the difference in achievement between the
recipients of gold and silver was exactly the same as that between the winners
of silver and bronze, just because 3 - 2 = 1 and 2 - 1 = 1 as well.
48. These are ordinal data. The differences between any two adjacent ratings
cannot be assumed to have identical meanings; therefore, these ratings are not
interval data.
Note: A quick review of logarithms can show why scales based on
logarithms (such as this one) cannot be interval data. Consider logarithms to
the base of 10. Write down one 10 or two 10s or three 10s, as in column (1)
and put multiplication signs between numbers. The results of this
multiplication, shown in column (2), might represent the values to be
recorded, such as tornado intensity. A different way of showing these values
is given in column (3). The exponents shown in column (3) become the
logarithms of the original values, as in column (4). Clearly, if logarithms,
such as 1, 2, and 3, are used in a data set (as they are on the Fujita scale),
equal intervals between these logarithm numbers (here always 1) in no way
assure us of equal intervals between the original numbers (which, in fact, are
90 and 900 here). Logarithms, thus, are not interval data.
True/False Test
1. T
2. T
5. T
6. T
7. F (They are called elementary units.)
8. F (It is a binomial qualitative variable.)
9. T
10. T
11. F (This is true about a multinomial qualitative variable.)
12. T
13. T
14. F (The statement describes nominal data.)
15. F (The statement describes ordinal data.)
16. T
17. F
18. T
19. T
20. T
21. F
22. T
23. T
24. F
25. T
26. F
27. T
28. F
29. F
30. F
31. T
32. F
33. F
34. T
35. T
36. F
37. F
Recognizing Key Terms
In each of the following sections identify the Key Term that is being defined.
4. controlled experiment
the collection of data about persons or objects by deliberately exposing them
to some kind of change, while leaving all else unchanged, and subsequently
recording how identical persons or objects respond to different types of
change, or how different types of persons or objects respond to identical
change
5. data set
any collection of observations about one or more characteristics of interest
possessed by one or more elementary units
6. datum
any single observation about a specified characteristic of interest possessed
by an elementary unit; the basic unit of the statistician's raw material
7. deductive reasoning
drawing inferences about an unknown part from a known whole
8. descriptive statistics
a branch of the discipline that is concerned with developing and utilizing
techniques for effectively presenting numerical information so as to highlight
patterns otherwise hidden in data sets
11. frame
a complete listing of all elementary units relevant to a statistical investigation
15. measurement
the assignment of numbers to characteristics that are being observed
21. population
the set of all possible observations about a specified characteristic of interest
26. statistics
a branch of mathematics that is concerned with facilitating wise decision
making in the face of uncertainty and that, therefore, develops and utilizes
techniques for the careful collection, effective presentation, and proper
analysis of numerical information
28. variables
characteristics possessed by elementary units
Multiple-Choice Test
1 d    2 a    3 a    4 b    5 b
6 d    7 b    8 c    9 d    10 a
11 c   12 d   13 d   14 a   15 b
16 d   17 d   18 c   19 d   20 b
21 c   22 d   23 a   24 d   25 b
26 d   27 a   28 c   29 a   30 a
31 d   32 b   33 b   34 a   35 b
36 c   37 c   38 c   39 a   40 c
41 b   42 d   43 c   44 a   45 c
46 a   47 b   48 c   49 c   50 b
51 a   52 a   53 d   54 d   55 d
56 d   57 b   58 b   59 a
Problems
1. Answers can vary. Here is what the author found when looking for
statewide California data in the fall of 2000:
2.
3.
13.
15.
Bivariate. Two pieces of information are recorded for each elementary unit:
d.
Multivariate. More than two pieces of information are recorded for each
elementary unit: e.
16.
18.
19.
Nominal: b, d
Ordinal: none
Interval: c, e
Ratio: a, f
20.
Answers can vary. Here is one possibility:
Let students rate their professors as superb, average, or pathetic. Let these
ratings be coded as 1, 2, and 3, respectively.
Addition: Even though 1 + 2 = 3, it makes no sense to say that superb plus
average equals pathetic.
Subtraction: Even though 3 – 2 = 1, it makes no sense to say that pathetic
minus average equals superb.
Multiplication: Even though 1 times 2 equals 2, it makes no sense to say that
superb times average equals average.
Division: Even though 3 divided by 2 equals 1.5, it makes no sense to say
that pathetic divided by average equals someone halfway between superb and
average.
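The contrast between ordering, which ordinal codes do support, and arithmetic, which they do not, can be sketched in code. The example below assumes the coding given above (superb = 1, average = 2, pathetic = 3); the class name `Rating` and the sample list are hypothetical. The class allows comparisons by code but deliberately defines no arithmetic.

```python
from enum import Enum


class Rating(Enum):
    """Ordinal ratings: comparable by code, but with no arithmetic defined."""
    SUPERB = 1
    AVERAGE = 2
    PATHETIC = 3

    def __lt__(self, other):
        # Ordering by code is meaningful for ordinal data.
        if isinstance(other, Rating):
            return self.value < other.value
        return NotImplemented


# Ranking works: the ratings can be sorted from code 1 to code 3.
ratings = [Rating.PATHETIC, Rating.SUPERB, Rating.AVERAGE]
ranked = sorted(ratings)

# Arithmetic does not: "superb plus average" raises TypeError.
try:
    Rating.SUPERB + Rating.AVERAGE
    addition_allowed = True
except TypeError:
    addition_allowed = False

print([r.name for r in ranked], addition_allowed)
```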
BIOGRAPHY 1.1 Adolphe Quetelet (1796-1874)
Apart from his World War II memoir, My Name Was Five, his nontechnical
writings include Caution: Snake Oil! which shows how statistical thinking
can help us expose misinformation about our health, and another series of
electronic books, Surfing a Magical Internet, which shows how people
gathered information some 150 years ago before the current internet existed.
For a complete listing, follow these links:
https://www.amazon.com/author/heinzkohler
https://www.hkstatistics.com
https://www.surfingamagicalinternet.com
https://www.amherst.edu/people/facstaff/hkohler