
DATA ANALYTICS FOR BUSINESS

Descriptive Statistics

Dr. Uka Wikarya

FAKULTAS EKONOMI DAN BISNIS


UNIVERSITAS INDONESIA

Lecture Outline
1. Introduction to statistics
a. Definition
b. Types of statistical procedures and variables
c. Population and sample
d. Levels of measurement

2. Descriptive Statistics
1. Central Tendency
2. Dispersion and Data Position
3. The Shape of Distribution

Introduction: What is Statistics?

What is Meant by Statistics?

Statistics is the science of collecting, organizing, presenting,
analyzing, and interpreting numerical data to assist in making
more effective decisions.
Why Study Statistics?
1. Numerical information is everywhere
2. Statistical techniques are used to make decisions that
affect our daily lives
3. The knowledge of statistical methods will help you
understand how decisions are made and give you a better
understanding of how they affect you.

No matter what line of work you select, you will find yourself
faced with decisions where an understanding of data
analysis is helpful.
Who Uses Statistics?

Statistical techniques are used extensively by marketing, accounting,
quality control, consumers, professional sports people, hospital
administrators, educators, politicians, physicians, etc.
The Role of Statistics in Managerial Decision Making
• Statistical literacy is necessary today to make informed decisions both at work and at home.
• It requires statistical thinking to critically assess data and the inferences drawn from it.
• Statistical thinking assists you in identifying research resulting from unethical statistical practices.
The Role of Statistics in Managerial Decision Making (cont.)
Common sources of error in survey data:
• Selection bias: exclusion of a subset of the population of interest prior to sampling.
• Non-response bias: introduced when responses are not obtained from all sample members.
• Measurement error: inaccuracy in recorded data, which can be due to survey design, interviewer impact, or a transcription error.
Types of Statistics – Descriptive Statistics and Inferential Statistics
Descriptive Statistics: methods of organizing, summarizing, and presenting data in an informative way.

EXAMPLE 1: The United States government reports the population of the United States was 179,323,000 in 1960; 203,302,000 in 1970; 226,542,000 in 1980; 248,709,000 in 1990; and 265,000,000 in 2000.

EXAMPLE 2: According to the Bureau of Labor Statistics, the average hourly earnings of production workers were $17.90 for April 2008.
Types of Statistics – Descriptive Statistics and Inferential Statistics (cont.)

Inferential Statistics: methods used to make a decision, estimate, prediction, or generalization about a population, based on a sample.

Note: In statistics the words population and sample have a broader meaning. A population or sample may consist of individuals or objects.
Population versus Sample
A population is a collection of all possible individuals, objects, or
measurements of interest.

A sample is a portion, or part, of the population of interest.

Why take a sample instead of studying every member of the population?


Summary of Types of Variables
Four Levels of Measurement
Nominal level - data that is classified into categories and cannot be arranged in any particular order.
EXAMPLES: eye color, gender, religious affiliation.

Ordinal level - data arranged in some order, but the differences between data values cannot be determined or are meaningless.
EXAMPLE: During a taste test of 4 soft drinks, Mellow Yellow was ranked number 1, Sprite number 2, Seven-up number 3, and Orange Crush number 4.

Interval level - similar to the ordinal level, with the additional property that meaningful amounts of differences between data values can be determined. There is no natural zero point.
EXAMPLE: Temperature on the Fahrenheit scale.

Ratio level - the interval level with an inherent zero starting point. Differences and ratios are meaningful for this level of measurement.
EXAMPLES: Monthly income of surgeons, or distance traveled by manufacturer’s representatives per month.
Why Know the Level of Measurement of the Data?
• The level of measurement of the data dictates the calculations that can be done to summarize and present the data.
• It also determines the statistical tests that should be performed on the data.
Summary of the Characteristics for Levels of Measurement
Data and Statistic Presentation

Frequency Distribution
FREQUENCY DISTRIBUTION A grouping of data into mutually exclusive classes showing the number of observations in each class.

Class interval: Obtained by subtracting the lower limit of a class from the lower limit of the next class.

Class frequency: The number of observations in each class.

Class midpoint: A point that divides a class into two equal parts. This is the average of the upper and lower class limits.
EXAMPLE – Creating a Frequency Distribution Table

Ms. Kathryn Ball of AutoUSA wants to develop tables, charts, and graphs
to show the typical selling price on various dealer lots. The table on
the right reports only the price of the 80 vehicles sold last month at
Whitner Autoplex.
Constructing a Frequency Table - Example
• Step 1: Decide on the number of classes.
  A useful recipe to determine the number of classes (k) is the “2 to the k rule”: select the smallest k such that $2^k > n$, where n is the number of observations.
• Step 2: Determine the class interval or width.
  The formula is $i \ge (H - L)/k$, where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes.
• Step 3: Set the individual class limits.
Constructing a Frequency Table (cont.)
• Step 4: Tally the vehicle selling prices into the classes.
• Step 5: Count the number of items in each class.
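
The five steps above can be sketched in Python. This is only an illustration under stated assumptions: the price list is hypothetical (the Whitner Autoplex table is not reproduced here), the function names are made up for this sketch, and classes are treated as half-open intervals so that each value falls in exactly one class.

```python
def number_of_classes(n):
    """Step 1: smallest k such that 2**k > n (the "2 to the k" rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def frequency_table(values, lower_limit, width):
    """Steps 3-5: set class limits, tally, and count.

    Classes are half-open, [lower, upper), so they are mutually exclusive.
    Returns a list of (lower, upper, frequency) tuples.
    """
    k = number_of_classes(len(values))
    highest = max(values)
    table = []
    lower = lower_limit
    # Build at least k classes and continue until the highest value is covered.
    while len(table) < k or lower <= highest:
        upper = lower + width
        frequency = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, frequency))
        lower = upper
    return table


# Hypothetical selling prices in $ thousands (not the data from the slide).
prices = [15, 17, 18, 20, 21, 23, 24, 27, 29, 33]

k = number_of_classes(len(prices))            # 2**4 = 16 > 10, so k = 4
min_width = (max(prices) - min(prices)) / k   # Step 2: i >= (H - L) / k
table = frequency_table(prices, lower_limit=15, width=5)  # width rounded up to 5

for lower, upper, frequency in table:
    print(f"{lower:>3} up to {upper:<3} {frequency}")
```

For the 80 vehicles in the example, `number_of_classes(80)` gives 7, because 2^6 = 64 is not greater than 80 while 2^7 = 128 is.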
Relative Frequency Distribution
To convert a frequency distribution to a relative frequency distribution,
each of the class frequencies is divided by the total number of
observations.
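
As a small illustration of this conversion, the sketch below divides each class frequency by the total. The class frequencies are hypothetical and the function name is chosen only for this example.

```python
def relative_frequencies(class_frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(class_frequencies)
    return [frequency / total for frequency in class_frequencies]


# Hypothetical class frequencies for four classes (10 observations in total).
frequencies = [3, 4, 2, 1]
relative = relative_frequencies(frequencies)

for frequency, share in zip(frequencies, relative):
    print(f"{frequency:>3}  {share:.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped.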
Graphic Presentation of a Frequency Distribution
The three commonly used graphic forms are:
• Histograms
• Frequency polygons
• Cumulative frequency distributions
Histogram
HISTOGRAM A graph in which the classes are marked on the
horizontal axis and the class frequencies on the vertical axis. The
class frequencies are represented by the heights of the bars and the
bars are drawn adjacent to each other.
Describing Quantitative Discrete Data
• Frequency Distribution and Bar Graph

Household members   Weighted freq.   Percent   Cum. percent
 1                        724,283      7.69         7.69
 2                      1,551,283     16.47        24.15
 3                      2,384,693     25.31        49.47
 4                      2,479,664     26.32        75.79
 5                      1,313,154     13.94        89.72
 6                        612,157      6.50        96.22
 7                        245,706      2.61        98.83
 8                         54,279      0.58        99.41
 9                         37,852      0.40        99.81
10                         10,426      0.11        99.92
11                            546      0.01        99.92
12                          4,958      0.05        99.98
13                          2,235      0.02       100
Total                   9,421,235    100

[Bar graph: number of households (left axis, 0 to 3,000,000) and percent (right axis, 0 to 30) by number of household members (1 to 13)]

Histogram of Quantitative Data
• Frequency of Grouped Quantitative Data (gold prices)
Class              Freq
1000.2 - 1025.2 14
1025.2 - 1050.2 8
1050.2 - 1075.2 43
1075.2 - 1100.2 62
1100.2 - 1125.2 75
1125.2 - 1150.2 76
1150.2 - 1175.2 86
1175.2 - 1200.2 151
1200.2 - 1225.2 217
1225.2 - 1250.2 233
1250.2 - 1275.2 197
1275.2 - 1300.2 272
1300.2 - 1325.2 204
1325.2 - 1350.2 173
1350.2 - 1375.2 73
1375.2 - 1400.2 63
1400.2 - 1425.2 40
1425.2 - 1450.2 20
1450.2 - 1475.2 18

Class interval: 25


Histogram and Line Graphs
• Descriptive statistics describe the collected data
min 1,000.20
p1 1,046.95
p5 1,092.90
p10 1,125.26
p25 1,198.85
mean 1,250.87
p50 1,255.56
p75 1,312.63
p90 1,353.15
p95 1,389.02
p99 1,434.40
sd 87.29
max 1,474.91
kurtosis 2.9018
skewness -0.2520

The data are distributed almost symmetrically, although slightly skewed to the left.

The data are close to a normal distribution; the kurtosis is close to 3.
Histogram
• Pareto Diagram

Household members        Freq    Percent
3-4                 4,864,357      51.63
1-2                 2,275,566      24.15
5-6                 1,925,311      20.44
7-9                   337,837       3.59
>=10                   18,165       0.19

[Pareto diagram: number of households (HH, left axis, 0 to 6,000,000) and percent (right axis, 0 to 60) by number of people in HH, ordered 3-4, 1-2, 5-6, 7-9, >=10]

Frequency Polygon
• A frequency polygon also shows the shape of a distribution and is similar to a histogram.
• It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies.
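
The points of a frequency polygon can be computed as in the minimal sketch below. It uses the first five classes of the gold-price frequency table shown earlier in this deck; the function name is illustrative, and plotting is left out so the example has no dependencies.

```python
def polygon_points(classes):
    """Return (class midpoint, class frequency) pairs.

    Each class is (lower limit, upper limit, frequency); the midpoint is the
    average of the lower and upper class limits.
    """
    return [((lower + upper) / 2, frequency) for lower, upper, frequency in classes]


# First five classes of the gold-price table (class interval 25).
gold_classes = [
    (1000.2, 1025.2, 14),
    (1025.2, 1050.2, 8),
    (1050.2, 1075.2, 43),
    (1075.2, 1100.2, 62),
    (1100.2, 1125.2, 75),
]

points = polygon_points(gold_classes)
for midpoint, frequency in points:
    print(f"{midpoint:.1f}  {frequency}")
```

Connecting these points with line segments, in class order, gives the polygon.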
Cumulative Frequency Distribution
Ogive diagram
Pie Charts
PIE CHART A chart that shows the proportion or percent
that each class represents of the total number of
frequencies.
Pie Charts (cont.)
• Describing Qualitative Data

Household members        Freq    Percent
1-2                 2,275,566      24.15
3-4                 4,864,357      51.63
5-6                 1,925,311      20.44
7-9                   337,837       3.59
>=10                   18,165       0.19
Total               9,421,237     100.00

[Pie chart: share of households by number of members: 1-2 24%, 3-4 52%, 5-6 20%, 7-9 4%, >=10 0%]


Measurement of Central Tendency
(Mean, Median, and Mode)

Parameter Versus Statistic
PARAMETER A measurable characteristic of a population.

STATISTIC A measurable characteristic of a sample.


Arithmetic Mean or Mean
Characteristics of the Mean
• The arithmetic mean is the most widely used measure of location.
• Requires at least the interval scale.
• Major characteristics:
  - All values are used.
  - It is unique.
  - The sum of the deviations from the mean is 0.
  - It is calculated by summing the values and dividing by the number of values.
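Population Mean
For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:

$$\mu = \frac{\sum X}{N}$$

where $\mu$ is the population mean, $X$ is each population value, and $N$ is the number of values in the population.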
Properties of the Arithmetic Mean
1. Every set of interval-level and ratio-level data
has a mean.
2. All the values are included in computing the
mean.
3. The mean is unique.
4. The sum of the deviations of each value from the
mean is zero.
Sample Mean
• For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

$$\bar{X} = \frac{\sum X}{n}$$
Weighted Mean
• The weighted mean of a set of numbers $X_1, X_2, \dots, X_n$, with corresponding weights $w_1, w_2, \dots, w_n$, is computed from the following formula:

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$
EXAMPLE – Weighted Mean
The Carter Construction Company pays its hourly
employees $16.50, $19.00, or $25.00 per hour.
There are 26 hourly employees, 14 of whom are paid
at the $16.50 rate, 10 at the $19.00 rate, and 2 at the
$25.00 rate. What is the mean hourly rate paid to the
26 employees?
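
A minimal sketch of this calculation, using the rates and employee counts from the example as values and weights. The function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


rates = [16.50, 19.00, 25.00]   # hourly rates in dollars
employees = [14, 10, 2]         # number of employees paid each rate (the weights)

mean_rate = weighted_mean(rates, employees)
print(f"Weighted mean hourly rate: ${mean_rate:.3f}")
```

The numerator is 14(16.50) + 10(19.00) + 2(25.00) = 471, so the weighted mean is 471/26, or about $18.12 per hour. The unweighted average of the three rates would be $20.17, which overstates what a typical employee is paid.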
The Geometric Mean
• Useful in finding the average change of percentages, ratios, indexes, or growth rates over time.
• It has a wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the GDP, which compound or build on each other.
• The geometric mean will always be less than or equal to the arithmetic mean.
• The formula for the geometric mean is written:

$$GM = \sqrt[n]{(X_1)(X_2) \cdots (X_n)}$$

EXAMPLE:
Suppose you receive a 5 percent increase in salary this year and a 15 percent increase next year. The average annual percent increase is 9.886, not 10.0. Why is this so? We begin by calculating the geometric mean.

$$GM = \sqrt{(1.05)(1.15)} = 1.09886$$
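
The salary example can be checked with a short sketch. The function name is illustrative; `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(factors):
    """n-th root of the product of n growth factors."""
    return math.prod(factors) ** (1 / len(factors))


# A 5 percent raise followed by a 15 percent raise, written as growth factors.
growth_factors = [1.05, 1.15]

gm = geometric_mean(growth_factors)
arithmetic = sum(growth_factors) / len(growth_factors)

print(f"Geometric mean:  {gm:.5f}")
print(f"Arithmetic mean: {arithmetic:.5f}")
```

Because the raises compound, two years at the geometric-mean rate reproduce the actual two-year growth of 1.05 × 1.15 = 1.2075, whereas two years at 10 percent would give 1.21.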


The Geometric Mean – Finding an Average Percent Change Over Time
EXAMPLE
During the decade of the 1990s, and into the 2000s, Las Vegas, Nevada, was the fastest-growing city in the United States. The population increased from 258,295 in 1990 to 552,539 in 2007. This is an increase of 294,244 people, or a 113.9 percent increase over the 17-year period.
What is the average annual increase?

$$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1 = \sqrt[17]{\frac{552{,}539}{258{,}295}} - 1 = 1.0457 - 1 = 0.0457$$

The average annual increase is about 4.57 percent.
The Median
MEDIAN The midpoint of the values after they have been ordered from the
smallest to the largest, or the largest to the smallest.

PROPERTIES OF THE MEDIAN


1. There is a unique median for each data set.
2. It is not affected by extremely large or small values and is
therefore a valuable measure of central tendency when such
values occur.
3. It can be computed for ratio-level, interval-level, and ordinal-
level data.
4. It can be computed for an open-ended frequency distribution if
the median does not lie in an open-ended class.
EXAMPLES - Median
The ages for a sample of five college students are:
21, 25, 19, 20, 22
Arranging the data in ascending order gives:
19, 20, 21, 22, 25.
Thus the median is 21.

The heights of four basketball players, in inches, are:
76, 73, 80, 75
Arranging the data in ascending order gives:
73, 75, 76, 80.
Thus the median is 75.5.
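
Both examples can be reproduced with a minimal sketch. The hand-written `median` function below is for illustration only; Python's standard `statistics.median` gives the same results.

```python
import statistics


def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


ages = [21, 25, 19, 20, 22]     # odd number of observations
heights = [76, 73, 80, 75]      # even number of observations

print(median(ages), statistics.median(ages))
print(median(heights), statistics.median(heights))
```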


The Mode
MODE The value of the observation that appears most frequently.
Example - Mode
The Relative Positions of the Mean, Median and the Mode
Numerical Measures of Variability or Dispersion

Variance and Standard Deviation
VARIANCE The arithmetic mean of the squared deviations
from the mean.

STANDARD DEVIATION The square root of the variance.

• The variance and standard deviation are nonnegative and are zero only if all observations are the same.
• For populations whose values are near the mean, the variance and standard deviation will be small.
• For populations whose values are dispersed from the mean, the population variance and standard deviation will be large.
• The variance overcomes the weakness of the range by using all the values in the population.
Variance – Formula and Computation

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Steps in Computing the Variance.

Step 1: Find the mean.
Step 2: Find the difference between each observation and the mean, and square that difference.
Step 3: Sum all the squared differences found in Step 2.
Step 4: Divide the sum of the squared differences by the number of items in the population.
EXAMPLE – Variance and Standard Deviation
The number of traffic citations issued during the last twelve months in
Beaufort County, South Carolina, is reported below:

What is the population variance?

Step 1: Find the mean.

$$\mu = \frac{\sum X}{N} = \frac{19 + 17 + \dots + 34 + 10}{12} = \frac{348}{12} = 29$$

Step 2: Find the difference between each observation and the mean, and square that difference.
Step 3: Sum all the squared differences found in Step 2.
Step 4: Divide the sum of the squared differences by the number of items in the population.
EXAMPLE – Variance and Standard Deviation (cont.)

Step 2: Find the difference between each observation and the mean, and square that difference.

Step 3: Sum all the squared differences found in Step 2.

Step 4: Divide the sum of the squared differences by the number of items in the population.

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1{,}488}{12} = 124$$
Sample Variance

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Where:
$s^2$ is the sample variance
$X$ is the value of each observation in the sample
$\bar{X}$ is the mean of the sample
$n$ is the number of observations in the sample
EXAMPLE – Sample Variance
The hourly wages for a sample of part-time employees at Home Depot are:
$12, $20, $16, $18, and $19. What is the sample variance?
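
A minimal sketch of the computation for these five wages. The function names are illustrative; the only difference between the two functions is the divisor, n - 1 for a sample and N for a population. The standard library's `statistics.variance` is used as a cross-check.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


wages = [12, 20, 16, 18, 19]   # hourly wages in dollars, from the example

s2 = sample_variance(wages)
s = s2 ** 0.5

print(f"Sample variance: {s2}")
print(f"Sample standard deviation: {s:.4f}")
print(f"Variance if the five wages were the whole population: {population_variance(wages)}")
```

The mean is 85/5 = 17, the squared deviations are 25, 9, 1, 1, and 4 (sum 40), and 40/(5 - 1) = 10, so the sample standard deviation is about $3.16.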
Sample Standard Deviation

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Where:
$s$ is the sample standard deviation
$X$ is the value of each observation in the sample
$\bar{X}$ is the mean of the sample
$n$ is the number of observations in the sample
Interpreting the Standard Deviation
• You have purchased compact fluorescent light bulbs for your home. Average life length is 500 hours, standard deviation is 24, and the frequency distribution for the life length is mound shaped. One of your bulbs burns out at 450 hours. Would you send the bulb back for a refund?

Interval   Range        % of observations included   % of observations excluded
± 1s       476 - 524    Approximately 68%            Approximately 32%
± 2s       452 - 548    Approximately 95%            Approximately 5%
± 3s       428 - 572    Approximately 99.7%          Approximately 0.3%
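
The ranges in the table follow directly from the mean and standard deviation. The sketch below recomputes them for the light-bulb example; the function name is illustrative, and the coverage percentages are the approximate values for mound-shaped distributions quoted in the table.

```python
def empirical_intervals(mean, sd):
    """Return {k: (lower, upper, approx. % included)} for k = 1, 2, 3 standard deviations."""
    approximate_coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {
        k: (mean - k * sd, mean + k * sd, percent)
        for k, percent in approximate_coverage.items()
    }


mean_life, sd_life = 500, 24
intervals = empirical_intervals(mean_life, sd_life)

for k, (lower, upper, percent) in intervals.items():
    print(f"±{k}s: {lower} - {upper}  approximately {percent}%")

# How many standard deviations below the mean is a bulb that fails at 450 hours?
z = (450 - mean_life) / sd_life
print(f"450 hours is {z:.2f} standard deviations from the mean")
```

A bulb that burns out at 450 hours is about 2.08 standard deviations below the mean, which places it just outside the ± 2s range of 452 to 548 hours.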
The Empirical Rule: Interpreting the Standard Deviation
Numerical Measures of Relative Standing
• Percentile rankings make use of the pth percentile.
• The median is an example of a percentile.
• The median is the 50th percentile: 50% of observations lie above it, and 50% lie below it.
• For any p, the pth percentile has p% of the measures lying below it, and (100-p)% above it.
• Location/position of the pth percentile in the ordered data:

$$L_p = (n + 1)\frac{p}{100}$$

Then find the pth percentile: read the value at that position, or estimate it from the two adjacent values when the position is not a whole number.
Percentiles – Example (cont.)
Step 1: Organize the data from lowest to largest value
$1,460 $1,471 $1,637 $1,721
$1,758 $1,787 $1,940 $2,038
$2,047 $2,054 $2,097 $2,205
$2,287 $2,311 $2,406

Step 2: Compute the first and third quartiles. Locate $L_{25}$ and $L_{75}$ using

$$L_{25} = (15 + 1)\frac{25}{100} = 4 \qquad L_{75} = (15 + 1)\frac{75}{100} = 12$$

Therefore, the first and third quartiles are located at the 4th and 12th positions, respectively. The values at those positions are $1,721 (first quartile) and $2,205 (third quartile).
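
The location formula can be sketched as follows, using the 15 ordered values from Step 1. The function name is illustrative, and the linear interpolation for non-integer positions is an assumption of this sketch (one common convention); statistical packages may use slightly different rules.

```python
def percentile(sorted_values, p):
    """Value at location L_p = (n + 1) * p / 100 in data sorted from low to high.

    Non-integer locations are handled by linear interpolation between the
    two neighbouring values (an assumed convention for this sketch).
    """
    n = len(sorted_values)
    location = (n + 1) * p / 100
    lower_position = int(location)          # whole part of the location (1-based)
    fraction = location - lower_position    # fractional part
    if lower_position < 1:
        return sorted_values[0]
    if lower_position >= n:
        return sorted_values[-1]
    lower = sorted_values[lower_position - 1]
    upper = sorted_values[lower_position]
    return lower + fraction * (upper - lower)


data = [1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038,
        2047, 2054, 2097, 2205, 2287, 2311, 2406]

q1 = percentile(data, 25)   # location 4
q2 = percentile(data, 50)   # location 8 (the median)
q3 = percentile(data, 75)   # location 12

print(q1, q2, q3)
```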
Skewness

Skewness
• Central location of a set of data is measured by the mean, median, and mode.
• Dispersion is measured by the range and the standard deviation.
• Another characteristic of a set of data is the shape.
• There are four shapes commonly observed:
  - symmetric,
  - positively skewed,
  - negatively skewed,
  - bimodal.
• The shape of the data is measured by skewness.


Skewness - Formulas for Computing
Pearson’s coefficient of skewness:

$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$

Properties of skewness:
• The coefficient of skewness can range from -3 up to 3.
• A value near -3 indicates considerable negative skewness.
• A value such as 1.63 indicates moderate positive skewness.
• A value of 0, which will occur when the mean and median are equal, indicates that the distribution is symmetrical and that there is no skewness present.
Commonly Observed Shapes
Skewness – An Example
• Following are the earnings per share for a sample of 15 software companies for the year 2007. The earnings per share are arranged from smallest to largest.
• Compute the mean, median, and standard deviation. Find the coefficient of skewness using Pearson’s estimate.
• What is your conclusion regarding the shape of the distribution?
Skewness – An Example Using Pearson’s Coefficient

Step 1: Compute the mean

$$\bar{X} = \frac{\sum X}{n} = \frac{\$74.26}{15} = \$4.95$$

Step 2: Compute the standard deviation

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^2 + \dots + (\$16.40 - \$4.95)^2}{15 - 1}} = \$5.22$$

Step 3: Find the median

The middle value in the set of data, arranged from smallest to largest, is $3.18.

Step 4: Compute the skewness

$$sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(\$4.95 - \$3.18)}{\$5.22} = 1.017$$
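
The final step can be checked with a minimal sketch that plugs in the summary values from Steps 1 to 3 (mean $4.95, median $3.18, standard deviation $5.22). The function name is illustrative, and the 15 individual earnings-per-share values are not needed for this step.

```python
def pearson_skewness(mean, median, standard_deviation):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean - median) / standard_deviation


sk = pearson_skewness(mean=4.95, median=3.18, standard_deviation=5.22)
print(f"sk = {sk:.3f}")
```

The coefficient is positive because the mean exceeds the median, which indicates moderate positive skewness in the earnings per share.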
Thank You