Data Analytics For Business: Descriptive Statistics
Descriptive Statistics
Lecture Outline
1. Introduction to statistics
a. Definition
b. Types of statistical procedures and variables
c. Population and sample
d. Types of measurement
2. Descriptive statistics
a. Central tendency
b. Dispersion and data position
c. The shape of the distribution
Introduction: What is Statistics?
What is Meant by Statistics?
No matter what line of work you select, you will find yourself
faced with decisions where an understanding of data
analysis is helpful.
Who Uses Statistics?
Frequency Distribution
FREQUENCY DISTRIBUTION A grouping of data into
mutually exclusive classes showing the number of
observations in each class.
Class midpoint: A point that divides a class into two equal parts.
This is the average of the upper and lower class limits.
EXAMPLE – Creating a Frequency Distribution Table
Households (rumah tangga) by household size:

Size    Frequency    Percent    Cumulative percent
3       2,384,693    25.31      49.47
4       2,479,664    26.32      75.79
5       1,313,154    13.94      89.72
6         612,157     6.50      96.22
7         245,706     2.61      98.83
8          54,279     0.58      99.41
9          37,852     0.40      99.81
10         10,426     0.11      99.92
11            546     0.01      99.92
12          4,958     0.05      99.98
13          2,235     0.02      100

Households grouped into classes of household size (HH):

Class   Frequency    Percent
1-2     2,275,566    24.15
5-6     1,925,311    20.44

[Figure: bar charts of frequency (0 to 3,000,000) and percent (0.00 to 30.00) by household size, 1 to 13]
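The percent and cumulative percent columns can be recomputed from the frequencies alone. The following is a minimal sketch, not the lecturer's own procedure. It assumes the counts shown above; sizes 1 and 2 enter as the combined class "1-2" because only their combined count appears in the text. The names `counts` and `frequency_table` are illustrative.

```python
# Household counts by size, copied from the table above.
# Assumption: sizes 1 and 2 are entered as the combined class "1-2",
# since only their combined count is available.
counts = {
    "1-2": 2_275_566,
    "3": 2_384_693,
    "4": 2_479_664,
    "5": 1_313_154,
    "6": 612_157,
    "7": 245_706,
    "8": 54_279,
    "9": 37_852,
    "10": 10_426,
    "11": 546,
    "12": 4_958,
    "13": 2_235,
}


def frequency_table(counts):
    """Return rows of (label, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, freq in counts.items():
        running += freq  # cumulative frequency up to and including this class
        rows.append((label, freq, 100 * freq / total, 100 * running / total))
    return rows


table = frequency_table(counts)
for label, freq, pct, cum in table:
    print(f"{label:>4} {freq:>10,} {pct:6.2f} {cum:7.2f}")
```

Plotting the cumulative percent column against the class limits gives the ogive.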
FREQUENCY POLYGON A graph that consists of line
segments connecting the points formed by the
intersections of the class midpoints and the
class frequencies.
Cumulative Frequency Distribution
Ogive diagram
Pie Charts
PIE CHART A chart that shows the proportion or percent
that each class represents of the total number of
frequencies.
Describing Qualitative Data
[Pie chart: households by number of household members (anggota rumah tangga). Slices: 1-2: 24%; 5-6: 20%; 7-9: 4%; >=10: 0%. Total: 9,421,237 households (100.00%).]
Parameter Versus Statistics
PARAMETER A measurable characteristic of a population.
measure of location.
Requires the interval scale.
Major characteristics:
EXAMPLE:
Suppose you receive a 5 percent increase in salary this year and a 15 percent
increase next year. The average annual percent increase is 9.886, not 10.0. Why is
this so? We begin by calculating the geometric mean.
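The figure of 9.886 can be checked numerically. This is a minimal sketch under the usual assumption that each percent increase is first converted to a growth factor (1.05 and 1.15) before taking the geometric mean. The function name `geometric_mean_rate` is illustrative.

```python
import math


def geometric_mean_rate(percent_increases):
    """Average percent increase per period, via the geometric mean of growth factors."""
    growth_factors = [1 + p / 100 for p in percent_increases]
    gm = math.prod(growth_factors) ** (1 / len(growth_factors))
    return (gm - 1) * 100


rate = geometric_mean_rate([5, 15])
print(round(rate, 3))  # 9.886
```

The arithmetic mean of 5 and 15 is 10.0, which overstates the growth because the second increase applies to an already larger salary.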
Variance and Standard Deviation
VARIANCE The arithmetic mean of the squared deviations
from the mean.
The variance and standard deviation are nonnegative and are zero
only if all observations are the same.
For populations whose values are near the mean, the variance and
standard deviation will be small.
For populations whose values are dispersed from the mean, the
population variance and standard deviation will be large.
The variance overcomes the weakness of the range by using all the
values in the population.
Variance – Formula and Computation
Step 1: Find the mean.
Step 2: Find the difference between each observation and the mean,
and square that difference.
Step 3: Sum all the squared differences found in Step 2.
Step 4: Divide the sum of the squared differences by the number of
items in the population.
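The steps above can be sketched in code as follows. The input values are hypothetical, chosen only to keep the arithmetic easy to follow, and `population_variance` is an illustrative name.

```python
def population_variance(values):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n                             # find the mean first
    squared_diffs = [(x - mean) ** 2 for x in values]  # Step 2: squared differences
    total = sum(squared_diffs)                         # Step 3: sum them
    return total / n                                   # Step 4: divide by N


values = [2, 4, 6, 8]  # hypothetical population, for illustration only
variance = population_variance(values)
std_dev = variance ** 0.5  # standard deviation is the square root of the variance

print(variance, round(std_dev, 3))
```

For these values the mean is 5, the squared differences are 9, 1, 1, and 9, and the variance is 20 / 4 = 5.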
EXAMPLE – Variance and Standard Deviation
The number of traffic citations issued during the last twelve months in
Beaufort County, South Carolina, is reported below:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1{,}488}{12} = 124 \]
Sample Variance
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]
Where:
\( s^2 \) is the sample variance
\( X \) is the value of each observation in the sample
\( \bar{X} \) is the mean of the sample
\( n \) is the number of observations in the sample
EXAMPLE – Sample Variance
The hourly wages for a sample of part-time employees at Home Depot are:
$12, $20, $16, $18, and $19. What is the sample variance?
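One way to work this example is sketched below. It assumes the usual sample formula, which divides the sum of squared deviations by n - 1, and cross-checks the result against `statistics.variance` from the Python standard library. The name `sample_variance` is illustrative.

```python
import statistics


def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


wages = [12, 20, 16, 18, 19]  # hourly wages in dollars, from the example
s2 = sample_variance(wages)

print(s2)                          # 10.0
print(statistics.variance(wages))  # standard-library cross-check
```

The sample mean is $17, the squared deviations are 25, 9, 1, 1, and 4, and their sum of 40 divided by 5 - 1 gives a sample variance of 10 (in dollars squared).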
Sample Standard Deviation
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]
Where:
\( s^2 \) is the sample variance
\( X \) is the value of each observation in the sample
\( \bar{X} \) is the mean of the sample
\( n \) is the number of observations in the sample
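As a short worked illustration, the Home Depot wages from the sample variance example ($12, $20, $16, $18, $19) have a sample mean of $17 and squared deviations summing to 40. Assuming the n - 1 divisor, the sample standard deviation is:

```latex
\[
s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}
  = \sqrt{\frac{40}{5 - 1}}
  = \sqrt{10}
  \approx 3.16
\]
```

Unlike the variance, this result is in the original units, so it reads as about $3.16 per hour.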
Interpreting the Standard Deviation
You have purchased compact fluorescent light bulbs for your home.
Average life length is 500 hours, standard deviation is 24, and
frequency distribution for the life length is mound shaped. One of your
bulbs burns out at 450 hours. Would you send the bulb back for a
refund?
Interval   Range         % of observations included   % of observations excluded
±1s        476 to 524    Approximately 68%            Approximately 32%
±2s        452 to 548    Approximately 95%            Approximately 5%
±3s        428 to 572    Approximately 99.7%          Approximately 0.3%
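The ranges in this table, and the position of the 450-hour bulb, can be reproduced with a few lines of code. This is a sketch that assumes the mound-shaped distribution stated in the example; `interval` and `z` are illustrative names.

```python
mean_life = 500  # average life length in hours, from the example
std_dev = 24     # standard deviation in hours, from the example


def interval(k):
    """Range covering k standard deviations on either side of the mean."""
    return (mean_life - k * std_dev, mean_life + k * std_dev)


for k, share in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = interval(k)
    print(f"±{k}s: {low} to {high} (approximately {share})")

# How many standard deviations below the mean is the 450-hour bulb?
z = (450 - mean_life) / std_dev
print(round(z, 2))
```

A bulb that fails at 450 hours sits just over two standard deviations below the mean, outside the ±2s range that holds approximately 95% of bulbs, so such a short life is unusual under these assumptions.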
The Empirical Rule: Interpreting the Standard Deviation
Numerical Measures of Relative Standing
Percentile rankings make use of the pth percentile.
The median is an example of a percentile.
The median is the 50th percentile: 50% of observations lie
above it, and 50% lie below it.
For any p, the pth percentile has p% of the measures lying
below it, and (100-p)% above it.
Location/position of the pth percentile:
\[ L_p = (n + 1)\frac{p}{100} \]
Then find the pth percentile in the data: read the value at that position directly,
or estimate it from the neighboring values when the position is not a whole number.
Percentiles – Example (cont.)
Step 1: Organize the data from smallest to largest value
$1,460 $1,471 $1,637 $1,721
$1,758 $1,787 $1,940 $2,038
$2,047 $2,054 $2,097 $2,205
$2,287 $2,311 $2,406
Step 2: Compute the first and third quartiles. Locate L25 and L75 using:
\[ L_{25} = (15 + 1)\frac{25}{100} = 4 \qquad L_{75} = (15 + 1)\frac{75}{100} = 12 \]
Therefore, the first and third quartiles are located at the 4th and 12th
positions, respectively:
\[ L_{25} = \$1{,}721 \qquad L_{75} = \$2{,}205 \]
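The location calculation in this example can be sketched in code. The position formula (n + 1) × p / 100 follows the example. The linear interpolation used for non-whole positions is an assumption (a common textbook convention), and the function names are illustrative.

```python
def percentile_location(n, p):
    """1-based position of the pth percentile among n sorted values."""
    return (n + 1) * p / 100


def percentile_value(sorted_values, p):
    """Value at the pth percentile position.

    Assumption: when the position is not a whole number, interpolate
    linearly between the two neighboring values.
    """
    loc = percentile_location(len(sorted_values), p)
    lower = int(loc)     # whole part of the position
    frac = loc - lower   # fractional part of the position
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + frac * (above - below)


# Data from the example, already sorted from smallest to largest (dollars).
data = [1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038,
        2047, 2054, 2097, 2205, 2287, 2311, 2406]

print(percentile_location(len(data), 25), percentile_value(data, 25))
print(percentile_location(len(data), 75), percentile_value(data, 75))
```

With 15 observations, the positions come out as whole numbers (4 and 12), so no interpolation is needed and the quartiles are read directly from the sorted list.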
Skewness
To measure the central location of a set of data: the mean,
median, and mode
To measure data dispersion: range and the standard
deviation
Another characteristic of a set of data is the shape.
There are four shapes commonly observed:
symmetric,
positively skewed,
negatively skewed,
bimodal.
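\[ \bar{X} = \frac{\sum X}{n} = \frac{\$74.26}{15} = \$4.95 \]
\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^2 + \cdots + (\$16.40 - \$4.95)^2}{15 - 1}} = \$5.22 \]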