Module 8 Measures of Variation


Statistics

MODULE 8: Measures of Variation

LEARNING OUTCOMES

At the end of the module, you are expected to exhibit the following competencies:
1. Recognise the concepts of measures of dispersion.
2. Describe the strengths and limitations of measures of dispersion.
3. Calculate measures of dispersion.

IMPORTANT CONCEPTS

Introduction

Stocks are shares of ownership in a company. When people buy stocks, they become part owners of the
company and share in both its profits and its losses.

The past performance of a particular stock may be a useful guide to what may be expected of its
performance in the foreseeable future. This is, of course, a very big assumption, but we have to make it
anyway.

The following data represent the rates of return for two stocks, which we will call Stock A and Stock B.

Year    Stock A    Stock B
2005    0.081      0.214
2006    0.231      0.193
2007    0.214      0.132
2008    0.214      0.073
2009    0.181      0.066
2010    0.241      0.081
2011    0.193      0.181
2012    0.133      0.230
2013    0.071      0.214
2014    0.066      0.241

The rate of return is defined as the increase in value of the portfolio (including any dividends or other
distributions) during the year divided by its value at the beginning of the year. For instance, if the parents of
Juana dela Cruz invest 50,000 pesos in a stock at the beginning of the year, and the value of the stock goes up to
60,000 pesos, an increase in value of 10,000 pesos, then the rate of return is 10,000/50,000 = 0.20.

The rate of return may be positive or negative. It represents the fraction by which your wealth would have
changed had it been invested in that particular combination of securities.

Now, let us compute some of the measures of location that we learned in previous lessons to describe the data given
above.
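
The computation referred to above can be sketched in Python with the standard `statistics` module. The variable names `stock_a` and `stock_b` and the helper `location_summary` are illustrative choices, not part of the module; the values are copied from the table of rates of return.

```python
import statistics

# Rates of return for 2005-2014, copied from the table above
stock_a = [0.081, 0.231, 0.214, 0.214, 0.181, 0.241, 0.193, 0.133, 0.071, 0.066]
stock_b = [0.214, 0.193, 0.132, 0.073, 0.066, 0.081, 0.181, 0.230, 0.214, 0.241]

def location_summary(returns):
    """Return common summary measures, rounded to 4 decimal places."""
    return {
        "mean": round(statistics.mean(returns), 4),
        "median": round(statistics.median(returns), 4),
        "mode": statistics.mode(returns),
        "min": min(returns),
        "max": max(returns),
    }

summary_a = location_summary(stock_a)
summary_b = location_summary(stock_b)
print("Stock A:", summary_a)
print("Stock B:", summary_b)
```

Both stocks give a mean of 0.1625, a median of 0.187, and a mode of 0.214, with the same minimum and maximum, even though the year-by-year values differ.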

Module 8 Page 1 of 6
Measures of Variation

Notice that there are no differences in the computed summary statistics, but the trend and actual values of the
rates of return for the two stocks are different, as depicted in the line graph. This observation tells us that it is not
enough to simply use measures of location to describe a data set. We need additional measures, such as measures
of variation or dispersion, to describe the data sets further.

In particular, summary measures of variability (such as the range and the standard deviation) of the rates of
return are used to measure the risk associated with an investment. We could use measures of variation to decide
whether it would make any difference if we invest wholly in Stock A, wholly in Stock B, or half of our
investments in Stock A and the other half in Stock B. In general, there is higher risk in investing if the rate of return
fluctuates widely, that is, if there is high variability in its historical values. Thus, we choose the investment whose
rate of return has a small measure of dispersion.

There are two types of measures of variability or dispersion. One type is the absolute measure, which includes the
range, interquartile range, variance, and standard deviation. An absolute measure of dispersion describes the
variability of observations or values within a data set. The other type, the relative measure of dispersion,
is used to compare the variability of data sets of different variables
or of variables measured in different units of measurement. The coefficient of variation is a relative measure of
variability.

Absolute Measures of Dispersion: Range, Interquartile Range, Variance, and Standard Deviation

The range is a simple measure of variation defined as the difference between the maximum and minimum values.
The range depends on the extremes; it ignores information about what goes on between the smallest (minimum)
and largest (maximum) values in a data set. The larger the range, the larger the dispersion of the data set. We
already encountered the range in a previous lesson, where we discussed the construction of a frequency
distribution table (FDT).

Using the data on the scores of 150 Grade 11 students of a nearby Senior High School on a 50-item long test, we
could demonstrate the computation of these measures.


Score in a Long Test    Number of Students    Less-than Cumulative Frequency (<CF)
10                      4                     4
16                      5                     9
18                      5                     14
20                      15                    29
25                      19                    48
30                      22                    70
33                      18                    88
38                      28                    116
40                      10                    126
42                      7                     133
45                      8                     141
50                      9                     150

In the above data, the maximum is 50 and the minimum is 10, hence the range is 40. Note, however, that because
the range depends only on the two extremes, it is easily affected by unusually small or large values.
Because of this property, another measure, the interquartile range or IQR, is often used instead.
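
As a small illustration of the range and its sensitivity to extremes, the sketch below uses the frequency table of long-test scores. The dictionary `score_counts` and the helper `data_range` are illustrative names, and the extra score of 0 is a hypothetical value added only to show the effect of a single extreme observation.

```python
# Frequency distribution from the table above: score -> number of students
score_counts = {10: 4, 16: 5, 18: 5, 20: 15, 25: 19, 30: 22,
                33: 18, 38: 28, 40: 10, 42: 7, 45: 8, 50: 9}

def data_range(values):
    """Range = maximum - minimum; the frequencies play no role."""
    return max(values) - min(values)

# Iterating over a dict yields its keys, i.e. the distinct scores
score_range = data_range(score_counts)
print(score_range)  # 40

# Hypothetical: one additional student scoring 0 changes the range on its own
with_extreme = list(score_counts) + [0]
extreme_range = data_range(with_extreme)
print(extreme_range)  # 50
```

One added observation out of 151 moves the range from 40 to 50, while the bulk of the scores is unchanged.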

The interquartile range or IQR is the difference between the 3rd and the 1st quartiles. Hence, it gives the
spread of the middle 50% of the data set. Like the range, the higher the value of the IQR, the larger the
dispersion of the data set. Based on the computations we did in the previous lesson, the 3rd quartile (Q3 or P75) is the
113th observation and is equal to 38, while the 1st quartile (Q1 or P25) is the 38th observation and is equal to 25.
Hence, IQR = 38 - 25 = 13.
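
The quartile lookup can be sketched from the cumulative frequencies as follows. The positioning rule used here (round k·N/4 up to the next whole observation) is an assumption chosen because it reproduces the 38th and 113th observations quoted above; other textbooks and software use slightly different quartile conventions. The function names are illustrative.

```python
import math
from bisect import bisect_left
from itertools import accumulate

scores = [10, 16, 18, 20, 25, 30, 33, 38, 40, 42, 45, 50]
counts = [4, 5, 5, 15, 19, 22, 18, 28, 10, 7, 8, 9]

cum_freq = list(accumulate(counts))  # less-than cumulative frequencies
n = cum_freq[-1]                     # total number of observations

def value_at_position(position):
    """Score of the observation at a 1-based position in the sorted data."""
    # Index of the first score whose cumulative frequency reaches the position
    return scores[bisect_left(cum_freq, position)]

def quartile(k):
    """k-th quartile under the assumed rule: position = ceil(k * N / 4)."""
    return value_at_position(math.ceil(k * n / 4))

q1 = quartile(1)  # 38th observation
q3 = quartile(3)  # 113th observation
iqr = q3 - q1
print(q1, q3, iqr)  # 25 38 13
```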

Recall a property of the mean: when we obtained the deviation (difference) of each observation from the mean and
summed these deviations over all the observations, the sum was equal to zero. We said that this property shows that
the deviations of the observations from the mean cancel out, indicating that the mean is indeed the center of the
distribution. What if we square each difference before we get the sum and use it to measure the spread of the
observations? Doing it in our example, we have the following table:

So what we did is this: for each unique observation we subtract the mean (we refer to the difference as $d_i$), square
the difference, and sum over all observations. Note that in the table we have to multiply the square of each difference
by the number of students to account for all observations. We then divide the sum by the total number of
observations, denoted by $N$. Summarizing these steps in a formula, we have

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}.$$

We usually denote this expression as $\sigma^2$ and call it the variance. Thus, in this example,
$\sigma^2 = 14009/150 = 93.39$. For ease in computation, instead of $\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$ we use
the equivalent expression $\frac{\sum_{i=1}^{N} X_i^2}{N} - \mu^2$. When applied to our example, where each distinct
score $X_i$ occurs $F_i$ times and the sum runs over the distinct scores, we have

$$\sigma^2 = \frac{\sum_{i} X_i^2 F_i}{N} - \mu^2 = \frac{168{,}057}{150} - 32.04667^2 \approx 93.39 \text{ (rounded off)}.$$
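
Both forms of the variance can be checked with a short Python sketch on the long-test frequency table. The variable names are illustrative; the divisor is $N$ (population variance), as in the formulas above.

```python
scores = [10, 16, 18, 20, 25, 30, 33, 38, 40, 42, 45, 50]
counts = [4, 5, 5, 15, 19, 22, 18, 28, 10, 7, 8, 9]

n = sum(counts)
mean = sum(x * f for x, f in zip(scores, counts)) / n

# Definitional form: frequency-weighted average of the squared deviations
sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in zip(scores, counts))
var_definition = sum_sq_dev / n

# Computational form: mean of the squares minus the square of the mean
sum_sq = sum(f * x ** 2 for x, f in zip(scores, counts))
var_shortcut = sum_sq / n - mean ** 2

print(n, round(mean, 5))          # 150 32.04667
print(sum_sq)                     # 168057
print(round(var_definition, 2))   # 93.39
print(round(var_shortcut, 2))     # 93.39
```

Carried at full precision, the sum of squared deviations comes to about 14008.67, which agrees with the 14009 used in the text once the deviations are rounded.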

Variance is a measure of dispersion that gives the average squared deviation of the observations from the
mean. Since we square the difference of each observation from the mean, the unit of measurement of the
variance is the square of the unit used in measuring each observation. This property makes the variance somewhat
difficult to interpret. For example, point² or kilogram² is difficult to interpret compared to inches².

Hence, instead of the variance, the standard deviation is computed, which is the positive square root of the variance,
that is, $\sigma = \sqrt{\sigma^2}$. In the example, $\sigma = \sqrt{93.3933} = 9.6640$. To interpret, we say that on
the average, the scores of the students deviate from the mean score of 32 points by about 9.6640, or approximately
10 points.
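
A minimal sketch of the same computation with Python's standard `statistics` module is given below. It expands the frequency table into the 150 individual scores and uses `pvariance` and `pstdev`, which divide by $N$; the variable names are illustrative.

```python
import math
import statistics

scores = [10, 16, 18, 20, 25, 30, 33, 38, 40, 42, 45, 50]
counts = [4, 5, 5, 15, 19, 22, 18, 28, 10, 7, 8, 9]

# Expand the frequency table into one entry per student
observations = [x for x, f in zip(scores, counts) for _ in range(f)]

variance = statistics.pvariance(observations)  # population variance (divides by N)
std_dev = statistics.pstdev(observations)      # positive square root of the variance

print(len(observations))    # 150
print(round(variance, 2))   # 93.39
print(round(std_dev, 2))    # 9.66
```

The standard deviation is back in the original unit (points), which is why it is easier to interpret than the variance.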

If all the observations are equal to a constant, then the mean is that constant, and the measure of variation is
zero. Furthermore, if for a given data set the variance and standard deviation turn out to be zero, then all the
deviations from the average must be zero, which means that all observations are equal. Note that if a data set
is rescaled, that is, if the observations are multiplied by some constant, then the standard deviation of the
new data set is merely the absolute value of the scaling factor multiplied by the standard deviation of the original
data set.
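
These properties can be illustrated with a few lines of Python. The two small data sets below are hypothetical values chosen only so that the arithmetic is easy to follow; they do not come from the module.

```python
import math
import statistics

constant_data = [7, 7, 7, 7, 7]        # hypothetical: every observation is equal
data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical: mean 5, variance 4

sd_constant = statistics.pstdev(constant_data)
sd_original = statistics.pstdev(data)

# Rescale every observation by a positive and by a negative factor
sd_times_3 = statistics.pstdev([3 * x for x in data])
sd_times_minus_3 = statistics.pstdev([-3 * x for x in data])

print(sd_constant)        # 0.0
print(sd_original)        # 2.0
print(sd_times_3)         # 6.0
print(sd_times_minus_3)   # 6.0
```

The negative factor gives the same result as the positive one because the standard deviation is scaled by the absolute value of the factor and can never be negative.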

The variance and standard deviation are based on all the observations in the data set, and each observation is
given a proper weight. They are extremely useful measures of variability as they measure the average scattering
of the data around the mean, that is, how widely the data fluctuate above and below the mean. The variance and
standard deviation increase with an increase in the deviations about the mean, and decrease with decreases in
these deviations. A small standard deviation (and variance) means a high degree of uniformity in the observations
and of homogeneity in a series.

The variance is the most suitable for algebraic manipulations but, as was pointed out earlier, its value is in squared
units of measurement. On the other hand, the standard deviation has the same unit of measurement as the
observations. Thus, the standard deviation serves as the primary measure of variation, just as the mean is the
primary measure of central location.

Let us go back to the example of the two stocks, A and B. Both stocks have the same expected
return as measured by the mean. However, the standard deviation of the rates of return for Stock A is 0.0688 while
that for Stock B is 0.0685, indicating that Stock A carries slightly higher risk than Stock B, although the difference
is not large.
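
The two standard deviations can be reproduced with the sketch below. The reported values 0.0688 and 0.0685 match `statistics.stdev`, which divides by n - 1 (the sample standard deviation), rather than the divisor $N$ used for the test scores; dividing by $N$ gives slightly smaller values and the same ordering. The 50/50 series is an added illustration that assumes the portfolio is rebalanced to equal halves at the start of every year, which the module does not specify.

```python
import statistics

stock_a = [0.081, 0.231, 0.214, 0.214, 0.181, 0.241, 0.193, 0.133, 0.071, 0.066]
stock_b = [0.214, 0.193, 0.132, 0.073, 0.066, 0.081, 0.181, 0.230, 0.214, 0.241]

sd_a = statistics.stdev(stock_a)  # sample standard deviation (divides by n - 1)
sd_b = statistics.stdev(stock_b)
print(round(sd_a, 4), round(sd_b, 4))  # 0.0688 0.0685

# Assumption: half of the money in each stock, rebalanced every year,
# so the yearly return is the average of the two yearly returns
blend = [(a + b) / 2 for a, b in zip(stock_a, stock_b)]
mean_blend = statistics.mean(blend)
sd_blend = statistics.stdev(blend)
print(round(mean_blend, 4))  # 0.1625
print(sd_blend < min(sd_a, sd_b))  # True
```

Under that assumption the blended series keeps the same mean return but fluctuates less than either stock alone, because the two stocks tended to have their good and bad years at different times.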

Relative Measure of Dispersion: Coefficient of Variation

To compare variability between or among different data sets, that is, data sets for different variables or for the
same variable measured in different units of measurement, the coefficient of variation (CV) is used as a
measure of relative dispersion. It is usually expressed as a percentage and is computed as
$CV = \frac{\sigma}{\mu} \times 100\%$. The CV is a measure of dispersion relative to the mean of the data set.
Because $\sigma$ and $\mu$ have the same unit of measurement, the CV is unitless; that is, it does not depend on the
unit of measurement. Hence, it is used to compare the variability across different data sets.

As an example, the CV of the scores of the students in the long test is computed as
$CV = \frac{\sigma}{\mu} \times 100\% = \frac{9.6640}{32.04667} \times 100\% = 30.16\%$, while the CV of the rates of
return of Stock A is $CV = \frac{0.0688}{0.1625} \times 100\% = 42.34\%$. Thus, we say that the
rates of return of Stock A are more variable than the scores of the students in the test. Here, we used the CV to
compare the variability of two different data sets.
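
The two coefficients of variation can be checked with a short sketch. The function name `coefficient_of_variation` is illustrative; the inputs are the standard deviations and means quoted in the text.

```python
import math

def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return std_dev / mean * 100

cv_scores = coefficient_of_variation(9.6640, 32.04667)
cv_stock_a = coefficient_of_variation(0.0688, 0.1625)
print(f"{cv_scores:.2f}%")    # 30.16%
print(f"{cv_stock_a:.2f}%")   # 42.34%

# Unit-free: expressing the returns in percent (multiplying by 100)
# scales both the standard deviation and the mean, so the CV is unchanged
cv_stock_a_percent_units = coefficient_of_variation(0.0688 * 100, 0.1625 * 100)
print(math.isclose(cv_stock_a, cv_stock_a_percent_units))  # True
```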

PRACTICE SKILLS

1. Three friends, Gerald, Carmina, and Rodolfo are planning their business of selling homemade peanut butter.
They start the planning by doing a market study where they obtained the prices (in pesos) of a 250-gram jar
of several known brands of peanut butter. Below is the data set they have collected:

100.80   197.60   158.00   131.60   184.40   149.20
136.00   109.60   360.40   122.80   131.60

After studying the data, Gerald said, “The prices of peanut butter are pretty similar. The range is only PhP
30.80.” Carmina said, “You are mistaken! The prices are very different. The range is PhP 259.60.” Rodolfo said,
“I think you are both mistaken. The range isn’t a useful measure to describe the variation of the data set.”

a. Explain what you think is the basis used by each person in support of their claims.
b. Who should we agree with? Why?

2. Three hundred students taking a basic course in Statistics are given a similar final examination. After checking
the papers, and while studying the distribution of the final examination scores, the professor thought of
several scenarios, which are described below:

a. Suppose the professor will give 30% weight to the final examination. What effect would multiplying
all the final scores by 30% have on the mean of the final exam scores? On the standard deviation of the final
exam scores?

b. Suppose the professor wants to bloat the final examination scores. What will be the effect on the mean of
the final exam scores if 5 points are added to each final score? On the standard deviation of the
final exam scores? (The mean will also go up by 5 points, while the standard deviation stays the same.)

3. In a fitness center, the weights of a certain group of students were taken, resulting in a common weight of 140
pounds. What would be the standard deviation of the distribution of weights?

4. Determine whether each of the following statements is TRUE or FALSE. Briefly explain your answer.

a. If each observation in a data set is doubled, then the standard deviation would also be doubled.
b. If in a set of data, positive numbers are changed to negative, while negative are changed to positive, then
the standard deviation changes its sign as well.

