
Statistics for Business Decision

Lecture 9-10:
Correlation and Regression

Master of Management
Faculty of Economics and Business
Universitas Gadjah Mada

2019
Correlation and Regression
Correlation

Regression

Data Transformation
Learning Objectives
▪ LO13-1: Explain the purpose of correlation analysis
▪ LO13-2: Calculate a correlation coefficient to test and interpret the
relationship between two variables
▪ LO13-3: Apply regression analysis to estimate the linear relationship
between two variables
▪ LO13-4: Evaluate the significance of the slope of the regression equation
▪ LO13-5: Evaluate a regression equation’s ability to predict using the
standard estimate of the error and the coefficient of determination
▪ LO13-6: Calculate and interpret confidence and prediction intervals
▪ LO13-7: Use a log function to transform a nonlinear relationship
Background
In this meeting, we shift the emphasis to the study of
relationships between two interval- or ratio-level variables
(such as the profit made on a car sale, the income of bank
presidents, etc.).
In all business fields, identifying and
studying relationships between variables
can provide information on ways to increase
profits, methods to decrease costs, or
variables to predict demand.
Background
Examples of relationships between two variables are:
▪ Does the amount a company spends per month on training its
sales force affect its monthly sales?
▪ In a study of fuel efficiency, is there a relationship between
miles per gallon and the weight of a car?
▪ Does the number of hours that students study for an exam
influence the exam score?
Relationship Between Two Variables
The relationship between two variables (Variable 1 from Sample 1, Variable 2 from Sample 2) can be examined in two ways:
▪ Graphical representation: the scatter diagram
▪ Statistical measures: correlation (covariance and the correlation coefficient) and regression (the regression equation)
Correlation
We develop numerical measures to express the relationship between two variables:
▪ Is the relationship strong or weak?
▪ Is it direct or inverse?
Regression
We develop an equation to express the relationship between variables.
This will allow us to estimate one variable on the basis of another.
Scatter Diagram
▪ A scatter diagram is a graphic tool used to portray the
relationship between two variables.
▪ Example:
A sales manager wants to know if there is a relationship between the
number of sales calls made in a month and the number of copiers sold
that month and begins the analysis with a random sample of 15 sales
representatives.
With this data, the number of sales calls is the independent variable and
number of copiers sold is the dependent variable.
Scatter Diagram

Graphing the data in a


scatter diagram will make
the relationship between
sales calls and copiers
sales easier to see.
Scatter Diagram

▪ The independent variable is


scaled on the X-axis and is
the variable used as the
predictor
▪ The dependent variable is
scaled on the Y-axis and is
the variable being estimated
Scatter Diagram

▪ It is perfectly reasonable for


the manager to tell the sales
people that the more sales
calls they make, the more
copiers they can expect to sell.
▪ Note that while there does
seem to be a positive
relationship between the two
variables, not all the points
fall on a line.
Correlation Analysis
How can we measure the way that two variables move together?

CORRELATION ANALYSIS
A group of techniques to measure the
relationship between two variables.
Correlation Analysis
How can we measure the way that two variables move together?
1. We start with the idea of variance (a measure of the average
dispersion around the mean):

This is intuitive: we take the squares to get rid of negative values and
then sum up in order to get the total variation.
Finally, we divide through to get the 'average variation' from the mean.
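The variance formula on this slide did not survive extraction. A standard form that matches the description above (square the deviations, sum them, divide by the number of observations) would be the following, assuming the population version with divisor n:

```latex
\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2
```

For sample data the divisor is usually n − 1 instead of n.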
Correlation Analysis
How can we measure the way that two variables move together?
2. We identify the idea of covariance to measure the linear
association between two variables by looking at how they
vary in terms of deviations from means.

deviations from means
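The covariance formula itself is missing from the extracted slide. A minimal sketch follows, assuming the population form (divisor n), in which covariance is the average product of the deviations from the two means. The data are made up for illustration and are not from the lecture.

```python
def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Illustrative (made-up) data, not taken from the lecture
calls = [1, 2, 3, 4, 5]
sold = [2, 4, 5, 4, 5]

cov_calls_sold = covariance(calls, sold)
var_calls = covariance(calls, calls)  # covariance of a variable with itself is its variance
print(cov_calls_sold, var_calls)
```

A positive result means the two variables tend to lie on the same side of their means at the same time; a negative result means they tend to lie on opposite sides.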


Correlation Analysis
How can we measure the way that two variables move together?
3. We then analyze how the covariance works.
STEP 1: Start with a Scatter diagram

The trick here is to imagine


each 𝑋𝑖 and 𝑌𝑖 pair as a set of
co-ordinates on a plane.
Correlation Analysis
How can we measure the way that two variables move together?
3. We then analyze how the covariance works.

STEP 2: Divide the


Scatter Plot so each
co-ordinate will be
expressed as being
above or below the
mean.
Correlation Analysis
How can we measure the way that two variables move together?
3. We then analyze how the covariance works.
STEP 3: Look at the patterns across
quadrants; we can think of 4 cases for how
the X, Y co-ordinates will be distributed
across the quadrants:
1. Positive Correlation
2. Negative Correlation
3. Zero Correlation
4. Nonlinear Relationships
Correlation Analysis
How can we measure the way that two variables move together?
3. We then analyze how the covariance works.
STEP 3: Look at the patterns across
quadrants:
4. Non-linear relationship
▪ It's a curve rather than a line;
▪ Algebraically, the value of the
derivative can be different for
different values of X
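To make the nonlinear case concrete, here is a small sketch with made-up data in which y is determined exactly by x (y = x²), yet the covariance is zero because the positive and negative products of deviations cancel. The population form of covariance (divisor n) is assumed.

```python
# Made-up data: a perfect but nonlinear (quadratic) relationship
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Products of deviations from the means, one per (x, y) pair
products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
cov_xy = sum(products) / n

print(products)
print(cov_xy)
```

The products on the left half of the plot cancel those on the right half, so a linear measure reports no association even though the curve fits perfectly.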
Correlation Analysis
This is what we mean when we say that covariance is a
measure of linear association.

HOWEVER, COVARIANCE IS NOT QUITE 'CORRELATION'!


▪ It's hard to interpret this number; what kind of units is the covariance measuring?
▪ Imagine that X is weight in kilograms and Y is number of hamburgers eaten in the past month.
Then the covariance is measuring a new unit of measurement called kilograms*hamburgers.
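The unit problem can be sketched as follows, using made-up weights and hamburger counts. Re-expressing weight in grams multiplies the covariance by 1,000, while the covariance divided by the two standard deviations (the normalization introduced next) is unchanged. Population formulas with divisor n are assumed throughout.

```python
import math


def covariance(xs, ys):
    """Population covariance (divisor n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def normalized_covariance(xs, ys):
    """Covariance divided by both standard deviations; a unit-free number."""
    sd_x = math.sqrt(covariance(xs, xs))
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)


# Made-up data for illustration only
weight_kg = [60, 70, 80, 90]
burgers = [2, 3, 5, 6]
weight_g = [w * 1000 for w in weight_kg]  # same weights, different unit

cov_kg = covariance(weight_kg, burgers)   # in kilograms*hamburgers
cov_g = covariance(weight_g, burgers)     # in grams*hamburgers
r_kg = normalized_covariance(weight_kg, burgers)
r_g = normalized_covariance(weight_g, burgers)

print(cov_kg, cov_g)
print(r_kg, r_g)
```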
Correlation Analysis
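We resolve this by normalizing the covariance by another number:

\[
\mathrm{normalized\_cov}(X, Y)
= \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(Y_i-\bar{Y}\right)}
       {\sqrt{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2 / n}\;\sqrt{\sum_{i=1}^{n}\left(Y_i-\bar{Y}\right)^2 / n}}
= \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{s_x}\right)\left(\frac{Y_i-\bar{Y}}{s_y}\right)
\]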
Correlation Coefficient
We can rewrite the formula to get the CORRELATION COEFFICIENT:

\[
r = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{s_x}\right)\left(\frac{Y_i-\bar{Y}}{s_y}\right)
\]

For sample data:

\[
r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{s_x}\right)\left(\frac{Y_i-\bar{Y}}{s_y}\right)
  = \frac{\sum \left(X_i-\bar{X}\right)\left(Y_i-\bar{Y}\right)}{(n-1)\,s_x s_y}
\]

▪ The numerator represents the observed joint variation of X and Y.
▪ The denominator, (n − 1) s_x s_y, represents the maximum possible joint variation between X and Y.

CORRELATION COEFFICIENT (𝐫)


▪ The sample correlation coefficient is identified as r
▪ It shows the direction and strength of the linear relationship between two
interval- or ratio-scale variables
Characteristics of the Correlation Coefficient
Characteristics of the correlation
coefficient are
▪ It ranges from -1.00 to 1.00
▪ A value near 1.00 indicates a
direct or positive correlation
▪ A value near -1.00 indicates
a negative correlation
Characteristics of the Correlation Coefficient

▪ If there is absolutely no linear relationship between the two variables, r is zero.
▪ As a rule of thumb, if the absolute value of r is below 0.5, the relationship is considered weak.
▪ If the absolute value of r is above 0.5, the relationship is considered strong.
▪ If the correlation is weak, there is considerable scatter about a line drawn
through the center of the data.
▪ If the correlation is strong, there is very little scatter about the line.
Correlation Coefficient: An Example
How is the correlation coefficient determined?
We’ll use the North American Copier Sales data as an example.
We begin with a scatter diagram, but this time we’ll draw a vertical line at the mean of the x-values (96 sales calls) and a
horizontal line at the mean of the y-values (45 copiers).

Now we find the deviations from the mean number of sales calls
and the mean number of copiers sold, then multiply them.
The sum of their products is 6,672 and will be used in the
correlation coefficient formula to find r.
We also need the standard deviations (42.76 for sales calls and 12.89 for copiers sold).

\[
r = \frac{6672}{(15-1)(42.76)(12.89)} = 0.865
\]

The result, r = 0.865, indicates a strong, positive relationship.
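The arithmetic can be checked with a few lines of Python. This is a sketch that uses only the summary figures quoted above (the sum of products, the sample size, and the two standard deviations); the variable names are illustrative.

```python
# Summary figures quoted in the North American Copier Sales example
sum_products = 6672   # sum of (x - mean_x) * (y - mean_y)
n = 15                # number of sales representatives sampled
s_x = 42.76           # standard deviation of sales calls
s_y = 12.89           # standard deviation of copiers sold

# Sample correlation coefficient: joint variation over (n - 1) * s_x * s_y
r = sum_products / ((n - 1) * s_x * s_y)

print(round(r, 3))  # 0.865, matching the slide
```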


Testing the Significance of r
▪ Recall that the sales manager from North American Copier Sales found
an r of 0.865
▪ Could the result be due to sampling error? Remember only 15 sales
people were sampled
▪ We ask the question, could there be zero correlation in the population
from which the sample was selected?
▪ We’ll let "ρ" represent the correlation in the population and conduct a
hypothesis test to find out.
Testing the Significance of r
▪ Step 1: State the null and the alternate hypothesis
𝐻0 : 𝜌 = 0 The correlation in the population is zero
𝐻1 : 𝜌 ≠ 0 The correlation in the population is different from zero

▪ Step 2: Select the level of significance, we’ll use .05


▪ Step 3: Select the test statistic, we use t
Testing the Significance of r
▪ Step 4: Formulate the decision rule: reject 𝐻0 if 𝑡 < −2.160 or 𝑡 > 2.160
(two-tailed test at the .05 level with n − 2 = 13 degrees of freedom)
Testing the Significance of r
▪ Step 5: Make decision, reject 𝐻0 , 𝑡 = 6.216

▪ Step 6: Interpret, there is correlation with respect to the


number of sales calls made and the number of copiers sold in
the population of sales people.
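The slides do not preserve the formula for the test statistic. A sketch of the six steps follows, assuming the usual t statistic for a correlation coefficient, t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. The critical value 2.160 is taken from the decision rule rather than computed.

```python
import math

r = 0.865          # sample correlation from the copier example
n = 15             # sample size
critical_t = 2.160 # two-tailed critical value at the .05 level, 13 df (from the decision rule)

# Assumed test statistic for H0: rho = 0
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed decision rule: reject H0 when t falls in either tail
reject_h0 = t < -critical_t or t > critical_t

print(round(t, 3), reject_h0)
```

With these inputs the statistic agrees with the t = 6.216 reported in Step 5, so the null hypothesis of zero population correlation is rejected.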
Regression
Regression analysis:
▪ It is concerned with the study of the dependence of one
variable on one (or more) other variable(s)
▪ It estimates one variable based on another variable
▪ The variable being estimated is the dependent variable (y)
▪ The variable used to make the estimate or predict the value is the independent variable (x)
▪ We are interested in “explaining y in terms of x,” or in “studying how y varies with changes in x.”
▪ The relationship between the variables is linear
▪ Both the independent and the dependent variables must be interval or ratio scale
Regression
Regression analysis:

▪ In regression analysis, our


objective is to use the data to
position a line that best
represents the relationship
between two variables
Regression
Regression analysis: Regression Line

▪ The first approach is to use a


scatter diagram to visually
position the line.
▪ The lines drawn in the chart
represent different
judgments.
Regression
Regression analysis: Regression Line

▪ We would prefer a method


that results in a single, best
regression line.
▪ The method that results in a
single, best regression line is
called the least squares
principle.
Least Squares Principle
A mathematical procedure that uses the data to position a line with
the objective of minimizing the sum of the squares of the vertical
distances between the actual y values and the predicted values of y.

▪ The line drawn in chart 13-9 is the best fitting


line and is drawn using the least squares
method.
▪ It is the best fitting because the sum of the
squares of the vertical deviations about it is at a
minimum; the sum of the squares is 24.
Least Squares Principle

Charts 13-10 and 13-11 were drawn differently;
their sums of squares are 44 and 132,
respectively. They are not the best-fitting lines.
Least Squares Regression Line
Prediction line that we want to fit:

\[
\hat{y}_i = \alpha + \beta x_i
\]

▪ ŷ is the estimated value of y for a selected value of x
▪ α is the constant or intercept
▪ β is the slope of the fitted line
▪ x is the value of the independent variable
Least Squares Method
The method lets us minimize the total distance
between the actual values y and the fitted
values ŷ.
Least Squares Method
▪ We take squares of the distance (y − ŷ) so that we treat distances above
and below the line equally (i.e. we get rid of negative values).
▪ Hence we form the following objective function for minimizing the
distance:

▪ We will minimize this objective function with respect to a linear


function with two parameters 𝛼 and 𝛽.
Least Squares Method
▪ The objective function in terms of the prediction line:

▪ It is the 'sum of squared residuals' (the total squared distance between the
prediction line and the actual values).
▪ We want to minimize this distance so we define first order conditions
(FOC) with respect to the parameters 𝛼 and 𝛽.
Least Squares Method
The FOC=0
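The objective function and the first-order conditions on these slides were lost in extraction. A sketch of the standard derivation, written in the lecture's α, β notation and consistent with the estimators reported in the result, is:

```latex
% Objective: sum of squared residuals
S(\alpha,\beta) = \sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2
                = \sum_{i=1}^{N}\left(y_i-\alpha-\beta x_i\right)^2

% First-order conditions
\frac{\partial S}{\partial \alpha} = -2\sum_{i=1}^{N}\left(y_i-\alpha-\beta x_i\right) = 0
\qquad
\frac{\partial S}{\partial \beta} = -2\sum_{i=1}^{N} x_i\left(y_i-\alpha-\beta x_i\right) = 0

% The first condition gives the intercept; substituting it into the second gives the slope
\alpha = \bar{y}-\beta\bar{x}
\qquad
\beta = \frac{\sum_{i=1}^{N}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}
             {\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2}
```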
Least Squares Estimators
The result:

\[
\beta = \frac{\sum_{i=1}^{N}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}
             {\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2}
      = \frac{\mathrm{covariance}(X, Y)}{\mathrm{variance}(X)}
      = r\,\frac{s_y}{s_x}
\]

\[
\alpha = \bar{y} - \beta\bar{x}
\]
Least Squares Regression Example
Recall the example of North American Copier Sales. The sales manager
gathered information on the number of sales calls made and the number of
copiers sold.
Use the least squares method to determine a linear equation to express the
relationship between the two variables.
1. The first step is to find the slope of the least squares regression line, 𝛽:

\[
\beta = r\,\frac{s_y}{s_x} = 0.865\left(\frac{12.89}{42.76}\right) = 0.2608
\]

▪ The 𝛽 value of .2608 indicates that for each additional sales call, the sales representative can expect to
increase the number of copiers sold by about .2608.
▪ So 20 additional sales calls in a month are expected to result in about five more copiers being sold.

2. The second step is to find 𝛼:

\[
\alpha = \bar{y} - \beta\bar{x} = 45 - 0.2608(96) = 19.9632
\]

3. Then determine the regression line:

\[
\hat{y} = 19.9632 + 0.2608x
\]

So if a salesperson makes 100 calls, he or she can expect to sell
about 46 copiers:

\[
\hat{y} = 19.9632 + 0.2608(100) = 46.0432
\]
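The three steps can be sketched in Python from the summary statistics quoted in the example. The function name `predict_copiers` is illustrative. Because the slope is not rounded to four decimals before the intercept is computed, the results differ from the slide values in the third or fourth decimal place.

```python
# Summary statistics from the North American Copier Sales example
r = 0.865       # sample correlation coefficient
s_x = 42.76     # standard deviation of sales calls
s_y = 12.89     # standard deviation of copiers sold
mean_x = 96     # mean number of sales calls
mean_y = 45     # mean number of copiers sold

# Step 1: slope of the least squares line
slope = r * s_y / s_x

# Step 2: intercept, which forces the line through the point of means
intercept = mean_y - slope * mean_x


# Step 3: the fitted regression line
def predict_copiers(sales_calls):
    """Estimated copiers sold for a given number of sales calls."""
    return intercept + slope * sales_calls


print(round(slope, 4), round(intercept, 2), round(predict_copiers(100), 2))
```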
Drawing the Regression Line

The line of regression is drawn on the scatter
diagram.
The regression line will always pass through the
point of means (x̄, ȳ).
Plus, there is no other line through the data
where the sum of the squared deviations is smaller.
Drawing the Regression Line
Estimated sales for all sales representatives are calculated using the
formula we determined earlier and placed in the table.
Regression Equation Slope Test
▪ The next step is to conduct a test of hypothesis to see if the slope of the
regression line is different from zero.
▪ If we find that the slope is different from zero, then we can conclude that
using the regression equation adds to our ability to predict the dependent
variable based on the independent variable.
▪ The test is equivalent to the test for the correlation coefficient. We use β̇
to represent the population slope.
Regression Equation Slope Test
▪ For a regression equation, the slope is tested for significance
▪ We test the hypothesis that the slope of the line in the population is 0
▪ If we do not reject the null hypothesis, we cannot conclude that there is a
relationship between the two variables
▪ When testing the null hypothesis about the slope, the test statistic follows the t distribution with
n – 2 degrees of freedom
▪ We begin with the hypothesis statements
𝐻0 : β̇ = 0 The slope of the population is zero
𝐻1 : β̇ ≠ 0 The slope of the population is different from zero
Regression Equation Slope Test
Recall the North American Copier Sales example. We identified the slope
as 𝛽 (0.2608), and it is our estimate of the slope of the population.
We conduct a hypothesis test.
▪ Step 1: State the null and alternate hypothesis
𝐻0 : β̇ ≤ 0 The slope of the population is zero or negative
𝐻1 : β̇ > 0 The slope of the population is greater than zero
▪ Step 2: Select the level of significance; we use .05
▪ Step 3: Select the test statistic, t
▪ Step 4: Formulate the decision rule: reject H0 if t > 1.771
▪ Step 5: Make the decision: reject H0, since t = 6.205
▪ Step 6: Interpret: the number of sales calls is useful in estimating copier
sales
Evaluating a Regression Equation’s Ability
to Predict
▪ Perfect prediction is practically impossible in almost all disciplines,
including economics and business
▪ The North American Copier Sales example showed a significant
relationship between sales calls and copier sales; the equation is
▪ Number of copiers sold = 19.9632 + .2608(Number of sales calls)
If the number of sales calls is 84, the estimated number of copiers sold is
41.8704. However, the two employees who made 84 sales calls sold just 30 and 24.
▪ So, is the regression equation a good predictor?
Evaluating a Regression Equation’s Ability
to Predict
▪ We need a measure that will tell how inaccurate the estimate might be.
▪ The measure we’ll use is
1. The standard error of the estimate, 𝑠𝑦,𝑥 .
2. The coefficient of determination
The Standard Error of Estimate
▪ The standard error of estimate measures the variation around the
regression line
STANDARD ERROR OF ESTIMATE A measure of the dispersion, or scatter, of the observed
values around the line of regression for a given value of x.

▪ It is in the same units as the dependent variable


▪ It is based on squared deviations from the regression line
The Standard Error of Estimate
▪ The standard error of estimate is computed using the following formula:

▪ If the standard error of estimate is small, this indicates that the data are
relatively close to the regression line and the regression equation can be
used.
▪ If it is large, the data are widely scattered around the regression line and
the regression equation will not provide a precise estimate of y.
The Standard Error of Estimate Example
▪ We calculate the standard error of estimate in this example.
▪ We need the sum of the squared differences between each observed
value of y and the predicted value of y, which is ŷ
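The formula and the worked table for this slide were not preserved. A minimal sketch follows, assuming the usual textbook definition: the square root of the sum of squared residuals divided by n − 2. The raw copier data are not reproduced in this document, so the five data points below are made up for illustration.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def standard_error_of_estimate(xs, ys):
    """sqrt(sum of squared residuals / (n - 2)); assumed textbook definition."""
    intercept, slope = least_squares(xs, ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_yx = standard_error_of_estimate(xs, ys)
print(round(s_yx, 4))
```

The result is expressed in the units of y, so it can be read directly as a typical distance between an observed value and the regression line.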
The Coefficient of Determination
▪ The coefficient of determination is the proportion of the total variation in
the dependent variable Y that is explained, or accounted for, by the
variation in the independent variable X.
▪ The coefficient of determination provides a more interpretable measure
of a regression equation’s ability to predict.
▪ It is found from the following formula
The Coefficient of Determination
▪ The characteristics of coefficient of determination:
▪ It ranges from 0 to 1.0
▪ It is the square of the correlation coefficient

▪ In the North American Copier Sales example, the correlation coefficient
was .865; just square that: (.865)² = .748; this is the coefficient of
determination
▪ This means 74.8% of the variation in the number of copiers sold is
explained by the variation in sales calls
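The formula referred to above is missing from the extracted slide. As a sketch, the code below assumes the usual definition, r² = 1 − SSE / SS Total, and checks on made-up data that it equals the square of the correlation coefficient. It then squares the lecture's r = .865.

```python
import math

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
ss_total = sum((y - mean_y) ** 2 for y in ys)   # total variation in y

# Least squares line and its unexplained variation (SSE)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * ss_total)             # correlation coefficient
r_squared_from_r = r ** 2
r_squared_from_ss = 1 - sse / ss_total          # share of variation explained

# Lecture example: square the reported correlation
copier_r_squared = 0.865 ** 2

print(round(r_squared_from_r, 3), round(r_squared_from_ss, 3), round(copier_r_squared, 3))
```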
Relationships among 𝑟, 𝑟² and 𝑠𝑦,𝑥
▪ Recall the standard error of estimate (𝑠𝑦,𝑥 ) measures how close the actual values are to
the regression line
When it is small, the two variables are closely related
▪ The correlation coefficient (𝑟) measures the strength of the linear association between
two variables
When points on the scatter diagram are close to the line, the correlation coefficient tends to be
large
▪ The coefficient of determination is the correlation coefficient squared
▪ Therefore, the correlation coefficient (𝑟) and the coefficient of determination (𝑟²) have an
inverse relationship with the standard error of estimate (𝑠𝑦,𝑥): the larger they are in magnitude, the smaller 𝑠𝑦,𝑥 is
Inference about Linear Regression
▪ We can predict the number of copiers sold (y) for a selected value of
number of sales calls made (x)
▪ But first, let’s review the regression assumptions. For each value of x, the corresponding y values:
1. Follow the normal distribution
2. Have a mean on the regression line
3. Have the same standard error of estimate
4. Are independent of the others
Constructing Confidence & Prediction Intervals
Two different predictions can be made for a selected value of the
independent variable:
1. Confidence interval
In a confidence interval, the width of the interval is affected by the level of confidence,
the size of the standard error of the estimate, and the size of the sample, as well as the
value of the independent variable.
2. Prediction interval
The prediction interval is also based on the level of confidence, the size of the
standard error of the estimate, the size of the sample, and the value of the
independent variable.
Constructing Confidence & Prediction Intervals
▪ Use a confidence interval when the regression equation is used to
predict the mean value of y for a given value of x

▪ For instance, we would use a confidence interval to estimate the mean


salary of all executives in the retail industry based on their years of
experience
Constructing Confidence & Prediction Intervals
▪ Use a prediction interval when the regression equation is used to predict
an individual y for a given value of x

▪ For instance, we would estimate the salary of a particular retail executive


who has 20 years of experience
Confidence Interval & Prediction Interval
Example
We return to the North American Copier Sales example. Determine a 95% confidence
interval for all sales representatives who make 50 calls, and determine a prediction interval
for Sheila Baker, a west coast sales representative who made 50 sales calls.
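The worked solution for this example was not preserved in the extracted slides. The sketch below uses the standard textbook interval formulas, which are an assumption here: ŷ ± t·𝑠𝑦,𝑥·sqrt(1/n + (x − x̄)²/Σ(x − x̄)²) for the mean, with an extra 1 under the square root for an individual. The standard error of estimate is derived from r and the standard deviation of y, since its slide value is missing, and the t value 2.160 (13 degrees of freedom) is reused from the test of r. The results are therefore approximate.

```python
import math

# Figures quoted elsewhere in the lecture
n = 15
mean_x = 96
s_x = 42.76
s_y = 12.89
r = 0.865
intercept = 19.9632
slope = 0.2608
t_crit = 2.160   # 95% two-sided, n - 2 = 13 degrees of freedom

x0 = 50          # number of sales calls of interest

# Derived quantities (assumptions, see lead-in)
sum_sq_x = (n - 1) * s_x ** 2                                     # sum of (x - mean_x)^2
s_yx = math.sqrt((1 - r ** 2) * (n - 1) * s_y ** 2 / (n - 2))     # standard error of estimate

y_hat = intercept + slope * x0
leverage = 1 / n + (x0 - mean_x) ** 2 / sum_sq_x

ci_half = t_crit * s_yx * math.sqrt(leverage)        # mean of all reps making 50 calls
pi_half = t_crit * s_yx * math.sqrt(1 + leverage)    # one rep, e.g. Sheila Baker

print("confidence interval:", round(y_hat - ci_half, 2), "to", round(y_hat + ci_half, 2))
print("prediction interval:", round(y_hat - pi_half, 2), "to", round(y_hat + pi_half, 2))
```

The prediction interval is always the wider of the two, because it has to cover the scatter of an individual around the line as well as the uncertainty in the line itself.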
Transforming Data
▪ Regression analysis and the correlation coefficient require
the relationship between the variables to be linear
▪ But what if the relationship is not linear?
1. Rescale: we can rescale one or both of the variables so the
new relationship is linear
2. Transform: the common transformation techniques include:
▪ Computing the log to the base 10 of y, Log(y)
▪ Taking the square root
▪ Taking the reciprocal
▪ Squaring one or both variables
Transforming Data: An Example
▪ The director of marketing of Grocery Land Supermarkets
wishes to study the effect of price on weekly sales of their
two-liter private brand diet cola.
▪ The objectives of the study are:
1. To determine whether there is a relationship between selling price
and weekly sales.
2. To determine the effect of price increases or decreases on sales.
Transforming Data: An Example

▪ To begin, the company decides to


price the two-liter diet cola from
$0.50 to $2.00.
▪ To collect the data, a random
sample of 20 stores is taken and
then each store is randomly
assigned a selling price.
Transforming Data: An Example
The objectives of the study are:
1. To determine whether there is
a relationship between selling
price and weekly sales.

Is this relationship direct or
inverse? Is it strong or weak?

Answer: A strong, inverse
relationship!
Transforming Data: An Example
The objectives of the study are:
2. To determine the effect of price increases or decreases on sales.

Can we effectively forecast sales based on the price?

▪ The director of marketing decides to transform the dependent variable, Sales,
by taking the logarithm to the base 10 of each sales value.
▪ Note the new variable, Log-Sales, in the following analysis, as it is used as the
dependent variable with Price as the independent variable.
▪ By transforming the dependent variable, Sales, we increase the
coefficient of determination from .889 to .989. So now Price
explains almost all of the variation in Log-Sales.
▪ The transformed data “fit” the linear relationship better.

The regression equation is
ŷ = 2.685 − 0.8738x
ŷ = 2.685 − 0.8738(1.25)
ŷ = 1.593
Now, undo the transformation by taking the antilog of 1.593, which is 39.174, or about 39 bottles. That
is, if they price the cola at $1.25, they’ll sell about 39 bottles; at $2.00 they’ll sell about 9.
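The forecast and the back-transformation can be sketched as follows, using the regression coefficients quoted above. The function name `predicted_sales` is illustrative. The code does not round the log value to 1.593 before taking the antilog, so its result for $1.25 differs slightly from the slide's 39.174.

```python
# Coefficients of the fitted line for Log-Sales (base 10) on Price
intercept = 2.685
slope = -0.8738


def predicted_sales(price):
    """Forecast weekly sales: predict log10(sales), then take the antilog."""
    log_sales = intercept + slope * price
    return 10 ** log_sales


for price in (1.25, 2.00):
    print(price, round(predicted_sales(price)))
```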
Clearly, as price increases, sales decrease.
This relationship will be very helpful
to Grocery Land when making
pricing decisions for this product.
THANK YOU
