
MULTICOLLINEARITY
ASSUMPTIONS OF OLS

1. Random error terms.
2. Homoscedasticity.
3. Absence of autocorrelation.
4. The error terms are normally distributed, that is, ɛ ~ N(0, σ²).
5. The independent variables are fixed (non-stochastic).
6. No multicollinearity. The independent variables are not strongly correlated; the correlation between any two independent variables Xi and Xj should be close to zero when i ≠ j.
WHAT IS
MULTICOLLINEARITY?
Multicollinearity refers to the case in which two or
more explanatory variables in the regression model are
highly correlated, making it difficult or impossible to
isolate their individual effects on the dependent
variable.

Y (consumption) = β0 + β1(Income) + β2(Wealth) + ɛ


TYPES OF
MULTICOLLINEARITY
Perfect multicollinearity – two or more of the independent variables are perfectly correlated. This means the changes in one independent variable can be explained exactly by changes in another variable, for example, if one independent variable measures distance in miles and another measures the same distance in kilometres.
If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite.
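A minimal numpy sketch of why this happens, using made-up miles/kilometres data rather than anything from the slides: when one regressor is an exact linear function of another, the X'X matrix in the normal equations is rank-deficient, so it has no unique inverse and the coefficients cannot be pinned down.

```python
import numpy as np

rng = np.random.default_rng(0)
miles = rng.uniform(10, 100, size=30)          # hypothetical distances
kilometres = 1.609344 * miles                  # an exact linear function of miles

# Design matrix with a constant, miles, and kilometres.
X = np.column_stack([np.ones(30), miles, kilometres])
XtX = X.T @ X

print("rank of X:", np.linalg.matrix_rank(X))  # 2, not 3: the columns are linearly dependent
print("condition number of X'X:", np.linalg.cond(XtX))
# The condition number is enormous, so X'X cannot be reliably inverted and
# OLS has no unique solution for the coefficients.
```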
IMPERFECT
MULTICOLLINEARITY
Imperfect multicollinearity can be defined as a linear
functional relationship between two or more
independent variables that is so strong that it can
significantly affect the estimation of the coefficients of
the variables.
Two or more of the explanatory variables are highly
but not perfectly correlated.
If multicollinearity is less than perfect, the regression coefficients, although determinate, possess large standard errors (relative to the coefficients themselves), which means the coefficients cannot be estimated with great precision or accuracy.
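A small simulation sketch of this effect, using synthetic data rather than the slide example: the same model is estimated once with roughly uncorrelated regressors and once with highly (but not perfectly) correlated ones, and the OLS standard errors are compared.

```python
import numpy as np

def ols_se(x1, x2, y):
    """Return the OLS standard errors of (intercept, b1, b2)."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
eps = rng.normal(size=n)

x2_indep = rng.normal(size=n)             # roughly uncorrelated with x1
x2_collin = x1 + rng.normal(0, 0.05, n)   # highly but not perfectly correlated with x1

y_indep = 1 + 2 * x1 + 3 * x2_indep + eps
y_collin = 1 + 2 * x1 + 3 * x2_collin + eps

print("SEs, low correlation :", ols_se(x1, x2_indep, y_indep))
print("SEs, high correlation:", ols_se(x1, x2_collin, y_collin))
# The slope standard errors are far larger in the collinear case,
# even though the estimates themselves remain unbiased.
```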
SOURCES OF
MULTICOLLINEARITY
1. The data collection method employed, for example, sampling
over a limited range of the values taken by the regressors in the
population.
2. Constraints on the model or in the population being sampled. For
example, in the regression of electricity consumption on income (X2)
and house size (X3) there is a physical constraint in the population in
that families with higher incomes generally have larger homes than
families with lower incomes.
3. An overdetermined model. This happens when the model
has more explanatory variables than the number of
observations. This could happen in medical research
where there may be a small number of patients about
whom information is collected on a large number of
variables.
WHAT IF
MULTICOLLINEARITY
EXISTS?
In the presence of multicollinearity the OLS estimators are still BLUE. That is, the estimates are still unbiased and consistent; however, the coefficient estimates will not have small standard errors.
CONSEQUENCES OF
MULTICOLLINEARITY

1. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult.
2. Because of consequence 1, the confidence intervals tend to be much wider, leading to the acceptance of the “zero null hypothesis” (i.e., that the true population coefficient is zero) more readily.
3. Also because of consequence 1, the t-ratio of one or more coefficients tends to be statistically insignificant.
4. Although the t-ratio of one or more coefficients is statistically insignificant, R², the overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the data.
DETECTING
MULTICOLLINEARITY
1. A high R² but few significant t-ratios – If the R² is high, the F-test will reject the hypothesis that the slope coefficients are jointly zero. However, if the individual t-tests show that the estimates are insignificant, this is a classic sign of multicollinearity.
2. High pairwise correlation coefficients – r is used as a measure of the strength and direction of the linear relationship between two variables. The range of r is between +1 and -1, and the sign gives the direction of the relationship between the two variables. An absolute value of 1 indicates perfect collinearity.
If two variables are perfectly positively correlated, then r = +1.
If two variables are perfectly negatively correlated, then r = -1.
If two variables are totally uncorrelated, then r = 0.

The problem with this criterion is that, although high
zero-order correlations may suggest collinearity, it is not
necessary that they be high to have collinearity in any
specific case. To put the matter somewhat technically,
high zero-order correlations are a sufficient but not a
necessary condition for the existence of
multicollinearity because it can exist even though the
zero-order or simple correlations are comparatively low
(say, less than 0.50).
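A minimal pandas sketch of this pairwise-correlation check; the column names (CPI, I, OILP, GASP) mirror the variables on the next slide, but the data generated here are synthetic placeholders, not the actual series behind the matrices shown.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
oilp = rng.normal(size=40)
df = pd.DataFrame({
    "CPI": 0.9 * oilp + rng.normal(0, 0.3, 40),  # constructed to correlate strongly with OILP
    "I": rng.normal(size=40),
    "OILP": oilp,
    "GASP": rng.normal(size=40),
})

corr = df.corr()
print(corr.round(3))

# Flag off-diagonal pairs whose |r| exceeds an arbitrary 0.8 screening threshold;
# here the CPI/OILP pair should be flagged.
mask = ~np.eye(len(corr), dtype=bool)
print(corr.where(mask).abs().gt(0.8))
```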
CORRELATION
COEFFICIENT
(correlations rounded to three decimal places)

             CPI       I        OILP     OILP_DOU...  GASP
CPI          1        -0.690    0.714    0.714       -0.421
I           -0.690     1       -0.526   -0.526        0.256
OILP         0.714    -0.526    1        1            0.108
OILP_DOU...  0.714    -0.526    1        1            0.108
GASP        -0.421     0.256    0.108    0.108        1
VARIANCE INFLATION
FACTOR
The variance inflation factor (VIF) is a method of detecting the
severity of multicollinearity by looking at the extent to which a given
explanatory variable can be explained by all the other explanatory
variables in the equation.
1. Run an auxiliary OLS regression with the explanatory variable Xi as the dependent variable and all the other explanatory variables as regressors, and obtain the Ri² from that regression.
2. Calculate the variance inflation factor for β̂i:

VIF(β̂i) = 1 / (1 − Ri²)
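A minimal numpy sketch of this two-step calculation on synthetic data: each regressor is regressed on the others, and VIF = 1 / (1 − Ri²) is computed from the auxiliary Ri². The variable names and threshold flag are illustrative only.

```python
import numpy as np

def vif(X, i):
    """VIF for column i of X: regress X[:, i] on the remaining columns
    (plus a constant) and return 1 / (1 - Ri^2)."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(3)
x1 = rng.normal(size=60)
x2 = x1 + rng.normal(0, 0.2, 60)   # strongly related to x1
x3 = rng.normal(size=60)
X = np.column_stack([x1, x2, x3])

for i, name in enumerate(["x1", "x2", "x3"]):
    v = vif(X, i)
    print(f"{name}: VIF = {v:.1f}", "(severe)" if v > 5 else "")
```

statsmodels also provides a variance_inflation_factor helper (in statsmodels.stats.outliers_influence) that performs the same auxiliary regressions.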
RULE OF THUMB

The higher the VIF, the higher the level of multicollinearity.
An Ri² of 1 indicates perfect multicollinearity and a VIF of infinity.
An Ri² of 0 indicates zero multicollinearity and a VIF of 1.
When VIF > 5, this indicates severe multicollinearity.
ADDRESSING
MULTICOLLINEARITY
Ignore it
Drop a variable
Increase the sample size
IGNORE IT
Sometimes removing an independent variable does not raise the remaining t-scores enough to make them significant or make the estimated coefficients reliable.

Removing a variable may lead to bias. When an important variable is removed this causes specification bias (an independent variable that is related to the dependent variable is omitted from the model).

Multicollinearity is essentially a data deficiency problem, and sometimes we have no choice over the data we have available for empirical analysis.
DROP A VARIABLE
Do this if two variables are practically the same. If one variable is basically redundant, it can be dropped.
But in dropping a variable from the model we may
be committing specification bias or specification
error. Specification bias arises from incorrect
specification of the model used in the analysis.
Thus, if economic theory says that income and
wealth should both be included in the model
explaining the consumption expenditure, dropping
the wealth variable would constitute specification
bias.
INCREASE SAMPLE SIZE

This can be done to reduce the degree of multicollinearity. Larger samples generally produce better results than smaller ones because they reduce the variances of the coefficient estimates, thus reducing the impact of multicollinearity.
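A small simulation sketch of this point, using synthetic data: the correlation between the regressors is held fixed while the sample size grows, and the standard error of one slope is recomputed at each size.

```python
import numpy as np

def slope_se(n, seed=0):
    """OLS standard error of the first slope with two highly correlated regressors."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(0, 0.1, n)   # the x1/x2 correlation does not change with n
    y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return np.sqrt(cov[1, 1])

# The regressors stay just as collinear, yet the standard error shrinks as n grows.
for n in (30, 120, 480):
    print(f"n = {n:4d}: SE of b1 = {slope_se(n):.3f}")
```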
Dependent Variable: GDP
Method: Least Squares
Date: 04/04/18  Time: 12:24
Sample: 2000 2015
Included observations: 16

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           -64163.95      21101.45      -3.040737      0.0112
OILP         632.2881       137.7339      4.590651      0.0008
GASP         5676.765      1501.525       3.780667      0.0030
CPI          1647.148       211.5550      7.785908      0.0000
I            1015.039      1054.620       0.962469      0.3565

R-squared            0.978961    Mean dependent var       121721.0
Adjusted R-squared   0.971310    S.D. dependent var       46417.10
S.E. of regression   7862.165    Akaike info criterion    21.02782
Sum squared resid    6.80E+08    Schwarz criterion        21.26925
Log likelihood      -163.2225    Hannan-Quinn criter.     21.04018
F-statistic          127.9582    Durbin-Watson stat       1.748444
Prob(F-statistic)    0.000000
Dependent Variable: GDP
Method: Least Squares
Date: 04/04/18  Time: 12:25
Sample: 2000 2015
Included observations: 16
HAC standard errors & covariance (Bartlett kernel, Newey-West fixed bandwidth = 3.0000)

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           -64163.95      20199.50      -3.176512      0.0088
OILP         632.2881       145.4368      4.347512      0.0012
GASP         5676.765      1628.689       3.485482      0.0051
CPI          1647.148       190.9724      8.625056      0.0000
I            1015.039      1142.040       0.888795      0.3931

R-squared            0.978961    Mean dependent var       121721.0
Adjusted R-squared   0.971310    S.D. dependent var       46417.10
S.E. of regression   7862.165    Akaike info criterion    21.02782
Sum squared resid    6.80E+08    Schwarz criterion        21.26925
Log likelihood      -163.2225    Hannan-Quinn criter.     21.04018
F-statistic          127.9582    Durbin-Watson stat       1.748444
Prob(F-statistic)    0.000000    Wald F-statistic         157.2823
                                 Prob(Wald F-statistic)   0.000000
CORRELATION MATRIX

(correlations rounded to three decimal places)

        CPI       I        GASP     OILP     OILP2
CPI     1        -0.690   -0.421    0.714    0.714
I      -0.690     1        0.256   -0.526   -0.526
GASP   -0.421     0.256    1        0.108    0.108
OILP    0.714    -0.526    0.108    1        1
OILP2   0.714    -0.526    0.108    1        1
