This document discusses multicollinearity, which occurs when independent variables in a regression model are highly correlated, making their individual effects on the dependent variable difficult to isolate. It defines perfect and imperfect multicollinearity and describes consequences of multicollinearity, such as coefficients having large standard errors and significance tests suggesting that variables which may actually be important predictors are insignificant. The document also provides methods for detecting and addressing multicollinearity.
2. Homoscedasticity.
3. Absence of autocorrelation.
4. The error terms are normally distributed; that is, εi ~ N(0, σ²).
5. The independent variables are fixed (non-stochastic).
6. No multicollinearity. The independent variables are not strongly correlated; the correlation between any two regressors should be close to zero.

WHAT IS MULTICOLLINEARITY?

Multicollinearity refers to the case in which two or more explanatory variables in the regression model are highly correlated, making it difficult or impossible to isolate their individual effects on the dependent variable. For example, in the consumption function
Y (consumption) = β0 + β1(Income) + β2(Wealth) + ɛ
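Income and wealth tend to move together, so their separate effects on consumption are hard to disentangle. As an illustration, here is a minimal sketch in Python using statsmodels on simulated data; all variable names and numbers are invented for illustration, not taken from the document's data set.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: wealth is largely driven by income, so the two
# regressors are highly (but not perfectly) correlated.
rng = np.random.default_rng(42)
income = rng.normal(100, 15, 50)
wealth = 10 * income + rng.normal(0, 20, 50)
consumption = 5 + 0.6 * income + 0.01 * wealth + rng.normal(0, 5, 50)

X = sm.add_constant(pd.DataFrame({"income": income, "wealth": wealth}))
model = sm.OLS(consumption, X).fit()
print(model.summary())  # high overall R2, yet large standard errors and
                        # individually insignificant income/wealth coefficients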
TYPES OF MULTICOLLINEARITY

Perfect multicollinearity – two or more of the independent variables are perfectly correlated, meaning that changes in one independent variable can be exactly explained by changes in another. For example, two independent variables measuring the same distance in miles and in kilometres are perfectly correlated. If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite (see the numerical sketch below).

IMPERFECT MULTICOLLINEARITY

Imperfect multicollinearity can be defined as a linear relationship between two or more independent variables that is so strong that it can significantly affect the estimation of the coefficients of the variables: two or more of the explanatory variables are highly, but not perfectly, correlated. If multicollinearity is less than perfect, the regression coefficients, although determinate, possess large standard errors (relative to the coefficients themselves), which means the coefficients cannot be estimated with great precision or accuracy.

SOURCES OF MULTICOLLINEARITY

1. The data collection method employed, for example, sampling over a limited range of the values taken by the regressors in the population.
2. Constraints on the model or in the population being sampled. For example, in the regression of electricity consumption on income (X2) and house size (X3) there is a physical constraint in the population, in that families with higher incomes generally have larger homes than families with lower incomes.
3. An overdetermined model. This happens when the model has more explanatory variables than observations. It can arise in medical research, where information on a large number of variables may be collected for a small number of patients.

WHAT IF MULTICOLLINEARITY EXISTS?

In the presence of (imperfect) multicollinearity the OLS estimators are still BLUE; that is, the estimates remain unbiased and consistent. However, the coefficient estimates will not have small standard errors.
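Returning to the miles/kilometres example above, the indeterminacy under perfect multicollinearity can be seen numerically. The sketch below (Python with NumPy, using invented mileage figures) shows that the design matrix loses rank, so the OLS normal equations have no unique solution.

```python
import numpy as np

# Perfect multicollinearity: kilometres = 1.609344 * miles (hypothetical
# mileage figures), so the columns of X are linearly dependent.
miles = np.array([10.0, 25.0, 40.0, 55.0, 70.0])
km = 1.609344 * miles
X = np.column_stack([np.ones_like(miles), miles, km])

print(np.linalg.matrix_rank(X))  # 2, not 3: one regressor is redundant
print(np.linalg.cond(X.T @ X))   # enormous condition number: X'X is (near-)singular,
                                 # so the coefficients are indeterminate
```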
CONSEQUENCES OF MULTICOLLINEARITY

1. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult.
2. Because of consequence 1, the confidence intervals tend to be much wider, leading to the acceptance of the "zero null hypothesis" (i.e., that the true population coefficient is zero) more readily.
3. Also because of consequence 1, the t-ratio of one or more coefficients tends to be statistically insignificant.
4. Although the t-ratio of one or more coefficients is statistically insignificant, R2, the overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the data.

DETECTING MULTICOLLINEARITY

1. A high R2 but few significant t-ratios. If R2 is high, the F-test will reject the hypothesis that the slope coefficients are jointly insignificant. If, at the same time, the individual t-tests show the estimates to be insignificant, this is a classic sign of multicollinearity.
2. High pairwise correlation coefficients. The correlation coefficient r is used as a measure of the strength and direction of the linear relationship between two variables. The range of r is between +1 and -1, and the sign gives the direction of the relationship; a value of +1 or -1 indicates perfect collinearity.
If two variables are perfectly positively correlated, then r = +1
If two variables are perfectly negatively correlated, then r = -1
If two variables are totally uncorrelated, then r = 0
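A pairwise correlation matrix like the one below can be computed directly with pandas. This sketch uses invented series whose names echo the EViews table that follows; oilp_dou is oilp rescaled, so the pair is perfectly correlated.

```python
import numpy as np
import pandas as pd

# Invented series: oilp_dou is oilp measured on a doubled scale, so the
# two are perfectly correlated (r = 1); gasp is only loosely related.
rng = np.random.default_rng(1)
df = pd.DataFrame({"oilp": rng.normal(60.0, 8.0, 100)})
df["oilp_dou"] = 2.0 * df["oilp"]
df["gasp"] = 0.5 * df["oilp"] + rng.normal(0.0, 4.0, 100)

print(df.corr())  # pairwise correlation matrix; every r lies in [-1, +1]
```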
The problem with this criterion is that, although high zero-order correlations may suggest collinearity, they need not be high for collinearity to be present in any specific case. To put the matter somewhat technically, high zero-order correlations are a sufficient but not a necessary condition for the existence of multicollinearity, because it can exist even though the zero-order or simple correlations are comparatively low (say, less than 0.50).

CORRELATION COEFFICIENT

              CPI             I               OILP            OILP_DOU...     GASP
CPI           1               -0.6901607...   0.71416271...   0.71416271...   -0.4209678...
I             -0.6901607...   1               -0.5261076...   -0.5261076...   0.25584676...
OILP          0.71416271...   -0.5261076...   1               1               0.10784274...
OILP_DOU...   0.71416271...   -0.5261076...   1               1               0.10784274...
GASP          -0.4209678...   0.25584676...   0.10784274...   0.10784274...   1

(Note that OILP and OILP_DOU... have r = 1: a case of perfect collinearity.)

VARIANCE INFLATION FACTOR

The variance inflation factor (VIF) is a method of detecting the severity of multicollinearity by looking at the extent to which a given explanatory variable can be explained by all the other explanatory variables in the equation.
1. Run an OLS regression with each explanatory variable Xi in turn as the dependent variable and the remaining explanatory variables as regressors, and obtain the Ri2 of this auxiliary regression.
2. Calculate the variance inflation factor for β̂i:
VIF(β̂i) = 1 / (1 – Ri2)
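In practice the auxiliary regressions need not be run by hand; statsmodels ships a variance_inflation_factor helper. A minimal sketch on invented income/wealth data (names and numbers are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical income/wealth data with strong (imperfect) collinearity.
rng = np.random.default_rng(0)
income = rng.normal(50.0, 10.0, 200)
wealth = 5.0 * income + rng.normal(0.0, 5.0, 200)
exog = sm.add_constant(pd.DataFrame({"income": income, "wealth": wealth}))

# VIF for each regressor: regress it on the others and apply 1 / (1 - Ri2).
for i, name in enumerate(exog.columns):
    if name != "const":
        print(name, variance_inflation_factor(exog.values, i))
```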
RULE OF THUMB

The higher the VIF, the higher the level of multicollinearity. An Ri2 of 1 indicates perfect multicollinearity and a VIF of infinity; an Ri2 of 0 indicates zero multicollinearity and a VIF of 1. A VIF greater than 5 indicates severe multicollinearity.

ADDRESSING MULTICOLLINEARITY

1. Ignore it
2. Drop a variable
3. Increase the sample size

IGNORE IT

Sometimes removing an independent variable does not raise the t-scores enough to make them significant, or does not make the estimated coefficients reliable, so dropping variables is no cure.
Removing a variable may also lead to bias: when an important variable is removed, the result is specification bias (omitting from the model an independent variable that is related to the dependent variable).
Finally, multicollinearity is essentially a data deficiency problem, and sometimes we have no choice over the data available for empirical analysis.

DROP A VARIABLE

Do this if two variables are practically the same: if one variable is basically redundant, it can be dropped. But in dropping a variable from the model we may be committing specification bias or specification error, which arises from incorrect specification of the model used in the analysis. Thus, if economic theory says that income and wealth should both be included in the model explaining consumption expenditure, dropping the wealth variable would constitute specification bias. (A sketch of how dropping a redundant variable lowers the VIFs follows.)
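As a hedged illustration of the trade-off, the sketch below (invented data and variable names) shows the VIFs collapsing toward 1 once the near-duplicate regressor is dropped; whether the drop is acceptable still depends on theory, as noted above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Invented data in which wealth is a near-duplicate of income.
rng = np.random.default_rng(3)
df = pd.DataFrame({"income": rng.normal(100.0, 15.0, 200)})
df["wealth"] = 10.0 * df["income"] + rng.normal(0.0, 20.0, 200)
df["age"] = rng.normal(40.0, 10.0, 200)

def vifs(frame):
    exog = sm.add_constant(frame)
    return {col: round(variance_inflation_factor(exog.values, i), 2)
            for i, col in enumerate(exog.columns) if col != "const"}

print(vifs(df))                           # income and wealth show very high VIFs
print(vifs(df.drop(columns=["wealth"])))  # after dropping wealth, VIFs fall toward 1
```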
INCREASE SAMPLE SIZE

Increasing the sample size can reduce the degree of the problem: larger data sets produce better results than smaller samples because the additional observations reduce the variances of the estimated coefficients, offsetting the effect of the collinearity.

[EViews output: Dependent Variable: GDP; Method: Least Squares; Sample: 2000–2015; Included observations: 16]
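A hypothetical sketch of this effect (all numbers invented): the regressors stay just as collinear, yet the standard error of the income coefficient shrinks steadily as the sample grows from 16 observations, as in the output above, to larger samples.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical illustration: collinearity between income and wealth is held
# fixed while the sample size n grows.
rng = np.random.default_rng(7)
for n in (16, 160, 1600):
    income = rng.normal(100.0, 15.0, n)
    wealth = 10.0 * income + rng.normal(0.0, 20.0, n)
    cons = 5.0 + 0.6 * income + 0.01 * wealth + rng.normal(0.0, 5.0, n)
    X = sm.add_constant(np.column_stack([income, wealth]))
    res = sm.OLS(cons, X).fit()
    print(n, res.bse[1])  # standard error of the income coefficient shrinks with n
```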