Variance inflation factor

In statistics, the variance inflation factor (VIF) quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance of an estimated regression coefficient (the square of the estimate's standard deviation) is increased because of collinearity.

Definition

Consider the following linear model with k independent variables:

Y = β0 + β1 X1 + β2 X2 + ... + βk Xk + ε.

The standard error of the estimate of βj is the square root of the (j+1, j+1) element of s2(X′X)−1, where s is the standard error of the estimate (SEE; SEE2 is an unbiased estimator of the true variance of the error term, σ2) and X is the regression design matrix, a matrix such that Xi, j+1 is the value of the jth covariate for the ith case or observation, and Xi, 1 equals 1 for all i. It turns out that the square of this standard error, the variance of the estimate of βj, can be equivalently expressed as[citation needed]

var(β̂j) = [s2 / ((n − 1) var(Xj))] × [1 / (1 − Rj2)]

where Rj2 is the multiple R2 for the regression of Xj on the other covariates (a regression that does not involve the response variable Y). This identity separates the influences of several distinct factors on the variance of the coefficient estimate:

  • s2: greater scatter in the data around the regression surface leads to proportionately more variance in the coefficient estimates
  • n: greater sample size results in proportionately less variance in the coefficient estimates
  • var(Xj): greater variability in a particular covariate leads to proportionately less variance in the corresponding coefficient estimate

The remaining term, 1 / (1 − Rj2), is the VIF. It reflects all other factors that influence the uncertainty in the coefficient estimates. The VIF equals 1 when the design matrix is orthogonal, and is greater than or equal to 1 when the design matrix is not orthogonal. It is invariant to the scaling of the variables (that is, we could scale each variable Xj by a constant cj without changing the VIF).
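
As a concrete illustration of the identity above, the following NumPy sketch (an illustrative example, not drawn from this article's references) simulates correlated covariates, fits the model by ordinary least squares, and checks that each diagonal element of s2(X′X)−1 matches the decomposition s2 / ((n − 1) var(Xj)) × VIFj:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 3

    # Correlated covariates so the VIFs are clearly above 1.
    Z = rng.normal(size=(n, k))
    Z[:, 1] = 0.8 * Z[:, 0] + 0.6 * Z[:, 1]       # make X2 correlated with X1
    X = np.column_stack([np.ones(n), Z])          # design matrix with intercept column
    beta = np.array([1.0, 2.0, -1.0, 0.5])
    y = X @ beta + rng.normal(scale=1.5, size=n)

    # OLS fit and the usual unbiased estimate s^2 of the error variance.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - X.shape[1])
    var_beta = s2 * np.linalg.inv(X.T @ X).diagonal()   # diagonal of s^2 (X'X)^-1

    for j in range(1, k + 1):                     # columns 1..k are the covariates
        xj = X[:, j]
        others = np.delete(X, j, axis=1)          # intercept plus the other covariates
        gamma, *_ = np.linalg.lstsq(others, xj, rcond=None)
        e = xj - others @ gamma
        r2_j = 1.0 - (e @ e) / np.sum((xj - xj.mean()) ** 2)
        vif_j = 1.0 / (1.0 - r2_j)
        decomposed = s2 / ((n - 1) * xj.var(ddof=1)) * vif_j
        print("X%d: direct=%.6f  decomposed=%.6f  VIF=%.2f" % (j, var_beta[j], decomposed, vif_j))

The two variance computations agree to floating-point precision, since the identity is exact whenever the auxiliary regression includes an intercept.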

Calculation and analysis

The VIF can be calculated and analyzed in three steps:

Step one

Calculate k different VIFs, one for each Xi, by first running an ordinary least squares regression that has Xi as a function of all the other explanatory variables in the first equation.
If i = 1, for example, the equation would be

X1 = c0 + α2 X2 + α3 X3 + ... + αk Xk + e

where c0 is a constant, α2, ..., αk are regression coefficients, and e is the error term.
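
For illustration, step one can be carried out with any ordinary least squares routine. The short NumPy sketch below (a hypothetical helper, not part of this article's sources) regresses the i-th explanatory variable on a constant and all the remaining explanatory variables and returns the resulting R2:

    import numpy as np

    def auxiliary_r_squared(X, i):
        # R^2 from regressing column i of X on a constant plus all other columns.
        # X holds the explanatory variables only (no intercept column).
        n = X.shape[0]
        xi = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xi, rcond=None)
        resid = xi - others @ coef
        return 1.0 - (resid @ resid) / np.sum((xi - xi.mean()) ** 2)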

Step two

Then, calculate the VIF for the estimate of βi with the following formula:

VIFi = 1 / (1 − Ri2)

where Ri2 is the coefficient of determination of the regression equation in step one (the regression with Xi on the left-hand side and all the other explanatory variables on the right-hand side).
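
Continuing the sketch from step one, the VIF is then a one-line computation (again illustrative; auxiliary_r_squared is the hypothetical helper defined above):

    def vif(X, i):
        # VIF_i = 1 / (1 - R_i^2), using the auxiliary regression from step one.
        return 1.0 / (1.0 - auxiliary_r_squared(X, i))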

Step three

Analyze the magnitude of multicollinearity by considering the size of VIFi. A common rule of thumb is that if VIFi > 5 then multicollinearity is high. A cutoff of 10 has also been proposed (see the Kutner book referenced below).

Some software calculates the tolerance, which is just the reciprocal of the VIF. The choice of which to use is a matter of personal preference of the researcher.
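
Putting the three steps together, a small report such as the sketch below (built on the hypothetical vif helper above; the cutoffs and output format are only for illustration) prints each VIF, its tolerance, and a flag based on the rule-of-thumb values of 5 and 10:

    def vif_report(X, names=None):
        # One line per explanatory variable: VIF, tolerance = 1/VIF, and a
        # flag based on the common cutoffs of 5 and 10 mentioned above.
        k = X.shape[1]
        names = names or ["X%d" % (i + 1) for i in range(k)]
        for i in range(k):
            v = vif(X, i)
            flag = "high (>10)" if v > 10 else "elevated (>5)" if v > 5 else "ok"
            print("%s: VIF=%.2f  tolerance=%.3f  %s" % (names[i], v, 1.0 / v, flag))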

Interpretation

The square root of the variance inflation factor indicates how much larger the standard error of a coefficient is than it would be if that variable were uncorrelated with the other independent variables in the equation.

Example
If the variance inflation factor of an independent variable were 5.27 (√5.27 ≈ 2.3), this means that the standard error for the coefficient of that independent variable would be 2.3 times as large as it would be if that independent variable were uncorrelated with the other independent variables.
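
This relationship can also be checked numerically. The sketch below (illustrative, with assumed argument names; X_with_const is a design matrix whose first column is the constant) compares the actual standard error of a coefficient with the standard error it would have, for the same s and var(Xj), if that covariate were uncorrelated with the others; the ratio equals the square root of the VIF:

    import numpy as np

    def se_inflation_ratio(X_with_const, y, j):
        # Ratio of the actual standard error of the estimate of beta_j to the
        # standard error it would have if column j were uncorrelated with the
        # other columns (holding s and var(X_j) fixed); equals sqrt(VIF_j).
        n, p = X_with_const.shape
        beta_hat, *_ = np.linalg.lstsq(X_with_const, y, rcond=None)
        resid = y - X_with_const @ beta_hat
        s2 = resid @ resid / (n - p)
        se_actual = np.sqrt(s2 * np.linalg.inv(X_with_const.T @ X_with_const)[j, j])
        se_orthogonal = np.sqrt(s2 / ((n - 1) * X_with_const[:, j].var(ddof=1)))
        return se_actual / se_orthogonal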

References

  • Longnecker, M.T. & Ott, R.L.: A First Course in Statistical Methods, p. 615. Thomson Brooks/Cole, 2004.
  • Studenmund, A.H.: Using Econometrics: A Practical Guide, 5th edition, pp. 258–259. Pearson International Edition, 2006.
  • Hair, J.F., Anderson, R., Tatham, R.L. & Black, W.C.: Multivariate Data Analysis. Prentice Hall: Upper Saddle River, N.J., 2006.
  • Marquardt, D.W.: "Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation", Technometrics 12(3), 591, 605–07, 1970.
  • Allison, P.D.: Multiple Regression: A Primer, p. 142. Pine Forge Press: Thousand Oaks, C.A., 1999.
  • Kutner, M.H., Nachtsheim, C.J. & Neter, J.: Applied Linear Regression Models, 4th edition, McGraw-Hill Irwin, 2004.