A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Variance Inflation Factor and Multicollinearity Diagnosis

College · Depth 115 in the knowledge graph

2 topics build on this

584 prerequisites beneath it

Core Idea

The Variance Inflation Factor (VIF) quantifies how much a coefficient's variance is inflated by multicollinearity: VIFⱼ = 1/(1 − Rⱼ²), where Rⱼ² is the R² from regressing regressor j on all the other regressors. VIF values above 5–10 are commonly treated as a sign of problematic multicollinearity that may require remediation.

Explainer

From your study of multicollinearity, you know the problem: when two or more regressors are highly linearly related, OLS cannot cleanly attribute variation in Y to one variable versus the other. The estimates are still unbiased — OLS is not broken — but they become imprecise. Standard errors blow up, t-statistics shrink, and coefficients can appear statistically insignificant even when the variables genuinely matter. What you need is a way to measure how severe this imprecision is for each coefficient individually. That is exactly what the Variance Inflation Factor provides.

The logic behind the VIF formula is elegant. For regressor j, run a separate regression: regress Xⱼ on all the *other* regressors (with an intercept) and record the R² from that auxiliary regression, calling it Rⱼ². This Rⱼ² measures how well the other regressors can predict Xⱼ, in other words how redundant Xⱼ is given the rest of the model. The VIF is then 1/(1 − Rⱼ²). When Rⱼ² = 0 (Xⱼ is uncorrelated with the other regressors), VIF = 1: no inflation, and the coefficient's variance is exactly what it would be if Xⱼ were orthogonal to everything else in the model. When Rⱼ² = 0.9 (90% of Xⱼ's variation is explained by the others), VIF = 10: the variance of β̂ⱼ is ten times larger than it would be without multicollinearity. The standard error is thus √10 ≈ 3.16 times larger, which shrinks the t-statistic by the same factor.
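The auxiliary-regression recipe can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper name `vif` and the simulated regressors `x1`, `x2`, `x3` are assumptions made up for the example. Here `x2` is built to have correlation 0.95 with `x1`, so the population VIF for both is 1/(1 − 0.95²) ≈ 10.3, while `x3` is independent noise.

```python
import numpy as np

def vif(X):
    """VIF of each column of X (n_samples x n_regressors, no intercept column).

    Hypothetical helper: regresses each column on all the others plus an
    intercept and applies VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # auxiliary design matrix
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated data (assumption for illustration only).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # nearly redundant with x1
x3 = rng.normal(size=n)                                      # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
print(vifs)  # large values for x1 and x2, close to 1 for x3
```

The same numbers can be read off the diagonal of the inverse of the regressors' correlation matrix, which is a convenient cross-check when there are many columns.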

The conventional thresholds (VIF > 5 raises concern, VIF > 10 indicates serious problems) are rules of thumb, not mathematical cutoffs. What they communicate is that once VIF exceeds these values, collinearity alone has made your coefficient estimates markedly less precise, and unless the sample is large, meaningful inference becomes difficult. A VIF of 25 means the standard error is five times larger than it would be in an orthogonal design; holding everything else fixed, you would need a sample roughly 25 times larger to achieve the same precision. Understanding this relationship makes the diagnosis concrete: a high VIF is not a mysterious pathology but a quantitative statement about how much statistical power you are losing.
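The arithmetic behind these thresholds is easy to tabulate. The sketch below uses hypothetical helper names (`implied_r2`, `se_multiplier`) and relies only on the formula VIF = 1/(1 − Rⱼ²) plus the fact that the standard error scales with the square root of the variance. The sample-size column assumes everything else (error variance, spread of Xⱼ) stays fixed.

```python
import math

def implied_r2(vif):
    """Auxiliary R_j^2 that produces a given VIF (inverts VIF = 1 / (1 - R^2))."""
    return 1.0 - 1.0 / vif

def se_multiplier(vif):
    """Factor by which the standard error grows relative to an orthogonal design."""
    return math.sqrt(vif)

for v in (1, 5, 10, 25):
    print(
        f"VIF={v:>2}  implied R_j^2={implied_r2(v):.2f}  "
        f"SE multiplier={se_multiplier(v):.2f}  "
        f"sample-size multiple for equal precision={v}"
    )
```

Read this way, the VIF > 5 rule corresponds to the other regressors explaining 80% of Xⱼ, and VIF > 10 to 90%.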

Remedies depend on the source of multicollinearity. If two regressors measure nearly the same thing conceptually, you can drop one or combine them into an index. If multicollinearity arises from the functional form (including both X and X² creates high correlation), centering the variable before squaring often helps: it lowers the correlation between the linear and squared terms, although it leaves the fitted values and the coefficient on the squared term unchanged. If you must retain both variables because theory requires them, the practical implication is humility: wide confidence intervals are honest, and you should report them as such rather than over-interpreting coefficient magnitudes. VIF does not tell you what to do, but it gives you a quantitative basis for understanding why your estimates are uncertain and how much of that uncertainty is due to collinearity.
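The effect of centering before squaring can be seen in a small simulation. This is a sketch under stated assumptions: the variable `x` is drawn uniformly on [10, 20] purely for illustration, and because there are only two regressors (X and X²), Rⱼ² is simply their squared correlation, so the VIF can be computed directly from it.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(10, 20, size=1000)   # strictly positive, so x and x**2 rise together

# Raw polynomial terms: x and x**2 are almost perfectly correlated.
raw_corr = np.corrcoef(x, x**2)[0, 1]
vif_raw = 1.0 / (1.0 - raw_corr**2)

# Center first, then square: for a roughly symmetric x the correlation nearly vanishes.
xc = x - x.mean()
centered_corr = np.corrcoef(xc, xc**2)[0, 1]
vif_centered = 1.0 / (1.0 - centered_corr**2)

print(f"raw:      corr={raw_corr:.4f}  VIF={vif_raw:.1f}")
print(f"centered: corr={centered_corr:.4f}  VIF={vif_centered:.3f}")
```

How much centering helps depends on the distribution of X: for a strongly skewed variable, the centered term and its square can remain noticeably correlated.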

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Independence of Events → Sampling Distributions → Standard Error of Estimators → Hypothesis Testing: Framework and Logic → P-values and Statistical Significance → Effect Size and Practical Significance → Hypothesis Testing: Framework and Logic → Z-Tests and T-Tests for Means → One-Sample Z-Test for Means → One-Sample and Two-Sample T-Tests → Inference in Linear Regression → Prediction Intervals in Regression → Linear Regression Basics → Residuals and Goodness of Fit (R²) → Simple (Bivariate) OLS Regression → Classical OLS Assumptions (Gauss-Markov) → Multiple Regression → Interpreting Regression Coefficients → Hypothesis Testing in Regression → F-Test and Joint Significance → R-Squared and Model Fit → Multicollinearity → Variance Inflation Factor and Multicollinearity Diagnosis

Longest path: 116 steps · 584 total prerequisite topics

Prerequisites (2)

Multicollinearity (hard) · Multiple Regression (soft)

Leads To (2)

Breusch-Godfrey Test for Serial Correlation (soft) · Multicollinearity: Detection Using VIF (soft)