A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Multicollinearity: Detection Using VIF

Graduate · Depth 117 in the knowledge graph

944 prerequisites beneath it

Core Idea

The Variance Inflation Factor VIFⱼ = 1 / (1 - Rⱼ²), where Rⱼ² is the R² from regressing predictor j on the other regressors, measures how much the variance of β̂ⱼ is inflated by collinearity with those regressors. Rules of thumb: VIF > 10 indicates severe multicollinearity; values of 5-10 suggest moderate concern. The correlation matrix and the condition number also reveal collinearity patterns.
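To see where the name comes from, it may help to write out the standard sampling variance of an OLS slope under homoskedastic errors in a model with an intercept. The symbols σ² (error variance) and SSTⱼ (total sample variation in predictor j) are notation introduced here for illustration; they do not appear in the text above.

```latex
\[
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\mathrm{SST}_j \,(1 - R_j^2)}
  = \frac{\sigma^2}{\mathrm{SST}_j} \cdot \mathrm{VIF}_j,
\qquad
\mathrm{SST}_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 .
\]
```

Because standard errors scale with the square root of the variance, a VIF of 10 corresponds to a standard error about √10 ≈ 3.16 times larger than it would be if predictor j were uncorrelated with the other regressors, holding σ² and SSTⱼ fixed.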

Explainer

From your study of multicollinearity, you know the core problem: when predictors move together, OLS has trouble distinguishing their individual effects on the outcome. The coefficient estimates become unreliable: large standard errors, wild sign flips when a variable is added or removed, and coefficients that are individually insignificant yet jointly significant. The Variance Inflation Factor gives you a precise, interpretable measure of how severely collinearity inflates the variance of each predictor's coefficient.

The intuition behind VIFⱼ = 1 / (1 - Rⱼ²) comes from an auxiliary regression: regress predictor j on all other predictors in your model. The R² from that auxiliary regression tells you how well the other predictors can "explain" predictor j — in other words, how redundant predictor j is. If Rⱼ² = 0, predictor j is orthogonal to all others, and VIF = 1 (no inflation). If Rⱼ² = 0.9, ninety percent of predictor j's variation is explained by the others, and VIF = 10 (ten times as much variance as you'd have with no collinearity). This connects directly to linear independence: a VIF approaching infinity signals that the columns of your design matrix X are nearly linearly dependent.
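The auxiliary-regression recipe can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper name `vif` and the simulated predictors `x1`, `x2`, `x3` are made up for the example, and the predictor matrix is assumed to exclude the intercept column (the helper adds one to each auxiliary regression).

```python
import numpy as np

def vif(X):
    """VIF for each column of X (shape n x p, intercept column NOT included)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]                          # predictor j plays the role of "y"
        others = np.delete(X, j, axis=1)          # all remaining predictors
        A = np.column_stack([np.ones(n), others])  # auxiliary design with intercept
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot                # auxiliary R_j^2
        out[j] = 1.0 / (1.0 - r2)                 # VIF_j = 1 / (1 - R_j^2)
    return out

# Hypothetical data: x2 is nearly a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # expect large values for x1 and x2, and a value near 1 for x3
```

Note that the VIF depends only on the predictors, never on the outcome, so it can be computed before the main regression is fit.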

The condition number, which you've encountered, provides a complementary diagnostic. For the design matrix X, it equals the square root of the ratio of the largest to the smallest eigenvalue of X'X (the condition number of X'X itself is that ratio without the square root). Large eigenvalues correspond to directions in predictor space with lots of variation; small eigenvalues correspond to near-collinear combinations. A condition number of X above 30, conventionally computed after scaling the columns to unit length, is often flagged as problematic. While VIF diagnoses collinearity for individual predictors, the condition number and eigenvalue decomposition reveal which combinations of predictors are nearly collinear, which is useful when the problem involves several predictors at once.
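A short NumPy sketch of the eigenvalue diagnostic follows. It computes the condition number of X as the square root of the eigenvalue ratio of X'X and inspects the eigenvector belonging to the smallest eigenvalue. The simulated data and variable names are hypothetical, the columns are scaled to unit length before the decomposition (an assumption of this sketch), and no intercept column is included, to keep the example small.

```python
import numpy as np

# Hypothetical data: x2 is nearly a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Scale each column to unit Euclidean length so units do not drive the result.
Xs = X / np.linalg.norm(X, axis=0)

# eigh handles symmetric matrices and returns eigenvalues in ascending order,
# with the matching eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(Xs.T @ Xs)

# Condition number of Xs: sqrt(largest eigenvalue / smallest eigenvalue of Xs'Xs).
kappa = np.sqrt(eigvals[-1] / eigvals[0])

# Eigenvector for the smallest eigenvalue: the near-collinear combination.
weakest = eigvecs[:, 0]

print("condition number:", kappa)
print("check against np.linalg.cond:", np.linalg.cond(Xs))
print("weakest direction (x1, x2, x3):", weakest)
```

In this setup the weakest direction loads on x1 and x2 with opposite signs and barely at all on x3, which is how the decomposition points to the specific combination (roughly x1 minus x2) that the data cannot pin down.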

The harder question is what to do about multicollinearity once detected. OLS remains unbiased — multicollinearity doesn't cause bias, only imprecision. If your goal is prediction rather than causal inference, high VIFs may be tolerable. For causal interpretation, solutions include dropping one of a pair of highly correlated variables, constructing a composite index, using principal components, or collecting more data to increase precision. The key diagnostic insight is this: if removing one variable substantially changes the coefficients on others, you're seeing collinearity in action — the model is not identifying individual effects cleanly.
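The "remove one variable and watch the others" check can be illustrated with simulated data. The sketch below is a toy example under stated assumptions: the helper `ols`, the data-generating process (y depends equally on two nearly identical predictors), and all variable names are invented for illustration, and the standard errors use the classical homoskedastic formula.

```python
import numpy as np

def ols(X, y):
    """OLS with an intercept; returns coefficients and classical standard errors."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])   # unbiased error variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return beta, se

# Hypothetical data: x2 is nearly a copy of x1, and y depends on both.
rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

b_full, se_full = ols(np.column_stack([x1, x2]), y)  # both predictors
b_red, se_red = ols(x1, y)                           # x2 dropped

print("full model    coef on x1:", b_full[1], "se:", se_full[1])
print("reduced model coef on x1:", b_red[1], "se:", se_red[1])
```

With x2 dropped, the coefficient on x1 absorbs the effect of both variables and its standard error shrinks sharply. That is the trade-off behind the "drop one variable" remedy: precision improves, but the coefficient no longer estimates the effect of x1 holding x2 fixed.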

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Independence of Events → Sampling Distributions → Standard Error of Estimators → Hypothesis Testing: Framework and Logic → P-values and Statistical Significance → Effect Size and Practical Significance → Hypothesis Testing: Framework and Logic → Z-Tests and T-Tests for Means → One-Sample Z-Test for Means → One-Sample and Two-Sample T-Tests → Inference in Linear Regression → Prediction Intervals in Regression → Linear Regression Basics → Residuals and Goodness of Fit (R²) → Simple (Bivariate) OLS Regression → Classical OLS Assumptions (Gauss-Markov) → Multiple Regression → Interpreting Regression Coefficients → Hypothesis Testing in Regression → F-Test and Joint Significance → R-Squared and Model Fit → Multicollinearity → Variance Inflation Factor and Multicollinearity Diagnosis → Breusch-Godfrey Test for Serial Correlation → Multicollinearity: Detection Using VIF

Longest path: 118 steps · 944 total prerequisite topics

Prerequisites (7)

Multicollinearity (hard), Eigenvalues and Eigenvectors (hard), Condition Number of a Matrix (soft), Linear Independence and Linear Dependence (soft), Variance Inflation Factor and Multicollinearity Diagnosis (soft), Graphical Diagnostics: Residual Plots and QQ Plots (soft), Breusch-Godfrey Test for Serial Correlation (soft)

Leads To (0)

No topics depend on this one yet.