
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Introduction to Multiple Linear Regression

College · Depth 82 in the knowledge graph

200 topics build on this

327 prerequisites beneath it



Core Idea

Multiple linear regression extends simple regression to many predictors: E[Y|X₁,...,Xₚ] = β₀ + β₁X₁ + ... + βₚXₚ. Coefficients represent partial effects (adjusted for other predictors). Model selection and multicollinearity are key concerns.

How It's Best Learned

Fit multiple regression models with software. Compare nested models using F-tests. Examine variance inflation factors (VIF) for multicollinearity. Interpret partial slopes as adjusted effects. Use visualization and residual diagnostics.
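As one way to practice the nested-model comparison mentioned above, here is a minimal sketch using only NumPy. The simulated data, the coefficient values, and the helper name `rss` are assumptions made for illustration. The reduced model uses one predictor, the full model adds two more, and the F statistic measures how much the residual sum of squares drops per added predictor relative to the full model's residual variance.

```python
import numpy as np

def rss(columns, y):
    """Fit OLS with an intercept; return (residual sum of squares, number of parameters)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid), X.shape[1]

# Simulated data (assumed for illustration): x3 has no real effect on y.
rng = np.random.default_rng(4)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

rss_reduced, k_reduced = rss([x1], y)        # reduced model: intercept + x1
rss_full, k_full = rss([x1, x2, x3], y)      # full model: intercept + x1, x2, x3

q = k_full - k_reduced                       # number of added predictors
df_resid = n - k_full                        # residual degrees of freedom, full model
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
print("F statistic:", f_stat, "on", q, "and", df_resid, "degrees of freedom")
```

A p-value would come from the upper tail of an F distribution with `q` and `df_resid` degrees of freedom, for example via `scipy.stats.f.sf(f_stat, q, df_resid)` if SciPy is available.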

Common Misconceptions

Interpreting regression coefficients causally without experimentation. Ignoring multicollinearity and its effects on interpretability. Believing all significant predictors should be included. Overfitting with too many predictors.

Explainer

Simple linear regression asks: how does Y change with X? Multiple regression asks a harder question: how does Y change with X₁ *holding X₂, X₃, ... constant*? This "holding everything else constant" idea is the heart of the model. The equation E[Y|X₁,...,Xₚ] = β₀ + β₁X₁ + ... + βₚXₚ looks like a straight line extended to higher dimensions: a flat hyperplane over the p-dimensional predictor space, sitting in the (p+1)-dimensional space of predictors plus response. Each slope βⱼ is a partial slope: it tells you the expected change in Y for a one-unit increase in Xⱼ when all other predictors are held fixed.
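A minimal sketch of fitting this model by least squares, assuming simulated data with made-up true coefficients (β₀ = 1, β₁ = 2, β₂ = −3). It also checks the partial-slope reading: moving X₁ up by one unit while X₂ stays fixed changes the prediction by exactly the fitted β₁.

```python
import numpy as np

# Simulated data (assumed for illustration): Y = 1 + 2*X1 - 3*X2 + noise
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Two points that differ only in X1 (by one unit); X2 is held at 1.0 in both
point_a = np.array([1.0, 0.0, 1.0])
point_b = np.array([1.0, 1.0, 1.0])
change = point_b @ beta_hat - point_a @ beta_hat

print("estimated coefficients:", beta_hat)
print("change in prediction:", change, "fitted partial slope:", beta_hat[1])
```

The estimates should land close to the assumed true values, and the change in prediction matches the fitted coefficient on X₁ regardless of where X₂ is held.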

The key insight is that partial slopes can differ dramatically from simple slopes. Suppose you regress exam scores on study hours and find a positive slope. Now add a second predictor, prior GPA. The coefficient on study hours shrinks, not because study hours matter less, but because some of their apparent effect was actually attributable to GPA (better students both study more *and* score higher). Multiple regression disentangles these associations. This is called statistical control: by including a variable in the model, you partial out its contribution, isolating the unique relationship of each predictor with the outcome.
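The study-hours example can be simulated. In this sketch the data-generating numbers are assumptions chosen for illustration: GPA raises both study hours and exam score, and the true effect of one extra study hour is 1 point. Regressing score on hours alone absorbs part of GPA's contribution; adding GPA brings the coefficient back toward 1.

```python
import numpy as np

def ols(columns, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated students (assumed numbers, for illustration only)
rng = np.random.default_rng(1)
n = 2000
gpa = rng.normal(3.0, 0.5, size=n)
hours = 2.0 * gpa + rng.normal(size=n)          # stronger students study more
score = 50 + 1.0 * hours + 8.0 * gpa + rng.normal(scale=2.0, size=n)

simple = ols([hours], score)         # score ~ hours
multiple = ols([hours, gpa], score)  # score ~ hours + gpa

print("simple slope on hours: ", simple[1])
print("partial slope on hours:", multiple[1])
```

Under these assumptions the simple slope is well above the partial slope, because hours acts as a stand-in for GPA until GPA enters the model.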

Multicollinearity occurs when predictors are strongly correlated with each other. Intuitively: if X₁ and X₂ move almost in lockstep, the model cannot tell which one is doing the work. Mathematically, the coefficient estimates become unstable: standard errors grow large and slopes vary wildly across similar datasets. The variance inflation factor (VIF) quantifies this instability for each predictor: VIFⱼ = 1/(1 − Rⱼ²), where Rⱼ² is the R² from regressing Xⱼ on the other predictors. Common rules of thumb treat a VIF above 5 (or, more leniently, above 10) as a warning sign. Remedies include dropping one of the correlated predictors, combining them (e.g., via PCA), or collecting more data. Multicollinearity does not bias the coefficient estimates or the predictions from the model as a whole; it inflates the variance of individual coefficients and so undermines their interpretability.
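A minimal sketch of computing VIFs by hand, using the standard definition VIFⱼ = 1/(1 − Rⱼ²), where Rⱼ² comes from regressing predictor j on the remaining predictors. The helper names `r_squared` and `vif` and the simulated predictors are assumptions for illustration: `x2` is built as a near copy of `x1`, while `x3` is unrelated to both.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on the columns of X plus an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def vif(predictors):
    """predictors: (n, p) array. Returns the VIF of each column."""
    values = []
    for j in range(predictors.shape[1]):
        others = np.delete(predictors, j, axis=1)   # all columns except j
        values.append(1.0 / (1.0 - r_squared(others, predictors[:, j])))
    return np.array(values)

# Simulated predictors (assumed for illustration)
rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)                   # unrelated to x1 and x2

vifs = vif(np.column_stack([x1, x2, x3]))
print("VIFs for x1, x2, x3:", vifs)
```

The two near-duplicate predictors get large VIFs, while the unrelated one stays close to 1, the minimum possible value.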

Model selection, meaning the choice of which predictors to include, is one of the central practical challenges. Adding more predictors never lowers R² on the training data (and almost always raises it), but can hurt predictive accuracy on new data (overfitting). Adjusted R², AIC, and cross-validation guard against this by penalizing model complexity or by measuring performance on held-out data. The deeper issue is conceptual: a model with 20 predictors and 25 observations is mostly fitting noise, not signal. A common rule of thumb is roughly 10–20 observations per predictor for stable estimates. When in doubt, prefer the simpler model that captures the essential relationships without chasing every fluctuation in the data.
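The behavior of R² on training data can be checked with a small simulation. This sketch assumes 25 observations, one genuinely useful predictor, and 19 pure-noise predictors added one at a time; the helper name `fit_r2` is hypothetical. Adjusted R² is computed as 1 − (1 − R²)(n − 1)/(n − p − 1), which charges a price for each extra predictor.

```python
import numpy as np

def fit_r2(X, y):
    """Return (R-squared, adjusted R-squared) for an OLS fit with an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    p = design.shape[1] - 1                       # predictors, excluding intercept
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

# Simulated data (assumed for illustration): only x_real is related to y
rng = np.random.default_rng(3)
n = 25
x_real = rng.normal(size=n)
y = 2.0 * x_real + rng.normal(size=n)
noise = rng.normal(size=(n, 19))                  # predictors unrelated to y

results = []
for k in range(20):                               # k = number of noise predictors added
    X = np.column_stack([x_real, noise[:, :k]])
    results.append(fit_r2(X, y))

for k, (r2, adj) in enumerate(results):
    print(f"{1 + k:2d} predictors  R2={r2:.3f}  adjusted R2={adj:.3f}")
```

Training R² climbs toward 1 as noise predictors pile up, even though none of them carries information about y. Adjusted R² need not follow, since each added predictor has to reduce the residual sum of squares enough to pay for the lost degree of freedom.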


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Vectors in Two Dimensions → Vector Operations: Addition, Subtraction, and Scalar Multiplication → Dot Product (Inner Product in R^n) → Inner Product Spaces → Orthogonality → Orthogonal Projections → Orthogonal Projections and Least Squares Approximation → Linear Regression and Least Squares Estimation → Introduction to Multiple Linear Regression

Longest path: 83 steps · 327 total prerequisite topics

Prerequisites (1)

Linear Regression and Least Squares Estimation (hard)

Leads To (2)

Assumptions in Linear Regression (soft)
Inference in Linear Regression (soft)