A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Inference in Linear Regression

College · Depth 103 in the knowledge graph

197 topics build on this

542 prerequisites beneath it

Core Idea

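Under the standard regression assumptions, including normally distributed errors, the estimated regression coefficients are normally distributed around the true coefficients. Because the error variance must be estimated from the data, confidence intervals and tests for the slope use the t-distribution. The F-test assesses overall model significance. All of these procedures depend on assumptions about the errors.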

How It's Best Learned

Examine regression output with coefficients, standard errors (SE), t-statistics, and p-values. Test whether the slope differs from zero. Construct confidence intervals for the slope and intercept. Compare the F-test to the t-test for a single predictor.

Explainer

In simple linear regression, you model the mean of y as β₀ + β₁x and fit the line ŷ = β̂₀ + β̂₁x to data by ordinary least squares (OLS). The estimated coefficients β̂₀ and β̂₁ come from one particular sample; a different sample would give different values. To make statements about the true population relationship, you need to understand the sampling distribution of β̂₁. The model assumes the errors εᵢ = yᵢ - (β₀ + β₁xᵢ) are independent and normally distributed with mean zero and constant variance σ². The OLS estimators are linear combinations of the y values. Treating the x values as fixed, each yᵢ is the constant β₀ + β₁xᵢ plus a normal error, so each yᵢ is normal, and a linear combination of independent normal random variables is itself normal. This is why β̂₁ is normally distributed with mean β₁ (it is unbiased) and variance σ²/Σ(xᵢ - x̄)².
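The sampling distribution of the slope can be checked with a small simulation. The sketch below is illustrative only: the population values BETA0, BETA1, SIGMA and the x grid are assumptions chosen for the example, not values from any real data set. It redraws normal errors many times on the same fixed x values and compares the mean and variance of the fitted slopes with β₁ and σ²/Σ(xᵢ - x̄)².

```python
import random
import statistics

def ols_slope(x, y):
    """Return the least-squares slope for paired lists x and y."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Assumed population values, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 1.5, 2.0

x = [float(i) for i in range(20)]        # fixed design: x = 0, 1, ..., 19
x_bar = statistics.fmean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)

rng = random.Random(0)                   # seeded so the run is reproducible
slopes = []
for _ in range(5000):
    # Each replicate keeps the same x values and draws fresh normal errors.
    y = [BETA0 + BETA1 * xi + rng.gauss(0.0, SIGMA) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = statistics.fmean(slopes)
var_slope = statistics.variance(slopes)
theory_var = SIGMA ** 2 / sxx            # sigma^2 / sum((x_i - x_bar)^2)

print(f"mean of simulated slopes:     {mean_slope:.4f} (true slope {BETA1})")
print(f"variance of simulated slopes: {var_slope:.5f} (theory {theory_var:.5f})")
```

With enough replicates, the two printed pairs should agree closely; a histogram of `slopes` would also look bell-shaped.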

Because σ² is unknown, substitute the mean squared error s² = SSE/(n-2), where SSE is the sum of squared residuals. The ratio (β̂₁ - β₁)/(s/√Σ(xᵢ - x̄)²) follows a t-distribution with n-2 degrees of freedom, the same form as the one-sample t-test you studied, just with a different standard error formula. The denominator s/√Σ(xᵢ - x̄)² is the standard error of the slope, written SE(β̂₁). Standard regression output reports the estimate β̂₁, its SE(β̂₁), the t-statistic t = β̂₁/SE(β̂₁) (testing H₀: β₁ = 0), and the associated p-value. A confidence interval for β₁ is β̂₁ ± t* · SE(β̂₁), where t* is the critical value of the t-distribution with n-2 degrees of freedom, exactly parallel to the one-sample t-interval.
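As a rough sketch of these formulas, the code below computes the slope, its standard error, the t-statistic for H₀: β₁ = 0, and a 95% confidence interval. The function name slope_inference and the data are hypothetical, made up for illustration. The critical value 2.306 is the standard table value for a two-sided 95% interval with 8 degrees of freedom; with a different n you would look up a different t*.

```python
import math

def slope_inference(x, y, t_star):
    """Return (b1, se_b1, t_stat, ci) for simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # slope estimate
    b0 = y_bar - b1 * x_bar              # intercept estimate
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))         # residual standard error
    se_b1 = s / math.sqrt(sxx)           # SE of the slope
    t_stat = b1 / se_b1                  # tests H0: beta1 = 0
    ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
    return b1, se_b1, t_stat, ci

# Made-up data for illustration: n = 10, so df = n - 2 = 8.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
T_STAR_95_DF8 = 2.306                    # table value for 95%, df = 8

b1, se_b1, t_stat, ci = slope_inference(x, y, T_STAR_95_DF8)
print(f"slope = {b1:.4f}, SE = {se_b1:.4f}, t = {t_stat:.2f}")
print(f"95% CI for the slope: ({ci[0]:.4f}, {ci[1]:.4f})")
```

If the interval excludes zero, the two-sided test of H₀: β₁ = 0 rejects at the 5% level, which is the same decision the t-statistic gives.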

The F-test assesses overall model significance: is the regression model better than predicting with the mean alone? It computes F = (SSR / k) / (SSE / (n - k - 1)), where SSR is the sum of squares explained by the regression, SSE is the residual (unexplained) sum of squares, and k is the number of predictors. For simple linear regression (k = 1), F = t², so the F-test and the two-sided t-test for the slope test the same null hypothesis and always give the same p-value. With multiple predictors, the F-test becomes a joint test of the null hypothesis that all slopes are zero simultaneously, while t-tests address individual coefficients. The F-test answers "does this model explain anything?" while t-tests answer "does this specific predictor matter given the others?"
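The identity F = t² for a single predictor can be verified numerically. The sketch below uses made-up data and a hypothetical helper named f_and_t; it computes F from the explained and residual sums of squares and t from the slope and its standard error, then compares them.

```python
import math

def f_and_t(x, y):
    """Return (F statistic, slope t statistic) for simple linear regression."""
    n, k = len(x), 1                     # k = 1 predictor
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained
    f_stat = (ssr / k) / (sse / (n - k - 1))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    t_stat = b1 / se_b1
    return f_stat, t_stat

# Made-up, noisy data for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 4.2, 7.5, 6.8, 9.9, 9.1, 12.4]

f_stat, t_stat = f_and_t(x, y)
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

The two printed values agree up to floating-point rounding, because with one predictor the explained sum of squares equals β̂₁² · Σ(xᵢ - x̄)².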

All of these inference procedures rest on the regression assumptions your diagnostics verify: linearity of the mean function, independence of errors, constant variance across all x values (homoskedasticity), and approximate normality of the errors, which you check through the residuals. If residuals fan out at high fitted values, the errors are heteroskedastic: the usual standard errors are then biased (too large or too small), which makes p-values and confidence intervals misleading. If the residual plot curves, the linear form is wrong and the slope no longer describes the true mean function; at best it summarizes a linear approximation to it. Inference in regression is inseparable from diagnostics: the t and F statistics are only trustworthy when the model's foundations are sound.
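To see why a violated assumption matters, the following simulation sketch generates data whose error standard deviation grows with x, then compares the textbook standard error of the slope with the actual spread of the slope across repeated samples. Everything here is an assumption made for illustration: the helper slope_and_nominal_se, the coefficients 1.0 and 2.0, and the error form 0.5 · x. In this particular setup the textbook formula tends to understate the true variability; other patterns of non-constant variance can push it the other way.

```python
import math
import random
import statistics

def slope_and_nominal_se(x, y):
    """Return the OLS slope and its textbook (constant-variance) SE."""
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1, se_b1

x = [float(i) for i in range(1, 21)]     # fixed design: x = 1, ..., 20
rng = random.Random(1)                   # seeded so the run is reproducible

slopes, nominal_ses = [], []
for _ in range(4000):
    # Assumed heteroskedastic errors: SD is 0.5 * x, so spread grows with x.
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
    b1, se_b1 = slope_and_nominal_se(x, y)
    slopes.append(b1)
    nominal_ses.append(se_b1)

empirical_sd = statistics.stdev(slopes)          # actual sampling variability
mean_nominal_se = statistics.fmean(nominal_ses)  # what the output would report

print(f"empirical SD of slope:   {empirical_sd:.4f}")
print(f"average textbook SE:     {mean_nominal_se:.4f}")
```

The slope estimate itself stays centered on the true value here; it is the reported uncertainty that goes wrong, which is why t-statistics and p-values computed from the textbook SE cannot be trusted in this situation.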

Practice Questions 5 questions

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Independence of Events → Sampling Distributions → Standard Error of Estimators → Hypothesis Testing: Framework and Logic → P-values and Statistical Significance → Effect Size and Practical Significance → Hypothesis Testing: Framework and Logic → Z-Tests and T-Tests for Means → One-Sample Z-Test for Means → One-Sample and Two-Sample T-Tests → Inference in Linear Regression

Longest path: 104 steps · 542 total prerequisite topics

Prerequisites (3)

Regression Diagnostics and Residual Analysis (hard) · One-Sample and Two-Sample T-Tests (soft) · Introduction to Multiple Linear Regression (soft)

Leads To (2)

Prediction Intervals and Out-of-Sample Forecasting (hard) · Prediction Intervals in Regression (hard)