Inference in Linear Regression

College · Depth 79 in the knowledge graph
Unlocks 5 downstream topics
regression inference testing

Core Idea

Under the standard regression assumptions, the estimated regression coefficients are normally distributed around the true coefficients. Because the error variance has to be estimated, confidence intervals and tests for the slope use the t-distribution. The F-test assesses overall model significance. All of this inference depends on assumptions about the errors.

How It's Best Learned

Examine regression output with coefficients, standard errors (SE), t-statistics, and p-values. Test whether the slope differs from zero. Construct confidence intervals for the slope and intercept. Compare the F-test to the t-test in the single-predictor case.

Explainer

In simple linear regression, you fit a line ŷ = β̂₀ + β̂₁x to data by ordinary least squares (OLS). The estimated coefficients β̂₀ and β̂₁ come from one particular sample; a different sample would give different values. To make statements about the true population relationship y = β₀ + β₁x + ε, you need to understand the sampling distribution of β̂₁. The model assumes the errors εᵢ = yᵢ - (β₀ + β₁xᵢ) are independent and normally distributed with mean zero and constant variance σ². The OLS estimators are linear combinations of the y values: for the slope, β̂₁ = Σcᵢyᵢ with weights cᵢ = (xᵢ - x̄)/Σ(xⱼ - x̄)² that depend only on the x values. Each yᵢ is the fixed quantity β₀ + β₁xᵢ plus a normal error, so it is itself normal, and a linear combination of independent normal random variables is normal. This is why β̂₁ is normally distributed with mean β₁ (it is unbiased) and variance σ²/Σ(xᵢ - x̄)².
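The sampling-distribution claim above can be checked by simulation. The sketch below is illustrative only: the design points, the "true" values beta0 = 2.0, beta1 = 0.5, sigma = 1.0, and the helper names ols_slope and simulate_slopes are all assumptions made up for this example. It repeatedly draws new normal errors for the same fixed x values, refits the slope, and compares the empirical mean and variance of the slopes with β₁ and σ²/Σ(xᵢ - x̄)².

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope: Sxy / Sxx."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_slopes(x, beta0, beta1, sigma, n_sims, seed=0):
    """Refit the slope on n_sims samples that share the same fixed x values."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_sims):
        # y = true line + independent normal error with constant variance
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


# Hypothetical fixed design and true parameters (assumptions for illustration).
x = [float(i) for i in range(1, 11)]
beta0, beta1, sigma = 2.0, 0.5, 1.0

slopes = simulate_slopes(x, beta0, beta1, sigma, n_sims=5000)

x_bar = statistics.fmean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
theoretical_var = sigma ** 2 / sxx          # sigma^2 / sum (x_i - x_bar)^2
empirical_mean = statistics.fmean(slopes)
empirical_var = statistics.variance(slopes)

print(f"mean of slopes:     {empirical_mean:.4f} (true slope {beta1})")
print(f"variance of slopes: {empirical_var:.5f} (theory {theoretical_var:.5f})")
```

With enough simulated samples, the two printed pairs should agree closely, and a histogram of the slopes would look bell-shaped.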

Because σ² is unknown, substitute the mean squared error s² = SSE/(n-2), where SSE is the sum of squared residuals. The ratio (β̂₁ - β₁)/(s/√(Σ(xᵢ - x̄)²)) follows a t-distribution with n-2 degrees of freedom. This is the same form as the one-sample t-test you studied, just with a different standard error formula. The denominator s/√(Σ(xᵢ - x̄)²) is the standard error of the slope, written SE(β̂₁). Standard regression output reports the estimate β̂₁, its SE(β̂₁), the t-statistic t = β̂₁/SE(β̂₁) (testing H₀: β₁ = 0), and the associated p-value. A confidence interval for β₁ is β̂₁ ± t* · SE(β̂₁), where t* is the critical value of the t-distribution with n-2 degrees of freedom for the chosen confidence level, exactly parallel to the one-sample t-interval.
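As a worked illustration of these formulas, the sketch below computes the slope, its standard error, the t-statistic, and a 95% confidence interval by hand. The ten data points are invented for this example, and the critical value t* = 2.306 is the standard table value for a two-sided 95% interval with 8 degrees of freedom, so it is only valid for n = 10.

```python
import math

# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.7, 7.2, 7.4, 8.9, 9.3]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS estimates of slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual standard deviation s, from s^2 = SSE / (n - 2)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t-statistic for H0: beta1 = 0
se_b1 = s / math.sqrt(sxx)
t_stat = b1 / se_b1

# Table value for a two-sided 95% interval with n - 2 = 8 degrees of freedom.
# This constant must be changed if n or the confidence level changes.
t_crit = 2.306
ci_low = b1 - t_crit * se_b1
ci_high = b1 + t_crit * se_b1
reject_h0 = abs(t_stat) > t_crit

print(f"slope = {b1:.4f}, SE = {se_b1:.4f}, t = {t_stat:.2f}")
print(f"95% CI for slope: ({ci_low:.4f}, {ci_high:.4f})")
print(f"reject H0 at the 5% level: {reject_h0}")
```

Statistical software would also report an exact p-value; that step is left out here because it needs the t-distribution's cumulative distribution function, which is not in the Python standard library.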

The F-test assesses overall model significance: is the regression model better than predicting with the mean alone? It computes F = (SSR / k) / (SSE / (n - k - 1)), where SSR is the regression (explained) sum of squares, SSE is the residual (unexplained) sum of squares, and k is the number of predictors. For simple linear regression (k = 1), F = t², so the F-test and the two-sided t-test for the slope test the same null hypothesis and always give the same p-value. With multiple predictors, the F-test becomes a joint test of the null hypothesis that all slopes are zero simultaneously, while t-tests address individual coefficients. The F-test answers "does this model explain anything?" while t-tests answer "does this specific predictor matter given the others?"
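The identity F = t² for a single predictor can be verified numerically. The sketch below uses made-up data and a hypothetical helper, simple_regression_stats, that computes both statistics from their definitions: F from the explained and unexplained sums of squares, and t from the slope and its standard error.

```python
import math


def simple_regression_stats(x, y):
    """Return (t_stat, f_stat) for a one-predictor least-squares fit."""
    n = len(x)
    k = 1  # number of predictors
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained

    # F compares explained to unexplained variation per degree of freedom.
    f_stat = (ssr / k) / (sse / (n - k - 1))

    # t is the slope divided by its standard error.
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    t_stat = b1 / se_b1
    return t_stat, f_stat


# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.7, 7.2, 7.4, 8.9, 9.3]

t_stat, f_stat = simple_regression_stats(x, y)
print(f"t^2 = {t_stat ** 2:.6f}")
print(f"F   = {f_stat:.6f}")
```

The two printed values should match up to floating-point rounding, because SSR equals β̂₁² · Σ(xᵢ - x̄)² when there is one predictor.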

All of these inference procedures rest on the regression assumptions your diagnostics check: linearity of the mean function, independence of errors, constant variance across all x values (homoskedasticity), and approximate normality of the errors, judged from the residuals. If residuals fan out at high fitted values, the errors are heteroskedastic: the usual standard error formula is then biased, so reported standard errors can be too large or too small and p-values become misleading. If the residual plot curves, the linear form is wrong and the slope no longer describes how the mean of y changes with x. Inference in regression is inseparable from diagnostics: the t and F statistics are only trustworthy when the model's foundations are sound.
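To see how a violated assumption distorts standard errors, the following simulation sketch compares the average reported SE(β̂₁) with the actual spread of the slope estimates across repeated samples, once with constant error variance and once with an error standard deviation that grows with x. Everything here is an assumption for illustration: the design, the true line 1.0 + 0.5x, the noise scales, and the helper names slope_and_se and run_simulation. In this particular setup the reported SE tends to be too small under heteroskedasticity; other variance patterns can push it the other way.

```python
import math
import random
import statistics


def slope_and_se(x, y):
    """OLS slope and its usual (constant-variance) standard error."""
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1, se_b1


def run_simulation(x, error_sd, n_sims, seed):
    """Return (actual SD of slopes, mean reported SE) over n_sims samples.

    error_sd is a function giving the error standard deviation at each x.
    """
    rng = random.Random(seed)
    slopes, reported = [], []
    for _ in range(n_sims):
        y = [1.0 + 0.5 * xi + rng.gauss(0.0, error_sd(xi)) for xi in x]
        b1, se_b1 = slope_and_se(x, y)
        slopes.append(b1)
        reported.append(se_b1)
    return statistics.stdev(slopes), statistics.fmean(reported)


x = [float(i) for i in range(1, 11)]  # hypothetical fixed design

# Constant error variance: the usual SE should track the true spread.
homo_sd, homo_se = run_simulation(x, lambda xi: 1.0, n_sims=5000, seed=1)

# Error SD proportional to x: residuals "fan out" at larger x.
hetero_sd, hetero_se = run_simulation(x, lambda xi: 0.2 * xi, n_sims=5000, seed=2)

print(f"homoskedastic:   actual SD {homo_sd:.4f}, mean reported SE {homo_se:.4f}")
print(f"heteroskedastic: actual SD {hetero_sd:.4f}, mean reported SE {hetero_se:.4f}")
```

A ratio of mean reported SE to actual SD near 1 means the reported standard errors are trustworthy; a ratio noticeably below 1 means intervals are too narrow and p-values too small.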
