Hypothesis Tests and Inference in Regression

College · Depth 83 in the knowledge graph
regression-inference

Core Idea

Test H₀: β₁ = 0 using T = (β̂₁ − 0)/SE(β̂₁), which follows a t-distribution with n − 2 df under H₀. Confidence interval for β₁: β̂₁ ± t_{n−2, α/2}·SE(β̂₁). The F-test assesses the overall model. Prediction intervals widen with distance from x̄ and with increased residual variation.

Explainer

From simple linear regression you know how to compute β̂₁, the OLS estimate of the slope, from a sample of n observations. But β̂₁ is a statistic, not a parameter: every new sample would give a slightly different slope. The central insight of regression inference is that β̂₁ has its own sampling distribution. Under the standard assumptions (linearity, constant variance, uncorrelated errors, and normally distributed errors), β̂₁ is normally distributed with mean equal to the true population slope β₁ and standard deviation σ / √(Σ(xᵢ − x̄)²), where σ is the standard deviation of the errors. Because σ is unknown, we replace it with the residual standard error s = √(SSE / (n − 2)), which gives the standard error SE(β̂₁) = s / √(Σ(xᵢ − x̄)²). We cannot observe β₁ directly, but we can reason probabilistically about where it lies.
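The idea of a sampling distribution can be checked by simulation. The sketch below is illustrative only: the population values (intercept 2, slope 1.5, error standard deviation 2) and the fixed x values are assumptions made up for the example, not taken from any real data set. It draws many samples, refits the slope each time, and compares the spread of the estimates with σ / √(Σ(xᵢ − x̄)²).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

# Assumed "true" population, chosen only for illustration:
# y = 2 + 1.5 * x + error, with error ~ Normal(0, SIGMA)
BETA0, BETA1, SIGMA = 2.0, 1.5, 2.0
x = list(range(1, 11))  # fixed design, n = 10
x_bar = statistics.mean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)


def ols_slope(y):
    """OLS slope for the fixed x values above."""
    y_bar = statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Draw 5000 samples from the assumed population and refit the slope each time
slopes = []
for _ in range(5000):
    y = [BETA0 + BETA1 * xi + random.gauss(0, SIGMA) for xi in x]
    slopes.append(ols_slope(y))

mean_slope = statistics.mean(slopes)
sd_slope = statistics.stdev(slopes)
theoretical_sd = SIGMA / math.sqrt(sxx)

print(f"mean of simulated slopes: {mean_slope:.3f} (true slope {BETA1})")
print(f"sd of simulated slopes:   {sd_slope:.3f}")
print(f"sigma / sqrt(Sxx):        {theoretical_sd:.3f}")
```

In a real analysis only one sample is available and σ is unknown, which is why the standard error uses s in place of σ.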

The most common question is whether the predictor matters at all: does X have a linear relationship with Y in the population? This is formalized as H₀: β₁ = 0, usually against the two-sided alternative H₁: β₁ ≠ 0. The t-statistic T = β̂₁ / SE(β̂₁) measures how many standard errors the estimate is from zero. Under H₀, T follows a t-distribution with n − 2 degrees of freedom (we lose two for estimating β₀ and β₁). A large |T| means the slope is far from zero relative to its sampling uncertainty, giving evidence against H₀. The p-value is the probability, assuming H₀ is true, of observing a t-statistic at least as extreme as the one actually obtained. If p < α, we reject H₀ and conclude that X is a statistically significant linear predictor of Y.
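As a minimal sketch of the test, the code below computes T by hand for a small made-up data set (the x and y values are hypothetical). To stay within the standard library it compares |T| with the two-sided 5% critical value for 8 degrees of freedom, 2.306, taken from a standard t-table, rather than computing a p-value. A statistics library with a t-distribution CDF would be needed for the p-value itself.

```python
import math

# Hypothetical data, made up for illustration (n = 10)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.0, 6.8, 6.1, 9.5, 9.0, 12.2, 11.1, 14.9, 14.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS estimates
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual standard error s, with n - 2 degrees of freedom
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t-statistic for H0: beta1 = 0
se_b1 = s / math.sqrt(sxx)
t_stat = b1 / se_b1

T_CRIT = 2.306  # two-sided 5% critical value for 8 df (standard t-table)
reject_h0 = abs(t_stat) > T_CRIT

print(f"slope = {b1:.4f}, SE = {se_b1:.4f}, T = {t_stat:.2f}")
print("reject H0 at the 5% level" if reject_h0 else "fail to reject H0 at the 5% level")
```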

A confidence interval for β₁, given by β̂₁ ± t_{n−2, α/2} · SE(β̂₁), inverts the same logic. Rather than asking whether one specific hypothesized value is plausible, the 100(1 − α)% interval reports every value of β₁ that a two-sided test would not reject at level α. An interval that excludes zero is therefore equivalent to rejecting H₀: β₁ = 0 at that level. The F-test for the overall model generalizes to multiple predictors: it tests whether all slopes are simultaneously zero. In simple regression there is only one slope, and the F-statistic equals T², so the F-test and the two-sided t-test are equivalent and give identical p-values.
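The interval and the F = T² identity can be sketched on a small example. The data are hypothetical, and 2.306 is the two-sided 5% critical value for 8 degrees of freedom from a standard t-table. The F-statistic is computed from the regression and residual sums of squares, so the agreement with T² is a numerical check rather than something built into the code.

```python
import math

# Hypothetical data, made up for illustration (n = 10)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.0, 6.8, 6.1, 9.5, 9.0, 12.2, 11.1, 14.9, 14.0]
T_CRIT = 2.306  # t_{8, 0.025} from a standard t-table

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Sums of squares: residual (SSE) and regression (SSR)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ssr = sum((fi - y_bar) ** 2 for fi in fitted)

mse = sse / (n - 2)
se_b1 = math.sqrt(mse / sxx)

# 95% confidence interval for the slope
lower = b1 - T_CRIT * se_b1
upper = b1 + T_CRIT * se_b1

# Overall F-statistic (1 numerator df in simple regression) versus T squared
f_stat = (ssr / 1) / mse
t_stat = b1 / se_b1

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
print(f"F = {f_stat:.3f}, T^2 = {t_stat ** 2:.3f}")
```

For these made-up numbers the interval lies entirely above zero, which matches a test that rejects H₀: β₁ = 0 at the 5% level.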

Prediction intervals address a different question: where will a *new individual observation* fall, given a particular X value? A confidence interval for the mean response captures only the uncertainty about the population mean at X = x*. A prediction interval must also account for residual variation, the irreducible scatter of individual points around the true line. As a result, a prediction interval is always wider than the confidence interval for the mean at the same x* and confidence level. Both intervals are narrowest at x* = x̄ and widen as x* moves away from the mean: the fitted line always passes through the centroid (x̄, ȳ), so any error in the estimated slope produces a larger error in the fitted value the farther x* is from x̄, and extrapolation is the most uncertain case. Both widths are also proportional to s, so more residual variation in the data means wider intervals.
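The two intervals can be compared numerically using the standard textbook half-width formulas, which are not stated in the text above and are brought in here as an assumption: t · s · √(1/n + (x* − x̄)²/Sxx) for the mean response and t · s · √(1 + 1/n + (x* − x̄)²/Sxx) for a new observation. The data and the evaluation points are hypothetical, and 2.306 is the tabulated t_{8, 0.025}.

```python
import math

# Hypothetical data, made up for illustration (n = 10)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.0, 6.8, 6.1, 9.5, 9.0, 12.2, 11.1, 14.9, 14.0]
T_CRIT = 2.306  # t_{8, 0.025} from a standard t-table

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))  # residual standard error


def half_widths(x_star):
    """Return (CI half-width for the mean response, PI half-width) at x_star."""
    spread = 1 / n + (x_star - x_bar) ** 2 / sxx
    ci = T_CRIT * s * math.sqrt(spread)      # uncertainty in the fitted mean only
    pi = T_CRIT * s * math.sqrt(1 + spread)  # adds the scatter of one new point
    return ci, pi


# x = 15 lies outside the observed range, so it is an extrapolation
for x_star in (x_bar, 8.0, 10.0, 15.0):
    ci, pi = half_widths(x_star)
    print(f"x* = {x_star:5.1f}: fit = {b0 + b1 * x_star:6.2f}, CI ± {ci:.2f}, PI ± {pi:.2f}")
```

The extra 1 under the square root is the contribution of a single new observation's error, which is why the prediction interval stays wide even at x* = x̄, where the uncertainty in the fitted mean is smallest.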

Prerequisite Chain

Counting to 10 → Counting to 20 → Understanding Zero → The Number Zero → Counting to Five → One-to-One Correspondence → Combining Small Groups Within 5 → Addition Within 10 → Addition Within 20 → Two-Digit Addition Without Regrouping → Two-Digit Addition with Regrouping → Addition Within 100 → Repeated Addition as Multiplication → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Two-Digit by One-Digit Division → Division with Remainders → Remainders and Quotients in Division → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Reading and Writing Decimals → Comparing and Ordering Decimals → Adding and Subtracting Decimals → Multiplying Decimals → Dividing Decimals → Dividing Fractions → Mixed Number Arithmetic → Order of Operations → Integer Order of Operations → Variable Expressions → Combining Like Terms → One-Step Equations → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Normal Distribution → Central Limit Theorem → Confidence Intervals for Means → Z-Tests and T-Tests for Means → One-Sample Z-Test for Means → One-Sample and Two-Sample T-Tests → Inference in Linear Regression → Prediction Intervals in Regression → Linear Regression Basics → Simple Linear Regression: Theory and Estimation → Hypothesis Tests and Inference in Regression

Longest path: 84 steps · 412 total prerequisite topics
