Multivariable regression models the association between an outcome and several exposures and confounders simultaneously, yielding adjusted effect estimates. Linear regression is used for continuous outcomes, logistic regression for binary outcomes, and Cox regression for time-to-event outcomes. Regression assumes a specific functional form, models interactions only when they are specified explicitly, and scales to many confounders, but it requires careful model specification and diagnostics.
When you learned stratification and standardization, you saw how to control confounding by dividing your data into strata and comparing exposed and unexposed individuals who are similar on the confounder. The limitation is that stratification breaks down quickly when several confounders are present at once. With 5 binary confounders, you have up to 2^5 = 32 strata, many of which will contain too few people to compare; continuous confounders cannot be stratified at all without first being cut into categories, which discards information. Multivariable regression addresses this by modeling all confounders mathematically at once, producing a single adjusted estimate of the exposure-outcome association.
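The arithmetic behind the strata explosion can be checked with a few lines of Python. This is only a sketch: the helper name `count_strata` and the sample size of 400 participants are assumptions made for illustration, not values from the text.

```python
from itertools import product

def count_strata(levels_per_confounder):
    """Number of strata produced by cross-classifying categorical confounders."""
    total = 1
    for levels in levels_per_confounder:
        total *= levels
    return total

# Five binary confounders give 2 * 2 * 2 * 2 * 2 strata
n_strata = count_strata([2] * 5)

# Enumerate every combination explicitly as a cross-check
all_strata = list(product([0, 1], repeat=5))

# Hypothetical study size: even a perfectly even spread leaves few people
# per stratum, and each stratum must still be split by exposure status
participants = 400
avg_per_stratum = participants / n_strata

print(n_strata, len(all_strata), avg_per_stratum)
```

Adding a sixth binary confounder doubles the count again, which is why stratified analyses run out of data so quickly.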
The choice of regression model depends on the outcome type. Linear regression is appropriate for continuous outcomes (e.g., blood pressure, BMI), where the coefficient on the exposure represents a mean difference. Logistic regression handles binary outcomes (case/control, disease/no disease) and produces coefficients on the log-odds scale; exponentiate a coefficient to get the odds ratio. Cox proportional hazards regression, which you will encounter in survival analysis, handles time-to-event outcomes where participants are followed until an event or censoring; its exponentiated coefficients are hazard ratios.
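Converting a logistic regression coefficient to an odds ratio is a one-line calculation. The sketch below uses a made-up coefficient and standard error (both are assumptions for illustration, not results from any study) and builds a Wald 95% confidence interval on the log-odds scale before exponentiating.

```python
import math

# Hypothetical logistic regression output for a binary exposure
# (values invented for illustration)
beta = 0.405   # coefficient on the log-odds scale
se = 0.15      # standard error of the coefficient

odds_ratio = math.exp(beta)

# Wald 95% interval: compute the limits on the log-odds scale, then exponentiate
ci_lower = math.exp(beta - 1.96 * se)
ci_upper = math.exp(beta + 1.96 * se)

print(f"OR = {odds_ratio:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

Here the coefficient corresponds to an odds ratio of about 1.5. The interval is asymmetric around the odds ratio because it is symmetric on the log scale, where the estimate is approximately normally distributed.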
The fundamental logic of confounding adjustment via regression is that each coefficient represents the association between a variable and the outcome *holding all other variables in the model constant*. If age confounds the relationship between exercise and heart disease, including age in the model means the exercise coefficient compares exercisers and non-exercisers of the same age; this is the adjusted estimate. Crucially, what you adjust for matters enormously. The set of covariates to include should be determined by a causal diagram (a directed acyclic graph, or DAG), not by statistical significance or convenience. Adjusting for a *collider* (a variable caused by both the exposure and the outcome) can introduce spurious associations where none exist. Adjusting for a *mediator* (a variable on the causal pathway from exposure to outcome) can block part or all of the very effect you are trying to measure.
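A small simulation can make the effect of adjustment concrete. The sketch below is illustrative only: it swaps the binary heart-disease outcome for a hypothetical continuous risk score so that ordinary least squares can be used, and every number in it (the assumed exercise effect of -2, the age range, the `visits` collider) is an invented assumption. The `ols` helper is a minimal normal-equations solver written for this example, not a library function.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(42)   # fixed seed so the run is reproducible
TRUE_EFFECT = -2.0        # assumed effect of exercise on the risk score

age, exercise, score, visits = [], [], [], []
for _ in range(5000):
    a_i = rng.uniform(30, 70)
    # Confounding path 1: older people are less likely to exercise
    p_exercise = 0.9 - 0.8 * (a_i - 30) / 40
    e_i = 1 if rng.random() < p_exercise else 0
    # Confounding path 2: age raises the score regardless of exercise
    s_i = 100 + 0.5 * a_i + TRUE_EFFECT * e_i + rng.gauss(0, 5)
    # Hypothetical collider: caused by both the exposure and the outcome
    v_i = 3 * e_i + 0.2 * s_i + rng.gauss(0, 1)
    age.append(a_i)
    exercise.append(e_i)
    score.append(s_i)
    visits.append(v_i)

# Column 0 is the intercept; index 1 is always the exercise coefficient
crude = ols([[1, e] for e in exercise], score)[1]
adjusted = ols([[1, e, a] for e, a in zip(exercise, age)], score)[1]
collider_adjusted = ols(
    [[1, e, a, v] for e, a, v in zip(exercise, age, visits)], score
)[1]

print(f"crude estimate:             {crude:.2f}")
print(f"adjusted for age:           {adjusted:.2f}")
print(f"adjusted for age + collider: {collider_adjusted:.2f}")
```

Under these assumptions, the crude estimate should overstate the benefit of exercise because exercisers are younger, the age-adjusted estimate should land near the assumed value of -2, and adding the collider should push the estimate away from -2 again. The last result is the point of choosing covariates from a causal diagram: adding a variable to the model is not automatically an improvement.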
Model diagnostics are not optional. For logistic regression, check for separation (a predictor, or combination of predictors, that perfectly predicts the outcome, which makes coefficient estimates unstable or infinite), sparse cells, and excessive collinearity among predictors (multicollinearity inflates standard errors). For any regression, examine residuals to assess whether the linearity assumption holds on the model's scale, and consider whether a log-transformed or nonlinear term (such as a polynomial or spline) better fits a continuous exposure. The output of a regression is only as trustworthy as the model specification behind it.
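Two of these checks can be sketched without any statistics library. The helpers below (`sparse_cells` and `vif_two_predictors`) and all of the data are hypothetical, written for illustration. The first flags sparse or empty cells in a predictor-by-outcome cross-tabulation, where an empty cell signals separation; the second computes the variance inflation factor for the two-predictor case, where it reduces to 1 / (1 - r^2).

```python
from collections import Counter

def sparse_cells(predictor, outcome, min_count=5):
    """Return cross-tabulation cells holding fewer than min_count observations."""
    counts = Counter(zip(predictor, outcome))
    return {(x, y): counts[(x, y)]
            for x in sorted(set(predictor))
            for y in sorted(set(outcome))
            if counts[(x, y)] < min_count}

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r_squared = sxy ** 2 / (sxx * syy)
    return 1 / (1 - r_squared)

# Invented data: every exposed person is a case, so the
# (exposed, non-case) cell is empty and the exposure separates the outcome
exposure = [1] * 10 + [0] * 10
disease = [1] * 10 + [1] * 3 + [0] * 7
flagged = sparse_cells(exposure, disease)
print(flagged)

# Invented data: weight (kg) and BMI move almost in lockstep
weight = [60, 65, 70, 75, 80, 85]
bmi = [21.0, 22.5, 24.4, 26.0, 27.9, 29.5]
print(vif_two_predictors(weight, bmi))
```

A VIF above roughly 10 is a common rule of thumb for problematic collinearity, though the threshold is a convention and not a formal test. With more than two predictors, each VIF comes from regressing one predictor on all the others, which this two-variable shortcut does not cover.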
Multivariable regression is powerful, but it is not a substitute for good study design. Regression can adjust for measured confounders, but unmeasured confounding remains a threat in observational epidemiology. When you encounter a well-adjusted regression analysis, ask: what important confounders might still be unmeasured? Is the model form plausible? Were any colliders or mediators inadvertently adjusted for? These are the questions that distinguish competent epidemiologic analysis from mechanical number-crunching.