Questions: Spatial Epidemiology and Geographic Analysis
5 questions to test your understanding
Question 1 Multiple Choice
A researcher uses ordinary least squares (OLS) regression to model county-level diabetes rates as a function of poverty and food access. The primary methodological concern with this analysis is:
A. OLS cannot accept area-level data as inputs
B. Neighboring counties likely have similar diabetes rates due to shared unmeasured environmental factors, violating OLS's independence assumption
C. Diabetes rates cannot be mapped to county boundaries
D. The number of counties in the U.S. is too small for regression analysis
Answer: B
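OLS assumes that observations (more precisely, their error terms) are independent. In spatial data, neighboring areas tend to be more similar than distant ones (spatial autocorrelation) because they share unmeasured environmental, socioeconomic, and demographic factors. The result is spatially correlated residuals, which violate the independence assumption. OLS standard errors are then typically too small, overstating the significance of poverty and food access; if the dependence operates through the outcome itself (spatial lag dependence), the coefficient estimates are biased as well. Spatial regression models, which explicitly model the dependence structure, are the standard remedy.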
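To make the problem concrete, the following Python sketch simulates area-level data whose errors follow a spatial error process, fits OLS, and measures the spatial autocorrelation left in the residuals with Moran's I. It is a minimal illustration under stated assumptions, not a reproduction of any real diabetes analysis: a hypothetical 20 × 20 grid of areas, rook contiguity (areas sharing an edge are neighbors) with row-standardized weights, an error autocorrelation parameter of 0.8, and made-up coefficients. The helper names `rook_weights`, `morans_i`, and `ols_residual_morans_i` are illustrative, not from any library.

```python
import numpy as np


def rook_weights(side: int) -> np.ndarray:
    """Row-standardized rook-contiguity weights for a side x side grid of areas."""
    n = side * side
    w = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    w[i, rr * side + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)  # each row sums to 1


def morans_i(values: np.ndarray, w: np.ndarray) -> float:
    """Global Moran's I of `values` under spatial weights `w`."""
    z = values - values.mean()
    n, s0 = len(values), w.sum()
    return float((n / s0) * (z @ w @ z) / (z @ z))


def ols_residual_morans_i(lam: float, side: int = 20, seed: int = 0) -> float:
    """Simulate y = 1 + 0.5*x + u with spatial-error u, fit OLS, return residual Moran's I."""
    rng = np.random.default_rng(seed)
    w = rook_weights(side)
    n = side * side
    x = rng.normal(size=n)                          # hypothetical covariate (e.g. poverty)
    eps = rng.normal(size=n)                        # independent shocks
    u = np.linalg.solve(np.eye(n) - lam * w, eps)   # u = (I - lam*W)^-1 eps
    y = 1.0 + 0.5 * x + u                           # hypothetical true coefficients
    X = np.column_stack([np.ones(n), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit
    resid = y - X @ beta_hat
    return morans_i(resid, w)


print(f"independent errors:   I = {ols_residual_morans_i(0.0):.3f}")
print(f"spatially correlated: I = {ols_residual_morans_i(0.8):.3f}")
```

With independent errors the residual Moran's I should sit near zero; with spatially correlated errors it is clearly positive, which is the diagnostic signal that an OLS fit needs a spatial specification. In practice a maintained library such as PySAL (its `esda` and `spreg` packages) would normally be used instead of hand-rolled code.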
Question 2 Multiple Choice
A spatial scan statistic identifies a significant cluster of elevated lung cancer rates near an industrial facility. A critic invokes the ecological fallacy. This means:
A. The cluster is likely a statistical artifact requiring more data to confirm
B. The geographic boundary of the cluster was drawn arbitrarily, invalidating the result
C. The area-level association between proximity to the facility and lung cancer does not prove that individuals living near the facility have elevated personal risk
D. The Monte Carlo simulation used to assess significance was underpowered
Answer: C
The ecological fallacy is the error of inferring individual-level relationships from area-level data. Even if areas near the facility have higher lung cancer rates on average, the difference could reflect confounding by socioeconomic status, age distribution, or smoking rates that differ between areas, rather than individual exposure to the facility. The area-level association is a hypothesis-generating finding, not individual-level causal evidence. The methodological remedy is individual-level data on exposure, outcome, and confounders such as smoking, for example from a case-control or cohort study.
Question 3 True / False
A Moran's I value of +1 indicates that geographically adjacent areas have randomly distributed disease rates with no spatial clustering.
T. True
F. False
Answer: False
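Moran's I measures spatial autocorrelation. Its values usually fall between about −1 and +1, although the exact bounds depend on the spatial weights matrix. Under spatial randomness (no autocorrelation) its expected value is −1/(n − 1), which is close to 0 when the number of areas n is large. Values approaching +1 indicate strong positive spatial autocorrelation: adjacent areas have similar rates (clustering). Values approaching −1 indicate strong negative spatial autocorrelation: adjacent areas have dissimilar rates, as in a checkerboard pattern. The statement in the question reverses the interpretation: a value of +1 signals strong clustering, not randomness.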
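The interpretation can be checked on toy data. This Python sketch computes global Moran's I, I = (n / S0) Σi Σj wij zi zj / Σi zi², where zi is each value minus the mean and S0 is the sum of all weights. It assumes binary rook-contiguity weights on a hypothetical 4 × 4 grid and compares a clustered pattern (left half high, right half low) with a checkerboard; the function names are illustrative.

```python
import numpy as np


def rook_binary_weights(side: int) -> np.ndarray:
    """Binary (0/1) rook-contiguity weights for a side x side grid."""
    n = side * side
    w = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if c + 1 < side:   # right neighbour, stored symmetrically
                w[i, i + 1] = w[i + 1, i] = 1.0
            if r + 1 < side:   # lower neighbour, stored symmetrically
                w[i, i + side] = w[i + side, i] = 1.0
    return w


def morans_i(values: np.ndarray, w: np.ndarray) -> float:
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    z = values - values.mean()
    return float(len(values) / w.sum() * (z @ w @ z) / (z @ z))


side = 4
w = rook_binary_weights(side)
rows, cols = np.indices((side, side))
clustered = (cols < side // 2).astype(float).ravel()      # left half high, right half low
checkerboard = ((rows + cols) % 2).astype(float).ravel()  # alternating high/low

print(f"clustered:    I = {morans_i(clustered, w):+.3f}")     # +0.667
print(f"checkerboard: I = {morans_i(checkerboard, w):+.3f}")  # -1.000
print(f"null expectation -1/(n-1) = {-1 / (side * side - 1):+.3f}")  # -0.067
```

In the checkerboard every neighbouring pair differs, which drives I to −1 with these weights; the clustered map gives a clearly positive value because most neighbouring pairs share the same level.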
Question 4 True / False
The modifiable areal unit problem (MAUP) means that spatial analysis results can change depending on how geographic boundaries are drawn, even when the underlying case data are identical.
T. True
F. False
Answer: True
MAUP is a fundamental limitation of area-based spatial analysis. Aggregating the same point-level data to counties, ZIP codes, or Census tracts can produce different, sometimes contradictory, patterns. MAUP has two components: the scale effect (results change with the size of the units) and the zoning effect (results change with where boundaries are drawn at a fixed scale). Both arise because each aggregation scheme creates a different mix of cases and population within each unit. Reporting how results change across alternative aggregations is standard practice in careful spatial epidemiology.
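A minimal Python sketch of the effect, using a hypothetical strip of six tracts with identical case counts in every scheme. Changing unit size (pairs versus triples) illustrates the scale effect, and shifting two-tract boundaries (pairs versus offset) illustrates the zoning effect. The tract data and scheme names are invented for illustration.

```python
# Hypothetical strip of six tracts; the underlying case data never change.
cases = [0, 0, 6, 6, 0, 0]
population = [100] * 6


def zone_rates(zones):
    """Aggregate tract counts into zones and return the rate in each zone."""
    return [sum(cases[i] for i in z) / sum(population[i] for i in z) for z in zones]


schemes = {
    "pairs":   [[0, 1], [2, 3], [4, 5]],      # one zone captures the whole hotspot
    "triples": [[0, 1, 2], [3, 4, 5]],        # larger units dilute it completely
    "offset":  [[0], [1, 2], [3, 4], [5]],    # shifted boundaries split it in two
}
for name, zones in schemes.items():
    print(f"{name:8s}", zone_rates(zones))
```

The same 12 cases appear as a single 6% hotspot, as a uniform 2% everywhere, or as two adjacent 3% zones, depending only on how the tracts are grouped.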
Question 5 Short Answer
Why do standard regression models often produce misleading results when applied to geographic disease rate data, and what does spatial regression do differently?
Model answer: Standard OLS assumes observations are independent, but neighboring areas share environmental, socioeconomic, and unmeasured factors that make their disease rates more similar than chance would predict. This spatial autocorrelation shows up as correlated residuals, which violate the OLS independence assumption and typically make standard errors too small, so effects look more significant than they are. Spatial regression models explicitly model this dependence structure: spatial lag models let neighboring outcomes predict the current outcome, and spatial error models allow the residuals to be spatially correlated. When the dependence is specified adequately, this yields more trustworthy standard errors and inference.
Tobler's first law of geography — 'near things are more related than distant things' — is the underlying principle. OLS treats a Census tract in downtown Boston as statistically no more similar to adjacent tracts than to a tract in rural Montana. Spatial models encode the geographic neighborhood structure, allowing the analysis to distinguish the effect of measured covariates from the unmeasured spatial background that makes neighbors resemble each other.
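The two model families can be written compactly. The notation below is a common convention assumed here, not defined in the quiz: y is the vector of area outcomes, X the covariate matrix, W an n × n spatial weights matrix encoding the neighborhood structure, ρ and λ the spatial dependence parameters, and ε independent errors.

```latex
\begin{aligned}
\text{Spatial lag:}\quad & \mathbf{y} = \rho W\mathbf{y} + X\boldsymbol{\beta} + \boldsymbol{\varepsilon}
  \;\;\Longrightarrow\;\; \mathbf{y} = (I - \rho W)^{-1}\bigl(X\boldsymbol{\beta} + \boldsymbol{\varepsilon}\bigr) \\
\text{Spatial error:}\quad & \mathbf{y} = X\boldsymbol{\beta} + \mathbf{u}, \qquad \mathbf{u} = \lambda W\mathbf{u} + \boldsymbol{\varepsilon}
  \;\;\Longrightarrow\;\; \mathbf{u} = (I - \lambda W)^{-1}\boldsymbol{\varepsilon}
\end{aligned}
```

Setting ρ = 0 or λ = 0 recovers ordinary OLS. In the lag model the term ρWy is a regressor, so omitting it (as OLS does) biases the coefficient estimates; in the error model the dependence sits only in the residuals, so OLS coefficients can remain unbiased while their standard errors are wrong.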