Questions: Research Design: From Questions to Methods
5 questions to test your understanding
Question 1 Multiple Choice
A researcher collects data on police presence and crime rates across 50 cities, then decides — after observing a correlation — to frame the project as a causal study of whether police reduce crime. What is the primary design problem?
A. 50 cities is too small a sample for any statistical analysis of this kind
B. The design was not structured to support causal inference; confounders and reverse causality cannot be ruled out post hoc
C. City-level analysis is always invalid because cities are too heterogeneous to compare
D. The researcher should have used a survey instrument rather than observational administrative data
Answer: B
The core problem is that causal inference requires a data structure designed to rule out alternative explanations — confounders, reverse causality, selection bias. Rich cities may have both more police and less crime for unrelated reasons (confounding); high-crime cities may hire more police in response (reverse causality). A design capable of supporting causal claims must anticipate these threats before data collection — through random assignment, instrumental variables, difference-in-differences, etc. Retrofitting a causal claim onto an observational dataset collected without those features is not a design flaw in the data; it is a fundamental mismatch between the inferential goal and the data structure.
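The confounding threat described above can be illustrated with a small simulation. This is a minimal sketch under invented assumptions, not an analysis of real data: `wealth` is a hypothetical confounder, the coefficients are arbitrary, police is given no true effect on crime, and 5000 simulated cities are used (rather than 50) only to keep sampling noise small.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def police_crime_correlation(n_cities, randomized, seed=0):
    """Correlation between police and crime when police has NO true effect."""
    rng = random.Random(seed)
    police, crime = [], []
    for _ in range(n_cities):
        wealth = rng.gauss(0, 1)  # hypothetical confounder (assumption)
        if randomized:
            # Police levels assigned independently of city characteristics.
            p = rng.gauss(0, 1)
        else:
            # Observational world: richer cities fund more police.
            p = wealth + rng.gauss(0, 1)
        # Crime depends on wealth only; police does not enter this equation.
        c = -wealth + rng.gauss(0, 1)
        police.append(p)
        crime.append(c)
    return pearson(police, crime)


observational = police_crime_correlation(5000, randomized=False)
experimental = police_crime_correlation(5000, randomized=True)
print(f"observational r = {observational:.2f}")
print(f"randomized r    = {experimental:.2f}")
```

In the observational setting the correlation is clearly negative even though police has no effect by construction; under random assignment it sits near zero. The data alone cannot tell these two worlds apart, which is why the assignment mechanism has to be part of the design.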
Question 2 Multiple Choice
A hypothesis is formulated after the researcher has already examined the data and observed the pattern it predicts. Why is this a methodological problem?
A. It violates the assumption of random sampling required for statistical inference
B. It commits the researcher to a conclusion before the analysis is complete, biasing interpretation
C. It is not falsifiable: the hypothesis was constructed to fit the data already observed, so no data could disconfirm it
D. It is always causally invalid because no experiment was conducted to test the prediction
Answer: C
A hypothesis must specify in advance what evidence would disconfirm it; that is what makes it a hypothesis rather than a post-hoc story. When a hypothesis is formulated after observing the data it 'predicts,' it cannot be disconfirmed by that data, because it was designed to fit it. This is HARKing (Hypothesizing After the Results are Known), and it produces apparent confirmation that is actually circular. The methodological requirement is that hypotheses commit you, before data collection, to what counts as disconfirming evidence, which is only possible if they precede the data.
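The circularity can be made concrete with a simulation on pure noise. The sketch below is illustrative only: the sample size, the number of candidate predictors, and the function names are assumptions chosen for the example. It picks the predictor that correlates most strongly with the outcome after looking at the data, then checks the same "hypothesis" against freshly drawn data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def noise(rng, n):
    return [rng.gauss(0, 1) for _ in range(n)]


def harking_trial(rng, n_obs=100, n_predictors=20):
    """Return (|r| on the data that suggested the hypothesis, |r| on fresh data)."""
    outcome = noise(rng, n_obs)
    predictors = [noise(rng, n_obs) for _ in range(n_predictors)]
    in_sample = [abs(pearson(p, outcome)) for p in predictors]
    # Post-hoc "hypothesis": whichever predictor happened to fit best.
    best = max(in_sample)
    # Fresh test of that hypothesis. Every variable is noise by construction,
    # so a new draw of the chosen predictor is simply new noise.
    fresh = abs(pearson(noise(rng, n_obs), noise(rng, n_obs)))
    return best, fresh


def run(n_trials=200, seed=1):
    rng = random.Random(seed)
    results = [harking_trial(rng) for _ in range(n_trials)]
    same = sum(r[0] for r in results) / n_trials
    new = sum(r[1] for r in results) / n_trials
    return same, new


same_data, fresh_data = run()
print(f"mean |r| on the data that suggested the hypothesis: {same_data:.2f}")
print(f"mean |r| on fresh data: {fresh_data:.2f}")
```

The selected correlation looks substantial on the data that generated it and shrinks toward chance level on new data, even though no real relationship exists anywhere in the simulation.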
Question 3 True / False
A randomized controlled experiment that carefully eliminates confounders automatically produces results that generalize to real-world populations and settings.
T. True
F. False
Answer: False
Internal validity (the degree to which the design supports causal claims within the study) and external validity (the degree to which findings generalize beyond the study) are distinct and often in tension. A highly controlled laboratory experiment may eliminate confounders effectively (high internal validity) while using an unrepresentative sample, artificial conditions, or a constrained intervention that does not resemble real-world implementation (low external validity). Randomization addresses internal validity threats; generalizability requires deliberate attention to sampling, setting, and population representativeness.
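The gap between the two kinds of validity can be sketched with a simulated randomized experiment. Everything here is an assumption made for illustration: the treatment is given an effect of 2.0 for "young" units and 0.0 for everyone else, and the laboratory sample is assumed to contain only young units while the target population is half young.

```python
import random
from statistics import mean


def rct_estimate(share_young, n_units, seed):
    """Difference in means from a randomized experiment on a given sample."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_units):
        is_young = rng.random() < share_young
        effect = 2.0 if is_young else 0.0  # hypothetical heterogeneous effect
        baseline = rng.gauss(0, 1)
        if rng.random() < 0.5:  # random assignment to treatment
            treated.append(baseline + effect)
        else:
            control.append(baseline)
    return mean(treated) - mean(control)


# Lab study: randomization is sound, but the sample is all young units.
lab_estimate = rct_estimate(share_young=1.0, n_units=4000, seed=2)
# The same experiment run on a sample that mirrors the target population.
population_estimate = rct_estimate(share_young=0.5, n_units=4000, seed=2)

print(f"lab sample estimate:        {lab_estimate:.2f}")
print(f"population sample estimate: {population_estimate:.2f}")
```

Both estimates are internally valid for the units actually studied. The lab estimate of roughly 2 is nonetheless about twice the population-average effect of roughly 1, because randomization does nothing about who was sampled.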
Question 4 True / False
The choice between qualitative and quantitative methods should be guided primarily by the researcher's epistemological commitments and the nature of the research question, not by convention or disciplinary default.
T. True
F. False
Answer: True
Method choice flows from epistemology and research question. A researcher who wants to estimate the causal effect of a policy needs a design capable of causal inference: a randomized experiment or a quasi-experimental design, analyzed quantitatively. A survey or a regression model alone does not supply that capability. A researcher who wants to understand how participants construct meaning around an event needs interpretive depth: interviews, ethnography, discourse analysis. Using a method, qualitative or quantitative, because 'that's what my field does' without asking whether it can answer the research question produces studies that are methodologically consistent but inferentially hollow.
Question 5 Short Answer
What does it mean to 'work backward from your inferential goal' in research design, and why must this analysis happen before data collection rather than after?
Model answer: Working backward means starting from the conclusion you want to be able to draw — for example, a causal claim that X causes Y — and then identifying what data structure would actually license that inference. For a causal claim, that typically means asking: what assignment mechanism (random, natural experiment, instrumental variable) would isolate the effect of X? What comparison group is needed? What measurements at what time points? You then design data collection to produce that structure. This must happen before collection because the inferential power of a study is determined by how data are collected, not by how they are analyzed afterward. You cannot add random assignment, create a comparison group, or introduce an instrument retroactively — those features must be built in from the start.
The discipline of working backward prevents the common error of designing convenient data collection and then asking what can be inferred from it. The question 'what data would let me answer this?' is more productive than 'what can I do with data I have?' The former leads to strong designs; the latter often leads to overstated conclusions from under-powered data structures.
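As one illustration of why the comparison group and the measurement time points have to be planned in advance, the sketch below simulates a difference-in-differences design. All numbers are invented for the example: a background trend of +5 that affects both groups, a true intervention effect of -3, and arbitrary starting levels. The estimate relies on the assumption that both groups would have followed the same trend without the intervention.

```python
import random
from statistics import mean


def group_mean(rng, level, n=2000):
    """Mean of n noisy measurements around a true group level."""
    return mean([level + rng.gauss(0, 1) for _ in range(n)])


def did_demo(seed=3):
    rng = random.Random(seed)
    trend = 5.0    # hypothetical change that hits both groups over time
    effect = -3.0  # hypothetical true effect of the intervention

    treated_pre = group_mean(rng, 10.0)
    treated_post = group_mean(rng, 10.0 + trend + effect)
    control_pre = group_mean(rng, 8.0)
    control_post = group_mean(rng, 8.0 + trend)

    # Before/after comparison on the treated group only:
    # the effect is mixed together with the background trend.
    naive = treated_post - treated_pre
    # Difference-in-differences: subtract the change seen in the control group.
    did = (treated_post - treated_pre) - (control_post - control_pre)
    return naive, did


naive_estimate, did_estimate = did_demo()
print(f"before/after on treated only: {naive_estimate:+.2f}")
print(f"difference-in-differences:    {did_estimate:+.2f}")
```

Without the control group and the pre-period measurements, only the before/after number is available, and here it has the wrong sign. Neither of those features can be added once collection is over.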