Questions: Preregistration and Research Transparency Planning
5 questions to test your understanding
Question 1 Multiple Choice
A researcher collects data, runs 15 variations of their analysis (different exclusion criteria, covariates, and outcome measures), finds that one combination yields p = 0.04, and publishes this as 'confirmatory evidence' for their hypothesis. What is the fundamental problem with this approach?
A. The sample size was too small; any result with only 15 analytical variants is unreliable
B. The nominal α = 0.05 no longer reflects the true false positive rate: selecting the significant analysis post hoc inflates the actual probability of a false positive far above 5%
C. The p-value threshold should be 0.01 for studies with multiple analytical variants
D. Exploratory analyses cannot be published in peer-reviewed journals
Answer: B
Each undisclosed analytical choice is a fork that multiplies the chance of finding a spurious significant result. If enough variants are tested, significance is almost guaranteed even in pure noise. The threshold α = 0.05 caps the false positive rate at 5% only when the analysis was specified before looking at the data. Selecting the significant variant and presenting it as confirmatory is p-hacking: the true false positive rate in this example is far higher than the nominal 5%. Preregistration prevents this by binding the researcher to a single pre-specified analysis.
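The size of the inflation can be sketched with a short calculation. The function below is illustrative only (its name is hypothetical) and assumes the analytical variants are statistically independent. Variants run on the same data are usually positively correlated, so the true rate for the 15 analyses in Question 1 would likely fall below this figure while still sitting well above 5%.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one p < alpha among k tests when every null is true.

    Assumes the k tests are independent, which is a simplification.
    """
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 5, 15):
    # Prints the number of variants tried and the resulting error rate.
    print(k, round(familywise_error_rate(0.05, k), 3))
# 1 0.05
# 5 0.226
# 15 0.537
```

Under the independence assumption, trying 15 variants and reporting whichever one reaches significance gives a false positive rate of roughly 54%, not 5%.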
Question 2 Multiple Choice
What does preregistration make impossible — or at minimum immediately detectable — in a published study?
A. Conducting any exploratory analyses not mentioned in the original plan
B. HARKing (Hypothesizing After Results are Known): presenting a post-hoc hypothesis as if it were specified in advance
C. Collecting additional participants if the original sample was underpowered
D. Running sensitivity analyses or robustness checks on the primary findings
Answer: B
Preregistration timestamps your hypothesis before data collection. If the registered hypothesis differs from what was reported, readers can see the discrepancy. HARKing means finding an unexpected significant result and then writing the paper as though you predicted it; the result looks like a confirmatory test but is really exploratory. Preregistration makes the distinction transparent. Exploratory analyses, additional participants, and sensitivity analyses are still allowed; they must simply be reported as unplanned rather than presented as confirmatory.
Question 3 True / False
A preregistered study that reports p = 0.04 provides stronger evidence for its hypothesis than an unregistered study reporting the same p = 0.04.
T. True
F. False
Answer: True
A p-value can be taken at face value only when the hypothesis and analysis were specified before the data were examined. In a preregistered study, p = 0.04 means that, if the null hypothesis were true, a result at least this extreme would occur about 4% of the time. In an unregistered study, the researcher may have examined many analyses; the reported p = 0.04 could be the one significant result from dozens of attempts, so the chance of obtaining such a result under the null is much higher than 4%. Same number, very different evidential meaning.
Question 4 True / False
Preregistration prevents researchers from conducting any analyses beyond what was specified in the original plan.
T. True
F. False
Answer: False
Preregistration does not prohibit exploratory analysis — it requires that any analysis not in the preregistered plan be clearly labeled as exploratory rather than confirmatory. Sensitivity analyses, robustness checks, and unexpected findings are still valuable and publishable; they just cannot be presented as tests of pre-specified hypotheses. The key move is transparency: label what was planned in advance and what emerged from the data. This distinction — not the prohibition of exploration — is what preregistration achieves.
Question 5 Short Answer
Explain how researcher degrees of freedom can inflate the false positive rate even when no individual analytical decision is dishonest or intended to deceive.
Model answer: Each legitimate-seeming analytical choice — which participants to exclude, whether to transform a variable, which covariates to include, which of several measured outcomes to report — is a branch point. If a researcher (consciously or not) gravitates toward choices that produce significant results and away from those that don't, the final analysis is implicitly selected from many possible analyses. Even if each choice seems reasonable in isolation, the effective Type I error rate across all those choices far exceeds the nominal α. Simulations show that even a handful of unconstrained choices can push the real false positive rate above 60% while appearing to be 5%.
This is why the problem is structural, not moral. Researchers don't need to be dishonest — natural motivated reasoning, the preference for coherent narratives, and confirmation bias guide analytical choices in ways the researcher may not notice. Preregistration removes the degrees of freedom by specifying choices before the data can influence them, making the p-value mean what it claims.
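A small simulation can make the structural point concrete. The sketch below is illustrative and is not taken from any cited study. It models only one degree of freedom (which of several outcome measures to report), and it assumes independent outcomes, no true effect, and a known standard deviation of 1 so that a simple z-test applies. All function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(group_a, group_b):
    """Two-sample z-test p-value, assuming a known standard deviation of 1."""
    se = (1.0 / len(group_a) + 1.0 / len(group_b)) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_false_positive_rate(n_outcomes, n_per_group=20,
                                 n_studies=4000, alpha=0.05, seed=0):
    """Share of null studies that end up 'significant' when the researcher
    reports whichever of n_outcomes measures gives the smallest p-value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: the null is true.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sided_p(a, b))
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


preregistered = simulate_false_positive_rate(n_outcomes=1)  # one planned test
flexible = simulate_false_positive_rate(n_outcomes=4)       # best of four
print(f"one pre-specified outcome: {preregistered:.3f}")
print(f"best of four outcomes:     {flexible:.3f}")
```

With one pre-specified outcome the simulated rate should sit near the nominal 5%. Choosing the best of four outcomes should push it toward 1 - 0.95^4, or about 19%, even though no single test in the simulation is carried out incorrectly.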