A trial researcher tests for significance at each of 10 unplanned interim points. After the 7th test, they find p = 0.04 and declare the drug effective. The most serious methodological problem with this approach is:
A. They used too few total participants to draw any conclusion.
B. They should have applied a Bonferroni correction to each individual test.
C. Repeated testing without pre-specified stopping rules inflates the cumulative Type I error rate well beyond the nominal α = 0.05.
D. They should not have stopped before 100% enrollment under any circumstances.
Answer: C
Each unplanned look at the data that could trigger stopping adds another chance of a false positive. Simulations show that 10 unadjusted looks, each tested at α = 0.05, raise the overall Type I error rate to roughly 19-20%, about four times the nominal level. The problem is not merely which correction to apply after the fact; it is the absence of pre-specification. Sequential analysis solves this by defining stopping rules and adjusted thresholds *before* data collection begins.
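The inflation can be checked with a small Monte Carlo sketch. It assumes a simplified setup that the question does not specify: equally sized groups between looks, a normally distributed outcome with known variance, a two-sided z-test at each look, and a true effect of zero. The function name and parameters are illustrative.

```python
import numpy as np

def peeking_false_positive_rate(n_looks=10, n_sims=200_000, z_crit=1.96, seed=0):
    """Estimate P(at least one |z| > z_crit across n_looks) when the null is true."""
    rng = np.random.default_rng(seed)
    # Each column is the standardized contribution of one equally sized group
    # of new participants; under the null these are independent N(0, 1).
    increments = rng.standard_normal((n_sims, n_looks))
    looks = np.arange(1, n_looks + 1)
    # z statistic on the accumulated data at look k: running sum / sqrt(k).
    z_at_looks = np.cumsum(increments, axis=1) / np.sqrt(looks)
    # A simulated trial "declares efficacy" if any look crosses the unadjusted threshold.
    return (np.abs(z_at_looks) > z_crit).any(axis=1).mean()

rate_single_look = peeking_false_positive_rate(n_looks=1)
rate_ten_looks = peeking_false_positive_rate(n_looks=10)
print(f"1 look:   {rate_single_look:.3f}")
print(f"10 looks: {rate_ten_looks:.3f}")
```

Under these assumptions the single-look rate should sit near the nominal 0.05, while the ten-look rate should come out several times higher. Because the looks share data, the z statistics are positively correlated, so the inflation is smaller than it would be for ten independent tests.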
Question 2 Multiple Choice
In a properly designed group sequential trial using O'Brien-Fleming boundaries, the critical threshold for stopping at the first interim analysis (25% of planned enrollment) is:
A. The same as the conventional z = 1.96 threshold used at the final analysis.
B. Lower (more lenient) than at the final analysis, because early stopping requires less evidence.
C. Higher (more stringent) than at the final analysis, requiring stronger evidence to stop early.
D. Chosen by the investigators after inspecting the interim results.
Answer: C
O'Brien-Fleming boundaries are very conservative early: the critical value at 25% enrollment is far above 1.96 (about z = 4.05 for four equally spaced looks at two-sided α = 0.05), so only very strong evidence justifies stopping. The threshold approaches the conventional value only at the final planned analysis, where it remains slightly above 1.96. This ensures that early stopping occurs only when the evidence is overwhelming, maintaining overall Type I error control. Stopping rules must always be pre-specified, never chosen after seeing the data.
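The shape of these boundaries can be illustrated with a rough sketch. It assumes four equally spaced looks, a two-sided overall α of 0.05, and a normal outcome with known variance. The constant is calibrated by simulation here purely for illustration; trial software normally obtains it by numerical integration. The function name `obf_boundaries` is hypothetical.

```python
import numpy as np

def obf_boundaries(n_looks=4, alpha=0.05, n_sims=400_000, seed=0):
    """Approximate O'Brien-Fleming critical z values for equally spaced looks."""
    rng = np.random.default_rng(seed)
    looks = np.arange(1, n_looks + 1)
    # Null-hypothesis z statistics at each look (accumulating data).
    z = np.cumsum(rng.standard_normal((n_sims, n_looks)), axis=1) / np.sqrt(looks)
    # O'Brien-Fleming shape: boundary at look k is c * sqrt(n_looks / k).
    # A trial rejects if |z_k| * sqrt(k / n_looks) exceeds c at any look,
    # so c is the (1 - alpha) quantile of the maximum of that scaled statistic.
    scaled_max = (np.abs(z) * np.sqrt(looks / n_looks)).max(axis=1)
    c = np.quantile(scaled_max, 1 - alpha)
    return c * np.sqrt(n_looks / looks)

boundaries = obf_boundaries()
for fraction, bound in zip((0.25, 0.50, 0.75, 1.00), boundaries):
    print(f"{fraction:.0%} enrollment: stop if |z| > {bound:.2f}")
```

The first boundary should land near 4, roughly double the conventional 1.96, and the sequence should fall steadily toward a final value just above 1.96.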
Question 3 True / False
A properly designed group sequential trial that stops early for efficacy produces conclusions with lower statistical rigor than a conventional fixed-sample trial of the same intervention.
T. True
F. False
Answer: False
The core insight: properly designed sequential analysis is not a methodological shortcut. The stopping rules and alpha spending functions are calibrated precisely so that the overall Type I error rate is maintained at the nominal α across all analyses combined. Early stopping requires *stronger* evidence at interim stages, and the final inference is as rigorous as that of a conventional design, often achieved with fewer participants when the true effect is large.
Question 4 True / False
If a trial pre-specifies exactly three analyses (interim looks at 33% and 66% enrollment and a final analysis at 100%) with adjusted critical thresholds based on an alpha spending function, the overall Type I error rate across all three analyses is maintained at the nominal α.
T. True
F. False
Answer: True
This is the purpose of the alpha spending function: it allocates the total α budget across the planned analyses, adjusting the critical value at each look so that the cumulative probability of any false positive across all tests equals the nominal α. Pre-specification is essential — the guarantees hold because the rules were fixed before data were observed.
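As a sketch of how such a budget might be allocated, the snippet below evaluates one common choice, the Lan-DeMets O'Brien-Fleming-type spending function α(t) = 2(1 − Φ(z_{1−α/2} / √t)), at the three enrollment fractions from the question. The choice of spending function is an assumption, since the question does not name one, and the function name `obf_type_alpha_spent` is illustrative.

```python
from statistics import NormalDist

def obf_type_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent by information fraction t (0 < t <= 1)."""
    std_normal = NormalDist()
    z_half_alpha = std_normal.inv_cdf(1 - alpha / 2)
    return 2 * (1 - std_normal.cdf(z_half_alpha / info_fraction ** 0.5))

fractions = [0.33, 0.66, 1.00]
cumulative = [obf_type_alpha_spent(t) for t in fractions]
# Alpha newly spent at each analysis is the increase in the cumulative total.
incremental = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

for t, total, new in zip(fractions, cumulative, incremental):
    print(f"{t:.0%} enrollment: cumulative alpha {total:.5f}, spent at this look {new:.5f}")
```

Very little of the budget is spent at the first look, and the cumulative total reaches exactly α at full enrollment. Turning each increment into a critical z value additionally requires the joint distribution of the test statistics across looks, which this sketch does not attempt.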
Question 5 Short Answer
Why does repeatedly testing accumulating data without pre-specified stopping rules inflate the overall Type I error rate, even when each individual test uses p < 0.05?
Model answer: Each test has a 5% false-positive probability assuming the null is true. But performing multiple tests on the same accumulating dataset means the probability of obtaining *at least one* false positive compounds across tests. Random variation will, with increasing probability, cross any fixed threshold if given enough opportunities. Sequential designs solve this by pre-specifying the number and timing of analyses and using adjusted critical thresholds that distribute the total α budget — ensuring the cumulative probability of a false positive stays at the nominal level across all analyses.
The inflation is a direct consequence of the multiple comparisons problem applied to repeated looks at the same study. Pre-specification and adjusted thresholds are the solution — not post-hoc corrections or avoiding sequential testing altogether.