Questions: Hypothesis Testing: Framework and Logic
5 questions to test your understanding
Question 1 Multiple Choice
A researcher obtains p = 0.03 and states: 'There is a 3% probability that the null hypothesis is true.' What is wrong with this interpretation?
A. Nothing: this is the correct definition of the p-value
B. The p-value is P(observing data this extreme or more extreme | H₀ is true), not P(H₀ is true | data). The researcher has reversed the conditioning.
C. The error is using 0.03 instead of 1 − 0.03 = 0.97 as the probability
D. The p-value only measures probability under the alternative hypothesis, not the null
Answer: B
This is one of the most common misinterpretations of p-values. The p-value conditions on H₀ being true and asks how probable the observed data (or more extreme data) would be under that assumption. It says nothing directly about the probability of H₀ itself. P(H₀ | data) is a Bayesian posterior probability that requires a prior; the frequentist p-value is P(data at least this extreme | H₀). Confusing the two is known as transposing the conditional, or the 'prosecutor's fallacy', and is closely related to base-rate neglect.
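To see that P(H₀ | data) and the p-value are different quantities, the Python sketch below applies Bayes' theorem to a simplified setting. It computes P(H₀ | p ≤ α), the share of "significant" results that come from true nulls, rather than P(H₀ | p = 0.03) exactly. The prior probability of H₀ and the power of 0.8 are assumed, illustrative inputs, and the function name is hypothetical.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """P(H0 true | p <= alpha) by Bayes' theorem.

    prior_null: assumed prior probability that H0 is true
    alpha:      P(p <= alpha | H0), the Type I error rate
    power:      P(p <= alpha | H1), assumed power against the alternative
    """
    false_positives = alpha * prior_null       # significant results from true nulls
    true_positives = power * (1 - prior_null)  # significant results from real effects
    return false_positives / (false_positives + true_positives)


for prior in (0.5, 0.9):
    posterior = prob_null_given_significant(prior, alpha=0.05, power=0.8)
    print(f"prior P(H0) = {prior}: P(H0 | significant) = {posterior:.3f}")
```

With α = 0.05 and power 0.8, a 50% prior gives P(H₀ | significant) ≈ 0.059, while a 90% prior gives 0.36. The same significance threshold yields very different posterior probabilities depending on the prior, which the p-value alone cannot supply.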
Question 2 Multiple Choice
A study with significance level α = 0.05 obtains p = 0.08. Which conclusion is correct?
A. Accept H₀: the data confirm the null hypothesis
B. Reject H₀: the p-value is close enough to 0.05 to be practically significant
C. Fail to reject H₀: the data are consistent with H₀, though this does not prove H₀ is true
D. Reject H₁: the alternative hypothesis has been disproved
Answer: C
When p ≥ α, the correct language is 'fail to reject H₀', never 'accept H₀'. Failing to reject means the data are consistent with H₀, not that H₀ is true or confirmed: H₀ could be false while the study lacked the power to detect the effect. 'Accept H₀' is wrong because a hypothesis test cannot prove a null hypothesis; it can only provide evidence against it. The distinction matters in practice: a low-powered study will frequently fail to reject a false H₀.
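A minimal Python simulation of the low-power point, assuming a two-sided one-sample z-test with known σ. The true mean (0.3σ), sample size (n = 20) and helper names are hypothetical choices for illustration. H₀ (mean = 0) is false in every simulated study, yet most of them fail to reject it.

```python
import math
import random
from statistics import NormalDist


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Conventional wording for a test decision: never 'accept H0'."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


def z_test_p_value(sample: list[float], sigma: float) -> float:
    """Two-sided one-sample z-test of H0: mean = 0, with sigma assumed known."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fail_to_reject_rate(true_mean: float, sigma: float, n: int,
                        alpha: float, trials: int, seed: int = 0) -> float:
    """Share of simulated studies that fail to reject H0 when the mean is true_mean."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if decide(z_test_p_value(sample, sigma), alpha) == "fail to reject H0":
            misses += 1
    return misses / trials


print(decide(0.08))  # prints "fail to reject H0"
rate = fail_to_reject_rate(true_mean=0.3, sigma=1.0, n=20, alpha=0.05, trials=10_000)
print(f"H0 is false, yet it is not rejected in {rate:.1%} of simulated studies")
```

For these settings the analytic power is roughly 0.27, so the simulated fail-to-reject rate should come out near 73%, even though every one of those studies examined a real effect.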
Question 3 True / False
A p-value of 0.04 means there is a 96% probability that the alternative hypothesis H₁ is correct.
T. True
F. False
Answer: False
The p-value does not measure the probability that any hypothesis is true or false. It is P(data at least this extreme | H₀ true): a statement about how surprising the data are under H₀, not about the probability of H₀ or H₁. The 96% figure is simply 1 − p and has no such interpretation. The probability that H₁ is correct would require Bayesian methods and a prior probability for H₁. This misconception is extremely common and leads to overconfidence in the strength of statistical evidence.
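One way to see what the p-value does measure: when H₀ is true and the test statistic is continuous, p-values are uniformly distributed, so p ≤ 0.04 occurs in about 4% of studies regardless of anything about H₁. The Python sketch below checks this by simulation, assuming a two-sided z-test whose statistic is standard normal under H₀; the function names are hypothetical.

```python
import random
from statistics import NormalDist


def p_value_two_sided(z: float) -> float:
    """Two-sided p-value for a z statistic that is standard normal under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_at_least_as_extreme(threshold: float, trials: int, seed: int = 1) -> float:
    """Fraction of simulated studies, all with H0 true, whose p-value is <= threshold."""
    rng = random.Random(seed)
    hits = sum(
        p_value_two_sided(rng.gauss(0.0, 1.0)) <= threshold  # H0 true: z ~ N(0, 1)
        for _ in range(trials)
    )
    return hits / trials


print(share_at_least_as_extreme(0.04, trials=20_000))
```

The printed share should be close to 0.04. That frequency, computed entirely under H₀, is all a p-value of 0.04 reports; no quantity in the simulation refers to H₁.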
Question 4 True / False
Lowering the significance level α from 0.05 to 0.01 reduces the Type I error rate but also reduces the probability of detecting a true effect (statistical power).
T. True
F. False
Answer: True
A Type I error (false positive) is rejecting H₀ when it is true, and its probability is controlled by α: a lower α means fewer false positives. However, a smaller α also moves the rejection threshold farther into the tail, so real but modest effects become harder to detect. Power = 1 − P(Type II error) decreases as α decreases, for a fixed sample size and true effect size. The two error rates trade off: you cannot lower both at the same time without increasing the sample size.
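The trade-off can be computed directly. The Python sketch below gives the power of a two-sided one-sample z-test with known σ, an assumed setting chosen because its power has a closed form; the effect size of 0.5σ, the sample sizes and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float) -> float:
    """Power of a two-sided one-sample z-test with known sigma.

    effect_size: true mean shift in units of sigma (assumed known).
    Under H1 the z statistic is distributed N(effect_size * sqrt(n), 1).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # reject when |z| > z_crit
    shift = effect_size * math.sqrt(n)          # mean of z under H1
    upper = 1 - std_normal.cdf(z_crit - shift)  # P(z > z_crit)
    lower = std_normal.cdf(-z_crit - shift)     # P(z < -z_crit)
    return upper + lower


for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: power={z_test_power(0.5, 30, alpha):.3f}")
print(f"alpha=0.01, n=60: power={z_test_power(0.5, 60, 0.01):.3f}")
```

With effect size 0.5 and n = 30, power is about 0.78 at α = 0.05 but about 0.56 at α = 0.01; doubling n to 60 brings power at α = 0.01 back to about 0.90. With effect_size = 0 the function returns α itself, the Type I error rate.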
Question 5 Short Answer
Explain the logical structure of hypothesis testing: why does a very small p-value lead to rejecting H₀, and what does 'failing to reject' H₀ actually mean?
Model answer: Hypothesis testing works like a probabilistic version of proof by contradiction. You assume H₀ is true and derive the distribution of a test statistic under that assumption. The p-value is the probability of observing data at least as extreme as yours if H₀ were true. A very small p-value means: 'If H₀ were true, what we observed would be very unlikely.' This undermines the assumption, much as a contradiction undermines an assumed premise. 'Failing to reject' means the data are not surprising enough under H₀ to warrant rejecting it; it does not mean H₀ is true, only that the evidence against it is insufficient.
The asymmetry of the logic is important: we can gather strong evidence against H₀ (a very small p-value), but we can never gather evidence that definitively proves H₀. This is why 'fail to reject', not 'accept', is the correct language when p ≥ α. Low power makes false negatives more likely, so a non-significant result is not evidence that the effect is absent: absence of evidence is not evidence of absence.
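The assume, derive, compare structure can be made concrete with an exact permutation test, used here as one illustrative test (the text does not prescribe a particular one). Assuming H₀ (no difference between groups), the group labels are exchangeable, so enumerating every relabelling gives the null distribution of the difference in means directly. The data values and function name are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a: list[float], group_b: list[float]) -> float:
    """Two-sided exact permutation test for a difference in means.

    1. Assume H0: group labels are exchangeable, so every split of the pooled
       data into groups of the original sizes is equally likely.
    2. Derive the null distribution: the statistic's value over all splits.
    3. p-value: share of splits at least as extreme as the observed one.
    """
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [x for i, x in enumerate(pooled) if i not in chosen]
        total += 1
        # small tolerance guards against floating-point ties
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical measurements, for illustration only
group_a = [12.1, 13.4, 11.8, 14.2, 13.9]
group_b = [10.2, 11.1, 10.8, 11.5, 10.4]
p = exact_permutation_p_value(group_a, group_b)
print(f"p = {p:.4f}", "reject H0" if p < 0.05 else "fail to reject H0")
```

Because every value in the first hypothetical group exceeds every value in the second, only the observed split and its mirror image are as extreme as the data, so p = 2/252 ≈ 0.008. Under H₀ such a result would be very unlikely, and at α = 0.05 we reject H₀.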