A friend reports that five people she knows personally felt much better after taking a new supplement. A large, well-designed double-blind RCT with 1,000 participants found no effect beyond placebo. Which evidence should carry more weight, and why?
A. The friend's reports: she knows these people personally and they have no reason to lie
B. They are equally valid: the RCT is just one study, and anecdotes represent real experiences
C. The RCT: it controls for placebo effects, selection bias, and memory distortion that make the anecdotes unreliable indicators of the supplement's actual effect
D. The friend's reports: firsthand experience is more specific and concrete than statistical averages
Answer: C
The five reports are vivid and personally credible, but they are highly vulnerable to exactly the biases an RCT is designed to control for: the placebo effect (feeling better because you expect to), confirmation bias (noticing improvement and attributing it to the supplement), and selection bias (the friend may never hear from the people who tried it and felt nothing). The RCT randomizes participants, uses a control group, and blinds both participants and researchers, which minimizes these effects instead of leaving them unchecked. That is precisely why a well-designed RCT outranks anecdote in the evidence hierarchy.
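The contrast can be made concrete with a small simulation. This is a minimal sketch under invented assumptions, not a model of any real supplement: the true effect is set to zero, everyone who takes a pill gets the same hypothetical placebo lift, and the friend only hears from acquaintances who felt noticeably better. All names and numbers (`TRUE_EFFECT`, `PLACEBO_LIFT`, the 0.5 reporting threshold) are illustrative.

```python
import random
from statistics import mean

rng = random.Random(0)      # fixed seed so the sketch is repeatable

TRUE_EFFECT = 0.0           # assumption: the supplement itself does nothing
PLACEBO_LIFT = 0.5          # assumption: taking any pill makes people feel a bit better
N_PER_ARM = 500             # 1,000 participants in total, as in the question


def felt_change(takes_supplement: bool) -> float:
    """Self-reported change in wellbeing for one person who takes a pill."""
    effect = TRUE_EFFECT if takes_supplement else 0.0
    return effect + PLACEBO_LIFT + rng.gauss(0.0, 1.0)


# Blinded RCT: both arms take a pill, so the placebo lift cancels in the difference.
treated = [felt_change(True) for _ in range(N_PER_ARM)]
control = [felt_change(False) for _ in range(N_PER_ARM)]
rct_difference = mean(treated) - mean(control)

# Anecdotes: 20 acquaintances try the supplement, but only those who felt
# noticeably better (change > 0.5) mention it to the friend.
acquaintances = [felt_change(True) for _ in range(20)]
anecdotes = [change for change in acquaintances if change > 0.5]
anecdote_mean = mean(anecdotes) if anecdotes else float("nan")

print(f"RCT difference (treated - control): {rct_difference:+.2f}")
print(f"Anecdotes heard: {len(anecdotes)} of {len(acquaintances)}")
print(f"Mean improvement among anecdotes:   {anecdote_mean:+.2f}")
```

Under these assumptions the RCT difference stays close to zero, while every story that reaches the friend describes an improvement. The filtering guarantees that, whether or not the supplement works.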
Question 2 Multiple Choice
Peer review is important in evaluating scientific evidence primarily because:
A. It certifies that published results are correct and will replicate in future studies
B. It prevents researchers with conflicts of interest from publishing
C. It ensures that only credentialed researchers can make empirical claims
D. It raises the floor of reliability by applying expert scrutiny before publication, while not guaranteeing that published findings are correct
Answer: D
Peer review is a quality filter, not a guarantee. It catches many methodological errors, implausible claims, and poorly designed studies before they reach the public, but it is conducted by fallible humans under time pressure and cannot verify every calculation or assumption. Flawed studies are published regularly. The value of peer review is in raising the minimum standard, not in certifying correctness. This is why independent replication and meta-analysis, which pools results across many studies, provide stronger evidence than any single peer-reviewed paper.
Question 3 True / False
Calibrating confidence proportionally to evidence means a single well-designed study should move you to near-certainty about a contested empirical question.
True
False
Answer: False
A single well-designed study is one data point in a developing literature. It should update your beliefs in the direction of the evidence, but not to near-certainty about a contested question. Contested empirical questions remain contested precisely because multiple studies with varying designs and populations have not converged on a consistent answer. Proportional calibration means sizing each update to the strength of the evidence behind it: one solid study moves you; a consistent pattern of replication across different labs moves you much further.
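One way to picture proportional updating is Bayes' rule in odds form. The sketch below is illustrative only: the starting credence of 0.30 and the likelihood ratio of 3 per supporting study are invented numbers, and multiplying the ratios assumes the studies are independent of one another, which real literatures only approximate.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability after one piece of evidence (Bayes' rule in odds form)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Assumed numbers, for illustration only.
PRIOR = 0.30          # starting credence in a contested claim
LR_PER_STUDY = 3.0    # each supporting study is 3x likelier if the claim is true

# One well-designed study.
after_one = update(PRIOR, LR_PER_STUDY)

# Four independent studies that all point the same way.
after_four = PRIOR
for _ in range(4):
    after_four = update(after_four, LR_PER_STUDY)

print(f"after 1 study:   {after_one:.2f}")
print(f"after 4 studies: {after_four:.2f}")
```

With these assumed numbers, one study moves the credence from 0.30 to about 0.56, a meaningful shift that still leaves the question open. Four consistent, independent studies move it to about 0.97.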
Question 4 True / False
Firsthand personal experience is generally less reliable evidence than aggregated data from large studies, even though it often feels more compelling.
True
False
Answer: True
This is one of the hardest calibration challenges in critical thinking. Personal experience is vivid, immediate, and emotionally real, but it is a sample of one (or a few), subject to memory distortion, confirmation bias, and the absence of a control condition. Aggregated data from thousands of participants averages out individual variation, and a well-designed study adds controls for confounds that individual experience cannot separate out. The vividness of personal testimony is a psychological property, not an evidential one.
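The "averages out individual variation" point has a simple quantitative form: the standard error of a mean shrinks with the square root of the sample size. The sketch below assumes an arbitrary person-to-person standard deviation of 1.0 and compares five anecdotes with a 1,000-person study. It covers random variation only; the biases listed above are a separate problem that a larger sample does not fix by itself.

```python
from math import sqrt


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean for n independent observations."""
    return sd / sqrt(n)


SD = 1.0  # assumed person-to-person spread, in arbitrary units

se_anecdotes = standard_error(SD, 5)      # five personal reports
se_study = standard_error(SD, 1000)       # a 1,000-participant study
ratio = se_anecdotes / se_study

print(f"standard error, n=5:    {se_anecdotes:.3f}")
print(f"standard error, n=1000: {se_study:.3f}")
print(f"the five-person average is about {ratio:.0f}x noisier")
```

Under this assumption the five-person average is roughly 14 times noisier than the study average, before any bias is taken into account.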
Question 5 Short Answer
Why is personal experience — vivid firsthand testimony — often less reliable as evidence than aggregated data, even though it feels more compelling?
Model answer: Personal experience is vulnerable to several systematic problems: confirmation bias (you notice and remember the cases that confirmed your expectation and forget the ones that didn't), memory distortion (recollections are reconstructive, not photographic), absence of a control condition (you don't know what would have happened without the thing you credit), and tiny sample size (one or a few cases cannot represent the full distribution of outcomes). Aggregating data across many participants averages out these individual distortions, and well-designed studies add experimental controls to isolate causal effects. The feeling of compellingness is a feature of vividness, not of epistemic reliability.
This is the central challenge of evidence-based thinking: the psychological features that make evidence feel convincing (personal relevance, concreteness, narrative form) are largely uncorrelated with the features that make it epistemically reliable (large representative samples, controls, independent replication). Training yourself to ask 'how reliable is this type of evidence?' rather than 'does this feel true?' is the core skill this topic builds.