Content validity concerns whether test items adequately sample and represent the domain or construct being measured. It rests on expert judgment and logical analysis rather than statistical indices, and it is essential for educational achievement tests, credentialing exams, and domain-specific assessments.
From your study of the reliability-validity relationship, you know that validity is about whether a test measures what it claims to measure. Content validity is the most foundational form of that question, and it is answered differently from the statistical validity evidence you encounter elsewhere. You cannot compute a correlation coefficient and call it content validity. It lives in the logical relationship between the test and the domain, and it can be evaluated before any test data are collected.
The central idea is domain representation: the test is essentially a sample drawn from a larger universe of possible questions about the construct. For a licensing exam in nursing, that universe includes everything a competent nurse must know and do. Content validity asks whether the items on the exam cover that universe in proportion to the importance of each part: not just the easy or frequently tested parts, but the full scope of relevant knowledge and skill. A chemistry exam that only tests nomenclature while ignoring stoichiometry has poor content validity even if its items are reliable and well written. The sampling logic is the issue, not the items themselves.
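The proportional-coverage idea can be made concrete with a small check that compares how a test's items are distributed across content areas against the intended weights. The sketch below is illustrative only: the function name `coverage_gaps`, the 10-point tolerance, and the chemistry weights are assumptions invented for this example, and in practice the target weights come from expert judgment about the domain, not from code.

```python
from collections import Counter

def coverage_gaps(item_areas, target_weights, tolerance=0.10):
    """Return content areas whose share of items differs from the target
    weight by more than `tolerance` (hypothetical helper for illustration).

    item_areas     -- list with one content-area label per test item
    target_weights -- dict mapping area -> intended proportion (sums to 1)
    """
    if not item_areas:
        raise ValueError("the test must contain at least one item")
    counts = Counter(item_areas)
    total = len(item_areas)
    gaps = {}
    for area, target in target_weights.items():
        actual = counts.get(area, 0) / total
        if abs(actual - target) > tolerance:
            # Positive = over-represented, negative = under-represented.
            gaps[area] = round(actual - target, 2)
    return gaps

# Hypothetical weights for an introductory chemistry exam (assumed values).
weights = {"nomenclature": 0.25, "stoichiometry": 0.40, "bonding": 0.35}

# A 20-item exam that tests nothing but nomenclature.
exam = ["nomenclature"] * 20

print(coverage_gaps(exam, weights))
# {'nomenclature': 0.75, 'stoichiometry': -0.4, 'bonding': -0.35}
```

A check like this only flags mismatches in item counts. It says nothing about whether the weights themselves are right or whether each item genuinely belongs to the area it is labeled with, and both of those remain matters for expert review.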
Because this is a sampling question, it requires expert judgment to define the domain and evaluate the coverage. This typically involves a structured process. First, a domain map is created, usually called a table of specifications or content blueprint, which lists the major content categories and their relative weights. Then subject matter experts rate each item for relevance and representativeness, often using a structured rating form. A common quantitative output is Lawshe's content validity ratio (CVR). Each expert classifies an item as essential, useful but not essential, or not necessary, and the ratio is computed as CVR = (n_e - N/2) / (N/2), where n_e is the number of experts who rated the item essential and N is the panel size. The CVR ranges from -1 (no expert rates the item essential) through 0 (exactly half do) to +1 (all do), and an item is typically retained only if its CVR exceeds a critical value that depends on the size of the panel. But the CVR is a tool for organizing expert judgment, not a substitute for it.
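As a worked illustration of the CVR, the sketch below applies Lawshe's formula, CVR = (n_e - N/2) / (N/2), where n_e is the number of "essential" ratings and N is the number of experts. The function names, the three example items, and the cutoff of 0.6 are assumptions made for this example. A real study would take the cutoff from a published table of critical values for the actual panel size.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to +1."""
    if n_experts <= 0:
        raise ValueError("panel must contain at least one expert")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half

def screen_items(ratings, min_cvr):
    """Map each item to (CVR, keep?) given its list of expert ratings.

    `min_cvr` is supplied by the caller; it is NOT derived here and
    should come from a critical-value table for the panel size.
    """
    results = {}
    for item, votes in ratings.items():
        n_essential = sum(1 for v in votes if v == "essential")
        cvr = content_validity_ratio(n_essential, len(votes))
        results[item] = (cvr, cvr >= min_cvr)
    return results

# Hypothetical ratings from a 10-expert panel (invented data).
ratings = {
    "item_1": ["essential"] * 9 + ["useful"] * 1,
    "item_2": ["essential"] * 5 + ["useful"] * 5,
    "item_3": ["essential"] * 2 + ["useful"] * 3 + ["not necessary"] * 5,
}

# 0.6 is an illustrative cutoff, not an authoritative critical value.
for item, (cvr, keep) in screen_items(ratings, min_cvr=0.6).items():
    print(f"{item}: CVR = {cvr:+.2f}, keep = {keep}")
```

With these invented ratings, item_1 has a CVR of 0.8 and is kept, item_2 sits at 0.0 because exactly half the panel rated it essential, and item_3 is negative at -0.6. The arithmetic only summarizes the panel's votes. Deciding who sits on the panel and what "essential" means for the domain is still expert judgment.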
The limits of content validity are important to understand. Even a perfectly representative item sample does not guarantee that the test measures the underlying construct well. A poorly written item could cover the right content while measuring reading comprehension more than substantive knowledge. Content validity is therefore a necessary but not sufficient condition for overall validity. It is also inherently subjective, which is why it needs structured processes to keep that subjectivity in check. Two expert panels with different disciplinary perspectives may disagree substantially about what belongs in a domain, so explicit specifications and systematic review procedures are standard practice in high-stakes test development.