Questions: Test Development Workflow and Project Management
5 questions to test your understanding
Question 1 Multiple Choice
A testing company completes a large validation study demonstrating strong evidence for their new achievement test, then archives all project files and begins operational use without maintaining further validity records. What is the primary problem with this approach?
A. The sample size in a single study is always insufficient to establish validity claims
B. Validation evidence must accumulate across uses, populations, and time; a single study is a beginning, not an endpoint
C. Marketing an operational test requires ongoing data collection to satisfy licensing requirements
D. Documentation is only necessary if the test will be used in high-stakes decisions
Answer: B
Validation is not a one-time event. Evidence accumulates across different populations, administration conditions, score uses, and time periods — each adding to or qualifying the body of validity support. A single study, however well-designed, cannot anticipate all future uses or subgroup differences. Ceasing documentation after one study also creates a legal and scientific liability: if validity claims are later challenged, the only defense is the accumulated record of decisions and evidence. Option A reflects the misconception that validity is a threshold property unlocked by sufficient N rather than a body of ongoing evidence.
Question 2 Multiple Choice
A test developer writes detailed test specifications — defining the construct, target population, score use, and content blueprint — before any items are written. This practice primarily serves to:
A. Satisfy administrative requirements set by the credentialing board overseeing the test program
B. Ensure items will have high difficulty levels, increasing their discriminating power
C. Anchor content validity by ensuring that item development serves explicitly defined measurement goals rather than the developer's intuitions
D. Calculate the sample size needed for the subsequent pilot study
Answer: C
Test specifications are the blueprint from which content validity is built. They define what the test should measure, for whom, and under what conditions — before a single item exists. This forces construct definition to happen explicitly rather than emergently from whatever items happen to get written. Items developed without specifications often produce a test that measures something vague or that drifts from its claimed construct. This is the engineering analogy: design before building, so the product meets its specifications by construction rather than by hope.
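One way to picture "meeting specifications by construction" is to check a draft item pool against the content blueprint before any piloting. The sketch below is illustrative only: the content areas, target counts, and the `coverage_gaps` helper are hypothetical and do not come from any particular test program.

```python
from collections import Counter

# Hypothetical blueprint: content area -> number of items the specification calls for.
BLUEPRINT = {"number_sense": 10, "algebra": 8, "geometry": 6}

def coverage_gaps(blueprint, item_areas):
    """Compare a tagged item pool against the blueprint.

    Returns (shortfalls, off_blueprint):
      shortfalls    -- {area: items still needed} for under-covered areas
      off_blueprint -- sorted list of areas that have items but are not specified
    """
    counts = Counter(item_areas)
    shortfalls = {
        area: target - counts[area]
        for area, target in blueprint.items()
        if counts[area] < target
    }
    off_blueprint = sorted(set(counts) - set(blueprint))
    return shortfalls, off_blueprint

# Hypothetical draft pool, one content-area tag per written item.
draft_pool = (
    ["number_sense"] * 10 + ["algebra"] * 5 + ["geometry"] * 6 + ["trivia"] * 2
)
shortfalls, off_blueprint = coverage_gaps(BLUEPRINT, draft_pool)
print(shortfalls)     # {'algebra': 3}
print(off_blueprint)  # ['trivia']
```

Here the pool is three algebra items short and contains two items for an area the specifications never called for, which is the kind of drift from the claimed construct that the explanation describes.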
Question 3 True / False
Pilot data collected from an initial sample can typically serve as the normative base for operational test score interpretations, provided the pilot sample exceeds 200 participants.
T. True
F. False
Answer: False
Pilot samples are designed for item evaluation (estimating difficulty, discrimination, and model fit), not for norming. Stable norms require large, carefully stratified samples that represent the intended test-taking population across demographic and geographic variables. Sample size alone does not fix this: a pilot sample of more than 200 is typically a convenience sample, so it is rarely representative enough, and often not large enough, for normative purposes. Using pilot data as norms introduces systematic bias and instability into norm-referenced interpretations such as percentile ranks and standard scores, which undermines the validity of those interpretations.
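To make "item evaluation" concrete, here is a minimal sketch of two classical item statistics a pilot is typically used to estimate: difficulty (proportion correct) and discrimination (point-biserial correlation with the rest score). The response data and function names are hypothetical, and operational programs would normally use dedicated psychometric software rather than hand-rolled code like this.

```python
from statistics import mean, pstdev

def item_difficulty(item):
    """Classical difficulty: proportion of examinees answering correctly (0/1 scores)."""
    return mean(item)

def point_biserial(item, rest):
    """Pearson correlation between a 0/1 item score and the rest score
    (total score with this item excluded).

    Raises ZeroDivisionError if either variable has no variance,
    e.g. when every examinee answers the item correctly.
    """
    mi, mr = mean(item), mean(rest)
    cov = mean((i - mi) * (r - mr) for i, r in zip(item, rest))
    return cov / (pstdev(item) * pstdev(rest))

# Hypothetical pilot responses: rows are examinees, columns are items (1 = correct).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
item0 = [row[0] for row in scores]
rest0 = [sum(row) - row[0] for row in scores]  # score on the remaining items

print(round(item_difficulty(item0), 3))
print(round(point_biserial(item0, rest0), 3))
```

Statistics like these say how individual items behave in the pilot sample. They say nothing about where a given score falls in the intended population, which is why they cannot substitute for norms.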
Question 4 True / False
Documentation of test development decisions — including why cutoff scores were set at specific values, which items were revised and why, and what equating model was used — is essential for both scientific credibility and legal defensibility of the test program.
T. True
F. False
Answer: True
Documentation is not administrative overhead; it is the only record that allows validity claims to be evaluated, replicated, or defended years after the fact. When a test is challenged legally or scientifically — as high-stakes tests routinely are — the burden of proof falls on the test developer. Without documented rationale for each major decision, 'the test is valid' is an assertion without evidence. The Explainer makes the strong claim: a test without adequate documentation is a scientific and legal liability, not merely an organizational inconvenience.
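As a rough illustration of what "documented rationale for each major decision" could look like in practice, the sketch below models a decision log as simple records and flags entries that lack a rationale or supporting evidence. The `DecisionRecord` fields, the sample entries, and the file name are all hypothetical; a real program would follow its own documentation standards.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DecisionRecord:
    """One documented development decision (field names are illustrative)."""
    decided_on: date
    decision: str
    rationale: str
    evidence: tuple = ()  # pointers to reports, data files, or meeting minutes

def undocumented(records):
    """Return the decisions that have no rationale or no supporting evidence."""
    return [
        r.decision
        for r in records
        if not r.rationale.strip() or not r.evidence
    ]

# Hypothetical log entries.
log = [
    DecisionRecord(
        date(2024, 3, 1),
        "Set cut score",
        "Adopted the standard-setting panel's recommendation",
        ("panel_report.pdf",),
    ),
    DecisionRecord(date(2024, 4, 12), "Revise item 17", "", ()),
]
print(undocumented(log))  # ['Revise item 17']
```

A check like this only confirms that a rationale was recorded, not that it is sound. Its value is that gaps surface while the people who made the decision can still fill them in.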
Question 5 Short Answer
Why is test development described as an engineering process rather than a research process, and what does that analogy reveal about when validity should be built in?
Model answer: Engineering designs to meet specifications before building; it does not build first and then test whether requirements were met. Applied to test development, this means validity must be engineered in from the start — through explicit construct definition, content blueprinting, and item development guided by specifications — rather than hoped for after data are collected. Research can afford to be exploratory and discover what was actually measured; a test used for high-stakes decisions cannot. The analogy reveals that a test without upfront specifications is like a bridge built without load calculations: it might work, but there is no principled reason to expect it to, and no defense if it fails.
The contrast with research is important: in basic research, discovering that you measured something unexpected can be a finding. In applied test development, measuring something unexpected is a validity failure. The engineering frame forces developers to ask 'what are the requirements?' before asking 'how do we build it?' — which is exactly the order test specifications impose on item development.