Item Difficulty and Item Discrimination Analysis

Tags: item-analysis, p-value, point-biserial, item-revision

Core Idea

Item difficulty is the proportion of test-takers answering an item correctly; item discrimination is the correlation between item response and total score (point-biserial correlation). These indices identify problematic items that fail to contribute effectively to score precision and test reliability.

How It's Best Learned

Calculate p-values and discrimination indices for classroom or standardized test data. Create item analysis reports identifying items for revision or removal based on statistical evidence.
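One way to practice the reporting step is to turn already-computed indices into a list of items flagged for human review. The sketch below is illustrative only: the function name `flag_items`, the sample values, and the 0.90 / 0.10 difficulty cut-offs are assumptions, while the 0.30 discrimination guideline is the commonly cited rule of thumb. Flags are prompts for expert review, not automatic removal decisions.

```python
# Minimal item-analysis report sketch. All thresholds are adjustable assumptions.
def flag_items(p_values, discriminations, low_disc=0.30, easy=0.90, hard=0.10):
    """Return one report row per item, with review flags based on simple rules."""
    report = []
    for number, (p, r) in enumerate(zip(p_values, discriminations), start=1):
        flags = []
        if r is None:
            flags.append("discrimination undefined (no variance)")
        elif r < 0:
            flags.append("negative discrimination: check the answer key")
        elif r < low_disc:
            flags.append("low discrimination")
        if p >= easy:
            flags.append("very easy")
        elif p <= hard:
            flags.append("very hard")
        report.append({"item": number, "p": p, "r_pb": r, "flags": flags})
    return report

# Hypothetical indices for four items (made-up numbers, not real test data).
p_values = [0.95, 0.55, 0.40, 0.62]
discriminations = [0.12, 0.45, -0.20, 0.31]

report = flag_items(p_values, discriminations)
for row in report:
    if row["flags"]:
        print(f"Item {row['item']}: p={row['p']:.2f}, r_pb={row['r_pb']:.2f} -> "
              + "; ".join(row["flags"]))
```

Whether a flagged item is revised, kept, or dropped still depends on construct relevance and test purpose.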

Common Misconceptions

Misconception: a very high p-value (near 1.0, meaning a very easy item) is always undesirable. In fact, easy items can be valuable for confidence and accessibility. Similarly, low discrimination doesn't automatically warrant item removal; consider construct relevance and test purpose.

Explainer

Classical test theory and item response functions, which you've studied as prerequisites, both treat individual test items as the unit of analysis for understanding test quality. Item difficulty and discrimination are the two most basic numerical summaries of how a single item is performing — together they are the workhorses of practical test development, review, and revision.

Item difficulty in classical test theory is expressed as the p-value: not the statistical significance p-value, but the proportion of test-takers answering the item correctly. A p-value of 0.80 means 80% answered correctly; 0.30 means 30% did. The scale is counterintuitive: a higher p-value means an easier item. For a test designed to discriminate across a wide range of ability, items near p = 0.50 contribute the most information because the variance of a right/wrong item, p(1 − p), peaks there, so these items split the group most evenly. Very easy items (p near 1.0) and very hard items (p near 0.0) tell you little about individual differences, since almost everyone gets them right or wrong regardless of ability. But p-value targets must match test purpose: a mastery certification test may legitimately include many easy items if the threshold skill is expected of nearly all competent performers.
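As a minimal sketch of the calculation, the p-value of each item is simply the column mean of a scored response matrix. The data and the function name `item_difficulty` below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical scored responses: rows are test-takers, columns are items
# (1 = correct, 0 = incorrect). The data are made up for illustration.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]

def item_difficulty(responses):
    """Return the proportion correct (p-value) for each item (column)."""
    n_takers = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_takers for j in range(n_items)]

p_values = item_difficulty(responses)
print(p_values)  # [1.0, 0.6, 0.2, 0.4]
```

Here the first item (p = 1.0) was answered correctly by everyone, so it cannot separate test-takers, while the second (p = 0.6) sits closest to the middle of the scale.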

Item discrimination measures whether the item distinguishes between high and low scorers on the test overall. The most common index is the point-biserial correlation — the correlation between item response (0 = wrong, 1 = right) and total score. A high point-biserial (typically 0.30+ is considered good) means high scorers mostly got this item right and low scorers mostly got it wrong — the item is pulling in the same direction as the test. A near-zero discrimination means the item is essentially noise, contributing no information about the underlying construct. A *negative* discrimination is a red flag: high-scoring students are getting the item wrong more often than low scorers, which usually signals a miskeyed item (the wrong answer recorded as correct) or a genuinely ambiguous question.
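The point-biserial can be computed as an ordinary Pearson correlation between a 0/1 item column and the total score. The sketch below uses made-up data in which the last item behaves like a miskeyed item; the helper names `pearson` and `point_biserials` are assumptions for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns None when either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # e.g. an item that everyone answered the same way
    return sxy / sqrt(sxx * syy)

def point_biserials(responses):
    """Correlate each 0/1 item column with the total score across test-takers."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals) for j in range(n_items)]

# Hypothetical data: rows are test-takers, columns are items (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

r_pb = point_biserials(responses)
for number, r in enumerate(r_pb, start=1):
    print(f"Item {number}: r_pb = {r:.2f}")
```

In this toy data the first three items correlate positively with the total, while the fourth comes out negative because only the lowest scorers answered it "correctly". Note that the total score here includes the item itself, which inflates the correlation somewhat on short tests; some analysts correlate the item with the total of the remaining items instead.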

The connection to item response theory (IRT) from your prerequisite is direct. IRT's difficulty parameter (*b*) plays the role of the p-value, but it is estimated from the full item characteristic curve and placed on the ability scale rather than computed as a simple proportion; note that the direction flips, so a higher *b* means a harder item. IRT's discrimination parameter (*a*) is proportional to the slope of the curve at the difficulty point, which is what the point-biserial approximates in simpler form. Classical indices are computationally transparent and sufficient for most routine test review; IRT provides more information at the cost of greater complexity and larger sample requirements. In practice, item analysis combines both indices alongside expert review: statistics diagnose problems, but content knowledge determines the remedy.
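To make the roles of *a* and *b* concrete, the sketch below assumes the two-parameter logistic (2PL) model, which this page does not name explicitly; the parameter values are arbitrary. Under that assumption the curve passes through 0.5 at ability equal to *b*, and its slope at that point is *a*/4.

```python
from math import exp

def icc_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (assumed here)."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))

a, b = 1.5, 0.5  # arbitrary illustrative values for discrimination and difficulty

# At theta == b the probability of a correct response is exactly 0.5.
p_at_b = icc_2pl(b, a, b)

# Estimate the slope at theta == b with a central difference.
h = 1e-5
slope_at_b = (icc_2pl(b + h, a, b) - icc_2pl(b - h, a, b)) / (2 * h)

print(p_at_b)             # 0.5
print(round(slope_at_b, 4), a / 4)
```

A larger *a* gives a steeper curve around *b*, so the item separates test-takers just below and just above that ability level more sharply.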
