Construct validity addresses whether your measures and manipulations actually represent the constructs you intend to study. A measure with poor construct validity may correlate with your outcomes due to method variance, item ambiguity, or shared confounds rather than the intended construct. Establishing construct validity requires convergent evidence (correlating with other indicators of the same construct) and discriminant evidence (not over-correlating with different constructs).
Create a multitrait-multimethod matrix comparing your measure with alternative measures of the same construct and measures of related but distinct constructs. Conduct confirmatory factor analysis to verify dimensional structure. Analyze whether your effects replicate with different operationalizations and measures.
From your earlier work on construct definition, you know that a psychological construct (anxiety, working memory, self-efficacy) is an unobservable theoretical entity that must be made measurable through operationalization: choosing or designing indicators that stand in for it. Construct validity is the question that follows naturally: does your operationalization actually capture the construct, or does it capture something else? A scale purporting to measure "depression" might actually be measuring response fatigue, social desirability, or the tendency to endorse any negative statement. The problem arises whenever you cannot directly observe what you claim to measure, which in psychology is almost always.
The classic framework for evaluating construct validity uses two complementary types of evidence. Convergent validity asks whether your measure correlates substantially with other indicators of the same construct: does your new anxiety scale correlate with existing anxiety measures, with observer-rated anxiety, with physiological stress markers? If it correlates with none of these, it is probably not measuring anxiety. Discriminant validity asks the complementary question: does your measure *not* over-correlate with measures of different constructs? If your anxiety scale correlates just as strongly with depression measures as with other anxiety measures, you may be measuring general negative affect or neuroticism rather than anxiety specifically. Campbell and Fiske's multitrait-multimethod matrix formalizes both tests: you measure multiple traits using multiple methods and examine the pattern of correlations. Valid measurement produces substantial same-trait/different-method correlations (convergent evidence) that are clearly higher than the different-trait correlations, whether those share a method or not (discriminant evidence). Different-trait/same-method correlations that rival the same-trait values point to method variance rather than to the traits.
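The pattern a multitrait-multimethod matrix is meant to reveal can be illustrated with simulated data. The sketch below is a minimal illustration and not Campbell and Fiske's procedure in full: it assumes two independent traits and two independent methods, and the function names (`simulate_scores`, `summarize_mtmm`), the weights, and the sample size are all hypothetical choices made for the example. It averages the correlations in three blocks of the matrix: same trait with different methods (monotrait-heteromethod), different traits with the same method (heterotrait-monomethod), and different traits with different methods (heterotrait-heteromethod).

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_scores(n=2000, trait_w=0.7, method_w=0.4, seed=0):
    """Simulate one score per (trait, method) pair for n people.

    Assumption for illustration: each score is a weighted sum of a latent
    trait, a method factor shared by every measure using that method, and
    independent noise. Traits and methods are drawn independently.
    """
    rng = random.Random(seed)
    traits = ["anxiety", "depression"]
    methods = ["self_report", "observer"]
    # Noise weight chosen so each simulated score has unit variance.
    noise_w = (1 - trait_w ** 2 - method_w ** 2) ** 0.5
    scores = {(t, m): [] for t in traits for m in methods}
    for _ in range(n):
        trait_val = {t: rng.gauss(0, 1) for t in traits}
        method_val = {m: rng.gauss(0, 1) for m in methods}
        for (t, m), column in scores.items():
            column.append(
                trait_w * trait_val[t]
                + method_w * method_val[m]
                + noise_w * rng.gauss(0, 1)
            )
    return scores


def summarize_mtmm(scores):
    """Average the correlations within each block of the MTMM matrix."""
    blocks = {
        "monotrait-heteromethod": [],    # convergent evidence
        "heterotrait-monomethod": [],    # inflated by shared method
        "heterotrait-heteromethod": [],  # baseline between distinct traits
    }
    for a, b in combinations(scores, 2):
        same_trait, same_method = a[0] == b[0], a[1] == b[1]
        if same_trait:
            key = "monotrait-heteromethod"
        elif same_method:
            key = "heterotrait-monomethod"
        else:
            key = "heterotrait-heteromethod"
        blocks[key].append(pearson(scores[a], scores[b]))
    return {k: sum(v) / len(v) for k, v in blocks.items()}


summary = summarize_mtmm(simulate_scores())
for block, mean_r in summary.items():
    print(f"{block}: mean r = {mean_r:.2f}")
```

Under these assumptions the same-trait block should show the largest average correlation, which is the pattern that supports construct validity. Raising `method_w` toward `trait_w` narrows the gap between the first two blocks, which is what weak discriminant evidence looks like in practice.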
Method variance is the construct validity threat most often overlooked by beginning researchers. If you measure self-esteem, depression, and loneliness all with self-report Likert scales administered in the same session, their intercorrelations will be distorted by shared method: the systematic tendency for people to be more or less acquiescent, more or less extreme in their responses, more or less influenced by mood at the moment of testing. None of this variance belongs to the constructs; it belongs to the measurement method. The remedy is to measure each construct with more than one method: use behavioral observation, physiological measures, or informant ratings alongside self-report. When a correlation holds across methods, you can be more confident it reflects the constructs rather than the measurement apparatus.
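A small simulation can make the inflation concrete. The sketch below is illustrative only: it assumes two constructs with a modest true correlation, a response-style factor shared by every self-report scale, and an unrelated rating style for informants. The function name `simulate_method_variance` and all parameter values are hypothetical. It compares the correlation obtained when both constructs are self-reported with the correlation obtained when one is self-reported and the other is rated by an informant.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_method_variance(n=5000, true_r=0.3, method_w=0.5,
                             noise_w=0.5, seed=1):
    """Compare same-method and cross-method correlations.

    Assumption for illustration: the latent constructs correlate at true_r,
    and each observed score adds a method-specific style factor plus noise.
    """
    rng = random.Random(seed)
    dep_self, lone_self, lone_informant = [], [], []
    for _ in range(n):
        # Two latent constructs with correlation true_r.
        depression = rng.gauss(0, 1)
        loneliness = (true_r * depression
                      + (1 - true_r ** 2) ** 0.5 * rng.gauss(0, 1))
        # Response style shared by every self-report scale a person fills in.
        self_style = rng.gauss(0, 1)
        # The informant's rating style is unrelated to the respondent's.
        informant_style = rng.gauss(0, 1)
        dep_self.append(depression + method_w * self_style
                        + noise_w * rng.gauss(0, 1))
        lone_self.append(loneliness + method_w * self_style
                         + noise_w * rng.gauss(0, 1))
        lone_informant.append(loneliness + method_w * informant_style
                              + noise_w * rng.gauss(0, 1))
    return {
        "same_method_r": pearson(dep_self, lone_self),
        "cross_method_r": pearson(dep_self, lone_informant),
    }


result = simulate_method_variance()
print(f"both self-report:       r = {result['same_method_r']:.2f}")
print(f"self-report, informant: r = {result['cross_method_r']:.2f}")
```

In this setup the only difference between the two correlations is whether the measures share a method, so the gap between them is attributable to method variance. Neither observed value equals `true_r`, because measurement noise attenuates both; the point of the comparison is the difference, not the absolute size.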
Because construct validity rests on evidence accumulated across multiple studies, it can erode or improve over time. A measure validated on college students in the 1980s may have poor construct validity with clinical populations, older adults, or non-Western samples where the construct has different meanings or manifestations. Validity generalization, the question of whether validity evidence from one context transfers to another, is itself a research question, not an assumption. This is why calling a measure "validated" without specifying for whom and for what purpose is misleading. Validity is always validity for a particular interpretation, in a particular population, for a particular use.