Language production involves conceptualizing what to say, formulating a linguistic structure through lexical selection and grammatical encoding, phonologically encoding the planned utterance, and executing the motor plan. Levelt's blueprint model articulates these stages with evidence from speech errors — spoonerisms, word substitutions, and blend errors reveal the planning units and ordering constraints in the production system. The tip-of-the-tongue state demonstrates partial access to lexical representations, dissociating semantic from phonological retrieval.
Collect and classify naturalistic speech errors, noting that slips tend to involve phonological or morphological units — this reveals the granularity of the encoding stages. Contrasting spoonerisms (segment transposition) with word substitutions reveals that phonological and lexical encoding are distinct.
Language comprehension — your prerequisite — moves from acoustic signal to meaning. Language production runs in roughly the opposite direction: from a preverbal message (the concept or intention to communicate) to an acoustic signal. But as the core misconception warns, this is not merely comprehension in reverse. The systems overlap substantially but also diverge in important ways, and understanding production requires tracing the distinct stages that transform thought into articulated speech.
Levelt's blueprint model articulates production as a cascade through three main stages. Conceptualization is the preverbal stage: deciding what to say, selecting the communicatively relevant aspects of a situation, and constructing a rough propositional representation of the intended message. Formulation is where language enters: the abstract propositional message is mapped onto a linguistic structure through two sub-processes. Lexical selection retrieves the right lemma — the semantic-syntactic entry for a word, specifying its meaning and grammatical role, but not yet its phonological form. Grammatical encoding then assembles the chosen lemmas into a syntactic frame, assigning grammatical roles and word order. Finally, phonological encoding fills in the sounds for each word and specifies the phonetic form of the full utterance. Motor execution then implements the plan as articulatory movement.
Speech errors are the primary evidence for this stage architecture. Consider a spoonerism like "a blushing crow" for "a crushing blow": the initial consonants of two words have been transposed. This tells you that phonological segments (/bl/ and /kr/) are independently mobile planning units — the error happened at the phonological encoding stage, not during conceptualization or lexical selection, because the intended *meanings* were preserved. Word substitutions (saying "table" when you meant "chair") reflect a semantic error at lexical selection — the wrong lemma was retrieved from among semantically related competitors. Blend errors (producing "flutterby" when someone confuses "butterfly" and "flutter") show that multiple competing lexical candidates can be simultaneously active and partially merged. Each error type, systematically collected and classified, maps onto a specific processing stage.
The tip-of-the-tongue (TOT) state is perhaps the most elegantly diagnostic phenomenon in language production. In a TOT, you know what concept you want to express, you may know the word's first letter, number of syllables, and stress pattern, but the full phonological form is inaccessible. This dissociation — intact semantic/lemma-level access with degraded phonological encoding — proves that the semantic and phonological stages of lexical retrieval are separable processes. It also demonstrates that phonological retrieval is the more fragile link in the production chain, and that failure at one stage does not necessarily mean failure at upstream stages.
Monitoring completes the picture: speakers do not simply produce and submit output but continuously check it for errors using an internal speech loop. Levelt proposed that speakers covertly monitor their formulated output before articulation, and overtly monitor their acoustic output after, comparing both against intended content. This explains why self-correction happens before errors are actually spoken (preemptive correction) as well as after. The disfluencies and hesitations pervasive in natural speech — "uh," "um," pauses, restarts — are not failures; they are the visible surface of an active planning and monitoring process. "Um" in particular reliably signals a longer upcoming delay in formulation, and may function as a communicative signal to the listener to hold the conversational floor while planning catches up.