Working memory has limited capacity: Miller's "magical number" puts it at about seven items, plus or minus two, and later estimates are closer to four chunks. Chunking, the organization of information into meaningful groups, lets us work around this limit by increasing the amount of information per item. Expertise often involves developing sophisticated, domain-specific systems of chunks.
Your prerequisite on working memory's prefrontal circuits established that working memory is a limited-capacity workspace in the brain: a mental "scratchpad" that holds information in an active, manipulable state. The critical question now is: how limited, exactly, and can anything be done about it? George Miller's landmark 1956 paper gave the classic answer to the first part: the magical number seven, plus or minus two. In simple span tasks, people can typically hold about five to nine discrete items before errors spike. Later work that limits rehearsal and grouping, notably Nelson Cowan's, puts the underlying capacity closer to four chunks. Either way, "item" is doing a lot of work in that sentence, and understanding what counts as an item is the key to understanding chunking.
A chunk is a unit of information that has been bound together through prior learning into a single meaningful package. When you read the letters C, A, T separately, that's three items. When you read "CAT" as a word you already know, it's one chunk, and the word likely activates rich semantic associations at little or no additional working memory cost. A chess expert looking at a board doesn't see up to 32 separate pieces; they see a handful of recognizable attack formations and defensive structures. This is why experts can reconstruct game positions from memory far better than novices when the position comes from a real game, but do little or no better when the pieces are placed randomly: the chunks exist only for meaningful configurations that match stored patterns. Chunking doesn't increase the number of slots in working memory; it increases the information density of each slot.
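The letters-versus-word example can be made concrete with a toy model. The sketch below is an illustration, not a cognitive model: it assumes that "prior learning" can be represented as a set of known strings, and that a sequence is parsed greedily by always taking the longest known chunk available. The function name `segment_into_chunks` and the sample vocabulary are hypothetical.

```python
def segment_into_chunks(sequence, known_chunks):
    """Split `sequence` into working-memory items.

    Greedy longest-match: at each position, take the longest string found in
    `known_chunks`; if nothing matches, fall back to a single character.
    """
    max_len = max((len(chunk) for chunk in known_chunks), default=1)
    items = []
    i = 0
    while i < len(sequence):
        longest_possible = min(max_len, len(sequence) - i)
        # Try candidate lengths from longest to shortest.
        for size in range(longest_possible, 0, -1):
            piece = sequence[i:i + size]
            if size == 1 or piece in known_chunks:
                items.append(piece)
                i += size
                break
    return items


# Hypothetical "long-term memory" of a reader who knows three words.
reader_vocabulary = {"CAT", "DOG", "SUN"}

# No stored chunks: every letter is its own item.
print(len(segment_into_chunks("CATDOGSUN", set())))              # 9
# Stored chunks match: the same nine letters fit in three items.
print(segment_into_chunks("CATDOGSUN", reader_vocabulary))       # ['CAT', 'DOG', 'SUN']
# Same letters, scrambled: the vocabulary no longer helps.
print(len(segment_into_chunks("TCAGDONSU", reader_vocabulary)))  # 9
```

The third call mirrors the chess finding in miniature: the stored chunks reduce the item count only when the input matches patterns already held in long-term memory.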
The connection to cognitive load theory (your soft prerequisite) is direct: cognitive load theory describes the *instructional* implications of these limits, while chunking describes the *learner-state* mechanism. Extraneous cognitive load wastes working memory slots on irrelevant processing. Germane load builds new chunks that make future tasks cheaper. An expert's superior working memory performance on domain tasks isn't better hardware — it's richer software. They've offloaded much of the computational work into long-term memory, leaving more working memory capacity for the novel aspects of the problem.
The practical implication runs deep: if you want to teach complex skills, the bottleneck is often chunking, not raw intelligence. A beginning programmer who must consciously recall syntax rules, loop structure, and variable scoping simultaneously hits working memory limits fast. An experienced programmer has chunked all of that; their working memory is free to reason about architecture. This is why worked examples and deliberate practice with feedback are so effective — they are, mechanistically, chunk-building exercises. The path from novice to expert is partly a path from fragmented items to densely packed, automatically accessed chunks.