Transcription factors recognize DNA through sequence-specific contacts in the major groove, where amino acid side chains hydrogen bond to the exposed edges of the bases. The structure of the DNA-binding domain (zinc finger, helix-turn-helix, basic leucine zipper, helix-loop-helix) determines which DNA sequences are recognized. Specificity arises from both direct base contacts and indirect readout of DNA shape and flexibility, and binding is often cooperative: multiple transcription factors enhance each other's recruitment.
You already know that transcription factors regulate gene expression by binding to specific DNA sequences near promoters or enhancers, and that proteins fold into defined three-dimensional structures. The question this topic answers is: how does a protein "read" a DNA sequence? The answer lies in the geometry of the double helix itself. The major groove of DNA is wide enough to expose the edges of the base pairs to incoming proteins, and crucially, each of the four possible base pairs (A-T, T-A, G-C, C-G) presents a unique pattern of hydrogen bond donors and acceptors in the major groove. A transcription factor does not need to unwind the DNA to read it: it inserts part of its DNA-binding domain, typically an alpha helix, into the major groove, where amino acid side chains form hydrogen bonds with the exposed edges of the bases.
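The claim that each base pair presents a unique pattern can be checked with a small lookup table. The sketch below uses the standard textbook simplification of the groove edges: each exposed group is an acceptor (A), a donor (D), a methyl group (M), or a nonpolar hydrogen (H). The minor-groove table is not discussed in the text above and is included only for contrast; the function name `distinguishable_pairs` is made up for this illustration.

```python
# Functional groups exposed along the groove edge of each base pair, read
# across the pair from the first-named base to the second.
# A = H-bond acceptor, D = H-bond donor, M = methyl group, H = nonpolar hydrogen
MAJOR_GROOVE = {
    "A-T": "ADAM",  # adenine N7 (A), N6 amino (D); thymine O4 (A), C5 methyl (M)
    "T-A": "MADA",  # same groups, read in the opposite direction
    "G-C": "AADH",  # guanine N7 (A), O6 (A); cytosine N4 amino (D), C5 hydrogen (H)
    "C-G": "HDAA",
}
MINOR_GROOVE = {
    "A-T": "AHA",   # adenine N3 (A), C2 hydrogen (H); thymine O2 (A)
    "T-A": "AHA",
    "G-C": "ADA",   # guanine N3 (A), N2 amino (D); cytosine O2 (A)
    "C-G": "ADA",
}

def distinguishable_pairs(groove):
    """Group base pairs by the pattern they present (one group per pattern)."""
    groups = {}
    for pair, pattern in groove.items():
        groups.setdefault(pattern, []).append(pair)
    return sorted(groups.values())

major_groups = distinguishable_pairs(MAJOR_GROOVE)
minor_groups = distinguishable_pairs(MINOR_GROOVE)
print("major groove:", major_groups)
print("minor groove:", minor_groups)
```

In this simplified picture the major groove yields four separate groups, one per base pair, while the minor groove yields only two, because A-T cannot be told apart from T-A, nor G-C from C-G. That is one way to see why sequence-specific reading happens mainly in the major groove.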
Different families of transcription factors use different structural motifs to accomplish this reading. A zinc finger domain uses a zinc ion to stabilize a small protein fold that inserts an alpha helix into the major groove; each finger typically contacts three base pairs, and multiple fingers can be linked together to recognize longer sequences. The helix-turn-helix motif, found in many bacterial regulators, positions a "recognition helix" directly in the major groove while a second helix stabilizes the overall orientation. Basic leucine zipper (bZIP) proteins dimerize through their leucine zipper region, then grip the DNA like a pair of tweezers, with the basic region of each subunit lying in the major groove on opposite sides of the double helix. Basic helix-loop-helix proteins follow a similar plan, dimerizing through the helix-loop-helix region and contacting the DNA with their basic regions. Each structural family has evolved to solve the same problem, sequence-specific binding, through a different architectural strategy.
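The modular logic of zinc fingers (about three base pairs per finger, with fingers linked in series) can be sketched in a few lines. This is a deliberate simplification: it assumes exactly three base pairs per finger, and it ignores both the contacts that neighbouring fingers share and the orientation of the protein on the DNA. The function name and the example site are illustrative only.

```python
def split_into_subsites(site, bp_per_finger=3):
    """Split a target site into consecutive subsites, one per finger.

    Assumes each finger reads exactly `bp_per_finger` base pairs with no
    overlap between neighbouring fingers (a simplification).
    """
    if len(site) % bp_per_finger != 0:
        raise ValueError("site length must be a multiple of bp_per_finger")
    return [site[i:i + bp_per_finger] for i in range(0, len(site), bp_per_finger)]

site = "GCGTGGGCG"  # illustrative 9 bp target site
subsites = split_into_subsites(site)
print(f"{len(subsites)} fingers needed, one per subsite: {subsites}")
```

Under these assumptions a 9 bp site calls for three linked fingers, and each additional finger lengthens the recognized sequence by three base pairs.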
Specificity is not just about direct base contacts. Indirect readout refers to the transcription factor's ability to sense the intrinsic shape and deformability of the DNA at a given sequence. Some sequences are inherently more flexible or curved than others, and a transcription factor may preferentially bind DNA that bends easily into the conformation it requires. This is why two binding sites with slightly different sequences can have very different affinities: even if the bases contacted directly by the protein are the same, the DNA's mechanical properties at those sites may differ substantially.
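A toy scoring model makes the distinction between direct and indirect readout concrete. Every number below is hypothetical and chosen only for illustration: the contact energy, the per-step bending costs, the preferred sequence, and the two example sites are not measured values. The model adds a direct term, which counts matches at the contacted positions, to an indirect term, which sums a bending cost over the uncontacted central steps. Lower totals mean tighter binding.

```python
# HYPOTHETICAL parameters throughout: illustrative, not experimental values.
CONTACT_ENERGY = -1.0                     # gained per contacted base that matches
CONTACTED_POSITIONS = [0, 1, 2, 7, 8, 9]  # protein touches only the outer bases
PREFERRED = "ACA....TGT"                  # '.' marks uncontacted central bases

# Hypothetical cost of deforming each dinucleotide step (lower = more flexible)
STEP_COST = {"TA": 0.25, "AT": 0.5, "GG": 1.0, "CC": 1.0, "GC": 1.5}
DEFAULT_STEP_COST = 0.75

def direct_energy(site):
    """Direct readout: reward matches at the contacted positions only."""
    return sum(CONTACT_ENERGY for i in CONTACTED_POSITIONS if site[i] == PREFERRED[i])

def bending_cost(site):
    """Indirect readout: sum step costs inside the central region (bases 3-6)."""
    return sum(STEP_COST.get(site[i:i + 2], DEFAULT_STEP_COST) for i in range(3, 6))

def total_energy(site):
    return direct_energy(site) + bending_cost(site)

flexible_site = "ACATATATGT"  # same contacted bases, easily bent centre
stiff_site = "ACAGGCCTGT"     # same contacted bases, stiffer centre

for name, s in [("flexible", flexible_site), ("stiff", stiff_site)]:
    print(name, "direct:", direct_energy(s), "bending:", bending_cost(s),
          "total:", total_energy(s))
```

Both sites score identically on the direct term because the contacted bases match, yet the site with the stiffer centre ends up with a less favourable total. This is the situation the paragraph describes: the affinity difference comes entirely from bases the protein never touches.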
A single transcription factor binding its target sequence is often insufficient to activate transcription. Cooperative binding amplifies specificity dramatically. When two or more transcription factors bind adjacent sites, they can physically interact with each other, stabilizing each other's binding. This means the complex of factors bound together is far more stable than any one factor bound alone. Cooperativity converts a modest preference for a given DNA sequence into a sharp, switch-like response: either all the right factors are present and the gene turns on, or they are not and it stays off. This combinatorial logic is how a limited number of transcription factors (roughly 1,500 to 1,600 are encoded in the human genome) can regulate tens of thousands of genes with extraordinary precision.
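The stabilizing effect of cooperativity can be sketched with a simple two-site equilibrium model. This is one common way to formalize the idea, not something taken from the text above: the four possible states (empty, A only, B only, both) get statistical weights 1, a, b, and omega*a*b, where a and b are concentrations divided by dissociation constants and omega is a cooperativity factor (omega = 1 means no interaction). The function name, parameter names, and numerical values are assumptions made for illustration.

```python
def joint_occupancy(conc_a, conc_b, k_a, k_b, omega=1.0):
    """Probability that both sites are occupied in a two-site equilibrium model.

    k_a and k_b are dissociation constants in the same units as the
    concentrations; omega > 1 models a favourable protein-protein contact.
    """
    a = conc_a / k_a           # weight of "A bound alone"
    b = conc_b / k_b           # weight of "B bound alone"
    both = omega * a * b       # weight of "A and B bound together"
    return both / (1.0 + a + b + both)

# Illustrative values: each factor alone binds weakly (concentration = K / 10)
independent = joint_occupancy(1.0, 1.0, 10.0, 10.0, omega=1.0)
cooperative = joint_occupancy(1.0, 1.0, 10.0, 10.0, omega=100.0)
print(f"both sites occupied, no cooperativity:   {independent:.4f}")
print(f"both sites occupied, with cooperativity: {cooperative:.4f}")
```

With omega = 1 the result equals the product of the two independent occupancies, so two weak binders are rarely found on the DNA together. Raising omega shifts most of the probability into the doubly bound state even though neither factor binds any more tightly on its own.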