Homology modeling (comparative modeling) predicts a protein's three-dimensional structure from its sequence similarity to one or more experimentally determined template structures. Protein structure is more conserved than sequence during evolution. As a result, proteins sharing more than 30% sequence identity typically have the same overall fold, and a structure can be predicted by copying the template's backbone and rebuilding side chains and loops. The process involves template identification (BLAST, HHpred), target-template alignment, model building (MODELLER, SWISS-MODEL), and model validation (DOPE score, Ramachandran analysis). Homology modeling was the dominant approach to computational protein structure prediction before AlphaFold, and its principles remain important for interpreting and evaluating all predicted structures.
For decades before AlphaFold, the most reliable way to predict a protein's structure was to find a relative with a known structure and copy it. This is homology modeling, a direct computational application of the evolutionary observation that structure is more conserved than sequence. Two proteins that diverged from a common ancestor millions of years ago may share only 30% sequence identity, yet their three-dimensional folds are remarkably similar. Homology modeling exploits this conservation to build a structural model of a target protein from one or more experimentally determined template structures.
The workflow has four steps. Template identification: search the PDB for structures with sequence similarity to the target, using BLAST (sequence-sequence search) or HHpred (profile-profile search, which is more sensitive for distant homologs). Alignment: align the target sequence to the template sequence to determine which residues correspond. This is the most error-prone step, because insertions, deletions, and ambiguous regions in the alignment translate directly into errors in the model. Model building: programs like MODELLER or SWISS-MODEL transfer the template's geometry to the aligned regions (SWISS-MODEL copies template coordinates, while MODELLER derives spatial restraints from the template and builds a model that satisfies them). They then model insertions relative to the template (usually loops) with ab initio or database-based methods, close the gaps left by deletions, place side chains using rotamer libraries, and optimize the model, for example by energy minimization. Validation: assess model quality using statistical potential scores (DOPE), stereochemical checks (Ramachandran plot), and comparison to experimental data if available.
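The alignment step, and how it determines what model building can copy, can be sketched with a textbook Needleman-Wunsch global alignment. This is a minimal sketch under simplifying assumptions: a toy match/mismatch score and a linear gap penalty stand in for the substitution matrices, affine gap penalties, and profile information that real tools use, and the sequences are made up. The hypothetical helper `residue_mapping` records, for each target residue, the template residue whose coordinates could be reused, or `None` for a target insertion that would need loop modeling.

```python
def needleman_wunsch(target, template, match=2, mismatch=-1, gap=-2):
    """Global alignment with a toy linear scoring scheme; returns two gapped strings."""
    n, m = len(target), len(template)

    def pair_score(a, b):
        return match if a == b else mismatch

    # score[i][j] = best score for aligning target[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(target[i - 1], template[j - 1]),
                score[i - 1][j] + gap,  # target residue opposite a gap
                score[i][j - 1] + gap,  # template residue opposite a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_target, out_template = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(
            target[i - 1], template[j - 1]
        ):
            out_target.append(target[i - 1])
            out_template.append(template[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_target.append(target[i - 1])
            out_template.append("-")
            i -= 1
        else:
            out_target.append("-")
            out_template.append(template[j - 1])
            j -= 1
    return "".join(reversed(out_target)), "".join(reversed(out_template))


def residue_mapping(aligned_target, aligned_template):
    """Map each target residue (0-based) to a template residue index, or None.

    None marks a target insertion: there are no template coordinates to reuse,
    so that residue would have to be built by loop modeling. Template residues
    deleted in the target do not appear in the mapping.
    """
    mapping = {}
    t_idx = s_idx = 0
    for a, b in zip(aligned_target, aligned_template):
        if a != "-":
            mapping[t_idx] = s_idx if b != "-" else None
            t_idx += 1
        if b != "-":
            s_idx += 1
    return mapping


target = "MKTAYIAKQR"    # made-up target sequence
template = "MKTAYAKQR"   # made-up template, one residue shorter
aln_target, aln_template = needleman_wunsch(target, template)
print(aln_target)    # MKTAYIAKQR
print(aln_template)  # MKTAY-AKQR
mapping = residue_mapping(aln_target, aln_template)
print([i for i, j in mapping.items() if j is None])  # [5]
```

With a toy scoring scheme like this one, changing the match, mismatch, or gap values can move a gap to a different position in a less clear-cut alignment. That sensitivity is one concrete form of the alignment ambiguity described above.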
The accuracy of a homology model depends primarily on its sequence identity to the template. Above 50% identity, models are typically excellent: backbone RMSD to the true structure is about 1 Å, and most side-chain orientations are correct. Between 30% and 50%, the core fold is reliable, but loops and surface features have significant errors. Between 20% and 30% (the "twilight zone"), alignment uncertainty becomes the dominant error source, and model quality varies widely. Below 20%, fold recognition itself becomes unreliable. The most problematic regions in any homology model are loops (the segments connecting secondary structure elements), which diverge rapidly in evolution. Where a loop has no counterpart in the template, it must be modeled ab initio or from fragment databases rather than copied, and loop modeling remains one of the hardest problems in computational structural biology.
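The identity bands above can be expressed as a small lookup. This sketch assumes identity is computed over aligned column pairs where neither sequence has a gap; other conventions divide by the shorter sequence length or the full alignment length, which can push a borderline pair across a threshold. The function names are hypothetical, the thresholds are the approximate ones quoted in this section rather than hard cutoffs, and assigning exact boundary values to the lower band is an arbitrary choice.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def expected_model_quality(identity):
    """Rough quality band for a homology model at a given percent identity.

    Bands follow the approximate thresholds in the text; exact boundary
    values (50, 30, 20) fall into the lower band by choice.
    """
    if identity > 50:
        return "high"           # backbone typically ~1 Å RMSD, most side chains correct
    if identity > 30:
        return "medium"         # core fold reliable; loops and surface error-prone
    if identity > 20:
        return "twilight zone"  # alignment errors dominate; quality varies widely
    return "unreliable"         # fold recognition itself is uncertain


pid = percent_identity("ACDEFGHIKL", "ACDQFGHWKL")  # made-up aligned pair
print(pid)                          # 80.0
print(expected_model_quality(pid))  # high
```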
AlphaFold has transformed this landscape by using deep learning to predict structures from sequence, with accuracy rivaling experimental structures for many proteins. The conceptual framework of homology modeling nonetheless remains relevant. AlphaFold's main input, a multiple sequence alignment (MSA) of homologous sequences, draws on the same evolutionary information that homology modeling exploits, and AlphaFold2 can also use structural templates found by a search much like homology modeling's template identification step. Its confidence scores (pLDDT, PAE) tend to flag the same kinds of regions that challenge traditional homology modeling: loops, disordered segments, and domains without homologs. Understanding why some regions are hard to predict (lack of evolutionary conservation, conformational flexibility, sequence-structure ambiguity) is essential for interpreting any predicted structure, whether it comes from MODELLER or AlphaFold.
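To see which regions a predicted model flags, per-residue confidence can be read straight from the coordinate file. This minimal sketch assumes a single-model PDB-format file in the AlphaFold Protein Structure Database convention, where each residue's pLDDT is stored in the B-factor column. The database describes pLDDT above 90 as very high, 70 to 90 as confident, 50 to 70 as low, and below 50 as very low. The function names, the test helper `fake_ca_line`, and the default cutoff of 70 are illustrative choices, and PAE, which describes the relative placement of domains and is distributed separately, is not handled here.

```python
def plddt_by_residue(pdb_lines):
    """Return {(chain, residue_number): pLDDT} from the CA atoms of a PDB-format model.

    Assumes pLDDT sits in the B-factor field (columns 61-66), as in AlphaFold DB
    files. Reads fixed PDB column positions; this is not a full PDB parser.
    """
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def low_confidence_segments(scores, cutoff=70.0):
    """Group consecutive residues with pLDDT below cutoff into (chain, start, end) runs."""
    segments = []
    for (chain, resnum), plddt in sorted(scores.items()):
        if plddt >= cutoff:
            continue
        if segments and segments[-1][0] == chain and segments[-1][2] == resnum - 1:
            segments[-1] = (chain, segments[-1][1], resnum)  # extend the current run
        else:
            segments.append((chain, resnum, resnum))         # start a new run
    return segments


def fake_ca_line(resnum, plddt, chain="A"):
    # Hypothetical test helper: a minimal CA record in PDB fixed-width columns,
    # with dummy coordinates and pLDDT written into the B-factor field.
    return (f"ATOM  {resnum:5d}  CA  ALA {chain}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


plddts = [95.0, 92.0, 60.0, 45.0, 88.0, 40.0]  # made-up per-residue values
lines = [fake_ca_line(i, p) for i, p in enumerate(plddts, start=1)]
print(low_confidence_segments(plddt_by_residue(lines)))  # [('A', 3, 4), ('A', 6, 6)]
```

Because iterating over an open file yields its lines, the same call should work on a real model via `with open(path) as fh: plddt_by_residue(fh)`, where `path` is a placeholder for a downloaded PDB-format file.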