Atomic Model Refinement

A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Core Idea

Atomic model refinement is the iterative process of improving a macromolecular structure by adjusting atomic coordinates, B-factors (atomic displacement parameters, historically called temperature factors), and occupancies to maximize agreement between the model and the experimental data while maintaining reasonable stereochemistry. Starting from an initial model (obtained from molecular replacement, experimental phasing, or automated building), refinement alternates between two activities. Reciprocal-space refinement adjusts parameters computationally to minimize the difference between calculated and observed structure factors, using programs like REFMAC5 or phenix.refine. Real-space refinement is largely manual rebuilding in COOT, where the crystallographer inspects the electron density map and corrects errors in chain tracing, side chain conformers, and ligand placement. Chemical restraints (target bond lengths, angles, and torsions from small-molecule databases) keep the model from adopting physically unreasonable geometry during the optimization. They act as a form of regularization that is especially critical at medium to low resolution, where the data do not unambiguously define all atomic positions.

Explainer

After solving the phase problem and obtaining initial phases (via molecular replacement, isomorphous replacement, or anomalous dispersion), the crystallographer has an electron density map and a rough initial model. This model is far from final: it may have wrong side chain rotamers, missing loops, imprecise backbone geometry, and no water molecules or ligands. Refinement is the process of systematically improving this model, and it is where the crystallographer spends most of their time during structure determination. The goal is a model that simultaneously explains the experimental data (low R-free) and obeys the laws of chemistry (reasonable bond geometry, no steric clashes).

Reciprocal-space refinement is the computational workhorse. Programs like REFMAC5 (CCP4 suite) and phenix.refine (PHENIX suite) adjust atomic parameters to minimize a target function that measures the discrepancy between the observed structure factor amplitudes |F_obs| and those calculated from the model, |F_calc|. The parameters are the x, y, z coordinates, the B-factor (modeling atomic displacement and disorder), and sometimes the occupancy (the fraction of molecules in a given conformation). Modern programs use maximum likelihood targets that account for estimated errors in both the measurements and the model, giving more influence to well-measured reflections. The optimization is held in check by geometric restraints: target values for bond lengths, bond angles, dihedral angles, and the planarity of groups such as aromatic rings and peptide bonds, derived largely from high-resolution small-molecule structures in the Cambridge Structural Database. Unlike hard constraints, restraints are penalty terms, so the model can deviate from the ideal values at a cost. These restraints are essential because at typical protein resolution (2-3 Å) the data alone do not uniquely determine all atomic positions; restraints supply the additional information needed to keep the model chemically reasonable.
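The idea of a data term balanced against a restraint term can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how REFMAC5 or phenix.refine are implemented: it swaps the maximum likelihood target for a plain least-squares residual, restrains only bond lengths, and all function names, the weight, and the example bond values are hypothetical.

```python
import math


def data_residual(f_obs, f_calc):
    """Least-squares data term: sum of (|F_obs| - k*|F_calc|)^2.

    Assumption: a single overall scale factor k, chosen to minimize the residual.
    Real programs use maximum likelihood targets instead of this simple form.
    """
    k = sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)
    return sum((o - k * c) ** 2 for o, c in zip(f_obs, f_calc))


def bond_restraint_penalty(coords, bonds):
    """Geometry term: sum of ((d - d_ideal) / sigma)^2 over restrained bonds.

    Each bond is (atom_index_i, atom_index_j, ideal_length, sigma), in angstroms.
    """
    total = 0.0
    for i, j, ideal, sigma in bonds:
        d = math.dist(coords[i], coords[j])
        total += ((d - ideal) / sigma) ** 2
    return total


def restrained_target(f_obs, f_calc, coords, bonds, weight):
    """Total target = data term + weight * geometry term (weight is hypothetical)."""
    return data_residual(f_obs, f_calc) + weight * bond_restraint_penalty(coords, bonds)


if __name__ == "__main__":
    # Two atoms 1.60 A apart, restrained to an illustrative ideal of 1.53 A (sigma 0.02 A).
    coords = [(0.0, 0.0, 0.0), (1.60, 0.0, 0.0)]
    bonds = [(0, 1, 1.53, 0.02)]
    # The stretched bond costs (0.07 / 0.02)^2, roughly 12.25 penalty units.
    print(bond_restraint_penalty(coords, bonds))
```

Raising `weight` pulls the model toward ideal geometry, while lowering it lets the data dominate. That trade-off is why restraints matter most when the data are weak.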

Real-space refinement and manual rebuilding in programs like COOT complement the automated reciprocal-space optimization. After each round of reciprocal-space refinement, the crystallographer inspects two electron density maps: the 2Fo-Fc map, which shows the overall density given the current phases, and the Fo-Fc difference map, which shows where the model disagrees with the data. Positive Fo-Fc peaks indicate unmodeled features (missing atoms, ligands, waters); negative peaks indicate features in the model that the data do not support (wrong conformations, overfitted positions). The crystallographer corrects errors by adjusting the model in COOT: flipping peptide bonds, rotating side chains to match the density, adding water molecules into positive difference peaks, and rebuilding loop regions that automated refinement could not handle. This manual step requires expertise: the ability to read electron density maps, recognize common error patterns, and judge when density is clear enough to model and when it is too ambiguous to interpret.
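How the two maps relate to |F_obs| and |F_calc| can be made concrete with a small Python sketch. It is a simplified illustration, not COOT's implementation: it uses unweighted coefficients (real programs typically apply likelihood-based weights), the function names are hypothetical, and the 3-sigma peak threshold is an assumed convention rather than something stated above.

```python
import cmath
import statistics


def map_coefficients(f_obs, f_calc, phases):
    """Return (2Fo-Fc, Fo-Fc) complex map coefficients using model phases in radians.

    Assumption: amplitudes are already on a common scale and no weighting is applied.
    """
    two_fo_fc = []
    fo_fc = []
    for obs, calc, phi in zip(f_obs, f_calc, phases):
        phase_factor = cmath.exp(1j * phi)
        two_fo_fc.append((2 * obs - calc) * phase_factor)
        fo_fc.append((obs - calc) * phase_factor)
    return two_fo_fc, fo_fc


def classify_difference_peaks(density, threshold=3.0):
    """Label sampled difference-map values by their height in sigma units.

    'positive' suggests something unmodeled; 'negative' suggests model not
    supported by the data. The threshold of 3.0 sigma is an assumption.
    """
    mean = statistics.fmean(density)
    sigma = statistics.pstdev(density)
    if sigma == 0:
        return ["flat"] * len(density)
    labels = []
    for rho in density:
        z = (rho - mean) / sigma
        if z >= threshold:
            labels.append("positive")
        elif z <= -threshold:
            labels.append("negative")
        else:
            labels.append("flat")
    return labels


if __name__ == "__main__":
    # Mostly featureless difference density with one strong peak and one strong hole.
    density = [0.0] * 30 + [1.0, -1.0]
    labels = classify_difference_peaks(density)
    print(labels[30], labels[31])
```

In practice the coefficients would be Fourier transformed to give density on a grid. The classification step here only mimics what the crystallographer does by eye when contouring the difference map.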

The refinement cycle, reciprocal-space refinement followed by real-space inspection and rebuilding, typically requires 5-20 rounds before convergence. Each round improves the phases (because the model is better), which improves the maps (because the phases are better), which reveals new features and errors that can be corrected in the next round. Convergence is assessed by monitoring R-free, geometric statistics (Ramachandran plot, MolProbity clashscore), and the crystallographer's judgment that the maps no longer show features requiring correction. R-free should trend downward while the model is genuinely improving; small fluctuations between rounds are normal, but an R-free that stalls or rises while the working R-factor keeps falling signals overfitting. The final model, along with its associated structure factor data and validation statistics, is deposited in the Protein Data Bank. The model is only as good as the refinement process that produced it, which is why understanding refinement is essential for anyone who uses crystal structures.
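The R-free check can be sketched as follows. This is a minimal Python illustration under stated assumptions: the calculated amplitudes are taken to be already scaled to the observed ones, the free set is marked by a boolean flag per reflection, and the function names and the tolerance of 0.002 are hypothetical choices rather than values used by any particular program.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|).

    Assumption: f_calc has already been scaled to f_obs.
    """
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def r_work_and_free(f_obs, f_calc, free_flags):
    """Compute R-work on refined reflections and R-free on the held-out set."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free


def flag_overfitting(history, tolerance=0.002):
    """Return indices of rounds where R-work fell but R-free rose beyond tolerance.

    history is a list of (r_work, r_free) pairs, one per refinement round.
    The tolerance is a hypothetical allowance for round-to-round noise.
    """
    flagged = []
    for n in range(1, len(history)):
        w_prev, f_prev = history[n - 1]
        w_now, f_now = history[n]
        if w_now < w_prev and f_now > f_prev + tolerance:
            flagged.append(n)
    return flagged


if __name__ == "__main__":
    f_obs = [100.0, 80.0, 60.0, 40.0]
    f_calc = [90.0, 85.0, 60.0, 50.0]
    free_flags = [False, False, False, True]  # last reflection is held out
    print(r_work_and_free(f_obs, f_calc, free_flags))

    # Round 2 lowers R-work while R-free climbs, so it is flagged.
    print(flag_overfitting([(0.30, 0.34), (0.25, 0.29), (0.22, 0.30)]))
```

Because the free reflections never enter the target function, R-free behaves like a held-out validation score: it only improves when a change to the model reflects something real in the data.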
