X-ray crystallography determines the three-dimensional atomic structure of molecules by directing an X-ray beam at a crystal, measuring the diffraction pattern produced by the regularly repeating lattice of molecules, and computationally reconstructing the electron density map from which atomic coordinates are derived. It has been the dominant method for protein structure determination, contributing the majority of structures in the Protein Data Bank. The method requires well-ordered crystals, and the resulting structure represents a time-averaged and space-averaged snapshot of the molecule in the crystal environment. Resolution (typically 1.5-3.0 Angstroms for protein crystals) determines the level of atomic detail visible in the electron density map.
X-ray crystallography has been the engine of structural biology for more than six decades, from the first protein structure (myoglobin, 1958) to the vast majority of the ~200,000 structures in the Protein Data Bank today. The method exploits a fundamental physical principle: when X-rays (electromagnetic radiation with wavelength ~1 Angstrom, comparable to interatomic distances) interact with the regular array of atoms in a crystal, they scatter in specific directions determined by the crystal's lattice geometry, with intensities determined by the arrangement of atoms within each repeating unit. The resulting diffraction pattern is a collection of spots (reflections) on the detector, each with a measurable intensity, and it encodes the information needed to reconstruct the three-dimensional distribution of electrons in the crystal.
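The dependence of scattering direction on lattice spacing is usually expressed through Bragg's law, n * wavelength = 2 * d * sin(theta), which the paragraph above does not name explicitly. The following is a minimal sketch of that relationship; the function names and the default wavelength of 1.0 Angstrom are illustrative assumptions, not part of any crystallography package.

```python
import math


def bragg_angle_deg(d_spacing, wavelength=1.0, order=1):
    """Return the Bragg angle theta in degrees for planes d_spacing apart.

    Uses Bragg's law: order * wavelength = 2 * d_spacing * sin(theta).
    All lengths are in Angstroms.
    """
    s = order * wavelength / (2.0 * d_spacing)
    if s > 1.0:
        # sin(theta) cannot exceed 1: these planes cannot diffract
        # at this wavelength and order.
        raise ValueError("no diffraction: spacing too small for this wavelength")
    return math.degrees(math.asin(s))


def min_d_spacing(two_theta_max_deg, wavelength=1.0):
    """Smallest plane spacing observable up to a maximum scattering angle 2*theta."""
    theta = math.radians(two_theta_max_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


if __name__ == "__main__":
    # Planes 2.0 Angstroms apart, probed with 1.0 Angstrom X-rays.
    print(bragg_angle_deg(2.0))
    # Finest spacing recorded if the detector captures reflections out to 2*theta = 40 degrees.
    print(min_d_spacing(40.0))
```

Under these assumptions, reflections recorded at wider scattering angles correspond to finer plane spacings, which is why the outermost spots on the detector set the resolution limit of a data set.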
The reconstruction requires computing the Fourier transform that relates the diffraction pattern to the electron density map. Each reflection contributes a wave to the electron density, characterized by its amplitude (derivable from the measured intensity, which is proportional to the amplitude squared) and its phase (the offset of the wave relative to the origin of the unit cell). The fundamental problem is that detectors measure only intensity, so the phases are lost. This is the phase problem, the central computational challenge: without phases, the transform back to electron density cannot be computed. Solutions include molecular replacement (using a known similar structure as a search model to supply initial phases), isomorphous replacement (introducing heavy atoms into the crystal and using the intensity differences to derive phases), and anomalous dispersion (exploiting the wavelength-dependent scattering of atoms like selenium to extract phase information).
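The phase problem can be illustrated with a toy one-dimensional example. The sketch below is an assumption-laden simplification: a 16-sample "unit cell" with two point-like atoms stands in for a real three-dimensional density, and a plain discrete Fourier transform stands in for the structure-factor calculation. It shows that amplitudes combined with the true phases recover the density, while amplitudes alone do not.

```python
import cmath
import math


def dft(values):
    """Forward discrete Fourier transform: density samples -> structure factors."""
    n = len(values)
    return [
        sum(values[x] * cmath.exp(-2j * math.pi * h * x / n) for x in range(n))
        for h in range(n)
    ]


def inverse_dft(factors):
    """Inverse transform: structure factors -> density samples (real part)."""
    n = len(factors)
    return [
        (sum(factors[h] * cmath.exp(2j * math.pi * h * x / n) for h in range(n)) / n).real
        for x in range(n)
    ]


def rms_error(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical 1D "electron density": two atoms of different weight.
density = [0.0] * 16
density[3] = 8.0
density[10] = 6.0

factors = dft(density)
amplitudes = [abs(f) for f in factors]       # obtainable from measured intensities
phases = [cmath.phase(f) for f in factors]   # not measured in a real experiment

# Amplitudes plus the true phases reproduce the density.
with_phases = inverse_dft([cmath.rect(a, p) for a, p in zip(amplitudes, phases)])

# Amplitudes with every phase set to zero put the strongest peak at the origin
# instead of at the atom positions.
without_phases = inverse_dft([cmath.rect(a, 0.0) for a in amplitudes])

print("error with true phases:", rms_error(density, with_phases))
print("error with zero phases:", rms_error(density, without_phases))
```

The zero-phase map contains the same amplitudes as the correct one, which is why phasing methods such as those listed above are needed before a map can be interpreted.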
Once phases are obtained, the electron density map is calculated and interpreted. At good resolution (1.5-2.5 Angstroms), the density reveals the protein backbone, side chain orientations, bound ligands, and ordered water molecules. The atomic model is iteratively refined against the experimental data by adjusting atomic coordinates and B-factors (which model atomic mobility and disorder) to minimize the difference between the calculated and observed diffraction amplitudes. The R-factor and R-free (a cross-validation metric computed from reflections excluded from refinement) assess agreement between model and data, and stereochemical validation (Ramachandran plot, bond geometry) checks the model against known chemical constraints.
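The agreement statistics mentioned above can be sketched in a few lines. This example uses the conventional definition R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|) and assumes the calculated amplitudes are already on the same scale as the observed ones; real refinement programs also fit a scale factor, which is omitted here. The amplitude values and the free-set flags are made up for illustration.

```python
def r_factor(f_obs, f_calc):
    """Return sum(| |F_obs| - |F_calc| |) / sum(|F_obs|) for paired amplitudes."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_r_free(f_obs, f_calc, free_flags):
    """Split reflections by free_flags and return (R_work, R_free).

    Reflections flagged True are treated as the held-out test set that
    refinement never sees; the rest form the working set.
    """
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free


# Hypothetical amplitudes for six reflections; the last two are held out.
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0, 50.0]
f_calc = [90.0, 85.0, 66.0, 38.0, 25.0, 40.0]
free_flags = [False, False, False, False, True, True]

r_work, r_free = r_work_and_r_free(f_obs, f_calc, free_flags)
print("R_work:", r_work)
print("R_free:", r_free)
```

Because the held-out reflections play no part in refinement, an R-free much higher than the working R-factor is a common warning sign that the model has been overfitted to the data.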
The limitations of crystallography are well understood: it requires crystals (which not all proteins form, especially membrane proteins and large flexible complexes), the crystal environment may distort the structure, and it provides a static picture that obscures dynamics. Despite these limitations, crystallography remains the gold standard for high-resolution structural information and the foundation for structure-based drug design, enzyme mechanism analysis, and understanding molecular recognition. Its combination of atomic resolution, mature methodology, and vast database of solved structures makes it an indispensable tool in structural biology.