DNA sequencing determines the precise order of nucleotides in a DNA molecule. Sanger sequencing (1977) uses chain-terminating dideoxynucleotides to produce fragments of every possible length, separated by size to read the sequence. Next-generation sequencing (NGS) platforms like Illumina massively parallelize sequencing-by-synthesis, generating millions to billions of short reads (75-300 bp) simultaneously at dramatically lower cost per base. Each technology involves tradeoffs between read length, accuracy, throughput, and cost that determine its suitability for different applications.
Trace through the Sanger method manually: draw a template strand, show how ddNTPs terminate chains at every position, and reconstruct the sequence from the resulting ladder. Then compare the conceptual workflow to Illumina sequencing-by-synthesis, noting what changed (parallelization, detection method) and what stayed the same (complementary strand synthesis with modified nucleotides).
DNA sequencing is the enabling technology of modern genomics — virtually every topic in this course depends on it. Understanding the principles, capabilities, and limitations of sequencing technologies is essential for designing experiments, interpreting data, and appreciating why certain computational challenges exist.
Sanger sequencing (also called chain-termination sequencing) was developed by Frederick Sanger in 1977 and dominated the field for nearly three decades. The method exploits modified nucleotides, dideoxynucleotides (ddNTPs), that lack the 3'-hydroxyl group required for chain elongation. When a DNA polymerase incorporates a ddNTP instead of a normal dNTP, synthesis terminates at that position. By running the reaction with a mixture of normal dNTPs and a small proportion of fluorescently labeled ddNTPs, each of the four bases carrying a different dye, the polymerase produces a population of fragments that together terminate at every possible position along the template. Capillary electrophoresis separates these fragments by size, and a laser excites the fluorescent label on each fragment as it passes the detector. Reading the colors from smallest to largest fragment gives the sequence of the newly synthesized strand, which is the complement of the template. Sanger reads are long (~800 bp) and highly accurate (about 99.99% per base), but throughput is limited to one read per capillary per run.
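The chain-termination logic can be traced in a few lines of Python. What follows is a minimal sketch, not a model of real chemistry: the function names, the 5% ddNTP fraction, and the molecule count are illustrative assumptions, the primer is omitted, and the template is written 3' to 5' so that the new strand reads 5' to 3'.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize_fragment(template, ddntp_fraction, rng):
    """Extend a new strand along the template until a ddNTP is incorporated.

    Returns the terminated fragment, or None if the polymerase reached the
    end of the template without ever incorporating a ddNTP.
    """
    strand = []
    for base in template:
        strand.append(COMPLEMENT[base])
        # With a small probability the incorporated base is a ddNTP,
        # which lacks the 3'-OH and stops elongation here.
        if rng.random() < ddntp_fraction:
            return "".join(strand)
    return None

def sanger_read(template, n_molecules=20000, ddntp_fraction=0.05, seed=0):
    """Simulate many template copies and read the resulting fragment ladder."""
    rng = random.Random(seed)
    # Maps fragment length -> terminal base (the dye "color" seen by the detector).
    label_by_length = {}
    for _ in range(n_molecules):
        fragment = synthesize_fragment(template, ddntp_fraction, rng)
        if fragment is not None:
            label_by_length[len(fragment)] = fragment[-1]
    # Electrophoresis: the shortest fragments pass the detector first.
    return "".join(label_by_length[n] for n in sorted(label_by_length))

template = "TACGGATCCA"          # hypothetical template, written 3'->5'
print(sanger_read(template))     # ATGCCTAGGT, the complement of the template
```

Only the last base of each fragment carries a label, yet sorting the fragments by length is enough to recover the whole sequence. The ddNTP fraction has to stay small: if termination were likely at every step, almost no molecule would extend far enough to report the later positions.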
Next-generation sequencing (NGS), exemplified by Illumina's platform, achieved a throughput revolution by parallelizing sequencing-by-synthesis across millions of clusters on a glass flow cell. The workflow begins by fragmenting the DNA, ligating adapters, and amplifying fragments on the flow cell surface to form clusters of identical molecules. Sequencing proceeds by adding fluorescently labeled reversible terminators, modified nucleotides that allow incorporation of exactly one base per cycle. Each cycle has three steps: incorporation of a single terminator, imaging to identify which base was added, and chemical removal of the terminator so that the next cycle can proceed. After 75-300 cycles, each cluster has produced one read whose length equals the number of cycles. Because millions of clusters are sequenced simultaneously, a single Illumina run can generate hundreds of gigabases of data.
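The cycle structure and the resulting yield can be sketched as follows. This is an illustrative toy, assuming error-free chemistry; the function names and the cluster count used in the arithmetic are hypothetical and do not describe any specific instrument.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sequence_by_synthesis(cluster_templates, n_cycles):
    """Read every cluster in lockstep, one base per cycle."""
    reads = ["" for _ in cluster_templates]
    for cycle in range(n_cycles):
        # One cycle: each cluster incorporates exactly one reversible
        # terminator, the flow cell is imaged, and the block is removed.
        for i, template in enumerate(cluster_templates):
            if cycle < len(template):
                reads[i] += COMPLEMENT[template[cycle]]
    return reads

def run_yield_bases(n_clusters, read_length):
    """Total bases from a run: one read of read_length per cluster."""
    return n_clusters * read_length

# Three hypothetical clusters sequenced for three cycles.
print(sequence_by_synthesis(["TACG", "GGCA", "ATTC"], 3))  # ['ATG', 'CCG', 'TAA']

# Illustrative arithmetic: 2 billion clusters x 150 cycles = 300 gigabases.
print(run_yield_bases(2_000_000_000, 150) / 1e9)           # 300.0
```

The outer loop runs over cycles and the inner loop over clusters, which mirrors the design choice that makes the platform scale: read length costs time (more cycles), while read count costs only flow cell area (more clusters imaged at once).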
The choice of sequencing technology depends on the application. Sanger remains preferred for validating specific mutations, sequencing single genes, and applications where per-read accuracy matters more than throughput. Illumina dominates for whole-genome sequencing, RNA-seq, ChIP-seq, and any application requiring deep, cost-effective coverage. The short read lengths of Illumina (75-300 bp) create challenges for genome assembly in repetitive regions and for resolving structural variants, which motivated the development of third-generation long-read technologies. Each technology's strengths and limitations propagate directly into the computational methods used to analyze its output.
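The repeat problem can be made concrete with a toy example. The sketch below uses a made-up 35 bp "genome" containing two copies of a 10 bp repeat and a hypothetical helper, find_placements, that does exact string matching; real aligners are far more sophisticated, but the ambiguity they face is the same.

```python
def find_placements(genome, read):
    """Return every start position where read matches genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions

# Toy genome: unique flanks around two copies of the same repeat.
repeat = "GATTACAGGC"
genome = "TTGCA" + repeat + "CCATG" + repeat + "AACGT"

short_read = repeat[2:8]    # 6 bp, lies entirely inside the repeat
long_read = genome[2:18]    # 16 bp, spans the first copy plus unique flanks

print(find_placements(genome, short_read))  # [7, 22]  ambiguous: two placements
print(find_placements(genome, long_read))   # [2]      anchored by the flanks
```

A read shorter than the repeat cannot say which copy it came from, whereas a read long enough to reach unique sequence on both sides has a single placement. This is the property that long-read technologies exploit.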