Population genomics analyzes genome-wide variation across individuals within and between populations to infer demographic history, migration, selection, and adaptation. Key analyses include population structure inference (PCA, ADMIXTURE), selection scans (Fst outliers, extended haplotype homozygosity), demographic modeling (effective population size changes over time), and admixture detection. Whole-genome data provide far more statistical power than single-locus studies, because each locus can be judged against the genome-wide background. This enables detection of subtle signals such as soft sweeps, polygenic adaptation, and recent gene flow between populations.
Download 1000 Genomes Project VCF data for a single chromosome, compute PCA across populations, and plot the first two components. Observe how continental population groups separate. Then compute Fst between populations for each SNP and identify outlier regions that may be under divergent selection.
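Before PCA or Fst can be computed, the genotype calls in the VCF have to be turned into a numeric matrix. The sketch below shows the idea on a tiny in-memory VCF string. The function name `parse_vcf_genotypes` and the toy records are illustrative assumptions, not part of any library, and a real analysis of 1000 Genomes files would more likely rely on an established reader such as scikit-allel or cyvcf2.

```python
def parse_vcf_genotypes(vcf_text):
    """Return (samples, positions, genotypes) from VCF-formatted text.

    genotypes[i][j] is the ALT-allele count (0, 1, or 2) of sample j at
    variant i, or -1 for a missing call. Only biallelic SNPs are kept.
    """
    samples, positions, genotypes = [], [], []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        ref, alt = fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue  # drop indels, multiallelic sites, and sites without ALT
        gt_index = fields[8].split(":").index("GT")
        row = []
        for sample_field in fields[9:]:
            call = sample_field.split(":")[gt_index].replace("|", "/")
            alleles = call.split("/")
            if "." in alleles:
                row.append(-1)  # missing genotype
            else:
                row.append(sum(int(a) for a in alleles))
        positions.append(int(fields[1]))
        genotypes.append(row)
    return samples, positions, genotypes


# Toy input (hypothetical records, not real 1000 Genomes data).
TOY_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "22\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1",
    "22\t200\t.\tC\tCT\t.\tPASS\t.\tGT\t0|1\t0|1\t0|0",  # indel, skipped
    "22\t300\t.\tG\tT\t.\tPASS\t.\tGT\t./.\t1|0\t0|0",
])

samples, positions, genotypes = parse_vcf_genotypes(TOY_VCF)
```

The resulting matrix has one row per variant and one column per sample. PCA implementations usually expect samples as rows, so it would typically be transposed, and missing calls (-1) filtered or imputed, before the next step.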
Population genetics, as a field, developed mathematical theory for how allele frequencies change under mutation, drift, selection, and migration. Population genomics applies these principles to entire genomes, using the massive datasets produced by modern sequencing to answer questions that single-gene studies could not resolve. The genome becomes both the subject of study and the statistical reference frame.
Population structure is typically the first analysis. PCA and model-based methods (ADMIXTURE, STRUCTURE) decompose genome-wide variation into components that reflect shared ancestry. In humans, the first few PCs closely mirror continental geography, reflecting ancient migration patterns. Within continents, finer structure emerges: in European samples, the first two PCs approximately recover the geographic map of Europe. These patterns inform every downstream analysis: GWAS must correct for structure to avoid confounding, selection scans must distinguish drift from selection, and demographic models must account for population splits and admixture.
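As a rough illustration of how PCA recovers structure, the sketch below runs PCA on a genotype matrix using NumPy's SVD. The data are synthetic: two hypothetical populations, "A" and "B", whose allele frequencies have drifted apart from a shared ancestral frequency. The function name `genotype_pca` and all simulation parameters are assumptions chosen for the example.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of an (n_samples, n_snps) matrix of ALT-allele counts (0, 1, 2).

    Each SNP is centered at 2p and scaled by sqrt(p * (1 - p)), where p is
    the sample allele frequency, a normalization commonly used for
    genotype PCA.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)  # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, explained[:n_components]


# Synthetic stand-in for real genotypes (assumed parameters).
rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 2000
ancestral = rng.uniform(0.1, 0.9, size=n_snps)
# Each population's frequencies drift independently from the ancestral ones.
freq_a = np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
freq_b = np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
])
labels = np.array(["A"] * n_per_pop + ["B"] * n_per_pop)

scores, explained = genotype_pca(geno)
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by `labels`, should show the two groups separating along the first component. With real data, SNPs are usually pruned for linkage disequilibrium first so that long correlated blocks do not dominate the leading components.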
Selection scans search for genomic regions where natural selection has left a detectable signature. Classic selective sweeps produce regions of reduced variation around the selected allele, skewed allele frequency spectra (summarized by statistics such as Tajima's D), elevated Fst between populations, and extended haplotype homozygosity. Genome-wide data enables systematic scanning for these signatures by comparing each locus to the genome-wide distribution to identify outliers. Iconic examples include lactase persistence alleles, which arose independently in European and East African pastoralist populations, skin pigmentation genes at different latitudes, and malaria resistance alleles in tropical populations. More subtle signals, such as soft sweeps (selection on standing variation) and polygenic adaptation (many loci shifting slightly in the same direction), require sophisticated statistical methods and very large sample sizes to detect.
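The outlier logic can be sketched with a per-SNP Fst scan. The example below uses Hudson's estimator, one of several Fst estimators and chosen here only for illustration, on synthetic allele counts from two populations. One SNP (index `selected`) is deliberately given very different frequencies to mimic divergent selection. All names and parameters are assumptions made for the example.

```python
import numpy as np


def hudson_fst(counts1, n1, counts2, n2):
    """Per-SNP numerator and denominator of Hudson's Fst estimator.

    counts1, counts2: ALT-allele counts per SNP in each population.
    n1, n2: number of sampled chromosomes (2 x diploid individuals).
    """
    p1 = np.asarray(counts1, dtype=float) / n1
    p2 = np.asarray(counts2, dtype=float) / n2
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


# Synthetic allele frequencies: mild drift everywhere (assumed parameters).
rng = np.random.default_rng(1)
n_snps, n1, n2 = 5000, 100, 100
base = rng.uniform(0.1, 0.9, n_snps)
f1 = np.clip(base + rng.normal(0, 0.05, n_snps), 0.01, 0.99)
f2 = np.clip(base + rng.normal(0, 0.05, n_snps), 0.01, 0.99)

# Plant one strongly differentiated SNP to stand in for a selected locus.
selected = 2500
f1[selected], f2[selected] = 0.95, 0.05

c1 = rng.binomial(n1, f1)
c2 = rng.binomial(n2, f2)
num, den = hudson_fst(c1, n1, c2, n2)

# Per-SNP Fst; SNPs with a zero denominator are left as NaN.
per_snp = np.full(n_snps, np.nan)
np.divide(num, den, out=per_snp, where=den > 0)

# Genome-wide Fst as a ratio of sums, which is more stable than
# averaging the per-SNP ratios.
genome_wide = num.sum() / den.sum()

# Flag SNPs in the top 0.1% of the empirical distribution as outliers.
threshold = np.nanquantile(per_snp, 0.999)
outliers = np.flatnonzero(per_snp >= threshold)
```

An empirical-quantile cutoff like this always flags some SNPs, whether or not selection has acted, so outliers are candidates rather than evidence. In practice, scans often average Fst over windows of adjacent SNPs and compare the results against neutral demographic simulations.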
Demographic inference uses the patterns of genetic variation across the genome to reconstruct population history. Methods like PSMC (pairwise sequentially Markovian coalescent) estimate changes in effective population size over hundreds of thousands of years from a single diploid genome, by analyzing the distribution of heterozygous sites along the chromosomes: the local density of heterozygous sites reflects how long ago the two haplotypes last shared a common ancestor in that region. More recent history (thousands of years) can be inferred from rare variants, LD patterns, and identity-by-descent tract lengths. These analyses have revealed population bottlenecks, expansions, and admixture events that corroborate and extend the archaeological and linguistic records of human history, and they are equally applicable to other species for conservation and evolutionary biology.
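The raw signal behind single-genome methods can be illustrated by counting heterozygous sites in windows along a chromosome. The sketch below is not PSMC, which fits a hidden Markov model over local coalescence times. It only computes the windowed heterozygosity that such a model draws on. Under a simple neutral model, expected per-site heterozygosity is approximately 4 × Ne × μ, so regions with lower heterozygosity correspond to more recent common ancestry. The function name, window size, and simulated rates are assumptions for the example.

```python
import numpy as np


def windowed_heterozygosity(het_positions, chrom_length, window=100_000):
    """Heterozygous sites per base pair in consecutive fixed-size windows."""
    edges = np.arange(0, chrom_length + window, window)
    counts, _ = np.histogram(het_positions, bins=edges)
    # The last window may be shorter than the rest.
    widths = np.minimum(edges[1:], chrom_length) - edges[:-1]
    return counts / widths


# Simulated diploid genome: the first half of the chromosome has a higher
# heterozygosity rate than the second half (assumed rates).
rng = np.random.default_rng(2)
length = 1_000_000
rate = np.where(np.arange(length) < length // 2, 1e-3, 2e-4)
het_positions = np.flatnonzero(rng.random(length) < rate)

het = windowed_heterozygosity(het_positions, length)
```

Here `het` holds ten values, with the first five windows showing markedly higher heterozygosity than the last five. Converting such values into effective population sizes requires a per-generation mutation rate and a coalescent model, which is the part that tools like PSMC provide.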