Metagenomics sequences all DNA from an environmental sample (soil, ocean, gut) to characterize the community of organisms present without culturing them individually. Amplicon sequencing (16S/18S/ITS) uses a single marker gene for taxonomic profiling, while shotgun metagenomics sequences all DNA randomly, enabling both taxonomic and functional characterization. Computational challenges include assembling genomes from mixed communities (metagenome-assembled genomes, or MAGs), binning contigs by organism of origin, and handling uneven coverage across species. Metagenomics has revealed vast microbial diversity, with most environmental microbes unculturable by standard methods.
Analyze a 16S rRNA amplicon dataset from a human gut sample using QIIME2: denoise with DADA2, assign taxonomy, compute alpha and beta diversity, and compare communities between healthy and diseased individuals. Then examine a shotgun metagenomics dataset and see how functional profiling (HUMAnN) adds information that 16S alone cannot provide.
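To make the diversity step concrete, here is a minimal sketch of one alpha-diversity metric (the Shannon index) and one beta-diversity metric (Bray-Curtis dissimilarity). It is not QIIME2's implementation, and the two count vectors are invented for illustration. It assumes both samples list the same features in the same order and have comparable sequencing depth.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over features with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared features."""
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must list the same features in the same order")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / total


# Hypothetical read counts for four sequence variants in two samples.
healthy = [40, 30, 20, 10]   # fairly even community
diseased = [85, 5, 10, 0]    # dominated by one variant

print("Shannon (healthy): ", round(shannon_index(healthy), 3))
print("Shannon (diseased):", round(shannon_index(diseased), 3))
print("Bray-Curtis:       ", round(bray_curtis(healthy, diseased), 3))
```

In this toy data the more even sample has the higher Shannon index, and the Bray-Curtis value reflects how much of the total read count the two samples do not share.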
Most microorganisms cannot readily be grown in laboratory culture: a commonly cited estimate is that about 99% of environmental microbes resist standard culturing techniques. Before metagenomics, these organisms were largely invisible to science. By extracting and sequencing all DNA from an environment, metagenomics bypasses culture entirely, opening a window into the full diversity of microbial communities in any habitat: soil, oceans, the human gut, deep-sea vents, hospital surfaces.
The two main approaches serve different purposes. Amplicon sequencing (most commonly 16S rRNA for bacteria and archaea) PCR-amplifies a specific marker gene from the community DNA, sequences the amplicons, and uses the sequences to identify which organisms are present and at what relative abundances. This is fast, inexpensive, and well-standardized, but it only tells you who is there, not what they can do. It also detects only organisms that carry the selected marker gene: 16S misses viruses, which have no ribosomal genes, and eukaryotes, which are profiled with 18S or ITS instead. Shotgun metagenomics fragments all community DNA and sequences it without any targeted amplification. In principle this captures every source of DNA in the sample (bacterial, archaeal, viral, eukaryotic, and plasmid) and enables both taxonomic profiling (by matching reads to reference databases with tools like Kraken2 or MetaPhlAn) and functional profiling (mapping reads to gene databases with HUMAnN to identify metabolic pathways present in the community).
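The idea of matching reads to a reference database can be illustrated with a toy k-mer voting classifier. This is a deliberately simplified sketch, not how Kraken2 or MetaPhlAn actually work: the reference sequences, taxon names, and reads are made up, k-mers shared between taxa are simply discarded, and reverse complements are ignored.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference sequence contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify_read(read, index, k):
    """Assign a read to the taxon with the most taxon-specific k-mer hits, else None."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer, ())
        if len(taxa) == 1:  # simplification: ignore k-mers shared between taxa
            votes[next(iter(taxa))] += 1
    ranked = votes.most_common(2)
    if not ranked:
        return None  # no k-mer matched the database
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between two taxa: leave unclassified
    return ranked[0][0]


def relative_abundance(reads, index, k):
    """Fraction of classified reads assigned to each taxon."""
    counts = Counter(classify_read(r, index, k) for r in reads)
    counts.pop(None, None)  # drop unclassified reads
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


# Hypothetical miniature reference database and reads.
K = 5
references = {
    "taxon_A": "ATGGCGTACGTTAGCCTAGGA",
    "taxon_B": "TTGACCGGATATCCGGTCAAT",
}
reads = ["GCGTACGTTAGC", "CCTAGGA", "GACCGGATATCC", "AAAAAAAAAA"]

index = build_index(references, K)
for read in reads:
    print(read, "->", classify_read(read, index, K))
print(relative_abundance(reads, index, K))
```

Note that abundances here are fractions of classified reads. Real profilers also have to account for genome size, database completeness, and reads that match several related taxa.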
The most ambitious metagenomic analysis is genome reconstruction. By assembling reads into contigs and then grouping contigs by organism (binning), researchers can reconstruct near-complete genomes of uncultured organisms, known as metagenome-assembled genomes (MAGs). Binning algorithms use two signals: sequence composition (each genome has a characteristic GC content and tetranucleotide frequency profile) and coverage co-variation (contigs from the same genome should have correlated abundance patterns across multiple samples). Tools like MetaBAT2 and MaxBin2 automate this process. Quality assessment (CheckM) evaluates completeness and contamination by checking for expected single-copy marker genes: missing markers indicate an incomplete bin, while markers present in more than one copy suggest that contigs from different organisms were binned together. High-quality MAGs have enabled the discovery of entirely new phyla, metabolic capabilities, and ecological roles, expanding the tree of life dramatically.
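The binning signals and the marker-gene quality check described above can be sketched in a few functions. This is an illustrative simplification, not the method used by MetaBAT2, MaxBin2, or CheckM. The sequence, coverage values, and marker names are hypothetical, and the completeness/contamination formula is a naive count-based version that treats every marker independently.

```python
import math
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tetranucleotide_freq(seq):
    """256-element composition vector (forward strand only, for simplicity)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)  # skips windows with non-ACGT bases
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation of two coverage profiles measured over the same samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def completeness_contamination(marker_counts, expected_markers):
    """Naive estimate (in percent) from single-copy marker gene counts in a bin."""
    n = len(expected_markers)
    found = sum(1 for m in expected_markers if marker_counts.get(m, 0) >= 1)
    extra = sum(max(marker_counts.get(m, 0) - 1, 0) for m in expected_markers)
    return 100 * found / n, 100 * extra / n


# Hypothetical contig sequence and per-sample coverage profiles.
contig_seq = "ATGCGCGCTAAT"
cov_contig1 = [10, 20, 30]
cov_contig2 = [20, 40, 60]   # rises and falls with contig1: likely the same genome
cov_contig3 = [30, 20, 10]   # opposite pattern: likely a different genome

print("GC content:", gc_content(contig_seq))
print("corr(1,2):", round(pearson(cov_contig1, cov_contig2), 3))
print("corr(1,3):", round(pearson(cov_contig1, cov_contig3), 3))

# Hypothetical bin: three of four expected markers found, one of them twice.
expected = ["m1", "m2", "m3", "m4"]
observed = {"m1": 1, "m2": 1, "m3": 2}
print("completeness, contamination:", completeness_contamination(observed, expected))
```

Contigs as short as the one here give very noisy composition vectors. In practice, binners work on contigs that are at least a few kilobases long so that tetranucleotide frequencies are stable.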
Metagenomic studies have transformed our understanding of human health (the gut microbiome influences digestion, immunity, and even neurological function), agriculture (soil microbiomes affect crop productivity), and ecology (ocean microbiomes drive global carbon cycling). The field continues to evolve with long-read sequencing enabling more complete MAGs, metatranscriptomics (RNA-seq of communities) revealing which genes are actually active, and integration with metabolomics to connect community function to measured biochemistry.