Explain why long reads are essential for resolving structural variants that short-read sequencing misses.
Think about your answer, then reveal below.
Model answer: Structural variants (SVs) are insertions, deletions, inversions, duplications, and translocations of roughly 50 bp or more, and they often involve or are flanked by repetitive sequence. Short reads (150-300 bp) usually cannot span these events. For example, a 5-kb deletion flanked by homologous repeats produces ambiguous short-read alignments, because reads drawn from the flanking repeats map equally well to multiple locations. A long read that spans the entire SV, including both breakpoints and the unique sequence on either side, can be anchored in that unique sequence and so resolves the variant unambiguously. Studies comparing short-read and long-read SV calling on the same genomes find that long reads detect 2-3 times more SVs, revealing a previously hidden layer of genetic variation.
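Worked illustration: the repeat-flanked deletion in the model answer can be mimicked with a toy example. This is only a sketch under stated assumptions: exact string matching stands in for a real aligner, the sequences are random and far shorter than real repeats, and every name (`random_dna`, `find_all`, the segment lengths) is hypothetical. It shows a short read from inside the repeat matching two places in the reference, while a long read anchored in unique sequence on both sides recovers the size of the deletion.

```python
import random

def random_dna(length, rng):
    """Return a random DNA string (toy stand-in for real sequence)."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def find_all(reference, read):
    """Return every start position where read matches reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits

rng = random.Random(0)
left = random_dna(300, rng)     # unique left flank
repeat = random_dna(200, rng)   # repeat present twice in the reference
middle = random_dna(500, rng)   # unique sequence between the repeat copies
right = random_dna(300, rng)    # unique right flank

# Reference carries both repeat copies; the sample has lost the middle
# segment plus one repeat copy (a repeat-mediated deletion).
reference = left + repeat + middle + repeat + right
sample = left + repeat + right

# A 100-bp short read lying entirely inside the repeat: two equally good hits.
short_read = sample[350:450]
short_hits = find_all(reference, short_read)

# A long read covering the whole event: anchor its two ends in unique flanks.
long_read = sample
anchor = 100
left_hits = find_all(reference, long_read[:anchor])
right_hits = find_all(reference, long_read[-anchor:])

# Reference span between the anchors minus the read length = deleted bases.
reference_span = right_hits[0] + anchor - left_hits[0]
deleted_bp = reference_span - len(long_read)

print("short-read placements:", short_hits)
print("long-read anchors:", left_hits, right_hits)
print("deleted bases inferred from long read:", deleted_bp)
```

Real SV callers work from split or gapped alignments produced by an aligner rather than exact matches, and the precise breakpoint inside a homologous repeat can remain ambiguous even when the event itself is resolved.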
The T2T (Telomere-to-Telomere) Consortium combined ultra-long Oxford Nanopore reads with PacBio HiFi reads to complete the first gapless human genome assembly (T2T-CHM13), published in 2022, filling in centromeres, segmental duplications, and telomeres that had been missing from the human reference for about 20 years. Individual reads did not have to cover a whole centromeric array; they had to be long and accurate enough to be placed uniquely within the megabase-scale tandem repeats of centromeric regions.