Explain the difference between de novo assembly and reference-guided assembly, and when you would choose each approach.
Think about your answer, then reveal below.
Model answer: De novo assembly reconstructs the genome from the reads alone, by finding overlaps among them, without using any existing reference sequence. It is necessary when no closely related reference exists (novel organisms, highly divergent strains), and it is the better choice when you want to detect structural variants, novel sequences, or rearrangements that alignment to a reference would miss. Reference-guided assembly aligns reads to an existing reference genome and builds a consensus sequence from the alignments; variants are then called relative to that reference. It is faster, less computationally demanding, and appropriate when a high-quality reference from a closely related organism is available and the goal is to identify variants rather than discover novel genomic content. Its main limitation is reference bias: reads from regions that are absent from or highly divergent from the reference tend to align poorly or not at all, so that content is lost.
In practice, many projects use both: reference-guided assembly for variant calling and a de novo assembly to capture sequences absent from the reference. The human genome reference (GRCh38) is the backbone of most human genomics, but de novo assembly of individual genomes has revealed substantial structural variation missed by reference-guided approaches.
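To make the contrast concrete, here is a toy sketch in Python. It is not how production assemblers or aligners work (those use overlap or de Bruijn graphs, gapped alignment, and error models); the function names, the sequences, and the parameters (8-base error-free reads, at most one mismatch, no gaps) are all invented for illustration. The sample differs from the reference by one SNP and one 6-base insertion.

```python
from collections import Counter


def suffix_prefix_overlap(a, b, min_len):
    """Longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def de_novo_assemble(reads, min_len=4):
    """Toy de novo assembly: greedily merge the pair of contigs with the
    largest suffix/prefix overlap. No reference is consulted."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                n = suffix_prefix_overlap(a, b, min_len)
                if n > best_n:
                    best_n, best_i, best_j = n, i, j
        if best_n == 0:  # nothing left to merge
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def reference_guided(reads, reference, max_mismatches=1):
    """Toy reference-guided assembly: place each read at its best ungapped
    position on the reference, then take a per-column majority vote.
    Columns with no coverage fall back to the reference base."""
    pileup = [Counter() for _ in reference]
    unmapped = []
    for read in reads:
        best_pos, best_mm = None, max_mismatches + 1
        for pos in range(len(reference) - len(read) + 1):
            window = reference[pos:pos + len(read)]
            mm = sum(r != g for r, g in zip(read, window))
            if mm < best_mm:
                best_pos, best_mm = pos, mm
        if best_pos is None:
            unmapped.append(read)  # too divergent from every reference window
            continue
        for k, base in enumerate(read):
            pileup[best_pos + k][base] += 1
    consensus = "".join(
        col.most_common(1)[0][0] if col else ref_base
        for col, ref_base in zip(pileup, reference)
    )
    return consensus, unmapped


# Invented example data: the sample carries a SNP (G>A at reference index 3)
# and a 6-base insertion that the reference lacks.
reference = "ATGGCGTACCTTGAGCAATC"
insertion = "CGGGAT"
sample = "ATGACGTACCTT" + insertion + "GAGCAATC"
reads = [sample[i:i + 8] for i in range(len(sample) - 8 + 1)]

contigs = de_novo_assemble(reads)
consensus, unmapped = reference_guided(reads, reference)
snps = [(i, r, c) for i, (r, c) in enumerate(zip(reference, consensus)) if r != c]

print("de novo contigs:     ", contigs)
print("reference consensus: ", consensus)
print("SNPs vs reference:   ", snps)
print("unmapped reads:      ", len(unmapped))
```

On this toy input the de novo route rebuilds the full sample, insertion included, while the reference-guided route recovers the SNP but not the insertion: the reads that span it fail to map and drop out of the consensus. That is the trade-off described above, in miniature.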