Explain the role of highly variable gene (HVG) selection in the scRNA-seq analysis workflow.
Think about your answer, then reveal below.
Model answer: HVG selection identifies genes whose expression varies more across cells than expected from technical noise alone. These genes carry most of the biological signal that distinguishes cell types and states. By restricting downstream analysis (PCA, clustering) to the top HVGs, typically 1,000 to 3,000 genes, you remove the noise contributed by non-informative genes (housekeeping genes, lowly expressed genes dominated by dropout) and focus computation on the features that actually differentiate cells. This improves clustering quality and reduces computational cost.
Without HVG selection, PCA can be dominated by technical noise and by highly expressed housekeeping genes, whose variance reflects abundance and sampling noise rather than cell identity, producing components that do not separate cell types. HVGs are identified from the mean-variance relationship: a trend of variance against mean expression is fit across all genes, and a gene qualifies as highly variable only if its variance exceeds what that trend predicts at its mean expression level.
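The mean-variance idea can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the procedure of any particular toolkit: the function name `select_hvgs`, the toy matrix, and the use of a plain least-squares line on log10(mean) versus log10(variance) are all choices made here for brevity. Widely used toolkits such as Seurat and Scanpy fit the trend more robustly (for example with loess or binned dispersions), so treat the straight line as a simplified stand-in.

```python
import math
from statistics import mean, variance


def select_hvgs(expr, n_top):
    """Return indices of the n_top genes with the largest excess variance.

    expr: list of cells, each a list of per-gene expression values
          (hypothetical layout chosen for this sketch).
    """
    n_genes = len(expr[0])
    stats = []  # (gene index, log10 mean, log10 variance)
    for g in range(n_genes):
        values = [cell[g] for cell in expr]
        m, v = mean(values), variance(values)
        if m > 0 and v > 0:  # all-zero or constant genes cannot be HVGs
            stats.append((g, math.log10(m), math.log10(v)))

    # Least-squares trend: log10(variance) ~ intercept + slope * log10(mean).
    # Assumption: a straight line stands in for a loess or binned fit.
    xs = [x for _, x, _ in stats]
    ys = [y for _, _, y in stats]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Excess variability: observed minus trend-predicted log-variance.
    residuals = {g: y - (intercept + slope * x) for g, x, y in stats}
    ranked = sorted(residuals, key=residuals.get, reverse=True)
    return ranked[:n_top]


# Toy data (made up for illustration): 4 cells x 6 genes.
# Genes 0, 1, 2, 4 have variance close to their mean;
# genes 3 and 5 were constructed with inflated variance.
toy = [
    [0, 2, 7, 0, 16, 0],
    [2, 6, 13, 0, 24, 20],
    [0, 2, 7, 8, 16, 0],
    [2, 6, 13, 8, 24, 20],
]
hvgs = select_hvgs(toy, n_top=2)
```

Ranking by the residual rather than by raw variance is the point of the exercise: gene 4 has the same raw variance as gene 3 in the toy data, but because its mean is five times higher, that variance is roughly what the trend predicts, so only genes 3 and 5 stand out as having excess variability.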