Why do DESeq2 and edgeR use the negative binomial distribution rather than the Poisson distribution to model RNA-seq count data?
A. The negative binomial distribution is computationally faster
B. RNA-seq counts have more variance than the Poisson distribution can accommodate due to biological variability between replicates
C. The Poisson distribution cannot handle zero counts
D. The negative binomial distribution is required for normalized data
Answer: B
The Poisson distribution assumes that the variance equals the mean, which would be appropriate if random sampling of reads were the only source of variability. Biological replicates of the same condition show additional variability (biological dispersion), so the variance exceeds the mean. The negative binomial distribution has a separate dispersion parameter that captures this extra-Poisson variability (overdispersion). DESeq2 and edgeR estimate gene-specific dispersions by borrowing information across genes with similar expression levels, which keeps statistical testing reliable even with few replicates.
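The mean-variance relationship described above can be sketched numerically. The snippet below is a minimal illustration, assuming the common parameterization in which the negative binomial variance is the mean plus the dispersion times the squared mean; the dispersion value of 0.1 and the function names are hypothetical choices for illustration, not estimates from real data or part of either package's API.

```python
def poisson_variance(mean):
    """Poisson model: the variance equals the mean."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Negative binomial model: variance = mean + dispersion * mean^2."""
    return mean + dispersion * mean ** 2


# Hypothetical dispersion chosen for illustration only.
ASSUMED_DISPERSION = 0.1

for mean in (10, 100, 1000):
    pois = poisson_variance(mean)
    nb = negative_binomial_variance(mean, ASSUMED_DISPERSION)
    # The ratio shows how far the NB variance exceeds the Poisson variance.
    print(f"mean={mean:>5}  Poisson var={pois:>7.0f}  NB var={nb:>9.0f}  ratio={nb / pois:.1f}")
```

Under this assumption the gap widens with expression level: the extra term grows with the square of the mean, so overdispersion matters most for highly expressed genes.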
Question 2 True / False
Applying a p-value threshold of 0.05 without multiple testing correction is appropriate when testing 20,000 genes for differential expression.
True
False
Answer: False
If no gene were truly differentially expressed, testing 20,000 genes at p < 0.05 would still produce approximately 1,000 false positives by chance alone (5% of 20,000). Multiple testing correction is therefore essential. The Benjamini-Hochberg procedure controls the false discovery rate (FDR): among all genes declared significant, only a specified proportion (typically 5% or 10%) are expected to be false discoveries. This is less conservative than Bonferroni correction, which controls the family-wise error rate, and it is better suited to genomics, where finding most true positives matters alongside controlling false ones.
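The arithmetic and the Benjamini-Hochberg step-up rule can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used by DESeq2 or edgeR; the function name `benjamini_hochberg` and the ten example p-values are made up for demonstration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the test is rejected at the given FDR."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Expected false positives if all 20,000 genes were truly unchanged.
expected_false_positives = 0.05 * 20_000

# Hypothetical p-values for ten genes.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

n_uncorrected = sum(p < 0.05 for p in example_p)
n_bonferroni = sum(p <= 0.05 / len(example_p) for p in example_p)
n_bh = sum(benjamini_hochberg(example_p, fdr=0.05))

print(f"expected false positives with no correction: {expected_false_positives:.0f}")
print(f"significant: uncorrected={n_uncorrected}, BH={n_bh}, Bonferroni={n_bonferroni}")
```

On these example values the Benjamini-Hochberg count falls between the uncorrected and Bonferroni counts, which matches the description of it as the less conservative correction.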
Question 3 Short Answer
Explain why increasing the number of biological replicates improves differential expression analysis more than increasing sequencing depth per sample.
Model answer: Biological replicates capture the true variability between independent samples of the same condition. The statistical test must estimate this variability to decide whether observed differences between conditions are real. Sequencing depth reduces technical sampling noise but does not reduce biological variability. Beyond a moderate depth (roughly 10 to 20 million reads for most RNA-seq experiments), additional reads give diminishing returns for differential gene expression (DGE) analysis because the variance is dominated by biological rather than technical components. More replicates directly improve the precision of the dispersion estimate and increase statistical power to detect true expression differences.
This is one of the most important experimental design principles in RNA-seq. Three replicates per condition is often treated as a minimum, but power analyses consistently show that increasing from 3 to 6 replicates improves DGE sensitivity far more than doubling sequencing depth. The budget is almost always better spent on more replicates.
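A back-of-the-envelope calculation shows why replicates help more than depth. The sketch below assumes a simplified negative binomial model in which the squared coefficient of variation of one sample's count is 1/mean + dispersion, and averaging over n independent replicates divides it by n. The expected count of 100, the dispersion of 0.1, and the function name are hypothetical values chosen for illustration; this is not a substitute for a proper power analysis.

```python
def squared_cv_of_condition_mean(expected_count, dispersion, n_replicates):
    """Relative variance of a condition's mean expression estimate.

    1 / expected_count is the sampling (depth-dependent) term;
    dispersion is the biological term, which depth cannot reduce.
    """
    per_sample = 1.0 / expected_count + dispersion
    return per_sample / n_replicates


# Hypothetical baseline: 3 replicates, expected count 100, dispersion 0.1.
BASE_COUNT = 100
ASSUMED_DISPERSION = 0.1
BASE_REPLICATES = 3

baseline = squared_cv_of_condition_mean(BASE_COUNT, ASSUMED_DISPERSION, BASE_REPLICATES)
# Doubling depth doubles the expected count but leaves the dispersion unchanged.
double_depth = squared_cv_of_condition_mean(2 * BASE_COUNT, ASSUMED_DISPERSION, BASE_REPLICATES)
# Doubling replicates halves the whole relative variance.
double_replicates = squared_cv_of_condition_mean(BASE_COUNT, ASSUMED_DISPERSION, 2 * BASE_REPLICATES)

print(f"baseline (3 reps, 1x depth): {baseline:.4f}")
print(f"3 reps, 2x depth:            {double_depth:.4f}")
print(f"6 reps, 1x depth:            {double_replicates:.4f}")
```

Under these assumptions, doubling depth shrinks only the small sampling term, while doubling replicates halves the entire relative variance, including the dominant biological term.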