A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Single-Cell RNA Sequencing

Research · Depth 246 in the knowledge graph

2 topics build on this

1,606 prerequisites beneath it

Differential Gene Expression Analysis, RNA-seq Analysis Pipeline → Single-Cell RNA Sequencing → Single-Cell Trajectory Analysis, Spatial Transcriptomics

Core Idea

Single-cell RNA sequencing (scRNA-seq) profiles gene expression in individual cells rather than bulk tissue averages, revealing cellular heterogeneity, rare cell types, and cell state transitions. Droplet-based platforms such as 10x Genomics Chromium encapsulate single cells with barcoded beads so that each cell's transcripts carry a unique tag. Analysis involves quality filtering, normalization, dimensionality reduction (PCA, UMAP), clustering to identify cell types, and differential expression between clusters. scRNA-seq has revealed that tissues previously thought to be homogeneous contain diverse cell populations with distinct transcriptional programs.

How It's Best Learned

Analyze a published scRNA-seq dataset (e.g., PBMCs from 10x Genomics) using Scanpy or Seurat. Perform the standard workflow: filter low-quality cells, normalize, find highly variable genes, run PCA and UMAP, cluster, and annotate clusters using known marker genes. Compare the UMAP visualization before and after batch correction if multiple samples are involved.

Common Misconceptions

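scRNA-seq does not capture all transcripts in a cell; droplet-based methods are estimated to capture only about 10-30% of a cell's mRNA molecules, so many genes that are actually expressed register zero counts in a given cell (dropout). The resulting sparse data call for statistical approaches designed for it.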
Clusters on a UMAP plot do not necessarily represent discrete cell types — they can reflect continuous processes like differentiation gradients.
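
To make the dropout point concrete, here is a minimal sketch of how incomplete capture turns into missing genes. It assumes each mRNA copy is captured independently with the same probability, which is a simplification for illustration; the function name and the 10% capture rate are illustrative choices, not values from any specific protocol.

```python
def detection_probability(n_molecules: int, capture_rate: float) -> float:
    """Probability that at least one of n_molecules mRNA copies is captured.

    Assumes (for illustration only) that every copy is captured
    independently with probability capture_rate.
    """
    if n_molecules < 0:
        raise ValueError("n_molecules must be non-negative")
    if not 0.0 <= capture_rate <= 1.0:
        raise ValueError("capture_rate must be between 0 and 1")
    # A gene is missed only if every single copy escapes capture.
    return 1.0 - (1.0 - capture_rate) ** n_molecules


# Lowly expressed genes are missed far more often than abundant ones.
for copies in (1, 5, 20, 100):
    p = detection_probability(copies, capture_rate=0.1)
    print(f"{copies:>3} copies -> detection probability {p:.2f}")
```

Under these assumptions a gene present at 5 copies is detected in only about 41% of cells, which is why a zero count for a lowly expressed gene is weak evidence that the gene is off.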

Explainer

Bulk RNA-seq measures the average gene expression across thousands to millions of cells, like blending a fruit salad and analyzing the smoothie's composition. You can tell there are strawberries and bananas, but you cannot tell which flavors came from which piece of fruit, or whether a rare ingredient was present in only a few pieces. Single-cell RNA-seq profiles each cell individually, preserving the identity and heterogeneity that bulk methods erase (though not the cells' positions in the tissue, which is what spatial transcriptomics adds). This resolution has transformed our understanding of development, immune responses, cancer, and tissue organization.

The dominant platform, 10x Genomics Chromium, uses microfluidics to encapsulate individual cells in aqueous droplets suspended in oil, each containing a gel bead that carries barcoded oligonucleotides. Inside each droplet, the cell is lysed, its mRNA hybridizes to the poly(dT) sequences of the bead's oligonucleotides, and each transcript is tagged with a bead-specific cell barcode and a unique molecular identifier (UMI). After reverse transcription and amplification, the barcoded cDNA from thousands of cells is pooled and sequenced together. Computational demultiplexing uses the barcodes to assign each read back to its cell of origin. Counting distinct UMIs rather than reads corrects for most PCR amplification bias, because all copies of one original molecule share the same UMI. A typical experiment profiles 5,000-20,000 cells.
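
The demultiplexing and UMI-counting step can be sketched in a few lines. This is a simplified illustration, not the algorithm of any particular pipeline: it assumes reads have already been aligned and reduced to (cell barcode, UMI, gene) tuples, and it skips the correction of sequencing errors in barcodes and UMIs that production tools perform. The function name and the toy reads are hypothetical.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into a per-cell, per-gene count of distinct UMIs.

    reads: iterable of (cell_barcode, umi, gene) tuples.
    Returns {cell_barcode: {gene: number_of_distinct_umis}}.
    """
    # Reads sharing cell, gene, and UMI are treated as PCR copies of one molecule.
    umis_seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis_seen[(cell_barcode, gene)].add(umi)

    counts = defaultdict(dict)
    for (cell_barcode, gene), umis in umis_seen.items():
        counts[cell_barcode][gene] = len(umis)
    return dict(counts)


# Toy data: six reads, but only three original molecules.
reads = [
    ("AAAC", "TTGA", "CD3E"),
    ("AAAC", "TTGA", "CD3E"),   # PCR duplicate of the read above
    ("AAAC", "GGCA", "CD3E"),   # second CD3E molecule in the same cell
    ("CCGT", "TTGA", "MS4A1"),
    ("CCGT", "TTGA", "MS4A1"),  # PCR duplicate
    ("CCGT", "TTGA", "MS4A1"),  # PCR duplicate
]
matrix = count_umis(reads)
print(matrix)
```

The same UMI sequence may appear in different cells or genes without conflict, since molecules are only collapsed within one (cell, gene) pair.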

The analysis workflow begins with quality control: removing cells with too few genes detected (empty droplets or dead cells), too many genes (possible doublets, meaning two cells in one droplet), or a high percentage of mitochondrial reads (an indicator of cell stress or lysis). After normalization, the key step is selecting highly variable genes (HVGs): genes whose expression varies across cells more than expected from technical noise. PCA on the HVGs (commonly around 2,000 of the ~20,000 measured genes) compresses the data to 20-50 principal components that capture the major axes of biological variation. UMAP or t-SNE then projects these components into 2D for visualization. Graph-based clustering algorithms (Louvain, Leiden) identify groups of transcriptionally similar cells; they operate on a nearest-neighbor graph built from the principal components, not on the 2D embedding.
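
The quality-control and normalization steps can be sketched without any library. The code below assumes human gene symbols (mitochondrial genes prefixed with "MT-") and uses deliberately tiny thresholds to suit the toy data. Suitable cutoffs depend on the tissue and protocol, and all function names here are hypothetical. The normalization shown (scale each cell to a fixed total, then log1p) is one common choice among several.

```python
import math


def qc_metrics(counts):
    """counts: {gene: UMI count} for one cell.

    Returns (genes_detected, total_counts, mitochondrial_fraction).
    """
    total = sum(counts.values())
    genes_detected = sum(1 for c in counts.values() if c > 0)
    # Assumption: human gene symbols, mitochondrial genes start with "MT-".
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    mito_fraction = mito / total if total else 0.0
    return genes_detected, total, mito_fraction


def passes_qc(counts, min_genes, max_genes, max_mito_fraction):
    """Keep cells that look like single, intact cells."""
    genes_detected, _, mito_fraction = qc_metrics(counts)
    return (min_genes <= genes_detected <= max_genes
            and mito_fraction <= max_mito_fraction)


def normalize_log1p(counts, target_sum=10_000):
    """Scale a cell's counts to target_sum total, then apply log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c / total * target_sum)
            for gene, c in counts.items()}


# Toy cells with illustrative counts.
cells = {
    "healthy":  {"CD3E": 4, "LYZ": 0, "ACTB": 10, "MT-CO1": 1},
    "stressed": {"CD3E": 1, "LYZ": 1, "ACTB": 2, "MT-CO1": 6},  # 60% mitochondrial
    "empty":    {"CD3E": 0, "LYZ": 0, "ACTB": 1, "MT-CO1": 0},  # one gene detected
}
kept = {name: c for name, c in cells.items()
        if passes_qc(c, min_genes=2, max_genes=4, max_mito_fraction=0.2)}
normalized = {name: normalize_log1p(c) for name, c in kept.items()}
print(sorted(kept))
```

Only the "healthy" cell survives: "stressed" fails the mitochondrial cutoff and "empty" has too few genes. In practice these thresholds are usually chosen after plotting the distribution of each metric across all cells.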

Cell type annotation, the task of assigning biological identities to clusters, is both the goal and the bottleneck. Automated methods compare cluster expression profiles to reference databases (CellTypist, SingleR), but manual annotation using known marker genes remains the gold standard for novel tissues or species. Downstream analyses include differential expression between clusters, trajectory inference (ordering cells along developmental paths using tools like Monocle), RNA velocity (predicting future cell states from spliced/unspliced transcript ratios using tools like scVelo), and integration of multiple datasets to build comprehensive cell atlases. The Human Cell Atlas project aims to map every cell type in the human body, with scRNA-seq as a primary technology.
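
Marker-based annotation can be approximated by scoring each cluster against curated marker lists. The sketch below is a deliberately naive version of that idea: it takes the mean expression of each marker set and picks the highest. The marker genes are widely used PBMC markers, while the cluster means are made-up illustrative numbers and the function name is hypothetical. It is not how CellTypist or SingleR work internally.

```python
def annotate_clusters(cluster_means, marker_sets):
    """Label each cluster with the cell type whose markers score highest.

    cluster_means: {cluster_id: {gene: mean normalized expression}}
    marker_sets:   {cell_type: [marker genes]} (each list must be non-empty)
    Returns {cluster_id: cell_type}.
    """
    labels = {}
    for cluster_id, means in cluster_means.items():
        # Score = average expression of the marker genes; missing genes count as 0.
        scores = {
            cell_type: sum(means.get(gene, 0.0) for gene in genes) / len(genes)
            for cell_type, genes in marker_sets.items()
        }
        labels[cluster_id] = max(scores, key=scores.get)
    return labels


marker_sets = {
    "T cell":   ["CD3E", "CD3D"],
    "B cell":   ["MS4A1", "CD79A"],
    "Monocyte": ["LYZ", "CD14"],
}

# Illustrative per-cluster mean expression values (not real data).
cluster_means = {
    "0": {"CD3E": 2.1, "CD3D": 1.8, "MS4A1": 0.1, "CD79A": 0.0, "LYZ": 0.3, "CD14": 0.0},
    "1": {"CD3E": 0.1, "CD3D": 0.0, "MS4A1": 1.9, "CD79A": 2.2, "LYZ": 0.2, "CD14": 0.0},
    "2": {"CD3E": 0.0, "CD3D": 0.1, "MS4A1": 0.0, "CD79A": 0.1, "LYZ": 3.0, "CD14": 1.5},
}
labels = annotate_clusters(cluster_means, marker_sets)
print(labels)
```

A scheme like this always returns some label, even for a cluster that matches none of the marker sets well, which is one reason manual review of the marker expression itself remains important.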
