Survival analysis studies time-to-event outcomes such as time to death, disease recurrence, or hospital readmission. The defining challenge is censoring: some subjects have not yet experienced the event when observation ends, so their true event times are unknown but known to exceed the censoring time. The Kaplan-Meier estimator is a nonparametric method that estimates the survival function S(t), the probability of surviving beyond time t, by computing the cumulative product of conditional survival probabilities at each observed event time. It produces the characteristic step-function survival curve that declines at each event, and it accounts for censored observations by dropping them from the risk set after their censoring time without treating them as events.
Standard statistical methods assume you observe the outcome for every subject, but time-to-event data violate this assumption. In a 5-year clinical trial, some patients die (the event of interest), some are still alive when the trial ends (administratively censored), and some are lost to follow-up before the trial ends. Both of the latter groups are right-censored: you know these patients survived at least until they were last observed, but you do not know their true event time. Ignoring censoring, either by excluding censored subjects or by treating them as events at their last observed time, introduces serious bias, typically understating survival. Survival analysis methods exist precisely to handle this incomplete information.
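A small numeric illustration of that bias may help. The sketch below uses made-up true event times (an assumption for illustration only; in a real study they would be unknown for censored subjects) and a hypothetical study that ends at year 5, then compares the true mean survival time with the two naive estimates.

```python
# Hypothetical true event times (years) for eight subjects. In a real study
# these would be unknown for anyone still event-free when the study ends.
true_times = [1, 2, 3, 4, 6, 7, 8, 9]
study_end = 5  # administrative censoring at 5 years

# What the investigator actually records: follow-up time and event indicator.
observed = [min(t, study_end) for t in true_times]
event = [t <= study_end for t in true_times]

true_mean = sum(true_times) / len(true_times)

# Naive approach 1: discard censored subjects entirely.
died = [t for t, e in zip(observed, event) if e]
mean_excluding = sum(died) / len(died)

# Naive approach 2: pretend censored subjects had the event when last seen.
mean_as_events = sum(observed) / len(observed)

print(true_mean, mean_excluding, mean_as_events)  # 5.0 2.5 3.75
```

In this toy example both shortcuts understate the true mean of 5 years, because the censored subjects are exactly the ones who lived longest.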
The Kaplan-Meier estimator constructs the survival function S(t), the probability of surviving beyond time t, without assuming any particular distributional form. At each observed event time, it computes the conditional probability of surviving past that time given survival up to it: (number at risk - number of events) / number at risk. The cumulative survival probability at time t is the product of these conditional probabilities over all event times up to and including t. The resulting step function starts at S(0) = 1 and decreases at each event time. Censored subjects count in the risk set (the denominator) up to their censoring time and are removed from it afterward, so they shrink the denominator at later event times but do not cause a step down. In this way they contribute information about survival up to the moment they were last observed.
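The product-limit calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a statistics library: the function name `kaplan_meier` and the eight-subject dataset are hypothetical, and a subject censored at the same time as an event is counted as still at risk at that time, which is the usual convention.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return a list of (event_time, n_at_risk, n_events, survival) tuples.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    curve = []
    surv = 1.0  # S(0) = 1
    for t in sorted(deaths):
        # Risk set: everyone whose follow-up reaches time t,
        # whether they later have the event or are censored.
        n_at_risk = sum(1 for u in times if u >= t)
        d = deaths[t]
        surv *= (n_at_risk - d) / n_at_risk  # conditional survival at t
        curve.append((t, n_at_risk, d, surv))
    return curve

# Hypothetical data: 8 subjects, 4 events, 4 censored.
times  = [2, 3, 3, 5, 6, 8, 9, 10]
events = [1, 1, 0, 1, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, n, d, s in curve:
    print(f"t={t}: at risk={n}, events={d}, S(t)={s:.3f}")
```

For this dataset the risk sets at the four event times are 8, 7, 5, and 3, and the estimate steps down to 0.875, 0.750, 0.600, and 0.400. The censored subjects at times 3 and 6 never produce a step; they only make the later denominators smaller.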
Reading a Kaplan-Meier curve is a core clinical skill. The median survival time is the earliest time at which the curve reaches or drops below 0.50, that is, the estimated time by which half the subjects have experienced the event. If the curve never reaches 0.50, the median cannot be estimated from the data: the estimated survival probability stays above one half throughout follow-up. This does not mean that more than half the subjects were observed to survive the whole period, because many of them may have been censored earlier. Confidence intervals for the survival function at any time point can be computed from the variance given by Greenwood's formula, and they typically widen over time as the number at risk decreases. Tick marks on the curve indicate censored observations, showing where subjects left follow-up without the event. A curve with heavy censoring late in follow-up has wide uncertainty in its tail, even if it appears to plateau.
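To make the median and the confidence intervals concrete, here is a hedged sketch that adds Greenwood's standard error to a hand-rolled Kaplan-Meier loop. The function names and the dataset are hypothetical, and the interval shown is the plain symmetric one (estimate plus or minus 1.96 standard errors, clipped to [0, 1]); statistical software often applies a log or log-log transformation instead, so its bounds may differ.

```python
import math
from collections import Counter

def km_with_greenwood(times, events, z=1.96):
    """Return rows of (time, survival, std_error, ci_lower, ci_upper)."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    rows = []
    surv = 1.0
    gw_sum = 0.0  # running Greenwood sum of d / (n * (n - d))
    for t in sorted(deaths):
        n = sum(1 for u in times if u >= t)  # number at risk at t
        d = deaths[t]
        surv *= (n - d) / n
        if n == d:
            # Everyone still at risk had the event, so the Greenwood term
            # is undefined; this sketch reports NaN rather than guessing.
            rows.append((t, surv, math.nan, math.nan, math.nan))
            continue
        gw_sum += d / (n * (n - d))
        se = surv * math.sqrt(gw_sum)
        rows.append((t, surv, se,
                     max(0.0, surv - z * se), min(1.0, surv + z * se)))
    return rows

def median_survival(rows):
    """Earliest event time with S(t) <= 0.5, or None if never reached."""
    for t, surv, *_ in rows:
        if surv <= 0.5:
            return t
    return None

# Hypothetical data: 8 subjects, 4 events, 4 censored.
times  = [2, 3, 3, 5, 6, 8, 9, 10]
events = [1, 1, 0, 1, 0, 1, 0, 0]

rows = km_with_greenwood(times, events)
for t, s, se, lo, hi in rows:
    print(f"t={t}: S={s:.3f}, SE={se:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
print("median:", median_survival(rows))
```

With these numbers the standard error grows at every step (roughly 0.117, 0.153, 0.182, 0.203) as the risk set shrinks from 8 to 3, and the median is 8, the first event time at which the estimate falls to 0.5 or below.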
The critical assumption underlying the KM estimator is non-informative censoring: the reason a subject was censored must be unrelated to their prognosis. If patients who are getting sicker preferentially drop out (informative censoring), the remaining subjects are healthier than the full cohort, and the survival curve will be optimistically biased. This assumption cannot be tested from the data alone — it requires understanding why subjects were censored. The Kaplan-Meier estimator describes the survival experience of a single group; to compare survival between groups, you need the log-rank test, which is the next topic in this sequence.
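The optimistic bias from informative censoring can be seen in a deliberately artificial example. In the sketch below, the true event times and the dropout pattern are invented for illustration: three subjects who would have had the event at times 4, 5, and 6 leave the study at time 3, as if declining health caused them to drop out. The helper `km_survival_at` is a hypothetical name for a bare-bones Kaplan-Meier calculation.

```python
def km_survival_at(times, events, t_query):
    """Kaplan-Meier estimate of S(t_query) from follow-up times and event flags."""
    surv = 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        if t > t_query:
            break
        n = sum(1 for u in times if u >= t)                        # at risk at t
        d = sum(1 for u, e in zip(times, events) if e and u == t)  # events at t
        surv *= (n - d) / n
    return surv

# Hypothetical truth, never fully visible in a real study.
true_times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Informative dropout (assumed): the subjects destined to have events at
# times 4, 5, and 6 are lost to follow-up at time 3 instead.
observed = [2, 3, 3, 3, 3, 7, 8, 9, 10, 11]
events   = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

true_s6 = sum(t > 6 for t in true_times) / len(true_times)
km_s6 = km_survival_at(observed, events, 6)

print(f"true S(6) = {true_s6:.2f}, Kaplan-Meier S(6) = {km_s6:.2f}")
```

Here the true proportion surviving beyond time 6 is 0.50, but the Kaplan-Meier estimate is 0.80, because the estimator treats the three dropouts as if their prognosis matched that of the subjects who stayed. Nothing in the observed columns alone reveals the problem, which is why the assumption has to be judged from knowledge of why subjects were censored.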