The log-rank test is a nonparametric hypothesis test that compares the survival distributions of two or more groups. At each observed event time, it compares the number of events in each group to the number expected under the null hypothesis that the survival curves are identical. The test statistic combines these observed-minus-expected differences across all event times, standardizes them by their variance, and approximately follows a chi-squared distribution under the null. The log-rank test gives equal weight to all event times and is most powerful when the hazard ratio between groups is approximately constant over time (proportional hazards). When survival curves cross, the log-rank test may fail to detect a difference even when the groups clearly differ.
The Kaplan-Meier estimator describes the survival experience of a single group, but the clinical question is usually comparative: is Treatment A better than Treatment B? Simply eyeballing two KM curves is not sufficient because apparent differences may be due to chance, especially with small samples or heavy censoring. The log-rank test provides a formal statistical framework for this comparison.
The test works by examining what happens at every observed event time across the combined sample. At each event time t_i, you know how many subjects are at risk in each group and how many events occurred. Under the null hypothesis that the groups have identical survival distributions, the expected number of events in each group is proportional to its share of the risk set at that moment. If Group A has 40 of 80 subjects at risk when 2 events occur, the expected number of events in Group A is 2 × (40/80) = 1. If both events actually occurred in Group A, the observed-minus-expected contribution at that time point is 2 - 1 = 1. A single positive contribution is weak evidence on its own, but a consistent run of them across event times suggests Group A is experiencing more events than equal survival would predict.
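The arithmetic at a single event time can be sketched in a few lines of Python. This is only an illustration of the worked numbers above; the function name expected_events and its parameter names are made up for this example and do not come from any library.

```python
def expected_events(at_risk_group, at_risk_total, events_total):
    """Expected events in one group at one event time under the null.

    Under identical survival, events are shared out in proportion to
    each group's fraction of the combined risk set.
    """
    return events_total * at_risk_group / at_risk_total


# Numbers from the text: Group A has 40 of 80 subjects at risk, 2 events occur.
expected_a = expected_events(at_risk_group=40, at_risk_total=80, events_total=2)

# Both events fell in Group A, so the observed count is 2.
observed_a = 2
contribution = observed_a - expected_a

print(f"expected in A: {expected_a}, observed minus expected: {contribution}")
```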
The test statistic sums these observed-minus-expected contributions across all event times and standardizes the total by its variance. With two groups, this means squaring the summed difference for one group and dividing by the sum of the per-time variances. Under the null hypothesis, the statistic approximately follows a chi-squared distribution with k-1 degrees of freedom (where k is the number of groups). A large test statistic indicates that the observed event pattern deviates systematically from what equal survival would predict. The p-value is the probability, under the null, of a deviation at least as large as the one observed.
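A minimal sketch of the two-group case follows, using only the Python standard library. It assumes the usual hypergeometric variance at each event time, d × (n1/n) × (1 - n1/n) × (n - d)/(n - 1), which the text does not spell out, and it uses the fact that a chi-squared upper tail with 1 degree of freedom equals erfc(sqrt(x/2)). The function name logrank_two_groups and the toy data are invented for illustration; in practice an established survival analysis package is the safer choice.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if censored
    groups : 0 or 1, the group label of each subject
    Returns (chi-squared statistic, p-value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e = 0.0  # running sum of observed minus expected in group 1
    variance = 0.0   # running sum of per-time hypergeometric variances

    for t in event_times:
        # Risk set: everyone whose follow-up time is at least t.
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        # Events occurring exactly at t, overall and in group 1.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)

        o_minus_e += d1 - d * n1 / n
        if n > 1:  # the variance term is undefined for a single subject
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no event time has subjects at risk in both groups")

    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-squared tail, 1 df
    return chi2, p_value


# Toy data (hypothetical): group 1 fails early, group 0 fails late, no censoring.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 0, 0, 0]
chi2, p_value = logrank_two_groups(times, events, groups)
print(f"chi-squared = {chi2:.3f}, p = {p_value:.4f}")
```

The loop recomputes the risk set at every event time, which is quadratic in the sample size. That keeps the correspondence with the description above easy to follow, at the cost of speed on large data sets.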
The log-rank test has an important limitation: it gives equal weight to all event times, making it most powerful when the hazard ratio (the ratio of instantaneous event rates) is constant over time, which is the proportional hazards assumption. When survival curves cross, so that one treatment is better early but the other is better late, the positive and negative contributions cancel, and the log-rank test may return a non-significant result despite a clear qualitative difference. In these situations, alternatives like weighted log-rank tests (which emphasize early or late differences) or tests designed for crossing hazards (e.g., the max-combo test) are more appropriate. The proportional hazards assumption is also foundational for the Cox regression model, which extends this framework to adjust for covariates.
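One common family of weighted log-rank tests, not named in the text and brought in here as an assumption, is the Fleming-Harrington G(rho, gamma) family. Each event time is weighted by S^rho × (1 - S)^gamma, where S is the pooled Kaplan-Meier estimate just before that time. Setting rho = gamma = 0 gives the ordinary log-rank test, rho > 0 emphasizes early differences, and gamma > 0 emphasizes late ones. The sketch below is a two-group illustration with a hypothetical function name, not a reference implementation.

```python
import math


def weighted_logrank(times, events, groups, rho=0.0, gamma=0.0):
    """Two-group Fleming-Harrington G(rho, gamma) weighted log-rank test.

    Returns (chi-squared statistic, p-value) with 1 degree of freedom.
    rho = gamma = 0 gives the unweighted log-rank test.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0        # pooled Kaplan-Meier estimate just before the current time
    numerator = 0.0   # weighted sum of observed minus expected in group 1
    variance = 0.0    # weighted sum of per-time variances

    for t in event_times:
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)

        weight = surv ** rho * (1 - surv) ** gamma
        numerator += weight * (d1 - d * n1 / n)
        if n > 1:
            variance += (weight ** 2 * d * (n1 / n) * (1 - n1 / n)
                         * (n - d) / (n - 1))

        surv *= 1 - d / n  # update the pooled estimate after using it

    if variance == 0:
        raise ValueError("variance is zero; the weighted statistic is undefined")

    chi2 = numerator ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy data (hypothetical): group 1 fails early, group 0 fails late, no censoring.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 0, 0, 0]

unweighted = weighted_logrank(times, events, groups)
early = weighted_logrank(times, events, groups, rho=1.0)
print("unweighted:", unweighted)
print("early-weighted (rho=1):", early)
```

The choice of rho and gamma should be fixed before looking at the data. Picking the weights that give the smallest p-value after the fact inflates the false positive rate, which is the problem that combination procedures such as the max-combo test are designed to handle.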