A researcher studying unemployment spells drops all observations where workers were still unemployed at the survey end date, keeping only workers who found jobs during the study period. What is the most likely effect on the estimated average unemployment duration?
A. No effect: the dropped observations contain no information about how long spells last
B. Overestimation: the retained completed spells are systematically longer than those still ongoing
C. Underestimation: only shorter spells are likely to complete within the study window, so the retained sample is biased toward quicker exits
D. Random noise: censoring is a random process that introduces symmetric error
Answer: C
Dropping censored observations creates selection bias. Spells that complete during the study window are disproportionately short, because longer spells are more likely to be ongoing (censored) at the survey's end. Retaining only completers selects for the fastest exits and systematically underestimates average duration. This is why survival analysis keeps censored observations in the likelihood with the information they do provide: the spell lasted at least this long.
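The direction of the bias can be checked with a small simulation. The sketch below is illustrative only: it assumes exponentially distributed spells with a mean of 20 weeks, all starting at week 0, and a 26-week study window. These numbers and the function name are hypothetical and do not come from the question.

```python
import random

def simulate_mean_durations(n=20000, mean_weeks=20.0, window_weeks=26.0, seed=0):
    """Compare the true mean spell length with the mean among completed spells only."""
    rng = random.Random(seed)
    # Assumption: every spell starts at week 0 and durations are exponential.
    durations = [rng.expovariate(1.0 / mean_weeks) for _ in range(n)]
    # Spells longer than the window are still ongoing at the survey end (censored).
    completed = [d for d in durations if d <= window_weeks]
    true_mean = sum(durations) / n
    completers_mean = sum(completed) / len(completed)
    share_censored = 1.0 - len(completed) / n
    return true_mean, completers_mean, share_censored

true_mean, completers_mean, share_censored = simulate_mean_durations()
print(f"true mean duration:          {true_mean:.1f} weeks")
print(f"mean among completers only:  {completers_mean:.1f} weeks")
print(f"share of spells censored:    {share_censored:.2f}")
```

Under these assumptions the completers-only mean falls well below the true mean, because every spell longer than the window has been discarded.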
Question 2 Multiple Choice
A Cox proportional hazards model yields β = 0.5 for a binary variable indicating college education (1 = college graduate). What is the correct interpretation?
A. College graduates have unemployment spells that are 50% shorter on average
B. College graduates exit unemployment at a rate exp(0.5) ≈ 1.65 times that of non-graduates at every point in time
C. The probability of being employed after 10 weeks is 50% higher for college graduates
D. The baseline hazard h₀(t) is shifted upward by 0.5 for college graduates
Answer: B
In a proportional hazards model, h(t|X) = h₀(t)·exp(Xβ). A coefficient of 0.5 means the hazard for college graduates is exp(0.5) ≈ 1.65 times the hazard for non-graduates, and this multiplicative factor is constant across time (the 'proportional' assumption). Option A confuses the hazard ratio with a duration ratio. Option C confuses the hazard with a probability. Option D misunderstands the model: h₀(t) is the hazard for the reference group, and covariates multiply it rather than shifting it additively.
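A short numerical sketch may help separate the hazard ratio from the quantities in options A and C. It relies on the identity S₁(t) = S₀(t)^HR, which follows from S(t) = exp(−∫h). The baseline survival value of 0.40 at week 10 and the constant-hazard case are hypothetical assumptions chosen only for illustration.

```python
import math

beta = 0.5
hazard_ratio = math.exp(beta)  # about 1.65, as in the explanation above

def survival_graduate(s0, hr=hazard_ratio):
    """Under proportional hazards, S_1(t) = S_0(t) ** HR."""
    return s0 ** hr

# Assumption: 40% of non-graduates are still unemployed at week 10.
s0_week10 = 0.40
s1_week10 = survival_graduate(s0_week10)

# Ratio of employment probabilities at week 10 (what option C talks about).
employment_ratio = (1.0 - s1_week10) / (1.0 - s0_week10)

# Ratio of mean durations (what option A talks about), valid only if the
# baseline hazard is constant, i.e. durations are exponential (assumption).
mean_duration_ratio = 1.0 / hazard_ratio

print(f"hazard ratio:                 {hazard_ratio:.3f}")
print(f"employment probability ratio: {employment_ratio:.3f}")
print(f"mean duration ratio:          {mean_duration_ratio:.3f}")
```

In this example neither the employment probability ratio nor the mean duration ratio equals the hazard ratio, and both depend on assumptions about the baseline that the Cox coefficient alone does not pin down.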
Question 3 True / False
A censored observation — where the event has not occurred by the end of the study — contains no useful information about duration and can be safely dropped from a survival analysis.
T. True
F. False
Answer: False
Censored observations carry genuine information: we know the event had not occurred by the censoring time, meaning the true duration is at least that long. The survival likelihood correctly incorporates this by including the survival function S(t_c) for a censored observation at time t_c — the probability of surviving past the censoring time. Dropping censored observations ignores this information and, crucially, creates selection bias: longer spells are more likely to be censored, so dropping them systematically underrepresents long durations.
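As a minimal sketch of how the likelihood uses censored spells, consider a constant-hazard (exponential) model, which is an assumption made here for simplicity and is not part of the question. Each completed spell contributes log f(t) = log λ − λt and each censored spell contributes log S(t_c) = −λt_c, so the maximum likelihood estimate is the number of events divided by total time at risk. The simulated data and the function name are hypothetical.

```python
import random

def exponential_rate_mle(times, events):
    """MLE of a constant hazard: observed events / total time at risk.

    Censored spells add no event to the numerator but do add their
    observed time to the denominator.
    """
    return sum(events) / sum(times)

rng = random.Random(1)
true_mean, window = 20.0, 26.0  # hypothetical values, in weeks
latent = [rng.expovariate(1.0 / true_mean) for _ in range(20000)]

times = [min(d, window) for d in latent]            # observed time, capped at the window
events = [1 if d <= window else 0 for d in latent]  # 1 = job found, 0 = censored

mean_with_censoring = 1.0 / exponential_rate_mle(times, events)
completed = [t for t, e in zip(times, events) if e == 1]
mean_dropping_censored = sum(completed) / len(completed)

print(f"estimate using censored spells:  {mean_with_censoring:.1f} weeks")
print(f"estimate after dropping them:    {mean_dropping_censored:.1f} weeks")
```

With these assumed parameters the censoring-aware estimate lands close to the true mean of 20 weeks, while the estimate from completed spells alone is far too low.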
Question 4 True / False
The Cox proportional hazards model requires the researcher to specify the shape of the baseline hazard h₀(t) in order to estimate the effects of covariates.
T. True
F. False
Answer: False
This is precisely what makes the Cox model so widely used. Its key device is the partial likelihood: covariate coefficients β can be estimated using only the ordering of event times (who fails when, relative to others still at risk), without ever specifying h₀(t). The baseline hazard cancels out of the partial likelihood, so it is left entirely unspecified; if needed, it can be recovered nonparametrically afterward, but it is typically not of interest. This semiparametric flexibility is why the Cox model dominates applied work. Parametric models (Weibull, exponential) are more efficient when the correct hazard shape is known but are sensitive to misspecification.
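The claim that only the ordering matters can be illustrated with a hand-rolled partial log-likelihood. This is a minimal sketch for a single covariate, assuming no tied event times. The six spells below are made-up data and `cox_partial_loglik` is a hypothetical helper, not a library function.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate, assuming no tied event times."""
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored spells enter only through the risk sets
        # Risk set: everyone whose spell is still ongoing just before t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk_set)
        loglik += beta * x[i] - math.log(denom)
    return loglik

# Made-up data: duration in weeks, event indicator, college indicator.
times = [3.0, 5.0, 8.0, 12.0, 20.0, 26.0]
events = [1, 1, 0, 1, 1, 0]
college = [1, 0, 1, 1, 0, 0]

ll_weeks = cox_partial_loglik(0.5, times, events, college)
# An order-preserving transformation of time leaves the risk sets unchanged.
ll_squared = cox_partial_loglik(0.5, [t ** 2 for t in times], events, college)

print(ll_weeks, ll_squared)
```

Note that h₀(t) never appears in the function, and squaring every duration gives the same value because the risk sets depend only on the order of the times.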
Question 5 Short Answer
Why is it problematic to simply remove censored observations from a survival analysis, and how does the likelihood function address this problem?
Model answer: Removing censored observations causes selection bias: longer spells are more likely to still be ongoing (censored) at study's end, so removing them over-represents short durations and underestimates average duration. The survival likelihood solves this by including censored observations with their actual contribution: for a censored observation at time t_c, the likelihood contribution is S(t_c) — the probability of surviving at least that long. This uses the information that the event had not occurred by t_c without pretending to know when it did occur.
The key distinction is between 'no information' and 'right-censored information.' A censored observation at week 20 tells you the duration exceeded 20 weeks, which is real, usable information. The survival likelihood is constructed as a product over all observations: observed events contribute the density at the failure time, f(t) = h(t)·S(t) (the hazard times the probability of surviving up to t), while censored observations contribute only the survival function S(t_c) (the probability of no event by the censoring time). Both types of contribution correctly update our estimate of the duration distribution.
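Written out, the product described above takes the following form. The notation is an assumption introduced here: δ_i equals 1 if the event is observed for spell i and 0 if the spell is censored, t_i is the observed (event or censoring) time, and θ collects the model parameters. The expression also assumes that censoring is independent of the spell's true duration.

```latex
L(\theta)
  = \prod_{i=1}^{n} f(t_i \mid \theta)^{\delta_i}\, S(t_i \mid \theta)^{1-\delta_i}
  = \prod_{i=1}^{n} h(t_i \mid \theta)^{\delta_i}\, S(t_i \mid \theta)
```

The second equality uses f(t) = h(t)·S(t): every observation contributes S(t_i), and observed events contribute the hazard in addition.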