5 questions to test your understanding
A data scientist randomly splits two years of hourly sales data 80/20 into train and test sets, trains an LSTM, and reports excellent test accuracy. What is the fundamental problem?
A naive baseline that predicts the last observed value ('predict t+1 = t') outperforms a carefully tuned LSTM on a stationary demand series. What does this most likely indicate?
Computing normalization statistics (mean and standard deviation) over the entire dataset — including the test period — before splitting is a valid preprocessing step for time series forecasting.
Walk-forward (rolling-origin) validation is more appropriate than k-fold cross-validation for evaluating time series forecasting models.
Explain why non-stationarity (trend and seasonality) must be diagnosed before fitting a classical forecasting model, and what happens if it is ignored.