Time Series Forecasting

Graduate · Depth 79 in the knowledge graph

Core Idea

Time series forecasting predicts future values from historical patterns in sequentially dependent data (stock prices, weather, demand). RNNs, LSTMs, and Transformers can capture temporal dependencies. Challenges include trend, seasonality, external variables, and non-stationarity. Evaluation requires careful temporal splitting to prevent data leakage.

Explainer

Time series data is fundamentally different from the tabular datasets you have worked with in supervised learning. Each observation is indexed by time, and the order matters: shuffling the rows destroys the information. This temporal structure creates both opportunities and constraints. The opportunities come from autocorrelation: today's temperature, stock price, or sales volume is strongly related to yesterday's value, and often to the values from a week or a year ago. The constraints come from the fact that you cannot randomly split the data into train and test sets. You must always train on the past and evaluate on the future, because any other split gives the model access to future information that it will not have in production.
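A minimal sketch of a chronological split, assuming the series is stored oldest-first in a plain Python list. The function name `chronological_split`, the `test_fraction` parameter, and the sales figures are illustrative, not part of any library.

```python
def chronological_split(series, test_fraction=0.2):
    """Split a time-ordered sequence into (train, test) without shuffling.

    The last `test_fraction` of the observations becomes the test set, so
    every training point precedes every test point in time.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    n_test = max(1, int(round(len(series) * test_fraction)))
    if n_test >= len(series):
        raise ValueError("series is too short to split")
    cut = len(series) - n_test
    return series[:cut], series[cut:]


# Hypothetical daily sales figures, oldest first.
sales = [12, 15, 14, 18, 21, 19, 23, 25, 24, 28]
train, test = chronological_split(sales, test_fraction=0.2)
print(train)  # [12, 15, 14, 18, 21, 19, 23, 25]
print(test)   # [24, 28]
```

The only difference from a random split is that the cut point is a position in time, which is what keeps future observations out of training.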

From your work with recurrent neural networks, you know that RNNs process sequences step by step, maintaining a hidden state that accumulates information from past inputs. This makes them natural candidates for time series: feed in observations one timestep at a time, and the hidden state captures the relevant history for predicting the next value. LSTMs improve on vanilla RNNs by using gating mechanisms to selectively remember and forget, which is critical for time series with both short-term fluctuations and long-term patterns. A retail sales series, for example, has daily noise, weekly cycles (weekend spikes), and annual seasonality (holiday surges), and the model must track patterns at all of these scales simultaneously.
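The step-by-step processing described above can be sketched with a scalar, untrained vanilla RNN cell. The weights `w_x`, `w_h`, and `b` are arbitrary illustrative values, and `make_windows` is a hypothetical helper showing one common way to turn a series into (input window, next value) training pairs. A real model would learn vector-valued weights with a deep learning framework.

```python
import math


def rnn_hidden_states(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a scalar vanilla RNN cell over `inputs`, one timestep at a time.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), starting from h_0 = 0.
    Returns the hidden state after each timestep.
    """
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def make_windows(series, window):
    """Build (input window, next value) pairs for one-step-ahead training."""
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]


# The final hidden state depends on the order of the inputs, not just
# on which values were seen.
forward = rnn_hidden_states([0.1, 0.2, 0.3])[-1]
backward = rnn_hidden_states([0.3, 0.2, 0.1])[-1]
print(forward != backward)  # True

print(make_windows([1, 2, 3, 4, 5], window=3))
# [([1, 2, 3], 4), ([2, 3, 4], 5)]
```

Because each hidden state feeds into the next, reordering the same observations changes the result, which is the property that makes recurrent models sensitive to temporal order.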

The components of a time series (trend, seasonality, and residual) must be understood before modeling. Trend is the long-term direction (rising, falling, flat). Seasonality is the repeating pattern at fixed intervals (daily, weekly, annual). The residual is what remains after removing trend and seasonality. Classical methods like ARIMA model the series as a linear function of its own past values and past forecast errors. The autoregressive and moving-average parts assume a stationary series (constant mean and variance over time, with autocovariance that depends only on the lag), so ARIMA first differences the series to remove trend. Neural approaches are more flexible: they can learn non-linear relationships and often cope with non-stationarity more gracefully, but they need substantially more data and are harder to interpret. Moving averages and exponential smoothing, which you have already seen, represent the simplest end of this spectrum: weighted averages of past values, with equal weights over a fixed window for a simple moving average and weights that decay geometrically with age for exponential smoothing.
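The simplest tools mentioned above can be sketched in a few lines of plain Python. The function names and the toy series are illustrative assumptions. `difference` shows first differencing, a common way to remove a linear trend before fitting a model that assumes stationarity.

```python
def moving_average(series, window):
    """Trailing simple moving average: equal weights over the last `window` points."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: weights decay geometrically with age.

    level_t = alpha * x_t + (1 - alpha) * level_{t-1}, with level_0 = x_0.
    """
    level = series[0]
    smoothed = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


def difference(series, lag=1):
    """Lagged differences x_t - x_{t-lag}; lag=1 removes a linear trend."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Toy series with a pure linear trend.
series = [10, 12, 14, 16, 18]
print(moving_average(series, window=3))          # [12.0, 14.0, 16.0]
print(exponential_smoothing(series, alpha=0.5))  # [10, 11.0, 12.5, 14.25, 16.125]
print(difference(series))                        # [2, 2, 2, 2]
```

On this trending series both smoothers lag behind the most recent value, while the differenced series is constant, which illustrates why differencing is used to obtain a stationary input.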

Evaluation in time series forecasting requires particular discipline. Walk-forward validation (also called rolling-origin evaluation) is the gold standard: train on data up to time t, forecast from t+1 to t+h, then advance the forecast origin (expanding or sliding the training window) and repeat. This simulates how the model will actually be used. Common pitfalls include using future information in feature engineering (e.g., normalizing with statistics computed over the entire dataset, including the test period) and failing to account for the forecast horizon: a model that predicts one step ahead well may degrade rapidly at longer horizons. MAE and RMSE are scale-dependent and report error in the units of the series, with RMSE penalizing large errors more heavily. MAPE normalizes by the actual values, which makes it comparable across series, but it is unstable when actuals are near zero and undefined when they equal zero. Comparing against naive baselines (predicting the last observed value, or the value from the same season last year) is essential, because many sophisticated models fail to beat these simple benchmarks on well-behaved series.
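The evaluation loop described above can be sketched as follows, assuming an expanding training window and a forecaster that takes `(history, horizon)` and returns a list of `horizon` predictions. All function names, parameters, and the toy series are illustrative. The two naive forecasters correspond to the baselines named in the text.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(history, horizon):
    """Repeat the last observed value for every step of the horizon."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season_length):
    """Repeat the most recent full season (needs len(history) >= season_length)."""
    start = len(history) - season_length
    return [history[start + (step % season_length)] for step in range(horizon)]


def walk_forward(series, forecaster, initial_train, horizon, step=1):
    """Rolling-origin evaluation with an expanding training window.

    At each origin the forecaster sees only series[:origin] and predicts
    the next `horizon` values, which are compared with the held-out actuals.
    """
    actuals, predictions = [], []
    origin = initial_train
    while origin + horizon <= len(series):
        history = series[:origin]  # strictly the past
        predictions.extend(forecaster(history, horizon))
        actuals.extend(series[origin:origin + horizon])
        origin += step
    return actuals, predictions


# Hypothetical series, oldest first.
series = [10, 12, 11, 13, 12, 14, 13, 15]
actuals, predictions = walk_forward(series, naive_forecast, initial_train=4, horizon=1)
print(actuals)      # [12, 14, 13, 15]
print(predictions)  # [13, 12, 14, 13]
print(mae(actuals, predictions))  # 1.5
```

Any candidate model can be passed in place of `naive_forecast`. If its error is not clearly lower than the baseline's under the same loop, the added complexity is hard to justify.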

