Parametric models represent signals as outputs of linear systems driven by white noise. Autoregressive (AR) models use feedback (poles only); moving-average (MA) models use feedforward (zeros only); ARMA uses both. These models are more parsimonious than non-parametric methods for sufficiently regular signals, enabling spectral estimation with fewer parameters. Model order selection and parameter estimation are critical for accuracy.
Generate an autoregressive signal using known AR coefficients. Estimate the AR model order and coefficients from the data using the Yule-Walker method. Verify that the estimated parameters match the generation parameters.
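One way to carry out the generation and estimation steps is sketched below in Python with NumPy. The helper names (simulate_ar, autocorr, yule_walker), the coefficients [1.5, -0.8], the record length, and the seed are illustrative assumptions, not part of the original task. The sketch uses the sign convention x[n] = a₁x[n−1] + ... + aₚx[n−p] + e[n] and the biased sample autocorrelation, and it takes the order p = 2 as known.

```python
import numpy as np

def simulate_ar(a, n, sigma=1.0, burn_in=500, seed=0):
    """Simulate x[n] = a[0]*x[n-1] + ... + a[p-1]*x[n-p] + e[n]."""
    rng = np.random.default_rng(seed)
    p = len(a)
    e = rng.normal(0.0, sigma, n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(p, n + burn_in):
        # x[t-p:t][::-1] is x[t-1], x[t-2], ..., x[t-p]
        x[t] = np.dot(a, x[t - p:t][::-1]) + e[t]
    return x[burn_in:]  # drop the start-up transient

def autocorr(x, max_lag):
    """Biased sample autocorrelation R[0], ..., R[max_lag]."""
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])

def yule_walker(x, p):
    """Solve R[k] = sum_j a_j R[k-j], k = 1..p, for the AR coefficients."""
    r = autocorr(x, p)
    # Toeplitz matrix with entries R[|i-j|]
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a_hat = np.linalg.solve(R, r[1:])
    sigma2_hat = r[0] - np.dot(a_hat, r[1:])  # estimated noise variance
    return a_hat, sigma2_hat

a_true = np.array([1.5, -0.8])       # assumed "known" coefficients (stable)
x = simulate_ar(a_true, n=20000)
a_hat, sigma2_hat = yule_walker(x, p=2)

print("true coefficients:     ", a_true)
print("estimated coefficients:", a_hat)
print("estimated noise variance:", sigma2_hat)
```

With a record this long the estimates should land close to the generating values. Shortening the record makes the estimation error visibly larger.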
Your prerequisite — the autocorrelation function — tells you how similar a signal is to a shifted version of itself. A slowly decaying autocorrelation indicates a signal that changes slowly and has narrow, peaked spectral features; a rapidly decaying autocorrelation indicates a broadband signal that changes quickly. Parametric signal models exploit this structure: instead of describing the spectrum nonparametrically (by computing the DFT of a finite data record), you hypothesize that the signal was generated by passing white noise through a linear filter, and you estimate that filter's parameters from data. If the model is a good fit, you need far fewer numbers to describe the signal than a full nonparametric spectrum requires — and you can achieve much higher frequency resolution from a short data record.
The three model families differ in their assumed filter structure. An autoregressive (AR) model uses only feedback: the current sample x[n] is a weighted sum of p past samples plus white noise e[n], x[n] = a₁x[n−1] + ... + aₚx[n−p] + e[n]. In z-transform terms, the transfer function has only poles: H(z) = 1/A(z), where A(z) = 1 − a₁z⁻¹ − ... − aₚz⁻ᵖ. An all-pole model can represent sharp spectral peaks (resonances) very efficiently; an AR(2) model captures a single narrow resonance with just two coefficients. This makes AR models natural for speech (strong formant resonances), EEG (alpha and beta band oscillations), and vibration monitoring (rotating machinery with dominant shaft frequencies). A moving-average (MA) model uses only feedforward: the current sample is a weighted sum of the current and q past noise inputs, x[n] = e[n] + b₁e[n−1] + ... + b_q e[n−q]. Its transfer function has only zeros and is better at representing spectral nulls. An ARMA model uses both poles and zeros, giving a compact rational representation of signals with both peaks and notches. It is more flexible than either AR or MA alone, and it is the natural choice when the signal comes from a physical system with both resonances and anti-resonances.
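The contrast between poles and zeros can be illustrated numerically. The sketch below places a complex-conjugate root pair at radius r and angle θ (both values are arbitrary assumptions chosen for illustration) and evaluates the same second-order polynomial once as an all-pole (AR) filter and once as an all-zero (MA) filter.

```python
import numpy as np

# Hypothetical root location: radius r, angle theta in rad/sample
r, theta = 0.95, np.pi / 4

# AR(2) coefficients for x[n] = a1*x[n-1] + a2*x[n-2] + e[n]
# with poles at r*exp(+j*theta) and r*exp(-j*theta)
a1, a2 = 2 * r * np.cos(theta), -r**2

w = np.linspace(0, np.pi, 4096)          # frequency grid, rad/sample
z_inv = np.exp(-1j * w)
A = 1 - a1 * z_inv - a2 * z_inv**2       # polynomial on the unit circle

ar_gain = 1 / np.abs(A)   # all-pole filter: |H| = 1/|A|, peak near theta
ma_gain = np.abs(A)       # all-zero filter with the same roots: null near theta

w_peak = w[np.argmax(ar_gain)]
w_null = w[np.argmin(ma_gain)]
print("root angle:", theta)
print("AR peak at:", w_peak, " MA null at:", w_null)
```

Moving r closer to 1 sharpens both the AR peak and the MA null, which is why two coefficients are enough to describe a narrow resonance.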
The practical power of AR models comes from the Yule-Walker equations: R[k] = a₁R[k−1] + ... + aₚR[k−p] for k = 1, 2, ..., p, where R[k] is the autocorrelation at lag k. This is a linear system in the AR coefficients, solvable directly from the autocorrelation sequence with no iterative optimization. The k = 0 equation then gives the noise variance: σ²_e = R[0] − a₁R[1] − ... − aₚR[p]. The Levinson-Durbin algorithm solves the system in O(p²) operations by building the order-p solution recursively from the solutions of orders 1, 2, ..., p−1. As a byproduct it produces the reflection coefficients, which provide a stability check at every step: the model is stable (all poles inside the unit circle) exactly when every reflection coefficient has magnitude less than 1. Once the AR coefficients are estimated, the parametric spectral estimate is P̂(ω) = σ²_e / |A(e^{jω})|², where σ²_e is the white noise variance. Given an adequate model order and signal-to-noise ratio, this spectrum can resolve two closely spaced sinusoids that would appear as a single broadened peak in a periodogram of the same data length, which is the key advantage of parametric methods.
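A minimal sketch of the Levinson-Durbin recursion and the resulting AR spectrum follows. To keep the check exact, it uses the theoretical autocorrelation of an assumed AR(2) process (coefficients [1.5, -0.8], unit noise variance) instead of data. The function names levinson_durbin and ar_psd are illustrative, and the sign convention is x[n] = a₁x[n−1] + ... + aₚx[n−p] + e[n].

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Yule-Walker equations for orders 1..p in O(p^2).

    Returns the order-p coefficients, the final prediction-error
    variance, and the reflection coefficients k_1..k_p.
    """
    a = np.zeros(p)
    err = r[0]
    refl = []
    for m in range(1, p + 1):
        # k_m = (R[m] - sum_{j=1}^{m-1} a_j R[m-j]) / err_{m-1}
        k = (r[m] - np.dot(a[:m - 1], r[m - 1:0:-1])) / err
        a_new = a.copy()
        a_new[:m - 1] = a[:m - 1] - k * a[:m - 1][::-1]
        a_new[m - 1] = k
        a = a_new
        err *= 1 - k**2
        refl.append(k)
    return a, err, np.array(refl)

def ar_psd(a, sigma2, w):
    """P(w) = sigma2 / |A(e^{jw})|^2 with A(z) = 1 - sum_k a_k z^{-k}."""
    k = np.arange(1, len(a) + 1)
    A = 1 - np.exp(-1j * np.outer(w, k)) @ a
    return sigma2 / np.abs(A)**2

# Exact autocorrelation of the assumed AR(2) process, lags 0..3
a_true = np.array([1.5, -0.8])
p_max = 3
rho = [1.0, a_true[0] / (1 - a_true[1])]
for _ in range(2, p_max + 1):
    rho.append(a_true[0] * rho[-1] + a_true[1] * rho[-2])
r0 = 1.0 / (1 - a_true[0] * rho[1] - a_true[1] * rho[2])
r = r0 * np.array(rho)

# Deliberately fit order 3: the extra coefficient should vanish
a_hat, sigma2, refl = levinson_durbin(r, p_max)
w = np.linspace(0, np.pi, 2048)
psd = ar_psd(a_hat, sigma2, w)

print("coefficients:", a_hat)
print("noise variance:", sigma2)
print("reflection coefficients:", refl)
print("stable:", bool(np.all(np.abs(refl) < 1)))
print("spectral peak at", w[np.argmax(psd)], "rad/sample")
```

Because the input autocorrelation comes from a true AR(2) process, the third reflection coefficient is zero up to rounding, which previews how the prediction error stops improving once the order is high enough.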
Model order selection is the central challenge. If the order is too low, the model underfits and the spectral estimate is oversmoothed, blurring nearby frequency components into a single broad hump. If the order is too high, the model overfits, treating noise structure as if it were signal, and spurious spectral peaks appear at meaningless frequencies. The Akaike Information Criterion AIC(p) = N ln(σ²_e) + 2p and the Minimum Description Length MDL(p) = N ln(σ²_e) + p ln(N), where N is the number of samples and σ²_e is the estimated noise variance of the order-p fit, both penalize increasing order. MDL penalizes it more strongly whenever ln(N) > 2, that is, for N ≥ 8. The selected order minimizes the criterion: it is the point where the reduction in fitting error from adding another parameter no longer compensates for the complexity cost. This principle, that the right model uses the fewest parameters that adequately capture the structure, carries over to much of statistical learning, from regression to neural network architecture selection.
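The two criteria can be compared on simulated data. The sketch below generates an assumed AR(2) signal (coefficients, record length, seed, and maximum candidate order are all illustrative choices), fits orders 1 through 10 with a direct Yule-Walker solve, and reports the order that minimizes each criterion. The helper name yw_error_variance is hypothetical.

```python
import numpy as np

# Simulate an assumed AR(2) process: x[n] = 1.5 x[n-1] - 0.8 x[n-2] + e[n]
rng = np.random.default_rng(1)
a_true = np.array([1.5, -0.8])
N, burn = 5000, 500
e = rng.normal(size=N + burn)
x = np.zeros(N + burn)
for t in range(2, N + burn):
    x[t] = a_true[0] * x[t - 1] + a_true[1] * x[t - 2] + e[t]
x = x[burn:]  # drop the start-up transient

def yw_error_variance(x, p):
    """Noise variance of the order-p Yule-Walker fit (biased autocorrelation)."""
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:])
    return r[0] - np.dot(a, r[1:])

max_order = 10
orders = np.arange(1, max_order + 1)
sigma2 = np.array([yw_error_variance(x, p) for p in orders])

aic = N * np.log(sigma2) + 2 * orders
mdl = N * np.log(sigma2) + orders * np.log(N)

order_aic = int(orders[np.argmin(aic)])
order_mdl = int(orders[np.argmin(mdl)])
print("order selected by AIC:", order_aic)
print("order selected by MDL:", order_mdl)
```

Because the MDL penalty per parameter is larger here (ln(5000) is about 8.5 versus 2 for AIC), the order chosen by MDL can never exceed the order chosen by AIC on the same fits. AIC occasionally picks an order above the true one, while MDL tends to stay at the generating order for long records.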