System Identification Using Least-Squares Methods

Core Idea

System identification estimates unknown model parameters (such as filter coefficients or plant poles) from input-output measurements. Least squares minimizes the squared prediction error ‖y - H·θ‖², which has the closed-form solution θ = (H^T·H)^(-1)·H^T·y whenever H^T·H is invertible. Recursive algorithms update the estimate as new data arrives instead of re-solving the whole problem. Regularization reduces overfitting to noisy data by penalizing large parameter magnitudes.

Explainer

The fundamental problem system identification solves is this: you have a black box, you can feed it inputs and record outputs, and you want to discover the rules governing its behavior. From your transfer-function prerequisite, you know that a linear system is characterized by its poles and zeros, but that theory tells you the *form* of the model, not its *parameters*. System identification uses data to fill in the numbers. The least-squares framework turns this into a geometry problem: you are looking for the parameter vector θ whose predictions H·θ are as close as possible (in the squared-error sense) to the actual measurements y.

The construction works by building a regressor matrix H, where each row captures the system's observable history at one time step. For an AR (autoregressive) model, row k contains past output values; for an ARX model it also contains past inputs. The measurement vector y stacks the corresponding current outputs. The model is now a single linear relation: y ≈ H·θ. This system is typically overdetermined (more equations than unknowns, because you have many data points but few parameters), so in general there is no exact solution and you minimize the residual instead. The normal equations H^T·H·θ = H^T·y give the optimal θ directly. They have a unique solution exactly when H^T·H is invertible, that is, when the columns of H are linearly independent. If the input is not persistently exciting (it does not probe all system modes), H^T·H becomes singular or nearly singular and the parameters cannot be determined uniquely from the data.
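The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper name `build_arx_regressor`, the second-order ARX model, its coefficient values, and the noise level are all assumptions chosen for the example. The input is white noise, which is persistently exciting for this model order.

```python
import numpy as np

def build_arx_regressor(u, y, na, nb):
    """Hypothetical helper: row k holds [y[k-1..k-na], u[k-1..k-nb]]."""
    start = max(na, nb)
    rows = []
    for k in range(start, len(y)):
        past_y = y[k - na:k][::-1]  # y[k-1], ..., y[k-na]
        past_u = u[k - nb:k][::-1]  # u[k-1], ..., u[k-nb]
        rows.append(np.concatenate([past_y, past_u]))
    return np.array(rows), y[start:]

rng = np.random.default_rng(0)
N = 500
u = rng.standard_normal(N)  # white-noise input (assumed for the example)

# Assumed "true" system: y[k] = 1.5 y[k-1] - 0.7 y[k-2] + u[k-1] + 0.5 u[k-2] + e[k]
true_theta = np.array([1.5, -0.7, 1.0, 0.5])
y = np.zeros(N)
for k in range(2, N):
    phi = np.array([y[k - 1], y[k - 2], u[k - 1], u[k - 2]])
    y[k] = phi @ true_theta + 0.01 * rng.standard_normal()

H, target = build_arx_regressor(u, y, na=2, nb=2)

# Solve the normal equations H^T H theta = H^T y directly ...
theta_ne = np.linalg.solve(H.T @ H, H.T @ target)
# ... and with a least-squares solver, which avoids forming H^T H.
theta_hat, *_ = np.linalg.lstsq(H, target, rcond=None)

print("rank of H:", np.linalg.matrix_rank(H))
print("estimate :", theta_hat)
```

Both routes give the same answer here because H is well conditioned. When H^T·H is close to singular, a solver such as `np.linalg.lstsq` is usually the safer choice than forming and solving the normal equations explicitly.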

Recursive least squares (RLS) extends this to the case where data arrives sequentially. Rather than re-solving the normal equations with each new point, RLS maintains a running estimate and updates it efficiently. The update has the same structure as a Kalman filter: the new estimate equals the old estimate plus a gain times the innovation (the difference between what you actually observed and what you predicted). If you studied the LMS adaptive filter, you saw a stochastic approximation to gradient descent; RLS instead computes the exact least-squares solution recursively. It typically converges in fewer samples than LMS but requires more computation per step, on the order of n² operations for n parameters compared with order n for LMS.
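The recursive update can be sketched as follows. This is one common textbook form of RLS with a forgetting factor, written under stated assumptions: the function name `rls_update`, the initialization P = 10⁶·I, and the synthetic three-parameter regression are illustrative choices, not something prescribed above. With the forgetting factor set to 1, the recursion should reproduce the batch least-squares estimate up to the small effect of the initialization.

```python
import numpy as np

def rls_update(theta, P, h, y_k, lam=1.0):
    """One RLS step (hypothetical helper).

    theta: current estimate, P: scaled inverse of the accumulated H^T H,
    h: new regressor row, y_k: new measurement, lam: forgetting factor.
    """
    Ph = P @ h
    gain = Ph / (lam + h @ Ph)           # gain vector
    innovation = y_k - h @ theta         # observed minus predicted
    theta = theta + gain * innovation    # old estimate + gain * innovation
    P = (P - np.outer(gain, Ph)) / lam   # rank-one update, no matrix inversion
    return theta, P

rng = np.random.default_rng(1)
theta_true = np.array([2.0, -1.0, 0.5])  # assumed parameters for the demo
N = 300
H = rng.standard_normal((N, 3))
y = H @ theta_true + 0.05 * rng.standard_normal(N)

theta = np.zeros(3)
P = 1e6 * np.eye(3)  # large initial P: little confidence in the initial guess
for h_k, y_k in zip(H, y):
    theta, P = rls_update(theta, P, h_k, y_k)

theta_batch, *_ = np.linalg.lstsq(H, y, rcond=None)
print("RLS  :", theta)
print("batch:", theta_batch)
```

Choosing `lam` slightly below 1 discounts old data, which lets the estimate track slowly varying parameters at the cost of a noisier estimate.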

Regularization becomes essential when the model is complex relative to the data, or when H^T·H is nearly singular. Ridge regression (Tikhonov regularization) adds a penalty λ‖θ‖² to the cost, replacing the normal equations with (H^T·H + λI)·θ = H^T·y. The matrix H^T·H + λI is always invertible for λ > 0, because H^T·H is positive semidefinite and adding λI makes it positive definite; this also improves numerical conditioning. The tradeoff is bias versus variance: larger λ shrinks parameter estimates toward zero (adding bias) but reduces sensitivity to noise (reducing variance). A suitable λ is typically chosen by cross-validation or by physical knowledge about the system's expected parameter magnitudes. This bias-variance tradeoff is a central tension throughout statistical learning, and system identification is where many engineers first encounter it concretely.
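A small numerical sketch of the regularized normal equations, assuming a deliberately ill-conditioned regressor with two nearly identical columns. The helper name `ridge_solve`, the data, and the λ values are illustrative assumptions. The example only shows two effects described above: adding λI lowers the condition number, and larger λ shrinks the estimate toward zero.

```python
import numpy as np

def ridge_solve(H, y, lam):
    """Hypothetical helper: solve (H^T H + lam I) theta = H^T y."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ y)

rng = np.random.default_rng(2)
x = rng.standard_normal(100)
# Two nearly collinear columns make H^T H close to singular.
H = np.column_stack([x, x + 1e-6 * rng.standard_normal(100)])
y = H @ np.array([1.0, 1.0]) + 0.1 * rng.standard_normal(100)

cond_plain = np.linalg.cond(H.T @ H)
cond_ridge = np.linalg.cond(H.T @ H + 0.1 * np.eye(2))
print("condition number without / with ridge:", cond_plain, cond_ridge)

# Larger lam -> smaller parameter norm (more bias, less variance).
lams = (0.01, 0.1, 1.0, 10.0)
norms = [np.linalg.norm(ridge_solve(H, y, lam)) for lam in lams]
for lam, nrm in zip(lams, norms):
    print(f"lam = {lam:5.2f}   ||theta|| = {nrm:.4f}")
```

In practice λ would be selected by evaluating prediction error on held-out data rather than by inspecting the parameter norm.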
