Questions: System Identification Using Least-Squares Methods
5 questions to test your understanding
Question 1 Multiple Choice
An engineer applies least-squares system identification to a linear system using a step-function input. The resulting H^T·H matrix is singular (or nearly so), so the normal equations have no unique solution. What is the most likely cause?
A. The measurement noise is too high, corrupting H^T·H
B. A step input is not persistently exciting: it only excites the DC (zero frequency) component and fails to probe the system's dynamic modes, making some columns of H linearly dependent
C. The model order is too low; adding more parameters would make H^T·H invertible
D. The sampling rate is too fast, causing aliasing that corrupts the regressor matrix
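Persistent excitation is the key condition: the input must contain enough frequency content to probe all of the system's modes. Once its transient has died out, a step is just a constant input whose power is concentrated at zero frequency (DC). It reveals the system's steady-state gain but little about its dynamics (poles, resonances). The columns of the regressor matrix H then become linearly dependent, or nearly so, and H^T·H becomes singular or near-singular. In general, identifying n parameters requires an input that is persistently exciting of order n, meaning its spectrum is nonzero at n or more distinct frequencies (counting +ω and -ω separately, so a constant supplies one and a single sinusoid supplies two). This is not a noise problem (option A) or a model-order problem (option C): adding parameters only makes the rank deficiency worse. Option D also has the mechanism backwards, since aliasing comes from sampling too slowly, not too fast. The issue is a fundamental identifiability condition tied to input design.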
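The rank loss can be checked numerically. The following is a minimal sketch, assuming a hypothetical 3-tap FIR model (each row of H holds the three most recent inputs) and a data record that begins after the step has been applied, so the recorded input is constant. The helper name `fir_regressor`, the record length, and the random ±1 sequence standing in for a PRBS are illustrative choices, not part of the quiz.

```python
import numpy as np

def fir_regressor(u, n):
    """Build H for an n-tap FIR model: row k holds [u[k-1], ..., u[k-n]]."""
    N = len(u)
    return np.column_stack([u[n - i : N - i] for i in range(1, n + 1)])

n_taps = 3   # assumed model order (illustrative)
N = 200      # assumed record length (illustrative)

# Step applied before the record starts: the recorded input is constant.
u_step = np.ones(N)

# Random +/-1 sequence as a simple stand-in for a PRBS.
rng = np.random.default_rng(0)
u_rand = rng.choice([-1.0, 1.0], size=N)

H_step = fir_regressor(u_step, n_taps)
H_rand = fir_regressor(u_rand, n_taps)

# rank(H^T H) equals rank(H); working on H avoids squaring the conditioning.
rank_step = np.linalg.matrix_rank(H_step, tol=1e-8)
rank_rand = np.linalg.matrix_rank(H_rand, tol=1e-8)

print("step input rank:  ", rank_step, "of", n_taps)
print("random input rank:", rank_rand, "of", n_taps)
```

With the constant input every column of H is identical, so only the sum of the three taps (the DC gain) can be determined from the data.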
Question 2 Multiple Choice
You increase the regularization parameter λ in ridge regression from 0.01 to 10 for a system identification problem. What effect does this have?
A. The estimates become unbiased and the variance decreases; both improve simultaneously
B. The estimates become more biased (shrunk toward zero) but less sensitive to noise: a bias-variance tradeoff
C. The estimates become less biased and more sensitive to noise, trading variance for accuracy
D. Regularization has no effect on bias; it only improves the numerical conditioning of H^T·H
Regularization introduces a deliberate bias by pulling parameter estimates toward zero (the prior that parameters are small). In exchange, it reduces variance: the estimates are less sensitive to noise fluctuations in the data. Larger λ means more bias and less variance. At λ = 0 ridge regression reduces to ordinary least squares (no shrinkage bias, maximum variance). The optimal λ balances these two effects to minimize total prediction error. Option A is the key misconception: bias and variance are in tension here, so increasing λ cannot reduce both at once. Option D understates regularization's effect: it both improves numerical conditioning and introduces bias.
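The shrinkage and the conditioning improvement can both be seen in a small experiment. This is a sketch under stated assumptions: the regressor matrix is synthetic, with two nearly collinear columns chosen to make H^T·H poorly conditioned, and `ridge_estimate`, `theta_true`, and the noise level are hypothetical names and values introduced only for illustration.

```python
import numpy as np

def ridge_estimate(H, y, lam):
    """Solve (H^T H + lam * I) theta = H^T y for theta."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ y)

rng = np.random.default_rng(1)
N = 50
theta_true = np.array([1.0, -0.5, 0.25])  # hypothetical parameters

# Two nearly collinear columns make H^T H poorly conditioned.
base = rng.standard_normal(N)
H = np.column_stack([
    base,
    base + 0.01 * rng.standard_normal(N),
    rng.standard_normal(N),
])
y = H @ theta_true + 0.1 * rng.standard_normal(N)

results = {}
for lam in (0.01, 10.0):
    theta = ridge_estimate(H, y, lam)
    cond = np.linalg.cond(H.T @ H + lam * np.eye(H.shape[1]))
    results[lam] = (np.linalg.norm(theta), cond)
    print(f"lambda={lam:5.2f}  ||theta||={results[lam][0]:.3f}  cond={cond:.1f}")
```

Raising λ from 0.01 to 10 lowers both the norm of the estimate (shrinkage toward zero) and the condition number of the regularized matrix, which is the combined effect that option D misses.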
Question 3 True / False
Least-squares system identification formulates the parameter estimation problem as an overdetermined linear system y ≈ Hθ, which typically has more equations than unknowns.
T. True
F. False
Answer: True
This is exactly right. Each row of H corresponds to one time step of observed data (typically past inputs and outputs). A system with, say, 4 parameters to identify might have 500 rows of data: 500 equations for 4 unknowns. This overdetermined system generally has no exact solution (because of noise, no θ satisfies y = Hθ exactly), so least squares finds the θ that minimizes the sum of squared residuals. The overdetermination is desirable: with an informative input, more data averages out the noise and lowers the variance of the estimate. Underdetermined systems (fewer equations than unknowns) have infinitely many exact solutions and cannot be solved uniquely without additional constraints.
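The 500-equations-for-4-unknowns case can be sketched directly. The example below assumes a synthetic Gaussian regressor matrix standing in for real lagged input/output data; `theta_true` and the noise level are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 500, 4                                   # 500 equations, 4 unknowns
theta_true = np.array([0.8, -0.3, 1.5, 0.1])    # hypothetical parameters

H = rng.standard_normal((N, n))                 # synthetic regressor matrix
y = H @ theta_true + 0.1 * rng.standard_normal(N)  # noisy observations

# Least-squares solution: minimizes ||y - H theta||^2.
theta_hat, residual_ss, rank, _ = np.linalg.lstsq(H, y, rcond=None)

# Same answer from the normal equations (H^T H) theta = H^T y.
theta_ne = np.linalg.solve(H.T @ H, H.T @ y)

print("estimate:        ", theta_hat)
print("normal equations:", theta_ne)
print("rank of H:       ", rank)
```

`np.linalg.lstsq` works from a factorization of H rather than forming H^T·H, which is usually the numerically safer route when H is poorly conditioned.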
Question 4 True / False
If the system being identified is nonlinear, least-squares estimation will fail to produce any useful model.
T. True
F. False
Answer: False
Least squares with a linear model structure identifies the best *linear* approximation to the system, which is often useful even for mildly nonlinear systems in a neighborhood of an operating point. Additionally, the least-squares framework extends to models that are nonlinear in the signals but linear in the parameters, through basis expansion: the regressor matrix H can contain nonlinear functions of the inputs and past outputs (e.g., H_k = [y_{k-1}, y_{k-1}², u_{k-1}, u_{k-1}²]), and the identification problem remains linear in the unknown coefficients. Even for strongly nonlinear systems, a linear ARX or ARMAX model identified by least squares may provide a useful control-design approximation. 'Will fail' overstates the limitation considerably.
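The basis-expansion idea can be sketched with the regressor H_k = [y_{k-1}, y_{k-1}², u_{k-1}, u_{k-1}²] from the explanation above. The simulated system, its coefficients in `theta_true`, the uniform random input, and the noise level are all hypothetical choices made so the example runs; they do not come from the quiz.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
theta_true = np.array([0.5, -0.05, 0.8, 0.1])  # hypothetical coefficients

# Simulate y_k = 0.5*y_{k-1} - 0.05*y_{k-1}^2 + 0.8*u_{k-1} + 0.1*u_{k-1}^2 + e_k
u = rng.uniform(-1.0, 1.0, size=N)
y = np.zeros(N)
for k in range(1, N):
    phi = np.array([y[k - 1], y[k - 1] ** 2, u[k - 1], u[k - 1] ** 2])
    y[k] = phi @ theta_true + 0.01 * rng.standard_normal()

# The regressor columns are nonlinear in the signals,
# but the model is still linear in the unknown coefficients.
H = np.column_stack([y[:-1], y[:-1] ** 2, u[:-1], u[:-1] ** 2])
theta_hat, *_ = np.linalg.lstsq(H, y[1:], rcond=None)

print("true:     ", theta_true)
print("estimated:", theta_hat)
```

Nothing in the solver changes relative to the linear case; only the columns of H do.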
Question 5 Short Answer
Why must the input signal be 'persistently exciting' for least-squares system identification to succeed, and what happens if this condition is violated?
Model answer: Identifying n parameters (e.g., the coefficients that set the system's poles and zeros) requires that the input excite n independent 'directions' of the parameter space; in frequency terms, its spectrum must be nonzero at n or more distinct frequencies. Persistent excitation ensures the regressor matrix H has full column rank, making H^T·H invertible and the normal equations uniquely solvable. If the input lacks frequency content at some modes, H^T·H becomes singular or near-singular. In the singular case the identification problem has infinitely many solutions (different parameter vectors predict the data equally well), so the algorithm cannot distinguish between them. In the near-singular case, which is more common in practice, the estimates are numerically unstable and highly sensitive to small noise perturbations.
The geometric intuition: H^T·H being invertible means the input data 'spans' the parameter space — you can see the effect of every parameter independently. If two parameters always change together in the data (because the input never separates their effects), you cannot determine them individually. Persistent excitation is the input-design condition that guarantees this geometric spanning. Practical inputs like PRBS (pseudorandom binary sequences) or sinusoidal sweeps are explicitly designed to be persistently exciting across the relevant frequency band.
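The link between frequency content and rank can be illustrated with sinusoidal inputs. This is a sketch assuming a hypothetical 4-tap FIR model and counting a real sinusoid as two spectral lines (at +ω and -ω); the helper `fir_regressor` and the frequencies 0.3 and 1.1 rad/sample are illustrative choices.

```python
import numpy as np

def fir_regressor(u, n):
    """Build H for an n-tap FIR model: row k holds [u[k-1], ..., u[k-n]]."""
    N = len(u)
    return np.column_stack([u[n - i : N - i] for i in range(1, n + 1)])

n_taps = 4                # four parameters to identify (illustrative)
k = np.arange(400)

# One sinusoid: every lagged copy is a mix of sin(0.3k) and cos(0.3k).
u_one = np.sin(0.3 * k)

# Two sinusoids at distinct frequencies.
u_two = np.sin(0.3 * k) + np.sin(1.1 * k)

rank_one = np.linalg.matrix_rank(fir_regressor(u_one, n_taps), tol=1e-8)
rank_two = np.linalg.matrix_rank(fir_regressor(u_two, n_taps), tol=1e-8)

print("one sinusoid:  rank", rank_one, "of", n_taps)
print("two sinusoids: rank", rank_two, "of", n_taps)
```

A single sinusoid spans only two directions of the four-dimensional parameter space, so two of the four taps cannot be separated from the others; adding a second frequency restores full column rank.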