An adaptive equalizer is applied to a highly frequency-selective channel where the input autocorrelation matrix has eigenvalues spanning several orders of magnitude. LMS converges very slowly; RLS converges in about N iterations. What is the fundamental reason for RLS's advantage?
A. RLS uses a larger step size than LMS, allowing faster gradient descent along every direction
B. RLS maintains an inverse correlation matrix P that captures the curvature of the error surface in every direction, enabling Newton-like updates that are optimal across all dimensions simultaneously
C. RLS operates in the frequency domain, bypassing the time-domain eigenvalue spread problem entirely
D. RLS averages over more past data points, reducing variance and allowing larger effective step sizes
Answer: B
LMS uses a single scalar step size μ for every direction in weight space. When the error surface has very different curvatures in different directions (high eigenvalue spread), one step size cannot be optimal everywhere: it must be small enough to avoid divergence in the steepest direction (roughly, μ must stay below 2 divided by the largest eigenvalue of the input autocorrelation matrix), which makes it unnecessarily slow in shallow directions. RLS tracks the full inverse correlation matrix P, which encodes the curvature in every direction. The RLS update is a Newton-like step: it scales the correction by the inverse curvature in all dimensions at once, which removes the sensitivity to eigenvalue spread.
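The step-size argument above can be made concrete with a small sketch. It assumes an idealized, noise-free steepest-descent view of LMS, in which the error along each eigen-direction shrinks by a factor of (1 - μ × eigenvalue) per iteration. The eigenvalues, the tolerance, and the function name `iterations_to_converge` are hypothetical choices for illustration, not taken from the quiz.

```python
import math

def iterations_to_converge(eigenvalue, mu, tol=1e-3):
    """Iterations for one mode's error to shrink below tol times its start.

    Uses the per-mode recursion error_k = (1 - mu * eigenvalue)**k * error_0.
    """
    factor = abs(1.0 - mu * eigenvalue)
    if factor >= 1.0:
        raise ValueError("mu too large: this mode does not converge")
    if factor == 0.0:
        return 1  # the mode is cancelled in a single step
    return math.ceil(math.log(tol) / math.log(factor))

# Hypothetical eigenvalues with a spread of 1000
eigenvalues = [1.0, 10.0, 1000.0]

# One scalar step size, chosen to be safe for the steepest direction
mu = 0.5 / max(eigenvalues)
for lam in eigenvalues:
    print(f"eigenvalue {lam}: {iterations_to_converge(lam, mu)} iterations")

# Newton-like scaling: each direction gets a step of 1 / eigenvalue
newton_iters = [iterations_to_converge(lam, 1.0 / lam) for lam in eigenvalues]
print("Newton-like scaling:", newton_iters)
```

With these assumed numbers, the steepest mode settles in about ten iterations while the shallowest mode needs more than ten thousand. Scaling each direction by its own curvature settles every mode in a single step, which is the behavior RLS approximates through P.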
Question 2 Multiple Choice
An RLS filter is tracking a slowly time-varying channel. The forgetting factor is set to λ = 0.97. What is the practical effect of decreasing λ to 0.90?
A. The filter converges more slowly because smaller λ gives less weight to recent data
B. The filter tracks faster but with more noise variance, because older data is down-weighted more aggressively, making estimates more responsive but less stable
C. The filter becomes equivalent to batch least squares, ignoring the time-varying nature of the channel
D. The filter converges faster and more stably because smaller λ improves the conditioning of the inverse correlation matrix
Answer: B
The forgetting factor λ determines how quickly old data is discounted: past errors are weighted by λ^k, where k is how many samples ago they occurred. Smaller λ means older data is forgotten more quickly, so the filter effectively uses a shorter window of past observations. This makes the filter more responsive to changes in channel statistics (faster tracking) but also more sensitive to noise (higher variance in steady-state estimates). Setting λ = 1 weights all past data equally: no forgetting, and no ability to track time-varying channels.
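To put numbers on the "shorter window" idea, the sketch below uses the common rule of thumb that the effective memory is about 1 / (1 - λ) samples, which is the sum of the geometric weights λ^k. The function names are illustrative, and the lag k = 30 is an arbitrary choice.

```python
def effective_memory(lam):
    """Approximate number of past samples that matter: 1 / (1 - lam)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    return 1.0 / (1.0 - lam)

def weight_of_sample(lam, k):
    """Relative weight given to an error that occurred k samples ago."""
    return lam ** k

for lam in (0.97, 0.90):
    print(f"lambda={lam}: memory ~ {effective_memory(lam):.1f} samples, "
          f"weight at k=30 is {weight_of_sample(lam, 30):.3f}")
```

Under this rule of thumb, λ = 0.97 corresponds to a window of about 33 samples and λ = 0.90 to about 10 samples, which is why the smaller value tracks faster but averages over less data.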
Question 3 True / False
RLS converges in approximately N iterations regardless of the eigenvalue spread of the input autocorrelation matrix, because the inverse correlation matrix P allows the algorithm to make optimal updates in every direction simultaneously.
T. True
F. False
Answer: True
This is the key advantage of RLS over LMS. Because P encodes the error surface curvature in all directions, RLS effectively normalizes the step in each direction by the local curvature, taking a large step in shallow directions and a small step in steep directions. This is the principle of Newton's method applied to adaptive filtering. The convergence time is on the order of N iterations, where N is the filter order (textbook treatments often quote about 2N), and is essentially independent of eigenvalue spread. That is a stark contrast to LMS, which can require thousands of iterations for ill-conditioned channels.
Question 4 True / False
Setting the forgetting factor λ = 1 in RLS makes the filter maximally responsive to sudden changes in channel statistics.
T. True
F. False
Answer: False
This is the opposite of the truth. λ = 1 means no forgetting — all past errors are weighted equally regardless of how old they are. This gives the filter the longest possible memory: it minimizes the total sum of all past squared errors, which is correct for stationary problems but means it cannot adapt to sudden changes. The filter is 'stuck' tracking the average channel over its entire history. To maximize responsiveness to sudden changes, you would use a small λ (e.g., 0.90–0.95), which aggressively downweights older data.
Question 5 Short Answer
Explain why RLS converges much faster than LMS for adaptive equalization of a highly frequency-selective channel, and what cost is paid for this improved convergence.
Model answer: LMS uses a single scalar step size for all directions in weight space. When the input autocorrelation matrix has very unequal eigenvalues, as it does for frequency-selective channels, the error surface has different curvatures in different directions. A single step size must be small enough to avoid divergence in the steepest direction, making it far too small in shallow directions and causing slow overall convergence. RLS maintains the inverse correlation matrix P, which tracks the curvature in every direction. The RLS update scales each direction by its curvature, taking a Newton-like step that is simultaneously optimal in all dimensions. This eliminates the eigenvalue-spread problem and achieves convergence in ~N iterations. The cost is computational: RLS requires O(N²) multiplications per sample (versus O(N) for LMS) and O(N²) memory to store P. It is also numerically sensitive: in finite precision, round-off errors can accumulate in P, so practical implementations may need periodic reinitialization.
The core trade-off is convergence speed versus computational complexity. RLS buys fast convergence by maintaining a full matrix description of the error surface, but this matrix requires O(N²) work per update and becomes prohibitively expensive for large filter orders.
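The O(N²) cost is visible in a direct implementation of the standard exponentially weighted RLS recursion. The following is a minimal sketch rather than a production equalizer: it identifies a hypothetical 3-tap FIR system from noise-free data, and the names `rls_update` and `identify`, the initial value of P, the seed, and the tap values are all assumptions chosen for illustration.

```python
import random

def rls_update(w, P, x, d, lam=0.99):
    """One RLS step. Returns updated weights, updated P, and the a priori error."""
    n = len(w)
    # Matrix-vector product P x: N^2 multiplications
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    k = [v / denom for v in Px]                      # gain vector
    e = d - sum(w[i] * x[i] for i in range(n))       # a priori error
    w_new = [w[i] + k[i] * e for i in range(n)]
    # Rank-one update of P: another N^2 multiplications.
    # Uses x^T P = (P x)^T, which holds because P is symmetric.
    P_new = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(n)]
             for i in range(n)]
    return w_new, P_new, e

def identify(true_w, num_samples=60, seed=0):
    """Estimate the taps of an FIR system from white Gaussian input."""
    rng = random.Random(seed)
    n = len(true_w)
    w = [0.0] * n
    # Large initial P means low confidence in the initial weights
    P = [[1e4 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = [0.0] * n
    for _ in range(num_samples):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]           # shift in a new sample
        d = sum(true_w[i] * x[i] for i in range(n))  # noise-free desired signal
        w, P, _ = rls_update(w, P, x, d)
    return w

true_w = [0.5, -0.3, 0.1]   # hypothetical channel taps
est = identify(true_w)
print(est)
```

The two commented lines that touch every entry of P are what make each update quadratic in the filter order. The LMS update, by contrast, needs only the error and one scaled copy of the input vector.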