Questions: Sparse Signal Recovery and Basis Pursuit
5 questions to test your understanding
Score: 0 / 5
Question 1 Multiple Choice
A signal x is sparse (K nonzero coefficients out of N) and observed as y = Ax + noise. To recover x via basis pursuit, you solve minimize ||x||₁ subject to ||Ax − y||₂ ≤ ε. Why is minimizing ||x||₁ (sum of absolute values) better than minimizing ||x||₀ (counting nonzeros) directly?
Aℓ₀ minimization is nonconvex and NP-hard; ℓ₁ is convex and solvable in polynomial time. Under coherence conditions, they give the same answer
Bℓ₁ is always better than ℓ₀; ℓ₀ is never a good sparsity measure
Cℓ₁ requires fewer measurements than ℓ₀, violating the Nyquist theorem
Dℓ₀ and ℓ₁ are equivalent; the choice is purely computational
ℓ₀ counts nonzeros directly (the true sparsity) but is a nonconvex function: the combinatorial search over all possible K-sparse supports is NP-hard. ℓ₁ (sum of absolute values) is convex: you minimize it via linear programming, polynomial-time solvable. Remarkably, for many realistic dictionaries and noise levels, the ℓ₁ solution equals the ℓ₀ solution — the convex relaxation is exact. This is not always guaranteed, but coherence and restricted isometry conditions ensure it in many cases. The success of basis pursuit hinges on this: we solve an intractable ℓ₀ problem via a tractable ℓ₁ proxy.
Question 2 Multiple Choice
Iterative Hard Thresholding (IHT) recovers a sparse signal by repeating: update x ← shrink(x + A^T(y − Ax), K), where shrink(·, K) keeps the K largest magnitude coefficients and zeros the rest. Why does this work, and why is it faster than basis pursuit?
AIHT directly enforces the cardinality constraint ||x||₀ = K, guaranteeing sparsity; it is faster because each iteration is simple thresholding, not solving a linear program
BIHT is a gradient descent variant that projects onto the sparse set at each step, trading guaranteed convergence for computational speed
CIHT only works for noise-free signals; in the presence of noise, basis pursuit is required
DIHT and basis pursuit are mathematically equivalent; the difference is purely implementation
IHT is a projected gradient descent algorithm: you take a gradient step x ← x + A^T(y − Ax) (moving toward the data fit), then project back to the set of K-sparse vectors (keeping only the K largest magnitudes). This alternation between gradient steps and hard projection is faster than solving a linear program, especially for large-scale problems. However, IHT is not convex (the hard projection is nonconvex) — it only guarantees convergence to a local minimum if the RIP constant is small. Basis pursuit has global optimality guarantees (it's a convex problem), but solves a larger LP. Modern sparse recovery often uses IHT or similar greedy methods for speed, then validates against a convex solution.
Question 3 True / False
Coherence of a dictionary A is the maximum absolute inner product between any two distinct atoms: μ(A) = max_{i≠j} |⟨a_i, a_j⟩|. High coherence (μ ≈ 1) makes sparse recovery harder; low coherence (μ ≈ 0) makes it easier. Why?
TTrue
FFalse
Answer: True
Low coherence means the dictionary atoms are nearly orthogonal — each atom has a unique 'signature' in measurement space. If two atoms are nearly collinear (high coherence), measurements cannot distinguish which one is present in the true sparse signal; two different sparse solutions (one using atom i, the other using atom j) produce nearly identical measurements. Basis pursuit or matching pursuit cannot disambiguate and may recover the wrong atom. Low coherence ensures that atoms are distinguishable, allowing recovery algorithms to identify which atoms are active. This is why incoherent dictionaries (random dictionaries, orthogonal dictionaries) are preferred.
Question 4 True / False
In a noisy setting, basis pursuit (minimize ||x||₁ subject to ||Ax − y||₂ ≤ ε) can sometimes recover a non-sparse solution that fits the data. What prevents this, and when does basis pursuit fail?
TTrue
FFalse
Answer: True
Basis pursuit prefers sparse solutions: minimizing ||x||₁ penalizes spreading energy across many atoms. Even with noise, a sparse solution with small ℓ₁ norm will be preferred over a dense solution with the same data fit. However, if the true signal is genuinely non-sparse, or if the dictionary is highly coherent, basis pursuit may fail to recover it. Failure modes: (1) the noise level ε is too large (more uncertainty than signal content); (2) the dictionary is incoherent or the signal structure is not well-represented; (3) the true signal is not sparse. In such cases, other priors (structured sparsity, group sparsity, low-rank structure) may be needed.
Question 5 Short Answer
Explain the relationship between the coherence μ(A) and the minimum sparsity level K_min such that basis pursuit recovers K-sparse signals. Why does reducing coherence allow recovery of sparser signals?
Think about your answer, then reveal below.
Model answer: The theoretical guarantee is: if K < 1/(2μ(A)), basis pursuit recovers any K-sparse signal from noiseless measurements. Low coherence μ ≈ 0 allows K to be close to 1 (very sparse signals are recoverable), while high coherence μ ≈ 1 requires K < 1/2 (only half-sparse signals are recoverable, which is weak). The intuition: coherence measures dictionary redundancy. If atoms are similar (high μ), the algorithm cannot distinguish which atoms are active, limiting recovery to cases where sparsity is very low (only a few atoms possible). If atoms are dissimilar (low μ), even moderately sparse signals are identifiable. Modern dictionaries are designed with low coherence: random projections (random Gaussian entries), Fourier + wavelets (complementary time-frequency structures), or learned dictionaries optimized for coherence.
This quantifies the trade-off: larger, more redundant dictionaries (higher coherence) have better signal representation capability (can express many signals) but worse identifiability (hard to determine which atoms were used). Sparser recovery (smaller K) requires more orthogonal dictionaries. In practice, you either accept slightly higher coherence (dictionaries represent signals better but need higher sparsity for guaranteed recovery), or use learned dictionaries optimized jointly for representation and coherence.