Questions: Cepstral Analysis and Homomorphic Filtering
5 questions to test your understanding
Question 1 Multiple Choice
Speech is modeled as the convolution of a glottal source with a vocal tract filter. Cepstral analysis applies a logarithm to the magnitude spectrum as its key step. The primary purpose of this logarithm is:
A. To compress the dynamic range so that weak spectral peaks become visible alongside strong ones
B. To convert the multiplicative combination of source and filter into an additive one, enabling linear separation
C. To normalize the spectrum so that all magnitude values fall between 0 and 1
D. To apply an implicit windowing operation that removes time-aliasing artifacts in the spectral domain
Answer: B
The fundamental problem is that speech is the convolution of source and vocal tract, x(t) = e(t) * h(t), which in the frequency domain becomes a product: X(f) = E(f)·H(f). A product cannot be undone by linear filtering, because you cannot separate two multiplied functions pointwise without knowing one of them. Taking the log of the magnitude gives log|X(f)| = log|E(f)| + log|H(f)|, which converts the product to a sum, and additive components can be separated by linear operations if they occupy different regions of a transformed domain. Dynamic range compression (option A) is a side effect of the log, not its primary purpose.
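As a quick numerical check of this identity, the following minimal sketch convolves two stand-in sequences and compares their spectra. The arrays `e` and `h` are arbitrary placeholders rather than real speech data, and the FFT length is assumed to be at least as long as the convolved signal so that the FFT product matches linear convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.standard_normal(64)        # placeholder "source" (assumption, not real glottal data)
h = 0.9 ** np.arange(16)           # placeholder "filter" impulse response (assumption)

x = np.convolve(e, h)              # time-domain convolution, length 64 + 16 - 1 = 79
n_fft = 128                        # >= len(x), so there is no circular wrap-around

E = np.fft.rfft(e, n_fft)
H = np.fft.rfft(h, n_fft)
X = np.fft.rfft(x, n_fft)

# Convolution in time is multiplication in frequency.
product_holds = np.allclose(X, E * H)

# The log of the magnitude turns that product into a sum.
log_sum_holds = np.allclose(np.log(np.abs(X)),
                            np.log(np.abs(E)) + np.log(np.abs(H)))

print(product_holds, log_sum_holds)
```

Both checks should hold up to floating-point error for any pair of sequences whose spectra have no exact zeros.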
Question 2 Multiple Choice
After computing the cepstrum of a speech signal, a low-quefrency lifter (window retaining only small quefrency values) is applied before inverting back to the frequency domain. The result represents:
A. The pitch period of the vocal cords, extracted directly from the cepstral peak location
B. The smooth spectral envelope corresponding to the vocal tract filter response
C. A denoised version of the original speech waveform with the harmonic structure preserved
D. The fine harmonic structure of the glottal source, with the spectral envelope removed
Answer: B
The vocal tract filter H(f) has a smoothly varying spectral envelope (broad formant peaks spaced roughly 1 kHz apart). In the cepstrum, this slow variation with frequency maps to small quefrency values. The glottal source E(f), with fine harmonic lines spaced F₀ apart (100–300 Hz), maps to large quefrency values, with a peak at the pitch period 1/F₀ (roughly 3–10 ms). A low-quefrency lifter isolates the slowly varying component, the vocal tract envelope, and discards the rapidly varying harmonics. Option D is the result of a *high*-quefrency lifter; option A describes reading the peak location (for pitch detection), not applying the lifter. Option C fails because the low-quefrency lifter removes the harmonic structure rather than preserving it.
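The liftering step described here can be sketched on a synthetic frame. Everything in this example is an illustrative assumption: the 16 kHz sample rate, the 125 Hz impulse-train source, the two toy formants built by the hypothetical `resonator` helper, and the 40-sample lifter cutoff.

```python
import numpy as np

fs = 16000                 # sample rate in Hz (assumed)
n = 2048                   # frame length in samples (assumed)
period = 128               # pitch period in samples, i.e. F0 = 125 Hz (assumed)

def resonator(freq_hz, bandwidth_hz, length=512):
    """Impulse response of a two-pole resonator (one toy formant)."""
    r = np.exp(-np.pi * bandwidth_hz / fs)
    theta = 2 * np.pi * freq_hz / fs
    k = np.arange(length)
    return r ** k * np.sin((k + 1) * theta) / np.sin(theta)

# Toy vocal tract: two formants in cascade (values are illustrative only).
h = np.convolve(resonator(700, 100), resonator(1800, 120))

# Toy glottal source: an impulse train at the pitch period.
e = np.zeros(n)
e[::period] = 1.0

x = np.convolve(e, h)[:n] * np.hamming(n)

log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-12)
cepstrum = np.fft.irfft(log_mag, n)    # real cepstrum, symmetric about quefrency 0

cutoff = 40                            # lifter cutoff in samples, below the pitch period
lifter = np.zeros(n)
lifter[:cutoff] = 1.0
lifter[-(cutoff - 1):] = 1.0           # keep the mirrored negative quefrencies too

log_envelope = np.fft.rfft(cepstrum * lifter).real        # smooth vocal tract estimate
log_fine = np.fft.rfft(cepstrum * (1 - lifter)).real      # harmonic (source) residue
```

Because the lifter is a linear mask, `log_envelope + log_fine` reproduces `log_mag`. The cutoff of 40 samples (2.5 ms at 16 kHz) sits below the 8 ms pitch period of this toy signal; a real system would choose it relative to the lowest expected pitch period.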
Question 3 True / False
The cepstrum separates the vocal tract filter from the glottal source because the two components vary at different rates in the frequency domain — the envelope varies slowly while the harmonic structure varies rapidly.
T. True
F. False
Answer: True
This is the fundamental insight that makes cepstral analysis work. After taking the log of the magnitude spectrum, the two additive components (log vocal tract envelope plus log glottal harmonics) differ in how quickly they vary along the frequency axis: the envelope has slow, broad undulations (formant spacing of roughly 1 kHz), while the harmonic peaks repeat every F₀ (about 100–300 Hz). Taking the inverse Fourier transform of the log spectrum maps these different rates to different quefrency values, so liftering can separate them in the quefrency domain.
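The mapping from harmonic spacing to quefrency can be illustrated with a rough pitch estimator. This is a sketch under stated assumptions: the test signal is a synthetic sum of in-phase harmonics (not recorded speech), and the sample rate, frame length, and 80 to 400 Hz search range are arbitrary choices.

```python
import numpy as np

fs = 16000                      # sample rate in Hz (assumed)
n = 2048                        # frame length in samples (assumed)
f0_true = 125.0                 # fundamental of the toy signal in Hz (assumed)

# Toy voiced frame: in-phase harmonics of f0 with 1/k amplitude roll-off.
t = np.arange(n) / fs
harmonics = np.arange(1, 61)    # up to 7.5 kHz, below the 8 kHz Nyquist limit
x = np.sum(np.cos(2 * np.pi * f0_true * np.outer(harmonics, t))
           / harmonics[:, None], axis=0)
x *= np.hamming(n)

log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-12)
cepstrum = np.fft.irfft(log_mag, n)

# Search only quefrencies that correspond to a plausible pitch range.
q_min = int(fs / 400)           # 40 samples
q_max = int(fs / 80)            # 200 samples
peak = q_min + np.argmax(cepstrum[q_min:q_max + 1])

f0_estimate = fs / peak
print(f"cepstral peak at {peak} samples, F0 estimate {f0_estimate:.1f} Hz")
```

The peak location is a quefrency in samples; dividing the sample rate by it gives the F₀ estimate, which should land close to the 125 Hz used to build the signal.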
Question 4 True / False
The cepstrum is most directly useful for measuring the energy of a signal at specific frequencies, since it is defined as the inverse Fourier transform of the signal's power spectrum.
T. True
F. False
Answer: False
The cepstrum is defined as the inverse Fourier transform of the *log-magnitude spectrum*, not of the power spectrum: c[n] = IFFT{log|FFT{x[n]}|}. Its primary use is to separate components that were convolved in time (and thus multiplied in the frequency domain), which the log transformation turns into a sum. Measuring energy at specific frequencies is the role of the spectrum itself, not the cepstrum. The cepstrum describes the structure of the log spectrum (how quickly it varies with frequency), not the signal's energy distribution.
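To make the contrast concrete, the sketch below implements the definition above next to the quantity the question actually describes. Without the logarithm, the inverse transform of the power spectrum is the (circular) autocorrelation, whose lag-0 value is the signal energy. The function names and the small `eps` guard are illustrative assumptions.

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """c[n] = IFFT{ log|FFT{x[n]}| }; eps (an assumption) guards against log(0)."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + eps)).real

def circular_autocorrelation(x):
    """IFFT of the power spectrum, with no logarithm."""
    return np.fft.ifft(np.abs(np.fft.fft(x)) ** 2).real

rng = np.random.default_rng(1)
x = rng.standard_normal(256)     # arbitrary test signal (placeholder)

c = real_cepstrum(x)
r = circular_autocorrelation(x)

# Lag 0 of the autocorrelation equals the signal energy.
energy_match = np.isclose(r[0], np.sum(x ** 2))

# Quefrency 0 of the cepstrum equals the mean of the log-magnitude spectrum.
mean_log_match = np.isclose(c[0], np.mean(np.log(np.abs(np.fft.fft(x)) + 1e-12)))

print(energy_match, mean_log_match)
```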
Question 5 Short Answer
Explain why taking the logarithm of the magnitude spectrum is the key step that makes cepstral separation of the glottal source from the vocal tract filter possible.
Model answer: Speech is the convolution of source and vocal tract: in the frequency domain, X(f) = E(f)·H(f) — a product. You cannot separate two multiplied functions using linear operations alone. Taking the logarithm converts the product to a sum: log|X(f)| = log|E(f)| + log|H(f)|. The two additive components now vary at different rates in the frequency domain (the vocal tract envelope varies slowly; the harmonic structure varies rapidly), so an inverse Fourier transform places them at different quefrency values where a simple windowing operation (liftering) separates them.
Without the log step, source and filter are entangled multiplicatively and no linear filter can disentangle them — you would need to know one to find the other. The log is the homomorphic transformation that makes a nonlinear separation problem solvable with linear tools. This is the general principle of homomorphic filtering: find a domain transform that converts the combination law (here: convolution → multiplication → addition via log) to addition, perform linear operations there, then invert. The cepstrum is the specific realization of this principle for convolutionally mixed signals.
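The transform, operate, invert pattern can be sketched end to end. This is a minimal illustration, not a production deconvolver: `to_cepstrum` and `to_magnitude` are hypothetical helper names, the signals are random placeholders, and only the magnitude is handled (phase is ignored, as in the real cepstrum).

```python
import numpy as np

def to_cepstrum(x, n_fft, eps=1e-12):
    """Forward system: FFT, then log magnitude, then inverse FFT."""
    return np.fft.irfft(np.log(np.abs(np.fft.rfft(x, n_fft)) + eps), n_fft)

def to_magnitude(cepstrum):
    """Inverse system: FFT, then exp, giving a magnitude spectrum back."""
    return np.exp(np.fft.rfft(cepstrum).real)

rng = np.random.default_rng(2)
e = rng.standard_normal(64)            # placeholder source (assumption)
h = 0.8 ** np.arange(16)               # placeholder filter (assumption)
x = np.convolve(e, h)                  # length 79
n_fft = 128                            # >= len(x), avoids circular wrap-around

c_e, c_h, c_x = (to_cepstrum(s, n_fft) for s in (e, h, x))

# Convolution in time has become addition in the cepstral domain.
additive = np.allclose(c_x, c_e + c_h)

# A lifter is a linear mask, so it acts on each additive component separately.
lifter = np.zeros(n_fft)
lifter[:20] = 1.0
lifter[-19:] = 1.0                     # mirrored negative quefrencies
linear = np.allclose(lifter * c_x, lifter * c_e + lifter * c_h)

# Inverting with no lifter applied returns the original magnitude spectrum.
round_trip = np.allclose(to_magnitude(c_x), np.abs(np.fft.rfft(x, n_fft)))

print(additive, linear, round_trip)
```

With these random placeholders the two components do not occupy distinct quefrency ranges, so the lifter here only demonstrates linearity; actual separation relies on the slow-envelope versus fast-harmonic structure of speech.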