You are analyzing a speech signal and need to detect very brief consonant bursts (lasting a few milliseconds) with precise timing. Which STFT window choice best serves this goal?
A. A very wide window: more data points give sharper frequency resolution, revealing the consonant's spectral signature
B. A very narrow window: short duration gives precise time localization, capturing when the burst occurs
C. Window width doesn't matter: STFT always provides equally sharp time and frequency resolution
D. A medium window: STFT resolution is independent of window width, so any choice works
Answer: B
The STFT time-frequency uncertainty principle states that you cannot simultaneously achieve arbitrarily sharp time and frequency resolution. A narrow window gives good time localization — you know precisely when a frequency appears — but the short duration means the Fourier transform sees very few oscillations, smearing frequency content across a wide band. For detecting a brief event with precise timing, a narrow window is the correct choice, accepting the tradeoff of poorer frequency resolution. The opposite choice (wide window) would give precise frequency content but blur the timing of the burst.
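The effect on timing can be sketched numerically. The example below is only an illustration under assumed values: the 16 kHz sample rate, the 3 ms noise burst, the window lengths, the energy threshold, and the helper names `stft_mag` and `burst_extent_ms` are all hypothetical choices, not part of the question. It measures how long a stretch of time is covered by the STFT frames that contain burst energy.

```python
import numpy as np

def stft_mag(x, win_len, hop):
    """Magnitude STFT with a Hann window; returns (frame start indices, |STFT|)."""
    window = np.hanning(win_len)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    return starts, np.abs(np.fft.rfft(frames, axis=1))

fs = 16000                          # assumed sample rate in Hz
x = np.zeros(fs // 4)               # 250 ms of silence
burst_start, burst_len = 2000, 48   # a 3 ms burst beginning at 125 ms
rng = np.random.default_rng(0)
x[burst_start:burst_start + burst_len] = rng.standard_normal(burst_len)

def burst_extent_ms(win_len):
    """Time span (ms) covered by all frames that hold noticeable burst energy."""
    starts, mag = stft_mag(x, win_len, hop=win_len // 4)
    energy = (mag ** 2).sum(axis=1)
    active = starts[energy > 1e-3 * energy.max()]
    return (active[-1] + win_len - active[0]) / fs * 1000

narrow_ms = burst_extent_ms(64)     # 4 ms window
wide_ms = burst_extent_ms(1024)     # 64 ms window
print(f"narrow window: burst smeared over {narrow_ms:.1f} ms")
print(f"wide window:   burst smeared over {wide_ms:.1f} ms")
```

With the narrow window the burst stays confined to a few milliseconds, while the wide window spreads the same 3 ms event over at least one full 64 ms frame.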
Question 2 Multiple Choice
Compared to the standard (global) Fourier transform, what information does the STFT provide that the standard transform does not?
A. STFT provides better frequency resolution by using longer analysis windows
B. STFT reveals when in time different frequency components appear, not just which frequencies are present overall
C. STFT removes noise more effectively because the window function suppresses spectral leakage
D. STFT provides phase information, which the standard Fourier transform discards
Answer: B
The standard Fourier transform integrates over the entire signal, producing a spectrum that shows which frequencies are present but gives no information about when they occur. For a piece of music, it shows every note ever played but not their sequence. The STFT slides a localized window across the signal and computes a Fourier transform of each windowed segment, creating a two-dimensional time-frequency map — the spectrogram. This reveals how the frequency content of the signal evolves over time, which is essential for speech, music, and many biomedical signals.
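The music analogy can be checked with a small sketch. The two test signals, the 8 kHz sample rate, the note frequencies, and the helper `dominant_freqs` are assumptions made for illustration. Both signals contain the same two notes in opposite order, so their global magnitude spectra match, while a frame-by-frame analysis recovers the order.

```python
import numpy as np

fs = 8000                                   # assumed sample rate in Hz
t = np.arange(fs // 2) / fs                 # 0.5 s per note
low = np.sin(2 * np.pi * 440 * t)           # 440 Hz note
high = np.sin(2 * np.pi * 880 * t)          # 880 Hz note

melody_up = np.concatenate([low, high])     # 440 Hz first, then 880 Hz
melody_down = np.concatenate([high, low])   # same notes, reversed order

# Global Fourier transform: the magnitude spectra are identical,
# because one signal is a circular shift of the other.
same_spectrum = np.allclose(np.abs(np.fft.rfft(melody_up)),
                            np.abs(np.fft.rfft(melody_down)))

def dominant_freqs(x, win_len=512, hop=256):
    """Strongest frequency (Hz) in each Hann-windowed frame."""
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    peak_bins = np.abs(np.fft.rfft(frames, axis=1)).argmax(axis=1)
    return peak_bins * fs / win_len

up_track = dominant_freqs(melody_up)
down_track = dominant_freqs(melody_down)
print("global spectra identical:", same_spectrum)
print("first/last frame, up:  ", up_track[0], up_track[-1])
print("first/last frame, down:", down_track[0], down_track[-1])
```

The per-frame peak frequencies are quantized to the bin spacing `fs / win_len`, so they land near, not exactly on, 440 Hz and 880 Hz.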
Question 3 True / False
Using a wider window in the STFT gives better frequency resolution but at the cost of poorer time localization.
T. True
F. False
Answer: True
This is the time-frequency tradeoff at the heart of the STFT. A wide window contains many oscillation cycles, so the Fourier transform can precisely determine the frequency of each component, and frequencies appear as sharp peaks. But a wide window also spans a long time interval, so events that happen at different times within that window are blurred together along the time axis. The uncertainty principle Δt · Δω ≥ 1/2, where Δt and Δω are the RMS widths of the window in time and in angular frequency, formalizes this: you cannot shrink both simultaneously. A Gaussian window achieves the minimum uncertainty product, but the tradeoff itself cannot be avoided.
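The frequency side of the tradeoff can be illustrated with two closely spaced tones. This is a sketch under assumed values: the 1 kHz sample rate, the 100 Hz and 110 Hz tones, the half-maximum peak criterion, and the helper `count_peaks` are hypothetical choices for the demonstration.

```python
import numpy as np

fs = 1000                                   # assumed sample rate in Hz
t = np.arange(2 * fs) / fs                  # 2 s of signal
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 110 * t)

def count_peaks(win_len):
    """Count spectral peaks above half the maximum in one Hann-windowed frame."""
    frame = x[:win_len] * np.hanning(win_len)
    # Zero-pad so the peak count reflects the window, not the FFT bin spacing.
    mag = np.abs(np.fft.rfft(frame, n=8192))
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    return int(np.sum(is_peak & (mag[1:-1] > 0.5 * mag.max())))

for win_len in (32, 512):
    print(f"{win_len:4d} samples ({1000 * win_len / fs:.0f} ms): "
          f"{count_peaks(win_len)} peak(s)")
```

The 32 ms window merges the two tones into a single broad peak, while the 512 ms window separates them, at the price of averaging over half a second of signal.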
Question 4 True / False
By carefully choosing the right window function (e.g., a Gaussian or Hann window), you can achieve arbitrarily sharp resolution in both time and frequency in an STFT.
T. True
F. False
Answer: False
No window function can overcome the time-frequency uncertainty principle. Different windows make different tradeoffs (Hann windows reduce spectral leakage, Gaussian windows achieve the minimum time-bandwidth product), but none can provide arbitrarily sharp resolution in both dimensions simultaneously. The Gaussian window attains the theoretical bound Δt · Δω = 1/2, but this is the minimum possible product, not zero. Window choice determines the shape and sidelobe structure of the resolution cells, but the fundamental constraint Δt · Δω ≥ 1/2 is inescapable for any linear time-frequency representation.
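The bound can be checked numerically for specific windows. The sketch below assumes continuous-time windows sampled on a fine grid, a unit-width Gaussian, and a Hann window of total length 1; the helper `uncertainty_product` is a hypothetical name. It uses the fact that, for a real symmetric window, the spread in ω equals the energy of the window's derivative divided by the energy of the window.

```python
import numpy as np

def uncertainty_product(g, t):
    """Return delta_t * delta_omega for a real, symmetric window g(t)."""
    dt = t[1] - t[0]
    energy = np.sum(g ** 2) * dt
    delta_t = np.sqrt(np.sum(t ** 2 * g ** 2) * dt / energy)
    # Parseval: the second moment of |G(omega)|^2 equals the energy of dg/dt.
    dg = np.gradient(g, dt)
    delta_omega = np.sqrt(np.sum(dg ** 2) * dt / energy)
    return delta_t * delta_omega

t = np.linspace(-8.0, 8.0, 160001)
gaussian = np.exp(-t ** 2 / 2)
hann = np.where(np.abs(t) <= 0.5, np.cos(np.pi * t) ** 2, 0.0)

gauss_product = uncertainty_product(gaussian, t)
hann_product = uncertainty_product(hann, t)
print(f"Gaussian: {gauss_product:.4f}")
print(f"Hann:     {hann_product:.4f}")
```

The Gaussian lands on the bound of 1/2 to within numerical error, and the Hann window comes out slightly above it, at roughly 0.51. Rescaling either window in time changes Δt and Δω in opposite directions and leaves the product unchanged.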
Question 5 Short Answer
Explain why the fixed window width of the STFT is a limitation for analyzing signals like speech, and how the wavelet transform addresses this limitation.
Model answer: The STFT uses the same window width for all frequencies. This means low-frequency components (which oscillate slowly) and high-frequency components (which oscillate rapidly) are both analyzed with the same time-frequency resolution tradeoff. For speech, this is suboptimal: low-frequency vowel formants change slowly and need good frequency resolution; high-frequency consonant bursts are brief and need good time resolution. The wavelet transform uses a window that automatically scales with frequency — narrow windows at high frequencies for good time resolution, wide windows at low frequencies for good frequency resolution. This provides constant relative (rather than absolute) resolution, matching the analysis to the signal's natural structure.
The wavelet's adaptive window scaling is its key advantage over the STFT. Formally, wavelets are obtained by scaling and translating a single mother wavelet function, so that at high frequencies the analysis window automatically shrinks and at low frequencies it widens. This gives the wavelet transform a 'logarithmic' time-frequency tiling, compared to the STFT's uniform rectangular tiling. For signals like ECG, speech, and music — where low-frequency trends evolve slowly and high-frequency transients are brief — wavelets provide a more efficient and informative representation. Understanding the STFT's fixed-resolution limitation is precisely what motivates the wavelet as a more flexible successor.
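The scaling behaviour can be sketched with a complex Morlet wavelet, one common choice of mother wavelet. The six-cycle envelope, the 16 kHz sample rate, the test frequencies, and the helpers `morlet` and `time_spread` are assumptions for illustration. The example measures the time spread of the wavelet at several center frequencies and derives the matching frequency spread from the Gaussian relation Δt · Δω = 1/2.

```python
import numpy as np

def morlet(freq_hz, fs, n_cycles=6.0):
    """Complex Morlet wavelet: a Gaussian envelope holding about n_cycles oscillations."""
    sigma_t = n_cycles / (2 * np.pi * freq_hz)       # envelope width shrinks as 1/f
    t = np.arange(-4 * sigma_t, 4 * sigma_t, 1 / fs)
    return t, np.exp(-t ** 2 / (2 * sigma_t ** 2)) * np.exp(2j * np.pi * freq_hz * t)

def time_spread(t, w):
    """RMS duration of |w|^2 in seconds."""
    p = np.abs(w) ** 2
    return np.sqrt(np.sum(t ** 2 * p) / np.sum(p))

fs = 16000                      # assumed sample rate in Hz
freqs = (100, 400, 1600)
spreads, ratios = [], []
for f in freqs:
    t, w = morlet(f, fs)
    dt = time_spread(t, w)
    df = 1 / (4 * np.pi * dt)   # Gaussian envelope: dt * df = 1 / (4 * pi)
    spreads.append(dt)
    ratios.append(f / df)
    print(f"{f:5d} Hz: dt = {1000 * dt:6.3f} ms, df = {df:7.2f} Hz, f/df = {f / df:.2f}")
```

The time spread shrinks in proportion to 1/f while the ratio f/Δf stays the same at every center frequency, which is the constant relative resolution described above. An STFT with one fixed window would instead show the same Δt and Δf on every row.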