What are MFCCs (Mel-Frequency Cepstral Coefficients) and what audio property do they primarily capture?
A. MFCCs measure tempo and rhythm patterns in audio
B. MFCCs capture the spectral envelope (timbre) of audio using a perceptually-scaled frequency transformation, making them effective for characterizing instrument and vocal sounds
C. MFCCs directly encode pitch and note information for transcription
D. MFCCs measure the loudness and dynamic range of audio
Answer: B
MFCCs compress the spectral shape of audio (its timbre) into a compact representation: the spectrum is pooled into bands spaced on the perceptual Mel scale, the band energies are log-compressed, and a cepstral transform (typically a discrete cosine transform) keeps only the first few coefficients. This makes them effective for distinguishing instruments, voices, and musical styles, while largely discarding pitch information.
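The Mel scaling and cepstral step can be sketched in a few lines of plain Python. This is a minimal illustration, not a full MFCC implementation: it assumes the HTK-style Mel formula (one of several conventions), starts from made-up band energies rather than real audio, and the helper names (`hz_to_mel`, `mel_band_edges`, `mfcc_from_band_energies`) are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style Mel formula; other conventions exist (assumption of this sketch).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_min, f_max):
    # n_bands triangular filters need n_bands + 2 edge frequencies,
    # equally spaced on the Mel axis (so unequally spaced in Hz).
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

def dct_ii(values, n_coeffs):
    # Unnormalized DCT-II: the "cepstral" transform over log band energies.
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]

def mfcc_from_band_energies(band_energies, n_coeffs=13):
    # Log-compress the Mel band energies, then keep the first DCT coefficients.
    log_energies = [math.log(e + 1e-10) for e in band_energies]
    return dct_ii(log_energies, n_coeffs)

edges = mel_band_edges(n_bands=20, f_min=0.0, f_max=8000.0)

# Made-up band energies for one frame, and the same frame four times louder.
quiet = [1.0 + 0.5 * math.sin(i) for i in range(20)]
loud = [4.0 * e for e in quiet]
c_quiet = mfcc_from_band_energies(quiet)
c_loud = mfcc_from_band_energies(loud)
```

In this sketch the band edges grow wider in Hz as frequency rises, and scaling every band energy by the same factor changes only coefficient 0. The remaining coefficients describe the shape of the envelope rather than its overall level.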
Question 2 True / False
True or false: A chromagram preserves the octave information of detected pitches.
T. True
F. False
Answer: False
A chromagram folds all detected pitches into 12 pitch classes (C, C#, D, ..., B), ignoring octave. Middle C (C4), the C an octave above it (C5), and a low C such as C2 all contribute to the same 'C' bin. This makes chromagrams useful for chord and key analysis but removes octave register information.
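The octave folding can be illustrated with a short sketch. It assumes equal temperament with A4 = 440 Hz and MIDI note numbering (A4 = 69). The function names and the list of detected peaks are hypothetical, and a real chromagram would be computed from a spectrogram rather than from a handful of frequencies.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_pitch_class(freq_hz, a4_hz=440.0):
    # Nearest MIDI note number (A4 = 69), then drop the octave with modulo 12.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12

def fold_to_chroma(peaks):
    # peaks: iterable of (frequency in Hz, energy) pairs.
    chroma = [0.0] * 12
    for freq_hz, energy in peaks:
        chroma[freq_to_pitch_class(freq_hz)] += energy
    return chroma

# C2, C4 (middle C), C5, and A4 with made-up energies.
peaks = [(65.41, 1.0), (261.63, 2.0), (523.25, 0.5), (440.0, 1.5)]
chroma = fold_to_chroma(peaks)
```

All three C's land in bin 0 regardless of octave, so the resulting vector can say that C is present but not which C was played.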
Question 3 Short Answer
What is Dynamic Time Warping (DTW) used for in MIR?
Think about your answer, then reveal below.
Model answer: DTW finds the optimal alignment between two time series of different lengths or tempos by allowing elastic warping of the time axis. In MIR, it aligns audio recordings of the same piece performed at different tempos, enabling cover song detection, score-to-audio alignment, and performance comparison.
Frame-by-frame comparison of two audio sequences fails when they have different tempos, because corresponding musical events fall at different time indices. DTW finds the minimum-cost path through a pairwise distance matrix between the two sequences, warping time to find the best correspondence.
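A minimal sketch of the minimum-cost path search, using the classic dynamic-programming recurrence on one-dimensional sequences. It assumes absolute difference as the local distance and the basic step pattern (match, insertion, deletion). Real MIR systems typically compare feature vectors such as chroma frames and often add path constraints, which are omitted here.

```python
def dtw(a, b):
    # cost[i][j] = minimum accumulated cost of aligning a[:i] with b[:j].
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover the warping path as index pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path

# The same "melody" at normal speed and at half speed (each value held twice).
fast = [1, 2, 3, 4]
slow = [1, 1, 2, 2, 3, 3, 4, 4]
distance, path = dtw(fast, slow)
```

Here the two sequences have different lengths, so an index-by-index comparison is not even defined, yet the warped alignment has zero cost because every element of `fast` is matched to both of its repeats in `slow`.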
Question 4 Multiple Choice
Why does the Constant-Q Transform (CQT) have advantages over the standard STFT for musical pitch analysis?
A. The CQT is faster to compute than the FFT
B. The CQT uses logarithmically-spaced frequency bins matching musical pitch spacing, so each octave spans the same number of bins, aligning with how musicians and listeners perceive pitch
C. The CQT provides better time resolution for fast transients
D. The CQT does not require windowing, reducing spectral leakage
Answer: B
The STFT has linear frequency spacing, so low octaves get few bins while high octaves get many. The CQT's logarithmic spacing gives every octave the same number of bins (for example, 12 for one bin per semitone), matching musical note relationships and making chord, key, and melody analysis more natural.
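The difference in bin spacing can be checked by counting bin center frequencies per octave. This sketch only computes center frequencies and does not perform either transform. The sample rate (22050 Hz), FFT size (2048), starting note (C2 at about 65.41 Hz), and 12 bins per octave are illustrative assumptions.

```python
def stft_bin_freqs(sample_rate, n_fft):
    # Linearly spaced bin centers from 0 Hz up to the Nyquist frequency.
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]

def cqt_bin_freqs(f_min, n_bins, bins_per_octave=12):
    # Geometrically spaced bin centers: each bin is a fixed ratio above the last.
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]

def count_in_band(freqs, lo, hi):
    # Number of bin centers with lo <= f < hi.
    return sum(1 for f in freqs if lo <= f < hi)

f_min = 65.41  # roughly C2
stft = stft_bin_freqs(sample_rate=22050, n_fft=2048)
cqt = cqt_bin_freqs(f_min, n_bins=48)

# Bins per octave for the four octaves starting at C2, C3, C4, and C5.
stft_counts = [count_in_band(stft, f_min * 2 ** o, f_min * 2 ** (o + 1)) for o in range(4)]
cqt_counts = [count_in_band(cqt, f_min * 2 ** o, f_min * 2 ** (o + 1)) for o in range(4)]
```

Under these assumptions the STFT count roughly doubles with each octave, so the lowest octave has too few bins to separate adjacent semitones, while the CQT count stays at 12 throughout.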