Questions: Information-Theoretic Lower Bounds

4 questions to test your understanding

Score: 0 / 4
Question 1 Multiple Choice

An information-theoretic lower bound proves that estimating a d-dimensional parameter to epsilon accuracy requires at least Omega(d/epsilon^2) samples. A colleague proposes an algorithm that uses only O(sqrt(d)/epsilon^2) samples. What can you conclude?

AThe colleague's algorithm is incorrect — it must have a bug, because no algorithm can beat the lower bound
BThe lower bound may not apply to the colleague's specific problem setting — lower bounds hold for worst-case instances within the stated class, and the colleague's algorithm may exploit additional structure not present in the lower-bound construction
CLower bounds are only proved for Bayesian methods, so the colleague's frequentist algorithm can beat them
DThe lower bound is likely wrong because information-theoretic bounds are known to be loose
Question 2 Multiple Choice

Fano's inequality states that the probability of error in identifying a hypothesis from data is at least 1 - (I(X; Y) + 1) / log(M), where M is the number of hypotheses and I(X; Y) is the mutual information between the data and the hypothesis. Why is mutual information the right quantity here?

AMutual information measures computational complexity, which limits the algorithm's ability to process information
BMutual information quantifies the total amount of statistical information the data provides about which hypothesis is true — if the data carries little information about the identity of the hypothesis, no algorithm can reliably identify it
CMutual information is the only quantity that can be computed in closed form for Gaussian distributions
DMutual information measures the noise level in the data, and higher noise means harder learning
Question 3 True / False

Information-theoretic lower bounds apply only to specific algorithms (like ERM or gradient descent), not to all possible learning methods.

TTrue
FFalse
Question 4 Short Answer

Explain Le Cam's method for proving lower bounds and why it reduces the learning problem to a hypothesis testing problem.

Think about your answer, then reveal below.