Questions: Mutual Information

4 questions to test your understanding

Question 1 Multiple Choice

A machine learning engineer uses mutual information to select features for a classifier. Why might mutual information be preferred over Pearson correlation for feature selection?

A. Mutual information is faster to compute than correlation
B. Mutual information detects any statistical dependence (including nonlinear), while Pearson correlation only measures linear association
C. Mutual information accounts for the causal direction between features and the target
D. Pearson correlation is undefined for discrete variables
Question 2 Multiple Choice

I(X;Y) = H(X) + H(Y) - H(X,Y). If X and Y are independent, I(X;Y) = 0. If Y is a deterministic function of X, what is I(X;Y)?

A. I(X;Y) = 0 because deterministic relationships contain no randomness
B. I(X;Y) = H(X) + H(Y)
C. I(X;Y) = H(Y), because knowing X completely determines Y, so H(Y|X) = 0
D. I(X;Y) = infinity because the dependence is perfect
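
The identity in Question 2 can be evaluated numerically for a small discrete joint distribution. The following is a minimal sketch, not part of the original quiz: the helper names `entropy` and `mutual_information` and the example table are illustrative assumptions, and entropies are measured in bits. It reproduces only the independent case already stated in the question.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]            # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]      # marginal distribution of Y
    pxy = [p for row in joint for p in row]     # flattened joint distribution
    return entropy(px) + entropy(py) - entropy(pxy)

# Hypothetical example: two independent fair bits, so p(x, y) = p(x) p(y).
independent = [[0.25, 0.25],
               [0.25, 0.25]]
mi_independent = mutual_information(independent)
print(mi_independent)  # 1 + 1 - 2 bits
```

To check an answer to Question 2, one could pass a joint table in which every row has a single nonzero entry (Y determined by X) and compare the result with `entropy` of Y's marginal.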
Question 3 True / False

Mutual information is symmetric: I(X;Y) = I(Y;X). This means that if knowing X reduces your uncertainty about Y by 2 bits, then knowing Y also reduces your uncertainty about X by 2 bits.

T. True
F. False
Question 4 Short Answer

Explain the Venn diagram interpretation of mutual information and how it relates H(X), H(Y), H(X,Y), H(X|Y), and H(Y|X).
