Questions — Mutual Information — Open Knowledge Graph

Question 1 Multiple Choice

A machine learning engineer uses mutual information to select features for a classifier. Why might mutual information be preferred over Pearson correlation for feature selection?

A. Mutual information is faster to compute than correlation

B. Mutual information detects any statistical dependence (including nonlinear), while Pearson correlation only measures linear association

C. Mutual information accounts for the causal direction between features and the target

D. Pearson correlation is undefined for discrete variables

Question 2 Multiple Choice

Mutual information can be written as I(X;Y) = H(X) + H(Y) - H(X,Y). If X and Y are independent, then I(X;Y) = 0. If Y is a deterministic function of X, what is I(X;Y)?

A. I(X;Y) = 0 because deterministic relationships contain no randomness

B. I(X;Y) = H(X) + H(Y)

C. I(X;Y) = H(Y), because knowing X completely determines Y, so H(Y|X) = 0

D. I(X;Y) = infinity because the dependence is perfect
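
To check an answer numerically, the identity in the question can be evaluated directly from a joint probability table. The following is a minimal sketch, not part of the original quiz: the function names `entropy` and `mutual_information` and the example table are illustrative assumptions, and entropies are measured in bits. It reproduces only the independent case already stated in the question.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    pxy = [p for row in joint for p in row]   # flattened joint distribution
    return entropy(px) + entropy(py) - entropy(pxy)

# Hypothetical example: two independent fair bits, so each cell is 0.5 * 0.5.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

print(mutual_information(independent))  # approximately 0 for independent variables
```

Replacing `independent` with a table in which each row has a single nonzero entry models a Y that is a deterministic function of X.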

Question 3 True / False

Mutual information is symmetric: I(X;Y) = I(Y;X). This means that if knowing X reduces your uncertainty about Y by 2 bits, then knowing Y also reduces your uncertainty about X by 2 bits.

T. True

F. False

Question 4 Short Answer

Explain the Venn diagram interpretation of mutual information and how it relates H(X), H(Y), H(X,Y), H(X|Y), and H(Y|X).

