Questions — Semi-Supervised Learning

Question 1 Multiple Choice

A machine learning team has 200 labeled examples and 200,000 unlabeled examples. They apply a semi-supervised method and find it performs worse than a supervised model trained only on the 200 labeled examples. What is the most likely explanation?

A. 200,000 unlabeled examples is too many; semi-supervised methods work best with a 1:10 labeled-to-unlabeled ratio

B. The cluster assumption does not hold: class boundaries pass through dense regions of the feature space, so the unlabeled data misleads the model

C. Semi-supervised learning requires at least 1,000 labeled examples to function properly

D. The model architecture was too simple to exploit the structure of the unlabeled data

Question 2 Multiple Choice

In self-training (pseudo-labeling), a model assigns confident predictions to unlabeled examples and adds them to the training set. What is the primary risk of this approach?

A. The model will label too few examples, failing to benefit from the unlabeled data

B. Confident but incorrect pseudo-labels compound through subsequent retraining iterations, amplifying early errors

C. The approach violates the i.i.d. assumption because pseudo-labels are correlated with the original predictions

D. The model will overfit the labeled data because pseudo-labels lack the diversity of real annotations
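
For reference, the self-training loop described in Question 2 can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the nearest-centroid "model", the softmax-over-distances confidence score, and the names `self_train`, `threshold`, and `max_rounds` are all assumptions chosen to keep the example dependency-free.

```python
import math

def centroids(points, labels):
    """Fit a toy nearest-centroid model: the mean point of each class."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}

def predict_with_confidence(model, point):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(point, m)) for c, m in model.items()}
    total = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, weights[label] / total

def self_train(labeled, labels, unlabeled, threshold=0.9, max_rounds=5):
    """Repeatedly pseudo-label confident unlabeled points and refit the model."""
    points, targets = list(labeled), list(labels)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(points, targets)
        confident, remaining = [], []
        for p in pool:
            label, conf = predict_with_confidence(model, p)
            if conf >= threshold:
                confident.append((p, label))
            else:
                remaining.append(p)
        if not confident:
            break  # nothing cleared the threshold, so stop early
        for p, label in confident:
            points.append(p)       # the pseudo-labeled point joins the training set
            targets.append(label)  # and is treated like a real label from now on
        pool = remaining
    return centroids(points, targets), len(points) - len(labeled)

model, n_added = self_train(
    labeled=[(0.0, 0.0), (10.0, 10.0)],
    labels=[0, 1],
    unlabeled=[(0.5, 0.5), (9.5, 9.5), (5.0, 5.0)],
)
```

In this toy run the two points near the labeled examples clear the threshold and are added, while the point halfway between the classes stays unlabeled.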

Question 3 True / False

Semi-supervised methods like FixMatch rely on the principle that a model's prediction should be consistent across different augmented views of the same unlabeled example, which pushes decision boundaries away from dense data regions.

T. True

F. False
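
As a point of reference for Question 3, the unlabeled-data term of a FixMatch-style objective can be sketched as below. This is a simplified illustration under stated assumptions: the inputs are taken to be already-computed class-probability lists for a weakly and a strongly augmented view of each unlabeled example, and the function name `fixmatch_unlabeled_loss` and its arguments are hypothetical. The augmentations and the model itself are omitted.

```python
import math

def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Masked cross-entropy between weak-view pseudo-labels and strong-view predictions.

    weak_probs / strong_probs: one probability list per unlabeled example,
    from the weakly and strongly augmented views respectively.
    """
    if not weak_probs:
        return 0.0
    total = 0.0
    for weak, strong in zip(weak_probs, strong_probs):
        confidence = max(weak)
        if confidence < threshold:
            continue  # low-confidence examples contribute nothing to the loss
        pseudo_label = weak.index(confidence)  # hard label from the weak view
        # Cross-entropy of the strong-view prediction against the hard pseudo-label;
        # the floor avoids log(0).
        total += -math.log(max(strong[pseudo_label], 1e-12))
    # Average over the whole unlabeled batch, including masked-out examples.
    return total / len(weak_probs)

loss = fixmatch_unlabeled_loss(
    weak_probs=[[0.97, 0.03], [0.60, 0.40]],
    strong_probs=[[0.50, 0.50], [0.10, 0.90]],
)
```

Only the first example passes the confidence threshold here, so the loss reflects how far its strong-view prediction is from the weak-view pseudo-label.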

Question 4 True / False

Adding more unlabeled data to a semi-supervised learning system will typically improve or at least not harm model performance compared to supervised learning on the labeled set alone.

T. True

F. False

Question 5 Short Answer

What is the cluster assumption in semi-supervised learning, and why does whether it holds determine whether SSL helps or hurts?

