Questions — Word Embeddings and Representations

Question 1 (Multiple Choice)

The Word2Vec Skip-gram model learns word embeddings by:

A. Counting how often each pair of words co-occurs across the entire corpus, then factorizing the resulting matrix

B. Training a shallow neural network to predict surrounding context words given a center word

C. Assigning random dense vectors and iteratively adjusting them based on word frequency rankings

D. Encoding each word as a weighted sum of the vectors of its definition words
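Here is a minimal sketch of how Skip-gram training examples are formed, assuming a symmetric context window. Each center word is paired with every word within `window` positions of it, and the network is then trained to predict the second element of each pair from the first. The helper name `skipgram_pairs` and the example sentence are hypothetical. The original Word2Vec implementation also applies refinements such as subsampling of frequent words, which are omitted here.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: every word within `window`
    positions of the center word counts as one of its contexts.
    Hypothetical helper for illustration only."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                # clamp window at sentence start
        hi = min(len(tokens), i + window + 1)  # clamp window at sentence end
        for j in range(lo, hi):
            if j != i:                         # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs[:4])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

The model never receives a label saying what "cat" means. Its only supervision is which words appear near which.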

Question 2 (Multiple Choice)

A well-trained embedding model produces the result: vec('Paris') − vec('France') + vec('Germany') ≈ vec('Berlin'). This works because:

A. The model memorized that Paris and Berlin are both capital cities from explicit labels in the training data

B. Cities that frequently appear together in the same sentence end up geometrically close in the embedding space

C. The embedding space encodes the 'capital city of' relationship as a consistent geometric direction, so the offset vec('Paris') − vec('France') can be added to vec('Germany') to land near vec('Berlin')

D. GloVe's co-occurrence matrix directly encodes country-capital pairs as high co-occurrence counts
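The analogy arithmetic can be sketched with hand-picked toy vectors. Purely for illustration, one coordinate is assumed to carry the 'capital city of' offset. These are not learned embeddings. Real models only approximate such a direction, which is why the answer is found by a nearest-neighbor search under cosine similarity rather than by exact equality. Excluding the three query words from the candidates is a common convention in analogy evaluation. The names `vecs`, `cosine`, and `analogy`, and the distractor word 'Munich', are hypothetical.

```python
import math

# Hand-picked toy vectors (NOT learned embeddings). The third coordinate
# plays the role of a shared "capital city of" offset.
vecs = {
    "France":  [1.0, 0.0, 0.0],
    "Germany": [0.0, 1.0, 0.0],
    "Paris":   [1.0, 0.0, 1.0],
    "Berlin":  [0.0, 1.0, 1.0],
    "Munich":  [0.0, 1.0, 0.2],  # German city without the capital offset
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vecs):
    """Solve a - b + c ~ ?, returning the nearest remaining word by cosine
    similarity (the three query words are excluded from the candidates)."""
    target = [x - y + z for x, y, z in zip(vecs[a], vecs[b], vecs[c])]
    candidates = [w for w in vecs if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vecs[w], target))


print(analogy("Paris", "France", "Germany", vecs))  # Berlin
```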

Question 3 (True / False)

In one-hot encoding, the vectors for 'cat' and 'kitten' are geometrically closer to each other than to 'airplane,' because cats and kittens are semantically related.

True

False
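Here is a quick numerical check using a hypothetical three-word vocabulary. One-hot vectors place each word on its own axis. As a result, every pair of distinct words is separated by the same Euclidean distance, √2, and has a cosine similarity of 0, regardless of meaning.

```python
import math

vocab = ["cat", "kitten", "airplane"]  # hypothetical toy vocabulary


def one_hot(word, vocab):
    """Return a vector with 1.0 at the word's index and 0.0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocab]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


cat, kitten, airplane = (one_hot(w, vocab) for w in vocab)
print(euclidean(cat, kitten), euclidean(cat, airplane))  # both equal sqrt(2)
print(cosine(cat, kitten), cosine(cat, airplane))        # 0.0 0.0
```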

Question 4 (True / False)

The distributional hypothesis, the theoretical foundation of word embeddings, holds that words appearing in similar contexts tend to have similar meanings.

True

False
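The hypothesis can be made concrete by representing each word by counts of the words it co-occurs with. In this sketch, the four-sentence corpus, the sentence-wide context window, and the helper names are all hypothetical choices for illustration. Because 'cat' and 'kitten' share contexts such as 'drinks' and 'milk', their count vectors point in more similar directions than those of 'cat' and 'airplane'.

```python
import math
from collections import Counter

corpus = [
    "the cat drinks milk",
    "the kitten drinks milk",
    "the cat chases a mouse",
    "the airplane needs fuel",
]


def context_vectors(sentences):
    """Map each word to a Counter of the other words in its sentences
    (a sentence-wide context window, chosen here for simplicity)."""
    ctx = {}
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            counts = ctx.setdefault(w, Counter())
            counts.update(words[:i] + words[i + 1:])  # every other word in the sentence
    return ctx


def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1)  # Counter returns 0 for missing keys
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2)


ctx = context_vectors(corpus)
print(round(cosine(ctx["cat"], ctx["kitten"]), 3))    # 0.77
print(round(cosine(ctx["cat"], ctx["airplane"]), 3))  # 0.385
```

Raw counts are dominated by frequent words such as 'the'. Count-based methods therefore typically reweight them, for example with positive pointwise mutual information, before comparing vectors.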

Question 5 (Short Answer)

Why does Word2Vec learn semantically meaningful word representations even though it is trained on the seemingly simple task of predicting context words, with no explicit semantic labels?

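For reference, consider the negative-sampling objective, which is one of the training objectives used with Word2Vec's Skip-gram model. It can be written as follows for a single (center, context) pair. Here w is the center word with input vector v_w, and c is an observed context word with output vector u_c. The k negative words n_1, …, n_k are drawn from a noise distribution P_n, and σ(x) = 1 / (1 + e^{−x}). The symbol names are chosen here for illustration. The objective J is maximized, and its gradient with respect to v_w follows from d/dx log σ(x) = 1 − σ(x):

```latex
J(w, c) = \log \sigma\!\left(u_c^{\top} v_w\right)
        + \sum_{i=1}^{k} \log \sigma\!\left(-u_{n_i}^{\top} v_w\right),
        \qquad n_i \sim P_n

\frac{\partial J}{\partial v_w}
  = \left(1 - \sigma\!\left(u_c^{\top} v_w\right)\right) u_c
  - \sum_{i=1}^{k} \sigma\!\left(u_{n_i}^{\top} v_w\right) u_{n_i}
```

Each update pulls v_w toward the output vectors of its observed contexts and away from those of the sampled negatives.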
