Questions: Word Embeddings and Representations

5 questions to test your understanding

Question 1 Multiple Choice

The Word2Vec Skip-gram model learns word embeddings by:

A. Counting how often each pair of words co-occurs across the entire corpus, then factorizing the resulting matrix
B. Training a shallow neural network to predict surrounding context words given a center word
C. Assigning random dense vectors and iteratively adjusting them based on word frequency rankings
D. Encoding each word as a weighted sum of the vectors of its definition words
Question 2 Multiple Choice

A well-trained embedding model produces the result: vec('Paris') − vec('France') + vec('Germany') ≈ vec('Berlin'). This works because:

A. The model memorized that Paris and Berlin are both capital cities from explicit labels in the training data
B. Cities that frequently appear together in the same sentence end up geometrically close in the embedding space
C. The embedding space encodes the 'capital city of' relationship as a consistent geometric direction, so subtracting and adding that direction navigates the analogy
D. GloVe's co-occurrence matrix directly encodes country-capital pairs as high co-occurrence counts
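
For reference, the arithmetic in the question stem is usually evaluated as a nearest-neighbor search: build the query vector, then pick the vocabulary word with the highest cosine similarity to it, excluding the three input words. The sketch below is only an illustration. The 3-dimensional `toy_vectors`, the word "banana", and the helper names `cosine` and `analogy` are hypothetical and hand-picked so the example resolves cleanly; real embeddings are learned from text and typically have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors, hand-picked for illustration only (not learned).
toy_vectors = {
    "Paris":   [0.9, 0.1, 0.8],
    "France":  [0.9, 0.1, 0.1],
    "Berlin":  [0.1, 0.9, 0.8],
    "Germany": [0.1, 0.9, 0.1],
    "banana":  [0.5, 0.5, -0.6],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(query, vectors[w]))

result = analogy("Paris", "France", "Germany", toy_vectors)
print(result)
```

Excluding the input words from the candidates is a common convention in analogy evaluation; without it, the nearest neighbor of the query is often one of the inputs themselves.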
Question 3 True / False

In one-hot encoding, the vectors for 'cat' and 'kitten' are geometrically closer to each other than to 'airplane,' because cats and kittens are semantically related.

T. True
F. False
Question 4 True / False

The distributional hypothesis — the theoretical foundation of word embeddings — holds that words appearing in similar contexts tend to have similar meanings.

T. True
F. False
Question 5 Short Answer

Why does Word2Vec learn semantically meaningful word representations even though it is trained on the seemingly simple task of predicting context words, with no explicit semantic labels?
