Questions — Topic Modeling and Latent Dirichlet Allocation

Question 1 Multiple Choice

In LDA, a 'topic' is best described as which of the following?

A. A human-assigned label (like 'politics' or 'sports') provided during training to guide word grouping

B. A probability distribution over vocabulary words, where words that co-occur frequently receive high probability

C. A cluster of documents that discuss the same subject, identified by their TF-IDF vectors

D. A latent embedding of document meaning in continuous vector space, similar to word2vec

Question 2 Multiple Choice

You train an LDA model with k=5 topics on a corpus of academic papers. After examining the top words per topic, topics 3 and 4 appear to cover very similar themes and overlap heavily. What does this suggest?

A. The model has converged incorrectly and needs to be retrained with better initialization

B. The number of topics k may be too high for this corpus, causing coherent themes to be split across multiple topics

C. This always happens with LDA because it cannot separate similar topics without labeled data

D. Topic overlap means the Gibbs sampler did not run long enough and needs more iterations
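One way to check whether two topics really overlap, rather than judging from the top-word lists by eye, is to compare their word distributions numerically. The sketch below assumes a fitted topic-word matrix `phi` of shape (k, V) whose rows each sum to 1. The helper names (`jensen_shannon`, `top_word_overlap`, `most_similar_pair`, `peaked`) are hypothetical, the toy matrix is made-up data, and Jensen-Shannon divergence is just one common choice of distance among several.

```python
import numpy as np


def jensen_shannon(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        # Words with zero probability under `a` contribute nothing to KL(a || b).
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def top_word_overlap(p: np.ndarray, q: np.ndarray, n: int = 10) -> float:
    """Jaccard overlap between the n highest-probability word indices of two topics."""
    top_p = set(np.argsort(p)[::-1][:n].tolist())
    top_q = set(np.argsort(q)[::-1][:n].tolist())
    return len(top_p & top_q) / len(top_p | top_q)


def most_similar_pair(phi: np.ndarray) -> tuple[int, int, float]:
    """Return (i, j, divergence) for the pair of topic rows with the smallest JS divergence."""
    k = phi.shape[0]
    best = (0, 1, float("inf"))
    for i in range(k):
        for j in range(i + 1, k):
            d = jensen_shannon(phi[i], phi[j])
            if d < best[2]:
                best = (i, j, d)
    return best


def peaked(vocab_size: int, heavy: list[int], weight: float = 10.0) -> np.ndarray:
    """Hypothetical helper: a toy topic that puts extra mass on the `heavy` word indices."""
    v = np.ones(vocab_size)
    v[heavy] += weight
    return v / v.sum()


# Toy 5-topic, 8-word matrix (made-up data): rows 2 and 3 (topics 3 and 4 when
# counting from 1) share most of their heavy words.
phi = np.vstack([
    peaked(8, [0, 1]),
    peaked(8, [2, 3]),
    peaked(8, [4, 5]),
    peaked(8, [4, 5, 6]),
    peaked(8, [6, 7]),
])

i, j, d = most_similar_pair(phi)
print(f"closest pair: topics {i + 1} and {j + 1}, JS divergence = {d:.3f}")
print(f"top-3 word overlap: {top_word_overlap(phi[i], phi[j], n=3):.2f}")
```

With this toy matrix the closest pair is topics 3 and 4. On a real model, `phi` could be taken from the row-normalized `components_` of scikit-learn's `LatentDirichletAllocation`; the numbers only quantify the overlap, and deciding what it implies about the model is still up to you.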

Question 3 True / False

LDA requires labeled training data (for example, document categories) in order to discover topics from a text corpus.

True

False

Question 4 True / False

In LDA, the number of topics k must be specified by the modeler before training, similar to how k must be chosen in K-Means clustering.

True

False

Question 5 Short Answer

Explain the 'dual representation' that LDA produces and why this enables more diverse applications than a model that only classifies documents into categories.

