Questions: Topic Modeling and Latent Dirichlet Allocation

5 questions to test your understanding

Question 1 Multiple Choice

In LDA, a 'topic' is best described as which of the following?

A. A human-assigned label (like 'politics' or 'sports') provided during training to guide word grouping
B. A probability distribution over vocabulary words, where words that co-occur frequently receive high probability
C. A cluster of documents that discuss the same subject, identified by their TF-IDF vectors
D. A latent embedding of document meaning in continuous vector space, similar to word2vec
Question 2 Multiple Choice

You train an LDA model with k=5 topics on a corpus of academic papers. After examining the top words per topic, topics 3 and 4 appear to cover very similar themes and overlap heavily. What does this suggest?

A. The model has converged incorrectly and needs to be retrained with better initialization
B. The number of topics k may be too high for this corpus, causing coherent themes to be split across multiple topics
C. This always happens with LDA because it cannot separate similar topics without labeled data
D. Topic overlap means the Gibbs sampler did not run long enough and needs more iterations
Question 3 True / False

LDA requires labeled training data (for example, document categories) in order to discover topics from a text corpus.

T. True
F. False
Question 4 True / False

In LDA, the number of topics k must be specified by the modeler before training, similar to how k must be chosen in K-Means clustering.

T. True
F. False
Question 5 Short Answer

Explain the 'dual representation' that LDA produces and why this enables more diverse applications than a model that only classifies documents into categories.
