5 questions to test your understanding
In LDA, a 'topic' is best described as which of the following?
You train an LDA model with k=5 topics on a corpus of academic papers. After examining the top words per topic, you find that topics 3 and 4 cover very similar themes, with heavily overlapping top words. What does this suggest?
True or false: LDA requires labeled training data (for example, document categories) in order to discover topics from a text corpus.
True or false: In LDA, the number of topics k must be specified by the modeler before training, just as k must be chosen in K-Means clustering.
Explain the 'dual representation' that LDA produces and why this enables more diverse applications than a model that only classifies documents into categories.