Question 1 Multiple Choice
A music streaming service uses content-based filtering. A new user has only ever rated heavy metal songs highly. What will the system most likely recommend next?
A. Globally popular songs, since the system lacks enough data to personalize recommendations
B. Other heavy metal and similar hard rock tracks, because the user profile's feature weights match those item features
C. A diverse mix of genres to prevent the user from getting bored
D. Songs that other users who liked metal also enjoyed, based on shared listening history
Answer: B
Content-based filtering recommends items whose feature vectors are most similar to the user's profile. The user profile is built by aggregating the feature vectors of previously rated items, so after rating only heavy metal songs, the profile has strong weights on metal-related features. The system scores candidate items by their similarity to this profile (cosine similarity is a common choice) and surfaces the closest matches: more metal and similar genres. Option D describes collaborative filtering, which uses other users' ratings rather than item features.
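The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production recommender: the four-feature layout, the catalog entries, and the helper names (`cosine`, `build_profile`, `recommend`) are all made up for the example, and the profile is a plain unweighted average of the liked items.

```python
import math

# Hypothetical feature order: [metal, hard_rock, pop, classical]
CATALOG = {
    "thrash_anthem":  [1.0, 0.6, 0.0, 0.0],
    "hard_rock_riff": [0.5, 1.0, 0.1, 0.0],
    "pop_single":     [0.0, 0.1, 1.0, 0.0],
    "string_quartet": [0.0, 0.0, 0.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_profile(liked_vectors):
    """Average the feature vectors of the items the user rated highly."""
    n = len(liked_vectors)
    return [sum(column) / n for column in zip(*liked_vectors)]

def recommend(profile, catalog):
    """Return item names ordered from most to least similar to the profile."""
    return sorted(catalog, key=lambda name: cosine(profile, catalog[name]), reverse=True)

# The user has only rated two metal tracks highly (illustrative vectors).
liked = [[1.0, 0.3, 0.0, 0.0], [0.9, 0.5, 0.0, 0.0]]
profile = build_profile(liked)
ranking = recommend(profile, CATALOG)
print(ranking)
```

With these made-up vectors the metal and hard rock tracks rank first and the string quartet ranks last, because it shares no non-zero feature with the profile.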
Question 2 Multiple Choice
A content-based filtering system handles new items that no users have ever rated much better than a collaborative filtering system. What is the core reason?
A. Content-based systems are computationally faster and can index new items instantly
B. Collaborative filtering requires ratings from multiple users to identify similar items; a new item with no ratings is invisible to it, while content-based filtering only needs item metadata
C. Collaborative filtering systems do not store item features, so new items cannot be compared
D. New items always have better metadata than older items, giving content-based systems an advantage
Answer: B
This is the item cold-start advantage of content-based filtering. Collaborative filtering predicts preferences by finding users or items with similar rating patterns, but a brand-new item has no ratings yet, so it cannot participate in similarity computations. Content-based filtering needs only item metadata (genre, keywords, director, etc.) to match the item against user profiles, so the item is recommendable immediately. The limitation is that rich metadata must be available.
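The contrast can be made concrete with a small Python sketch. Everything here is assumed for illustration: the item names, the two-feature layout, and the simplification that a rating of 0 means "not rated". The collaborative side is modeled as item-item cosine similarity over rating columns, which is only one of several collaborative approaches.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has no signal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Ratings from three users per item; 0 stands for "no rating" (a simplification).
ratings_by_item = {
    "old_metal_album": [5, 4, 0],
    "old_pop_album":   [1, 0, 5],
    "new_metal_album": [0, 0, 0],  # just released, nobody has rated it
}

# Item metadata as feature vectors, hypothetical order: [metal, pop]
features = {
    "old_metal_album": [1.0, 0.0],
    "old_pop_album":   [0.0, 1.0],
    "new_metal_album": [0.9, 0.1],
}

user_profile = [1.0, 0.0]  # a user whose history is all metal

# Collaborative view: the new item's rating column is empty, so there is no similarity signal.
cf_similarity = cosine(ratings_by_item["new_metal_album"], ratings_by_item["old_metal_album"])

# Content-based view: metadata alone is enough to score the new item.
cb_score = cosine(user_profile, features["new_metal_album"])

print(cf_similarity, round(cb_score, 3))
```

The rating-based similarity is zero for the unrated item, while the metadata-based score is high as soon as the item's features are known.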
Question 3 True / False
Over time, a content-based filtering system will naturally surface increasingly diverse content as the user's interaction history grows and the profile becomes richer.
T. True
F. False
Answer: False
This is the opposite of what happens. Content-based filtering has an inherent over-specialization problem: as the user rates more items of a certain type, the profile's weights become even more concentrated on those features, so the system recommends more of the same. A user who has only watched comedies is very unlikely to be shown a documentary, no matter how many comedies they rate, because a documentary shares few or no features with the profile. The system reinforces existing preferences rather than exploring new territory. This is the 'filter bubble' effect, and it is a major motivation for combining content-based methods with collaborative filtering in hybrid systems.
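A toy simulation shows the reinforcement loop. It is a sketch under strong assumptions: the catalog, the three-feature layout, and the rule that the user consumes and likes whatever is recommended are all invented for the example, and the profile is a simple mean of the seen items.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_profile(vectors):
    """Unweighted average of the feature vectors the user has interacted with."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical feature order: [comedy, romance, documentary]
catalog = {
    "comedy_1":      [1.0, 0.0, 0.0],
    "comedy_2":      [0.9, 0.1, 0.0],
    "comedy_3":      [0.8, 0.2, 0.0],
    "romcom_1":      [0.6, 0.4, 0.0],
    "documentary_1": [0.0, 0.0, 1.0],
    "documentary_2": [0.0, 0.1, 0.9],
}

seen = ["comedy_1"]  # the user's history starts with a single comedy
shown = []           # what the system recommends, round by round

for _ in range(3):
    profile = mean_profile([catalog[name] for name in seen])
    unseen = [name for name in catalog if name not in seen]
    best = max(unseen, key=lambda name: cosine(profile, catalog[name]))
    shown.append(best)
    seen.append(best)  # assumption: the user watches and likes the recommendation

print(shown)
```

In this toy catalog every recommendation is comedy-adjacent; the documentaries would only surface once nothing closer to the profile remained.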
Question 4 True / False
In content-based filtering, a user profile is constructed by aggregating the feature vectors of items that user has previously interacted with, weighted by engagement or rating signals.
T. True
F. False
Answer: True
This is the standard architecture: each item is represented as a feature vector (genre weights, keywords, actor presence, etc.), and the user profile is formed by accumulating and averaging (or weighted-summing) those vectors based on the user's interaction history. High-rated items contribute more heavily than low-rated or skipped items. Recommendation then becomes a nearest-neighbor search: find items whose feature vectors are most similar (by cosine similarity or dot product) to the user profile vector.
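The rating-weighted aggregation can be sketched as a weighted average followed by dot-product scoring. The function name `weighted_profile`, the two-feature layout, and the choice to use the raw rating as the weight are assumptions made for this example; real systems often use centered ratings or implicit engagement signals instead.

```python
def weighted_profile(interactions):
    """Weighted average of feature vectors.

    interactions: list of (feature_vector, weight) pairs, where the weight
    is a rating or engagement signal (here: the raw rating, an assumption).
    """
    total_weight = sum(weight for _, weight in interactions)
    if total_weight == 0:
        raise ValueError("need at least one interaction with non-zero weight")
    dim = len(interactions[0][0])
    return [
        sum(vector[i] * weight for vector, weight in interactions) / total_weight
        for i in range(dim)
    ]

def dot(a, b):
    """Dot-product score between a profile and an item vector."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical feature order: [action, comedy]
history = [
    ([1.0, 0.0], 5.0),  # action film rated 5
    ([0.0, 1.0], 1.0),  # comedy rated 1
]
profile = weighted_profile(history)

candidates = {
    "action_sequel": [0.9, 0.1],
    "sitcom_pilot":  [0.1, 0.9],
}
scores = {name: dot(profile, vector) for name, vector in candidates.items()}
best = max(scores, key=scores.get)
print(profile, best)
```

Because the action film carries five times the weight of the comedy, the profile leans toward the action feature and the action candidate scores highest.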
Question 5 Short Answer
What is the over-specialization problem in content-based filtering, and why does it arise structurally from the approach?
Model answer: Over-specialization (also called the filter bubble) occurs when the system only recommends items similar to what the user has already consumed, which makes it very hard to discover content in categories the user has never explored. It arises because content-based filtering scores items purely by feature similarity to the existing user profile: if the profile contains only comedy features, items with strong comedy features always score highest. The system has no mechanism to reward novelty or diversity; it optimizes for similarity to past behavior, which inherently reinforces existing tastes.
The key insight is that over-specialization is structural: it is not a bug to be fixed but a consequence of the design philosophy. The system is doing exactly what it was designed to do, which is to find items similar to what the user liked. The limitation becomes apparent only when you want something the system cannot provide by design: serendipitous discovery. This is why hybrid systems pair content-based filtering (good for cold starts and explainability) with collaborative filtering (good for introducing serendipity via user similarity).