Questions: Language Models and Neural Language Modeling

5 questions to test your understanding

Question 1 Multiple Choice

You want to build a text generation system — a model that produces fluent, multi-sentence responses from a prompt. Which training paradigm is best suited, and why?

A. BERT-style masked language modeling: reading context from both directions makes it more powerful
B. GPT-style autoregressive modeling: it generates tokens left to right, making it naturally suited for text generation
C. Either approach works equally well: the training task doesn't affect generation capability
D. Neither: you need a separate sequence-to-sequence architecture, not a language model
Question 2 Multiple Choice

A research team trains a large transformer on billions of web pages using next-token prediction, then trains it for three more epochs on 10,000 labeled customer-service dialogues. What best describes this workflow?

A. Supervised learning followed by unsupervised learning
B. Self-supervised pre-training followed by fine-tuning on task-specific data
C. Self-supervised learning only: the labeled dialogues are unnecessary given the scale of pre-training
D. Zero-shot learning: the model was never explicitly trained on the target task
Question 3 True / False

Autoregressive language models like GPT process the full sentence bidirectionally when predicting each token, using future context to inform earlier predictions.

True
False
Question 4 True / False

All of the capabilities large language models demonstrate — grammar, factual knowledge, reasoning patterns — emerge from the single training objective of predicting tokens in text.

True
False
Question 5 Short Answer

Why can language models trained only on next-token prediction learn to perform seemingly unrelated tasks like question answering, translation, or summarization?
