Questions — Language Models and Neural Language Modeling

Question 1 Multiple Choice

You want to build a text generation system — a model that produces fluent, multi-sentence responses from a prompt. Which training paradigm is best suited, and why?

A. BERT-style masked language modeling: reading context from both directions makes it more powerful

B. GPT-style autoregressive modeling: it generates tokens left to right, making it naturally suited for text generation

C. Either approach works equally well: the training task doesn't affect generation capability

D. Neither: you need a separate sequence-to-sequence architecture, not a language model

Question 2 Multiple Choice

A research team trains a large transformer on billions of web pages using next-token prediction, then trains it for three more epochs on 10,000 labeled customer-service dialogues. What best describes this workflow?

A. Supervised learning followed by unsupervised learning

B. Self-supervised pre-training followed by fine-tuning on task-specific data

C. Self-supervised learning only: the labeled dialogues are unnecessary given the scale of pre-training

D. Zero-shot learning: the model was never explicitly trained on the target task

Question 3 True / False

Autoregressive language models like GPT process the full sentence bidirectionally when predicting each token, using future context to inform earlier predictions.

True

False

Question 4 True / False

All of the capabilities large language models demonstrate — grammar, factual knowledge, reasoning patterns — emerge from the single training objective of predicting tokens in text.

True

False

Question 5 Short Answer

Why can language models trained only on next-token prediction learn to perform seemingly unrelated tasks like question answering, translation, or summarization?

