Questions: Neural Scaling Laws

4 questions to test your understanding

Question 1 (Multiple Choice)

Neural scaling laws reveal that test error decreases predictably with model size, data size, and compute. Which statement best captures the relationship?

A. Loss = O(N^{-alpha}), where N is any one of {model size, data size, compute} and alpha is a small exponent (roughly 0.05-0.1 in reported language-model fits)
B. Loss decreases linearly with model size but exponentially with data size
C. Performance is independent of data size; only model size and compute matter
D. Scaling laws apply only to transformer models, not other architectures
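To build intuition for how a power-law exponent of this size behaves, the sketch below computes the factor by which loss changes when the scaled quantity grows. It assumes the simplest form L = k * N^(-alpha) with no irreducible loss term; the function name `loss_ratio` and the exponent value 0.076 are illustrative assumptions, not values taken from this quiz.

```python
def loss_ratio(scale_factor: float, alpha: float) -> float:
    """Factor by which a pure power-law loss L = k * N**(-alpha)
    changes when N is multiplied by scale_factor."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    # The constant k cancels: L(s * N) / L(N) = s**(-alpha)
    return scale_factor ** (-alpha)


# Assumed, illustrative exponent of the order reported for language models.
ALPHA = 0.076

if __name__ == "__main__":
    for factor in (2, 10, 100):
        ratio = loss_ratio(factor, ALPHA)
        print(f"{factor}x larger N -> loss multiplied by {ratio:.3f}")
```

With an exponent this small, a tenfold increase in N lowers the loss by only about 16 percent, which is why gains are predictable but expensive.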
Question 2 (Short Answer)

Scaling laws suggest that, for a fixed compute budget, there is an optimal allocation between model size and training data size. What is the Chinchilla scaling rule?

Question 3 (Multiple Choice)

Do neural scaling laws have theoretical justification, or are they purely empirical?

A. Scaling laws are fully explained by statistical learning theory and can be derived analytically
B. Scaling laws are purely empirical observations with no theoretical grounding
C. Scaling laws are partially understood through connections to renormalization, critical phenomena, and information theory, but a complete theoretical explanation remains open
D. Scaling laws apply only to language models; other domains have different scaling behavior
Question 4 (True / False)

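A company has a fixed compute budget of 10^20 FLOPs. According to Chinchilla scaling, it maximizes performance by growing model size and training tokens in roughly equal proportion (about 20 tokens per parameter), rather than spending nearly all of the budget on a larger model.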

T. True
F. False
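The arithmetic behind a Chinchilla-style split of a 10^20 FLOP budget can be sketched as follows. The sketch rests on two assumptions that are not stated in this quiz: the common approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6 * N * D), and the rule of thumb of about 20 training tokens per parameter. The function name `chinchilla_allocation` is hypothetical.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0  # assumption: C ~ 6 * N * D
TOKENS_PER_PARAM = 20.0      # assumption: Chinchilla rule of thumb


def chinchilla_allocation(compute_flops: float,
                          tokens_per_param: float = TOKENS_PER_PARAM):
    """Return (parameters, tokens) that spend compute_flops under
    C = 6 * N * D with D = tokens_per_param * N."""
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("inputs must be positive")
    # Substituting D = r * N into C = 6 * N * D gives N = sqrt(C / (6 * r)).
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    n, d = chinchilla_allocation(1e20)
    print(f"parameters ~ {n:.2e}, tokens ~ {d:.2e}")
```

Under these assumptions the budget supports a model of roughly 0.9 billion parameters trained on roughly 18 billion tokens. Because both quantities scale with the square root of compute, a 100x larger budget would raise each by about 10x.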