Questions — Neural Scaling Laws — Open Knowledge Graph

Question 1 Multiple Choice

Neural scaling laws reveal that test error decreases predictably with model size, data size, and compute. Which statement best captures the relationship?

A. Loss = O(N^{-alpha}), where N is any one of {model size, data size, compute} (with the other two not acting as the bottleneck) and alpha is small, roughly 0.05-0.1 for language models

B. Loss decreases linearly with model size but exponentially with data size

C. Performance is independent of data size; only model size and compute matter

D. Scaling laws apply only to transformer models, not other architectures
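
To weigh the options numerically, the sketch below shows what a power-law form like the one in option A would imply when one resource is scaled up. The function name `loss_ratio` and the exponent 0.076 are illustrative assumptions, not values taken from this quiz.

```python
def loss_ratio(scale_factor: float, alpha: float) -> float:
    """Ratio new_loss / old_loss if loss is proportional to N ** (-alpha)
    and N is multiplied by scale_factor (hypothetical helper)."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return scale_factor ** (-alpha)


if __name__ == "__main__":
    alpha = 0.076  # assumed illustrative exponent
    for factor in (2, 10, 100):
        print(f"N x {factor}: loss multiplied by {loss_ratio(factor, alpha):.3f}")
```

With an exponent this small, a tenfold increase in N lowers the loss by only about 16 percent under this assumed form, which is why such curves look like straight lines only on log-log axes.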

Question 2 Short Answer

Scaling laws suggest there is an optimal allocation between model size and training data size. What is the Chinchilla scaling rule?


Question 3 Multiple Choice

Do neural scaling laws have theoretical justification, or are they purely empirical?

A. Scaling laws are fully explained by statistical learning theory and can be derived analytically

B. Scaling laws are purely empirical observations with no theoretical grounding

C. Scaling laws are partially understood through connections to renormalization, critical phenomena, and information theory, but a complete theoretical explanation remains open

D. Scaling laws apply only to language models; other domains have different scaling behavior

Question 4 Short Answer

A company has a fixed compute budget of 10^20 FLOPs. According to Chinchilla scaling, how should it allocate that budget between model size and training data to maximize performance?
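
The allocation asked about in Question 4 can be worked through numerically. The sketch below rests on two labeled assumptions that are not stated in the quiz itself: the common approximation that training compute is about 6 FLOPs per parameter per token (C ≈ 6·N·D), and the widely quoted Chinchilla rule of thumb of roughly 20 training tokens per parameter. The function and constant names are hypothetical.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0  # assumption: C ~= 6 * N * D
TOKENS_PER_PARAM = 20.0      # assumption: rule-of-thumb tokens-per-parameter ratio


def chinchilla_allocation(compute_flops: float,
                          tokens_per_param: float = TOKENS_PER_PARAM,
                          flops_per_param_token: float = FLOPS_PER_PARAM_TOKEN):
    """Return (n_params, n_tokens) satisfying both assumptions above.

    Substituting D = r * N into C = k * N * D gives N = sqrt(C / (k * r)).
    """
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    n_params = math.sqrt(compute_flops / (flops_per_param_token * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    n, d = chinchilla_allocation(1e20)
    print(f"parameters ~ {n:.2e}, tokens ~ {d:.2e}")
```

Under these assumptions a 10^20 FLOP budget works out to a model of a bit under one billion parameters trained on roughly 18 billion tokens. Changing `tokens_per_param` shows how sensitive the split is to the assumed ratio.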
