After 30 Bayesian optimization trials, the acquisition function assigns a high score to a point where the Gaussian process predicts only mediocre performance — lower than the current best. Why would Bayesian optimization choose to evaluate this point?
A. The acquisition function is malfunctioning: it should always select the point with the highest predicted mean
B. The point has high uncertainty, giving it high expected improvement potential even if the mean prediction is mediocre
C. Bayesian optimization ignores predicted means entirely and maximizes uncertainty only
D. The Gaussian process needs more evaluations before its predictions become reliable enough to trust
Answer: B
This is the exploration-exploitation tradeoff in action. Expected Improvement integrates over the GP's uncertainty: a point with mediocre predicted mean but high uncertainty has a meaningful probability of being much better than predicted, and thus high expected improvement. Maximizing the predicted mean alone (pure exploitation) would miss potentially good regions that haven't been explored. The acquisition function automatically handles this balance — it's not a bug but a core feature of why Bayesian optimization outperforms greedy search.
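To make the scenario in this question concrete, here is a minimal sketch of Expected Improvement for a maximization problem, assuming the GP posterior at each candidate is a normal distribution with a given mean and standard deviation. The numbers (a best score of 0.90 and the two candidates) are made up for illustration and are not from any real tuning run.

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty left: improvement is just the (clipped) mean gap.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (mu - best) * cdf + sigma * pdf

best_so_far = 0.90  # hypothetical current best score

# Well-explored candidate: mean just below the best, very little uncertainty.
ei_safe = expected_improvement(mu=0.89, sigma=0.005, best=best_so_far)

# Unexplored candidate: clearly lower mean, but wide uncertainty.
ei_uncertain = expected_improvement(mu=0.85, sigma=0.08, best=best_so_far)

print(f"EI (near best, certain):   {ei_safe:.6f}")
print(f"EI (mediocre, uncertain):  {ei_uncertain:.6f}")
```

With these illustrative inputs the uncertain candidate receives the larger score even though its predicted mean is lower, which is the behavior described in option B.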
Question 2 Multiple Choice
What does a Gaussian process contribute to Bayesian optimization that makes it fundamentally different from random search?
A. Predicted values at untried points, which random search also provides through interpolation
B. Both predicted values AND calibrated uncertainty estimates at every point, enabling principled exploration
C. Exact true values at untried points computed from the observed data analytically
D. A guarantee that the global optimum will be found within a fixed number of evaluations
Answer: B
The Gaussian process provides a probability distribution over function values at every untried point: not just a predicted value but also an uncertainty band that widens where data is sparse and narrows where evaluations have occurred. Random search builds no model at all, so it has neither predictions nor this uncertainty map. Without knowing where the function is well explored versus unknown, it cannot make informed decisions about where to look next. The uncertainty estimate is the key ingredient that allows the acquisition function to balance exploitation (high predicted value) and exploration (high uncertainty).
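The widening and narrowing of uncertainty can be seen in a small sketch. It assumes a zero-mean GP with a squared-exponential kernel and a fixed length scale of 0.15, both chosen only for illustration, and it uses plain Python with a hand-written linear solver to stay dependency-free. In practice a numerical library would do this work. The observed values are invented.

```python
import math

def rbf(a, b, length_scale=0.15):
    """Squared-exponential kernel with unit prior variance (assumed settings)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_posterior(x_obs, y_obs, x_new, jitter=1e-8):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(x_obs)] for i, a in enumerate(x_obs)]
    k_star = [rbf(a, x_new) for a in x_obs]
    alpha = solve(K, y_obs)    # K^{-1} y
    v = solve(K, k_star)       # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error

x_obs = [0.1, 0.2, 0.3]   # three evaluated settings clustered on the left
y_obs = [0.5, 0.7, 0.6]   # made-up objective values

mean_near, std_near = gp_posterior(x_obs, y_obs, 0.2)  # at an observed point
mean_far, std_far = gp_posterior(x_obs, y_obs, 0.9)    # far from any data

print(f"x=0.2: mean={mean_near:.3f}, std={std_near:.4f}")
print(f"x=0.9: mean={mean_far:.3f}, std={std_far:.4f}")
```

At the observed point the standard deviation collapses toward zero and the mean reproduces the observation. Far from the data the standard deviation returns to roughly the prior value, which is the signal an acquisition function can use to justify exploring there.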
Question 3 True / False
Bayesian optimization is most valuable when each objective function evaluation is cheap, because the overhead of fitting and maximizing the Gaussian process is the main computational bottleneck.
T. True
F. False
Answer: False
Bayesian optimization is most valuable when each evaluation is expensive: training a large neural network, running a physical simulation, or conducting a wet-lab experiment. When evaluations are cheap, simpler methods like random search or grid search are perfectly adequate and have no GP overhead. The whole point of Bayesian optimization is to minimize the number of expensive evaluations by using every past result intelligently. The cost of fitting the GP (which scales cubically with the number of observations) is trivial compared to the cost of model training runs measured in hours, because the number of observations stays small in this setting.
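A rough back-of-the-envelope calculation shows the scale of the gap. The sketch below assumes the dominant GP cost is one Cholesky factorization of the kernel matrix at roughly n³/3 floating-point operations, a deliberately slow machine at 1 GFLOP/s, and a two-hour training run. All three are illustrative assumptions, and a real fit repeats the factorization many times while tuning kernel hyperparameters, so treat the output as an order-of-magnitude guide only.

```python
def cholesky_flops(n):
    """Approximate floating-point operations to factor an n x n kernel matrix."""
    return n ** 3 / 3

ASSUMED_FLOPS_PER_SECOND = 1e9    # assumption: a slow 1 GFLOP/s machine
TRAINING_RUN_SECONDS = 2 * 3600   # assumption: one trial takes two hours

for n_obs in (30, 100, 1000):
    fit_seconds = cholesky_flops(n_obs) / ASSUMED_FLOPS_PER_SECOND
    ratio = TRAINING_RUN_SECONDS / fit_seconds
    print(f"n={n_obs}: one factorization ~{fit_seconds:.2e} s, "
          f"one trial is ~{ratio:.1e}x longer")
```

Even with hundreds of repeated factorizations per fit, the surrogate cost stays far below a single multi-hour evaluation under these assumptions. The balance reverses only when evaluations take milliseconds, which is the cheap-evaluation regime the statement describes.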
Question 4 True / False
Expected Improvement (EI) as an acquisition function automatically handles the exploration-exploitation tradeoff without requiring a manually tuned exploration parameter.
T. True
F. False
Answer: True
EI computes the expected amount by which a new point would surpass the current best observed value, integrating over the GP's uncertainty distribution. This formulation naturally produces both behaviors. Where predictions are high and certain (a good exploitation opportunity), EI is high. Where predictions are uncertain (unexplored territory), there is a non-trivial probability that the true value exceeds the current best, which also keeps EI high. As good regions become well explored and their uncertainty drops, EI naturally shifts attention to uncertain regions, with no manual temperature schedule or exploration parameter needed.
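The balance can be read directly off the standard closed form of EI for maximization. The symbols here are notation introduced for this note, not taken from the quiz: μ(x) and σ(x) are the GP posterior mean and standard deviation at x, f* is the best value observed so far, and Φ and φ are the standard normal CDF and PDF.

```latex
\mathrm{EI}(x) = \mathbb{E}\big[\max(f(x) - f^{*},\, 0)\big]
             = \big(\mu(x) - f^{*}\big)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{\mu(x) - f^{*}}{\sigma(x)}, \quad \sigma(x) > 0
```

The first term grows with the predicted mean (exploitation) and the second grows with the posterior standard deviation (exploration), so both are weighed in one expression. Some formulations subtract an optional margin ξ from the improvement. The form above corresponds to ξ = 0, which matches the statement that no tuned exploration parameter is required.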
Question 5 Short Answer
Explain why Bayesian optimization is more efficient than random search, focusing on how the surrogate model and acquisition function change where the algorithm looks next.
Model answer: Random search evaluates hyperparameter configurations blindly: each trial is independent of all previous results. Bayesian optimization instead fits a Gaussian process surrogate model to all past evaluations, giving it a probabilistic map of which regions look promising and which are uncertain. The acquisition function (e.g., Expected Improvement) then uses this map to select the point with the highest expected gain over the best result so far, weighing both predicted performance and prediction uncertainty. Each new evaluation updates the surrogate and refines the map, so later trials become increasingly targeted. The result is that Bayesian optimization concentrates evaluations in genuinely promising regions rather than sampling uniformly, often finding near-optimal configurations in 20–50 trials where random search might need hundreds, though the exact numbers depend on the problem.
The efficiency gain is entirely due to the feedback loop: past observations inform future decisions, which is impossible in random or grid search. This matters most in regimes where evaluations are expensive (hours per trial) and budgets are limited (tens of trials total) — exactly the setting of deep learning hyperparameter tuning or drug discovery.
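The feedback loop can be sketched end to end on a toy problem. Everything specific here is an assumption made for illustration: the one-dimensional `objective` stands in for an expensive evaluation, the GP is zero-mean with a fixed squared-exponential kernel, candidates come from a 101-point grid, and the linear solver is hand-written to avoid dependencies. It is not a production implementation.

```python
import math
import random

def objective(x):
    """Hypothetical stand-in for an expensive evaluation on [0, 1]."""
    return math.exp(-((x - 0.7) ** 2) / 0.01) + 0.5 * math.exp(-((x - 0.2) ** 2) / 0.02)

def rbf(a, b, length_scale=0.1):
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def posterior(xs, ys, x_new, jitter=1e-6):
    """Surrogate: zero-mean GP posterior mean and std at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = 1.0 - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))

def expected_improvement(mu, sigma, best):
    if sigma < 1e-12:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def bayes_opt(budget, grid, init):
    xs = list(init)
    ys = [objective(x) for x in xs]
    while len(xs) < budget:
        best = max(ys)
        candidates = [x for x in grid if x not in xs]
        # Feedback loop: every past result shapes the score of every candidate.
        scores = [expected_improvement(*posterior(xs, ys, x), best) for x in candidates]
        x_next = candidates[scores.index(max(scores))]
        xs.append(x_next)
        ys.append(objective(x_next))
    return xs, ys

def random_search(budget, seed=0):
    rng = random.Random(seed)
    # No feedback: each draw ignores all earlier results.
    return max(objective(rng.random()) for _ in range(budget))

grid = [i / 100 for i in range(101)]
init = [grid[0], grid[50], grid[100]]
xs, ys = bayes_opt(budget=12, grid=grid, init=init)
bo_best = max(ys)
rs_best = random_search(budget=12)
print(f"Bayesian optimization best after 12 evaluations: {bo_best:.4f}")
print(f"Random search best after 12 evaluations:         {rs_best:.4f}")
```

A single run on a toy function proves nothing about relative performance. The point of the sketch is structural: `bayes_opt` passes all of `xs` and `ys` into each decision, while `random_search` never looks at earlier results.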