Questions: Bellman Equation and Dynamic Programming
5 questions to test your understanding
Question 1 Multiple Choice
A student tries to solve an infinite-horizon consumption problem by writing out first-order conditions for every period t = 0, 1, 2, … and solving them simultaneously. Why is the Bellman equation approach more tractable?
A. It restricts the agent to a finite planning horizon, making the system of equations solvable
B. It assumes the agent follows a simple consumption rule, eliminating the need for optimization
C. It reduces the infinite sequence of interdependent decisions to a single functional equation that determines the optimal choice at each state, exploiting the problem's recursive structure
D. It replaces the value function with a linear approximation, making the math tractable
The student's approach fails because an infinite-horizon problem has infinitely many first-order conditions, all coupled to each other. The Bellman insight is the principle of optimality: if you know the value of every possible next-period state (captured by V(x')), then today's problem collapses to a single optimization over today's choice. The recursive structure means the same equation applies at every time period — there is no 'last period' to work backward from, yet the problem is still solvable. Value function iteration exploits this by repeatedly updating V until it converges to a fixed point.
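The iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the cake-eating setup (an integer stock x, consumption c between 0 and x, next state x' = x - c), the square-root utility, and β = 0.9 are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math


def bellman_update(V, beta):
    """One Bellman update: (TV)(x) = max over c of [sqrt(c) + beta * V(x - c)]."""
    new_V = [0.0] * len(V)
    policy = [0] * len(V)
    for x in range(len(V)):
        best_value, best_c = -math.inf, 0
        for c in range(x + 1):  # eat c units today, leave x - c for tomorrow
            value = math.sqrt(c) + beta * V[x - c]
            if value > best_value:
                best_value, best_c = value, c
        new_V[x], policy[x] = best_value, best_c
    return new_V, policy


def value_function_iteration(n_states=21, beta=0.9, tol=1e-10, max_iter=10_000):
    """Repeat the update from an initial guess of zero until V stops changing."""
    V = [0.0] * n_states
    policy = [0] * n_states
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        new_V, policy = bellman_update(V, beta)
        gap = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if gap < tol:
            break
    return V, policy, n_iter


V, policy, n_iter = value_function_iteration()
print(f"converged after {n_iter} updates; V(2) = {V[2]:.3f}, c*(2) = {policy[2]}")
```

In this toy problem an agent holding two units does better eating one unit per period (1 + 0.9 × 1 = 1.9) than eating both at once (√2 ≈ 1.414), and the single update rule recovers that without writing down any period-by-period first-order conditions.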
Question 2 Multiple Choice
In the Bellman equation V(x) = max_c [u(c, x) + βV(x')], what does the term βV(x') represent?
A. The cost of transitioning from state x to state x'
B. The discounted value of being in next-period state x' and behaving optimally from that point onward forever
C. The marginal utility of current consumption, discounted to present value
D. The total undiscounted sum of all future period payoffs
V(x') is the value function evaluated at next period's state — by definition, it captures the maximum discounted payoff achievable starting from x' and behaving optimally at every subsequent period. The β multiplier discounts it to present value, reflecting time preference (a unit of payoff tomorrow is worth β < 1 units today). Together, βV(x') summarizes the entire infinite future in a single term, which is what makes the Bellman equation so powerful — the whole infinite horizon is compressed into the value of the next-period state.
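To see how βV(x') compresses the whole future, it can help to substitute the Bellman equation into itself along the optimal path. The subscripted notation (x_t, c_t^*) and the constant-payoff example are introduced here purely for illustration:

```latex
\begin{aligned}
V(x_0) &= u(c_0^*, x_0) + \beta V(x_1) \\
       &= u(c_0^*, x_0) + \beta u(c_1^*, x_1) + \beta^2 V(x_2) \\
       &= \sum_{t=0}^{T-1} \beta^t u(c_t^*, x_t) + \beta^T V(x_T).
\end{aligned}
```

If β^T V(x_T) vanishes as T grows, what remains is the full discounted sum of payoffs. As a quick check, suppose the optimal payoff is a constant 1 every period and β = 0.95. Then V = 1 / (1 - 0.95) = 20, so βV(x') = 19, and today's payoff plus the discounted continuation value gives 1 + 19 = 20, as the equation requires.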
Question 3 True / False
The Bellman equation embodies the principle of optimality: if a plan is globally optimal, then the continuation of that plan from any future state must itself be optimal given that state.
T. True
F. False
Answer: True
This is the foundation of dynamic programming. If there existed a better continuation from some future state, the agent could improve the overall plan by switching to it, contradicting the assumption that the original plan was globally optimal. The principle implies that optimizing one period at a time at each state (using the value function to summarize the future) is equivalent to optimizing globally across all time periods at once. The one-period problem is not myopic, because the continuation value V(x') carries all the information about the future that today's choice needs. This is a non-trivial insight: it converts a global infinite-dimensional problem into a local one-period problem solved at each state.
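The equivalence between period-by-period and global optimization can be checked numerically on a small case. The sketch below assumes a finite-horizon truncation (so that every plan can be enumerated), a cake-eating setup with square-root utility, and β = 0.9; none of these come from the quiz itself, and the function names are hypothetical.

```python
import itertools
import math

BETA = 0.9  # assumed discount factor


def utility(c):
    """Assumed per-period payoff from eating c units."""
    return math.sqrt(c)


def brute_force_value(cake, horizon):
    """Global optimization: score every feasible plan (c_0, ..., c_{T-1})."""
    best = -math.inf
    for plan in itertools.product(range(cake + 1), repeat=horizon):
        if sum(plan) <= cake:  # cannot eat more cake than exists
            total = sum(BETA**t * utility(c) for t, c in enumerate(plan))
            best = max(best, total)
    return best


def recursive_value(cake, horizon):
    """Same quantity via the Bellman recursion, one period at a time."""
    V = [0.0] * (cake + 1)  # value when no periods remain
    for _ in range(horizon):
        V = [max(utility(c) + BETA * V[x - c] for c in range(x + 1))
             for x in range(cake + 1)]
    return V[cake]


print(brute_force_value(4, 4), recursive_value(4, 4))
```

The two numbers agree up to floating-point rounding, although the brute-force search inspects 625 candidate plans while the recursion only ever solves one-period problems.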
Question 4 True / False
To apply the Bellman equation, an agent must first solve for the optimal decisions in all future periods before determining what to do in the current period.
T. True
F. False
Answer: False
This misunderstands the recursive structure. The Bellman equation says: given the value function V (which summarizes all future payoffs), the optimal current decision is found by maximizing today's payoff plus βV(x'). You don't need to know tomorrow's specific decision to make today's — you only need V, which can be found iteratively without any temporal ordering. Value function iteration starts from an arbitrary guess and converges to V* without ever solving 'future periods first.' The recursion works precisely because the value function decouples today's optimization from the infinite sequence of future decisions.
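The claim that the starting guess does not matter can be illustrated with a small experiment. The following sketch reuses an assumed cake-eating problem (square-root utility, β = 0.9, integer states); the helper names are hypothetical, and the random initial guess is seeded only to keep the run reproducible.

```python
import math
import random


def bellman_update(V, beta=0.9):
    """(TV)(x) = max over c in 0..x of [sqrt(c) + beta * V(x - c)]."""
    return [max(math.sqrt(c) + beta * V[x - c] for c in range(x + 1))
            for x in range(len(V))]


def iterate_to_fixed_point(V, beta=0.9, tol=1e-10, max_iter=100_000):
    """Apply the update until successive iterates differ by less than tol."""
    for _ in range(max_iter):
        new_V = bellman_update(V, beta)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    raise RuntimeError("no convergence within max_iter")


random.seed(0)
from_zeros = iterate_to_fixed_point([0.0] * 11)
from_noise = iterate_to_fixed_point([random.uniform(-50, 50) for _ in range(11)])
largest_gap = max(abs(a - b) for a, b in zip(from_zeros, from_noise))
print(f"largest difference between the two limits: {largest_gap:.2e}")
```

Both runs settle on the same function up to the stopping tolerance, even though neither ever solved a "future period" first. The noisy start simply needs more updates before its initial error has been discounted away.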
Question 5 Short Answer
What is the value function V(x) in the Bellman framework, and why does the recursive formulation make infinite-horizon optimization tractable?
Model answer: The value function V(x) is the maximum total discounted payoff an agent can achieve by starting in state x and behaving optimally at every subsequent period forever. It is a function from states to values, summarizing the entire future in a single number for each state. The recursive formulation is tractable because it converts the problem of choosing an infinite sequence of actions (intractable) into the problem of finding a fixed point of a single functional equation (tractable via iteration). Once V is known, the optimal policy c*(x) follows immediately by solving the one-period maximization at each state.
The key is that V(x) encodes everything relevant about the future in a compact form. An infinite sequence of decisions becomes a one-period problem (choose c today to maximize u(c,x) + βV(x')), where V(x') does the heavy lifting of summarizing everything that happens afterward. Under standard conditions (a discount factor β < 1 and bounded payoffs), the contraction mapping theorem guarantees this functional equation has a unique solution and that value function iteration converges to it from any bounded starting guess.
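The convergence guarantee can be stated compactly. The operator name T and the sup-norm notation below are introduced for illustration, and the statement assumes bounded payoffs and 0 < β < 1:

```latex
\begin{aligned}
(TV)(x) &= \max_{c}\,\bigl[\,u(c, x) + \beta V(x')\,\bigr], \\
\lVert TV - TW \rVert_\infty &\le \beta\,\lVert V - W \rVert_\infty
  \quad \text{for all bounded } V, W, \\
\lVert T^{n} V_0 - V^{*} \rVert_\infty &\le \beta^{n}\,\lVert V_0 - V^{*} \rVert_\infty .
\end{aligned}
```

The second line says that one Bellman update shrinks the distance between any two candidate value functions by at least the factor β, which is what forces a unique fixed point V*. The third line gives the rate: with β = 0.95, reducing the initial error by a factor of one million requires β^n ≤ 10^{-6}, that is n ≥ ln(10^{-6}) / ln(0.95) ≈ 269.3, so about 270 iterations.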