Stochastic control optimizes a controlled diffusion dX = μ(X,u)dt + σ(X,u)dW, where the control u(t) is chosen adaptively to minimize an expected cost J(u) = E[∫₀ᵀ L(X,u)dt + g(X(T))]. The Hamilton-Jacobi-Bellman (HJB) equation ∂V/∂t + min_u{L(x,u) + μ(x,u)∂V/∂x + (1/2)σ²(x,u)∂²V/∂x²} = 0 characterizes the value function V(x,t), and the optimal control u* is the minimizer in the HJB equation. This is the stochastic extension of classical optimal control theory.
Stochastic control extends classical optimal control to systems driven by noise. The controlled process dX = μ(X,u)dt + σ(X,u)dW evolves differently depending on the control u(t), which the decision-maker chooses adaptively based on current information. The goal is to choose u to minimize the expected total cost J(u) = E[∫₀ᵀ L(X(t), u(t))dt + g(X(T))], where L is the running cost and g is the terminal cost. The control u can affect the drift (steering), the diffusion (risk management), or both.
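The expected cost of a given feedback rule can be estimated by simulation. The sketch below is only an illustration under assumed choices that do not come from the text: a toy model with μ = u, constant σ, running cost L = x² + u², terminal cost g = 0, and a hypothetical linear feedback u = -k·x. It discretizes the SDE with the Euler-Maruyama scheme and averages the accumulated cost over sample paths.

```python
import numpy as np

def estimate_cost(k, x0=1.0, sigma=0.5, T=1.0, n_steps=200, n_paths=20000, seed=0):
    """Monte Carlo estimate of J(u) for the feedback control u = -k * x.

    Assumed toy model (illustrative, not from the text):
    dX = u dt + sigma dW, running cost L = x^2 + u^2, terminal cost g = 0.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0)
    cost = np.zeros(n_paths)
    for _ in range(n_steps):
        u = -k * x                                  # feedback control from the current state
        cost += (x**2 + u**2) * dt                  # left-endpoint rule for the running cost
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)  # Brownian increments
        x = x + u * dt + sigma * dw                 # Euler-Maruyama step
    return cost.mean()                              # terminal cost g is zero here

cost_uncontrolled = estimate_cost(k=0.0)  # no steering
cost_controlled = estimate_cost(k=1.0)    # pull the state toward zero
print(cost_uncontrolled, cost_controlled)
```

Both calls use the same seed, so the two estimates share their random increments and the comparison between them is less noisy than either estimate alone. For k = 0 the exact value under these assumptions is x0²T + σ²T²/2 = 1.125, which gives a sanity check on the simulation.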
The dynamic programming principle leads to the Hamilton-Jacobi-Bellman (HJB) equation. Define the value function V(x,t) = inf_u E[∫ₜᵀ L(X(s),u(s))ds + g(X(T)) | X(t) = x], the optimal cost-to-go from state x at time t. The HJB equation is ∂V/∂t + min_u{L(x,u) + μ(x,u)V_x + (1/2)σ²(x,u)V_{xx}} = 0 with terminal condition V(x,T) = g(x), where V_x and V_{xx} denote the first and second partial derivatives in x. This PDE encapsulates Bellman's principle of optimality: an optimal policy started at (x,t) must remain optimal for the sub-problem starting from any later state it reaches. The minimizer u*(x,t) = argmin_u{L(x,u) + μ(x,u)V_x + (1/2)σ²(x,u)V_{xx}} gives the optimal feedback control, a rule specifying the control as a function of the current state and time.
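A worked instance may help make the equation concrete. The example below assumes an illustrative linear-quadratic problem chosen for tractability and not taken from the text: μ(x,u) = u, constant σ, L(x,u) = x² + u², g = 0. The functions p(t) and q(t) are names introduced here for the coefficients of a quadratic ansatz.

```latex
\begin{align*}
&\text{HJB:}\quad V_t + \min_u\bigl\{x^2 + u^2 + u\,V_x + \tfrac12\sigma^2 V_{xx}\bigr\} = 0,
  \qquad V(x,T) = 0,\\
&\text{pointwise minimizer:}\quad u^*(x,t) = -\tfrac12 V_x
  \;\Longrightarrow\; V_t + x^2 - \tfrac14 V_x^2 + \tfrac12\sigma^2 V_{xx} = 0,\\
&\text{ansatz } V = p(t)\,x^2 + q(t):\quad
  \dot p = p^2 - 1,\quad \dot q = -\sigma^2 p,\quad p(T) = q(T) = 0,\\
&\text{solution:}\quad p(t) = \tanh(T-t),\qquad q(t) = \sigma^2 \ln\cosh(T-t),\\
&V(x,t) = \tanh(T-t)\,x^2 + \sigma^2 \ln\cosh(T-t),
  \qquad u^*(x,t) = -\tanh(T-t)\,x.
\end{align*}
```

In this example the noise level σ shifts the value function through q(t) but does not change the feedback rule, which is a feature of the linear-quadratic structure rather than of HJB equations in general.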
The derivation uses Itô's formula. If V is smooth, apply Itô to V(X(t),t): dV = (V_t + μV_x + (1/2)σ²V_{xx})dt + σV_x dW. For the process V(X(t),t) + ∫₀ᵗ L(X(s),u(s))ds to be a martingale under the optimal control (and a submartingale under any control), the drift must satisfy V_t + L + μV_x + (1/2)σ²V_{xx} ≥ 0 for all u and = 0 for u = u*. Minimizing over u gives the HJB equation. The verification theorem makes this rigorous: if a smooth V solves HJB and the resulting u* is admissible, then V is indeed the value function and u* is optimal.
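The drift condition can be checked numerically on a case with a known closed form. The sketch below assumes an illustrative linear-quadratic problem that is not part of the text (μ = u, constant σ, L = x² + u², g = 0), for which V(x,t) = tanh(T-t)·x² + σ²·ln cosh(T-t) is a candidate value function and u* = -tanh(T-t)·x is the candidate optimal feedback. It evaluates V_t + L + μV_x + (1/2)σ²V_{xx} at random points, once for arbitrary controls and once for u*.

```python
import numpy as np

def drift(x, t, u, sigma=0.5, T=1.0):
    """Drift of V(X(t), t) + (integral of L) under control value u.

    Assumed toy problem (illustrative): mu = u, constant sigma,
    L = x^2 + u^2, g = 0, with candidate value function
    V(x, t) = tanh(T - t) * x^2 + sigma^2 * log(cosh(T - t)).
    """
    p = np.tanh(T - t)
    V_t = -(1.0 - p**2) * x**2 - sigma**2 * p   # time derivative of V
    V_x = 2.0 * p * x                           # first derivative in x
    V_xx = 2.0 * p                              # second derivative in x
    return V_t + (x**2 + u**2) + u * V_x + 0.5 * sigma**2 * V_xx

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 1000)
t = rng.uniform(0.0, 1.0, 1000)
u = rng.uniform(-3.0, 3.0, 1000)       # arbitrary control values
u_star = -np.tanh(1.0 - t) * x         # candidate optimal feedback (T = 1)

min_drift_any_u = drift(x, t, u).min()               # should be >= 0 up to round-off
max_abs_drift_opt = np.abs(drift(x, t, u_star)).max()  # should be ~ 0
print(min_drift_any_u, max_abs_drift_opt)
```

Under these assumptions the drift simplifies algebraically to (u + tanh(T-t)·x)², which is nonnegative and vanishes exactly at u = u*, matching the submartingale and martingale conditions described above.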
The Merton problem (optimal investment and consumption) is the most famous application. An investor whose wealth W follows dW = (rW + π(μ-r)W - c)dt + πσW dB, where B is a Brownian motion, r is the risk-free rate, and μ and σ here denote the constant drift and volatility of the risky asset, chooses the risky-asset fraction π(t) and consumption rate c(t) to maximize E[∫₀ᵀ U(c)dt]. Because this is a maximization, the min in the HJB equation becomes a max. With power utility U(c) = c^γ/γ (γ < 1, γ ≠ 0) and GBM dynamics for the risky asset, the HJB equation admits an explicit solution: the optimal investment fraction π* = (μ-r)/((1-γ)σ²) is constant (the Merton fraction), and consumption is proportional to wealth. This elegant result, a dynamic stochastic problem with a static-looking solution, is special to the CRRA utility/GBM combination. Real-world extensions (stochastic volatility, transaction costs, portfolio constraints) generally destroy this closed-form structure and require numerical methods.
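The Merton fraction is easy to compute and to sanity-check. The sketch below uses illustrative parameter values that are assumptions, not data from the text. It also assumes the standard separable ansatz V(w,t) = f(t)·w^γ/γ with f > 0, under which the π-dependent part of the HJB maximand is a positive multiple of π(μ-r) - (1/2)(1-γ)σ²π². A grid search over π is then compared with the closed-form maximizer. The function names are hypothetical.

```python
import numpy as np

def merton_fraction(mu, r, sigma, gamma):
    """Closed-form Merton fraction pi* = (mu - r) / ((1 - gamma) * sigma^2)."""
    return (mu - r) / ((1.0 - gamma) * sigma**2)

def hjb_pi_term(pi, mu, r, sigma, gamma):
    """pi-dependent part of the HJB maximand, up to a positive factor.

    Assumes V(w, t) = f(t) * w^gamma / gamma with f > 0, so that
    pi*(mu - r)*w*V_w + 0.5*pi^2*sigma^2*w^2*V_ww = f * w^gamma * (this term).
    """
    return pi * (mu - r) - 0.5 * (1.0 - gamma) * sigma**2 * pi**2

# Illustrative parameters (assumptions): 8% drift, 2% risk-free rate,
# 20% volatility, gamma = -1 (relative risk aversion 1 - gamma = 2).
mu, r, sigma, gamma = 0.08, 0.02, 0.2, -1.0

pi_star = merton_fraction(mu, r, sigma, gamma)
grid = np.linspace(0.0, 2.0, 2001)
pi_grid = grid[np.argmax(hjb_pi_term(grid, mu, r, sigma, gamma))]
print(pi_star, pi_grid)
```

For these numbers the formula gives a fraction of about 0.75, and the grid maximizer agrees to within the grid spacing. Neither wealth nor time enters the calculation, which is the "static-looking" feature of the solution.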