Questions: Optimal Control and Pontryagin Maximum Principle
5 questions to test your understanding
Question 1 Multiple Choice
For a minimum-fuel optimal control problem (minimize ∫|u(t)|dt subject to ẋ = f(x,u) and |u| ≤ u_max), the optimal control u*(t) is often bang-off-bang: u* ∈ {−u_max, 0, +u_max}, with switches at isolated times. Why is smooth feedback rarely optimal for minimum-fuel cost?
A. Smooth feedback is always optimal for quadratic cost; minimum-fuel cost is different because you want to minimize total actuation, not energy
B. Smooth feedback would spend 'medium effort' for most of the trajectory, accumulating fuel cost. Bang-off-bang control uses maximum effort when needed and zero effort otherwise, avoiding wasted intermediate-level effort. The switching function derived from the Hamiltonian determines when to switch
C. Smooth feedback is computationally easier but physically infeasible
D. The actuator can only apply bang-bang control physically, so the optimal control law must match this constraint
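The cost |u| is convex but not strictly convex, and it is not differentiable at u = 0. Assume a scalar control and dynamics that are affine in it, f = a(x) + b(x)u, and use the minimum-principle convention H = |u| + λᵀf, so that the optimal u minimizes H at each instant. Because H is then piecewise linear in u, its minimum over |u| ≤ u_max is attained at a bound (±u_max) or at the kink u = 0: u* = 0 when |λᵀb| < 1, and u* = −u_max·sign(λᵀb) when |λᵀb| > 1. The stationarity condition ∂H/∂u = 0 therefore does not pick out an interior control; intermediate values can be optimal only on singular arcs, where |λᵀb| = 1 over a whole time interval. For minimum-time problems (cost J = T, the final time) with control-affine dynamics, the optimal control is bang-bang (u* = ±u_max) whenever there are no singular arcs. For minimum-fuel problems, the same condition gives bang-off-bang control.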
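The pointwise minimization behind this explanation can be sketched in a few lines. This is a minimal illustration, assuming a scalar control, control-affine dynamics f = a(x) + b(x)u, and the convention that the optimal u minimizes H = |u| + λᵀf; the names `fuel_optimal_u`, `brute_force_u`, and the scalar `q` (standing for λᵀb) are introduced here for illustration only.

```python
def fuel_optimal_u(q, u_max):
    """Minimize |u| + q*u over -u_max <= u <= u_max, where q stands for lambda^T b.

    Assumes the 'minimize H' convention with unit weight on |u|.
    """
    if q > 1.0:
        return -u_max      # pushing against q pays for the fuel it burns
    if q < -1.0:
        return u_max
    return 0.0             # |q| < 1: coast (|q| == 1 is the singular tie case)


def brute_force_u(q, u_max, n=2001):
    """Check the closed form by scanning a grid of admissible controls."""
    grid = [-u_max + 2.0 * u_max * i / (n - 1) for i in range(n)]
    return min(grid, key=lambda u: abs(u) + q * u)


if __name__ == "__main__":
    for q in (-2.5, -0.4, 0.0, 0.7, 3.0):
        print(q, fuel_optimal_u(q, 2.0), brute_force_u(q, 2.0))
```

For every tested value of `q` the grid search lands on −u_max, 0, or +u_max and never on an intermediate level, which is the bang-off-bang structure described above.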
Question 2 Multiple Choice
In Pontryagin's framework, the costate λ(t) satisfies a differential equation dλ/dt = −(∂H/∂x)ᵀ with terminal condition λ(T) determined by the cost function's terminal penalty. If the terminal state is free (no target), what is the appropriate boundary condition for λ?
A. λ(T) = 0 (the costate has no value at the final time)
B. λ(T) = ∇g(x(T)), where g is the terminal cost; if there is also no terminal cost (g = 0), then λ(T) = 0
C. λ(T) is determined by the stability condition of the closed-loop system
D. λ(T) is a free variable, determined by iterating the TPBVP solution until it converges
The terminal condition for the costate comes from the transversality conditions of Pontryagin's principle: if x(T) is free (there is no target state), then λ(T) = ∇g(x(T)), the gradient of the terminal cost evaluated at the final state. If there is also no terminal cost (g = 0), this reduces to λ(T) = 0. The intuition: if you have no preference for where the system ends, the marginal value of changing the final state is zero. The costate is then integrated backward in time (from T to 0) according to the costate dynamics, accumulating the marginal cost of state changes along the optimal trajectory.
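The backward integration from the terminal condition can be sketched on a toy problem. The example below is an assumption-laden illustration, not part of the quiz: it takes an uncontrolled scalar system ẋ = a·x with running cost ½x² and terminal cost g = ½·s·x(T)², so that H = ½x² + λ·a·x, dλ/dt = −(x + a·λ), and λ(T) = s·x(T). The function names and the parameters `a`, `s`, `x0`, `T` are hypothetical.

```python
import math


def costate_backward(a, s, x0, T, steps=2000):
    """Integrate lam' = -(x + a*lam) from t = T back to t = 0 with classical RK4."""
    def x(t):
        return x0 * math.exp(a * t)          # closed-form state trajectory

    def rhs(t, lam):
        return -(x(t) + a * lam)             # -dH/dx for H = 0.5*x^2 + lam*a*x

    h = -T / steps                           # negative step: march from T to 0
    t, lam = T, s * x(T)                     # transversality: lam(T) = dg/dx
    for _ in range(steps):
        k1 = rhs(t, lam)
        k2 = rhs(t + h / 2, lam + h * k1 / 2)
        k3 = rhs(t + h / 2, lam + h * k2 / 2)
        k4 = rhs(t + h, lam + h * k3)
        lam += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return lam                               # lam(0)


def costate_closed_form(a, s, x0, T):
    """lam(0) for the same toy problem, derived by hand for comparison."""
    e = math.exp(2 * a * T)
    return x0 * ((e - 1) / (2 * a) + s * e)


if __name__ == "__main__":
    for s in (0.0, 2.0):                     # s = 0 means no terminal cost
        print(s, costate_backward(-0.5, s, 1.0, 2.0),
              costate_closed_form(-0.5, s, 1.0, 2.0))
```

With `s = 0` the integration starts from λ(T) = 0, yet λ(0) is nonzero because the running cost accumulates as the costate is carried backward.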
Question 3 True / False
A direct method for solving optimal control problems discretizes the trajectory into N steps, treats the state and control at each step as decision variables, and solves the resulting nonlinear program (NLP). Compared with indirect methods (solving the TPBVP derived from Pontryagin's conditions), this makes path constraints on the state easier to handle.
T. True
F. False
Answer: True
Direct methods easily handle path constraints (e.g., |x₁| ≤ x_max at all times, not just at the terminal time) because the state at each step is a decision variable: you simply add inequality constraints to the NLP. Indirect methods require deriving the costate equations analytically, and the necessary conditions become much harder to handle when path constraints are active. Direct methods also tend to be more robust in practice (NLP solvers are mature and tolerate rough initial guesses), while TPBVP solvers can be sensitive to poor initial guesses for the costate. The trade-off: a direct method solves an NLP with roughly N·(n + m) variables (where n is the state dimension and m the control dimension), which can be large, but requires no analytical derivation of the costate equations. An indirect method solves a smaller TPBVP but requires deriving the costate equations and is sensitive to the initial costate guess.
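The transcription step can be sketched without committing to a particular solver. The example below assumes a double integrator (position p, velocity v, control u) discretized with forward Euler; the decision-vector layout and all function names are hypothetical choices made for illustration. It only builds the objective and constraint functions that an NLP solver would be handed, then evaluates them on a hand-made trajectory.

```python
def unpack(z, N):
    """Split z = [p_0..p_N, v_0..v_N, u_0..u_{N-1}] into its three parts."""
    return z[: N + 1], z[N + 1 : 2 * N + 2], z[2 * N + 2 :]


def objective(z, N, h):
    """Discretized control energy: h * sum(u_k^2)."""
    _, _, u = unpack(z, N)
    return h * sum(uk * uk for uk in u)


def defects(z, N, h):
    """Equality constraints: forward-Euler dynamics must hold on every step."""
    p, v, u = unpack(z, N)
    out = []
    for k in range(N):
        out.append(p[k + 1] - (p[k] + h * v[k]))
        out.append(v[k + 1] - (v[k] + h * u[k]))
    return out                                # the NLP requires every entry == 0


def path_constraints(z, N, v_max):
    """Path constraint |v_k| <= v_max at every node, written as c >= 0."""
    _, v, _ = unpack(z, N)
    return [v_max - vk for vk in v] + [v_max + vk for vk in v]


def rollout(u, h, p0=0.0, v0=0.0):
    """Build a dynamically consistent decision vector from a control sequence."""
    p, v = [p0], [v0]
    for uk in u:
        p.append(p[-1] + h * v[-1])
        v.append(v[-1] + h * uk)
    return p + v + list(u)


if __name__ == "__main__":
    N, h = 10, 0.1
    z = rollout([1.0] * 5 + [-1.0] * 5, h)
    print(len(z), max(abs(d) for d in defects(z, N, h)))
    print(min(path_constraints(z, N, 0.6)), min(path_constraints(z, N, 0.4)))
```

Adding the velocity limit costs one extra list of inequalities; nothing about the dynamics or the objective has to be re-derived, which is the advantage the explanation describes.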
Question 4 True / False
In minimum-time optimal control, the switching function Φ(t) = λᵀb(x,t) (where b is the control direction) determines when the optimal control switches between u_max and u_min. If Φ(t) has multiple zeros, each one is a potential switching time, so multiple zeros indicate multiple switches along the trajectory.
T. True
F. False
Answer: True
Each zero of Φ(t) at which Φ changes sign is a switching time where the optimal control changes from u_max to u_min or vice versa; a zero where Φ only touches zero without changing sign does not produce a switch. Multiple sign changes mean multiple switches along the trajectory. The number of switches is determined by the problem structure: for a double integrator driven to a target state in minimum time, the optimal trajectory has at most one switch (accelerate, then decelerate). For higher-order systems, more switches can occur; for a linear time-invariant system of order n with real eigenvalues, there are at most n − 1. In practice, you solve the TPBVP and count the sign changes of Φ to determine the switching structure; you can then re-solve with that structure fixed and the switch times as unknowns to refine the solution.
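The double-integrator case can be checked numerically. The sketch below assumes a rest-to-rest move over a distance d > 0 with |u| ≤ u_max, for which the switch occurs at t_s = √(d/u_max) and the final time is T = 2·t_s. It uses the 'minimize H' convention with H = 1 + λ₁v + λ₂u, so Φ = λ₂ is linear in t and u* = −u_max·sign(Φ); the normalization of Φ and all function names are assumptions made for this example.

```python
import math


def rest_to_rest_plan(d, u_max):
    """Switch time and final time for a rest-to-rest move over distance d > 0."""
    t_s = math.sqrt(d / u_max)
    return t_s, 2.0 * t_s


def switching_function(t, t_s, u_max):
    """lam2(t): linear in t, zero at t_s, scaled so that H = 0 along the path."""
    return (t - t_s) / (t_s * u_max)


def simulate(d, u_max, steps=1000):
    """Apply u = -u_max*sign(Phi) and integrate exactly over each small step."""
    t_s, T = rest_to_rest_plan(d, u_max)
    h = T / steps
    p = v = 0.0
    switches, prev_u = 0, None
    for k in range(steps):
        phi = switching_function((k + 0.5) * h, t_s, u_max)  # sample mid-step
        u = -u_max if phi > 0 else u_max
        if prev_u is not None and u != prev_u:
            switches += 1
        prev_u = u
        p += h * v + 0.5 * u * h * h          # exact for constant u on the step
        v += h * u
    return p, v, switches


if __name__ == "__main__":
    print(simulate(4.0, 1.0))
```

Because Φ is a straight line here, it can cross zero only once, which is why the simulation counts a single switch.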
Question 5 Short Answer
Explain why the Pontryagin costate λ(t) is often called the 'shadow price' or 'adjoint variable,' and how interpreting it this way helps understand the sensitivity of the optimal cost to changes in state constraints.
Think about your answer, then reveal below.
Model answer: With the convention H = L + λᵀf and λ(T) = ∇g, the costate is λᵢ(t) = ∂J*/∂xᵢ evaluated at x*(t), the gradient of the optimal cost-to-go with respect to state i. (Under the maximum-principle convention the sign is reversed, λᵢ = −∂J*/∂xᵢ.) If xᵢ were perturbed by δ at time t, the optimal cost would change by λᵢ·δ to first order. This is the 'shadow price': how much the objective changes per unit increase in the state. For example, in a fuel-optimal problem, if λ₁ is the costate for position, a large positive λ₁ means position is 'expensive': each extra unit of position costs a lot of additional fuel. The costate dynamics dλ/dt = −(∂H/∂x)ᵀ propagate this price backward in time: a state is costly now because it will be costly in the future. Similarly, for a state constraint (e.g., x ≤ x_max), the associated multiplier gives the value of relaxing it: if the multiplier is large, relaxing x_max would significantly improve the objective.
This interpretation is powerful for sensitivity analysis: after solving the optimal control problem, you have both x*(t) and λ*(t). The costate λ* directly tells you the marginal effect of state changes on the optimal cost without re-solving the optimization. This is widely used in aerospace (how much delta-v is worth for a trajectory maneuver) and economics (shadow prices of resources).
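The sensitivity reading can be checked on a toy problem that has a closed-form solution. The sketch assumes ẋ = u with cost ∫½u² dt + ½·s·x(T)² and the convention H = ½u² + λu with λ(T) = ∇g, under which the costate equals +∂J*/∂x₀ (the sign flips under the opposite convention). Here λ is constant, u* = −λ, and λ = s·x₀/(1 + s·T). The function name and the parameters `s` and `T` are hypothetical.

```python
def solve_toy_problem(x0, s=2.0, T=1.5):
    """Return (optimal cost, costate) for x' = u with the cost described above."""
    lam = s * x0 / (1.0 + s * T)      # constant costate, equal to s * x(T)
    u = -lam                          # stationarity: dH/du = u + lam = 0
    x_T = x0 + u * T
    cost = 0.5 * u * u * T + 0.5 * s * x_T * x_T
    return cost, lam


def finite_difference_sensitivity(x0, delta=1e-5):
    """Estimate dJ*/dx0 the expensive way: re-solve at two perturbed states."""
    cost_plus, _ = solve_toy_problem(x0 + delta)
    cost_minus, _ = solve_toy_problem(x0 - delta)
    return (cost_plus - cost_minus) / (2.0 * delta)


if __name__ == "__main__":
    _, lam = solve_toy_problem(1.0)
    print(lam, finite_difference_sensitivity(1.0))
```

The two printed numbers agree to within rounding: the costate from a single solve already carries the sensitivity that the finite difference needs two extra solves to estimate.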