Questions: Markov Decision Processes

3 questions to test your understanding

Question 1 Multiple Choice

In an MDP, the Markov property means that:

A. The reward only depends on the current action
B. The next state depends only on the current state and action, not on history
C. The optimal policy must be deterministic
D. Transition probabilities are uniform across all actions

Question 2 True / False

In an MDP with a discount factor γ = 1 and an infinite horizon, value iteration is expected to converge to the optimal value function in a finite number of iterations.

T. True
F. False

Question 3 Short Answer

What is the difference between a policy and a value function in an MDP, and how are they related?
