Questions: Markov Decision Processes

3 questions to test your understanding

Question 1 Multiple Choice

In an MDP, the Markov property means that:

A. The reward only depends on the current action

B. The next state depends only on the current state and action, not on history

C. The optimal policy must be deterministic

D. Transition probabilities are uniform across all actions

Question 2 True / False

In an MDP with a discount factor γ = 1 and an infinite horizon, value iteration is expected to converge to the optimal value function in a finite number of iterations.

T. True

F. False

Question 3 Short Answer

What is the difference between a policy and a value function in an MDP, and how are they related?

Think about your answer, then reveal below.
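
One way to check your answer to Question 3 is to compute both objects on a toy problem. The sketch below is only an illustration: the two-state MDP, its rewards, the discount factor of 0.9, and every function name are assumptions made up for this example, not part of the quiz. It runs value iteration to estimate a value function (a number per state) and then reads off a greedy policy (an action per state) from those values.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor, strictly below 1


def q_value(V, s, a, gamma=GAMMA):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(gamma=GAMMA, tol=1e-10, max_iters=10_000):
    """Repeat the Bellman optimality backup until values change by less than tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        new_V = {s: max(q_value(V, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    return V


def greedy_policy(V, gamma=GAMMA):
    """Map each state to the action with the highest q-value under V."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a, gamma)) for s in P}


V = value_iteration()          # value function: state -> expected discounted return
policy = greedy_policy(V)      # policy: state -> action
print("values:", V)
print("policy:", policy)
```

In this toy problem the value function scores how good each state is, while the policy says what to do in each state; the policy is derived from the values by acting greedily, and the values in turn depend on which actions are taken.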