Deadlock Detection and Recovery

College Depth 72 in the knowledge graph I know this Set as goal
Unlocks 2 downstream topics
deadlock detection recovery resource-allocation

Core Idea

Deadlock detection uses resource allocation graphs to identify cycles, indicating deadlock. The OS periodically checks for cycles. Recovery involves terminating processes (simple but destructive) or preempting resources (complex but less disruptive). Detection-and-recovery trades off prevention overhead for acceptance of occasional deadlocks.

Explainer

From deadlock conditions and resource allocation graphs, you know that deadlock requires four conditions — mutual exclusion, hold and wait, no preemption, and circular wait — and that a cycle in the resource allocation graph proves deadlock exists for single-instance resources. Deadlock detection and recovery is the strategy of letting deadlocks happen and then cleaning up, as opposed to preventing or avoiding them in advance.

The detection algorithm works by constructing and analyzing the wait-for graph, a simplified version of the resource allocation graph that shows only which processes are waiting for which other processes. For single-instance resource types, finding a cycle in this graph is sufficient to confirm deadlock. For multi-instance resource types, the detection algorithm is more involved: it simulates resource allocation using available resources, marking processes that could finish with what is currently free, then releasing their resources and repeating. Any process that cannot be marked by the end of this simulation is deadlocked. This is essentially the same logic as the Banker's algorithm for avoidance, but run on current state rather than projected future state.

A critical design question is how often to run the detection algorithm. Running it after every resource request catches deadlocks immediately but adds overhead to every allocation. Running it periodically (say, every few minutes) or when resource utilization drops below a threshold reduces overhead but means deadlocked processes sit idle longer. The right frequency depends on how costly deadlocks are versus how costly detection is — in a system where deadlocks are rare, periodic detection is usually sufficient.

Once a deadlock is detected, the OS must break it. Process termination is the blunt approach: kill one or more deadlocked processes to release their resources. The OS can terminate all deadlocked processes at once (guaranteed to break the deadlock, but maximally destructive) or terminate them one at a time, re-running detection after each kill to see if the deadlock is resolved. Resource preemption is gentler: forcibly take a resource from one process and give it to another, rolling back the victim process to a safe state. This requires the ability to checkpoint and restore process state, which is not always feasible. In either case, the OS must choose a victim — typically the process with the lowest priority, the least accumulated computation, or the one holding the most resources. The detection-and-recovery approach is philosophically pragmatic: rather than constraining every allocation to prevent a rare event, it lets the system run freely and pays the recovery cost only when deadlock actually occurs.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsOperators and ExpressionsArithmetic Operators and Operator PrecedenceComparison Operators and Boolean TestsLogical Operators and Boolean AlgebraBoolean Algebra and Fundamental LawsCombinational Circuit DesignFlip-Flops and LatchesBinary Counters: Design and AnalysisBinary ArithmeticFixed-Point Number RepresentationTwo's Complement RepresentationOverflow and Underflow DetectionBinary Adders: Half-Adders and Full-AddersFull Adder and Carry PropagationCarry Lookahead Adder DesignHalf Adder Circuit DesignMultiplication Circuit DesignSequential Circuit DesignRegisters and Register FilesInstruction Set Architecture (ISA)Kernel Architecture and OS StructureSystem Calls and User/Kernel ModeProcesses and the Process Control BlockProcess Creation: fork() and exec()Process Termination and Resource CleanupProcess States and State TransitionsThreads and ConcurrencyThe Critical Section Problem and Race ConditionsMutual Exclusion and LocksSemaphoresDeadlock: Conditions and ModelingDeadlock Conditions and Resource GraphsDeadlock Detection and Recovery

Longest path: 73 steps · 262 total prerequisite topics

Prerequisites (1)

Leads To (1)