Saga Pattern for Long-Running Distributed Transactions

Graduate · Depth 69 in the knowledge graph
transactions saga long-running consistency

Core Idea

Sagas are long-running transactions split into a sequence of local transactions, each paired with a compensating transaction that undoes its effect. If any step fails, the compensations for the steps that already completed run in reverse order. Sagas avoid blocking (unlike 2PC) but must handle partial failures and idempotence carefully.

How It's Best Learned

Model a travel booking saga: reserve hotel, reserve flight, reserve car. Write compensations: cancel hotel, cancel flight, cancel car. Then trace through a failure (flight reservation fails) and verify the rollback is correct and can be retried safely.

Explainer

From your study of the two-phase commit protocol, you know that 2PC provides strong atomicity guarantees (all participants commit or all abort) but at a steep cost: resources stay locked for the entire duration of the transaction, and a coordinator crash can leave participants blocked indefinitely. For short-lived transactions across a few closely coupled databases, this tradeoff is often acceptable. But consider a travel booking that reserves a hotel, a flight, and a rental car across three independent services. If the entire workflow takes thirty seconds and any service can fail, holding locks across all three services for that duration is impractical and fragile. The saga pattern offers an alternative.

A saga breaks a distributed transaction into a sequence of local transactions, each of which commits independently against its own service. The hotel service commits the reservation locally, then the flight service commits its reservation locally, then the car service commits its reservation. There is no global coordinator holding locks across all three. Each local transaction is fully committed — its changes are visible to other users — before the next step begins. This eliminates the blocking problem that plagues 2PC.

The price of this freedom is that you lose automatic rollback. If the flight reservation fails after the hotel reservation has already committed, you cannot simply abort — the hotel reservation is already durable. Instead, each step in a saga has a corresponding compensating transaction that semantically undoes its effect. The hotel compensation cancels the reservation, the flight compensation cancels the flight, and so on. When a step fails, the saga executes compensations in reverse order for all previously completed steps. This is not a true rollback — it is a new set of forward actions that happen to reverse the business effect.
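The reverse-order compensation described above can be sketched as a small executor. This is a minimal illustration, not a production design: `Step`, `run_saga`, and the in-memory `reservations` set are hypothetical names standing in for real services, and the sketch assumes that a failing action leaves no effect of its own, so only previously committed steps are compensated.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # local transaction; raises on failure
    compensate: Callable[[], None]  # semantic undo of a committed action


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            log.append(f"failed:{step.name}")
            # Assumption: the failed step committed nothing, so it is skipped.
            for done in reversed(completed):
                done.compensate()
                log.append(f"compensated:{done.name}")
            return False
        completed.append(step)
        log.append(f"committed:{step.name}")
    return True


# Hypothetical in-memory stand-in for the state held by the three services.
reservations = set()


def reserve(item: str) -> Callable[[], None]:
    return lambda: reservations.add(item)


def cancel(item: str) -> Callable[[], None]:
    return lambda: reservations.discard(item)


def fail_car() -> None:
    raise RuntimeError("no cars available")


log: List[str] = []
ok = run_saga(
    [
        Step("hotel", reserve("hotel"), cancel("hotel")),
        Step("flight", reserve("flight"), cancel("flight")),
        Step("car", fail_car, cancel("car")),
    ],
    log,
)
```

In this run the car step fails, so the flight is cancelled first and the hotel second, and `reservations` ends up empty. A real executor would also persist its progress so that compensation can resume after a crash, which this sketch leaves out.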

Two coordination styles exist for sagas. In choreography, each service publishes events when it completes its local transaction, and the next service listens for those events and begins its work. This is decentralized and loosely coupled but can become hard to trace and debug as the number of steps grows. In orchestration, a central saga coordinator tells each service what to do and tracks the overall progress, making the workflow explicit and easier to monitor. The orchestrator does not hold locks like a 2PC coordinator — it simply sequences the local transactions and triggers compensations on failure.
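To make the choreography style concrete, here is a hedged sketch in which services react to each other's events with no central coordinator. The `EventBus` class is a hypothetical synchronous, in-memory stand-in for a message broker, and the event names (`TripRequested`, `HotelReserved`, `FlightFailed`, `HotelCancelled`) are illustrative assumptions rather than a standard vocabulary.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical synchronous in-memory bus standing in for a broker."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.history.append(event)
        for handler in self.handlers[event]:
            handler(payload)


bus = EventBus()
booked = set()  # stand-in for state owned by the individual services


def hotel_service(trip: dict) -> None:
    booked.add("hotel")                  # local commit
    bus.publish("HotelReserved", trip)   # announce completion


def flight_service(trip: dict) -> None:
    if trip.get("flight_available", True):
        booked.add("flight")
        bus.publish("FlightReserved", trip)
    else:
        bus.publish("FlightFailed", trip)


def hotel_compensation(trip: dict) -> None:
    booked.discard("hotel")              # semantic undo of the hotel step
    bus.publish("HotelCancelled", trip)


# Each service knows only which events it listens for, not the whole workflow.
bus.subscribe("TripRequested", hotel_service)
bus.subscribe("HotelReserved", flight_service)
bus.subscribe("FlightFailed", hotel_compensation)

bus.publish("TripRequested", {"trip_id": "t1", "flight_available": False})
```

The workflow here exists only implicitly in the subscriptions, which is what makes choreography loosely coupled and also harder to trace: reconstructing the saga means reading `bus.history` rather than a single orchestrator's code.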

A critical design challenge in sagas is idempotence. Network failures and retries can cause a local transaction or a compensation to be delivered more than once, so every step must produce the same result whether it executes once or several times. A "cancel hotel reservation" compensation that runs twice should neither fail nor issue a second refund. Sagas also require careful thought about isolation: because intermediate states are visible (the hotel is reserved before the flight is confirmed), other transactions can see and act on a partially completed saga. This visibility, together with undoing work by compensation rather than abort, is why sagas provide eventual consistency rather than the strict atomicity of 2PC. Designing compensations that are safe, idempotent, and still succeed when retried after a failure is the core engineering work.
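One common way to get idempotent steps is to key every request by a reservation ID and remember which IDs have already been handled. The sketch below assumes a hypothetical `HotelService` with in-memory state. The choice to record a cancellation even for an unknown ID, so that a late or duplicate reserve request is ignored, is one possible design and not the only one.

```python
from typing import Dict, Set


class HotelService:
    """Hypothetical hotel service whose reserve and cancel are idempotent."""

    def __init__(self) -> None:
        self.reservations: Dict[str, int] = {}  # reservation_id -> amount charged
        self.cancelled: Set[str] = set()        # IDs that were already cancelled
        self.refunded_total = 0

    def reserve(self, reservation_id: str, amount: int) -> None:
        # A duplicate delivery, or a reserve arriving after its cancel, is a no-op.
        if reservation_id in self.reservations or reservation_id in self.cancelled:
            return
        self.reservations[reservation_id] = amount

    def cancel(self, reservation_id: str) -> None:
        # Running the compensation a second time must not refund again.
        if reservation_id in self.cancelled:
            return
        amount = self.reservations.pop(reservation_id, 0)
        self.cancelled.add(reservation_id)
        self.refunded_total += amount


hotel = HotelService()
hotel.reserve("r-1", 200)
hotel.reserve("r-1", 200)  # duplicate delivery of the same request
hotel.cancel("r-1")
hotel.cancel("r-1")        # duplicate delivery of the compensation
```

After these calls the customer has been refunded exactly once, no matter how many times either message was delivered.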


