In eventually consistent systems, replicas temporarily hold different data. Read repair fixes inconsistencies on reads by comparing versions from replicas and writing back the newest; anti-entropy runs in the background (using Merkle trees or gossip) to find and fix divergent data without waiting for reads.
Design a scenario: a replica misses an update while offline. Trace through read repair (client reads from multiple replicas, resolves conflict, writes back) and anti-entropy (background process scans both replicas, finds mismatch, pushes correct value).
In an eventually consistent system, replicas are allowed to temporarily diverge — a write might reach replica A but not replica B due to a network partition, slow propagation, or a node being temporarily down. Your prerequisite knowledge of eventual consistency tells you that the system guarantees replicas will converge *eventually*, but it does not specify *how*. Read repair and anti-entropy are the two primary mechanisms that make convergence actually happen.
Read repair is an opportunistic strategy that piggybacks consistency fixing onto normal read operations. When a client reads a key, the coordinator contacts multiple replicas (often a quorum). If the replicas return different versions, the coordinator identifies the most recent version — typically using vector clocks or timestamps — and writes it back to the stale replicas before returning the result to the client. Think of it like checking a fact with three colleagues: if two say "version 5" and one says "version 4," you correct the one who is behind. The advantage is that frequently-read data stays consistent with no extra background work. The disadvantage is that data nobody reads can remain inconsistent indefinitely.
Anti-entropy fills that gap. It is a background process that systematically compares replicas and repairs any differences it finds, regardless of whether anyone is reading the data. The most common implementation uses Merkle trees — hash trees where each leaf represents a range of keys and each parent is the hash of its children. Two replicas can compare their Merkle tree roots in a single exchange; if the roots match, all data is consistent. If not, they recursively descend to identify exactly which key ranges differ, minimizing the amount of data transferred. This is far more efficient than comparing every key-value pair.
The two mechanisms complement each other and are typically deployed together. Read repair handles the hot path: popular keys that are read constantly get fixed almost immediately. Anti-entropy handles the cold path: keys that are rarely or never read still converge on a schedule — hourly, daily, or whatever the operator configures. Together, they give an eventually consistent system a practical convergence guarantee with tunable tradeoffs between repair speed, read latency, and background resource usage. Systems like Apache Cassandra and Amazon Dynamo use exactly this combination.