SAT Solving and Conflict-Driven Clause Learning

Research Depth 70 in the knowledge graph I know this Set as goal
Unlocks 6 downstream topics
sat cdcl dpll unit-propagation backjumping watched-literals clause-learning

Core Idea

Modern SAT solvers determine whether a propositional formula in conjunctive normal form (CNF) has a satisfying assignment. The DPLL algorithm (Davis-Putnam-Logemann-Loveland) searches by choosing variable assignments, propagating forced implications (unit propagation), and backtracking on contradictions. Conflict-Driven Clause Learning (CDCL) dramatically improves on DPLL by analyzing each conflict to learn a new clause that prevents the same mistake, then backjumping non-chronologically past irrelevant decisions. With additional techniques like watched literals for efficient propagation, restart strategies, and variable-state-independent decaying sum (VSIDS) heuristics, CDCL solvers routinely handle industrial instances with millions of variables -- despite SAT being NP-complete.

Explainer

The Boolean satisfiability problem (SAT) asks whether there exists an assignment of truth values to variables that makes a propositional formula true. The formula is typically given in conjunctive normal form (CNF): a conjunction of clauses, where each clause is a disjunction of literals (variables or their negations). Despite being the canonical NP-complete problem, SAT has become practically solvable for enormous structured instances thanks to the CDCL algorithm, which powers every competitive modern SAT solver (MiniSat, CaDiCaL, Kissat, Glucose).

The foundation is the DPLL algorithm (1962), which searches for a satisfying assignment through three operations: decide (choose an unassigned variable and tentatively assign it true or false), propagate (apply unit propagation -- if a clause has exactly one unassigned literal and all others are false, that literal must be true), and backtrack (when a conflict is reached -- some clause has all literals false -- undo the most recent decision and try the opposite). DPLL organizes the search as a binary decision tree, with unit propagation pruning branches where implications are forced. On its own, DPLL with chronological backtracking is complete but can explore exponentially large subtrees that could have been avoided.

Conflict-Driven Clause Learning (CDCL), developed through the GRASP (1996), Chaff (2001), and subsequent solvers, transforms DPLL with three key innovations. First, conflict analysis: when propagation leads to a conflict, the solver traces backward through the implication graph (a DAG recording which decisions and propagations led to each assignment) to identify the true cause of the conflict. It derives a learned clause -- a new clause that is logically implied by the original formula but which the solver did not originally have. This clause prevents the same combination of decisions from recurring. Second, non-chronological backjumping: instead of undoing just the most recent decision, the solver jumps back to the decision level identified by conflict analysis as the source of the problem, potentially skipping many irrelevant decision levels at once. Third, efficient data structures: the watched-literals scheme ensures that unit propagation examines only the clauses directly affected by each assignment, achieving near-constant amortized cost per propagation step.

Modern CDCL solvers add several additional techniques that contribute to their remarkable empirical performance. The VSIDS (Variable State Independent Decaying Sum) decision heuristic prioritizes variables that appeared in recent conflicts, focusing the search on the "active" part of the formula. Restarts periodically abandon the current search tree and begin again from scratch, but crucially, all learned clauses are retained -- the solver restarts with strictly more knowledge than it began with. Clause database management periodically deletes low-activity learned clauses to control memory usage while retaining the most useful conflict information. Phase saving remembers the last assigned polarity of each variable and reuses it after restarts, preserving locality in the search. Together, these techniques enable CDCL solvers to exploit the structure present in real-world industrial formulas, which is why they succeed on instances with millions of variables even though random SAT instances of a few hundred variables can be impractical.

SAT solving is the engine underneath much of formal methods. Bounded model checking encodes verification problems as SAT formulas. Symbolic execution uses SAT/SMT to determine path feasibility. Invariant generation techniques use SAT oracles for counterexample-guided refinement. Understanding how the solver works -- not just that it returns SAT or UNSAT -- is essential for formulating verification problems in solver-friendly ways and for diagnosing why a verification tool succeeds or fails on a given instance.

Practice Questions 4 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesLiteral EquationsSlope-Intercept FormPoint-Slope FormWriting Linear EquationsParallel and Perpendicular Line SlopesGraphing Linear EquationsPiecewise FunctionsStep FunctionsComposition of FunctionsInverse FunctionsRadical Functions and GraphsRational ExponentsExponential Functions and GraphsLogarithms IntroductionTime and Space ComplexityTime Complexity Classes: P and EXPTIMENondeterministic Time Complexity and NPThe P vs. NP ProblemComplexity Class P: Polynomial TimeComplexity Class NP: Nondeterministic Polynomial TimeNP-Completeness and Cook-Levin TheoremThe Cook-Levin TheoremBoolean Satisfiability, Cook-Levin, and ReductionsSAT Solving and Conflict-Driven Clause Learning

Longest path: 71 steps · 361 total prerequisite topics

Prerequisites (2)

Leads To (2)