Symbolic Execution

Research Depth 62 in the knowledge graph I know this Set as goal
Unlocks 4 downstream topics
symbolic-execution path-constraint smt concolic klee test-generation

Core Idea

Symbolic execution runs a program with symbolic values (placeholders like alpha, beta) instead of concrete inputs, tracking how these symbols propagate through computations and accumulating path constraints at each branch. When the program branches on a condition involving symbolic values, execution forks: one path adds the condition as a constraint, the other adds its negation. Each path's constraint set characterizes exactly which concrete inputs would follow that path. Feeding path constraints to an SMT solver generates concrete test inputs or proves path infeasibility. Symbolic execution systematically explores all program paths, enabling high-coverage test generation, bug finding, and bounded verification.

Explainer

Testing runs a program on specific concrete inputs and checks the output. This is fast but covers only the tested inputs — bugs on untested paths remain hidden. Symbolic execution inverts this by running the program on symbolic inputs (variables like alpha, beta rather than specific values), tracking how they flow through the computation. When the program computes y = x + 3, the symbolic executor records y = alpha + 3. When it branches on x > 5, it forks into two execution paths: one where alpha > 5 and one where alpha <= 5. Each path accumulates a path constraint — the conjunction of all branch conditions encountered along that path.

At any point, feeding the path constraint to an SMT solver (like Z3) either produces a concrete input satisfying all constraints (a test case that follows that exact path) or proves the constraints unsatisfiable (the path is infeasible — no concrete input can reach it). This gives symbolic execution two superpowers: automatic test generation (produce inputs that exercise specific paths, including hard-to-reach error paths) and bounded verification (if all paths to an error are explored and none is feasible, the error is unreachable within the explored scope).

The central challenge is path explosion: a program with n sequential branches has up to 2^n paths. Loops make this worse — a loop with a symbolic bound has a different path for each possible iteration count. Real programs have astronomically many paths, and exhaustive exploration is infeasible. Practical symbolic execution tools use heuristics to prioritize interesting paths (depth-first, coverage-guided, or directed toward specific targets), path merging (combining paths that converge to the same program point, using disjunctive constraints), and bounded analysis (exploring paths up to a fixed depth or loop unrolling bound).

Concolic execution (concrete + symbolic), pioneered by tools like DART and CUTE, addresses another limitation: pure symbolic execution cannot handle operations it cannot model symbolically (system calls, native library functions, complex pointer arithmetic). Concolic execution runs the program concretely with real inputs while simultaneously maintaining symbolic constraints for the operations it can model. It uses the concrete execution to drive progress and the symbolic constraints to generate alternative inputs. By negating one branch condition at a time, it systematically steers the program toward unexplored paths.

Practical impact has been substantial. KLEE (Stanford) found bugs in GNU Coreutils that had survived decades of testing and manual code review. SAGE (Microsoft) uses concolic execution to find security vulnerabilities in Windows applications and file parsers, having found roughly one-third of all security bugs discovered during Windows 7 development. S2E (EPFL) combines symbolic execution with whole-system analysis, testing entire software stacks including OS kernels. angr (UC Santa Barbara) applies symbolic execution to binary analysis, working without source code. The technique has become a standard tool in both security research and software testing, complementing model checking and deductive verification with a path-centric, test-generation-oriented approach.

Practice Questions 3 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsOperators and ExpressionsArithmetic Operators and Operator PrecedenceComparison Operators and Boolean TestsLogical Operators and Boolean AlgebraBoolean Algebra and Fundamental LawsCombinational Circuit DesignFlip-Flops and LatchesFinite State Machines (FSMs)Deterministic Finite Automata (DFA)Nondeterministic Finite Automata (NFA)Two-Way Finite AutomataNFA to DFA Conversion (Subset Construction)DFA Properties and Minimization AlgorithmsRegular Languages: Definition and CharacterizationContext-Free Grammars (CFGs)Context-Free Grammar Properties and AmbiguityParse Trees, Derivations, and Ambiguity in CFGsContext-Free Grammars in Compiler DesignCompiler Phases and OrganizationGrammar Design for CompilationDomain-Specific Language Design and ImplementationProgramming Language SemanticsSymbolic Execution

Longest path: 63 steps · 279 total prerequisite topics

Prerequisites (3)

Leads To (4)