A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Certified Compilation

Research Depth 95 in the knowledge graph

540 prerequisites beneath it

Core Idea

Certified compilation produces a compiler whose correctness is established by a machine-checked formal proof. A certified compiler guarantees that the compiled code behaves as the semantics of the source program allows: within the verified passes, miscompilation is ruled out by mathematical proof, not by testing. CompCert (the foundational example) is a certified C compiler whose compilation passes, including the optimizations, are formally verified in Coq to preserve program semantics. The proof establishes a simulation between source and compiled code: any observable behavior (input/output, termination, failure) of the compiled code is one that the source semantics permits, provided the source program is free of undefined behavior. This gives far stronger assurance than testing that the compiler will not introduce subtle bugs, relative to the formal semantics and the trusted proof checker.

Explainer

Every programmer knows the frustration: a compiler bug sneaks a miscompilation into production code. The program works correctly in isolation but fails in specific contexts because the compiler incorrectly optimized or transformed it. Compiler bugs are rare but devastating because they strike at a level of abstraction the programmer trusts completely. Certified compilation solves this problem at its root: prove mathematically that the compiler is correct.

CompCert, developed by Xavier Leroy and colleagues, demonstrated that certified compilation is practical. CompCert compiles a large subset of C to several target architectures (including x86, PowerPC, ARM, and RISC-V), with its transformation passes proven correct in Coq. The proof is machine-checked, meaning no argument is accepted unless the Coq kernel formally verifies it. The result is a compiler with a precise guarantee: for a C program free of undefined behavior, any observable behavior (input/output, termination, failure) of the code compiled by CompCert is a behavior that the formal C semantics, which can be pictured as a reference interpreter, allows for that program.

The approach has two main components:

1. Formal semantics: Both the source language (C) and target language (assembly) are formalized mathematically. This is not a vague description but a precise definition of every operation, rule, and edge case. For C, this includes pointer operations, type conversions, memory layout, and undefined behavior boundaries. For the target assembly, it includes instruction execution, memory access, and register semantics.

2. Verified transformations: Each compiler pass (parsing, type checking, optimization, register allocation, code generation) is implemented and proven to preserve semantics. A proof of semantic preservation for a pass P says: for every input program S whose behavior is well defined under the semantics of the pass's input language, the output P(S) is well defined under the semantics of the pass's output language and exhibits the same observable behavior.
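
The two components above can be made concrete on a toy scale. The sketch below is an illustration under stated assumptions, not CompCert code: a hypothetical expression language (`Const`, `Add`, `Mul`) plays the role of the source, a hypothetical stack machine plays the role of the target, and `preserved_on_sample` checks semantic preservation by exhaustive testing over small programs. A certified compiler replaces that finite check with a proof covering every program.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple, Union


# Hypothetical source language: integer expressions.
@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Add, Mul]
Instr = Tuple[str, int]  # ("push", n), ("add", 0) or ("mul", 0)


def eval_source(e: Expr) -> int:
    """Source semantics: a reference interpreter for expressions."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Add):
        return eval_source(e.left) + eval_source(e.right)
    return eval_source(e.left) * eval_source(e.right)


def run_target(code: List[Instr]) -> int:
    """Target semantics: a stack machine with push/add/mul."""
    stack: List[int] = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
    return stack.pop()


def compile_expr(e: Expr) -> List[Instr]:
    """The compiler pass: post-order translation to stack code."""
    if isinstance(e, Const):
        return [("push", e.value)]
    op = "add" if isinstance(e, Add) else "mul"
    return compile_expr(e.left) + compile_expr(e.right) + [(op, 0)]


def all_exprs(depth: int) -> List[Expr]:
    """Every expression up to the given nesting depth over leaves 0, 1, 2."""
    leaves: List[Expr] = [Const(n) for n in (0, 1, 2)]
    if depth == 0:
        return leaves
    smaller = all_exprs(depth - 1)
    return leaves + [
        node(l, r) for node in (Add, Mul) for l, r in product(smaller, smaller)
    ]


def preserved_on_sample(depth: int = 2) -> bool:
    """Tested (not proven) preservation: target result equals source result."""
    return all(
        run_target(compile_expr(e)) == eval_source(e) for e in all_exprs(depth)
    )
```

For instance, `compile_expr(Add(Const(2), Mul(Const(3), Const(4))))` yields the stack code push 2, push 3, push 4, mul, add, and both `eval_source` on the expression and `run_target` on that code give 14.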

The key insight is the simulation relation (or bisimulation): a formal notion of "same behavior." A backward simulation says that for every step of the target program (compiled), there is a corresponding step (or sequence of steps) of the source program, and the states remain "equivalent" throughout execution. This equivalence is carefully defined to ignore irrelevant differences (variable names, intermediate machine states) while preserving observable behaviors.
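
A simulation diagram can be sketched on an equally small scale. The example below is a hypothetical toy, not CompCert's definition: a source program is a list of non-negative amounts, each added in one source step, while the target only has an `inc` instruction, so one source step corresponds to zero or more target steps. The function `simulates` walks the source execution and checks that the target can always catch up to a state related by `matches`. This checks the forward direction (source step matched by target steps); for a deterministic target such as this one, CompCert-style arguments derive the backward direction described above from it.

```python
from typing import List, Optional, Tuple

State = Tuple[int, int]  # (program counter, accumulator)


def src_step(prog: List[int], s: State) -> Optional[State]:
    """One source step adds a whole amount; None means the program halted."""
    pc, acc = s
    if pc >= len(prog):
        return None
    return (pc + 1, acc + prog[pc])


def tgt_step(code: List[str], t: State) -> Optional[State]:
    """One target step executes a single 'inc'; None means halted."""
    pc, acc = t
    if pc >= len(code):
        return None
    return (pc + 1, acc + 1)


def compile_prog(prog: List[int]) -> List[str]:
    """Assumes every amount is non-negative: adding k becomes k 'inc's."""
    return ["inc"] * sum(prog)


def matches(prog: List[int], s: State, t: State) -> bool:
    """The simulation relation: the program counters correspond and the
    observable accumulator is equal. Intermediate target states, which sit
    in the middle of an addition, are related to no source state."""
    return t[0] == sum(prog[: s[0]]) and t[1] == s[1]


def simulates(prog: List[int], code: List[str]) -> bool:
    """Check the diagram along the execution from the initial states."""
    s: State = (0, 0)
    t: State = (0, 0)
    if not matches(prog, s, t):
        return False
    while True:
        s_next = src_step(prog, s)
        if s_next is None:
            # Source halted: the target must halt in the matching state too.
            return tgt_step(code, t) is None
        # Let the target take zero or more steps until the states match again.
        while not matches(prog, s_next, t):
            t_next = tgt_step(code, t)
            if t_next is None:
                return False  # target got stuck before catching up
            t = t_next
        s = s_next
```

With `prog = [2, 0, 3]`, `simulates(prog, compile_prog(prog))` returns `True`, whereas handing it deliberately wrong target code such as `["inc"] * 4` returns `False`, because the target halts before reaching a state that matches the source.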

Practical implications:

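No miscompilation bugs: Unlike conventional compilers, whose correctness rests on testing against large benchmark suites, CompCert's verified passes are correct by proof with respect to the formal semantics. A miscompilation cannot originate in a verified transformation unless the specification or the proof checker itself is wrong; any other failure must lie outside the verified scope (e.g., in unverified parts such as preprocessing, assembling, or linking).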

Scope tradeoff: CompCert doesn't support every C feature (for example, variable-length arrays are not supported, and programs that exhibit undefined behavior fall outside the guarantee) and doesn't target every architecture. But the subset it covers is suitable for systems programming and critical applications.

Performance: CompCert includes optimizations (constant propagation, dead code elimination, common subexpression elimination, function inlining). Code generated by CompCert is reported to run at roughly 80-90% of the speed of code produced by GCC with optimizations on many benchmarks, suggesting that certified compilation need not sacrifice much performance.

Industrial applicability: CompCert has been used to compile software for embedded systems, aerospace, and other critical infrastructure. The guarantee that the verified passes introduce no bugs provides enormous value in high-assurance contexts.
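
To make the optimization passes listed under Performance more tangible, here is a minimal sketch of one verified-style pass. It shows constant folding, a simpler relative of constant propagation, over a hypothetical expression type (`Num`, `Var`, `Plus`). None of these names come from CompCert. The helper `agrees_on` tests that the optimized term evaluates to the same value as the original in a handful of environments; a certified compiler would instead prove this for all terms and all environments.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Plus:
    left: "Term"
    right: "Term"


Term = Union[Num, Var, Plus]


def evaluate(t: Term, env: Dict[str, int]) -> int:
    """Semantics shared by the input and output of the pass."""
    if isinstance(t, Num):
        return t.value
    if isinstance(t, Var):
        return env[t.name]
    return evaluate(t.left, env) + evaluate(t.right, env)


def fold_constants(t: Term) -> Term:
    """The optimization: replace additions of two literals by their sum."""
    if isinstance(t, Plus):
        left, right = fold_constants(t.left), fold_constants(t.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return Plus(left, right)
    return t


def agrees_on(t: Term, envs: Iterable[Dict[str, int]]) -> bool:
    """Tested (not proven) preservation for one term in the given environments."""
    optimized = fold_constants(t)
    return all(evaluate(t, env) == evaluate(optimized, env) for env in envs)
```

For example, `fold_constants(Plus(Var("x"), Plus(Num(2), Num(3))))` returns `Plus(Var("x"), Num(5))`, and the two terms evaluate to the same value for any binding of `x`.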

Beyond CompCert:

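seL4 microkernel: The seL4 operating system kernel (a microkernel used in military/critical systems) was verified in Isabelle/HOL. The proof was later extended to the compiled binary by validating the compiler's output against the verified C code, rather than by using a certified compiler.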
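Cryptol: A domain-specific language for cryptographic specifications. Its tooling is generally used to check that implementations (for example in C) are equivalent to a Cryptol specification, which verifies individual implementations rather than providing a certified compiler.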
CakeML: A dialect of Standard ML with a fully certified compiler from source to machine code, verified in HOL4.

The cost of certified compilation is development effort: CompCert took many years and extensive formalization effort. But as proof automation improves and certified compiler frameworks are reused, the cost is decreasing. The benefit, a machine-checked guarantee of compiler correctness, is increasingly valuable for critical systems where a single bug can have catastrophic consequences.

The fundamental question certified compilation raises is: how much can we trust automation? The answer CompCert provides is: we can trust a compiler to the extent that we formalize its behavior and mechanically verify it, with the remaining trust resting on the specification and the proof checker. This is not a rejection of testing or engineering rigor but a complement: formal proof for the parts we can formalize, testing and review for the parts we cannot.

Prerequisites (3)

Operational Semantics (hard) · Interactive Theorem Proving (hard) · Program Synthesis (soft)

Leads To (0)

No topics depend on this one yet.