Certified Compilation

Research Depth 69 in the knowledge graph I know this Set as goal
verified-compiler compcert semantics-preservation bisimulation compiler-correctness proof-assistant

Core Idea

Certified compilation produces a compiler whose behavior is proven correct by a machine-checked formal proof. A certified compiler guarantees that the compiled code behaves identically to the source program — miscompilation bugs are impossible by mathematical proof, not by testing. CompCert (the foundational example) is a certified C compiler where every optimization pass is formally verified in Coq to preserve program semantics. The proof demonstrates a simulation or bisimulation between source and compiled code: any observable behavior (input/output, termination, failure) of the source is faithfully reproduced by the compiled code. This provides absolute assurance that the compiler will not introduce subtle bugs that testing might miss.

Explainer

Every programmer knows the frustration: a compiler bug sneaks a miscompilation into production code. The program works correctly in isolation but fails in specific contexts because the compiler incorrectly optimized or transformed it. Compiler bugs are rare but devastating because they strike at a level of abstraction the programmer trusts completely. Certified compilation solves this problem at its root: prove mathematically that the compiler is correct.

CompCert, developed by Xavier Leroy and colleagues, demonstrated that certified compilation is practical. CompCert is a compiler from a subset of C to multiple target architectures (x86, PowerPC, ARM), with every transformation pass proven correct in Coq. The proof is machine-checked, meaning no argument is accepted unless the Coq kernel formally verifies it. The result is a compiler with an absolute guarantee: any observable behavior (input/output, termination, failure) of a C program compiled by CompCert will be identical to the behavior if that program were executed by a reference interpreter of C semantics.

The approach has two main components:

1. Formal semantics: Both the source language (C) and target language (assembly) are formalized mathematically. This is not a vague description but a precise definition of every operation, rule, and edge case. For C, this includes pointer operations, type conversions, memory layout, and undefined behavior boundaries. For the target assembly, it includes instruction execution, memory access, register semantics.

2. Verified transformations: Each compiler pass (parsing, type checking, optimization, code generation, register allocation) is implemented and proven to preserve semantics. A proof of semantic preservation for a pass P says: given a source program S that satisfies source semantics, the output P(S) satisfies target semantics and exhibits identical observable behavior.

The key insight is the simulation relation (or bisimulation): a formal notion of "same behavior." A backward simulation says that for every step of the target program (compiled), there is a corresponding step (or sequence of steps) of the source program, and the states remain "equivalent" throughout execution. This equivalence is carefully defined to ignore irrelevant differences (variable names, intermediate machine states) while preserving observable behaviors.

Practical implications:

Beyond CompCert:

The cost of certified compilation is development effort: CompCert took many years and extensive formalization effort. But as proof automation improves and certified compiler frameworks are reused, the cost is decreasing. The benefit — absolute assurance of compiler correctness — is increasingly valuable for critical systems where a single bug can have catastrophic consequences.

The fundamental question certified compilation raises is: how much can we trust automation? The answer CompCert provides is: we can trust compilers completely if we formalize their behavior and mechanically verify it. This is not a rejection of testing or engineering rigor but a complement: formal proof for the parts we can formalize, testing and review for the parts we cannot.

Practice Questions 4 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesLiteral EquationsSlope-Intercept FormPoint-Slope FormWriting Linear EquationsParallel and Perpendicular Line SlopesGraphing Linear EquationsPiecewise FunctionsStep FunctionsComposition of FunctionsInverse FunctionsRadical Functions and GraphsRational ExponentsExponential Functions and GraphsLogarithms IntroductionTime and Space ComplexityAmortized AnalysisHash TablesSymbol Tables and Scope ResolutionSemantic Analysis PhaseType Systems OverviewCurry-Howard CorrespondenceInteractive Theorem ProvingCertified Compilation

Longest path: 70 steps · 405 total prerequisite topics

Prerequisites (3)

Leads To (0)

No topics depend on this one yet.