Compiler Bootstrapping and Self-Hosting

Graduate Depth 61 in the knowledge graph I know this Set as goal
bootstrapping compilation self-hosting

Core Idea

A bootstrap compiler is a compiler written in its own language. Building one requires an initial compiler in another language to compile the bootstrapping version; the bootstrapped compiler then compiles itself, enabling improvements with each iteration and providing a reference implementation.

Explainer

There is a seemingly paradoxical question at the heart of compiler design: if you want to write a compiler for language X in language X, what compiles the compiler? This is the bootstrapping problem, and understanding it requires thinking carefully about the distinction between the language a compiler is *written in* and the language it *compiles*. From your study of compiler phases, you know a compiler is just a program — it reads source code, transforms it through a pipeline of phases, and emits target code. The language it is written in is independent of the language it processes.

The classic solution involves three stages, sometimes called a bootstrap chain. First, you write a minimal compiler for language X in some existing language Y — say, you write a C compiler for a subset of X. This compiler does not need to be fast, elegant, or complete; it just needs to correctly compile enough of X to be useful. Second, you rewrite the compiler in X itself, using only the subset that your stage-one compiler supports. Third, you compile this X-written compiler using the stage-one compiler. The result is a binary — produced by Y's toolchain — that can compile X source code. Now you feed the X-written compiler source to *itself* (the binary you just produced), and out comes a new binary that was compiled by an X compiler. This new binary is the self-hosting compiler.

The process can be repeated iteratively. Each time you improve the compiler's source code (adding optimizations, fixing bugs, supporting new language features), you compile the new source with the previous version of the compiler. The improvements compound: a compiler that generates better code will, when used to compile itself, produce a faster compiler binary, which in turn compiles the next version faster. This iterative self-improvement is one of the elegant properties of bootstrapping. In practice, compiler developers often maintain a "known good" binary of the previous compiler version specifically for this purpose.

Bootstrapping also provides a powerful correctness check. If you compile the compiler source with version N to produce binary N+1, and then compile the same source with binary N+1 to produce binary N+2, then binaries N+1 and N+2 should be identical — a property called fixed-point reproducibility. If they differ, something is wrong: either the compiler has a bug that manifests differently depending on which binary compiled it, or there is nondeterminism in the compilation process. This technique was used historically to validate early C compilers and remains a standard practice. Real-world examples include GCC (originally bootstrapped from a Pastel compiler), Rust (bootstrapped from an OCaml implementation), and Go (which was rewritten from C to Go and bootstrapped through a translation tool).

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsOperators and ExpressionsArithmetic Operators and Operator PrecedenceComparison Operators and Boolean TestsLogical Operators and Boolean AlgebraBoolean Algebra and Fundamental LawsCombinational Circuit DesignFlip-Flops and LatchesBinary Counters: Design and AnalysisBinary ArithmeticFixed-Point Number RepresentationTwo's Complement RepresentationOverflow and Underflow DetectionBinary Adders: Half-Adders and Full-AddersFull Adder and Carry PropagationCarry Lookahead Adder DesignHalf Adder Circuit DesignMultiplication Circuit DesignSequential Circuit DesignRegisters and Register FilesInstruction Set Architecture (ISA)Assembly Language BasicsCompiler Bootstrapping and Self-Hosting

Longest path: 62 steps · 298 total prerequisite topics

Prerequisites (2)

Leads To (0)

No topics depend on this one yet.