Questions: Compiler Bootstrapping and Self-Hosting
5 questions to test your understanding
Score: 0 / 5
Question 1 Multiple Choice
In the three-stage bootstrap chain for language X, what is the essential role of the stage-1 compiler written in language Y?
AIt serves as the permanent production compiler for language X
BIt compiles enough of X to produce a binary that can then compile the X-written compiler, breaking the circularity
CIt validates that the X-written compiler is semantically correct by comparing outputs
DIt translates the X-written compiler into assembly language for direct execution
The stage-1 compiler need not be fast, complete, or elegant — it only needs to compile enough of language X to process the source code of the X-written compiler. This produces the first binary that can compile X source. That binary then compiles itself, producing a self-hosting compiler whose lineage traces back to the Y-written stage-1 tool. Without this intermediate step, there is no way to turn the X-written compiler source into an executable — you cannot run source code without a compiler, and you cannot have a compiler without being able to run it.
Question 2 Multiple Choice
A compiler team compiles their compiler source with version N to produce binary N+1, then compiles the same source with binary N+1 to produce binary N+2. What property do they expect to hold, and what does a violation indicate?
ABinary N+2 should be larger than N+1, indicating the compiler is generating more optimized code
BBinaries N+1 and N+2 should be identical (fixed-point reproducibility); a difference indicates a compiler bug or nondeterminism
CBinary N+2 should be faster at runtime than N+1, since it was compiled by a better compiler
DThe source code should change between compilations as the compiler auto-optimizes itself
Fixed-point reproducibility: if the compiler correctly implements the language, compiling the same source with itself should produce an identical binary regardless of which previous version did the compiling. Differences between N+1 and N+2 mean the compiler behaves differently depending on what compiled it — a sign of a latent bug. This technique was used to validate early C compilers and remains a standard correctness check. It is elegant precisely because the compiler is its own test case.
Question 3 True / False
A self-hosting compiler is one that can compile its own source code, meaning the language it is written in and the language it compiles are the same.
TTrue
FFalse
Answer: True
Self-hosting means the compiler accepts its own source code as input and produces a working binary — the compiler compiles itself. This is distinct from a language being self-hosting (a broader property) or a compiler being optimal (self-hosting says nothing about efficiency). GCC, Rust, Go, and many production compilers are self-hosting. Achieving self-hosting is significant because it validates that the compiler can handle the full complexity of the language it targets — including all the features used in the compiler's own implementation.
Question 4 True / False
When developing a new version of a self-hosting compiler, you is expected to rewrite the compiler from scratch using an external language each time, because the previous version can seldom compile new language features it doesn't yet understand.
TTrue
FFalse
Answer: False
You compile the new version's source code using the previous version of the compiler. The trick is to write new features using only syntax that the old compiler already understands — you can add new capabilities to the compiler without needing the compiler to already support those capabilities. Once the new version compiles and produces a binary, that binary (which implements the new features) can then compile the same source again, producing a fully featured self-hosting binary. This iterative refinement is the normal workflow.
Question 5 Short Answer
Explain the apparent paradox of bootstrapping — 'how can a compiler for language X be written in language X?' — and how the bootstrap chain resolves it.
Think about your answer, then reveal below.
Model answer: The apparent paradox dissolves once you distinguish between the language a compiler is written in and the language it compiles — these are independent. You don't need a compiler for X to write source code in X; you need one to execute it. The bootstrap chain resolves the circularity in stages: first write a minimal X compiler in another language Y, use it to compile an X-written compiler, then use that binary to compile itself. The resulting self-hosting compiler has no circular dependency — it was produced by a known-good toolchain.
The key insight is that the bootstrapping process has a beginning: a compiler in an existing language Y that can process enough of X. This initial compiler is the 'seed' that breaks the circularity. Each subsequent stage uses the previous binary, so there is always a concrete, executable program doing the compilation — never a circular dependency. Historical examples include GCC (from Pastel), Rust (from OCaml), and Go (from a C implementation later translated to Go).