A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Intermediate Code Representation

Graduate · Depth 92 in the knowledge graph

34 topics build on this

502 prerequisites beneath it


intermediate-representation ir compilation-phases

Core Idea

Intermediate representation (IR) is an abstraction between source and target languages. Common forms include three-address code (TAC), register-transfer language (RTL), and bytecode. IR simplifies optimization and retargeting: optimize once on IR, then generate code for multiple targets. IR abstracts away source-language details and target-machine specifics, enabling machine-independent transformations.

Explainer

After semantic analysis, you have an AST annotated with types and scope information — a tree that faithfully represents the structure of the source program. But an AST is a poor target for optimization and code generation: its structure mirrors the source language's syntax, not the machine's execution model, and tree transformations are awkward for the linear, instruction-by-instruction reasoning that optimization requires. Intermediate representation is the bridge: a language-neutral, machine-neutral format that is low-level enough to reason about execution but high-level enough to support powerful transformations before committing to any specific target architecture.

The most common IR form is three-address code (TAC), where every instruction has at most one operator and at most three addresses, typically two source operands and one destination: `t1 = a + b`, `t2 = t1 * c`, `if t2 > 0 goto L1`. Complex source expressions are decomposed into sequences of simple operations using temporary variables. The expression `a + b * c - d` becomes something like `t1 = b * c; t2 = a + t1; t3 = t2 - d`. This flat, explicit form makes data flow visible, since you can see exactly which temporaries feed into which operations, and it is easy to analyze for optimization. Control flow constructs like loops and conditionals become explicit labels and goto instructions, making the control flow graph straightforward to extract.
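
The lowering described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular compiler does it: it borrows Python's own `ast` module as a stand-in frontend, handles only names, constants, and four binary operators, and the helper name `to_tac` is made up for this example.

```python
import ast

# Map Python AST operator node types to the symbols used in the TAC text.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_tac(expr: str) -> list[str]:
    """Lower an arithmetic expression to three-address code (hypothetical helper)."""
    code: list[str] = []

    def lower(node: ast.expr) -> str:
        # Leaves are already valid operands, so no instruction is emitted.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            # Post-order walk: lower both children, then emit one instruction.
            left = lower(node.left)
            right = lower(node.right)
            temp = f"t{len(code) + 1}"  # one fresh temporary per instruction
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    lower(ast.parse(expr, mode="eval").body)
    return code

print("; ".join(to_tac("a + b * c - d")))
# prints: t1 = b * c; t2 = a + t1; t3 = t2 - d
```

The post-order traversal is what guarantees that each operand is computed before the instruction that uses it, so operator precedence in the tree turns into instruction order in the flat code.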

The strategic value of IR is that it solves the m × n problem. Without a shared IR, supporting m source languages and n target machines requires m × n separate translators. With a common IR, you need only m frontends (source → IR) and n backends (IR → machine code), for m + n components in total; with 5 languages and 4 targets, that is 9 components instead of 20. This is why LLVM's IR is so influential: any language frontend that emits LLVM IR gets access to LLVM's optimization passes and its target backends, including x86, ARM, and WebAssembly. The same principle applies at a smaller scale within a single compiler: machine-independent optimizations like constant folding, dead code elimination, and common subexpression elimination are written once on the IR and apply regardless of which source language produced it or which target will consume it.
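
To make "written once on the IR" concrete, here is a small sketch of constant folding over three-address code. The tuple layout `(dest, op, arg1, arg2)` and the function name `fold_constants` are assumptions made for this example, not a standard format. The pass also assumes straight-line code in which each name is assigned once and folded temporaries are not needed outside the block.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Fold instructions whose operands are both known constants.

    Assumes straight-line code, each name assigned once (hypothetical IR).
    """
    known = {}  # temporaries whose value was computed at compile time
    out = []
    for dest, op, a, b in code:
        # Substitute any operand whose constant value is already known.
        a = known.get(a, a)
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)   # evaluate now, emit nothing
        else:
            out.append((dest, op, a, b))  # keep, with constants propagated
    return out

code = [
    ("t1", "*", 2, 3),
    ("t2", "+", "x", "t1"),
    ("t3", "-", "t1", 1),
    ("t4", "*", "t2", "t3"),
]
print(fold_constants(code))
# prints: [('t2', '+', 'x', 6), ('t4', '*', 't2', 5)]
```

Nothing in the pass refers to the source language that produced the tuples or to the machine that will run them, which is the sense in which such an optimization is machine-independent.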

Different compilers use IRs at different abstraction levels, and some use several levels in sequence. A high-level IR might preserve loop structure and array indexing; a low-level IR might expose individual memory loads, stores, and register-like temporaries. Static Single Assignment (SSA) form, in which each variable has exactly one definition in the program text, is a particularly powerful IR variant: because every use refers to a single definition, many optimization analyses become simpler. Bytecode formats like the JVM's or Python's are also IRs, executed by a virtual machine (interpreted or, in some implementations, JIT-compiled) rather than compiled ahead of time to native code. The choice of IR shapes which optimizations are easy to express: design the right intermediate language, and many optimizations become far simpler to write.
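
The single-assignment idea can be illustrated by renaming variables in straight-line code. This is only a sketch under stated assumptions: the `(dest, op, arg1, arg2)` tuple layout and the `name_N` versioning scheme are invented for the example, and version 0 denotes a value that enters the block from outside. Code with branches would additionally need phi functions where control-flow paths merge, which this sketch does not attempt.

```python
def to_ssa(code):
    """Rename straight-line three-address code so each name is defined once.

    Hypothetical helper: no control flow, so no phi functions are inserted.
    """
    version = {}  # name -> number of definitions seen so far

    def use(operand):
        # Constants pass through; names refer to their latest definition.
        if isinstance(operand, str):
            return f"{operand}_{version.get(operand, 0)}"
        return operand

    def define(name):
        version[name] = version.get(name, 0) + 1
        return f"{name}_{version[name]}"

    out = []
    for dest, op, a, b in code:
        a, b = use(a), use(b)  # rename uses before the new definition
        out.append((define(dest), op, a, b))
    return out

code = [
    ("x", "+", "a", "b"),
    ("x", "*", "x", 2),
    ("y", "-", "x", "a"),
]
for dest, op, a, b in to_ssa(code):
    print(f"{dest} = {a} {op} {b}")
# prints:
# x_1 = a_0 + b_0
# x_2 = x_1 * 2
# y_1 = x_2 - a_0
```

After renaming, the two assignments to `x` are distinct names, so the use in the last instruction points unambiguously at `x_2`.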


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → 
Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Boolean Algebra and Fundamental Laws → Logic Gates Fundamentals → Implementing Boolean Functions with Gates → Karnaugh Map Simplification → Combinational Circuit Design → Flip-Flops and Latches → Finite State Machines (FSMs) → Deterministic Finite Automata (DFA) → Nondeterministic Finite Automata (NFA) → Two-Way Finite Automata → NFA to DFA Conversion (Subset Construction) → DFA Properties and Minimization Algorithms → Regular Languages: Definition and Characterization → Context-Free Grammars (CFGs) → Context-Free Grammar Properties and Ambiguity → Parse Trees, Derivations, and Ambiguity in CFGs → Context-Free Grammars in Compiler Design → Abstract Syntax Trees (ASTs) → Symbol Tables and Scope Resolution → Semantic Analysis Phase → Intermediate Code Representation

Longest path: 93 steps · 502 total prerequisite topics

Prerequisites (2)

Semantic Analysis Phase (hard)
Abstract Syntax Trees (ASTs) (hard)

Leads To (7)

Basic Block Analysis (hard)
Bytecode Intermediate Representation and Virtual Machines (hard)
Code Generation from IR (hard)
Control Flow Graphs (hard)
Multi-Stage Programming and Staged Compilation (hard)
Partial Evaluation and Program Specialization (hard)
Static Single Assignment (SSA) Form (hard)