Bytecode is a compact, machine-independent intermediate representation executed by a virtual machine. The compiler targets bytecode for portability, and the VM either interprets it (slow but flexible) or JIT-compiles it to native code (fast). The result is a trade-off between deployment simplicity and runtime performance.
From your study of intermediate code representations, you know that compilers typically lower source code into an IR that is easier to optimize and translate than raw syntax but more abstract than machine code. Bytecode is a specific kind of IR designed primarily for direct execution by a virtual machine (VM) implemented in software, rather than as input to further ahead-of-time compilation passes. Where a traditional compiler's IR is a waypoint on the path to native machine code, bytecode is often the final artifact the compiler ships. Java's `.class` files, Python's `.pyc` files, and .NET's Common Intermediate Language (the compilation target of C#) are all bytecode formats that run on their respective VMs rather than directly on hardware.
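To make the idea concrete, the following minimal sketch uses the standard-library `dis` module to list the bytecode that CPython generates for a small function. The function `add` is a hypothetical example, and the exact opcode names depend on the CPython version, so the sketch prints them rather than assuming specific values.

```python
import dis


def add(a, b):
    # Hypothetical example function; any small function works.
    return a + b


# Collect the opcode names the CPython compiler produced for `add`.
# The names differ between CPython versions, so none are hard-coded here.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

Whatever the version, the listing should contain instructions that load the two arguments, a binary arithmetic instruction, and a return instruction, which is the load/operate/return shape described above.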
Bytecode instructions resemble machine instructions (load a value, add two numbers, jump to an address), but they target an idealized abstract machine rather than any specific processor. Most bytecode VMs use a stack-based architecture: instead of naming registers, instructions push values onto and pop values off an operand stack. An "add" instruction pops two values, adds them, and pushes the result. This design keeps each instruction compact (there are no register operands to encode) and makes the compiler simpler, since it does not need to perform register allocation. Some VMs, such as Lua's and Dalvik (Android), use a register-based architecture instead, which needs fewer instructions for the same work at the cost of wider encodings. The trade-off is direct: stack bytecode has smaller instructions and is simpler to emit, while register bytecode dispatches fewer instructions for the same computation.
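The difference between the two architectures can be sketched by compiling the same expression, `a + b * c`, both ways. The instruction names (`PUSH`, `ADD`, `MUL`), the tuple encoding, and the temporary registers `t0`, `t1` are all invented for illustration and do not correspond to any real VM's format. The register version also assumes that variables already live in named registers.

```python
from itertools import count


def compile_stack(expr):
    """Emit stack bytecode: operands are implicit (the top of the operand stack)."""
    if isinstance(expr, str):              # variable reference
        return [("PUSH", expr)]
    op, left, right = expr
    return compile_stack(left) + compile_stack(right) + [(op,)]


def compile_register(expr, code, temps):
    """Emit register bytecode: each instruction names its destination and sources."""
    if isinstance(expr, str):              # assumed to already sit in a register
        return expr
    op, left, right = expr
    lhs = compile_register(left, code, temps)
    rhs = compile_register(right, code, temps)
    dst = f"t{next(temps)}"                # allocate a fresh temporary register
    code.append((op, dst, lhs, rhs))
    return dst


expr = ("ADD", "a", ("MUL", "b", "c"))     # a + b * c

stack_code = compile_stack(expr)
register_code = []
compile_register(expr, register_code, count())

print(stack_code)      # five short instructions
print(register_code)   # two wide instructions
```

Under these assumptions the stack form needs five instructions with at most one operand each, while the register form needs two instructions with three operands each, which is the size-versus-count trade-off in miniature.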
The simplest VM implementation is a bytecode interpreter, typically structured as a loop around a large switch statement: fetch the next instruction, dispatch to the appropriate case, execute it, repeat. This design is portable, because the same bytecode runs on any platform with a VM implementation, but slow, because every bytecode instruction pays the overhead of the fetch-decode-dispatch loop. Measured against native code, pure interpretation is typically 10–100× slower.
This is where your knowledge of JIT compilation becomes essential. A JIT compiler monitors which bytecode functions execute frequently ("hot" functions) and compiles them to native machine code at runtime. The first few executions of a function are interpreted, which keeps startup fast; once the function has been compiled, subsequent calls run at near-native speed. This gives bytecode VMs the portability of interpretation with performance approaching that of ahead-of-time compilation.
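The fetch-decode-dispatch loop described above can be sketched as a tiny stack interpreter. The opcode set (`PUSH`, `ADD`, `MUL`, `HALT`) and the `(opcode, operand)` encoding are assumptions made for this example, and an `if`/`elif` chain stands in for the switch statement a C implementation would use.

```python
def run(code):
    """Interpret a list of (opcode, operand) pairs; return the top of the stack."""
    stack = []
    pc = 0                                 # index of the next instruction
    while pc < len(code):
        op, arg = code[pc]                 # fetch and decode
        pc += 1
        if op == "PUSH":                   # dispatch on the opcode
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1]


# (2 + 3) * 4 expressed in the hypothetical instruction set above.
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
    ("HALT", None),
]
result = run(program)
print(result)  # 20
```

Every instruction, however trivial, goes through the same fetch, compare, and branch sequence before doing any useful work, which is the per-instruction overhead that a JIT compiler removes for hot code.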
Modern VMs combine interpretation, JIT compilation, and runtime profiling into a tiered system. The V8 engine (JavaScript) starts with a fast interpreter (Ignition), profiles execution, then JIT-compiles hot paths with an optimizing compiler (TurboFan) that uses the profiling data to make speculative optimizations. If an assumption is violated (for example, a variable that was always an integer suddenly receives a string), the VM deoptimizes: it falls back to interpreted bytecode and re-profiles. This adaptive approach means bytecode VMs can sometimes outperform statically compiled code, because they optimize for the behavior actually observed at runtime rather than relying on conservative static analysis.
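The control flow of tiering, speculation, and deoptimization can be modeled in a few lines. This is a toy simulation and not how V8 or any real engine is implemented: the class `TieredAdd`, the constant `HOT_THRESHOLD`, and the integer-only speculation are all hypothetical, and the "optimized" path here is not actually faster in Python. It only shows when a function is promoted, what the guard checks, and what happens when the guard fails.

```python
HOT_THRESHOLD = 3  # hypothetical value; real VMs use tuned, richer heuristics


class TieredAdd:
    """Toy model of one function moving between a baseline and an optimized tier."""

    def __init__(self):
        self.calls = 0              # calls seen in the baseline tier
        self.seen_only_ints = True  # type profile gathered while interpreting
        self.optimized = False
        self.deopts = 0

    def _baseline(self, a, b):
        # Baseline tier: handles every type and records what it observes.
        if type(a) is not int or type(b) is not int:
            self.seen_only_ints = False
        return a + b

    def __call__(self, a, b):
        if self.optimized:
            if type(a) is int and type(b) is int:  # guard for the speculation
                return a + b                       # stands in for specialized code
            # Guard failed: discard the optimized tier and start profiling again.
            self.optimized = False
            self.deopts += 1
            self.calls = 0
            self.seen_only_ints = True
        self.calls += 1
        result = self._baseline(a, b)
        if self.calls >= HOT_THRESHOLD and self.seen_only_ints:
            self.optimized = True                  # promote: speculate on integers
        return result


add = TieredAdd()
warmup = [add(i, i) for i in range(3)]  # baseline calls; the third one promotes
tier_after_warmup = add.optimized
fast = add(7, 8)                        # guard holds, optimized tier is used
slow = add("a", "b")                    # guard fails, function is deoptimized
print(warmup, tier_after_warmup, fast, slow, add.optimized, add.deopts)
```

In this model the string call still returns the correct result, because deoptimization hands execution back to the general baseline path; correctness never depends on the speculation being right.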