JIT Compilation
Compile at runtime — specialize on what actually runs
Just-in-Time compilation converts bytecode or IR to native machine code at runtime. Tiered pipelines — interpreter, baseline JIT, optimizing JIT — escalate as code gets hotter. Deoptimize when speculations fail. V8, HotSpot, LuaJIT, .NET.
- When compilation happens: at runtime, on hot code paths
- Typical tier count: 3–4 (interpreter → baseline → optimizing → super-optimizing)
- Compile cost (baseline): ~10× the interpreter's cost per function (V8 Sparkplug)
- Compile cost (optimizing): 100–1000× the interpreter's cost, 5–50 ms per function
- Deopt cost: 10–100 µs per occurrence
- Used by: V8, HotSpot, LuaJIT, .NET CoreCLR, PyPy, Truffle
How JIT compilation works
A function is born cold. The first time it runs, the runtime has no profile data — no idea which branches dominate, what types flow through, which calls inline well. So it runs the function in the interpreter: walk the bytecode one operation at a time. Slow, but instantly available.
As the function runs more times, the runtime increments a counter and records type observations. When the counter crosses a threshold (1000–10000 calls for V8; 10000 by default for HotSpot), the JIT compiles the function. Two things happen:
- Compile to native code. Generate x86-64 / ARM64 instructions that execute directly on hardware. Skip the interpreter's per-bytecode dispatch.
- Specialize on observed types. If the function always receives integers, inline integer arithmetic; if always strings, inline string ops. This is profile-guided optimization for free.
But the JIT isn't all-or-nothing. Real systems have multiple tiers — each tier costs more compile time and produces faster code. Hot code climbs the ladder; lukewarm code stays in a cheap tier.
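The counter-and-threshold mechanism described above can be sketched as a toy model. This is only an illustration: the `TieredFunction` class, the tier names, and the thresholds (10 and 100 calls) are assumptions chosen to keep the example small, not values or APIs from any real engine.

```javascript
// Toy model of hotness-driven tier promotion. Thresholds are made up
// for illustration; real engines use different, dynamic heuristics.
const TIERS = [
  { name: "interpreter", promoteAfter: 10 },
  { name: "baseline", promoteAfter: 100 },
  { name: "optimizing", promoteAfter: Infinity },
];

class TieredFunction {
  constructor(fn) {
    this.fn = fn;
    this.calls = 0; // invocation counter kept by the "runtime"
    this.tierIndex = 0; // every function starts cold, in the interpreter
    this.seenTypes = new Set(); // type feedback collected while running
  }

  get tier() {
    return TIERS[this.tierIndex].name;
  }

  call(...args) {
    this.calls++;
    for (const a of args) this.seenTypes.add(typeof a);
    // Promote once the counter crosses the current tier's threshold.
    if (this.calls >= TIERS[this.tierIndex].promoteAfter) {
      this.tierIndex++;
    }
    return this.fn(...args);
  }
}

const hotAdd = new TieredFunction((a, b) => a + b);
for (let i = 0; i < 150; i++) hotAdd.call(i, 1);
console.log(hotAdd.tier, [...hotAdd.seenTypes]);
```

After 150 calls the function has crossed both thresholds and only ever saw `number` arguments, which is the kind of profile an optimizing tier would specialize on.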
Tiered compilation
V8's tiers, as of 2024:
| Tier | What it does | Compile cost | Runtime speed (vs interp) |
|---|---|---|---|
| Ignition | Interpret bytecode directly | 0 (no compile) | 1× baseline |
| Sparkplug | Non-optimizing baseline JIT — bytecode-to-native template | ~10× interpret speed | 2–5× faster than interp |
| Maglev | Mid-tier optimizing JIT — SSA-based, lighter passes | ~100× | 20–40× faster |
| TurboFan | Full optimizing JIT — escape analysis, inlining, LICM, SSA, etc. | ~1000× | 50–200× faster |
HotSpot's tiers:
| Tier | What it does | Compile cost | Runtime speed |
|---|---|---|---|
| Interpreter | Bytecode walker | 0 | 1× |
| C1 (Tier 1–3) | Client compiler — fast compile, basic optimizations, profile-collecting | ~50× interpret | 5–15× faster |
| C2 (Tier 4) | Server compiler — full SSA-based optimizing JIT, aggressive inlining | ~500× | 30–100× faster |
The threshold to promote from one tier to the next is itself dynamic. HotSpot's default is 10000 invocations or 14000 back-edge iterations for C2 promotion. V8 promotes much faster — Sparkplug after just one execution; TurboFan after ~1500 ticks.
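Why promotion thresholds exist at all comes down to break-even arithmetic: compiling pays off only once the time saved per call, multiplied by the number of remaining calls, exceeds the compile cost. A minimal sketch, assuming a hypothetical helper `breakEvenCalls` and made-up costs (1 µs per interpreted call, a cheap tier and an expensive tier):

```javascript
// Break-even estimate: how many further calls must a function make before
// the time spent compiling it is repaid? All inputs are hypothetical.
function breakEvenCalls(compileCostUs, interpCallUs, speedup) {
  const compiledCallUs = interpCallUs / speedup;
  const savedPerCallUs = interpCallUs - compiledCallUs;
  return Math.ceil(compileCostUs / savedPerCallUs);
}

// Assumed: 1 µs per interpreted call.
// Cheap tier: 50 µs to compile, 4x faster code.
const cheapTier = breakEvenCalls(50, 1, 4);
// Expensive tier: 20,000 µs (20 ms) to compile, 100x faster code.
const expensiveTier = breakEvenCalls(20000, 1, 100);

console.log({ cheapTier, expensiveTier });
```

Under these assumed numbers the cheap tier repays itself after a few dozen calls, while the expensive tier needs on the order of twenty thousand, which is why only the hottest code is worth sending to the top tier.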
Speculation and deoptimization
An optimizing JIT speculates on observed types and runtime conditions. Examples:
- Hidden class checks. V8 observes that `obj.x` is always a small integer at a specific shape. The compiled code assumes that and skips the type check.
- Devirtualization. A virtual method call always reaches one implementation. The JIT inlines that implementation directly, bypassing the vtable lookup.
- Loop bounds. Iterations don't escape the array bounds. The JIT omits the bounds check inside the loop body.
- No exceptions. A function never throws in observed runs. The JIT omits exception-handler tables that aren't reachable from optimized code.
When a speculation fails — a different shape arrives, a new method override loads, an exception occurs — the compiled code is no longer valid for the new situation. The JIT deoptimizes: aborts the optimized code mid-execution, reconstructs the interpreter's state from the optimized frame, and resumes in the interpreter. The optimized code is invalidated, the profile is updated, and a fresh compile may follow.
Deopts cost 10–100 µs per occurrence. A function that deopts dozens of times per second has its performance dominated by deopt overhead — the dreaded "deopt loop." Workarounds: stabilize observed shapes (always pass same-typed arguments), avoid polymorphism at hot call sites, simplify try/catch around hot code.
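The guard-then-deopt pattern can be modeled in plain JavaScript. This is a sketch of the control flow only, not of how an engine implements it: `makeSpeculativeAdd`, `genericAdd`, and the `stats` counters are hypothetical names, and a real deopt also has to rebuild interpreter state from the optimized frame, which this toy skips.

```javascript
// Toy model of speculation with a guard and a deopt path.
// `makeSpeculativeAdd` and its counters are hypothetical names.
function genericAdd(a, b) {
  // Slow path: handles every case the language allows.
  return a + b;
}

function makeSpeculativeAdd() {
  const stats = { fastHits: 0, deopts: 0 };
  let optimized = true; // pretend the JIT compiled an integer-only version

  function add(a, b) {
    if (optimized) {
      // Guard: the specialized code is only valid for integers.
      if (Number.isInteger(a) && Number.isInteger(b)) {
        stats.fastHits++;
        return a + b; // stands in for raw integer arithmetic
      }
      // Guard failed: discard the optimized version and fall back.
      stats.deopts++;
      optimized = false;
    }
    return genericAdd(a, b);
  }

  return { add, stats };
}

const { add: specAdd, stats: specStats } = makeSpeculativeAdd();
specAdd(1, 2);
specAdd(3, 4);
const joined = specAdd("a", "b"); // breaks the speculation exactly once
specAdd(5, 6); // now served by the generic path
console.log(specStats, joined);
```

The result is always correct; what changes after the failed guard is which path produces it. A deopt loop corresponds to an engine recompiling the fast version and hitting the failing guard again and again.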
JIT vs AOT vs interpreter
| | JIT (V8, HotSpot) | AOT (gcc, rustc) | Pure interpreter (early Python, BASIC) |
|---|---|---|---|
| When compilation happens | At runtime, on hot paths | Once, before execution | Never — executes source/bytecode directly |
| Startup speed | Slow — interpreter while warming up | Fast — binary loads and runs | Instant |
| Steady-state speed | Fast — profile-specialized native code | Fast — static optimizations | Slow — 10–50× slower than native |
| Memory footprint | Large — JIT in process, code caches | Small — only the binary | Tiny — just the interpreter |
| Profile-guided optimization | Free — automatic at runtime | Optional — PGO build with sample run required | N/A |
| Reflection / dynamic loading | Excellent — recompile new code as it arrives | Limited — would require runtime compiler | Excellent |
| Predictable latency | Worse — deopt and GC pauses | Better — no runtime surprises | Best — slow but steady |
| Typical workload | Server, browser, JVM, .NET | Native applications, embedded, kernels | Scripting (historical) |
Real systems blur these. Microsoft's .NET CoreCLR has ReadyToRun (AOT) plus tiered JIT (runtime). Java has GraalVM Native Image (AOT) plus HotSpot (JIT): same source code, two deployment shapes. Even the Rust ecosystem crosses the line: Cranelift, a code generator used for JIT compilation in runtimes such as Wasmtime, is also being explored as an experimental rustc backend.
Trace-based JITs (LuaJIT, TraceMonkey)
An alternative shape. Instead of compiling functions, compile traces: sequences of bytecode operations that execute in a row, often crossing function boundaries. LuaJIT records what happens on a hot loop iteration, including which branches were taken and which functions were called, and compiles that exact path. Guards are inserted at every type check and branch; if the next iteration diverges from the recorded trace, execution falls back to the interpreter or a new trace is recorded.
Trace-based JITs excel at numerical loops with predictable behavior; LuaJIT regularly matches C performance on benchmarks like SciMark. They struggle on branchy, polymorphic code where every iteration is different. Mozilla's TraceMonkey (an early Firefox JavaScript engine) demonstrated that trace-based compilation works for JavaScript, but the team later switched to a method-based JIT for better polymorphism handling.
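The record-then-guard idea can be illustrated with a deliberately tiny model. Everything here is an assumption made for the sketch: `interpretStep`, `recordTrace`, and `runWithTrace` are hypothetical names, the "trace" covers a single branch, and a real tracing JIT would emit machine code rather than pick between closures.

```javascript
// Toy trace recorder: record which branch one hot iteration took, then
// replay that straight-line path behind a guard. Names are hypothetical.
function interpretStep(x) {
  // The "interpreter": full branching logic for one loop iteration.
  return x % 2 === 0 ? x / 2 : 3 * x + 1;
}

function recordTrace(sample) {
  const tookEvenBranch = sample % 2 === 0;
  return {
    // Guard: does this input follow the same path as the recorded one?
    guard: (x) => (x % 2 === 0) === tookEvenBranch,
    // Straight-line code for the recorded path only; no branch inside.
    body: tookEvenBranch ? (x) => x / 2 : (x) => 3 * x + 1,
  };
}

function runWithTrace(inputs) {
  const trace = recordTrace(inputs[0]);
  let onTrace = 0;
  let sideExits = 0;
  const results = inputs.map((x) => {
    if (trace.guard(x)) {
      onTrace++;
      return trace.body(x);
    }
    sideExits++; // guard failed: leave the trace, use the interpreter
    return interpretStep(x);
  });
  return { results, onTrace, sideExits };
}

const predictable = runWithTrace([2, 4, 6, 8]); // every input stays on trace
const branchy = runWithTrace([2, 3, 4, 5]); // half the inputs side-exit
console.log(predictable, branchy);
```

The predictable input never leaves the trace, while the alternating input exits on every other element, which mirrors why trace compilers do well on regular numerical loops and poorly on branchy code.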
When JIT details matter for you
- Benchmarking JIT'd code. Microbenchmarks must include a warm-up phase. JMH (Java Microbenchmark Harness) runs dedicated warm-up iterations by default, on the order of tens of seconds, before it records measurements. Without warm-up, you're measuring the interpreter, not your code.
- Memory-budgeted environments. JIT code caches consume 50–500 MB depending on workload. Lambda functions, IoT devices, mobile apps may prefer AOT to skip the JIT memory tax.
- Latency-sensitive code. Deopts and GC pauses cause tail-latency spikes. Real-time trading systems often use AOT (GraalVM Native Image) to eliminate the JIT variance.
- Debugging "fast code that occasionally is slow." Almost always a deopt. Use V8's `--trace-deopt` or HotSpot's `-XX:+PrintCompilation -XX:+PrintInlining` to see what the JIT is doing and when it invalidates.
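For the benchmarking point above, a minimal warm-up-aware timing helper might look like the following. The helper name `bench` and the iteration counts are illustrative assumptions, not tuned values, and a serious harness would also repeat runs and report variance.

```javascript
// Minimal benchmark helper with an explicit warm-up phase.
// `bench` and its iteration counts are illustrative, not tuned values.
function bench(fn, { warmupIters = 10000, measureIters = 10000 } = {}) {
  let sink = 0; // consume results so the work is not trivially dead code

  // Warm-up: give the engine time to profile and compile `fn`.
  for (let i = 0; i < warmupIters; i++) sink += fn(i);

  // Measurement: only this loop is timed.
  const start = performance.now();
  for (let i = 0; i < measureIters; i++) sink += fn(i);
  const elapsedMs = performance.now() - start;

  return { nsPerCall: (elapsedMs * 1e6) / measureIters, sink };
}

const benchResult = bench((i) => (i * 31) % 7);
console.log(`${benchResult.nsPerCall.toFixed(1)} ns per call`);
```

Setting `warmupIters` to 0 and comparing the two numbers is a quick way to see how much of a measurement is warm-up noise, though the size of the gap depends on the engine and the function.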
V8 deopt example — JavaScript
```javascript
// V8 JIT walkthrough: paste into Chrome devtools, or save as sum.js for Node.js
function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    total += arr[i];
  }
  return total;
}

// Run hot with an integer array; TurboFan specializes for Smi
const ints = Array.from({ length: 1000 }, (_, i) => i);
for (let k = 0; k < 100000; k++) sum(ints);
// After ~1500 ticks: TurboFan compiles sum with Smi-specialized arithmetic
// total is an i32, arr[i] is an Smi, no boxing, only cheap guards

// Now feed it a double; the speculation breaks
const doubles = Array.from({ length: 1000 }, (_, i) => i + 0.5);
sum(doubles);
// V8 deopts: arr[i] is no longer Smi. Falls back to Ignition.

// Keep calling so V8 can re-profile and recompile
for (let k = 0; k < 100000; k++) sum(doubles);
// Eventually recompiled with HeapNumber-tolerant code.
// Steady-state runs 3-5× slower than the Smi-only version.
```
Run this with `node --trace-opt --trace-deopt sum.js` and you'll see TurboFan report that it optimized sum once it warms up, then a deoptimization of sum when the double array arrives, then a recompile after further calls. The deopt pattern is exactly why monomorphic call sites (always the same shape) are V8's performance sweet spot.
Performance numbers and trade-offs
- V8 Ignition (interpreter): ~100 million ops/sec on integer-heavy code.
- V8 Sparkplug (baseline): ~3–5× faster than Ignition. Compiled in 10–50 µs per function.
- V8 TurboFan (optimizing): ~50–200× faster than Ignition on type-stable code. Compile cost 5–50 ms per function.
- HotSpot C1 (Tier 3): ~10× faster than interpreter. Compile cost ~10 ms per function.
- HotSpot C2 (Tier 4): ~50–100× faster than interpreter. Compile cost 100–500 ms per function — but only for the very hottest methods.
- LuaJIT: matches C performance on tight numerical loops; 10–50× faster than the standard Lua interpreter.
- Code cache size: V8 default ~64 MB; HotSpot default 240 MB (`-XX:ReservedCodeCacheSize`). Exceed it and the JIT throws away cold compiled code to make room.
Common JIT pitfalls
- Polymorphic call sites. A call site that sees many different types prevents inlining. V8 treats a site as "monomorphic" while it has seen 1 shape, "polymorphic" for 2 to 4, and "megamorphic" beyond that, which is the slowest case. Keep hot call sites mono-typed.
- Try/catch around hot code. Older JITs refused to optimize try/catch. Modern V8 and HotSpot handle it, but the optimized code is often slower than try-free equivalents because of guard overhead.
- Premature warm-up checks. Reading `performance.now()` in the first 100 ms of a Node.js script measures Ignition speed, not TurboFan. Add a warm-up loop.
- JIT thrashing. Code constantly added and removed (e.g., `eval`-generated functions, regex compilation) blows out the code cache, forcing recompiles. Avoid in hot paths.
- Mismatched profile and production. If your dev environment has different shapes than production, the JIT specializes for dev. Watch for "fast in tests, slow in production": a deopt loop is common.
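The polymorphic call site pitfall is easiest to see side by side. In the sketch below the two summing functions are deliberately duplicated so that each property read has its own type feedback; the function names and the five object layouts are assumptions for illustration. Whether the difference is measurable depends on the engine and version, so no timings are claimed here.

```javascript
// Same property read, two different call-site profiles. Object shapes
// differ when properties are added in a different order or set.

// This read of p.x only ever sees one shape: monomorphic.
function sumMono(points) {
  let total = 0;
  for (const p of points) total += p.x;
  return total;
}

// This read of p.x sees five shapes: past the polymorphic limit.
function sumMixed(points) {
  let total = 0;
  for (const p of points) total += p.x;
  return total;
}

// Every object is created the same way, so they share one shape.
const sameShape = Array.from({ length: 1000 }, (_, i) => ({ x: i, y: i }));

// Five distinct layouts, all with an `x` property.
const mixedShapes = Array.from({ length: 1000 }, (_, i) => {
  switch (i % 5) {
    case 0: return { x: i };
    case 1: return { x: i, y: i };
    case 2: return { y: i, x: i };
    case 3: return { x: i, z: i };
    default: return { w: i, x: i };
  }
});

const monoTotal = sumMono(sameShape);
const mixedTotal = sumMixed(mixedShapes);
console.log(monoTotal, mixedTotal);
```

Both calls compute the same total; only the shape profile at the property read differs, which is what the "keep hot call sites mono-typed" advice targets.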
Frequently asked questions
What is JIT compilation?
Just-in-Time compilation generates native machine code at runtime from a higher-level representation — bytecode, IR, or even source. Instead of compiling the whole program ahead of time (AOT), a JIT only compiles code paths that actually execute, exploits runtime profiling information to specialize, and re-compiles when assumptions change. V8 (Chrome's JavaScript engine), HotSpot (Java), LuaJIT, and .NET's CLR all use JITs.
What's tiered compilation?
Multiple JIT compilers at different optimization levels, escalating as code becomes hot. V8 has Ignition (interpreter) → Sparkplug (non-optimizing baseline JIT) → Maglev (mid-tier) → TurboFan (optimizing). HotSpot has interpreter → C1 (fast compile, low optimization) → C2 (slow compile, aggressive optimization). The trade-off: each tier costs compile time and produces faster code; only invest in tiers where code runs enough times to justify.
What's deoptimization?
When an optimizing JIT's speculation fails — say it inlined a method assuming a specific class but a new instance appears — the compiled code is no longer valid. Deoptimization aborts execution of the optimized code mid-stream, rebuilds the interpreter's state from the optimized frame, and resumes in the interpreter. The optimized code is invalidated; the function may be recompiled later when better profile data is available. Deopts cost roughly 10–100 microseconds per occurrence.
What's on-stack replacement (OSR)?
Compiling a function that's already running. A hot loop inside a long-running function reaches the compilation threshold while the function is mid-execution. OSR replaces the running interpreter frame with the JIT-compiled version, preserving live variables and continuing from the corresponding loop iteration in compiled code. Essential for benchmarks that have all their work in one big main loop — without OSR, you'd run the interpreter forever.
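A minimal example of the kind of code where OSR matters: a function that is called exactly once and does all of its work in a single loop. The function name `runOnce` and the constants are arbitrary choices for the sketch.

```javascript
// A function called exactly once whose work is all in one loop.
// A call-count threshold can never fire for it; only counting loop
// back-edges (and OSR) can move this loop out of the interpreter mid-run.
function runOnce() {
  let acc = 0;
  for (let i = 0; i < 5_000_000; i++) {
    acc = (acc + i * 7) % 1000003;
  }
  return acc;
}

const osrResult = runOnce();
console.log(osrResult);
```

Running a script like this with the `--trace-opt` flag mentioned earlier is one way to check whether the engine optimized the function even though it was entered only once; the exact log output varies by version.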
Why are JIT-compiled languages often as fast as C?
A good optimizing JIT runs profile-guided optimization for free: inlining decisions based on actual call frequencies, devirtualization based on observed receivers, branch prediction hints from real-world execution, and escape analysis that catches cases a static compiler couldn't prove. In steady state, V8 and HotSpot regularly match C++ performance on well-suited workloads. The trade-off is startup cost and memory: the JIT compiler is resident in memory and the warm-up period is wasted CPU.
What's the warm-up cost of a JIT?
V8's baseline JIT (Sparkplug) compiles ~10× slower than the interpreter runs, about 50 µs for a typical function. TurboFan's optimizing tier compiles 100–1000× slower than the interpreter, 5–50 ms per function. HotSpot's C2 takes hundreds of milliseconds for hot functions. A JIT-compiled program starts out at interpreter speed; only after the profile says "this is hot" do you get native code. Microbenchmarks must run long enough to amortize this, which is why JMH (Java) runs warm-up iterations, on the order of tens of seconds by default, before measuring.
What's the difference between V8 and HotSpot?
V8 (JavaScript) leans heavier on speculation because JavaScript has no type annotations — it has to guess types, inline aggressively based on hidden classes, and deopt frequently when guesses fail. HotSpot (Java) has static types from bytecode so it speculates less. Both use tiered compilation, but V8's optimizing tier (TurboFan) does more profile-guided specialization while HotSpot's C2 does more conservative loop optimization. LuaJIT, by contrast, uses trace-based JIT — compile entire hot paths through multiple functions instead of one function at a time.