Spectre & Meltdown

Q: What is the difference between Spectre and Meltdown?

Meltdown (CVE-2017-5754) exploits out-of-order execution to read kernel memory directly across the user–kernel privilege boundary, because the data load is allowed to proceed transiently before the permission fault retires. Spectre (CVE-2017-5753 and CVE-2017-5715) exploits speculative execution past a mispredicted branch to make a victim access its own out-of-bounds or attacker-chosen memory. Meltdown crosses a privilege boundary by itself; Spectre tricks an authorized victim into leaking its own data.

Q: How does the cache leak the secret if the speculation is rolled back?

Architectural state — registers and memory — is rolled back when the CPU squashes the transient instructions. Microarchitectural state is not. The transient code uses the secret byte as an index into an attacker array, touching one cache line out of 256. That line stays in cache after the rollback. The attacker then times reads of all 256 lines; the one that loads in ~50 cycles instead of ~200 reveals the secret byte. This recovery step is the Flush+Reload covert channel.

Q: How fast can Spectre actually leak data?

The original 2018 papers reported up to around 503 KB/s for Meltdown and roughly 10 KB/s for Spectre variant 1, at low error rates after error correction. These are slow compared to legitimate memory bandwidth (tens of GB/s), but more than fast enough to extract an SSH key, a password, or a browser session token within seconds to minutes.

Q: Why does the Flush+Reload channel need exactly 256 cache lines spaced 4096 bytes apart?

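256 lines cover every possible value of a single 8-bit byte. The 4096-byte (one page) stride puts each probe slot on its own page, which defeats the hardware prefetcher: prefetchers generally do not cross page boundaries, so they cannot pull in a neighbouring slot and create false positives. The stride does not put the slots in distinct cache sets (in a typical L1 with 64 sets of 64-byte lines, page-aligned addresses all share one set index), and Flush+Reload does not need it to. Multiplying the secret byte by 4096 turns its value directly into a page index, so the attacker only has to find which of 256 pages got cached. The 4096 figure is the conservative convention rather than a hard requirement: the published Spectre proof of concept uses a 512-byte stride.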

Q: Did the fixes really cost performance, and how much?

Yes. KPTI (kernel page-table isolation, the Meltdown fix) adds a page-table switch on every syscall and interrupt, which also flushes the TLB on CPUs without PCID support; syscall-heavy workloads saw 5–30% slowdowns, while compute-bound ones were nearly unaffected. Spectre v2 mitigations (retpoline, IBRS/IBPB) cost a few percent on branch-heavy code. Disabling hyper-threading to close cross-thread leaks (recommended for some MDS variants) can cost much more.

Q: Were Spectre and Meltdown fully fixed in software, or did CPUs have to change?

Meltdown was closed in hardware: post-2018 CPUs (Intel from Cascade Lake / Whiskey Lake on, all AMD) no longer forward fault-bound load data to dependent transient instructions, so KPTI is no longer needed there. Spectre variant 1 is fundamentally a software problem — there is no general hardware fix, only targeted barriers like LFENCE and array index masking that developers must add where bounds checks guard secrets. Spectre v2 got both microcode controls and silicon changes (enhanced IBRS).

The CPU runs code it promised to throw away — and the cache remembers what it saw

Spectre and Meltdown are speculative-execution attacks: the CPU runs instructions it later discards, but the discarded work leaves a secret-dependent footprint in the cache that an attacker recovers by timing memory accesses.

Disclosed: Jan 3, 2018
Meltdown CVE: 2017-5754
Spectre CVEs: 2017-5753 / 5715
Covert channel: Flush+Reload, 256 lines
Leak rate: up to ~500 KB/s (Meltdown)

The trick: read a forbidden byte, then deny you ever did

A modern out-of-order CPU does not wait politely. To keep its execution units busy, it runs instructions ahead of where the program logically is — past unresolved branches, past unfinished permission checks. If a guess turns out wrong, the processor quietly discards the results: registers and memory snap back to where they should have been. As far as the program can tell, nothing happened.

But "nothing happened" is a lie. The speculative work touched the memory system, and the memory system has a long memory. A cache line that was pulled in during the discarded run stays warm. Spectre and Meltdown turn that residue into a readout. The attack runs in two halves: a transient half that secretly accesses memory it should never be allowed to read, and a recovery half that times the cache to figure out which byte the transient half saw.

The genius — and the horror — is that the two halves communicate through the cache, a side channel that no privilege check guards. The CPU rolls back the architectural state (the values you can name in assembly) but not the microarchitectural state (which lines are cached, which branches are predicted). That gap is the whole bug.

The precise mechanism

Both attacks share the same exfiltration template. The transient code computes:

secret = *forbidden_address;        // (1) a load that should fault or never happen
index  = secret & 0xFF;             // isolate one byte (0..255)
dummy  = probe_array[index * 4096]; // (2) cache exactly one line, chosen by the secret

Line (2) is the encoding. probe_array is 256 × 4096 = 1 MiB, one page per possible byte value. Multiplying by 4096 spreads the 256 candidate lines one page apart, so the hardware prefetcher, which generally does not cross page boundaries, can't accidentally pull in a neighbour. After the CPU squashes the speculation, exactly one of those 256 slots has a line sitting in the cache: the slot whose index equals the stolen byte.
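The encoding arithmetic can be checked in a few lines of Python. This is only a sketch of the index math, not of the attack itself; the helper names probe_offset and decode_offset are made up for illustration.

```python
PAGE_SIZE = 4096    # stride between probe slots: one page
NUM_VALUES = 256    # number of values a single byte can take


def probe_offset(secret: int) -> int:
    """Byte offset into the probe array that the transient load touches."""
    return (secret & 0xFF) * PAGE_SIZE


def decode_offset(offset: int) -> int:
    """Map a cached probe offset back to the byte value that selected it."""
    return offset // PAGE_SIZE


probe_array_size = NUM_VALUES * PAGE_SIZE

print(probe_array_size)                    # 1048576 bytes, i.e. 1 MiB
print(probe_offset(0x41))                  # 266240
print(decode_offset(probe_offset(0x41)))   # 65
```

Because the mapping is one-to-one, finding the warm slot is the same as reading the byte.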

The recovery half — the Flush+Reload covert channel — runs in normal, non-speculative code:

for v in 0..256:
    t0 = rdtscp()
    touch probe_array[v * 4096]    // load it
    dt[v] = rdtscp() - t0           // how long did it take?
// the v with the smallest dt was already cached → that's the secret byte

Measured this way, with the timer's own overhead included, a cached load returns in roughly 40–70 cycles; a DRAM fetch takes 200–300 cycles. The gap is large and unambiguous. Repeat for each byte of the secret region and you have read memory you were never authorized to touch.
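The decision rule at the end of the timing loop can be sketched on its own. The timings below are hypothetical sample values, not measurements, and the 120-cycle threshold is an assumption that sits between the hit and miss ranges quoted above; find_cached_slot is an illustrative name.

```python
def find_cached_slot(timings, threshold=120):
    """Return the index of the fastest slot if it beats the threshold, else None."""
    best = min(range(len(timings)), key=timings.__getitem__)
    return best if timings[best] < threshold else None


# Hypothetical sample: every slot misses (~250 cycles) except slot 0x53 (~55 cycles).
timings = [250] * 256
timings[0x53] = 55

print(find_cached_slot(timings))        # 83, i.e. 0x53
print(find_cached_slot([250] * 256))    # None: nothing was cached
```

Returning None when no slot is fast lets a caller discard a failed trial instead of recording a wrong byte.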

What makes the forbidden load succeed differs by attack.

Meltdown reads a kernel address from user mode. Normally a user load of a kernel page raises a fault. But on vulnerable Intel cores, the permission check and the data forwarding happen in parallel: the load delivers the data to dependent instructions transiently, a few cycles before the fault is architecturally raised and the pipeline flushed. Those few cycles are enough to run lines (1)–(2). The fault is then delivered (and the attacker catches it, or hides it behind transactional memory or another mispredicted branch), but the cache footprint is already set.

Spectre variant 1 (bounds-check bypass) never crosses a privilege boundary directly. Instead it poisons a victim's branch predictor. Consider a victim function:

if (x < array1_size)             // bounds check
    y = probe_array[array1[x] * 4096];

Train the predictor by calling with many in-bounds x values so it learns "the branch is taken." Then pass a huge out-of-bounds x. The CPU speculatively assumes the branch is taken — before array1_size is even fetched from memory — reads array1[x] out of bounds, and uses it to index probe_array. The bounds check eventually fails and the speculation unwinds, but the cache footprint of the out-of-bounds byte remains.
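The training step can be illustrated with the textbook 2-bit saturating counter. Real branch predictors are far more elaborate (they use branch history and several tables), so treat this as a simplified model; TwoBitPredictor and ARRAY1_SIZE are illustrative names, and "taken" here means the bounds check passes, as in the text above.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 0

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


ARRAY1_SIZE = 16
predictor = TwoBitPredictor()

# Training: in-bounds calls, so the bounds check passes every time.
for x in range(5):
    predictor.update(x < ARRAY1_SIZE)

# Exploit: an out-of-bounds index arrives while the counter still says "taken".
malicious_x = 10_000
speculates_into_body = predictor.predict()   # True: the body runs transiently
actual_outcome = malicious_x < ARRAY1_SIZE   # False: the check really fails
predictor.update(actual_outcome)

print(speculates_into_body, actual_outcome, predictor.state)   # True False 2
```

One misprediction only moves the counter from 3 to 2, so in this model the predictor still says "taken" on the next call.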

Spectre variant 2 (branch target injection) poisons the indirect-branch predictor (BTB) so that a victim's indirect jmp/call speculatively lands on an attacker-chosen "gadget" inside the victim's own address space — a snippet that performs the leak. This is the variant that needed microcode (IBRS/IBPB) and retpolines.

Where this matters — and who is exposed

Shared hardware. Cloud VMs, browser tabs, and containers all run untrusted code on the same physical core. Spectre lets a tenant read across the isolation boundary the hypervisor or browser assumed was solid.
JavaScript in the browser. The 2018 proof-of-concept ran Spectre v1 from inside a web page, reading the browser process's own memory. This is why browsers reduced timer resolution (Chrome, for example, coarsened performance.now() from 5 µs to 100 µs, and relaxed it again only once Site Isolation was in place) and disabled SharedArrayBuffer by default.
Kernel and hypervisor boundaries. Meltdown and the later L1TF / MDS family let unprivileged code read kernel, SMM, or sibling-hyperthread data.
Anywhere a secret is "protected" only by a bounds check or a permission bit. Speculation runs past both before they resolve.

It does not matter for air-gapped single-tenant machines running only trusted code — there's no attacker to run the gadget. The mitigations have real cost, so threat-model honestly before paying for all of them.

The transient-execution attack family

	Meltdown (v3)	Spectre v1	Spectre v2	L1TF / Foreshadow	MDS / RIDL
CVE	2017-5754	2017-5753	2017-5715	2018-3615/3620/3646	2018-12126…30
Boundary crossed	User → kernel	In-process / sandbox	Cross-domain via BTB	Across EPT / SGX	Cross-thread buffers
Speculation source	Out-of-order fault delay	Conditional branch	Indirect branch (BTB)	Faulting terminal PTE	Pipeline fill buffers
Covert channel	Flush+Reload	Flush+Reload	Flush+Reload	Flush+Reload	Flush+Reload
Primary fix	KPTI + HW (no fwd)	LFENCE / index mask	retpoline, IBRS	PTE invert, L1 flush	VERW buffer clear, no-HT
AMD affected?	No	Yes	Yes	No	No

The common thread is the "Covert channel" row: every one of these uses cache timing as the readout. Kill the covert channel and you kill the leak, but no one has found a way to make the cache forget speculative accesses without crippling performance, so defenses target the speculation instead.

What the numbers actually say

Leak rate. The original papers measured up to ~503 KB/s for Meltdown at a 0.003% error rate, and ~10 KB/s for Spectre v1. Slow next to 50 GB/s of legitimate bandwidth — but an RSA-2048 private key is on the order of a kilobyte, so it falls out in well under a second.
Timing gap. A cache hit is about 4 cycles (L1), ~12 (L2), ~40 (L3); a DRAM miss is 200–300 cycles. The Flush+Reload threshold typically sits around 120–150 cycles, far from either cluster, so misclassification is rare.
Speculation window. Roughly 100–200 instructions can execute transiently before a mispredict is resolved — plenty for a multi-step gadget. The reorder buffer on a Skylake core holds 224 micro-ops.
KPTI overhead. Syscall-bound workloads slowed 5–30%; databases and Redis-style servers saw the worst, compute-bound HPC saw <1%.
Probe array. Exactly 256 × 4096 = 1,048,576 bytes to encode a single byte — one page per possible value, to dodge the prefetcher.
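The leak-rate claim is easy to check. The sketch below assumes a 1024-byte key (the text only says "on the order of a kilobyte") and treats 1 KB/s as 1000 bytes per second; seconds_to_leak is an illustrative helper.

```python
def seconds_to_leak(num_bytes: int, rate_kb_per_s: float) -> float:
    """Time to leak num_bytes at the given rate, taking 1 KB as 1000 bytes."""
    return num_bytes / (rate_kb_per_s * 1000)


KEY_BYTES = 1024   # assumed size of the secret, for illustration only

meltdown_s = seconds_to_leak(KEY_BYTES, 503)   # Meltdown figure quoted above
spectre_s = seconds_to_leak(KEY_BYTES, 10)     # Spectre v1 figure quoted above

print(f"Meltdown:   {meltdown_s * 1000:.1f} ms")   # Meltdown:   2.0 ms
print(f"Spectre v1: {spectre_s * 1000:.1f} ms")    # Spectre v1: 102.4 ms
```

Both figures are well under a second. In practice, locating the key in memory is likely to take longer than reading it.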

JavaScript: the Flush+Reload recovery half

You cannot run the privileged-load half in a browser today (timers are coarsened and the loads are mitigated), but the logic of the recovery side is the heart of the attack. This models how an attacker turns 256 timing samples into one stolen byte:

const PAGE = 4096;
const N = 256;                       // one slot per possible byte value
const HIT_THRESHOLD = 120;           // cycles: below = was cached

// probeArray spans N pages; only the secret's page is "warm" after speculation
function recoverByte(timeAccess, flushAll, runVictimGadget) {
  const score = new Array(N).fill(0);

  // Repeat to beat noise — a single trial is unreliable.
  for (let trial = 0; trial < 1000; trial++) {
    flushAll();                      // evict all N probe slots from cache
    runVictimGadget();               // transient load caches ONE slot

    for (let v = 0; v < N; v++) {
      const dt = timeAccess(v * PAGE); // rdtscp around the load
      if (dt < HIT_THRESHOLD) score[v]++;
    }
  }

  // The slot cached most often across trials is the secret byte.
  let best = 0;
  for (let v = 1; v < N; v++) if (score[v] > score[best]) best = v;
  return best;                       // === secret & 0xFF
}

// Read a whole region one byte at a time.
function dumpSecret(len, makeGadgetFor, timeAccess, flushAll) {
  const out = new Uint8Array(len);
  for (let i = 0; i < len; i++) {
    out[i] = recoverByte(timeAccess, flushAll, makeGadgetFor(i));
  }
  return out;
}

Two details carry the whole attack. First, the 1000-trial loop: a single transient run is noisy (interrupts, prefetch, and contention all corrupt it), so the slot with the most hits across trials is the answer. Second, v * PAGE not v: spacing slots a full page apart is what keeps the prefetcher from warming a neighbouring slot.
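The voting logic can be exercised without real hardware by swapping the timer for a random model. Everything about the noise here is an assumption made for illustration: the gadget fails to warm its slot 30% of the time, and any other slot looks cached 1% of the time. The cycle ranges reuse the figures quoted earlier.

```python
import random

N = 256               # one slot per possible byte value
HIT_THRESHOLD = 120   # cycles: below this, the slot counts as cached


def recover_byte(secret, trials=200, miss_rate=0.3, false_hit_rate=0.01, seed=0):
    """Recover `secret` from simulated, noisy Flush+Reload timings by voting."""
    rng = random.Random(seed)          # seeded so the run is repeatable
    score = [0] * N
    for _ in range(trials):
        for v in range(N):
            if v == secret:
                cached = rng.random() >= miss_rate      # gadget sometimes fails
            else:
                cached = rng.random() < false_hit_rate  # stray hits from noise
            # Simulated access time: fast if cached, slow otherwise.
            dt = rng.randint(40, 70) if cached else rng.randint(200, 300)
            if dt < HIT_THRESHOLD:
                score[v] += 1
    # The slot with the most hits across all trials wins.
    return max(range(N), key=score.__getitem__)


print(recover_byte(0x2A))   # 42
```

With these rates the true slot collects roughly 140 hits out of 200 while every other slot collects about 2, so the vote is not close even though individual trials are unreliable.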

Python pseudocode: the Spectre v1 gadget

Python is far too slow and too abstracted to mount a real attack (no clflush, no cycle timer, garbage-collected memory), so treat this as faithful pseudocode for the victim side — the bounds-check-bypass gadget an attacker trains and then abuses:

# Conceptual model — real Spectre v1 is in C/asm with clflush + rdtscp.

PAGE = 4096
probe = bytearray(256 * PAGE)         # attacker-readable channel
array1 = b"safe public data"          # the array the victim guards
array1_size = len(array1)

def victim(x, cpu):
    # The real CPU runs the body SPECULATIVELY before this check resolves
    # if its branch predictor has been trained to expect "taken".
    if x < array1_size:               # bounds check — bypassed transiently
        secret_byte = array1[x]       # x is out of bounds during the attack
        _ = probe[secret_byte * PAGE] # caches one of 256 pages
    # On a real core the out-of-bounds path executes transiently, then unwinds.

def attack(target_offset, cpu):
    # cpu.flush() and flush_reload() are hypothetical helpers standing in
    # for clflush and the timing loop; they are not defined here.
    # 1. TRAIN: many in-bounds calls teach the predictor "branch is taken".
    for _ in range(30):
        victim(target_offset % array1_size, cpu)
    cpu.flush(probe)                  # drop the probe lines the training calls cached
    cpu.flush(array1_size_address)    # make the real check SLOW to resolve...
    # 2. EXPLOIT: out-of-bounds index; CPU speculates past the stale check.
    victim(target_offset, cpu)        # target_offset >= array1_size
    # 3. RECOVER via Flush+Reload over the 256 probe pages (see JS above).
    return flush_reload(probe)

The training/exploit alternation is essential: you must first convince the predictor the branch is reliably taken, then evict array1_size so the real comparison stalls on a DRAM fetch — widening the speculation window long enough for the gadget to finish before the misprediction is caught.
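For readers who want something that runs, here is a toy simulation of the same flow. It is a model, not an exploit: the "cache" is a Python set, "speculation" is an explicit if on the predictor's guess, and the secret sits immediately after the public array purely for convenience. ToyCPU, memory, and leak_byte are all illustrative names.

```python
PAGE = 4096
ARRAY1_SIZE = 16
# Toy address space: 16 public bytes followed directly by the secret.
memory = bytearray(b"safe public data" + b"SECRET")


class ToyCPU:
    """A set of cached probe offsets plus one 2-bit branch counter."""

    def __init__(self):
        self.cached = set()
        self.counter = 0      # 0-1 predict not taken, 2-3 predict taken

    def branch(self, actual: bool) -> bool:
        """Return the prediction, then train the counter on the real outcome."""
        predicted = self.counter >= 2
        self.counter = min(3, self.counter + 1) if actual else max(0, self.counter - 1)
        return predicted


def victim(x, cpu):
    in_bounds = x < ARRAY1_SIZE
    predicted_taken = cpu.branch(in_bounds)
    if in_bounds or predicted_taken:
        # Body runs architecturally or transiently; the cache footprint stays either way.
        cpu.cached.add(memory[x] * PAGE)


def leak_byte(x, cpu):
    for i in range(5):
        victim(i, cpu)            # 1. train with in-bounds indices
    cpu.cached.clear()            # flush the probe lines that training warmed
    victim(x, cpu)                # 2. exploit with an out-of-bounds index
    (offset,) = cpu.cached        # 3. recover: exactly one slot is warm
    return offset // PAGE


cpu = ToyCPU()
leaked = bytes(leak_byte(ARRAY1_SIZE + i, cpu) for i in range(6))
print(leaked)   # b'SECRET'
```

The model leaves out the eviction of array1_size, because it has no notion of time; on real hardware that step is what keeps the speculation window open long enough.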

Variants and mitigations worth knowing

Flush+Reload vs Prime+Probe. Flush+Reload needs shared memory (e.g. a shared library page) and uses clflush. When memory isn't shared, attackers use Prime+Probe: fill a cache set with your own lines, let the victim evict some, then time which of yours got pushed out. Slower and noisier, but needs no shared page.
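Prime+Probe can be sketched with a single toy cache set. The model assumes an 8-way set with true LRU replacement and reports hits and misses directly; real caches use approximate replacement policies, and a real attacker infers misses from timing. CacheSet and prime_probe are illustrative names.

```python
from collections import OrderedDict


class CacheSet:
    """One set of a toy set-associative cache with LRU replacement."""

    def __init__(self, ways=8):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> None, least recently used first

    def access(self, tag) -> bool:
        """Touch a line; return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = None
        return False


def prime_probe(victim_touches_set: bool, ways=8) -> int:
    """Return how many attacker lines miss during the probe phase."""
    cache_set = CacheSet(ways)
    attacker_lines = [("attacker", i) for i in range(ways)]
    for tag in attacker_lines:               # prime: fill the whole set
        cache_set.access(tag)
    if victim_touches_set:                   # victim activity evicts one attacker line
        cache_set.access(("victim", 0))
    # Probe in reverse order so refilling one line does not evict the next one probed.
    return sum(not cache_set.access(tag) for tag in reversed(attacker_lines))


print(prime_probe(False))   # 0: no victim access, every probe hits
print(prime_probe(True))    # 1: one attacker line was pushed out
```

Probing in reverse matters under LRU: probing in the priming order would make each refill evict the next line to be probed, turning one eviction into a cascade of misses.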

KPTI / KAISER. The Meltdown software fix unmaps almost all kernel pages from the user-mode page table, so there's no kernel address for the transient load to reach. The cost is a page-table switch (and TLB churn) on every syscall — the source of the headline slowdown.

Retpoline. The Spectre v2 software fix replaces indirect branches with a ret-based trampoline that traps speculation in an infinite loop instead of letting the BTB steer it to an attacker gadget. Largely superseded by enhanced IBRS in silicon on newer cores.

LFENCE and index masking. For Spectre v1 there is no blanket fix — you insert an LFENCE serializing barrier after a bounds check, or mask the index (x &= (size - 1) for power-of-two sizes; array_index_nospec() in the Linux kernel) so an out-of-bounds index speculatively clamps to a safe value.
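The masking arithmetic can be shown in Python, with the caveat that Python cannot demonstrate speculation at all; this only shows that the clamp is computed from data flow, with no branch for the CPU to mispredict. The first helper is loosely modelled on the branch-free mask idea behind array_index_nospec(), emulating 64-bit wraparound by hand. The function names are made up, and both values are assumed to be below 2**63.

```python
MASK64 = (1 << 64) - 1   # emulate a 64-bit unsigned long


def index_mask(index: int, size: int) -> int:
    """All ones if index < size, else zero, computed without a branch."""
    # size - 1 - index wraps to a value with the top bit set when index >= size.
    combined = (index | ((size - 1 - index) & MASK64)) & MASK64
    in_bounds = ((~combined) & MASK64) >> 63   # 1 if in bounds, 0 otherwise
    return (-in_bounds) & MASK64


def clamp_index(index: int, size: int) -> int:
    """Return index unchanged when in bounds, 0 when out of bounds."""
    return index & index_mask(index, size)


def mask_pow2(index: int, size: int) -> int:
    """Simpler variant for power-of-two sizes: wrap the index into range."""
    return index & (size - 1)


print(clamp_index(3, 16))        # 3
print(clamp_index(16, 16))       # 0
print(clamp_index(10**6, 16))    # 0
print(mask_pow2(19, 16))         # 3
```

Either way, a transiently executed load with a hostile index lands inside the array, so whatever leaks through the cache is data the caller was already allowed to read.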

The wider family. Foreshadow/L1TF (reads the L1 cache across SGX and VM boundaries), MDS/RIDL/Fallout/ZombieLoad (read stale data from internal pipeline buffers, cleared with a VERW instruction), and later Retbleed and Downfall all reuse the same speculate-then-time-the-cache template against different microarchitectural structures.

Common pitfalls and misconceptions

"The rollback undoes everything." It undoes architectural state only. Cache occupancy, TLB entries, and branch-predictor history survive — that's the leak. This is the single most common misunderstanding.
Confusing the two halves. The transient half doesn't read the secret out — it only stamps the cache. The non-speculative recovery half does the reading. Forgetting this makes the attack seem impossible ("how do you get the value back if it's discarded?").
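Spacing probe slots too closely. With a small stride the hardware prefetcher can load adjacent slots, producing false hits and an unreadable signal. A full page is the conservative choice because prefetchers generally stop at page boundaries; the published Spectre proof of concept gets by with 512 bytes. The stride is not arbitrary.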
Treating Spectre as patchable like Meltdown. Meltdown got a clean hardware fix. Spectre v1 is inherent to speculation; it can only be mitigated case-by-case where a check guards a secret. Assuming a microcode update "fixed Spectre" is wrong.
Trusting constant-time crypto alone. Constant-time code defeats classic timing channels, but Spectre can make the CPU speculatively run a non-constant-time path the source never intended. Speculation can violate the very invariant the code was written to hold.
Ignoring noise. One transient run rarely produces a clean read. Real exploits average hundreds to thousands of trials per byte and discard slots that the OS scheduler or interrupts polluted.

Frequently asked questions

How does the cache leak the secret if the speculation is rolled back?