Stack Canaries
A tripwire word that dies before your return address does
A stack canary is a secret value placed just before the saved return address; on a buffer overflow the canary is overwritten first, so the function checks it before returning and aborts the process if it changed.
- Invented: StackGuard, 1998
- Canary size: 8 bytes (x86-64)
- Per-call cost: ~3–5 instructions
- Runtime overhead: < 1–3%
- Catches: contiguous overflows
The idea: a value that has to die first
A classic stack buffer overflow works because two things that should never be neighbors are: your function's local char buf[64] and the saved return address that tells the CPU where to jump when the function finishes. Write past the end of buf (with gets, strcpy, or an off-by-one loop) and you keep climbing through the frame until you reach that return address. Overwrite it with the address of your own code and the ret instruction hands the CPU straight to the attacker. This is the bug class behind the 1988 Morris Worm and most memory-corruption exploits of the 1990s.
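For contrast, here is a minimal JavaScript sketch of the unprotected case. It models a frame with no canary, where nothing inspects the return address before it is used. The 24-byte flat layout and the helper names (makeUnprotectedFrame, overflowUnprotected, readReturnAddress) are illustrative assumptions, not a real compiler's frame layout.

```javascript
// Minimal model of an UNPROTECTED x86-64 frame (no canary), low → high:
//   bytes  0..7  : char buf[8]
//   bytes  8..15 : saved RBP
//   bytes 16..23 : return address (little-endian)
const FRAME_SIZE = 24;
const RET_OFFSET = 16;

function makeUnprotectedFrame(returnAddr) {
  const mem = new Uint8Array(FRAME_SIZE);
  new DataView(mem.buffer).setBigUint64(RET_OFFSET, returnAddr, true);
  return mem;
}

// strcpy-style copy: writes upward from buf[0] with no bounds check.
// Bytes past the end of the modeled frame are dropped.
function overflowUnprotected(mem, bytes) {
  for (let i = 0; i < bytes.length && i < mem.length; i++) mem[i] = bytes[i];
}

// What 'ret' would jump to. Nothing checks it first.
function readReturnAddress(mem) {
  return new DataView(mem.buffer).getBigUint64(RET_OFFSET, true);
}

const unprotected = makeUnprotectedFrame(0x401136n);
overflowUnprotected(unprotected, new Array(24).fill(0x41)); // 24 'A's into buf[8]
console.log(readReturnAddress(unprotected).toString(16)); // 4141414141414141
```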
The stack canary, introduced by Crispin Cowan's StackGuard in 1998 (named for the canaries miners carried into coal shafts to detect deadly gas), inserts a sentinel word in between. On function entry, code inserted by the compiler writes a secret value, the canary, into the frame, positioned just below the saved frame pointer and return address. On the way out, just before ret, it reloads the canary and compares it to the known-good copy. If they match, the function returns normally. If they differ, something wrote over it, so the program calls __stack_chk_fail and aborts: loudly, immediately, and before the poisoned return address can be used.
The geometry is the whole trick. A contiguous overflow can't skip the canary: to reach the return address it has to write through the canary, leaving a fingerprint. The check turns "silent corruption that runs attacker code" into "a clean crash."
The precise mechanism
On Linux/x86-64, the canary lives in thread-local storage at %fs:0x28. The compiler emits a prologue and epilogue around protected functions:
; prologue — stash the canary into the frame
mov rax, QWORD PTR fs:0x28 ; load master canary from TLS
mov QWORD PTR [rbp-0x8], rax ; store it just below saved RBP / return addr
xor eax, eax ; scrub the register so it never leaks
; ... function body, including the vulnerable buffer ...
; epilogue — verify before returning
mov rax, QWORD PTR [rbp-0x8] ; reload the frame copy
sub rax, QWORD PTR fs:0x28 ; compare against the master
jne .L_fail ; mismatch → stack was smashed
leave
ret
.L_fail:
call __stack_chk_fail ; prints "stack smashing detected", aborts
Three properties make this robust. First, the master canary is random per process. At startup glibc derives it from the AT_RANDOM auxiliary-vector entry, which points to 16 random bytes the kernel places on the new process's stack at execve, so the value differs on every run. Each thread's TLS block at fs:0x28 holds a copy of that same value. Second, the low byte is forced to 0x00 (a NUL). String-based overflows therefore can't copy the canary intact, because most string functions stop at NUL, and string reads that would leak it are truncated. Third, the prologue zeroes the register after storing the canary so the value doesn't linger anywhere the attacker can read it.
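The derivation can be modeled in a few lines of JavaScript. This sketch is modeled on what glibc does on little-endian targets. It assumes the canary is the first 8 of the 16 AT_RANDOM bytes, read as a little-endian word with the low byte cleared. The helper names are hypothetical, and the sample bytes stand in for real kernel entropy.

```javascript
// Sketch: derive a master canary as glibc does on little-endian x86-64
// (assumption: first 8 AT_RANDOM bytes, low byte cleared).
function canaryFromAtRandom(atRandom) {
  if (atRandom.length < 8) throw new RangeError('need at least 8 random bytes');
  let value = 0n;
  // Read bytes 0..7 as a little-endian 64-bit word (byte 0 is least significant).
  for (let i = 7; i >= 0; i--) value = (value << 8n) | BigInt(atRandom[i]);
  return value & ~0xffn; // force the low byte to 0x00
}

// Lay the canary out in memory order (lowest address first).
function canaryBytes(canary) {
  const out = [];
  for (let i = 0; i < 8; i++) out.push(Number((canary >> BigInt(8 * i)) & 0xffn));
  return out;
}

// Stand-in for the 16 bytes the kernel supplies; real values are random.
const atRandom = [0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18,
                  0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28];
const canary = canaryFromAtRandom(atRandom);
console.log(canary.toString(16));    // 1817161514131200
console.log(canaryBytes(canary)[0]); // 0: the NUL sits at the lowest address
```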
The cost is constant per protected call: a load + store on entry, a load + compare + branch on exit, and one extra 8-byte slot in the frame. That's O(1) overhead per function call — no scaling with buffer size or input length.
When canaries help — and when they don't
- They catch contiguous stack overflows — the overwhelmingly common case where a write runs straight off the end of a local array toward the return address.
- They are nearly free, so there is no reason to ship production C/C++ without them; major distributions build their packages with -fstack-protector-strong by default.
- They do not stop non-contiguous writes. An attacker with an arbitrary-write primitive (e.g. a controlled index buf[i] = x with attacker-chosen i) can land directly on the return address and jump over the canary untouched.
- They do not stop overwriting local pointers that sit between the buffer and the canary. If a function dereferences or calls through such a pointer before it returns, corrupting that pointer hijacks control before the epilogue check ever runs.
- They do not protect the heap — heap overflows and use-after-free are an entirely separate problem (heap metadata hardening, ASan, etc.).
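The non-contiguous case is easy to model. In this hypothetical JavaScript sketch, indexedWrite plays the role of buf[i] = x with an attacker-chosen i. Writing only the bytes of the return-address slot leaves the canary untouched, so the epilogue check passes. The offsets and the example canary value are assumptions for illustration.

```javascript
// Model frame, low → high, 8 bytes per slot (illustrative offsets):
//   0..7 buf | 8..15 canary | 16..23 saved RBP | 24..31 return address
const CANARY_OFF = 8;
const RET_OFF = 24;
const MASTER = 0x9c3e71d2a4b85600n; // example canary; low byte is NUL

function makeFlatFrame() {
  const mem = new Uint8Array(32);
  const view = new DataView(mem.buffer);
  view.setBigUint64(CANARY_OFF, MASTER, true); // little-endian, as on x86-64
  view.setBigUint64(RET_OFF, 0x401136n, true);
  return mem;
}

// The bug: buf[i] = x with an attacker-chosen i and no bounds check.
function indexedWrite(mem, i, x) {
  mem[i] = x;
}

// Epilogue: compare the frame copy with the master, then "return".
function epilogue(mem) {
  const view = new DataView(mem.buffer);
  if (view.getBigUint64(CANARY_OFF, true) !== MASTER) {
    throw new Error('*** stack smashing detected ***: terminated');
  }
  return view.getBigUint64(RET_OFF, true);
}

// Aim the writes straight at the return-address slot; bytes 8..15 never change.
const targeted = makeFlatFrame();
[0xef, 0xbe, 0xad, 0xde, 0, 0, 0, 0].forEach((b, k) => indexedWrite(targeted, RET_OFF + k, b));
console.log(epilogue(targeted).toString(16)); // deadbeef: check passed, ret hijacked
```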
Canaries vs other memory-safety defenses
| | Stack canary | ASLR | NX / DEP (W^X) | Shadow stack (CET) | ASan |
|---|---|---|---|---|---|
| Defends | Return address (contiguous) | All addresses (probabilistic) | Code-vs-data integrity | Return address (any write) | All memory accesses |
| Mechanism | Compare guard word in epilogue | Randomize segment base addresses | Mark stack/heap non-executable | Separate protected return stack | Red zones + shadow memory |
| Catches arbitrary write? | No | Raises the bar only | No (blocks shellcode, not ROP) | Yes (for return addresses) | Yes |
| Runtime cost | < 1–3% | ~0% | ~0% | ~1–2% (HW) | 2–4× slowdown, 2–3× RAM |
| Where | Compiler + libc | Kernel + loader | CPU page bits + OS | Intel CET / ARM PAC | Compiler instrumentation |
| Best for | Production, always-on | Production, always-on | Production, always-on | Production (modern HW) | Testing / fuzzing only |
These are layers, not alternatives. A canary detects the smash, NX stops injected shellcode from running so attackers must reuse existing code (ROP), and ASLR hides where that code lives. Modern hardware shadow stacks finally close the gap canaries left — corruption of the return address by any means is caught on return — but canaries remain the cheapest, most universal first line.
What the numbers actually say
- Brute-forcing a 64-bit canary blind is a 1-in-2^64 (≈ 1 in 1.8 × 10^19) shot, and every wrong guess crashes the process. With the forced NUL low byte the guessable entropy is effectively 56 bits (2^56 ≈ 7.2 × 10^16), still astronomically out of reach for a single shot.
- But a forking server can be brute-forced one byte at a time. A server that fork()s children inheriting the same canary lets an attacker recover it byte by byte, using each child's crash or survival as the oracle. That takes at most 256 tries per byte × 8 bytes = 2,048 attempts for the whole canary (fewer in practice, since the NUL byte is known), versus 2^64 for a blind guess. This is the canonical "CTF" weakness.
- Per-call overhead is ~3–5 instructions and 8 bytes of stack, giving measured slowdowns typically under 1–3%. That is small enough for it to be a default rather than an opt-in.
- -fstack-protector-all can cost more because it instruments every function, including tiny leaf functions. Benchmarks have shown it reaching the high single digits to low double digits of percent on call-heavy code, which is why distros ship -fstack-protector-strong (heuristic) rather than -fstack-protector-all.
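The byte-at-a-time attack can be simulated. In this JavaScript sketch, the names and the example canary are hypothetical. The fork server's only signal is whether a child survived. Because every child inherits the same canary, each byte can be fixed independently.

```javascript
// Model of a forking server: every child inherits the SAME canary, and the
// attacker learns only whether each child crashed or survived.
function makeForkServer(canary) {
  let count = 0;
  return {
    // The overflow overwrites only the first guess.length canary bytes;
    // the child survives iff those bytes match (the rest stay intact).
    childSurvives(guess) {
      count++;
      return guess.every((b, i) => b === canary[i]);
    },
    get requests() {
      return count;
    },
  };
}

// Byte-at-a-time brute force: fix byte 0, then byte 1, and so on.
function bruteForceCanary(server) {
  const known = [];
  for (let pos = 0; pos < 8; pos++) {
    for (let b = 0; b < 256; b++) {
      if (server.childSurvives([...known, b])) {
        known.push(b);
        break;
      }
    }
  }
  return known;
}

const inherited = [0x00, 0x9c, 0x3e, 0x71, 0xd2, 0xa4, 0xb8, 0x56]; // example
const server = makeForkServer(inherited);
const recovered = bruteForceCanary(server);
console.log(recovered.every((b, i) => b === inherited[i])); // true
console.log(server.requests); // 983 here; never more than 256 * 8 = 2048
console.log(2n ** 56n); // size of the blind search space, for comparison
```

For simplicity the sketch brute-forces the NUL byte too. A real attacker would skip it and search only 7 bytes, which caps the worst case at 1,792 attempts.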
JavaScript model of the check
You can't really overflow a stack in JavaScript, but you can model the frame layout and the detection logic to see exactly why the order of memory matters.
// Master canary: 8 random bytes with the low byte (index 0, the lowest
// address on little-endian x86-64) forced to NUL, as glibc does.
// crypto.getRandomValues is the Web Crypto API (browsers, Node 19+).
const MASTER_CANARY = (() => {
  const r = new Uint8Array(8);
  crypto.getRandomValues(r);
  r[0] = 0x00; // forced NUL terminator byte
  return Array.from(r);
})();
// A simulated stack frame, low address → high address, 8 bytes per slot.
// Multi-byte values are stored little-endian, as on x86-64.
const SLOTS = ['buffer', 'canary', 'savedRbp', 'returnAddr'];
function makeFrame() {
  return {
    buffer: new Array(8).fill(0),                   // char buf[8] (lowest)
    canary: [...MASTER_CANARY],                     // the guard word
    savedRbp: [0x0d, 0xf0, 0xad, 0x0b, 0, 0, 0, 0], // saved frame pointer 0x0badf00d
    returnAddr: [0x36, 0x11, 0x40, 0, 0, 0, 0, 0],  // where ret jumps, 0x401136 (highest)
  };
}
// strcpy-style overflow: write bytes contiguously starting at buffer[0],
// spilling into canary → savedRbp → returnAddr as it overruns.
// Bytes past the end of the 32-byte frame model are dropped.
function overflow(frame, bytes) {
  const limit = Math.min(bytes.length, SLOTS.length * 8);
  for (let i = 0; i < limit; i++) {
    frame[SLOTS[Math.floor(i / 8)]][i % 8] = bytes[i];
  }
}
// Epilogue check: runs just before 'ret'.
function checkCanary(frame) {
  const ok = frame.canary.every((b, i) => b === MASTER_CANARY[i]);
  if (!ok) throw new Error('*** stack smashing detected ***: terminated');
  return frame.returnAddr; // safe to return
}
const f = makeFrame();
overflow(f, new Array(40).fill(0x41)); // 40 'A's into an 8-byte buffer
try {
  checkCanary(f); // throws: the overflow clobbered the canary on its way up
  console.log('returned normally'); // never reached in this run
} catch (e) {
  console.error(e.message); // abort instead of using returnAddr (now all 0x41 bytes)
}
The key insight the model makes concrete: to corrupt returnAddr with a contiguous write you must pass through canary. Unless the attacker already knows the canary's bytes, checkCanary notices before returnAddr is used.
The real thing in C
Here is the vulnerable function and what the compiler injects. Compile with gcc -fstack-protector-strong vuln.c (the flag Debian, Ubuntu, Fedora, and friends build their packages with by default).
#include <string.h>
#include <stdio.h>
void greet(const char *name) {
char buf[16];
strcpy(buf, name); // no bounds check — overflow if name > 15 chars
printf("Hello, %s\n", buf);
} // canary checked HERE, before ret
int main(int argc, char **argv) {
if (argc > 1) greet(argv[1]);
return 0;
}
Run it with a short name and it prints normally. Run ./a.out $(python3 -c 'print("A"*64)') and instead of silently jumping to 0x4141414141414141, you get:
*** stack smashing detected ***: terminated
Aborted (core dumped)
The message comes from __stack_chk_fail in libc, which calls __fortify_fail and then __libc_message → abort(). The function never returned through the poisoned address.
Variants worth knowing
Random canary. The modern default: a per-process random value with a forced NUL low byte, copied into each thread's TLS (described above). High entropy; needs an RNG at startup. Used by glibc, musl, the BSDs, Windows /GS.
Terminator canary. A fixed value 0x000aff0d built from NUL, line feed (\n), carriage return (\r), and EOF (0xff) — the bytes that terminate common string copies. No entropy required, which is why early StackGuard used it; defeated by memcpy-style overflows that aren't string-terminated.
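Both halves of that trade-off show up in a small model. This JavaScript sketch assumes a 32-bit frame (buffer, 4-byte terminator canary, 4-byte return address) with values stored little-endian as on x86. strcpyModel and memcpyModel are simplified stand-ins for the real libc functions, not their implementations.

```javascript
// 32-bit frame model (early StackGuard era), low → high:
//   0..7 buf | 8..11 terminator canary | 12..15 return address
const TERMINATOR = [0x0d, 0xff, 0x0a, 0x00]; // 0x000aff0d, little-endian in memory

function makeFrame32() {
  const mem = new Uint8Array(16);
  mem.set(TERMINATOR, 8);
  mem.set([0x36, 0x11, 0x40, 0x00], 12); // return address 0x00401136
  return mem;
}

// strcpy: copies up to AND including the first NUL, then stops.
function strcpyModel(mem, src) {
  for (let i = 0; i < src.length && i < mem.length; i++) {
    mem[i] = src[i];
    if (src[i] === 0x00) break;
  }
}

// memcpy: copies exactly src.length bytes, NULs included.
function memcpyModel(mem, src) {
  mem.set(src.slice(0, mem.length), 0);
}

const canaryIntact = (mem) => TERMINATOR.every((b, i) => mem[8 + i] === b);
const returnAddr = (mem) => new DataView(mem.buffer).getUint32(12, true);

// Payload: 8 filler bytes, the canary reproduced exactly, then a new target.
const payload = [...new Array(8).fill(0x41), ...TERMINATOR, 0xef, 0xbe, 0xad, 0xde];

const viaStrcpy = makeFrame32();
strcpyModel(viaStrcpy, payload);
console.log(canaryIntact(viaStrcpy), returnAddr(viaStrcpy).toString(16)); // true 401136

const viaMemcpy = makeFrame32();
memcpyModel(viaMemcpy, payload);
console.log(canaryIntact(viaMemcpy), returnAddr(viaMemcpy).toString(16)); // true deadbeef
```

With strcpy, the payload can reproduce the canary only by ending on its NUL byte. That same byte ends the copy, so the return address is never reached. memcpy copies by length and ignores terminators, so both the exact canary bytes and the new target land.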
Random XOR canary. The canary is XOR-folded with the saved return address (or frame pointer). Now the guard depends on the control-flow data it protects, so an attacker who leaks the canary still can't forge it without also knowing the target return address. Later StackGuard releases used this scheme, and Windows /GS uses a variant that XORs its cookie with the stack or frame pointer.
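A minimal JavaScript sketch of the return-address variant follows, with illustrative 64-bit values and hypothetical names. It shows that replaying a leaked guard while changing the return address fails. It also shows that forging a valid guard requires the master value, which an attacker can recover only by XORing the leaked guard with the original return address.

```javascript
// Random XOR canary sketch: the frame stores master ^ returnAddress and the
// epilogue re-derives the master from it. Values are illustrative.
const XOR_MASTER = 0x9c3e71d2a4b85600n; // secret, chosen once per process

function prologueXor(returnAddr) {
  return { returnAddr, guard: XOR_MASTER ^ returnAddr }; // guard depends on ret
}

function epilogueXor(frame) {
  if ((frame.guard ^ frame.returnAddr) !== XOR_MASTER) {
    throw new Error('*** stack smashing detected ***: terminated');
  }
  return frame.returnAddr;
}

const xorFrame = prologueXor(0x401136n);
const leakedGuard = xorFrame.guard; // what a read primitive would reveal

// 1. Replay the leaked guard but change ret: the XOR no longer matches.
let replayCaught = false;
try {
  epilogueXor({ returnAddr: 0xdeadbeefn, guard: leakedGuard });
} catch {
  replayCaught = true;
}
console.log('replay caught:', replayCaught); // replay caught: true

// 2. Forging needs the master: leaked guard ^ ORIGINAL return address.
const recoveredMaster = leakedGuard ^ 0x401136n;
const forged = { returnAddr: 0xdeadbeefn, guard: recoveredMaster ^ 0xdeadbeefn };
console.log(epilogueXor(forged).toString(16)); // deadbeef
```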
StackGuard vs ProPolice (SSP). The original StackGuard only guarded the return address. IBM's ProPolice (Stack-Smashing Protector, by Hiroaki Etoh, 2000) added two ideas that survive today: it reorders local variables so that arrays sit above scalars and pointers, so an array overflow can't clobber a pointer the function uses before returning, and it copies pointer arguments below arrays. This reordering is as important as the canary itself.
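The effect of the reordering can be shown with a toy model. In this JavaScript sketch, the slot names and the 8-byte slot size are illustrative, and so is the assumption that the function calls through fnPtr before its epilogue. The sketch runs the same 16-byte overflow against a naive layout and against a ProPolice-style reordered one.

```javascript
// The same contiguous overflow against two layouts of one frame
// (low → high, 8 bytes per slot). Illustrative only.
const naive = ['buf', 'fnPtr', 'canary', 'ret'];     // pointer above the array
const reordered = ['fnPtr', 'buf', 'canary', 'ret']; // array moved up next to the canary

function runFrame(layout, overflowLen) {
  const slots = Object.fromEntries(layout.map((name) => [name, 'ok']));
  const start = layout.indexOf('buf') * 8;
  // Contiguous write from buf[0] upward; mark every slot it touches.
  for (let off = start; off < start + overflowLen && off < layout.length * 8; off++) {
    slots[layout[Math.floor(off / 8)]] = 'smashed';
  }
  // Assumption: the function calls through fnPtr BEFORE its epilogue runs.
  if (slots.fnPtr === 'smashed') return 'hijacked before canary check';
  if (slots.canary === 'smashed') return 'stack smashing detected';
  return 'returned normally';
}

console.log(runFrame(naive, 16));     // hijacked before canary check
console.log(runFrame(reordered, 16)); // stack smashing detected
```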
Hardware shadow stacks. Not a canary, but the successor. Intel CET's shadow stack and ARM's pointer authentication (PAC), which cryptographically signs return addresses, protect return-address integrity in hardware and catch corruption by any means, contiguous or not. Where available they are stronger; canaries still run alongside for breadth of coverage.
Common pitfalls and how canaries get bypassed
- Information leak defeats it. A format-string bug (printf(user_input)) or any read primitive that prints the canary lets the attacker include the correct canary bytes in their overflow, restoring it before the check. The canary is a secret; leaking it removes all protection.
- Forking servers enable byte-at-a-time brute force. Children sharing the parent's canary turn 2^64 into ~2,048 guesses. Re-randomizing the canary in each child stops this; the simplest way is to exec a fresh image after fork, which gets a new canary.
- Unprotected functions. The default heuristic skips functions with no arrays and no address-taken locals. A vulnerable scalar-only function (rare, but possible with manual pointer arithmetic) gets no canary unless you use -fstack-protector-all.
- Pre-return control transfer. If the overflow corrupts a local function pointer, C++ vtable pointer, or setjmp buffer that the function uses before it returns, control is hijacked before the epilogue ever checks the canary.
- Targeted (non-contiguous) writes. An arbitrary-write primitive can overwrite the return address directly without touching the canary. Canaries assume contiguity; arbitrary writes break that assumption.
- Treating it as a substitute for bounds checking. A canary turns exploitation into a denial-of-service crash; it does not prevent the overflow. Use bounded APIs (strlcpy, snprintf, fgets) and a memory-safe language where you can.
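The information-leak bypass takes only a few lines to model. In this hypothetical JavaScript sketch, the attacker already holds the canary bytes, standing in for a format-string leak, and writes them back in place. The layout, the SECRET value, and the helper names are assumptions.

```javascript
// Layout (illustrative), low → high, 8 bytes per slot:
//   0..7 buf | 8..15 canary | 16..23 saved RBP | 24..31 return address
const SECRET = [0x00, 0x9c, 0x3e, 0x71, 0xd2, 0xa4, 0xb8, 0x56]; // example canary

function freshFrame() {
  const mem = new Uint8Array(32);
  mem.set(SECRET, 8);
  mem.set([0x36, 0x11, 0x40, 0, 0, 0, 0, 0], 24); // return address 0x401136
  return mem;
}

// Epilogue: compare the frame's canary bytes, then "return".
function checkAndReturn(mem) {
  if (!SECRET.every((b, i) => mem[8 + i] === b)) {
    throw new Error('*** stack smashing detected ***: terminated');
  }
  return new DataView(mem.buffer).getBigUint64(24, true);
}

// Payload: filler + leaked canary + junk saved RBP + attacker's target.
function buildPayload(leakedCanary, target) {
  const ret = new Uint8Array(8);
  new DataView(ret.buffer).setBigUint64(0, target, true);
  return [...new Array(8).fill(0x41), ...leakedCanary, ...new Array(8).fill(0x42), ...ret];
}

const leaked = freshFrame();
leaked.set(buildPayload(SECRET, 0xdeadbeefn), 0); // memcpy-style overflow
console.log(checkAndReturn(leaked).toString(16)); // deadbeef: the check passed
```

Because the leaked canary starts with a NUL byte, this payload needs a length-based copy (memcpy, read, recv). A strcpy-based overflow would stop at that byte, which is exactly what the forced NUL is for.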
Frequently asked questions
Why does the canary sit between the buffers and the return address?
Stacks grow toward lower addresses, but a buffer overflow writes from a low address upward toward higher addresses — past the buffer, past the canary, then over the saved frame pointer and return address. Because the canary is contiguous and sits below the return address, a contiguous overflow cannot reach the return address without first clobbering the canary. Checking the canary in the epilogue therefore catches the attack before the corrupted return address is ever used.
Can an attacker just guess or rewrite the canary?
Guessing a random 64-bit canary blind is a 1-in-2^64 shot, and a wrong guess crashes the process — so a remote attacker effectively can't brute it. The real weaknesses are information leaks (a format-string or read primitive that prints the canary), forking servers that keep the same canary across child crashes (allowing byte-at-a-time brute force), and overwrites that jump over the canary entirely rather than running contiguously through it.
What is the terminator canary and why is it built from null, CR, LF, and EOF bytes?
A terminator canary is the fixed value 0x000aff0d, made of the bytes that terminate common string operations: NUL (\0), line feed (\n), carriage return (\r), and 0xFF (EOF). Because string functions like strcpy and gets stop at these bytes, an attacker overflowing through a string copy cannot write the canary's exact bytes without ending their own copy early — so they cannot restore it. It needs no entropy source, which is why early StackGuard used it, but it is defeated by overflows that use memcpy or length-controlled writes.
Do stack canaries protect every function?
No. By default GCC and Clang only instrument functions the heuristic deems risky. -fstack-protector covers functions with a char array of 8+ bytes or that call alloca; -fstack-protector-strong (the modern default in most distros) widens this to any function with a local array or that takes the address of a local; -fstack-protector-all instruments everything at a runtime cost. A function with only scalar locals and no address-taken variables is typically left unprotected.
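The three flags can be summarized as a decision function. This JavaScript sketch encodes only the rules stated in the answer above, using a hypothetical function descriptor (arrays, callsAlloca, takesLocalAddress). Real GCC and Clang heuristics handle more cases than this.

```javascript
// Simplified model of which functions get a canary under each flag.
// The descriptor shape is hypothetical, not a compiler data structure.
function getsCanary(fn, flag) {
  const hasArray = fn.arrays.length > 0;
  const bigCharArray = fn.arrays.some((a) => a.elem === 'char' && a.bytes >= 8);
  switch (flag) {
    case '-fstack-protector-all':
      return true;
    case '-fstack-protector-strong':
      return fn.callsAlloca || hasArray || fn.takesLocalAddress;
    case '-fstack-protector':
      return fn.callsAlloca || bigCharArray;
    default:
      return false; // e.g. -fno-stack-protector
  }
}

const greet = { arrays: [{ elem: 'char', bytes: 16 }], callsAlloca: false, takesLocalAddress: false };
const sumInts = { arrays: [{ elem: 'int', bytes: 16 }], callsAlloca: false, takesLocalAddress: false };
const addOne = { arrays: [], callsAlloca: false, takesLocalAddress: false };

for (const flag of ['-fstack-protector', '-fstack-protector-strong', '-fstack-protector-all']) {
  console.log(flag, getsCanary(greet, flag), getsCanary(sumInts, flag), getsCanary(addOne, flag));
}
```

Under this model, greet (a char buffer) is protected by all three flags. sumInts (an int array) is protected only from -strong upward, and the scalar-only addOne only by -all.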
What's the difference between a stack canary and a shadow stack?
A canary is a tripwire — it detects corruption after it happens by comparing a guard value in the epilogue, and it only catches contiguous overflows. A shadow stack keeps a separate, protected copy of every return address and compares the real return address against it on return, so it catches any corruption of the return address regardless of how it was written. Intel CET shadow stacks and ARM PAC return-address signing are hardware mechanisms that supersede canaries for return-address integrity, but canaries remain a cheap, ubiquitous first line of defense.
How much does the canary cost at runtime?
Per protected call it adds a load of the canary from thread-local storage and a store into the frame on entry, then a load, compare, and conditional branch to __stack_chk_fail on exit — roughly 3 to 5 instructions and one extra stack word (8 bytes on x86-64). Measured overhead is typically under 1% to 3% on real programs, which is why it is enabled by default across Linux distributions, Android, and the major BSDs.