Security
Return-Oriented Programming (ROP)
Turning a program's own code against it — no shellcode required
Return-oriented programming (ROP) is a code-reuse exploit that chains short instruction sequences already in memory — each ending in 'ret' — so an attacker runs arbitrary logic without injecting a single new byte of code, defeating non-executable-stack defenses.
- IntroducedShacham, 2007
- DefeatsDEP / NX (W^X)
- Building blockgadget → ends in
ret - Computational powerTuring-complete
- Hardware defenseCET shadow stack · ARM PAC
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The trick: don't bring code, borrow it
For decades the classic stack-smashing attack was simple: overflow a buffer, overwrite the saved return address, and point it at shellcode the attacker had pushed onto the stack along with the overflow. The CPU returned straight into the attacker's bytes and started executing them.
Then defenders deployed DEP (Data Execution Prevention), also called NX (the no-execute page bit) or W^X ("write XOR execute"). The rule: a memory page can be writable or executable, never both. The stack and heap became writable-but-not-executable, so injected shellcode triggered a fault the instant the CPU tried to run it. Code injection was dead.
Return-oriented programming is the answer to that defense, and the idea is almost insolent in its simplicity: if you can't bring your own code, use the code that's already there. A normal program links against a large C library — libc on Linux is hundreds of kilobytes of fully executable machine code. Buried in that code are thousands of tiny fragments that happen to end in a ret instruction. Each fragment, called a gadget, does one small useful thing — pop a value into a register, add two registers, write memory through a pointer — and then returns.
ROP strings these fragments together. The attacker still overflows the stack, but instead of writing shellcode they write a list of gadget addresses. Because ret pops the next word off the stack into the instruction pointer, each gadget's final ret launches the next gadget. The stack stops being data and becomes a program. NX is never violated — every instruction executed lives in pages the loader already marked executable. The technique was named and proven Turing-complete by Hovav Shacham in his 2007 paper The Geometry of Innocent Flesh on the Bone, generalizing the older return-into-libc attack from a single function call into an arbitrary computation.
How a chain executes, instruction by instruction
Picture the stack growing downward and the stack pointer rsp pointing at the word the next ret will consume. The attacker's overflow has laid out memory like this (x86-64, addresses increasing downward):
stack (after overflow) what happens
rsp → 0x4011a3 pop rdi; ret ┐ ret #0 jumps here
0x00601060 "/bin/sh\0" ┘ pop loads this into rdi, then ret
0x401080 pop rsi; ret ┐ ret #1 jumps here
0x00000000 ┘ pop loads 0 into rsi, then ret
0x4010d0 pop rdx; ret ┐
0x00000000 ┘ rdx = 0
0x7ffff7e1b2c0 execve → ret #3 jumps into execve(rdi,rsi,rdx)
Trace it. The function we hijacked ends with its own ret. That ret pops 0x4011a3 — the address of a pop rdi; ret gadget — into rip. The pop rdi then consumes the next stack word (the pointer to the string "/bin/sh") and loads it into the first-argument register rdi. The gadget's ret pops the next address and we repeat: clear rsi, clear rdx, then return straight into execve, which now runs execve("/bin/sh", NULL, NULL) and drops a shell.
The "program counter" of a ROP chain is just rsp. Every ret advances it by 8 bytes (one word). A gadget that does pop rdi; ret advances it by 16 — eight for the popped value, eight for the return. Getting that stack arithmetic exactly right is most of the work in building a chain: a single misaligned word and the next ret jumps to garbage.
Where gadgets come from — including ones the compiler never wrote
The supply of gadgets is larger than it looks because x86 instructions are variable-length and unaligned. The CPU will happily start decoding at any byte. Consider this innocent two-byte sequence the compiler emitted as part of a larger instruction:
intended: b8 5f c3 04 08 mov eax, 0x0804c35f
entered at +1: 5f pop rdi
c3 ret
Jump to the second byte instead of the first and the same bytes decode as a brand-new pop rdi; ret gadget the programmer never intended to exist. Shacham's insight was that the c3 byte — the opcode for ret — appears constantly inside larger instructions and immediate values, so scanning a libc backward from every c3 yields a galaxy of usable gadgets. Tools like ROPgadget, ropper, and pwntools' gadget finder automate this search, then a chain compiler assembles the addresses for you.
The complexity of finding gadgets is linear: a scanner makes one backward pass from each ret byte, so for a binary of n bytes containing r return opcodes the scan is O(n) and the gadget catalog is built in O(r · k) where k is the max gadget length the tool searches (typically 5–10 instructions). On fixed-instruction-width architectures like ARM (32-bit) or AArch64, you can't decode mid-instruction, which shrinks the gadget pool — but it's still plenty, and the equivalent jump-oriented programming uses indirect bx/blr branches instead of ret.
Where ROP is the tool of choice (for attackers and defenders alike)
- Bypassing W^X / NX. Any time the target enforces a non-executable stack and heap, code injection is off the table and reuse is the path forward.
- Bootstrapping to a bigger payload. A common chain doesn't try to do everything in ROP. It calls
mprotect()orVirtualProtect()to flip a writable region to executable, then jumps to conventional shellcode there. Three gadgets buy you back full code execution. - Exploit development & CTFs. ROP is the staple of binary-exploitation security competitions and of legitimate red-team / penetration-testing engagements.
- Defensive research. Understanding ROP is how defenders design and benchmark mitigations like CFI and shadow stacks. You can't measure a defense without the attack.
The tradeoffs are real. ROP chains are brittle: they break across compiler versions, libc versions, and any address-space layout change. They are tedious to build by hand. And modern hardware mitigations (below) are specifically designed to detect the one thing every gadget relies on — a ret to an address the program flow never legitimately produced.
ROP vs related code-reuse techniques
| Code injection | ret2libc | ROP | JOP | COP / Counterfeit OOP | |
|---|---|---|---|---|---|
| Injects new code? | Yes (shellcode) | No | No | No | No |
| Defeats DEP/NX? | No | Yes | Yes | Yes | Yes |
| Control transfer | jump to stack | one library call | chained ret | indirect jmp / dispatcher | virtual-call (vtable) |
| Expressive power | arbitrary | a few function calls | Turing-complete | Turing-complete | Turing-complete |
| Stopped by shadow stack? | by NX | partly | Yes | No (no ret) | No |
| Stopped by forward-edge CFI? | — | weakly | weakly | Yes | Yes |
The headline: ret2libc was ROP's predecessor — calling a single library function with attacker-chosen arguments. ROP generalized it to a full computation. Jump-oriented programming (JOP) and counterfeit object-oriented programming (COP) then sidestepped ret-specific defenses by chaining indirect jumps or hijacked C++ virtual calls instead. That's why a complete defense needs both a backward-edge protection (shadow stack, for ret) and a forward-edge one (CFI, for indirect jmp/call).
What the numbers say
- Gadget density. Shacham's original analysis of a standard libc found enough gadgets to build a Turing-complete set within a single ~1.5 MB library; modern tools routinely extract tens of thousands of distinct gadgets from a system libc.
- Minimal chain to a shell. A working
execve("/bin/sh", 0, 0)chain on x86-64 needs about 4–6 gadgets plus the string and the syscall — a payload of well under 100 bytes of addresses. - ASLR entropy. On 32-bit Linux, mmap base randomization gives only ~8 bits of entropy for the library base — roughly 256 possibilities, brute-forceable in seconds via crash-and-retry. On 64-bit, entropy rises to 28–30 bits, which is why a real exploit pairs ROP with an info leak rather than brute force.
- Shadow-stack cost. Intel CET's hardware shadow stack adds essentially no measurable overhead on supported CPUs (a parallel push/pop in microcode), versus software-only CFI schemes that historically cost 1–10%.
JavaScript: a model of how a chain "runs"
You can't perform a real exploit in a browser, but you can model the mechanism exactly: a simulated stack of gadget addresses, a virtual CPU with registers, and a ret that pops the next address. This is the heart of what the interactive visualization animates.
// A gadget catalog: address -> {asm, effect}.
// Each gadget pops 0+ values off the stack, mutates regs, then "ret".
const gadgets = {
0x4011a3: { asm: 'pop rdi; ret', pops: ['rdi'] },
0x401080: { asm: 'pop rsi; ret', pops: ['rsi'] },
0x4010d0: { asm: 'pop rdx; ret', pops: ['rdx'] },
0x401130: { asm: 'syscall; ret', pops: [], syscall: true },
};
// The attacker-controlled stack: a flat list of words.
// Addresses are gadget entry points; everything else is data popped by them.
const stack = [
0x4011a3, 0x00601060, // pop rdi; ret -> rdi = ptr to "/bin/sh"
0x401080, 0x00000000, // pop rsi; ret -> rsi = 0
0x4010d0, 0x00000000, // pop rdx; ret -> rdx = 0
0x401130, // syscall; ret -> execve(rdi, rsi, rdx)
];
function run(stack) {
const reg = { rdi: 0, rsi: 0, rdx: 0, rax: 59 /* execve */ };
let sp = 0; // stack pointer = the ROP "PC"
while (sp < stack.length) {
const addr = stack[sp++]; // ret: pop next address into rip
const g = gadgets[addr];
if (!g) throw new Error('crash: ret to non-gadget ' + addr.toString(16));
for (const r of g.pops) reg[r] = stack[sp++]; // each pop eats a word
if (g.syscall) {
console.log(`execve(rdi=0x${reg.rdi.toString(16)}, ` +
`rsi=${reg.rsi}, rdx=${reg.rdx}) // shell!`);
return reg;
}
}
}
run(stack); // -> execve(rdi=0x601060, rsi=0, rdx=0) // shell!
The crucial line is const addr = stack[sp++]: that single read-and-advance is the ret instruction. Everything else — the registers, the pops, the syscall — is just bookkeeping around it. Whoever controls the stack controls the program counter.
Python: finding gadgets, the way real tools do
The other half of ROP is discovery: scanning a blob of machine code for byte sequences that end in 0xc3 (the x86 ret opcode) and disassembling backward from each one. Here is the core loop, simplified, mirroring what ROPgadget and ropper do under the hood.
RET = 0xC3 # x86 'ret' opcode
MAX_GADGET = 6 # max instructions before the ret
def find_gadgets(code: bytes, base_addr: int, disasm):
"""Yield (address, [instructions]) for every ret-terminated gadget.
`disasm(bytes, addr)` decodes one instruction; we use a capstone-like API."""
gadgets = []
for i, b in enumerate(code):
if b != RET:
continue
ret_addr = base_addr + i
# Walk backward, trying each possible start byte before the ret.
# x86 is variable-length, so any offset might be a valid entry point.
for start in range(max(0, i - MAX_GADGET * 15), i):
insns, off = [], start
ok = True
while off <= i:
ins = disasm(code[off:i + 1], base_addr + off)
if ins is None: # bytes don't decode here
ok = False
break
insns.append(ins)
off += ins.size
if off == i + 1: # landed exactly on the ret: valid gadget
break
else:
ok = False
if ok and insns and insns[-1].mnemonic == 'ret':
gadgets.append((base_addr + start, insns))
return gadgets
# A chain is then just the list of gadget addresses (+ data) the attacker
# writes onto the stack, assembled by a solver that wants e.g. rdi = X.
def build_chain(catalog, want):
"""want = {'rdi': str_addr, 'rsi': 0, 'rdx': 0}; returns packed stack words."""
chain = []
for reg, val in want.items():
pop = catalog[f'pop {reg}; ret'] # look up a gadget that pops this reg
chain += [pop, val] # gadget address, then the value it pops
chain.append(catalog['syscall; ret'])
return chain
The backward walk is what surfaces unaligned gadgets: by trying every start offset before each ret, the scanner discovers the mid-instruction gadgets the compiler never emitted on purpose. Real solvers go further, using a small constraint engine to pick gadgets that set the registers an attacker wants while minimizing side effects (a stray pop that clobbers a register you already set).
Variants worth knowing
ret2libc. The ancestor. Overwrite the return address with the address of a libc function like system() and arrange its argument on the stack. One call, not a chain — limited but often enough.
Jump-oriented programming (JOP). Replaces ret-terminated gadgets with ones ending in indirect jumps, driven by a "dispatcher" gadget that walks a table of target addresses. It sidesteps any defense that only watches return instructions — including shadow stacks.
Sigreturn-oriented programming (SROP). Abuses the kernel's signal-return path: a single sigreturn syscall restores all registers from a stack frame the attacker forged, setting the entire CPU state in one shot. Extremely compact — sometimes a one-gadget exploit.
Blind ROP (BROP). Builds a chain against a remote server with no copy of the binary, by observing whether each guessed gadget crashes the process or not — turning crash/no-crash into an oracle that leaks the gadget layout one byte at a time.
Counterfeit object-oriented programming (COP). Forges C++ objects with malicious vtables so that legitimate virtual calls dispatch into gadgets, attacking the forward edge instead of the return edge.
Common pitfalls and edge cases
- Stack misalignment. The x86-64 ABI requires 16-byte stack alignment before a
call. Modern libc functions use SSE instructions that fault on a misaligned stack — so chains often need a "stack-alignment gadget" (a bareret) inserted to nudgerspby 8. - Bad bytes in the payload. If the overflow goes through
strcpy, any null byte truncates it; throughscanf("%s"), whitespace ends it. Addresses containing forbidden bytes must be avoided or written with arithmetic gadgets. - Forgetting side effects. A gadget that also does
pop rbxon the way to itsretconsumes an extra stack word; omit the filler and every subsequent address shifts and the chain derails. - Assuming static addresses under ASLR. Hardcoding gadget addresses works only with ASLR off or a leak in hand. Without a leak, the chain lands at a random location and crashes.
- Ignoring the shadow stack. On a CET-enabled CPU, the very first gadget's
retmismatches the shadow copy and the process is killed — your beautiful chain never reaches gadget two. - Treating ROP as malware-only. The same primitive underpins legitimate red-team testing, mitigation benchmarking, and academic security research. Understanding it is defensive, not just offensive.
Frequently asked questions
How does ROP get around a non-executable stack?
DEP/NX marks the stack and heap as non-executable, so injected shellcode never runs. ROP never injects code — it only writes addresses. Every gadget it jumps to lives in code the program already mapped as executable (libc, the main binary), so the no-execute bit is never violated.
What exactly is a gadget?
A gadget is a short sequence of machine instructions that ends in a 'ret' (or another indirect branch). Typical gadgets are two to four instructions long, like 'pop rdi; ret' or 'mov [rdi], rax; ret'. Because x86 has variable-length, unaligned instructions, gadgets can also be found mid-instruction — bytes that mean one thing when decoded normally and another when entered at a different offset.
Why does every gadget end in 'ret'?
'ret' pops the next value off the stack into the instruction pointer. Since the attacker controls the stack, each 'ret' hands control to the next address the attacker placed there. The stack becomes a program: a list of gadget addresses that 'ret' walks through one by one, like a player piano reading a paper roll.
Is ROP Turing-complete?
Yes. Shacham's 2007 paper proved that the gadgets found in a standard C library (libc on Linux, or even just libc-equivalent code) form a Turing-complete instruction set — you can build loads, stores, arithmetic, and conditional branches entirely from existing 'ret'-terminated sequences. In practice attackers rarely need full computation; a short chain that calls mprotect or execve is enough.
Does ASLR stop ROP?
ASLR randomizes where libraries load, so gadget addresses aren't known in advance — but it doesn't stop ROP, it just adds a step. Attackers defeat it with an information leak (a format-string bug or an out-of-bounds read) that discloses one runtime address, then compute every gadget offset from it. On 32-bit systems the entropy is low enough to brute-force.
What actually defends against ROP today?
Control-flow integrity (CFI) and hardware shadow stacks. Intel CET's shadow stack keeps a protected copy of return addresses and faults if a 'ret' target doesn't match. ARM's pointer authentication (PAC) signs return addresses with a key, so a forged stack value fails verification. Both make ROP chains far harder to land than DEP and ASLR alone.