Process Lifecycle: fork, exec, wait

How Unix makes every program — clone, replace, reap

On Unix every new program is born the same way: fork clones the calling process, exec overwrites the clone with a new program image, and wait lets the parent reap the child's exit status — preventing zombies.

  • fork returns: twice (0 / PID)
  • exec returns: never (on success)
  • Cost of fork: copy page tables only
  • Memory model: copy-on-write
  • Un-reaped child: zombie

The one idea behind every Unix process

There is no spawn("editor") call on Unix. There is no single system call that says "run this program." Instead, Unix splits the job into two deliberately separate steps — and once you see why, the entire process model clicks into place.

First you fork: the kernel makes a near-perfect copy of the calling process. Now there are two identical processes running the same code, with the same open files, the same memory, the same everything — except their process IDs. Then, in the copy, you exec: you tell the kernel to throw away the program currently running in this process and load a different one in its place. The process keeps its identity (its PID, its open file descriptors, its working directory), but its code and data are completely replaced.

This is Ken Thompson and Dennis Ritchie's design from the early 1970s, and the separation is the whole point. The gap between fork and exec is a window in which the child — still running the parent's code — can rearrange its inheritance: redirect stdout to a file, close descriptors it shouldn't pass on, drop privileges, change directory. The shell uses exactly this window every time you type ls > out.txt. A combined spawn call would have to anticipate every such tweak as a flag; fork-then-exec needs none, because the child is just running ordinary code in between.

The third call, wait, closes the loop. When the child finishes, its exit status has to go somewhere — to the parent that started it. So a finished child lingers in the process table as a zombie until the parent calls wait (or waitpid) to collect that status. Collecting it is called reaping, and it frees the last scrap of the dead process.

The precise mechanism

Walk through what the kernel actually does at each call.

fork(). The kernel allocates a new process structure and a new PID, then duplicates the parent's address space, file-descriptor table, signal dispositions, and current working directory. Crucially the duplication is logical, not physical — see copy-on-write below. fork then returns in both processes:

  • In the parent, it returns the child's PID — a positive integer.
  • In the child, it returns 0.
  • On failure (no memory, hit the PID limit) it returns -1 in the parent, and no child exists.

That single return value is the entire branching mechanism: the same line of code runs in two processes, and each inspects the value to decide who it is.
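A minimal sketch of that branch in Python, whose os.fork wraps the POSIX call. The helper name fork_once and the pipe are illustrative assumptions, used only so the child can report what it saw back to the parent. One difference from C: Python signals a failed fork by raising OSError instead of returning -1.

```python
import os

def fork_once():
    """Fork once; return (value fork returned in the parent, PID the child saw for itself)."""
    r, w = os.pipe()
    pid = os.fork()                      # one call, two returns
    if pid == 0:
        # Child: fork returned 0. Report our own PID, then leave without
        # running any of the parent's cleanup code.
        os.close(r)
        os.write(w, str(os.getpid()).encode())
        os._exit(0)
    # Parent: fork returned the child's PID.
    os.close(w)
    with os.fdopen(r) as pipe:
        child_reported = int(pipe.read())   # reads until the child closes its end
    os.waitpid(pid, 0)                   # reap the child
    return pid, child_reported

if __name__ == "__main__":
    parent_saw, child_was = fork_once()
    print(parent_saw == child_was)
```

The two values match: the positive number the parent gets back is exactly the PID the child reports for itself, while the child's own copy of the variable pid holds 0.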

exec(). Technically a family — execve is the actual system call; execl, execvp, etc. are libc wrappers. exec discards the current text, data, heap, and stack segments and maps in the new program's segments, then jumps to its entry point. Because the old program no longer exists, exec does not return on success — the only time control comes back is on failure (file not found, not executable), where it returns -1. What survives across exec: the PID, the parent, open file descriptors (unless marked close-on-exec), the working directory, and the process's place in its group and session.
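To see that the PID survives exec, here is a small sketch under stated assumptions: the helper name pid_across_exec is hypothetical, and the "new program" is simply a second Python interpreter started through sys.executable. It also shows the close-on-exec rule in passing, because Python creates pipe descriptors close-on-exec while os.dup2 makes its target inheritable by default.

```python
import os
import sys

def pid_across_exec():
    """Return (PID before exec, PID reported by the program that exec loaded)."""
    r, w = os.pipe()                     # Python creates these close-on-exec
    pid = os.fork()
    if pid == 0:
        try:
            os.dup2(w, 1)                # fd 1 is inheritable, so it survives exec
            # Replace this process image with a fresh interpreter that prints its PID.
            os.execv(sys.executable,
                     [sys.executable, "-c", "import os; print(os.getpid())"])
        finally:
            os._exit(127)                # reached only if dup2 or exec raised
    os.close(w)
    with os.fdopen(r) as pipe:
        pid_after_exec = int(pipe.read())
    os.waitpid(pid, 0)
    return pid, pid_after_exec

if __name__ == "__main__":
    before, after = pid_across_exec()
    print(before == after)
```

The program image is entirely different after execv, yet the PID it prints is the one fork handed to the parent. The original pipe descriptors are closed by the exec; only the copy on fd 1 crosses over.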

wait()/waitpid(). The parent blocks (or polls, with WNOHANG) until a child changes state. On return it gets the child's PID and a status word you decode with macros: WIFEXITED / WEXITSTATUS for a normal exit, WIFSIGNALED / WTERMSIG for a kill. The act of returning that status is what removes the zombie from the table.
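The status word is easier to trust once you have decoded both cases yourself. A minimal sketch, with hypothetical helper names describe and run_child: one child exits normally with code 7, the other kills itself with SIGKILL.

```python
import os
import signal

def describe(status):
    """Decode a wait status word into a short human-readable string."""
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return "stopped or continued"

def run_child(action):
    """Fork a child that runs action(); return the decoded wait status."""
    pid = os.fork()
    if pid == 0:
        try:
            action()
        finally:
            os._exit(0)                  # fallback if action() returns or raises
    _, status = os.waitpid(pid, 0)       # blocks, then reaps the child
    return describe(status)

if __name__ == "__main__":
    print(run_child(lambda: os._exit(7)))
    print(run_child(lambda: os.kill(os.getpid(), signal.SIGKILL)))
```

Checking WIFEXITED before calling WEXITSTATUS matters: the exit-code bits are only meaningful when the child actually exited, not when a signal ended it.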

The costs are modest and worth stating exactly. fork with copy-on-write is O(size of the page tables), not O(resident memory): forking a process with a 4 GB heap copies a few megabytes of page-table entries (about 8 MB with 4 KB pages and 8-byte entries), not 4 GB. exec is O(number of the new program's pages that must be faulted in), and with demand paging even that is deferred until the code runs. Reaping one already-exited child with wait is O(1).
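The 8 MB figure is simple arithmetic. The sketch below assumes 8-byte page-table entries (as on x86-64) and counts only the leaf level of the page tables; the upper levels add well under one percent. The function name page_table_bytes is illustrative.

```python
def page_table_bytes(mapped_bytes, page_size=4096, pte_size=8):
    """Approximate size of the leaf page-table entries covering mapped_bytes."""
    pages = -(-mapped_bytes // page_size)    # ceiling division
    return pages * pte_size

GIB = 2**30
MIB = 2**20

if __name__ == "__main__":
    print(page_table_bytes(4 * GIB) / MIB)                 # 8.0 (MiB, with 4 KiB pages)
    print(page_table_bytes(4 * GIB, page_size=2 * MIB))    # 16384 (bytes, with 2 MiB pages)
```

The second line shows why huge pages make fork cheaper still: the same 4 GB needs 512 times fewer entries.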

Why fork doesn't actually copy

The original V6 Unix fork really did copy the whole address space — and on a machine without paging, that meant swapping the parent out to disk and back. It was so wasteful that BSD added vfork, a fork that shares memory and promises the child will immediately exec.

Modern systems solve it with copy-on-write (COW). fork maps the same physical pages into both processes but marks them read-only. As long as both sides only read, they share the same RAM for free. The instant either one writes to a page, the CPU traps, and the kernel makes a private copy of just that one page and lets the write proceed. Pages are copied lazily, one at a time, only when actually mutated.

The payoff is enormous for the common fork-then-exec pattern: the child writes almost nothing before exec throws the whole image away, so almost no copying ever happens. Forking a process is therefore cheap roughly in proportion to its page-table size, independent of how much heap it holds.
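Copy-on-write itself is invisible to a program; what you can observe is the guarantee it preserves, namely that a write in one process never shows up in the other. A minimal sketch, with a hypothetical helper name child_write_is_private. One caveat for this particular language: CPython updates reference counts even when it only reads an object, so a Python child dirties more pages than the equivalent C program would.

```python
import os

def child_write_is_private():
    """Return (parent's view after the child wrote, child's view), both as bytes."""
    data = bytearray(b"parent")          # backed by pages both processes map after fork
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        data[:] = b"child!"              # this write lands in the child's private copy
        os.write(w, bytes(data))
        os._exit(0)
    os.close(w)
    with os.fdopen(r, "rb") as pipe:
        child_view = pipe.read()
    os.waitpid(pid, 0)
    return bytes(data), child_view       # the parent's bytes are untouched

if __name__ == "__main__":
    print(child_write_is_private())      # (b'parent', b'child!')
```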

When to use which spawning primitive

  • fork + exec — the default. Use it when the child needs to set up its environment (redirect I/O, close descriptors, drop privileges) before running the new program. This is what every shell does.
  • fork alone (no exec) — when you want a second copy of this program: classic prefork servers (Apache, PostgreSQL backends) fork worker processes that keep running the same binary.
  • posix_spawn — a single library call that does fork+exec atomically, with a file-actions list describing the descriptor surgery. Safer in multithreaded programs; often implemented over vfork or clone for speed.
  • vfork — a sharp tool for forking a giant process that will immediately exec. The child borrows the parent's memory and the parent is suspended until exec/exit. Easy to misuse; mostly superseded by COW and posix_spawn.

If your workload is "run an external command and collect output," reach for posix_spawn or a higher-level wrapper. If it's "I am a server and I want isolated worker processes," fork is exactly right.
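For the "run a command and collect output" case, Python exposes posix_spawn directly as os.posix_spawn (Python 3.8 and later). The sketch below is one way to use it, not the only one; the helper name spawn_and_capture is hypothetical, and it assumes argv[0] is a full path, since posix_spawn does not search PATH.

```python
import os
import sys

def spawn_and_capture(argv):
    """Run argv[0] via posix_spawn with stdout sent to a pipe; return (output, exit code)."""
    r, w = os.pipe()
    # The file-actions list stands in for hand-written code in the fork/exec gap:
    # "in the child, make fd 1 a copy of w".
    actions = [(os.POSIX_SPAWN_DUP2, w, 1)]
    pid = os.posix_spawn(argv[0], argv, os.environ, file_actions=actions)
    os.close(w)                          # the parent keeps only the read end
    with os.fdopen(r, "rb") as pipe:
        output = pipe.read()
    _, status = os.waitpid(pid, 0)       # a spawned child still has to be reaped
    # WEXITSTATUS is meaningful only for a normal exit; fine for this sketch.
    return output, os.WEXITSTATUS(status)

if __name__ == "__main__":
    print(spawn_and_capture([sys.executable, "-c", "print('hello')"]))
```

Notice what is missing: there is no branch on a return value and no child-side code at all. Everything the child must do before the program starts has to be expressible as a file action or a keyword argument, which is the trade the comparison below describes.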

fork+exec vs the alternatives

| | fork + exec | posix_spawn | vfork + exec | CreateProcess (Windows) | clone (Linux) |
| --- | --- | --- | --- | --- | --- |
| Copies address space | COW (lazy) | COW / shared, hidden | shared, no copy | none (fresh image) | tunable by flags |
| Pre-exec setup window | arbitrary child code | file-actions list only | very restricted | params struct | arbitrary child code |
| Returns twice | yes | no | yes (child must not return) | no | yes (process mode) |
| Multithread-safe | fragile (locks copied) | yes | fragile | yes | fragile in process mode |
| Cost for huge parent | page-table copy | page-table copy | near zero | independent of parent | page-table copy |
| Typical use | shells, daemons | std library subprocess | legacy embedded | all Windows spawning | containers, threads |

The headline contrast is with Windows. CreateProcess has no fork — it builds a brand-new process directly from an executable plus a parameter block. There is no "run arbitrary code in the child before the program loads" window, so descriptor inheritance and the like are passed as explicit arguments. Unix trades that explicitness for the flexibility of the fork/exec gap; the price is that fork interacts badly with threads.

What the numbers actually say

  • fork latency: on Linux a fork of a small process is on the order of tens of microseconds to a few hundred; with COW it grows with page-table size, so a process with several gigabytes resident in 4 KB pages typically forks in milliseconds to tens of milliseconds, not the seconds a physical copy would take.
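  • PID space: the kernel's built-in default for /proc/sys/kernel/pid_max is typically 32768; on 64-bit Linux it can be raised to 4194304 (2²²), and recent systemd-based distributions do so at boot. Every un-reaped zombie holds one slot; leak enough and fork starts failing with EAGAIN.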
  • A zombie's footprint: almost nothing — it has freed its memory, files, and address space. It retains only its PID and a small task struct holding the exit status. The cost of zombies is slot exhaustion, not RAM.
  • COW write amplification: a child that touches one byte forces a copy of the whole page containing it — typically 4 KB. Touching scattered data across a big heap can quietly copy far more than you wrote.
  • Fork bomb: :(){ :|:& };: recursively forks until the PID table is exhausted — the canonical reason production systems cap per-user process counts with RLIMIT_NPROC or cgroups.
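Latency figures like the ones in the first bullet are easy to check on your own machine. This is a rough measuring sketch, not a benchmark: the helper name fork_latency_us is hypothetical, it times a single fork from the parent's side, and results will vary with kernel version, huge-page settings, and whatever else the interpreter has mapped.

```python
import os
import time

PAGE = 4096  # assumed page size; os.sysconf("SC_PAGE_SIZE") reports the real one

def fork_latency_us(resident_bytes):
    """Time one fork() in the parent while holding about resident_bytes of touched memory."""
    ballast = bytearray(resident_bytes)
    # Write one byte per page so every page is resident and has a page-table entry.
    ballast[::PAGE] = b"\x01" * len(range(0, resident_bytes, PAGE))
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)                      # the child does nothing and leaves at once
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    return elapsed * 1e6

if __name__ == "__main__":
    for mib in (1, 64, 256):
        print(f"{mib:4d} MiB resident: {fork_latency_us(mib * 2**20):8.0f} µs")
```

Touching every page first is the important step. Freshly allocated zeroed memory usually has no page-table entries yet, so forking it would measure almost nothing.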

JavaScript: modeling the lifecycle

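Node.js doesn't expose raw fork(2): child_process.spawn launches a new program in one step, posix_spawn-style, and child_process.fork, despite its name, starts a fresh Node process rather than cloning the caller. But we can model the state machine and the parent/child handshake explicitly, which is the part people get wrong: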

// A faithful model of fork -> exec -> exit -> wait, as a state machine.
const State = { RUNNING: 'running', ZOMBIE: 'zombie', REAPED: 'reaped' };
let nextPid = 1000;
const table = new Map();               // pid -> process record

function fork(parent) {
  const pid = nextPid++;
  // Child is a COW clone: same code/data, new identity, no children yet.
  const child = {
    pid, ppid: parent.pid, state: State.RUNNING,
    image: parent.image, exitStatus: null, children: [],
  };
  table.set(pid, child);
  parent.children.push(pid);
  return { parentSees: pid, childSees: 0 };   // fork "returns twice"
}

function exec(proc, image) {
  // Same PID and fds, brand-new program image. Never "returns" on success.
  proc.image = image;
  return undefined;                    // control does not come back
}

function exit(proc, code) {
  proc.state = State.ZOMBIE;           // body stays until reaped
  proc.exitStatus = code;
}

function wait(parent) {
  // Block until some child is a zombie, then reap exactly one.
  const pid = parent.children.find(p => table.get(p)?.state === State.ZOMBIE);
  if (pid === undefined) return null;  // would block (or WNOHANG -> 0)
  const child = table.get(pid);
  child.state = State.REAPED;
  table.delete(pid);                   // PID slot freed here — not at exit()
  parent.children = parent.children.filter(p => p !== pid);
  return { pid, status: child.exitStatus };
}

// Shell-style: clone, replace program in the child, run it, reap it.
const shell = { pid: 1, ppid: 0, image: 'sh', children: [] };
const { parentSees } = fork(shell);
exec(table.get(parentSees), 'ls');     // child becomes `ls`
exit(table.get(parentSees), 0);        // ls finishes
console.log(wait(shell));              // { pid: 1000, status: 0 } — reaped

The detail worth flagging: the PID slot is freed in wait, not in exit. Skip the wait and the entry sits in the table forever — that is a zombie.

Python: the real system calls

Python's os module wraps the genuine POSIX calls, so this is the canonical fork/exec/wait pattern as it actually runs:

import os

pid = os.fork()            # returns twice: 0 in child, child-PID in parent

if pid == 0:
    # ---- CHILD ----
    try:
        # The pre-exec window: rearrange inherited state before the new program.
        log = os.open("out.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        os.dup2(log, 1)        # redirect stdout (fd 1) to the file
        os.close(log)
        os.execvp("ls", ["ls", "-l"])   # replace this process with `ls`
    except OSError:
        # Python reports a failed exec (or open/dup2) by raising OSError,
        # where the C call would return -1.
        os.write(2, b"child setup or exec failed\n")
    finally:
        os._exit(127)      # _exit, not exit: skip the parent's atexit/buffers

else:
    # ---- PARENT ----
    reaped_pid, status = os.waitpid(pid, 0)   # block until the child exits
    if os.WIFEXITED(status):
        print(f"child {reaped_pid} exited with {os.WEXITSTATUS(status)}")
    elif os.WIFSIGNALED(status):
        print(f"child {reaped_pid} killed by signal {os.WTERMSIG(status)}")

Three things every beginner trips on. Use os._exit (not sys.exit) in the child after a failed exec — _exit skips the inherited atexit handlers and buffer flushes that belong to the parent. Do the descriptor surgery (dup2) in the child, before exec — that's the entire reason the gap exists. And always waitpid the children you fork, or you breed zombies.

Variants and lifecycle edge states

Orphans and re-parenting. If the parent exits first, the child is re-parented: historically to init (PID 1), now often to the nearest ancestor that registered itself as a subreaper (PR_SET_CHILD_SUBREAPER), such as a per-user service manager. The new parent reaps it, so orphans do not leak. The double-fork daemon trick exploits this: fork, the child calls setsid to leave the controlling terminal's session and forks again, the middle process exits, and the grandchild is adopted by init. Because the grandchild is not a session leader, it cannot reacquire a controlling terminal.
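Re-parenting can be watched from inside the orphan. The sketch below follows the classic double-fork recipe, including the setsid call that the recipe conventionally places between the two forks; the helper name double_fork, the pipe, and the two-second polling deadline are assumptions made for the demonstration.

```python
import os
import time

def double_fork():
    """Return (PID of the middle process, PID that adopted the grandchild)."""
    r, w = os.pipe()
    middle = os.fork()
    if middle == 0:
        try:
            os.close(r)
            os.setsid()                  # new session, no controlling terminal
            middle_pid = os.getpid()
            if os.fork() != 0:
                os._exit(0)              # middle process exits right away
            # Grandchild: wait until the re-parenting is visible, then report it.
            deadline = time.monotonic() + 2.0
            while os.getppid() == middle_pid and time.monotonic() < deadline:
                time.sleep(0.001)
            os.write(w, str(os.getppid()).encode())
        finally:
            os._exit(0)
    os.close(w)
    os.waitpid(middle, 0)                # reap the middle process
    with os.fdopen(r) as pipe:
        adopter = int(pipe.read())       # EOF arrives once the grandchild exits
    return middle, adopter

if __name__ == "__main__":
    middle, adopter = double_fork()
    print(f"middle was {middle}, grandchild adopted by {adopter}")
```

The adopter is PID 1 on a traditional system and the subreaper's PID where one is registered. Either way, the original parent never has to wait for the grandchild: that duty has moved to the adopter.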

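SIGCHLD and reaping without blocking. A server that doesn't want to block in wait installs a SIGCHLD handler that loops waitpid(-1, &st, WNOHANG) until it returns 0 (children remain, but none has exited) or -1 with ECHILD (no children left). The loop matters because several exits can be delivered as a single signal. Setting SIG_IGN on SIGCHLD on Linux makes children auto-reap: no zombies, but also no exit status.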
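Here is how that handler loop might look in Python. Treat it as a sketch: the helper name reap_with_sigchld and the five-second deadline are assumptions, and Python only allows signal handlers to be installed from the main thread. In Python the "no children left" case surfaces as ChildProcessError instead of a -1 return.

```python
import os
import signal
import time

def reap_with_sigchld(exit_codes):
    """Fork one child per exit code and reap them all from a SIGCHLD handler."""
    reaped = {}                          # pid -> exit code, filled in by the handler

    def on_sigchld(signum, frame):
        # One signal may stand for several exits, so drain until nothing is left.
        while True:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:    # ECHILD: no children remain
                return
            if pid == 0:                 # children remain, but none has exited yet
                return
            reaped[pid] = os.WEXITSTATUS(status)

    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pids = []
        for code in exit_codes:
            pid = os.fork()
            if pid == 0:
                os._exit(code)
            pids.append(pid)
        deadline = time.monotonic() + 5.0
        while not all(p in reaped for p in pids) and time.monotonic() < deadline:
            time.sleep(0.01)             # the main loop never blocks in wait
    finally:
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
    return sorted(reaped[p] for p in pids if p in reaped)

if __name__ == "__main__":
    print(reap_with_sigchld([0, 1, 2]))  # [0, 1, 2]
```

Installing the handler before the first fork closes the race in which a child exits before anyone is listening. Be aware that waitpid(-1, ...) reaps any child of the process, including ones started by unrelated library code.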

clone(2). Linux's underlying primitive. fork is just clone with a particular flag set; threads are clone sharing the address space (CLONE_VM). Containers are clone with new namespace flags (CLONE_NEWPID, CLONE_NEWNET, …).

posix_spawn. The portable, thread-safe answer to "just run a program." It bundles the fork, the descriptor file-actions, and the exec into one call so no user code runs in the fragile post-fork window.

Common bugs and edge cases

  • Never reaping children. Forgetting wait/waitpid leaves zombies that accumulate until the PID table fills and fork starts failing with EAGAIN.
  • Assuming exec returns. Code written after a successful execvp never runs. The line after exec should only ever be the error path.
  • Forking a multithreaded process. Only the calling thread survives in the child, but every mutex the dead threads held is copied in its locked state. A single malloc or printf in the child can deadlock. Between fork and exec, call only async-signal-safe functions — or use posix_spawn.
  • Leaking file descriptors past exec. Descriptors survive exec by default, so a child can inherit sockets it has no business holding. Open with O_CLOEXEC (or set FD_CLOEXEC) so they auto-close on exec.
  • Buffered output duplicated. fork copies stdio buffers too. If the parent has un-flushed buffered output, both parent and child flush it — printing the same bytes twice. Flush before forking, or use unbuffered writes.
  • Using exit instead of _exit in the child. The child inherits its own copies of the parent's atexit handlers and stdio buffers; running them in the child can double-flush or fire cleanup the parent expected to run once.
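The duplicated-output bug is the easiest of these to reproduce. The sketch below uses a buffered file in place of stdout so the result can be read back; the helper name file_after_fork is hypothetical.

```python
import os
import tempfile

def file_after_fork(flush_before_fork):
    """Write one buffered line, fork, let both sides close the file; return its content."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        f = open(path, "w")              # block-buffered: the line stays in user space
        f.write("once\n")
        if flush_before_fork:
            f.flush()                    # empty the buffer so the child inherits nothing pending
        pid = os.fork()
        if pid == 0:
            f.close()                    # flushes the child's copy of the buffer
            os._exit(0)
        os.waitpid(pid, 0)
        f.close()                        # flushes the parent's copy
        with open(path) as result:
            return result.read()

if __name__ == "__main__":
    print(repr(file_after_fork(False)))  # 'once\nonce\n'
    print(repr(file_after_fork(True)))   # 'once\n'
```

The line was written once but lands in the file twice, because each process flushes its own copy of the buffer through a descriptor that shares one file offset. Flushing before the fork leaves the child's copy empty.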

Frequently asked questions

Why does fork return twice?

fork creates a near-identical copy of the calling process, so after the kernel returns there are two processes both executing the line after fork. In the parent, fork returns the child's PID (a positive number); in the child it returns 0; on failure it returns -1 in the parent and no child is created. The single source line therefore appears to return twice — once in each process — which is how the same code can branch into parent and child behaviour.

What's the difference between fork and exec?

fork duplicates the current process, giving you a second copy of the same running program. exec does the opposite: it keeps the same process (same PID, same open file descriptors) but throws away the current memory image and loads a brand-new program in its place. exec never returns on success — there's nothing left to return to. You almost always pair them: fork to get a child, then exec in the child to run a different program.

What is a zombie process?

When a child exits, the kernel can't fully discard it yet — it must keep the exit status and resource usage around so the parent can read them. A child that has exited but not yet been waited on is a zombie: it holds a PID and a slot in the process table but nothing else. Calling wait or waitpid reaps the zombie, freeing the slot. Zombies that are never reaped accumulate and can exhaust the PID table.
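On Linux you can catch a zombie in the act by reading its state letter from /proc. This sketch is Linux-specific and its helper names (proc_state, observe_zombie) are illustrative; on systems without /proc both values simply come back as None.

```python
import os
import time

def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format: "pid (comm) state ..."; comm may contain spaces, so split after ")".
            return f.read().rsplit(")", 1)[1].split()[0]
    except FileNotFoundError:
        return None

def observe_zombie():
    """Return (state before reaping, state after reaping) for a child that exits at once."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    deadline = time.monotonic() + 2.0
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.001)                # give the child time to finish exiting
    before = proc_state(pid)
    os.waitpid(pid, 0)                   # reap: the kernel can now drop the entry
    return before, proc_state(pid)

if __name__ == "__main__":
    print(observe_zombie())              # ('Z', None) on Linux
```

Between the child's exit and the parent's waitpid the process shows state Z; after the reap its /proc entry is gone.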

What happens if the parent dies before the child?

The child becomes an orphan and is re-parented: historically to init (PID 1), today often to a subreaper process. The new parent is responsible for eventually reaping it, so an orphan does not leak. This is the basis of the classic double-fork daemonization trick: fork, call setsid in the child, fork again and let the middle process exit, so the grandchild is orphaned and adopted by init while living in a new session with no controlling terminal.

Why is modern fork cheap even for huge processes?

Because of copy-on-write. fork no longer physically copies the parent's address space; instead it maps the same pages into both processes read-only and copies a page only when one side first writes to it. So forking a 4 GB process copies only the page tables, not 4 GB of data: typically milliseconds rather than the seconds a physical copy would take. The cost scales with the size of the page tables, not the size of the heap, which is why vfork and posix_spawn exist for the extreme cases.

Why is fork dangerous in a multithreaded program?

fork copies only the calling thread, not the others, but it copies all the memory, including locks the other threads were holding. The child inherits those locks in a locked state with no thread to release them, so a malloc or printf in the child can deadlock instantly. The only safe operations between fork and exec in a multithreaded process are async-signal-safe calls. This is why posix_spawn, or a fork followed immediately by exec with descriptors marked close-on-exec, is preferred for spawning programs from threaded servers.