Is mmap faster than read?

Sometimes. Sequential bulk reads with a large buffer often beat mmap because the syscall overhead is amortized over big chunks and kernel read-ahead is aggressive. mmap wins for random access, partial reads of huge files, and zero-copy IPC. LMDB and modern Lucene use mmap for exactly that reason, and SQLite can too (it is off by default and enabled with PRAGMA mmap_size).

What's a page fault and is it expensive?

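Accessing a mapped page that isn't yet resident traps into the kernel. It is a minor fault if the page is already in the page cache (~1 µs) and a major fault if it has to come from disk (~100 µs on SSD, ~10 ms on spinning rust). Unless the range was prefaulted (for example with MAP_POPULATE) or mapped early by the kernel's fault-around, the first touch of each mapped page costs at least a minor fault.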
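Here is a rough way to watch first-touch faults from Python. It assumes Linux, where `resource.getrusage` reports minor faults in `ru_minflt`. The sketch disables transparent hugepages for its region (when `MADV_NOHUGEPAGE` exists) so each 4 KB page faults separately. The interpreter also takes a few faults of its own, so treat the counts as indicative rather than exact. `first_and_second_touch` is an illustrative name.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE

def minor_faults() -> int:
    """Minor page faults taken so far by this process."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

def first_and_second_touch(npages: int = 64) -> tuple[int, int]:
    """Write one byte per page twice; return the minor faults seen on each pass."""
    # fileno -1 requests an anonymous mapping; MAP_PRIVATE keeps it process-local
    mm = mmap.mmap(-1, npages * PAGE, flags=mmap.MAP_PRIVATE)
    try:
        if hasattr(mmap, "MADV_NOHUGEPAGE"):
            mm.madvise(mmap.MADV_NOHUGEPAGE)   # keep one fault per 4 KB page for this demo
        start = minor_faults()
        for i in range(npages):
            mm[i * PAGE] = 1    # first store: no PTE yet, the kernel supplies a zeroed page
        first = minor_faults() - start
        start = minor_faults()
        for i in range(npages):
            mm[i * PAGE] = 2    # pages are resident now: plain memory writes
        second = minor_faults() - start
        return first, second
    finally:
        mm.close()

first, second = first_and_second_touch()
print(f"first pass: {first} minor faults, second pass: {second}")
```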

MAP_SHARED vs MAP_PRIVATE — what's the difference?

MAP_SHARED writes go back to the underlying file and are visible to other mappers. MAP_PRIVATE uses copy-on-write: the first store on a page makes a private copy and the file is never modified. Loaders use PRIVATE for code, databases use SHARED for data files.
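Here is a small check of the difference, assuming a Unix system. In CPython's `mmap` module, `access=mmap.ACCESS_COPY` maps the file MAP_PRIVATE and `access=mmap.ACCESS_WRITE` maps it MAP_SHARED. The function name is illustrative.

```python
import mmap
import os
import tempfile

def shared_vs_private() -> tuple[bytes, bytes]:
    """Write through a private and a shared view; return the file contents after each."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        # ACCESS_COPY -> MAP_PRIVATE: the store goes to a copy-on-write page
        with mmap.mmap(fd, 8, access=mmap.ACCESS_COPY) as priv:
            priv[:8] = b"PRIVATE!"
            assert priv[:8] == b"PRIVATE!"   # the mapping sees its own write
        after_private = os.pread(fd, 8, 0)    # ...but the file does not
        # ACCESS_WRITE -> MAP_SHARED: the store lands in the page cache and the file
        with mmap.mmap(fd, 8, access=mmap.ACCESS_WRITE) as shared:
            shared[:8] = b"SHARED!!"
        after_shared = os.pread(fd, 8, 0)
        return after_private, after_shared
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `shared_vs_private()` should give `(b"original", b"SHARED!!")`: the private write never reaches the file, and the shared one does.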

Can mmap fail because the file is too big?

On 32-bit systems, yes — your address space tops out at 2–3 GB of usable virtual range. On 64-bit, files routinely exceed RAM and mmap still works because pages fault in lazily, but the kernel can refuse if you hit RLIMIT_AS or system-wide overcommit limits.
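When the whole file cannot be mapped at once, the sliding-window workaround mentioned later in this article maps one aligned window at a time. Below is a minimal sketch of that idea. It assumes window offsets must be multiples of `mmap.ALLOCATIONGRANULARITY`, and `count_byte_windowed` is a hypothetical helper.

```python
import mmap
import os
import tempfile

def count_byte_windowed(path: str, needle: int,
                        window: int = 64 * mmap.ALLOCATIONGRANULARITY) -> int:
    """Count one byte value across a file while mapping at most `window` bytes at a time."""
    if window % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("window must be a multiple of mmap.ALLOCATIONGRANULARITY")
    total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        for offset in range(0, size, window):          # offsets stay granularity-aligned
            length = min(window, size - offset)
            with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                           offset=offset) as mm:
                total += mm[:].count(bytes([needle]))   # copies one window, never the whole file
    return total

def demo() -> int:
    data = bytes(range(256)) * 100                     # every byte value appears 100 times
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # A one-granule window forces several separate mappings over the file
        return count_byte_windowed(path, 7, window=mmap.ALLOCATIONGRANULARITY)
    finally:
        os.unlink(path)
```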

What happens if the underlying file is truncated while mapped?

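Touching a page that lies entirely beyond the new end of file raises SIGBUS. The kernel won't extend the mapping or the file for you. Bytes past EOF inside the last partial page simply read as zeros. Keeping the mapping and the file size in sync is therefore the application's job, which is why databases grow the file with ftruncate before extending a mapping over the new range.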
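Here is a sketch of the grow-then-map order in Python. `append_record` is a hypothetical helper. Remapping the whole file on every append is the simplest correct shape, not how any particular database does it; a real engine would grow in larger steps and remap less often.

```python
import mmap
import os
import tempfile

def append_record(path: str, record: bytes) -> int:
    """Extend the file, then map it and write the record into the new tail. Returns the new size."""
    with open(path, "r+b") as f:
        old = os.fstat(f.fileno()).st_size
        if not record:
            return old
        new = old + len(record)
        os.ftruncate(f.fileno(), new)                 # grow the file FIRST...
        with mmap.mmap(f.fileno(), new, access=mmap.ACCESS_WRITE) as mm:
            mm[old:new] = record                      # ...so every touched page is backed by it
            mm.flush()
        return new

def demo() -> bytes:
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abc")
        os.close(fd)
        append_record(path, b"defg")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)
```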

Memory-Mapped I/O — How mmap Maps Files into Address Space

How memory mapping works

Every modern OS already presents process memory as a translated view of physical RAM via page tables. mmap lets you ask for a chunk of that virtual address space to be backed by something specific: a file, an anonymous zero-filled region, or a device. For file and anonymous mappings, the kernel records the mapping (on Linux, as a VMA, a virtual memory area) but installs no page-table entries for it yet. Physical pages are only fetched when you actually touch them.

The lifecycle of a single page in a file mapping:

mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, off) returns a virtual address. No data is read.
You dereference an address, say *(int *)((char *)addr + 8192). The MMU finds no valid PTE for that virtual page and traps into the kernel: a page fault.
The fault handler looks up the file, reads the corresponding page from disk into the page cache (or finds it already there), and installs a PTE pointing at it.
The original load instruction is retried and now succeeds. From the program's perspective it was just a memory access.
Subsequent stores dirty the page. The kernel's writeback thread will eventually flush it back to the file (for MAP_SHARED) on its own schedule, or immediately on msync(MS_SYNC).
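The same lifecycle can be traced from Python on Linux as a sketch. Creating the mapping reads nothing, and the first load faults in a zero-filled page. A store through the MAP_SHARED mapping is then visible to an ordinary `pread` on the same file before any flush, because both paths use the same page-cache page. The function name is illustrative.

```python
import mmap
import os
import tempfile

def lifecycle_demo() -> tuple[bytes, bytes]:
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, mmap.PAGESIZE)            # one page-sized, zero-filled file
        with mmap.mmap(fd, mmap.PAGESIZE) as mm:   # Unix default: MAP_SHARED, read/write; nothing read yet
            first_load = mm[0:4]                   # first touch: fault, page-cache lookup, PTE installed
            mm[0:4] = b"DATA"                      # the store dirties the cached page
            seen_by_read = os.pread(fd, 4, 0)      # the read path hits the same page-cache page
            mm.flush()                             # msync: write the dirty page back now
        return first_load, seen_by_read
    finally:
        os.close(fd)
        os.unlink(path)
```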

This lazy-fault model is the whole point. A 100 GB file mapped on a 16 GB box is fine, because only the actively touched pages live in RAM. Under memory pressure the kernel evicts cold pages just as it does for the rest of the page cache. Clean pages are dropped and re-read on the next fault, and dirty ones are written back first. From the application's point of view, the file is a byte array that happens to be larger than memory.
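Here is a scaled-down illustration of the lazy model. It assumes a filesystem that supports sparse files (most Linux ones do), and the 256 MiB size is an arbitrary stand-in for "larger than you want resident". Only the two touched pages are ever faulted in.

```python
import mmap
import os
import tempfile

SIZE = 256 * 1024 * 1024     # 256 MiB: arbitrary, scaled down for a quick demo

def touch_ends_of_sparse_file() -> tuple[bytes, bytes]:
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, SIZE)                  # sparse on most filesystems: no data blocks yet
        with mmap.mmap(fd, SIZE) as mm:         # reserves address space, reads nothing
            mm[0:4] = b"head"                   # faults in one page at the start...
            mm[SIZE - 4:SIZE] = b"tail"         # ...and one at the end; the middle is never touched
            return mm[0:4], mm[SIZE - 4:SIZE]
    finally:
        os.close(fd)
        os.unlink(path)
```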

When to use mmap

Random access into large files — databases, search indexes, columnar stores.
Shared-memory IPC between cooperating processes (file-backed or MAP_ANONYMOUS|MAP_SHARED).
Loading executables and shared libraries: the kernel maps a program's ELF segments, and the Linux dynamic loader (ld.so) maps each shared library with mmap.
Writing to a file that you want to manipulate as a struct or array, with the kernel handling persistence.
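Here is a sketch of the "file as a struct array" case. The 16-byte layout (`RECORD`) and the helpers `update_record` and `read_record` are hypothetical illustrations, not a library API. `struct.pack_into` writes straight into the mapping, and a plain `read()` afterwards sees the same bytes.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<IId")   # hypothetical layout: id, flags, value -> 16 bytes, no padding

def update_record(mm: mmap.mmap, index: int, rec_id: int, flags: int, value: float) -> None:
    RECORD.pack_into(mm, index * RECORD.size, rec_id, flags, value)  # in-place store

def read_record(mm: mmap.mmap, index: int) -> tuple:
    return RECORD.unpack_from(mm, index * RECORD.size)

def demo() -> tuple:
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 4 * RECORD.size)                # room for four zeroed records
        with mmap.mmap(fd, 4 * RECORD.size) as mm:
            update_record(mm, 2, rec_id=7, flags=1, value=2.5)
            in_place = read_record(mm, 2)
            mm.flush()
        with open(path, "rb") as f:                      # ordinary read() sees the same bytes
            f.seek(2 * RECORD.size)
            on_disk = RECORD.unpack(f.read(RECORD.size))
        return in_place, on_disk
    finally:
        os.close(fd)
        os.unlink(path)
```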

Avoid mmap for streaming sequential reads of files larger than RAM. Explicit read calls with a fat buffer plus a posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) hint often beat it, because read-ahead is more aggressive and you don't pay a fault per page. Avoid it for files on networked filesystems too: NFS mmap semantics are subtle and bug-prone.
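Here is a sketch of the read-plus-fadvise alternative in Python. The 1 MiB buffer is an assumption to tune per device. `os.posix_fadvise` exists on Linux but not on macOS or Windows, so the hint is skipped where it is missing. `sha256_streaming` is an illustrative name.

```python
import hashlib
import os
import tempfile

CHUNK = 1 << 20   # 1 MiB read buffer: an assumption, tune for the device

def sha256_streaming(path: str) -> str:
    """Hash a file with large sequential read() calls instead of mmap."""
    h = hashlib.sha256()
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)  # request aggressive read-ahead
        while True:
            buf = os.read(fd, CHUNK)
            if not buf:
                break
            h.update(buf)
    finally:
        os.close(fd)
    return h.hexdigest()

def demo() -> bool:
    data = os.urandom(3 * CHUNK + 123)        # deliberately not a multiple of the buffer size
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return sha256_streaming(path) == hashlib.sha256(data).hexdigest()
    finally:
        os.unlink(path)
```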

mmap vs read/write vs O_DIRECT

Property	mmap (file-backed)	read/write	O_DIRECT	mmap (anonymous, MAP_SHARED)	tmpfs / shm_open	io_uring + registered buffers
User-kernel copies	0 (page-table mapping)	1 per syscall	0 (DMA into user buffer)	0	0 when mapped (RAM-only file)	1 for buffered I/O, 0 with O_DIRECT (buffers pinned once at registration)
Syscalls per access	0 after fault	1 per call	1 per call	0 after fault	0 after fault	Amortized: many SQEs per io_uring_enter, 0 with SQPOLL
Random access	Excellent	Painful: a seek + read (or one pread) per access	Painful: one pread per access, alignment required	Excellent	Excellent	Good
Sequential bulk	Good (with MADV_SEQUENTIAL)	Best (read-ahead friendly)	Excellent if aligned	N/A	Excellent	Excellent
Page-cache use	Yes, shared with read/write of the same file	Yes	No (bypassed)	Yes	Yes (the pages are the storage)	Yes by default
SIGBUS risk on truncate	Yes	No	No	No	Yes (the object is still a file)	No
Best for	Databases, parsers, BLOB stores	Sequential streaming	Databases that manage their own cache (Oracle, MySQL InnoDB with O_DIRECT)	Parent/child IPC after fork	POSIX shm regions, fast caches	Async high-fanout I/O

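The single most important row is "Random access". Reading 8 KB at offset 50 GB into a file with pread costs ~100 µs of disk plus a syscall, and every repeat of that read pays the syscall and a memcpy again, even when the data is cached. Through a mapped pointer, the first touch costs the same disk read, taken as a page fault instead of a syscall. Every later touch of those pages is an ordinary memory access (~30 ns when cached) with no kernel entry and no copy. Databases live or die on this.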
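The two access paths can be checked against each other in Python. This sketch verifies correctness only. Timings depend heavily on the machine, so it claims no numbers. Python slicing still copies into a `bytes` object, so the zero-copy benefit shows up fully only in languages that can read through the pointer directly.

```python
import mmap
import os
import random
import tempfile

REC = 8192   # 8 KB records, matching the example above

def random_reads_agree(n_reads: int = 1000, n_records: int = 256) -> bool:
    """Read random 8 KB records via pread and via a mapping; True if every pair matches."""
    size = REC * n_records
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(os.urandom(size))
            f.flush()                                    # data must reach the file before mapping it
            with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as mm:
                rng = random.Random(0)                   # fixed seed: reproducible offsets
                for _ in range(n_reads):
                    off = rng.randrange(0, size, REC)    # record-aligned offset
                    via_pread = os.pread(f.fileno(), REC, off)  # syscall + copy every time
                    via_map = mm[off:off + REC]          # faults only on first touch of each page
                    if via_pread != via_map:
                        return False
        return True
    finally:
        os.unlink(path)
```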

Python: mmap a file as a slice-able buffer

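import mmap, os

path = "/var/data/index.bin"
with open(path, "r+b") as f:
    size = os.fstat(f.fileno()).st_size
    if size < 1028:  # mmap rejects empty files, and the store below needs 1028 bytes
        raise ValueError(f"{path} is too small for this example")
    with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_WRITE) as mm:
        # Hint the access pattern before touching pages (Unix, Python 3.8+)
        mm.madvise(mmap.MADV_SEQUENTIAL)
        # Treat the file as a bytearray
        header = mm[:16]                               # slicing copies out a bytes object
        mm[1024:1028] = (42).to_bytes(4, "little")     # store lands in the shared page cache
        mm.flush()  # msync: push dirty pages back to disk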

Python's mmap module exposes a memoryview-style object you can slice, search, and even re.search over without ever pulling the whole file into memory. madvise lets you tell the kernel about your access pattern: MADV_RANDOM disables read-ahead, MADV_DONTNEED drops resident pages, MADV_HUGEPAGE requests transparent hugepages.
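Here is a sketch of scanning a mapped file with a regular expression, assuming a hypothetical log format matched by `PATTERN`. `re` scans the mapping in place through the buffer protocol, and only the matched bytes are copied out. The `MADV_SEQUENTIAL` hint is applied only where `madvise` is available (Unix, Python 3.8+).

```python
import mmap
import os
import re
import tempfile

PATTERN = re.compile(rb"ERROR \d+")     # hypothetical log-line pattern

def find_errors(path: str) -> list[bytes]:
    with open(path, "rb") as f:
        # length 0 maps the whole file (it must not be empty)
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            if hasattr(mmap, "MADV_SEQUENTIAL"):
                mm.madvise(mmap.MADV_SEQUENTIAL)        # hint: one forward pass
            return [m.group(0) for m in PATTERN.finditer(mm)]

def demo() -> list[bytes]:
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"ok\nERROR 12 disk\nfine\nERROR 7 net\n")
        return find_errors(path)
    finally:
        os.unlink(path)
```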

C: shared anonymous mapping for IPC

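#define _DEFAULT_SOURCE            // exposes MAP_ANONYMOUS under strict -std=c11 on glibc
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct { atomic_int counter; char payload[4080]; } region_t;

int main(void) {
    // One shared, zero-filled region; fork() below inherits the same physical pages
    region_t *r = mmap(NULL, sizeof(region_t),
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED) { perror("mmap"); return 1; }
    atomic_init(&r->counter, 0);

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }
    if (pid == 0) {                             // child
        atomic_fetch_add(&r->counter, 1);
        _exit(0);
    }
    // parent: wait for the child instead of sleeping, so the increment is known to be done
    waitpid(pid, NULL, 0);
    printf("counter = %d\n", atomic_load(&r->counter));  // the child's increment is visible here
    munmap(r, sizeof(region_t));
    return 0;
}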

MAP_ANONYMOUS | MAP_SHARED creates a region that survives fork and is shared between parent and child without a backing file. Cross-process locks (PTHREAD_PROCESS_SHARED mutexes, raw futexes) only work when the lock word lives in memory that both processes map, such as a region set up exactly like this. Place an atomic in the first cache line and you have a lock-free counter or flag for IPC.
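The same parent/child pattern works from Python on Unix, as a sketch. `mmap.mmap(-1, n)` creates an anonymous mapping whose default flags are MAP_SHARED, so the child's stores are visible to the parent. With a single writer and `waitpid` ordering the parent's read after the child exits, no atomics are needed here. `child_increments` is an illustrative name.

```python
import mmap
import os
import struct

def child_increments(times: int = 5) -> int:
    """Share an anonymous MAP_SHARED page with a forked child and read what it wrote."""
    region = mmap.mmap(-1, mmap.PAGESIZE)     # anonymous; default flags on Unix are MAP_SHARED
    struct.pack_into("<i", region, 0, 0)
    pid = os.fork()
    if pid == 0:                              # child: the only writer
        try:
            for _ in range(times):
                (n,) = struct.unpack_from("<i", region, 0)
                struct.pack_into("<i", region, 0, n + 1)
        finally:
            os._exit(0)                       # never fall through into the parent's code
    os.waitpid(pid, 0)                        # parent: wait, don't sleep
    (value,) = struct.unpack_from("<i", region, 0)
    region.close()
    return value
```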

Node.js: shared array buffers and N-API mmap

// Within a single process, SharedArrayBuffer + Worker threads is the JS equivalent
import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';

if (isMainThread) {
  const sab = new SharedArrayBuffer(4096);
  const view = new Int32Array(sab);
  new Worker(new URL(import.meta.url), { workerData: sab });
  Atomics.wait(view, 0, 0);
  console.log('worker wrote', view[0]);
} else {
  const view = new Int32Array(workerData);
  Atomics.store(view, 0, 42);
  Atomics.notify(view, 0);
}

Node's standard library has no direct mmap binding. SharedArrayBuffer is mmap-flavored shared memory between Worker threads. For true file-backed mmap, native addons like mmap-io wrap the syscall — useful when you want zero-copy reads of multi-gigabyte log files without buffering them through V8.

Mapping variants

File-backed, MAP_SHARED. The default for "treat a file as memory". Stores hit the file. Other mappers see your writes. Used by databases.
File-backed, MAP_PRIVATE (copy-on-write). First write to a page makes a private copy; the file stays untouched. Used by binary loaders to map executable text without ever modifying it on disk.
Anonymous, MAP_PRIVATE. What glibc malloc uses for big allocations (at or above M_MMAP_THRESHOLD, 128 KB by default). The kernel zero-fills pages on first touch.
Anonymous, MAP_SHARED. Cross-fork shared memory without a file. Simpler than POSIX shm when only a parent and its children need the region, since there is no name to create or unlink.
Hugepage mappings (MAP_HUGETLB, optionally with MAP_HUGE_2MB or MAP_HUGE_1GB). 2 MB or 1 GB pages cut TLB pressure: each 2 MB TLB entry covers 512× as much memory as a 4 KB one. DBs and JVMs use them for heap regions.
Transparent hugepages. The kernel opportunistically promotes 4 KB pages to 2 MB. Faster-on-average but introduces pause-time spikes from background defragmentation.
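POSIX shm_open + mmap. On Linux, a tmpfs-backed file in /dev/shm. It outlives the processes that created it and persists until shm_unlink() or reboot, so unrelated processes can attach to it by name, not just members of one fork tree.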
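Python's `multiprocessing.shared_memory` wraps this pattern (on Linux it uses shm_open, ftruncate and mmap under the hood). Here is a minimal sketch in which one handle creates the named block and a second attaches by name, as another process would. The explicit `unlink()` matters because the block otherwise outlives the process. `shm_roundtrip` is an illustrative name.

```python
from multiprocessing import shared_memory

def shm_roundtrip() -> bytes:
    """Create a named shared-memory block, attach to it by name, and read back."""
    owner = shared_memory.SharedMemory(create=True, size=4096)
    try:
        owner.buf[:5] = b"hello"
        peer = shared_memory.SharedMemory(name=owner.name)  # another process would do exactly this
        try:
            return bytes(peer.buf[:5])
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()       # shm_unlink: without it the block persists after exit
```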

Costed claims

Page-fault cost: minor fault ~1 µs (page already in cache, just install PTE), major fault ~100 µs on NVMe SSD, ~10 ms on spinning disk. A read syscall to a hot page costs ~100 ns + memcpy.
TLB reach with 4 KB pages: ~2 MB on a typical x86 with 512 TLB entries. If the same number of entries can hold 2 MB hugepages, reach grows to 1 GB, 512× as much, which can eliminate most TLB misses for large working sets that fit.
Address space cost: on 64-bit Linux you have 47 bits of usable virtual range (~128 TB). Mapping 1 TB of files is routine; the kernel doesn't materialize physical pages until touched.
Writeback throughput: the kernel's flusher (writeback) threads coalesce dirty pages into multi-MB writes, sustaining roughly the device's sequential write speed (~3 GB/s and up on NVMe Gen4). msync(MS_SYNC) blocks until the writes it starts complete; MS_ASYNC does not wait.
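The arithmetic behind the TLB-reach and address-space figures, using the 512-entry TLB and 47-bit user address space assumed above:

```python
KiB, MiB, GiB, TiB = 1 << 10, 1 << 20, 1 << 30, 1 << 40

TLB_ENTRIES = 512                  # the "typical x86" figure used above

reach_4k = TLB_ENTRIES * 4 * KiB   # reach with 4 KB pages
reach_2m = TLB_ENTRIES * 2 * MiB   # reach if the same entries hold 2 MB pages
user_va = 1 << 47                  # 47-bit user address space (4-level paging)

print(reach_4k // MiB, "MiB")      # 2 MiB
print(reach_2m // GiB, "GiB")      # 1 GiB
print(reach_2m // reach_4k)        # 512
print(user_va // TiB, "TiB")       # 128 TiB
```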

Common bugs and edge cases

SIGBUS on truncated file. If the file shrinks below the mapping length, touching the truncated tail kills the process with SIGBUS. Guard with ftruncate before extending the mapping or install a SIGBUS handler.
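Forgetting to msync before unmap. munmap writes nothing back by itself: dirty MAP_SHARED pages stay in the page cache and are flushed later by writeback. They survive a crash of the process, but a kernel crash or power loss before writeback loses them. That is why databases call msync(MS_SYNC) (or fsync) before acknowledging a commit.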
Holding a write lock across page faults. A page fault inside a critical section can take milliseconds (major fault) and stall every waiter. This is why JVMs can prefault their heap (-XX:+AlwaysPreTouch) and databases prewarm indexes.
Address space exhaustion on 32-bit. 2–3 GB of usable virtual range — mapping a 4 GB file directly fails. Sliding-window mmap (re-map small ranges as you scan) is the workaround.
NFS mmap surprises. Cache coherence is per-client; another machine writing the same file may not be visible until the next attribute revalidation. Avoid mmap for shared data on NFS unless you understand the close-to-open semantics.
Forking after mmap. MAP_PRIVATE CoW pages duplicate per-process on first write — innocent-looking memory writes can blow up RSS in forked workers.
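The first bug above can be reproduced safely by running it in a throwaway interpreter. This sketch assumes Linux, where touching a mapped page past end of file raises SIGBUS and CPython installs no handler for it. The parent only inspects the child's exit status. `killed_by_sigbus` is an illustrative name.

```python
import signal
import subprocess
import sys

CHILD = r"""
import mmap, os, tempfile
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 2 * mmap.PAGESIZE)
mm = mmap.mmap(fd, 2 * mmap.PAGESIZE)
os.ftruncate(fd, 0)     # shrink the file underneath the live mapping
os.unlink(path)
mm[mmap.PAGESIZE]       # touch a page with no file backing left
print("unreachable")
"""

def killed_by_sigbus() -> bool:
    """Run the truncate-then-touch bug in a child interpreter; True if SIGBUS killed it."""
    proc = subprocess.run([sys.executable, "-c", CHILD], capture_output=True)
    return proc.returncode == -signal.SIGBUS   # negative N: killed by signal N (POSIX)
```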
