Memory Allocation

Slab Allocator

Pre-allocate N identical objects per slab — turn alloc and free into O(1) list ops

A slab allocator caches objects of one fixed size in a pre-built free list. Alloc pops, free pushes — both O(1). It's Bonwick's 1994 trick; Linux's SLUB powers every kernel object type.

  • Alloc costO(1) — pop free-list head
  • Free costO(1) — push to free-list head
  • Internal fragmentationNear zero (objects identically sized)
  • ConstructorRun once per object lifetime, not per alloc
  • Invented byJeff Bonwick, Sun Microsystems, 1994
  • Linux variantsSLAB, SLUB (default), SLOB

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How the slab allocator works

The Linux kernel allocates millions of small objects per second — a TCP socket, a directory entry, a task struct, an inode. Each has a fixed size and a known initial state. A general-purpose malloc would carry a header on each, fragment over time, lock-contend at the global heap, and constantly re-initialize bytes the caller is about to re-initialize anyway. Bonwick's slab allocator says: just keep a stack of pre-constructed objects of size X, refill it in batches from the page allocator, and never look at headers.

kmem_cache "task_struct"   size = 9472 B, align = 64 B

  per-CPU active slab (cpu0)         partial slabs (per-node)        full slabs
  ┌──────────────────────────┐       ┌────────────┬────────────┐    ┌─────────┐
  │ obj 0 | obj 1 | ... | n-1│ ──→   │ partial 1  │ partial 2  │ →  │ full 1  │
  │   ▲ free list head        │       │  some free  │  some free │    │ all in │
  └──────────────────────────┘       └────────────┴────────────┘    │  use    │
        │                                                            └─────────┘
        ▼ refill on miss
  ┌──────────────────────────┐
  │ buddy allocator (order-N) │   refill returns one slab = N contiguous pages
  └──────────────────────────┘

Each slab is one or more physically contiguous pages obtained from the buddy allocator, carved into obj_per_slab equal-sized objects. The slab carries a free-list — in SLUB, an inline singly-linked list threaded through the free objects' first bytes — so allocation just pops the head and freeing just pushes it. Per-CPU slabs make the hot path lock-free; the partial list serves CPU misses; empty slabs are returned to the buddy when reclaim asks.

The structural pieces

Cache (kmem_cache). Per-type metadata: object size, alignment, constructor function, list of partial slabs, per-CPU slab pointers. Created by kmem_cache_create("name", size, align, flags, ctor). The kernel creates dozens at boot, more from modules.

Slab. One physically contiguous run of pages divided into N equal objects, owned by exactly one cache. The slab tracks how many of its objects are free, its place in the cache's partial/full lists, and which CPU (if any) is currently using it as the per-CPU active slab.

Per-CPU slot. Each online CPU holds a pointer to a "current" slab. Allocation pops the next free object from that slab without a lock. If the slab becomes full, the CPU promotes a partial slab from the per-node partial list; if no partial is available, it requests a fresh one from the buddy.

Free list. A singly-linked list threaded through the free objects themselves — the first 8 bytes of each free object hold a pointer to the next. No external bookkeeping; no per-object header.

Shrinker callback. Reclaim under memory pressure calls each cache's shrinker. Caches with releasable empty slabs hand pages back to the buddy allocator. The dentry and inode caches are the most prolific shrinker clients in a running system.

When a slab allocator wins

  • Many small, identical objects. Kernel objects, network protocol headers, file-system metadata, language-runtime small objects. The cache hit rate makes alloc cheaper than memcpy.
  • Reusable initial state. If init_my_object() is expensive, the slab's constructor runs it once when the page is first carved up; subsequent allocations skip it. Free can be a "destructor" that restores state for the next reuse, not a full re-init.
  • Multi-core scaling. Per-CPU slabs make the hot path lock-free. Even on 96 cores, alloc/free latency stays single-digit nanoseconds.
  • Allocation profiling. Per-cache statistics tell you exactly which subsystem is leaking. /proc/slabinfo in Linux lists every kmem_cache with its allocation count, slab count, and waste.

Slabs lose when allocations are big, rare, or highly variable in size — every cache holds one size class, so 50 sizes need 50 caches with their own per-CPU storage. For variable-size general allocations, kmalloc uses a fixed family of size-class caches (kmalloc-8, kmalloc-16, ..., kmalloc-8192) and rounds up.

Slab vs other kernel/userspace allocators

Slab / SLUBBuddyvmallockmalloc (size class)tcmallocdlmalloc / ptmalloc
GranularityOne fixed object sizePower-of-2 pagesPage-aligned vmem rangeRounded to size classPer-thread size classBins of size classes
Alloc costO(1) popO(log MAX_ORDER) splitPage-table setupO(1) into slabO(1) per-thread cacheO(1) amortized
FragmentationNear zero internalOrder-aligned wastePage-level onlyClass roundingClass roundingInternal headers + external
Per-CPU/thread cacheYes (SLUB)Yes (pageset)NoInherited from slabYes (per-thread)Per-arena
Constructor supportYesNoNoNoNoNo
Used byLinux kernel objectsPage allocatorModule space, big buffersMixed-size kernel objectsGoogle userspaceglibc default malloc

The slab is unique in knowing in advance exactly what size every allocation will be. That single piece of information eliminates the bookkeeping every general-purpose allocator must carry.

What slab allocators actually cost

  • O(1) alloc / free. Both reduce to a list pop or push — a handful of memory ops, no locks on the hot per-CPU path. Real numbers: kmem_cache_alloc measured at 30-80 ns under no contention, ~150 ns under heavy contention before falling back to per-CPU.
  • Internal fragmentation near zero. A cache with object size 192 B in 4-KB slabs holds 21 objects (waste ~96 B per slab, ~2.3%). Compare to a general malloc rounding 192 B up to a 256-B size class (25% waste per allocation).
  • kmem_cache_create exists for every long-lived object type. Linux declares 50–200 caches at boot (varies by config and modules). /proc/slabinfo on a typical desktop shows ~120 named caches.
  • Per-CPU storage. Each cache holds one pointer to an active slab per CPU. On a 96-core box that's tens of KB of per-cache overhead — usually negligible compared to the slab pages themselves.
  • Memory returned only on shrinker pressure. Empty slabs sit on the partial/empty list until /proc/sys/vm/drop_caches or watermark crossing wakes a shrinker. A box that just ran a big find often shows hundreds of MB pinned in the dentry cache.
  • Object reuse rate >99% in steady state. Hot kernel caches (struct file, struct sock, struct dentry) reuse the same memory hundreds of times before any slab returns to the buddy.

JavaScript implementation (sketch)

// A toy slab cache. One cache, one fixed object size, free list threaded
// through the unused entries. Allocation and free are O(1) list ops.

class Slab {
  // capacity = how many objects this slab holds.
  constructor(slabId, capacity, ctor) {
    this.slabId = slabId;
    this.capacity = capacity;
    this.objects = new Array(capacity);
    this.next = new Int32Array(capacity);            // free-list pointers
    for (let i = 0; i < capacity - 1; i++) this.next[i] = i + 1;
    this.next[capacity - 1] = -1;                    // end of list
    this.freeHead = 0;
    this.freeCount = capacity;
    // run constructor once per object so allocation never has to.
    if (ctor) for (let i = 0; i < capacity; i++) this.objects[i] = ctor();
    else for (let i = 0; i < capacity; i++) this.objects[i] = {};
  }
  alloc() {
    if (this.freeHead === -1) return null;
    const idx = this.freeHead;
    this.freeHead = this.next[idx];
    this.freeCount--;
    return { slabId: this.slabId, idx, obj: this.objects[idx] };
  }
  free(idx) {
    this.next[idx] = this.freeHead;
    this.freeHead = idx;
    this.freeCount++;
  }
  isFull()  { return this.freeCount === 0; }
  isEmpty() { return this.freeCount === this.capacity; }
}

class KmemCache {
  constructor(name, objectsPerSlab, ctor) {
    this.name = name;
    this.objectsPerSlab = objectsPerSlab;
    this.ctor = ctor;
    this.partial = [];           // slabs with some free objects
    this.full = [];              // slabs entirely allocated
    this.empty = [];             // slabs entirely free (reclaim candidates)
    this.nextSlabId = 0;
  }
  _refill() {
    // In a real kernel: pop from the buddy allocator. Here: just create one.
    const s = new Slab(this.nextSlabId++, this.objectsPerSlab, this.ctor);
    this.partial.push(s);
    return s;
  }
  alloc() {
    let slab = this.partial[this.partial.length - 1] ?? this._refill();
    const got = slab.alloc();
    if (slab.isFull()) {
      this.partial.pop();
      this.full.push(slab);
    }
    return got;                                   // { slabId, idx, obj }
  }
  free(ref) {
    const { slabId, idx } = ref;
    let slab = this.partial.find(s => s.slabId === slabId)
            ?? this.full.find(s => s.slabId === slabId);
    const wasFull = slab.isFull();
    slab.free(idx);
    if (wasFull) {
      this.full = this.full.filter(s => s !== slab);
      this.partial.push(slab);
    }
    if (slab.isEmpty()) {
      this.partial = this.partial.filter(s => s !== slab);
      this.empty.push(slab);                      // candidate for shrinker
    }
  }
  shrink() {                                       // give pages back to buddy
    const released = this.empty.length;
    this.empty = [];
    return released;
  }
}

// Usage: one cache per object type, like the kernel does it.
const taskCache = new KmemCache('task_struct', 64, () => ({ pid: 0, state: 0 }));
const t = taskCache.alloc();
t.obj.pid = 42;
taskCache.free(t);

Real SLUB threads the free list through the unused objects themselves — no separate next array — and stores per-slab metadata in the page struct. The structure is the same: per-cache partial list, O(1) alloc/free, empty-slab return on shrinker callback.

Python implementation — per-CPU caches

import threading
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Slab:
    slab_id: int
    capacity: int
    free_head: int = 0
    free_count: int = 0
    next: list[int] = field(default_factory=list)
    storage: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.next = list(range(1, self.capacity)) + [-1]
        self.free_count = self.capacity
        self.storage = [None] * self.capacity

    def alloc(self) -> Optional[int]:
        if self.free_head == -1: return None
        idx = self.free_head
        self.free_head = self.next[idx]
        self.free_count -= 1
        return idx

    def free(self, idx: int) -> None:
        self.next[idx] = self.free_head
        self.free_head = idx
        self.free_count += 1

    def is_full(self) -> bool:  return self.free_count == 0
    def is_empty(self) -> bool: return self.free_count == self.capacity

class KmemCache:
    """Slab cache with per-CPU active slabs and a partial/full list under a lock."""
    def __init__(self, name: str, objects_per_slab: int, ctor: Callable, cpu_count: int = 4):
        self.name = name
        self.objects_per_slab = objects_per_slab
        self.ctor = ctor
        self.partial: list[Slab] = []
        self.full: list[Slab] = []
        self.empty: list[Slab] = []
        self._next_id = 0
        self._lock = threading.Lock()
        self.per_cpu: list[Optional[Slab]] = [None] * cpu_count

    def _new_slab(self) -> Slab:
        s = Slab(slab_id=self._next_id, capacity=self.objects_per_slab)
        self._next_id += 1
        for i in range(self.objects_per_slab):
            s.storage[i] = self.ctor()
        return s

    def alloc(self, cpu_id: int) -> tuple[Slab, int]:
        # Fast path: per-CPU active slab. No global lock.
        slab = self.per_cpu[cpu_id]
        if slab is not None and not slab.is_full():
            idx = slab.alloc()
            return slab, idx
        # Slow path: promote a partial slab (or refill from buddy).
        with self._lock:
            if slab is not None:                       # retire the now-full one
                self.full.append(slab)
            slab = self.partial.pop() if self.partial else self._new_slab()
            self.per_cpu[cpu_id] = slab
        idx = slab.alloc()
        return slab, idx

    def free(self, slab: Slab, idx: int) -> None:
        was_full = slab.is_full()
        slab.free(idx)
        if was_full:
            with self._lock:
                if slab in self.full:
                    self.full.remove(slab)
                    self.partial.append(slab)
        if slab.is_empty():
            with self._lock:
                if slab in self.partial:
                    self.partial.remove(slab)
                    self.empty.append(slab)
            # In a real kernel, the shrinker decides when to return to buddy.

    def shrink(self) -> int:
        with self._lock:
            released = len(self.empty)
            self.empty.clear()
        return released

The hot per-CPU path is lock-free; only refills (which are batched) hit the global lock. This is precisely the structure Linux SLUB uses to scale to hundreds of cores — locking only on slab transitions, never on per-object allocation.

Variants and real-world deployments

Bonwick SLAB (Solaris, 1994). The original. Three free queues per cache: per-CPU magazines (depot), per-magazine cache, central cache. Object coloring to spread cache-line offsets across slabs.

Linux SLAB. Direct port of Bonwick's design. Three-queue structure, slab coloring. Removed from the kernel in 6.x once SLUB matured.

Linux SLUB (default since 2.6.23). Simplified queue structure, per-CPU active slab, partial-slab list per NUMA node, metadata in the page struct. The default for x86, ARM64, RISC-V.

Linux SLOB. Minimum-footprint allocator for embedded kernels. Best-fit list of pages, no per-type caching. Trades performance for code size.

FreeBSD UMA (Universal Memory Allocator). Slab plus per-CPU buckets plus per-domain (NUMA) zones. Universal in the sense that it's the allocator for both kernel objects and the buddy of the page allocator.

illumos / Solaris kmem. The direct descendant of Bonwick's original. Adds magazines and depot layers for stronger per-CPU caching, plus KGI for kernel-aware leak/use-after-free debug.

Windows lookaside lists. The structural analog — per-type free list of objects of identical size, refilled in batches from the executive pool allocator.

Userspace echoes. jemalloc, tcmalloc, mimalloc are slab-flavored: size classes plus per-thread caches plus central free lists. The kernel slab is the design ancestor of all of them.

Common bugs and edge cases

  • Use-after-free poisoning. The freed object sits in the cache, ready to be re-handed-out. A stale pointer to it may successfully read the bytes of an unrelated new allocation. SLUB debug mode poisons freed memory with 0x6b to catch this.
  • Double free corruption. Pushing the same index twice onto the free list creates a cycle. Allocation later returns the same object to two callers. SLUB's redzones and freelist randomization help; production builds remove them for performance.
  • Object that escapes its cache. Freeing a slab object to the wrong cache scrambles two free lists. Type-stable allocations make this a memory-safety hole.
  • Slab leak in the dentry cache. A subsystem holding dentries forever quietly consumes RAM until the shrinker can't catch up. Diagnose with slabtop and slub_debug=U.
  • False sharing across objects. Two unrelated 192-byte objects packed into the same 64-byte cache line cause two cores to ping-pong the line. Slab coloring used to mitigate this; SLUB largely relies on object-size-rounded-to-cache-line alignment instead.
  • Per-CPU slab not flushed on offline. Hot-unplug must flush the CPU's slab back to the partial list, or those objects leak. Kernel bugs here have caused per-CPU memory leaks measured in GB on long-running boxes.
  • Constructor side effects. A bad constructor (e.g., one that takes a sleeping lock) deadlocks at slab-refill time inside a non-preemptive context. Constructors must be small, deterministic, and atomic.

Frequently asked questions

Why is the slab allocator faster than malloc?

Each cache holds objects of one fixed size, so alloc/free reduce to a pop/push against a singly-linked free list — a couple of memory ops and no header parsing. The fragment-and-coalesce dance that general malloc has to perform on every operation is replaced by a constant-time list manipulation. Per-CPU caches (the second tier in Linux SLUB) further remove the lock from the hot path so multi-core scaling stays linear.

What is the kmem_cache structure in Linux?

kmem_cache is the per-object-type metadata: object size, alignment, the constructor function (if any), and pointers to the cache's slabs and per-CPU page slots. Every kernel object type that wants its own slab — task_struct, dentry, inode, mm_struct, sock — calls kmem_cache_create at boot or module init. After that, kmem_cache_alloc returns an aligned, often pre-initialized object in nanoseconds, and kmem_cache_free returns it to the cache for the next caller.

How does a slab cache reclaim memory?

Each cache tracks its slabs as partial, full, or empty. Empty slabs (every object free) are candidates to return to the buddy allocator. A 'shrinker' callback is registered per cache so memory reclaim under watermark pressure can ask each cache to release some empty slabs. Big caches like the dentry and inode caches have explicit shrinkers run by the VFS reclaimer; trimming them is what 'echo 2 > /proc/sys/vm/drop_caches' actually does.

What is the difference between SLAB, SLUB, and SLOB?

SLAB was Linux's first port of Bonwick's design — three queues (CPU, node, global), aggressive coloring for cache-line distribution. SLUB (default since 2.6.23) keeps a per-CPU slab and a per-node partial list, drops most of the queue machinery, and stores metadata in the page struct instead of inline — smaller memory footprint, better multi-core scaling. SLOB is for embedded/tiny kernels: a simple first-fit list-of-pages allocator, no per-type caching, minimum kernel footprint.

How does slab compare to userspace malloc?

General malloc must handle any byte size from any caller; it uses size-class bins, headers, coalescing, and complex free-list management. Slab knows in advance that every allocation from a given cache is the same size — so no per-allocation header, no in-place coalescing, no fragmentation between sizes. Userspace allocators like jemalloc and tcmalloc adopt the same idea (size classes plus per-thread arenas) but ship general-purpose code that handles arbitrary sizes by routing each through its size class.

What is slab coloring?

Two slabs holding the same object type, allocated back-to-back from the buddy allocator, would put corresponding objects at the same offsets inside their pages — and therefore at the same cache-line index inside the L1. Slab coloring stagger the start-of-slab offset by one cache-line each, so consecutive objects across slabs spread across cache sets. The cost is a few bytes of wasted space at the start of each slab; the win is far fewer L1 conflict misses. SLUB drops most coloring because modern caches have grown enough associativity that the gain is marginal.