Memory Allocation

Buddy Allocator

Q: Why power-of-two blocks?

Power-of-two sizes have an O(1) buddy address: the buddy of block at offset X size 2^k is the block at offset X XOR 2^k. Coalescing on free reduces to a bit flip and a list-membership check. Any other size would require interval-tree lookups for sibling discovery. The cost is internal fragmentation — a 17-page request rounds up to 32 pages — but the rounding is bounded and the bookkeeping is constant-time.

Q: How does coalescing actually work?

On free, the allocator XORs the freed block's address with its size to compute its buddy's address. It then checks whether the buddy is on the free list of the same order. If yes: remove the buddy from its free list, take the lower address of the two as the new block of order k+1, and recurse — the new larger block may itself have a free buddy. The recursion terminates when buddies are not both free, or when the maximum order is reached. The whole walk is O(log MAX_ORDER), so under 20 steps even for huge memories.

Q: What is MAX_ORDER in Linux?

MAX_ORDER is the largest power-of-two block the buddy allocator manages. Default Linux: MAX_ORDER = 11, so orders 0..10 cover 2^0 = 1 page (4 KB) through 2^10 = 1024 pages (4 MB). Some architectures bump it up to support gigantic hugepages contiguously. Allocations larger than 2^(MAX_ORDER-1) pages have to come from CMA, hugetlb, or contiguous-memory carve-outs because the buddy refuses to express them as a single block.

Q: What is the difference between a buddy allocator and slab?

Buddy hands out variable-power-of-two contiguous page blocks; slab hands out fixed-size objects. The two stack: a slab cache refills itself by requesting one or more pages from the buddy (typically a single page, sometimes order-1 or order-2 for larger objects). Slab amortizes the buddy's split/merge work across many objects of the same size, and the buddy in turn makes sure each slab gets a physically contiguous run.

Q: What is external fragmentation in a buddy allocator?

External fragmentation appears when the free list at some order is large but every entry is split across non-mergeable buddies. /proc/buddyinfo is the diagnostic: a healthy zone has counts at multiple orders; a fragmented one has thousands at order 0 and zero at order 6+. To allocate an order-9 huge page in this state, the kernel must run the compactor to physically move movable pages out of the way, coalesce released buddies, and re-form a large contiguous block.

Q: How does Linux track buddies?

Each zone keeps an array of free_area structures, one per order. Each free_area holds a list of free blocks at that order, broken out by migrate type (UNMOVABLE, MOVABLE, RECLAIMABLE, CMA, etc.). The struct page for each first-page-of-block holds the order in page->private and the migrate type in page flags. Coalescing checks the buddy's page->private to confirm it is also a free block of the same order before merging.

Split on alloc, merge on free — power-of-two blocks the kernel can coalesce in O(1)

A buddy allocator manages memory as power-of-two blocks. Splits halve blocks on alloc; frees merge sibling buddies — fast, bounded-fragmentation page allocation.

Block sizes2^0 page .. 2^(MAX_ORDER-1) pages
Linux default range4 KB to 4 MB (MAX_ORDER = 11)
Alloc costO(log MAX_ORDER) splits
Free costO(log MAX_ORDER) coalesces
Buddy addressX XOR (1 << order)
Internal fragmentationBounded, < 2× requested

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How a buddy allocator works

Think of memory as a single block of 2^N pages. The buddy allocator keeps one free list per "order" — order 0 is one page (4 KB), order 1 is two contiguous pages, order 10 is 1024 contiguous pages (4 MB). At first only the topmost order is populated with one big block; all other lists are empty.

An allocation for K pages rounds up to the smallest order k with 2^k ≥ K, then asks for a block at that order. If the order-k list is empty, the allocator looks at order k+1: take one block, split it in half, push the right half onto the order-k free list, and recurse with the left half (which may itself need splitting). On free, the allocator computes the buddy address — for a block at offset X and order k, the buddy lives at offset X XOR (1 << k) — and checks whether the buddy is also free. If yes, coalesce into an order-(k+1) block and retry the merge one level up. This walk takes at most MAX_ORDER steps.

order  block size   sample state before alloc(16 pages)

  10    1024 pages   [ ▣ one free 1024-page block ]
  9      512 pages   [ ]
  8      256 pages   [ ]
  7      128 pages   [ ]
  6       64 pages   [ ]
  5       32 pages   [ ]
  4       16 pages   [ ]   ← target order
  3        8 pages
  ...

Asking for 16 pages (order 4):

  Step 1 (split 1024 → 512+512):  o10 [ ]    o9 [ ▣ ▢ ]
  Step 2 (split 512  → 256+256):  o9  [ ▣ ]  o8 [ ▢ ▣ ]
  Step 3 (split 256  → 128+128):  o8  [ ▣ ]  o7 [ ▢ ▣ ]
  Step 4 (split 128  → 64 +64 ):  o7  [ ▣ ]  o6 [ ▢ ▣ ]
  Step 5 (split 64   → 32 +32 ):  o6  [ ▣ ]  o5 [ ▢ ▣ ]
  Step 6 (split 32   → 16 +16 ):  o5  [ ▣ ]  o4 [ ▢ ▣ ]
  Allocate the left 16-page block (▢ becomes "in use").

  Result: kept right-half blocks free at o5, o6, o7, o8, o9; one o4 in use.
  The same 6 splits run for any first allocation up to 16 pages — afterward
  there are free blocks at every order so subsequent allocs are cheap.

On free, the same path is walked backwards. A handed-back order-4 block whose buddy is still free coalesces with it into an order-5; if that order-5's buddy is also free, into an order-6; and so on. Eventually the structure returns to one giant block at order 10.

The moving pieces

Free-area lists. One doubly-linked list per order. Linux's struct free_area array lives inside each zone — the buddy is per-zone, not global.

Order metadata. Each page's order is stored in page->private when the page is a free buddy block. Knowing the order is what makes buddy XOR-computation possible.

Migrate type per block. Free blocks are further classified — UNMOVABLE, MOVABLE, RECLAIMABLE, CMA — so the compactor and reclaim can act selectively. The same buddy structure runs once per migrate type.

Allocation context. GFP flags decide which zone the buddy runs in (zones partition physical memory by purpose). The buddy itself is identical across zones; the zone is just the address range it manages.

Why a buddy allocator

Bounded fragmentation. The worst-case allocation wastes less than half the block — a request for 65 pages becomes an order-7 (128-page) block, 63 pages wasted internally. Smaller requests waste correspondingly less, and the waste returns to the free list on free.
O(1) buddy address. Coalescing is a single XOR, not a search through a free tree. Free is genuinely cheap.
Physically contiguous blocks. By construction, every block the buddy hands out is physically contiguous — exactly what DMA, hugepages, and the slab allocator need.
Compaction-friendly. A page can be moved and the buddy still tracks its new free state correctly; coalescing simply happens when neighbors line up.

Buddy allocators lose for many small (sub-page) requests, where slab is better; for very-large allocations that exceed MAX_ORDER, where CMA or hugetlb reserve memory in advance; and for workloads where coalescing never catches up (constant churn of mid-order blocks at very high rate).

Buddy vs other memory allocators

	Buddy	Slab	First-fit / best-fit	Segregated free lists	Bump allocator	Region (arena)
Granularity	2^k pages	Fixed object size	Variable bytes	Discrete size classes	Bytes (no free)	Bytes, bulk free
Alloc cost	O(log MAX_ORDER)	O(1) pop	O(N) search	O(1) per class	O(1) pointer bump	O(1) pointer bump
Free cost	O(log MAX_ORDER) merge	O(1) push	O(N) coalesce	O(1) push	Not supported	O(1) region reset
Internal frag	Up to ~50%	Near zero	Low	Class rounding	None	None
External frag	From split history	None	Common	Common	N/A	N/A
Used by	Linux page allocator, FreeBSD vm_phys	kernel objects, jemalloc tcache	Old C malloc	tcmalloc, mimalloc	Compilers, parsers	Game engines, Erlang heaps

The buddy's signature property is the symmetric structure of alloc and free: both are recursive walks across orders bounded by MAX_ORDER, and both rely on the same XOR-based sibling lookup. Few other allocators have such a clean reversibility.

What the numbers actually look like

Orders 2^0 to 2^10 in Linux default. MAX_ORDER = 11 means free-area arrays span order 0 (4 KB) to order 10 (4 MB). Some configs bump to MAX_ORDER = 12 or higher for 1 GB gigantic-page support.
Internal fragmentation bounded by 2×. Worst case: request 2^k + 1 pages, get 2^(k+1) — half the block is wasted. Across millions of allocations of varied sizes the average waste is well under 50% and is fully reclaimable on free.
Coalesce walk < log₂(memory) steps. Even on a 1 TB-physical-memory NUMA box, MAX_ORDER bounded by zone size limits each free to under 20 buddy checks — microseconds at worst.
/proc/buddyinfo shows per-order free counts. Healthy zones show non-zero counts across many orders. Fragmented zones cluster everything at order 0 — symptom of failed high-order allocations and a trigger for kcompactd.
External fragmentation forces compaction. Allocations >= order 3 occasionally fail under high uptime; the kernel runs vm.compaction_proactiveness-driven sweeps to relocate movable pages and reform high-order blocks.
Pages per second in steady state. A loaded Linux server allocates and frees tens of millions of pages per second through the buddy — measurable via /proc/vmstat's pgalloc_* and pgfree counters.

JavaScript implementation

// A power-of-two buddy allocator. Pages addressed by index 0..(1<<MAX)-1.
// Each order keeps a Set of free block start-offsets.

class BuddyAllocator {
  constructor(maxOrder = 10) {                   // 1024 pages → 4 MB if 4 KB pages
    this.maxOrder = maxOrder;
    this.free = [];
    for (let k = 0; k <= maxOrder; k++) this.free.push(new Set());
    this.free[maxOrder].add(0);                  // one big block at startup
    this.orderOf = new Map();                    // allocated start → order
  }

  _splitDown(order, targetOrder) {
    // Pop a block from `order`, split until reaching targetOrder.
    while (order > targetOrder) {
      const startIter = this.free[order].values().next();
      const start = startIter.value;
      this.free[order].delete(start);
      const half = 1 << (order - 1);
      this.free[order - 1].add(start);           // left half goes free
      this.free[order - 1].add(start + half);    // right half goes free
      order--;
    }
  }

  alloc(pages) {
    const order = Math.max(0, Math.ceil(Math.log2(Math.max(1, pages))));
    if (order > this.maxOrder) return null;

    // Find smallest order with a free block ≥ requested order.
    let k = order;
    while (k <= this.maxOrder && this.free[k].size === 0) k++;
    if (k > this.maxOrder) return null;          // OOM at this size

    this._splitDown(k, order);
    const startIter = this.free[order].values().next();
    const start = startIter.value;
    this.free[order].delete(start);
    this.orderOf.set(start, order);
    return { start, order, size: 1 << order };
  }

  free_(block) {
    let { start, order } = block;
    while (order < this.maxOrder) {
      const buddy = start ^ (1 << order);        // XOR gives sibling address
      if (!this.free[order].has(buddy)) break;   // buddy busy → stop merging
      this.free[order].delete(buddy);
      start = Math.min(start, buddy);            // lower address survives
      order++;                                    // merged block lives one order up
    }
    this.free[order].add(start);
    this.orderOf.delete(block.start);
  }

  buddyinfo() {                                  // mimic /proc/buddyinfo
    return this.free.map((s, k) => ({ order: k, count: s.size }));
  }
}

// Use:
const ba = new BuddyAllocator(10);
const a = ba.alloc(16);    // 16 pages → order 4 → splits down from order 10
const b = ba.alloc(64);    // 64 pages → order 6 → reuses split tree
ba.free_(a);               // free order 4; coalesces with right buddy if free
console.log(ba.buddyinfo());

Two lines do all the structural work: start ^ (1 << order) computes the buddy, and Math.min(start, buddy) picks which address represents the merged larger block. The rest is just bookkeeping.

Python implementation — with migrate types

from math import log2, ceil
from collections import defaultdict
from typing import Optional

MOVABLE, UNMOVABLE, RECLAIMABLE = 'movable', 'unmovable', 'reclaimable'

class Buddy:
    """Power-of-two buddy allocator with per-migrate-type free lists."""

    def __init__(self, max_order: int = 10) -> None:
        self.max_order = max_order
        # free[order][migrate_type] = set of start offsets
        self.free: list[dict[str, set[int]]] = [
            defaultdict(set) for _ in range(max_order + 1)
        ]
        self.free[max_order][MOVABLE].add(0)
        # metadata for allocated blocks
        self.allocated: dict[int, tuple[int, str]] = {}

    @staticmethod
    def _order_for(pages: int) -> int:
        return max(0, ceil(log2(max(1, pages))))

    def _split_down(self, order: int, target: int, mt: str) -> int:
        """Take one block at `order` (of migrate type `mt`), split to `target`. Return start."""
        start = self.free[order][mt].pop()
        while order > target:
            half = 1 << (order - 1)
            self.free[order - 1][mt].add(start + half)   # right half free
            order -= 1
        return start

    def alloc(self, pages: int, mt: str = MOVABLE) -> Optional[tuple[int, int]]:
        target = self._order_for(pages)
        if target > self.max_order: return None
        # Walk orders looking for a free block of migrate type `mt`.
        for k in range(target, self.max_order + 1):
            if self.free[k][mt]:
                start = self._split_down(k, target, mt)
                self.allocated[start] = (target, mt)
                return start, target
        # Fallback: steal from another migrate type at a high order, mark it mt.
        for k in range(target, self.max_order + 1):
            for src in (UNMOVABLE, RECLAIMABLE, MOVABLE):
                if self.free[k][src]:
                    start = self.free[k][src].pop()
                    while k > target:
                        half = 1 << (k - 1)
                        self.free[k - 1][mt].add(start + half)
                        k -= 1
                    self.allocated[start] = (target, mt)
                    return start, target
        return None

    def free_(self, start: int) -> None:
        order, mt = self.allocated.pop(start)
        while order < self.max_order:
            buddy = start ^ (1 << order)
            if buddy not in self.free[order][mt]:
                break
            self.free[order][mt].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free[order][mt].add(start)

    def buddyinfo(self) -> list[dict[str, int]]:
        return [
            {'order': k, 'movable': len(self.free[k][MOVABLE]),
             'unmovable': len(self.free[k][UNMOVABLE]),
             'reclaimable': len(self.free[k][RECLAIMABLE])}
            for k in range(self.max_order + 1)
        ]

The migrate-type axis is what makes Linux's buddy compaction-friendly. Free movable blocks coalesce with movable buddies; the compactor can relocate movable pages to free up neighbors of unmovable ones; the buddy's XOR identity continues to work because the address arithmetic is independent of the migrate-type bookkeeping.

Variants and real-world deployments

Knuth's original buddy system (1968). The Art of Computer Programming, vol. 1, describes the classic algorithm: power-of-two free lists, XOR-buddy lookup, recursive split/merge.

Linux page allocator. The canonical production buddy. Per-zone, per-migrate-type, integrated with the per-CPU pageset cache, kcompactd defragmentation, and CMA reservations.

FreeBSD vm_phys. Multi-segment buddy with NUMA domain awareness. Different segmentation strategy from Linux zones but identical buddy mechanics.

seL4 untyped memory. A capability-based microkernel where untyped (raw physical) memory is allocated and retyped into kernel objects. The retype operation is buddy-like — split or coalesce capability ranges.

jemalloc extents. Userspace allocator with buddy-style coalescing for its arena-level extent management. Tracks dirty/clean extents and madvise-releases dirty pages periodically.

RTOS allocators (FreeRTOS heap_4). Embedded systems often ship a buddy or buddy-flavored allocator because bounded coalescing latency matters for hard real-time deadlines.

Hugepage subsystem. Hugepages are pre-reserved order-9 (2 MB) or order-18 (1 GB) blocks. Either come from the buddy at boot or are carved out of CMA — both lean on the buddy's contiguity guarantees.

Common bugs and edge cases

Failure to find an order despite plenty of memory. Classic external-fragmentation hit: hundreds of MB free but all at order 0. Symptom: __alloc_pages failure for order 4+. Fix: compaction or proactive defrag.
Double-free corruption. Freeing the same block twice puts it on the order's free list twice, and the next alloc returns it to two callers. Linux uses page->flags and page->private sanity checks; production builds keep some checks even outside DEBUG.
Cross-migrate-type pollution. A misbehaving allocator hands out a movable block as if it were unmovable; later, the compactor refuses to migrate it. Long-term: high-order allocations never succeed even though aggregate free is huge.
MAX_ORDER ceiling for 1 GB pages. Standard MAX_ORDER = 11 cannot express a 1 GB block. Gigantic hugepages live in a parallel reservation pool (hugetlbfs) rather than going through the buddy.
NUMA-cross coalesce attempts. Each zone has its own buddy; coalescing across zone boundaries is forbidden, even if addresses look adjacent. Bugs that drop this check have created cross-zone "buddies" that crashed the kernel.
Order-0 thrash. A workload that churns single-page allocs through the buddy at high rate beats the per-CPU pageset cache and contends on the zone lock. Symptom: ~30% CPU in __rmqueue.
Off-by-one MAX_ORDER. Some allocations sit at exactly MAX_ORDER-1 (the largest representable buddy block). Code that asks for "MAX_ORDER pages" instead of "(1 << (MAX_ORDER-1)) pages" silently fails — a perennial papercut.

Frequently asked questions

Why power-of-two blocks?

Power-of-two sizes have an O(1) buddy address: the buddy of block at offset X size 2^k is the block at offset X XOR 2^k. Coalescing on free reduces to a bit flip and a list-membership check. Any other size would require interval-tree lookups for sibling discovery. The cost is internal fragmentation — a 17-page request rounds up to 32 pages — but the rounding is bounded and the bookkeeping is constant-time.

How does coalescing actually work?

On free, the allocator XORs the freed block's address with its size to compute its buddy's address. It then checks whether the buddy is on the free list of the same order. If yes: remove the buddy from its free list, take the lower address of the two as the new block of order k+1, and recurse — the new larger block may itself have a free buddy. The recursion terminates when buddies are not both free, or when the maximum order is reached. The whole walk is O(log MAX_ORDER), so under 20 steps even for huge memories.

What is MAX_ORDER in Linux?

MAX_ORDER is the largest power-of-two block the buddy allocator manages. Default Linux: MAX_ORDER = 11, so orders 0..10 cover 2^0 = 1 page (4 KB) through 2^10 = 1024 pages (4 MB). Some architectures bump it up to support gigantic hugepages contiguously. Allocations larger than 2^(MAX_ORDER-1) pages have to come from CMA, hugetlb, or contiguous-memory carve-outs because the buddy refuses to express them as a single block.

What is the difference between a buddy allocator and slab?

Buddy hands out variable-power-of-two contiguous page blocks; slab hands out fixed-size objects. The two stack: a slab cache refills itself by requesting one or more pages from the buddy (typically a single page, sometimes order-1 or order-2 for larger objects). Slab amortizes the buddy's split/merge work across many objects of the same size, and the buddy in turn makes sure each slab gets a physically contiguous run.

What is external fragmentation in a buddy allocator?

External fragmentation appears when the free list at some order is large but every entry is split across non-mergeable buddies. /proc/buddyinfo is the diagnostic: a healthy zone has counts at multiple orders; a fragmented one has thousands at order 0 and zero at order 6+. To allocate an order-9 huge page in this state, the kernel must run the compactor to physically move movable pages out of the way, coalesce released buddies, and re-form a large contiguous block.

How does Linux track buddies?

Each zone keeps an array of free_area structures, one per order. Each free_area holds a list of free blocks at that order, broken out by migrate type (UNMOVABLE, MOVABLE, RECLAIMABLE, CMA, etc.). The struct page for each first-page-of-block holds the order in page->private and the migrate type in page flags. Coalescing checks the buddy's page->private to confirm it is also a free block of the same order before merging.

Interactive visualization

Watch the 60-second explainer

How a buddy allocator works

The moving pieces

Why a buddy allocator

Buddy vs other memory allocators

What the numbers actually look like

JavaScript implementation

Python implementation — with migrate types

Variants and real-world deployments

Common bugs and edge cases

Frequently asked questions

Related concepts