RAID
Turn a pile of fallible disks into one fast, fault-tolerant volume
RAID combines several disks into one logical volume for speed or fault tolerance using striping, mirroring, and distributed parity — trading capacity, performance, and rebuild risk against how many drives can fail without losing data.
- RAID 0 usable capacity: N · C (no redundancy)
- RAID 5 survives: 1 drive failure
- RAID 6 survives: 2 drive failures
- Parity overhead: 1 (RAID 5) or 2 (RAID 6) drives
- RAID 10 capacity: N · C / 2
How RAID works: striping, mirroring, parity
RAID (Redundant Array of Independent Disks, originally "Inexpensive Disks") was named in a 1988 Berkeley paper by Patterson, Gibson, and Katz, whose argument was economic: a single large expensive disk could be beaten by a stack of cheap small ones, if you added redundancy to compensate for their higher combined failure rate. Every RAID level is a different answer to one question: given a fixed pile of disks, how do you spend their capacity to buy speed, fault tolerance, or both?
Three primitives do all the work:
- Striping splits data into fixed-size chunks (typically 64 KB to 256 KB) and lays consecutive chunks across different drives. A single large read or write now hits N drives in parallel, so throughput scales with the number of spindles. Striping alone is RAID 0 — pure speed, zero protection.
- Mirroring writes every block to two or more drives. A read can come from either copy; a write must hit both. Mirroring alone is RAID 1 — you lose half your capacity, but you can pull a drive out and keep running.
- Parity stores a computed check block — the XOR of the data blocks in a stripe — so that any one missing block can be reconstructed from the others. Parity buys single-failure tolerance for the price of just one extra drive across the whole array, instead of doubling every disk.
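To make the striping primitive concrete, here is a minimal sketch of how a logical byte offset could map onto drives in a plain RAID 0 layout. The function name `locate` and the 64 KB default chunk size are illustrative assumptions, not any particular controller's scheme.

```python
def locate(offset, n_drives, chunk_size=64 * 1024):
    """Map a logical byte offset to (drive, stripe, offset_in_chunk) for RAID 0.

    Hypothetical helper: assumes chunks are laid out round-robin across drives.
    """
    chunk = offset // chunk_size      # index of the chunk holding this byte
    drive = chunk % n_drives          # consecutive chunks land on consecutive drives
    stripe = chunk // n_drives        # one stripe = one chunk on every drive
    return drive, stripe, offset % chunk_size


# A 256 KB sequential read on 4 drives touches each drive exactly once.
touched = [locate(off, n_drives=4)[0] for off in range(0, 256 * 1024, 64 * 1024)]
print(touched)  # [0, 1, 2, 3]
```

Because consecutive chunks land on different drives, a large sequential transfer keeps every spindle busy at once, which is where the throughput scaling comes from.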
The XOR trick is the heart of parity RAID. For data blocks D0, D1, D2 the parity is P = D0 ⊕ D1 ⊕ D2. If D1's drive dies, you recover it as D1 = D0 ⊕ D2 ⊕ P, because XOR is its own inverse: every term cancels except the one you lost. RAID 5 distributes these parity blocks across all drives (so no single disk becomes a write bottleneck); RAID 6 adds a second, mathematically independent parity (Q, computed with Reed-Solomon coding over a Galois field) so it can solve for any two unknowns at once.
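The second parity can be sketched in a few dozen lines. The example below is an illustration under stated assumptions rather than any controller's actual code: it works in GF(2^8) with the reduction polynomial 0x11D and generator 2 (a common choice, assumed here), computes Q as the sum of 2^i · D_i, and solves for two lost data blocks from P and Q. All function names are hypothetical.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8), reducing by the assumed polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """a^254 is the inverse of a nonzero byte, because a^255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)


def pq_parity(data):
    """P is the plain XOR; Q weights block i by 2^i before XOR-ing."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(data, x, y, p, q):
    """Rebuild data blocks x and y (both None in `data`) from P and Q."""
    zero = bytes(len(p))  # a zero block contributes nothing to P or Q
    p_s, q_s = pq_parity([zero if b is None else b for b in data])
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    # (Q ^ Q_s) ^ g^y * (P ^ P_s) == (g^x ^ g^y) * Dx, so divide to get Dx.
    dx = bytes(gf_mul(inv, (q[j] ^ q_s[j]) ^ gf_mul(gy, p[j] ^ p_s[j]))
               for j in range(len(p)))
    dy = bytes(a ^ b ^ c for a, b, c in zip(dx, p, p_s))  # Dy = (P ^ P_s) ^ Dx
    return dx, dy


data = [b"\x0a\x05\xc8", b"\x06\x09\x11", b"\x0c\x02\x63"]
p, q = pq_parity(data)
dx, dy = recover_two([None, data[1], None], 0, 2, p, q)
print(dx == data[0] and dy == data[2])  # True
```

The two parities give two independent equations per byte, which is exactly what is needed to solve for two unknown blocks.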
The RAID levels you actually meet
- RAID 0 (striping). N drives, N·C capacity, ~N× throughput, zero redundancy. One drive dies, everything dies. Use it for scratch space and caches you can rebuild.
- RAID 1 (mirroring). Identical copies. Survives N−1 of N drives failing. Fast reads (any copy), usable capacity of a single drive (half the raw space in the usual two-disk mirror). The simplest safe choice for a two-disk system.
- RAID 5 (striping + distributed single parity). Needs ≥3 drives. Usable capacity (N−1)·C. Survives exactly one failure. Great read speed; writes pay a penalty (below).
- RAID 6 (striping + dual distributed parity). Needs ≥4 drives. Usable capacity (N−2)·C. Survives any two failures — the standard for large arrays where rebuilds take a long time.
- RAID 10 (stripe of mirrors, written 1+0). Pairs of mirrored drives, striped together. Capacity N·C/2. Fast and resilient, with cheap rebuilds (copy one drive, not the whole array), but the most capacity-hungry of the redundant levels.
- Nested oddities (RAID 50, 60). Stripe across multiple RAID 5 or 6 groups for arrays of dozens of drives. The benefit is marginal beyond a few groups.
When to use which level
- Pure performance, expendable data → RAID 0. Video editing scratch, build artifacts, an L2 cache tier.
- Boot/OS drives, small servers → RAID 1. Two disks, simple, fast to rebuild.
- Read-mostly bulk storage, ≤ ~2 TB drives → RAID 5. File servers, media libraries where you rarely write.
- Large arrays, big modern drives → RAID 6. The long rebuild window is exactly when a second drive is most likely to fail, so a single-parity scheme is gambling.
- Databases and write-heavy OLTP → RAID 10. No parity to recompute on small random writes, so it dodges the write penalty entirely.
The deciding factors are almost always write pattern (random small writes punish parity) and rebuild time (bigger drives widen the danger window). If your workload is write-heavy, parity RAID's read-modify-write cost will dominate; if your drives are large, single parity's rebuild exposure should scare you toward RAID 6 or mirroring.
RAID levels compared
| | RAID 0 | RAID 1 | RAID 5 | RAID 6 | RAID 10 |
|---|---|---|---|---|---|
| Minimum drives | 2 | 2 | 3 | 4 | 4 |
| Usable capacity | N·C | C | (N−1)·C | (N−2)·C | N·C / 2 |
| Failures tolerated | 0 | N−1 | 1 | 2 | 1 per mirror (up to N/2) |
| Read speed | ★★★ | ★★ | ★★★ | ★★★ | ★★★ |
| Random write speed | ★★★ | ★★ | ★ (4× penalty) | ★ (6× penalty) | ★★★ |
| Rebuild cost | n/a | copy 1 drive | read all drives | read all drives | copy 1 drive |
| Typical use | scratch, cache | OS/boot, 2-disk | read-mostly NAS | large arrays | databases, VMs |
The headline tension is parity efficiency versus write cost. RAID 5/6 are far more space-efficient than mirroring at scale — RAID 6 protects 12 drives at the cost of 2, where RAID 10 would cost 6 — but they pay for it on every small write, and they take far longer to rebuild.
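The capacity formulas from the comparison can be checked with a small calculator. This is a sketch assuming N identical drives of capacity C; the function name and the minimum-drive table are illustrative and mirror the figures given above.

```python
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def usable_capacity(level, n, c):
    """Usable capacity for n identical drives of capacity c (same unit as c)."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 10 and n % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    drives_worth = {0: n, 1: 1, 5: n - 1, 6: n - 2, 10: n // 2}[level]
    return drives_worth * c


# Twelve 4 TB drives: RAID 6 gives up 2 drives, RAID 10 gives up 6.
for level in (5, 6, 10):
    print(f"RAID {level}: {usable_capacity(level, 12, 4)} TB usable of 48 TB raw")
```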
What the numbers actually say
- The RAID 5 small-write penalty is 4 I/Os per logical write. To update one data block you must: read the old data, read the old parity, write the new data, write the new parity — because new parity = old parity ⊕ old data ⊕ new data. RAID 6 needs 6 I/Os (two parities to maintain). This is why parity RAID is poison for OLTP databases.
- Capacity overhead is fixed, not proportional. RAID 5 always burns exactly one drive regardless of array size; on a 12-drive array that's 8.3% overhead versus 50% for mirroring. That efficiency is the entire reason parity exists.
- Unrecoverable read error rate dominates rebuild risk. Consumer SATA drives are spec'd at one URE per 10¹⁴ bits ≈ one per 12.5 TB read. Rebuilding a RAID 5 made of six 4 TB drives reads ~20 TB, so the expected number of UREs during a single rebuild exceeds one, and on RAID 5 a single URE mid-rebuild is fatal. Enterprise drives at 10¹⁵ push this out 10×.
- Rebuild times are measured in hours to days. A 16 TB drive rebuilt at a throttled ~100 MB/s takes ~44 hours of full-array reading — 44 hours during which a second failure or a single bad sector can end you. This single fact is why the industry moved from RAID 5 to RAID 6.
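The URE and rebuild-time figures above are simple arithmetic, reproduced in the sketch below. The Poisson estimate of "at least one URE" is an added assumption (independent errors occurring at exactly the spec'd rate); real drives may behave better or worse than their datasheet bound. Function names are illustrative.

```python
import math

TB = 10**12  # decimal terabyte, as drive vendors count


def expected_ures(bytes_read, bits_per_ure=1e14):
    """Expected unrecoverable read errors for a given amount of data read."""
    return bytes_read * 8 / bits_per_ure


def p_at_least_one_ure(bytes_read, bits_per_ure=1e14):
    """Poisson approximation: assumes independent errors at the spec'd rate."""
    return 1 - math.exp(-expected_ures(bytes_read, bits_per_ure))


def rebuild_hours(drive_bytes, rate_bytes_per_s):
    """Time to read or write one drive's worth of data at a sustained rate."""
    return drive_bytes / rate_bytes_per_s / 3600


survivors_read = 5 * 4 * TB  # six-drive RAID 5 of 4 TB disks: five survivors
print(expected_ures(survivors_read))                 # 1.6
print(round(p_at_least_one_ure(survivors_read), 2))  # 0.8
print(round(rebuild_hours(16 * TB, 100e6), 1))       # 44.4
```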
JavaScript implementation
The core of parity RAID is small. Here is RAID 5 over a set of drives, plus reconstruction of a failed one — the same XOR identity the hardware controller runs.
```javascript
// Each "drive" is a Uint8Array of equal length. RAID 5 stripes data
// across drives, rotating which drive holds parity per stripe.
function xorInto(target, src) {
  for (let i = 0; i < target.length; i++) target[i] ^= src[i];
}

// Write one stripe of data blocks; compute and place the parity block.
// `blocks` has one entry per drive; the parity slot is left null by caller.
function writeStripe(blocks, parityIndex) {
  const len = blocks.find(b => b)?.length ?? 0;
  const parity = new Uint8Array(len);
  blocks.forEach((b, i) => { if (i !== parityIndex && b) xorInto(parity, b); });
  blocks[parityIndex] = parity; // P = XOR of all data blocks
  return blocks;
}

// A drive died. Reconstruct it from every surviving block in the stripe.
function rebuildDrive(survivingBlocks, deadIndex) {
  const len = survivingBlocks.find(b => b)?.length ?? 0;
  const recovered = new Uint8Array(len); // 0 ⊕ x = x
  survivingBlocks.forEach((b, i) => { if (i !== deadIndex && b) xorInto(recovered, b); });
  return recovered; // works for data OR parity drive
}

// Demo: 3 data drives + 1 parity, then lose drive 1 and recover it.
const drives = [
  Uint8Array.of(0b1010, 5, 200),
  Uint8Array.of(0b0110, 9, 17),
  Uint8Array.of(0b1100, 2, 99),
  null, // parity slot
];
writeStripe(drives, 3);
const lost = drives[1];
drives[1] = null; // drive 1 fails
const recovered = rebuildDrive(drives, 1);
console.log(recovered.every((v, i) => v === lost[i])); // true
```
Two details matter. First, reconstruction is symmetric: the same XOR-of-survivors recovers a lost data drive or a lost parity drive — parity is just data to the algorithm. Second, in real RAID 5 the parity block rotates to a different drive each stripe (parityIndex = stripe % N), so no single disk becomes the write hotspot that crippled the old RAID 4 design.
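To visualize the rotation, the sketch below prints which drive holds parity in each stripe under the simple `stripe % N` rule quoted above. The helper name and the D0, D1, ... block labels are illustrative, and real implementations offer several rotation layouts.

```python
def stripe_layout(stripe, n_drives):
    """Label each drive's block in one stripe: 'P' for parity, 'Dk' for data."""
    parity = stripe % n_drives             # rotation rule from the text above
    next_data = stripe * (n_drives - 1)    # data blocks are numbered sequentially
    labels = []
    for drive in range(n_drives):
        if drive == parity:
            labels.append("P")
        else:
            labels.append(f"D{next_data}")
            next_data += 1
    return labels


for s in range(4):
    print(s, stripe_layout(s, 4))
# 0 ['P', 'D0', 'D1', 'D2']
# 1 ['D3', 'P', 'D4', 'D5']
# 2 ['D6', 'D7', 'P', 'D8']
# 3 ['D9', 'D10', 'D11', 'P']
```

Over any N consecutive stripes each drive holds parity exactly once, so parity writes are spread evenly instead of landing on one disk.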
Python implementation
The same idea, plus an update path showing the read-modify-write that creates the famous small-write penalty.
```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR a list of equal-length byte sequences."""
    return bytes(reduce(xor, vals) for vals in zip(*blocks))


class Raid5:
    def __init__(self, n_drives, block_len):
        self.n = n_drives
        self.blk = block_len
        # stripe -> list of blocks (one per drive); parity rotates per stripe
        self.stripes = []

    def parity_index(self, stripe):
        # parity starts on the last drive and moves toward the first
        return (self.n - 1 - stripe) % self.n

    def write_stripe(self, stripe, data_blocks):
        # assumes stripes are written in order 0, 1, 2, ...
        pi = self.parity_index(stripe)
        row, di = [None] * self.n, iter(data_blocks)
        for d in range(self.n):
            row[d] = bytes(self.blk) if d == pi else next(di)
        row[pi] = xor_blocks([b for i, b in enumerate(row) if i != pi])
        self.stripes.append(row)

    def update_block(self, stripe, drive, new):
        """One small write = 4 I/Os: read old data + old parity, write both."""
        row = self.stripes[stripe]
        pi = self.parity_index(stripe)
        old = row[drive]
        # new_parity = old_parity XOR old_data XOR new_data
        row[pi] = xor_blocks([row[pi], old, new])  # +1 read parity, +1 write parity
        row[drive] = new                           # +1 read old, +1 write new

    def rebuild(self, stripe, dead):
        row = self.stripes[stripe]
        return xor_blocks([b for i, b in enumerate(row) if i != dead])


raid = Raid5(n_drives=4, block_len=3)
raid.write_stripe(0, [b'\x0a\x05\xc8', b'\x06\x09\x11', b'\x0c\x02\x63'])
truth = raid.stripes[0][1]
assert raid.rebuild(0, dead=1) == truth  # recover a data drive
pi = raid.parity_index(0)
assert raid.rebuild(0, dead=pi) == raid.stripes[0][pi]  # or the parity drive
```
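The update_block method makes the write penalty concrete: a single logical write touches the data block and the parity block, and each needs a read-before-write, hence four physical I/Os. For an in-place update of part of a stripe there is no shortcut, because parity must reflect every change to its stripe; only a write that replaces the whole stripe can compute parity from the new data alone and skip the reads.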
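The penalty translates into a rough ceiling on random-write throughput. The sketch below uses the common rule of thumb of dividing the array's raw IOPS by the I/Os needed per logical write. The RAID 5 and RAID 6 values are the ones quoted in this article; the values for RAID 0 and RAID 10, the per-drive IOPS figure, and the function name are assumptions for illustration, and caches or full-stripe writes will change real results.

```python
# Physical I/Os per logical random write.
# 4 and 6 come from the text; 1 (RAID 0) and 2 (RAID 10, one write per
# mirror copy) are assumptions.
WRITE_PENALTY = {0: 1, 10: 2, 5: 4, 6: 6}


def random_write_iops(level, n_drives, iops_per_drive):
    """Rule-of-thumb ceiling: raw array IOPS divided by the write penalty."""
    return n_drives * iops_per_drive / WRITE_PENALTY[level]


# Eight drives at a hypothetical 200 random IOPS each.
for level in (0, 10, 5, 6):
    print(f"RAID {level}: {random_write_iops(level, 8, 200):.0f} write IOPS")
```

With these inputs RAID 10 sustains twice the random-write rate of RAID 5 and three times that of RAID 6 on the same hardware.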
Variants and modern alternatives
RAID 4 (dedicated parity). The ancestor of RAID 5 — all parity on one drive. That drive becomes a write bottleneck since every write touches it, which is exactly why RAID 5 rotates parity. Still used in NetApp's WAFL, which pairs it with NVRAM to hide the bottleneck.
RAID-Z (ZFS) and RAID-Z2/Z3. ZFS's copy-on-write design writes whole stripes and never updates in place, so it has no write hole and no read-modify-write penalty. RAID-Z3 adds a third parity, tolerating three failures, and ZFS's per-block checksums tell it precisely which block is corrupt, so it can also repair silent corruption rather than only replace dead drives.
Declustered RAID / parity declustering. Spread reconstruction work across all drives in a large pool instead of hammering the replacement disk, so rebuilds finish faster and the danger window shrinks. Used in IBM GPFS and modern object stores.
Erasure coding. The generalization of parity: split an object into k data shards and m parity shards (Reed-Solomon), survive any m losses. Cloud object stores (S3, Ceph, HDFS EC) use schemes like 10+4 — RAID 6 is just the k+2 special case. Far better space efficiency than 3× replication at large scale.
JBOD / spanning. "Just a Bunch Of Disks" — concatenation with no redundancy and no striping. Not RAID at all, but often offered alongside it; a single failure loses only the files on that disk, not the array.
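The space-efficiency claim for erasure coding above is easy to quantify. In this sketch a k+m code stores (k+m)/k raw bytes per user byte, and 3× replication is modeled as k=1, m=2 (one original plus two copies, an assumption made for comparison). The function name is illustrative.

```python
def storage_overhead(k, m):
    """Raw bytes stored per byte of user data for k data + m parity shards."""
    return (k + m) / k


print(storage_overhead(10, 4))  # 1.4  (10+4 erasure code, survives 4 losses)
print(storage_overhead(10, 2))  # 1.2  (RAID 6 shape on 12 drives, survives 2)
print(storage_overhead(1, 2))   # 3.0  (3x replication, survives 2 losses)
```

The 10+4 code tolerates more simultaneous losses than triple replication while storing less than half the raw data.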
Common bugs and edge cases
- The write hole. If power dies between the data write and the parity write, parity silently diverges from data and you only find out during a rebuild, when reconstruction produces wrong bytes. Battery/flash-backed write caches, write journals, or copy-on-write (ZFS) close it.
- Treating RAID as a backup. RAID survives drive failure, not rm -rf, ransomware, or a controller that scribbles on every disk. A mistaken delete is faithfully mirrored to every copy. You still need real backups.
- Same batch, same failure. Drives bought together, from the same lot, under the same load tend to fail around the same time, exactly when a rebuild is stressing the survivors. Mix manufacturers or batches for large arrays.
- Silent bit rot. Plain RAID trusts that what it reads back is what it wrote. Without per-block checksums (ZFS, Btrfs) a flipped bit on a surviving drive is happily fed into reconstruction, corrupting the recovered data. Scrub regularly.
- RAID 5 on big modern drives. The combination of multi-terabyte disks and a one-URE-per-12.5-TB error rate makes a clean RAID 5 rebuild statistically unlikely. Use RAID 6 or mirroring above a few terabytes per drive.
- Forgetting the rebuild is a read storm. Rebuilding reads every block of every surviving drive — heavy, sustained load on disks that are already your last line of defense. Throttle it against production traffic, but know that throttling lengthens the danger window.
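The write hole from the first bullet can be reproduced in a few lines. This is a toy simulation with made-up two-byte blocks, not a model of any real controller: the data write lands, the parity write does not, and a later rebuild of a different drive silently returns wrong bytes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


data = [b"\x0a\x05", b"\x06\x09", b"\x0c\x02"]
parity = xor_blocks(data)          # consistent stripe

# Power fails between the two writes of a small update to drive 0.
data[0] = b"\xff\xff"              # the new data reached the disk
# parity = xor_blocks(data)        # ...but this parity write never happened

# Later, drive 1 dies. Rebuild it from the survivors and the stale parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == b"\x06\x09")      # False: drive 1 was never written, yet is lost
```

Note that the corrupted block is not the one being written when power failed, which is why the damage goes unnoticed until a rebuild.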
Frequently asked questions
Is RAID a backup?
No. RAID protects against drive hardware failure, not against deletion, ransomware, file corruption, or a controller that writes garbage to every disk at once. RAID keeps a service online when a disk dies; a backup is a separate, point-in-time copy you can restore from when the live data is wrong. You need both.
What's the difference between RAID 5 and RAID 6?
RAID 5 stores one parity block per stripe and survives one drive failure; RAID 6 stores two independent parity blocks (P and Q, using Reed-Solomon coding) and survives any two. RAID 6 costs you a second drive of capacity but covers the dangerous case where a second disk dies during the long rebuild after the first failure.
Why is RAID 5 considered risky on large drives?
Rebuilding a failed drive means reading every block of every surviving drive. On a multi-terabyte array that takes hours to days, and with consumer disks rated at one unrecoverable read error per 10¹⁴ bits, the probability of hitting a bad sector mid-rebuild becomes non-trivial: a single read error on RAID 5 during rebuild loses the whole array. RAID 6 or mirroring is preferred above a few terabytes per drive.
What is the RAID 5 write hole?
Updating a stripe requires writing both the data block and the parity block. If power is lost between those two writes, the parity no longer matches the data and the inconsistency is silent — you only discover it during a rebuild, when reconstructed data comes out wrong. Battery-backed write caches, journaling, or copy-on-write filesystems like ZFS close the hole.
How much usable capacity does each RAID level give?
For N drives of capacity C: RAID 0 gives N·C with zero redundancy; RAID 1 gives C (a single mirror set, surviving N−1 failures); RAID 5 gives (N−1)·C; RAID 6 gives (N−2)·C; RAID 10 gives N·C/2. The parity levels are the most space-efficient way to tolerate one or two failures across many drives.
Is hardware RAID or software RAID better?
Software RAID (Linux mdadm, ZFS, Windows Storage Spaces) is now the default choice: CPUs compute parity faster than dedicated controllers, arrays are portable between machines, and there's no proprietary controller to fail and strand your disks. Hardware RAID still wins where a battery-backed cache must absorb writes independent of the OS, but the gap has closed.