Cryptography

Merkle Tree

Hash tree of hashes — verify a million records by checking 20

A Merkle tree is a binary tree where every leaf is a hash of a data block and every internal node is a hash of its two children. The single root hash fingerprints the entire dataset — change any block and the root changes. Verifying that a specific block belongs to the dataset takes only O(log n) hashes — 20 hashes proves membership in a million-block set.

  • Verification path lengthO(log n) — 20 hashes for 1M leaves
  • Hash functionSHA-256 (Bitcoin), BLAKE2/3 (modern systems)
  • Root size32 bytes (one hash) — independent of dataset size
  • Tampering detectionAny single-bit change in any block changes the root
  • Used byBitcoin, Git, IPFS, ZFS, Cassandra, BitTorrent
  • VariantsStandard, sparse, sum, Patricia trie

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How a Merkle tree works

Consider four data blocks: A, B, C, D. Build the tree bottom-up:

  1. Hash each block: h_A = SHA256(A), h_B = SHA256(B), h_C = SHA256(C), h_D = SHA256(D).
  2. Pair up and hash again: h_AB = SHA256(h_A || h_B), h_CD = SHA256(h_C || h_D).
  3. Hash the result: h_root = SHA256(h_AB || h_CD).
           h_root
          /      \
       h_AB      h_CD
       /  \      /  \
     h_A  h_B  h_C  h_D
      |    |    |    |
      A    B    C    D

The root hash h_root is a 32-byte fingerprint of all four blocks together. Change any single bit in A, B, C, or D and the root hash changes. Two important properties:

  1. Tamper detection. Anyone with the root hash can verify the dataset hasn't changed by recomputing the tree.
  2. Efficient inclusion proofs. To prove A is in the set without revealing B, C, or D — give the verifier just h_B and h_CD (the sibling hashes on the path from A to the root). The verifier computes h_A from A, then h_AB from (h_A, h_B), then h_root from (h_AB, h_CD), and checks against the known root. log₂(n) hashes total.

Merkle proofs in detail

For a tree of n leaves, a proof of inclusion for any specific leaf is log₂(n) sibling hashes plus the leaf data itself. Examples:

LeavesProof size (hashes)Bytes (32 per hash)
164128
1,00010320
1,000,00020640
1,000,000,00030960

Even for a billion leaves, a proof is under 1 KB. This is the killer feature — anyone can verify membership in any-size dataset with a tiny proof and a single trusted root hash.

Where Merkle trees are deployed

SystemHow it uses Merkle trees
BitcoinEach block's header has a Merkle root of all transactions in the block. Light wallets verify single-transaction inclusion in O(log n).
GitEvery commit is a hash of (tree, parents, metadata). Every tree is a hash of (filenames, child hashes). The whole repo is a Merkle DAG.
IPFSFiles are split into blocks; blocks hash into a Merkle DAG. Content-addressed storage at internet scale.
BitTorrent v2Per-file Merkle tree allows integrity verification of each block during download.
ZFS / btrfsFilesystem stores Merkle trees of file content; on-disk corruption detected automatically (silent corruption protection).
Apache CassandraAnti-entropy repair — compares Merkle trees of data ranges between replicas to find diff in O(log n) instead of O(n).
Certificate TransparencyPublic log of all issued TLS certificates is a Merkle tree; CAs prove a certificate is logged.
EthereumModified Merkle Patricia trie stores all account state. Each block has roots for state, transactions, and receipts.

JavaScript implementation

const crypto = require('crypto');
const hash = (data) => crypto.createHash('sha256').update(data).digest();

class MerkleTree {
  constructor(blocks) {
    this.leaves = blocks.map(b => hash(b));
    this.layers = [this.leaves];
    while (this.layers[this.layers.length - 1].length > 1) {
      const prev = this.layers[this.layers.length - 1];
      const next = [];
      for (let i = 0; i < prev.length; i += 2) {
        if (i + 1 < prev.length) {
          next.push(hash(Buffer.concat([prev[i], prev[i+1]])));
        } else {
          // Odd leaf — duplicate (Bitcoin convention) or just promote
          next.push(hash(Buffer.concat([prev[i], prev[i]])));
        }
      }
      this.layers.push(next);
    }
  }

  get root() { return this.layers[this.layers.length - 1][0]; }

  // Proof: list of (siblingHash, isLeftSibling) pairs from leaf to root
  getProof(leafIndex) {
    const proof = [];
    let idx = leafIndex;
    for (let level = 0; level < this.layers.length - 1; level++) {
      const isRight = idx & 1;
      const siblingIdx = isRight ? idx - 1 : idx + 1;
      const layer = this.layers[level];
      const sibling = siblingIdx < layer.length ? layer[siblingIdx] : layer[idx]; // self-sibling for odd
      proof.push({ hash: sibling, isLeft: isRight === 1 });
      idx = idx >> 1;
    }
    return proof;
  }

  static verify(leafData, proof, root) {
    let h = hash(leafData);
    for (const { hash: sibling, isLeft } of proof) {
      const combined = isLeft
        ? Buffer.concat([sibling, h])
        : Buffer.concat([h, sibling]);
      h = hash(combined);
    }
    return h.equals(root);
  }
}

// Usage
const blocks = ['A', 'B', 'C', 'D'].map(s => Buffer.from(s));
const tree = new MerkleTree(blocks);
console.log('Root:', tree.root.toString('hex'));

const proof = tree.getProof(2); // proof for block C
const valid = MerkleTree.verify(blocks[2], proof, tree.root);
console.log('Proof valid:', valid);  // true

Merkle Patricia tries (Ethereum)

A standard Merkle tree is a balanced binary tree — fine for static data like Bitcoin transactions. For mutable, sparse keyed data (like Ethereum's account state with addresses across the entire 160-bit space), a different structure is needed.

Merkle Patricia tries combine three ideas:

  • Trie — keys form paths from root to leaf, character by character.
  • Patricia compression — long single-child chains collapse into one edge with a multi-character label.
  • Merkle hashing — each node is referenced by the hash of its serialized contents, giving the standard tamper-evidence property.

The result: efficient updates (only the path from root to changed leaf needs re-hashing), efficient lookups (logarithmic in path length), and Merkle proofs of any state at any block height. Critical for Ethereum's "trustless light client" model.

When to use a Merkle tree

  • Immutable distributed data integrity. Bitcoin, Git, IPFS — anywhere the data is published and consumers need to verify they got the right thing.
  • Light client verification. Mobile wallets verifying transactions without full blockchain download.
  • Efficient diff and sync. Cassandra's anti-entropy repair, ZFS scrub, BitTorrent block verification — comparing Merkle trees finds differences in O(log n) instead of O(n).
  • Content-addressed storage. File deduplication, P2P content distribution, "this block hash is its own identity" systems.
  • Audit logs. Certificate Transparency, distributed ledgers, append-only logs that need cryptographic proof of inclusion.

Common Merkle tree mistakes

  • Not handling odd numbers of leaves. When a level has an odd number of nodes, you must decide how to pair the last one. Bitcoin duplicates it (hash of node with itself); Ethereum pads with a NULL hash; some implementations promote it to the next level. All are valid; the choice must be consistent across the system.
  • Hashing leaves and internal nodes the same way. Vulnerable to "type confusion" — an attacker can present an internal node hash as if it were a leaf hash, or vice versa, sometimes leading to forged proofs. Modern designs (RFC 6962, Certificate Transparency) prepend a 0x00 byte for leaves and 0x01 for internal nodes during hashing.
  • Using a non-cryptographic hash. MurmurHash, xxHash, FNV — fast but trivially collision-able. For security-critical Merkle trees use SHA-256, SHA-3, or BLAKE2/3. For non-security uses (file dedup), faster hashes are fine.
  • Verifying without the trusted root. A Merkle proof only proves "this leaf is consistent with this root." If you don't have a trusted source for the root hash, the proof tells you nothing. Bitcoin gets the root from the proof-of-work-secured blockchain; Git's commits get hashed and signed.
  • Confusing tree position with proof position. The proof gives sibling hashes, not the path itself. The verifier needs to know whether each sibling is the left or right child to combine them correctly.

Frequently asked questions

How does a Merkle proof work?

To prove a leaf is in the tree, provide the leaf data plus the sibling hash at every level from leaf to root — that's the "Merkle path" or "audit path." A verifier computes the leaf hash, then iteratively hashes it with each sibling from the path, and checks the final result matches the known root. log₂(n) hashes total. The verifier doesn't need any other leaf or any internal node — just the path siblings.

Why does Bitcoin use a Merkle tree?

Each Bitcoin block contains many transactions (~3,000 typical). The block header includes the Merkle root of all transaction hashes. A light wallet ("SPV node") that doesn't store full blocks can verify a single transaction is included in a block by requesting the Merkle proof — log₂(3000) ≈ 12 hashes. Without the Merkle tree, the wallet would need to download the entire block to verify one transaction.

How is Git's "tree" related to a Merkle tree?

Git is essentially a Merkle tree of the project. Each blob is the hash of a file's contents; each tree object is the hash of a directory listing (filenames + child hashes); each commit is a hash of (tree, parent commit, metadata). The commit hash uniquely fingerprints the entire project state — change one byte anywhere and every hash up to the commit changes. That's how Git knows commits are immutable.

Why use SHA-256 instead of a faster hash?

SHA-256 is collision-resistant — no two different blocks can produce the same hash (in practice). Faster non-cryptographic hashes like xxHash or MurmurHash have collisions you can construct in milliseconds, defeating the integrity guarantee. Bitcoin uses SHA-256 because the cryptographic guarantees are non-negotiable. For systems where adversarial collisions aren't a concern (file deduplication, content-addressed storage), faster hashes like BLAKE3 are fine.

What's the difference between a Merkle tree and a hash chain?

A hash chain is a linked sequence — each block hashes the previous block, like Bitcoin's block chain. To verify item N you must walk through items 0 to N — O(n) work. A Merkle tree is a tree of hashes; verifying any leaf is O(log n). Bitcoin uses BOTH — the chain links blocks in time, the Merkle tree organizes transactions within each block.

How does BitTorrent use Merkle trees?

BitTorrent v2 uses Merkle trees for per-file integrity verification. Each file is split into blocks; blocks are hashed into a Merkle tree; the root is in the .torrent file. As you download blocks from peers, you verify each block's hash matches its position in the tree. Bad peers serving corrupted blocks are detected immediately and disconnected — without the Merkle tree, corruption could propagate.

Can Merkle trees handle insertions and deletions?

Pure Merkle trees are append-only — modifying any leaf changes every hash up the path, and the root hash cycles. For mutable data, "Merkle Patricia tries" (Ethereum) and "sparse Merkle trees" support efficient updates with the same verification properties. The trade-off is more complex code and slightly higher tree depth.