Compression
LZW Compression
Streaming dictionary coder — no external table, just codes
LZW (Lempel-Ziv-Welch) is a lossless dictionary coder that builds a code table on the fly. The encoder emits codes; the decoder rebuilds the same dictionary. Used in GIF, TIFF, and Unix compress.
- Encode / decodeO(n) time
- MemoryO(d) — d-entry dictionary, typically 4096
- Output code width9-12 bits (adaptive)
- Compression ratio~30-50% on English text
- Dictionary transmissionNone — both sides derive from code stream
- Used byGIF, TIFF, Unix compress, PDF LZWDecode
Interactive visualization
Watch the dictionary grow code-by-code as the encoder streams the input.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How LZW works
Both encoder and decoder start with the same initial dictionary. For byte-oriented LZW, that's 256 entries — one for each possible byte value, mapped to codes 0-255. The dictionary grows by one entry every time the encoder sees a substring that has just been newly extended.
The encoder algorithm:
- Read characters into a buffer
W, starting empty. - For each input character
c:- If
W + cis in the dictionary, setW ← W + c. - Otherwise: emit the code for
W, addW + cto the dictionary, setW ← c.
- If
- After the loop, emit the code for the final
W.
The decoder algorithm is symmetric — it consumes codes, outputs strings, and adds new entries to its dictionary in lockstep with the encoder.
Worked example — encoding ABABABA
Initial dictionary: A=1, B=2 (we'll use 1-indexed codes for clarity; real implementations start at 256 after reserving 0-255 for single bytes).
Input: A B A B A B A
Step 1: W="", read A. W+A="A" in dict → W="A".
Step 2: W="A", read B. W+B="AB" NOT in dict.
Emit code(A)=1. Add AB=3 to dict. W="B".
Step 3: W="B", read A. W+A="BA" NOT in dict.
Emit code(B)=2. Add BA=4 to dict. W="A".
Step 4: W="A", read B. W+B="AB" in dict → W="AB".
Step 5: W="AB", read A. W+A="ABA" NOT in dict.
Emit code(AB)=3. Add ABA=5 to dict. W="A".
Step 6: W="A", read B. W+B="AB" in dict → W="AB".
Step 7: W="AB", read A. W+A="ABA" in dict → W="ABA".
End: emit code(ABA)=5.
Output codes: 1, 2, 3, 5
Dictionary built: A=1, B=2, AB=3, BA=4, ABA=5
Four codes encoded seven characters. Each subsequent occurrence of AB or ABA would use even fewer codes — the dictionary adapts.
The decoder rebuilds the same dictionary
The decoder starts with the same initial dictionary {A=1, B=2} and processes the code stream 1, 2, 3, 5.
prev_string = decode(first_code)
output prev_string
for each subsequent code c:
if c in dict:
s = dict[c]
elif c == next_code: // special "cScSc" case
s = prev_string + prev_string[0]
else:
error
output s
add (prev_string + s[0]) to dict // mirror the encoder
prev_string = s
Trace on 1, 2, 3, 5:
Step 1: read 1 → output A. prev=A.
Step 2: read 2 → output B. Add A+B[0]="AB" as code 3. prev=B.
Step 3: read 3 → output AB. Add B+A[0]="BA" as code 4. prev=AB.
Step 4: read 5 → 5 is NOT yet in dict (we only added through 4).
Special case: s = prev + prev[0] = AB + A = "ABA". Output ABA.
Add AB+A[0]="ABA" as code 5. prev=ABA.
Reconstructed: A B AB ABA = ABABABA ✓
The special case in step 4 is LZW's most famous trap. The encoder emits code 5 (ABA) — but the decoder hasn't yet added code 5 when it tries to decode it. This happens whenever a new code is used in the very same step it's defined. The decoder must detect this and construct the string from prev + prev[0].
LZW variants
| Variant | Trick | Use case |
|---|---|---|
| Original LZW (12-bit) | Fixed dictionary cap at 4096 entries | GIF, TIFF, simple implementations |
| Variable-width codes | Start at 9 bits, grow to 12 as dictionary fills | Unix compress, GIF — saves bits early on |
| CLEAR code | Reserved code resets dictionary when full or unhelpful | Long inputs whose statistics drift |
| LZW-Welch with adaptive reset | Auto-clear when compression ratio degrades | Unix compress |
| LZMW / LZAP | Add concatenations of previously emitted codes to dictionary | Better ratio on repetitive inputs, slower |
| LZ77 (DEFLATE) | Sliding-window back-references — no dictionary at all | Modern alternative; gzip, PNG, zlib |
When to use LZW
- You're producing GIF or TIFF files. Both formats specify LZW. (TIFF supports other codecs too, but LZW is the most portable.)
- You need a streaming, single-pass compressor. LZW makes one forward pass over the input with no lookahead and no separate dictionary transmission — ideal for serial communication or memory-constrained devices.
- You want simple, patent-free code. The LZW patent expired worldwide by 2004. The algorithm fits in ~100 lines and runs on tiny microcontrollers.
- Your input has many short repetitions. LZW excels at repeated 2-8 character substrings — drawing primitives, indexed-color image data, simple log files.
Skip LZW for new general-purpose compression — zstd, brotli, and even gzip outperform it by 10-30% on most inputs because they combine LZ77 with sophisticated entropy coding. LZW's single fixed code width loses to modern Huffman or range coders.
LZW vs other compressors
| LZW | LZ77 / DEFLATE | BWT (bzip2) | ZSTD | |
|---|---|---|---|---|
| Strategy | Dictionary of substrings | Sliding-window back-references | Block-sort then entropy code | LZ77 + ANS |
| Streaming | Yes (single-pass) | Yes | No (block-based) | Yes (block or stream) |
| Memory | ~16-32 KB | ~32 KB window | ~5× block size | Configurable |
| Encode speed | ~50-100 MB/s | ~100-300 MB/s | ~15-25 MB/s | ~400-1000 MB/s |
| Compression on text | ~30-50% | ~60-65% | ~75-80% | ~70-75% |
| Patent history | Patented 1985-2004 | Always free | Always free | Free (Facebook) |
| Where used today | GIF, TIFF, PDF | gzip, PNG, zlib | bzip2, FM-index | Linux kernel, modern web |
Pseudo-code
// LZW encode.
function encode(input):
dict = { byte_value: byte_value for byte_value in 0..255 }
next_code = 256
W = ""
output = []
for c in input:
WC = W + c
if WC in dict:
W = WC
else:
output.append(dict[W])
dict[WC] = next_code
next_code += 1
W = c
if W: output.append(dict[W])
return output
// LZW decode.
function decode(codes):
dict = { i: chr(i) for i in 0..255 }
next_code = 256
prev = chr(codes[0])
output = [prev]
for c in codes[1:]:
if c in dict:
s = dict[c]
elif c == next_code:
s = prev + prev[0]
else:
raise "invalid LZW stream"
output.append(s)
dict[next_code] = prev + s[0]
next_code += 1
prev = s
return "".join(output)
JavaScript implementation
function lzwEncode(input) {
const dict = new Map();
for (let i = 0; i < 256; i++) dict.set(String.fromCharCode(i), i);
let nextCode = 256;
let w = '';
const output = [];
for (const c of input) {
const wc = w + c;
if (dict.has(wc)) {
w = wc;
} else {
output.push(dict.get(w));
dict.set(wc, nextCode++);
w = c;
}
}
if (w) output.push(dict.get(w));
return output;
}
function lzwDecode(codes) {
const dict = new Map();
for (let i = 0; i < 256; i++) dict.set(i, String.fromCharCode(i));
let nextCode = 256;
let prev = dict.get(codes[0]);
const output = [prev];
for (let i = 1; i < codes.length; i++) {
const c = codes[i];
let s;
if (dict.has(c)) {
s = dict.get(c);
} else if (c === nextCode) {
s = prev + prev[0]; // the famous edge case
} else {
throw new Error('invalid LZW stream');
}
output.push(s);
dict.set(nextCode++, prev + s[0]);
prev = s;
}
return output.join('');
}
const codes = lzwEncode('TOBEORNOTTOBEORTOBEORNOT');
console.log(codes); // [84,79,66,69,79,82,78,79,84,256,258,260,265,259,261,263]
console.log(lzwDecode(codes));
// TOBEORNOTTOBEORTOBEORNOT — round-trip exact
Python implementation
def lzw_encode(data):
dict_ = {chr(i): i for i in range(256)}
next_code = 256
w = ''
output = []
for c in data:
wc = w + c
if wc in dict_:
w = wc
else:
output.append(dict_[w])
dict_[wc] = next_code
next_code += 1
w = c
if w: output.append(dict_[w])
return output
def lzw_decode(codes):
dict_ = {i: chr(i) for i in range(256)}
next_code = 256
prev = dict_[codes[0]]
output = [prev]
for c in codes[1:]:
if c in dict_:
s = dict_[c]
elif c == next_code:
s = prev + prev[0]
else:
raise ValueError('invalid LZW stream')
output.append(s)
dict_[next_code] = prev + s[0]
next_code += 1
prev = s
return ''.join(output)
codes = lzw_encode('TOBEORNOTTOBEORTOBEORNOT')
print(codes)
print(lzw_decode(codes))
Real implementations pack the integer codes into a bitstream — 9 bits per code until the dictionary exceeds 511 entries, then 10 bits up to 1023, and so on. Unix compress caps at 16 bits (65,536 entries) by default.
Common LZW bugs and edge cases
- The cScSc case. If the encoder emits a code in the same step it defines it, the decoder reads a code it hasn't added yet. Always check
code == next_codeand construct the string asprev + prev[0]. Classic bug. - Dictionary overflow. When the dictionary is full and no CLEAR code is used, you have three options: stop adding entries, ignore additions (compression slowly drifts), or issue CLEAR and restart. Most real implementations issue CLEAR.
- Code width mismatches. Encoder and decoder must agree on when to widen codes (9 → 10 → 11 → 12 bits). GIF widens after code N rather than after entry N, an off-by-one that bit-trips many implementations.
- Endianness in the bitstream. GIF packs LZW codes LSB-first within bytes. TIFF can be either. Getting this wrong produces a valid-looking but garbled output.
- Initial table size. Some LZW variants use 257 initial codes (with code 256 as CLEAR), some use 258 (with 257 as END-OF-STREAM). The format spec dictates which.
- Single-character input. If the input is one byte, the encoder emits a single code and never adds to the dictionary. Decoder must handle this gracefully — the loop never executes.
Performance in real systems
- Unix compress (.Z files): ~50 MB/s encode, ~80 MB/s decode on a modern CPU. ~40-50% reduction on English text. Now obsolete in favor of gzip.
- GIF encoding: Single-pass LZW over a quantized indexed-color image. Typical 1024×1024 GIF takes ~20 ms to encode. Dictionary capped at 12 bits.
- TIFF LZW: Often beats DEFLATE on simple line drawings; loses to DEFLATE on photographs.
- vs gzip: On a 1 MB English text, LZW compresses to ~550 KB; gzip to ~340 KB. The gap widens with input size — LZW's fixed-width codes can't compete with gzip's adaptive Huffman.
- Memory footprint: 12-bit LZW uses ~16-32 KB of RAM for the dictionary (depending on hash-table overhead). Fits in L1 cache; minimal cache misses during decode.
- Patent legacy: Adobe's Postscript still uses LZW for embedded image streams in part because it predated DEFLATE. PDF kept LZWDecode as a filter for backward compatibility.
LZW is one of the most elegant compression algorithms ever published — a few dozen lines of code, single-pass streaming, no dictionary transmission, simple state machine. Its commercial trajectory was derailed by patent enforcement, but its theoretical contribution to streaming dictionary coders shaped every modern compressor.
Frequently asked questions
How does LZW work without sending a dictionary?
Encoder and decoder both start with the same initial dictionary — usually the 256 single-byte values, each mapped to its own code (0-255). As the encoder processes the input, it adds new multi-byte entries to its dictionary in a deterministic order. The decoder mirrors this: when it reads code X, it both outputs the corresponding string and adds the same predicted entry to its dictionary. Same input, same dictionary growth — no external transmission required.
What's the difference between LZW and LZ77?
LZ77 outputs (distance, length, next-char) triples that reference a sliding window of the recent input. LZW outputs codes into a growing dictionary of substrings — no distances, just codes. LZ77 is simpler to implement, handles long-range repetition well, and is the basis of DEFLATE (gzip, PNG). LZW handles short repetitions efficiently and was easier to patent — which is why GIF used it. Both compress similarly on typical text; LZ77 wins on inputs with long repeats far apart.
Why does the LZW dictionary need to reset?
The dictionary grows during encoding, and codes get longer as the dictionary grows (9 bits, then 10, 11, 12...). At 12 bits the dictionary is full at 4096 entries. Options: stop adding entries (compression degrades as data drifts), or send a special CLEAR code and restart with the initial 256-entry dictionary. Unix compress and GIF both use the CLEAR code approach, which adapts to changing data.
What's the famous LZW patent story?
Welch's 1984 paper went unpatented at first, but Sperry filed shortly after. Unisys (post-merger) began enforcing the patent in 1994, demanding royalties for GIF encoders. This triggered the 'Burn All GIFs' movement and accelerated PNG's adoption (which uses DEFLATE, not LZW). The patent expired worldwide by 2004, but the damage to LZW's reputation was done — modern compressors stuck with LZ77 derivatives.
What's the time and space complexity of LZW?
O(n) time for both encode and decode with a trie- or hash-table-backed dictionary — each input byte triggers one dictionary lookup and at most one insertion. Memory is O(d) where d is the dictionary size — typically capped at 4096 entries (12-bit codes) using ~16-32 KB of RAM. Compression ratio is data-dependent but typically reaches 30-50% reduction on English text — comparable to gzip on small inputs, worse on large ones.
Is LZW still used today?
Yes, but in niches. GIF files still use LZW for animated graphics on the web. TIFF supports LZW as one of several compression methods. The Unix compress utility (.Z files) was once standard but lost ground to gzip in the 1990s. PDF files may use LZWDecode for embedded streams. Modern general-purpose compressors (gzip, zstd, brotli) use LZ77 derivatives instead — they compress better on large inputs and have no patent history.
How big can the LZW dictionary grow?
By convention, LZW caps the dictionary at 4096 entries (12-bit codes), which fits 256 literals + 3840 multi-byte entries. Some variants extend to 16-bit codes (65,536 entries) for better compression on large repetitive inputs. The trade-off is that wider codes cost more bits per output — only worth it when the dictionary fills with truly useful long substrings.