Base64 Encoding
Three bytes in, four ASCII characters out — universal binary-to-text transport
Base64 maps 3 bytes (24 bits) of binary to 4 characters drawn from a 64-symbol ASCII alphabet, expanding data by one third (about 33%). It is the currency of MIME email, JWT tokens, and data URIs.
- Expansion: 3 bytes → 4 chars (33% growth)
- Alphabet: 64 chars (A-Z, a-z, 0-9, +, /)
- Padding: '=' when the input length is not a multiple of 3
- Encode / decode: O(n) time, > 10 GB/s with SIMD
- Variants: base64url for JWT and URLs
- Used by: MIME, JWT, data URIs, Basic auth, PEM
How Base64 works
Computers store data as bytes: 8-bit values from 0 to 255. Most of those byte values are not printable. 0x00 is the null terminator, 0x01-0x1F are control codes, 0x7F is DEL, and anything above 0x7F is non-ASCII. A surprising number of channels still refuse to transport arbitrary bytes. SMTP mail historically required 7-bit-clean data, URLs reserve syntactic characters, and HTML attributes can't contain unescaped quotes. Whenever binary data needs to ride through a text-only pipe, you encode.
Base64 is the canonical choice. It takes any byte sequence and produces output drawn from a 64-character alphabet of printable ASCII: A-Z (values 0-25), a-z (26-51), 0-9 (52-61), + (62), and / (63). The 62 letters and digits pass through almost any text channel untouched. Only + and / need care, because URLs and file paths treat them specially.
The encoding rule:
- Group input into 3-byte chunks. Each chunk is 24 bits.
- Re-slice into 6-bit groups. 24 bits ÷ 6 = 4 groups. Each group is a value 0-63.
- Map each 6-bit value to the alphabet. Look up the character at that index.
- Pad the final chunk if needed. If the last chunk has only 1 or 2 bytes, fill in zero bits to complete the last partial 6-bit group. Then emit one = for each output slot left with no data: == after 1 byte, = after 2.
Worked example — "Man" → "TWFu"
The string Man is three ASCII bytes: M=0x4D=77, a=0x61=97, n=0x6E=110. Concatenated as 24 bits:
```
M        a        n
01001101 01100001 01101110
```
Re-group as four 6-bit values:
```
010011 010110 000101 101110
    19     22      5     46
```
Map through the alphabet: 19→T, 22→W, 5→F, 46→u. The output is TWFu: four characters from three input bytes, a precise 4/3 expansion. This is the standard textbook example, and it's worth memorizing as a sanity check. RFC 4648's own test vectors instead run from "f" to "foobar", e.g. "foobar" → "Zm9vYmFy".
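The regrouping can be checked mechanically. Below is a minimal Python sketch, not taken from any library; the helper name six_bit_groups is hypothetical. It packs three bytes into one 24-bit integer, shifts out the four 6-bit indices, and cross-checks the result against the standard library's base64.b64encode.

```python
import base64

ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def six_bit_groups(chunk: bytes) -> list[int]:
    """Split exactly three bytes into four 6-bit values (hypothetical helper)."""
    if len(chunk) != 3:
        raise ValueError("expected exactly 3 bytes")
    triple = int.from_bytes(chunk, "big")          # 24-bit big-endian integer
    return [(triple >> shift) & 0x3F for shift in (18, 12, 6, 0)]

groups = six_bit_groups(b"Man")
print(groups)                                      # [19, 22, 5, 46]
print("".join(ALPHA[g] for g in groups))           # TWFu
print(base64.b64encode(b"Man").decode("ascii"))    # TWFu
```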
Now imagine the input is just Ma (two bytes, 16 bits). The encoder pads with zero bits to fill the third 6-bit group, then emits one =:
```
bits:    01001101 01100001 00     (16 data bits + 2 zero bits)
groups:  010011 010110 000100
values:  19     22     4
output:  T      W      E      =   (4 chars including =)
```
For one byte (M alone), 8 data bits plus 4 zero bits form two 6-bit groups (19 → T, 16 → Q). Two = characters follow, giving TQ==.
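The number of = characters depends only on the input length mod 3, and a decoder can run the arithmetic backwards. The following is a small Python sketch for padded standard Base64. The function names pad_count and decoded_length are illustrative, not a library API.

```python
import base64

def pad_count(n_bytes: int) -> int:
    """'=' characters emitted for n_bytes of input: 0, 2, or 1 when n % 3 is 0, 1, or 2."""
    return (-n_bytes) % 3

def decoded_length(encoded: str) -> int:
    """Original byte count of a padded Base64 string (assumes no whitespace)."""
    if len(encoded) % 4:
        raise ValueError("padded Base64 must be a multiple of 4 characters")
    pads = len(encoded) - len(encoded.rstrip("="))
    return len(encoded) // 4 * 3 - pads

for text in (b"Man", b"Ma", b"M"):
    enc = base64.b64encode(text).decode("ascii")
    print(text, enc, pad_count(len(text)), decoded_length(enc))
# b'Man' TWFu 0 3
# b'Ma' TWE= 1 2
# b'M' TQ== 2 1
```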
Alphabet variants
| Variant | Char 62 | Char 63 | Padding | Used by |
|---|---|---|---|---|
| Standard (RFC 4648 §4) | + | / | = required | MIME email, PEM, basic auth |
| URL-safe (RFC 4648 §5) | - | _ | = optional | URL params, filenames |
| base64url (JWT/JOSE) | - | _ | = omitted | JWT, OAuth2, JOSE specs |
| MIME (RFC 2045) | + | / | = required; lines of at most 76 chars, CRLF-separated | SMTP message bodies |
| YUI/y64 (custom) | . | _ | varies | Yahoo cookies, legacy systems |
| Base64 with whitespace ignored | + | / | = required | OpenSSL CLI, lenient parsers |
The two consequential decisions are these: whether + and / or - and _ occupy values 62 and 63, and whether padding is required. JWT chose URL-safe without padding because tokens often travel in URLs. There, = is a delimiter in query strings and usually has to be percent-encoded as %3D. That was annoying enough that the standard drops it.
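Converting between the variants is mechanical: swap the two alphabet characters and add or remove padding. The Python sketch below uses the standard library's base64.urlsafe_b64encode and base64.urlsafe_b64decode. The wrapper names are hypothetical. The re-padding expression "=" * (-len(s) % 4) is needed because Python's decoder rejects unpadded input.

```python
import base64

def b64url_encode_nopad(data: bytes) -> str:
    """base64url with the '=' padding stripped, the form JWT/JOSE uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode_nopad(s: str) -> bytes:
    """Re-append the padding Python's decoder expects, then decode with - and _."""
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

raw = b"\xfb\xff"
print(base64.b64encode(raw))        # b'+/8='   standard alphabet, padded
print(b64url_encode_nopad(raw))     # -_8      URL-safe, no padding
print(b64url_decode_nopad("-_8"))   # b'\xfb\xff'
```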
When to use Base64
- Embedding binary in a text format. JSON doesn't have a binary type; SVG/HTML attributes can't carry raw bytes; YAML and TOML are text-only. Base64 (or hex) is the bridge.
- HTTP Basic auth. Authorization: Basic dXNlcjpwYXNz is the Base64 of user:pass. It is not encryption, just transit-safe encoding of the credential pair.
- Inline images and small assets. A data:image/png;base64,... URI inlines a small image into HTML/CSS. It saves a round trip, and the 33% penalty doesn't matter for small files.
- PEM certificates and keys. X.509 certs, RSA keys, and SSH public keys are stored as Base64 with header/footer lines for human-readable diffs and easy copy-paste.
- JWT tokens. The three dot-separated segments of a JWT are each base64url-encoded JSON or signature bytes. Compact, URL-safe, parseable in one regex.
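Two of these uses take only a few lines of Python's standard library, sketched below. The helper names basic_auth_header and data_uri are hypothetical. The credentials are assumed to be UTF-8, and the payload is only the 8-byte PNG signature, standing in for a real image.

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """HTTP Basic credentials: Base64 of 'user:password' (encoding, not encryption)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def data_uri(mime_type: str, payload: bytes) -> str:
    """Inline a small asset as a data: URI with a Base64 payload."""
    return f"data:{mime_type};base64,{base64.b64encode(payload).decode('ascii')}"

print(basic_auth_header("user", "pass"))             # Basic dXNlcjpwYXNz
print(data_uri("image/png", b"\x89PNG\r\n\x1a\n"))   # data:image/png;base64,iVBORw0KGgo=
```

The iVBORw0KGgo prefix is why Base64-inlined PNGs are easy to spot in HTML and CSS.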
Skip Base64 when bandwidth or storage is tight and the channel can carry binary directly (binary HTTP/2 frames, BSON, MessagePack). Skip it when you actually need encryption — Base64 is not a cipher.
Base64 vs other binary-to-text encodings
| Encoding | Alphabet size | Expansion | Notes |
|---|---|---|---|
| Hex (Base16) | 16 | 2× (100%) | Trivial to read; doubles the size |
| Base32 | 32 | 1.6× (60%) | Case-insensitive; used in DNS names, e.g. DNSSEC NSEC3 hashes (as base32hex) |
| Base64 | 64 | 1.33× (33%) | Universal; default for MIME |
| Base85 (Ascii85) | 85 | 1.25× (25%) | Adobe PDF; trickier alphabet |
| Base91 | 91 | 1.23× (23%) | Niche; needs full ASCII printable range |
| Quoted-printable | variable | ~1.05× (if mostly ASCII) | MIME's other encoding; degrades on binary |
Base64 wins on universality. Higher-base encodings give better ratios but use characters that aren't safe in every channel. Base32 wins when the channel is case-insensitive (DNS, voice-readout, OCR). Hex wins on debuggability — you can read it at a glance.
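Python's base64 module implements four rows of this table, so the expansion ratios are easy to check. The sketch uses a 60-byte input, chosen because 60 divides evenly into Base64's 3-byte groups, Base85's 4-byte groups, and Base32's 5-byte groups. No encoding therefore pays a partial-group penalty. base64.b85encode is the RFC 1924-style Base85 alphabet, a close cousin of Adobe's Ascii85, which Python exposes separately as base64.a85encode.

```python
import base64

data = bytes(range(60))  # 60 arbitrary bytes

sizes = {
    "base16 (hex)": len(base64.b16encode(data)),
    "base32": len(base64.b32encode(data)),
    "base64": len(base64.b64encode(data)),
    "base85": len(base64.b85encode(data)),
}
for name, size in sizes.items():
    print(f"{name:13s} {size:4d} chars  {size / len(data):.2f}x")
```

This reports 120, 96, 80, and 75 characters (2.00x, 1.60x, 1.33x, 1.25x), matching the table. Base91 and quoted-printable for binary are not in the standard library.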
Pseudo-code
```
// Base64 encode.
ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

function base64Encode(bytes):
    out = []
    for i = 0; i < bytes.length; i += 3:
        b1 = bytes[i]
        b2 = bytes[i+1] if i+1 < bytes.length else 0
        b3 = bytes[i+2] if i+2 < bytes.length else 0
        triple = (b1 << 16) | (b2 << 8) | b3        // one 24-bit group
        out.append(ALPHA[(triple >> 18) & 0x3F])
        out.append(ALPHA[(triple >> 12) & 0x3F])
        out.append(ALPHA[(triple >> 6) & 0x3F] if i+1 < bytes.length else '=')
        out.append(ALPHA[triple & 0x3F] if i+2 < bytes.length else '=')
    return join(out)

// Base64 decode (reverse mapping).
// Assumes valid input: no whitespace, no foreign characters, length % 4 != 1.
function base64Decode(text):
    text = text.rstrip("=")                        // drop trailing padding
    out = bytes()
    for i = 0; i < text.length; i += 4:
        v1 = ALPHA.index(text[i])
        v2 = ALPHA.index(text[i+1])
        v3 = ALPHA.index(text[i+2]) if i+2 < text.length else 0
        v4 = ALPHA.index(text[i+3]) if i+3 < text.length else 0
        triple = (v1 << 18) | (v2 << 12) | (v3 << 6) | v4
        out.append((triple >> 16) & 0xFF)
        if i+2 < text.length: out.append((triple >> 8) & 0xFF)
        if i+3 < text.length: out.append(triple & 0xFF)
    return out
```
JavaScript implementation
```javascript
const ALPHA = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode a Uint8Array (or array of byte values) to standard, padded Base64.
function base64Encode(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const b1 = bytes[i];
    const b2 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b3 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const t = (b1 << 16) | (b2 << 8) | b3; // one 24-bit group
    out += ALPHA[(t >> 18) & 0x3F];
    out += ALPHA[(t >> 12) & 0x3F];
    out += i + 1 < bytes.length ? ALPHA[(t >> 6) & 0x3F] : '=';
    out += i + 2 < bytes.length ? ALPHA[t & 0x3F] : '=';
  }
  return out;
}

// Decode standard Base64 (padding optional) to a Uint8Array; throws on bad input.
function base64Decode(s) {
  s = s.replace(/=+$/, '');
  if (s.length % 4 === 1) throw new Error('invalid Base64 length');
  const val = (c) => {
    const v = ALPHA.indexOf(c);
    if (v < 0) throw new Error(`invalid Base64 character: ${c}`);
    return v;
  };
  const out = [];
  for (let i = 0; i < s.length; i += 4) {
    const v1 = val(s[i]);
    const v2 = val(s[i + 1]);
    const v3 = i + 2 < s.length ? val(s[i + 2]) : 0;
    const v4 = i + 3 < s.length ? val(s[i + 3]) : 0;
    const t = (v1 << 18) | (v2 << 12) | (v3 << 6) | v4;
    out.push((t >> 16) & 0xFF);
    if (i + 2 < s.length) out.push((t >> 8) & 0xFF);
    if (i + 3 < s.length) out.push(t & 0xFF);
  }
  return new Uint8Array(out);
}

const msg = new TextEncoder().encode('Man');
console.log(base64Encode(msg)); // "TWFu"
console.log(new TextDecoder().decode(base64Decode('TWFu'))); // "Man"

// Built-in alternative for browsers (works on Latin-1 strings only):
console.log(btoa('Man')); // "TWFu"
console.log(atob('TWFu')); // "Man"
```
Python implementation
```python
import base64

# Standard alphabet
encoded = base64.b64encode(b'Man')
print(encoded)                                 # b'TWFu'
print(base64.b64decode(encoded))               # b'Man'

# URL-safe variant (replaces + with -, / with _; keeps '=' padding)
print(base64.urlsafe_b64encode(b'\xff\xfb'))   # b'__s='

# Manual implementation, for instruction:
ALPHA = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i+3]                    # the final chunk may hold 1 or 2 bytes
        n = len(chunk)
        b1 = chunk[0]
        b2 = chunk[1] if n > 1 else 0
        b3 = chunk[2] if n > 2 else 0
        t = (b1 << 16) | (b2 << 8) | b3        # one 24-bit group
        out.append(ALPHA[(t >> 18) & 0x3F])
        out.append(ALPHA[(t >> 12) & 0x3F])
        out.append(ALPHA[(t >> 6) & 0x3F] if n > 1 else '=')
        out.append(ALPHA[t & 0x3F] if n > 2 else '=')
    return ''.join(out)

print(b64_encode(b'Man'))  # TWFu
print(b64_encode(b'Ma'))   # TWE=
print(b64_encode(b'M'))    # TQ==
```
Common Base64 bugs and edge cases
- Forgetting URL-safe substitution. Base64 output sent into a URL query string is corrupted by parsers that interpret + as a space, so the decoder receives garbage. Use base64url any time the output might enter a URL.
- Stripping padding when the decoder requires it. Some strict decoders reject inputs that aren't a multiple of 4 characters. JWT libraries typically tolerate a missing =, while Python's base64.b64decode() raises binascii.Error ("Incorrect padding"). Always know your decoder's padding policy.
- Whitespace in the input. MIME (RFC 2045) limits encoded lines to 76 characters, so MIME Base64 arrives broken up by CRLFs. A naive decoder that doesn't skip whitespace will reject these inputs. Use a lenient decoder or pre-strip whitespace.
- Treating Base64 as encryption. Beginners sometimes obscure passwords by Base64-encoding them. This protects against literally nothing: any browser console decodes it instantly. Use real cryptography for secrecy.
- UTF-8 round-trip mistakes. Base64 operates on bytes, not characters. In JavaScript, btoa("héllo") silently encodes é as the single Latin-1 byte 0xE9 instead of its two UTF-8 bytes. btoa("✓") throws, because ✓ is outside Latin-1. UTF-8 encode to bytes first (for example with TextEncoder).
- Mismatched alphabets. A token encoded with base64url cannot be decoded with the standard alphabet, because the - and _ characters are unknown to it. Most libraries default to standard; JWT libraries default to URL-safe. Depending on the decoder, mixing them either raises an error or silently produces corrupted output. The silent case happens in lenient decoders that discard unknown characters.
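Several of these pitfalls can be handled in one defensive decoding helper. The Python sketch below is hypothetical; decode_lenient is not a standard-library function. It strips whitespace such as MIME line breaks, maps the base64url characters onto the standard alphabet, and restores missing padding. Only then does it decode with validate=True, so any remaining stray character raises instead of being silently discarded. Text is UTF-8 encoded to bytes before encoding.

```python
import base64
import binascii

def decode_lenient(text: str) -> bytes:
    """Hypothetical lenient decoder: tolerates whitespace, base64url chars, missing '='."""
    compact = "".join(text.split())                         # drop spaces, CR, LF, tabs
    compact = compact.replace("-", "+").replace("_", "/")   # base64url -> standard alphabet
    compact += "=" * (-len(compact) % 4)                    # restore stripped padding
    return base64.b64decode(compact, validate=True)         # binascii.Error on stray chars

# Base64 works on bytes, so UTF-8 encode text before encoding it.
encoded = base64.b64encode("héllo ✓".encode("utf-8")).decode("ascii")
wrapped = encoded[:8] + "\r\n" + encoded[8:]                # simulate a MIME line break
print(decode_lenient(wrapped).decode("utf-8"))              # héllo ✓
print(decode_lenient("TQ"))                                 # b'M'

try:
    decode_lenient("TQ!=")
except binascii.Error as exc:
    print("rejected:", exc)
```

Accepting both alphabets is a deliberate leniency choice. A strict system would reject the alphabet it does not expect instead.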
Performance in real systems
- Browser btoa/atob: ~500 MB/s on modern V8; runs in main-thread JS, so it can block on large inputs (> 1 MB).
- Node Buffer.from(...).toString('base64'): ~2 GB/s with native code; preferred over btoa in Node.
- SIMD libraries (base64-simd, fastbase64): 10-15 GB/s on AVX-512; used in JSON parsers (simdjson) for big inline binary fields.
- JWT signing roundtrip: <1 ms total for typical 500-byte tokens; Base64 is a fraction of that.
- MIME email decode: Negligible — even on 1990s hardware Base64 was never the bottleneck of email transport.
Base64 is the encoding that has long since stopped being interesting to optimize — fast enough on any modern hardware that the only reason to think about it is correctness. Get the alphabet right, get the padding right, and ship.
Frequently asked questions
Why does Base64 exist when binary is more compact?
Many transport channels can't carry arbitrary bytes safely. SMTP mail was originally 7-bit ASCII only. HTTP headers can't contain raw newlines or null bytes. URLs reserve characters like '/' and '?'. HTML inline data needs to be quotable inside attributes. Base64 maps every 6 bits of input to a character from a safe ASCII subset (A-Z, a-z, 0-9, +, /), so binary content can pass through these channels intact. The 33% size penalty is the trade for that universal safety.
How exactly does the 3-byte → 4-character conversion work?
Take 3 input bytes — that's 24 bits. Slice the 24 bits into four 6-bit groups (each value 0-63). Map each 6-bit value to one of the 64 alphabet characters: 0→A, 1→B, ..., 25→Z, 26→a, ..., 51→z, 52→0, ..., 61→9, 62→+, 63→/. Concatenate the four characters. Decoding reverses: each character maps back to a 6-bit value, four characters give 24 bits, regroup into 3 bytes.
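The index-to-character rule is four contiguous ASCII ranges plus two symbols, so the alphabet can be derived rather than typed out. A minimal Python sketch follows; char_for and value_for are illustrative names. It encodes the mapping arithmetically and checks it against the literal alphabet string.

```python
ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def char_for(v: int) -> str:
    """Map a 6-bit value (0-63) to its standard Base64 character."""
    if 0 <= v <= 25:
        return chr(ord("A") + v)
    if 26 <= v <= 51:
        return chr(ord("a") + v - 26)
    if 52 <= v <= 61:
        return chr(ord("0") + v - 52)
    if v == 62:
        return "+"
    if v == 63:
        return "/"
    raise ValueError(f"not a 6-bit value: {v}")

def value_for(c: str) -> int:
    """Inverse mapping: a single Base64 character back to its 6-bit value."""
    if "A" <= c <= "Z":
        return ord(c) - ord("A")
    if "a" <= c <= "z":
        return ord(c) - ord("a") + 26
    if "0" <= c <= "9":
        return ord(c) - ord("0") + 52
    if c == "+":
        return 62
    if c == "/":
        return 63
    raise ValueError(f"not a Base64 character: {c!r}")

assert "".join(char_for(v) for v in range(64)) == ALPHA
assert all(value_for(char_for(v)) == v for v in range(64))
print([value_for(c) for c in "TWFu"])  # [19, 22, 5, 46]
```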
What is the '=' padding for?
When the input isn't a multiple of 3 bytes, the final group has only 1 or 2 bytes. The encoder zero-fills the last partial 6-bit group and appends '=' for each output slot that carries no data: 1 input byte → 2 chars + '==', 2 input bytes → 3 chars + '='. Padding keeps the output a multiple of 4 characters. The decoder can then process fixed 4-character blocks and read the exact original length from the '=' count, without a separate length field. Some variants (base64url-nopad, used in JWT) omit '=' and recover the length from the encoded length mod 4 instead.
What's the expansion ratio?
Exactly 4/3 (≈ 33% growth) for input lengths that are a multiple of 3. For other lengths, padded output rounds up to the next multiple of 4 characters. A 100-byte input becomes ⌈100/3⌉ × 4 = 136 characters, a 36% growth. MIME email also inserts a CRLF after every 76 characters, adding roughly another 2.6%, for about 37% total on large inputs. Data URIs and JWT skip the line wrapping, so large payloads approach the bare 33%. Small payloads pay proportionally more for the final partial group.
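The length arithmetic, written as code, covers padded, unpadded, and MIME-wrapped output. The Python sketch below uses a hypothetical function name, encoded_length. The MIME case assumes 76-character lines joined by CRLF with no trailing line break. Real MIME writers may also end the body with a CRLF. Python's own base64.encodebytes wraps at 76 characters but uses bare \n rather than CRLF.

```python
import base64

def encoded_length(n_bytes: int, padded: bool = True, mime_wrap: bool = False) -> int:
    """Characters produced by Base64-encoding n_bytes (hypothetical helper)."""
    if padded:
        chars = 4 * ((n_bytes + 2) // 3)      # round up to whole 4-char groups
    else:
        chars = (4 * n_bytes + 2) // 3        # unpadded: ceil(4n/3) data-carrying chars
    if mime_wrap and chars:
        chars += 2 * ((chars - 1) // 76)      # one CRLF between consecutive 76-char lines
    return chars

print(encoded_length(100))                    # 136
print(encoded_length(100, padded=False))      # 134
print(encoded_length(100, mime_wrap=True))    # 138
print(len(base64.b64encode(bytes(100))))      # 136
```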
What's the difference between base64 and base64url?
The standard alphabet (RFC 4648) uses '+' and '/' as the 63rd and 64th characters. Both are problematic in URLs: '+' means space in query strings, and '/' breaks path parsing. Base64url replaces them with '-' and '_' respectively, and often drops the '=' padding. The decoder then recovers the byte count from the encoded length mod 4: 2 leftover characters mean 1 byte, and 3 mean 2 bytes. JWT, OAuth2 tokens, and JOSE specs use base64url so tokens can sit safely in URL fragments.
Why is Base64 not encryption?
Base64 is a reversible deterministic mapping — anyone with the alphabet can decode it instantly, no secret key required. People sometimes confuse 'looks like gibberish' with 'is encrypted', but Base64 provides zero confidentiality. It's an encoding, like Morse code. To protect content you need actual encryption (AES, ChaCha20) — Base64 can then encode the ciphertext for transport, but the secrecy comes from the cipher, never from Base64.
What's the time and space complexity of Base64?
O(n) time for both encode and decode, with a single pass. Memory is O(n) for the output buffer. Modern x86 implementations using AVX2/AVX-512 SIMD reach 10+ GB/s — Base64 is rarely a bottleneck. The decoder needs a 256-entry lookup table mapping ASCII characters to their 6-bit values; an invalid character is signalled by a sentinel value (typically -1 or 64).
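A table-driven decoder along those lines is sketched below in Python. The 256-entry DECODE table and the sentinel value 64 follow the description above. The rest is an assumption for illustration, not a reference implementation: the names, the error messages, and the strict policy of requiring padding and rejecting whitespace.

```python
import base64

ALPHA = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INVALID = 64                      # sentinel: not a 6-bit value
DECODE = [INVALID] * 256          # one entry per possible byte value
for value, ch in enumerate(ALPHA):
    DECODE[ch] = value

def b64_decode(text: str) -> bytes:
    """Strict decoder: length % 4 == 0, '=' only at the end, no whitespace."""
    data = text.encode("ascii")
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    pad = len(data) - len(data.rstrip(b"="))
    if pad > 2:
        raise ValueError("too much padding")
    out = bytearray()
    for i in range(0, len(data), 4):
        quad = data[i:i + 4]
        is_last = i + 4 == len(data)
        n_chars = 4 - pad if is_last else 4   # characters carrying data in this quad
        t = 0
        for ch in quad[:n_chars]:
            v = DECODE[ch]                    # O(1) lookup per character
            if v == INVALID:
                raise ValueError(f"invalid character {chr(ch)!r}")
            t = (t << 6) | v
        t <<= 6 * (4 - n_chars)               # padding slots count as zero bits
        out += t.to_bytes(3, "big")[:n_chars - 1]
    return bytes(out)

print(b64_decode("TWFu"), b64_decode("TWE="), b64_decode("TQ=="))  # b'Man' b'Ma' b'M'
print(b64_decode(base64.b64encode(bytes(range(256))).decode("ascii")) == bytes(range(256)))  # True
```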