Cryptography
Constant-Time Cryptography
The algorithm is correct — but the clock is leaking your key
Constant-time cryptography writes code whose running time and memory-access pattern never depend on secret data, so an attacker measuring the clock or the cache learns nothing about the key — the standard defense against timing and cache side-channel attacks.
- Defends againstTiming & cache attacks
- Core ruleNo secret branches
- Second ruleNo secret memory addresses
- Compare costAlways O(n), never early-exit
- Table lookup costScan all entries, O(table)
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How constant-time code defeats the clock
A cryptographic algorithm can be mathematically perfect and still hand over the key, because the math runs on a real machine that takes measurably different amounts of time depending on what it's computing. An if (secret_bit) branch, an early return on a comparison mismatch, or a table lookup at a secret index all create a correlation between the secret and something an attacker can observe — wall-clock time, cycle count, or which cache lines are now warm. That observable channel is a side channel, and the attack that exploits the time dimension is a timing attack.
Constant-time programming closes the channel by enforcing two invariants on every line that touches a secret:
- No secret-dependent control flow. The sequence of instructions executed must be the same regardless of secret values. No
if, no earlyreturn, no loop bound, noswitchdriven by a secret. You replace branches with arithmetic: compute both outcomes and select between them with a bitmask. - No secret-dependent memory addresses. Which byte of RAM you read or write must not depend on a secret, because the cache turns that into a timing difference. A table indexed by a secret nibble leaks the nibble through which cache line gets loaded. You replace the indexed read with a linear scan that touches every entry and masks out all but the wanted one.
The phrase "constant time" is slightly misleading. It does not mean every run consumes the identical number of cycles — runtime still varies with public inputs like message length, with OS scheduling, and with CPU frequency scaling. It means the timing and memory-access pattern are statistically independent of the secret. No amount of measurement should let an attacker distinguish "the key bit was 0" from "the key bit was 1."
The mechanism: branchless selection with bitmasks
The workhorse primitive is the constant-time conditional select. You want cond ? a : b without a branch. The trick is to turn a boolean into a full-width mask of all-ones (0xFFFFFFFF) or all-zeros (0x00000000), then blend:
mask = 0 − cond // cond ∈ {0,1} → mask ∈ {0x00000000, 0xFFFFFFFF}
result = (a & mask) | (b & ~mask)
When cond = 1, the mask is all-ones, so a & mask = a and b & ~mask = 0, giving a. When cond = 0, the mask is all-zeros and you get b. Either way the CPU executes the same five instructions — there is no branch to mispredict and no path that runs faster. This same idea, applied to whole arrays, gives a constant-time table lookup: AND every entry with a mask that's all-ones only at the wanted index, then OR the results together.
For comparing two secret buffers — the operation behind verifying a MAC, an HMAC, or a password hash — you must never short-circuit. The naive memcmp returns at the first differing byte, so a guess that matches the first 7 bytes runs longer than a guess that fails at byte 0. The constant-time version accumulates all byte differences with OR across the whole buffer and only inspects the result once at the end:
diff = 0
for i in 0..n: diff |= a[i] ^ b[i] // 0 only if every byte matched
equal = (diff == 0) // single branch on the accumulator
The cost is always exactly n iterations — O(n) — instead of the early-exit average of O(1) on random mismatches. That uniformity is the whole point.
When you must go constant-time
- Comparing secrets. MAC/HMAC tags, authentication tokens, password-hash digests, API keys — anywhere a "does this equal the expected value?" check runs on attacker-influenced input.
- Modular exponentiation and scalar multiplication. RSA decryption, Diffie-Hellman, and elliptic-curve scalar multiply must process every key bit identically — no skipping the squaring step when a bit is 0 (the classic square-and-multiply leak).
- Block-cipher S-boxes and table lookups. Software AES that indexes a 256-entry T-table by a secret byte leaks through the cache; constant-time AES uses bit-slicing or hardware AES-NI instead.
- Decryption and padding checks. The infamous padding-oracle and Lucky 13 attacks both came from secret-dependent timing in CBC padding and MAC verification.
You do not need constant time for data that isn't secret. Parsing a public certificate, formatting a log line, or branching on a message's public length is fine. The discipline is expensive in developer effort, so apply it precisely where a secret meets observable behavior — and nowhere else.
Naive vs constant-time, operation by operation
| Operation | Naive (leaky) | Constant-time | Cost change | What leaks if naive |
|---|---|---|---|---|
| Buffer equality | memcmp, early exit | OR-accumulate all bytes | O(1) avg → O(n) | Matching prefix length → byte-by-byte recovery |
| Conditional select | if (c) x=a; else x=b; | bitmask blend | ≈ same (5 ops) | Branch predictor / cycle count → the condition bit |
| Table lookup | index by secret byte | scan + mask all entries | O(1) → O(table) | Cache line loaded → the secret index |
| Modular exponent | square-and-multiply, skip on 0-bit | Montgomery ladder | ≈ 1.0× → ≈ 1.3× | Multiply pattern → every exponent bit |
| ECC scalar mult | double-and-add, conditional add | Montgomery ladder / complete formulas | ≈ 2× ops | Add pattern → the scalar (private key) |
| Division by secret | hardware div (data-dependent latency) | avoid; use multiply by inverse | varies by CPU | Cycle count → operand magnitude |
The pattern across the table is the same trade: you give up the optimization that made the secret case faster, and you buy immunity to extracting the secret. For everything except table lookups, the overhead is modest; lookups are the painful case, which is why modern ciphers avoid large secret-indexed tables entirely.
What the numbers actually say
- HMAC compare leak. A short-circuiting tag comparison reduces a forging attacker's work from brute-forcing a 128-bit tag (2¹²⁸ trials) to recovering it one byte at a time — about 16 bytes × 256 guesses ≈ 4,096 timed queries. The 2009 Nate Lawson / Taylor Nelson disclosure showed real web frameworks leaking exactly this; it's why Python ships
hmac.compare_digestand Node shipscrypto.timingSafeEqual. - Remote timing is practical. Brumley and Boneh's 2003 attack recovered an OpenSSL RSA private key over a local network, defeating the assumption that network jitter hides timing. The fix shipped was RSA blinding plus constant-time modular reduction.
- Lucky 13 (2013). A timing difference of as little as a few microseconds in TLS CBC MAC verification, amplified over roughly 2²³ TLS sessions, recovered plaintext bytes. The defense was constant-time MAC-then-decrypt processing.
- Cache attacks need no privilege. Flush+Reload distinguishes a cached vs uncached line by a gap of roughly 100–300 cycles (a cache miss costs about 100 ns) — large enough to read a secret AES table index from a co-resident process or even across a hypervisor.
- Constant-time AES S-box scan. Replacing one 256-byte T-table read with a masked scan of all 256 entries is 256× the memory traffic for that step — but on hardware with AES-NI the whole S-box is a single instruction with no table at all, so the dilemma disappears.
JavaScript implementation
The two primitives you reach for most: a timing-safe byte comparison and a branchless conditional select. In Node, prefer the built-in crypto.timingSafeEqual for buffers; the hand-rolled version below shows what it's doing.
// Constant-time equality: always touches every byte, one branch at the very end.
function timingSafeEqual(a, b) {
// Length check leaks length, which is usually public; if not, pad both first.
if (a.length !== b.length) return false;
let diff = 0;
for (let i = 0; i < a.length; i++) {
diff |= a[i] ^ b[i]; // OR keeps any nonzero difference sticky
}
return diff === 0; // single comparison on the accumulator
}
// Branchless select: returns cond ? x : y without a data-dependent branch.
// cond MUST be 0 or 1. Works on 32-bit unsigned values.
function ctSelect(cond, x, y) {
const mask = (-cond) >>> 0; // 1 -> 0xFFFFFFFF, 0 -> 0x00000000
return ((x & mask) | (y & ~mask)) >>> 0;
}
// Constant-time table lookup: read table[secretIdx] without using secretIdx
// as an address. Scans all entries; the cache sees a fixed access pattern.
function ctLookup(table, secretIdx) {
let out = 0;
for (let i = 0; i < table.length; i++) {
const isMatch = (i === secretIdx) ? 1 : 0; // see note below
out = ctSelect(isMatch, table[i], out);
}
return out;
}
A blunt warning about the JavaScript version: i === secretIdx is itself a comparison the JIT may compile to a branch, and JS engines deoptimize unpredictably — true constant-time guarantees are essentially impossible in a managed runtime. Treat this as illustrative. For real secrets in Node, use crypto.timingSafeEqual (native C++) and let WebCrypto / OpenSSL handle the heavy primitives.
Python implementation
Python has the same managed-runtime caveat, so for production use hmac.compare_digest (implemented in C). The pseudo-implementations below illustrate the technique.
import hmac
# Production: use the standard library. It is the constant-time compare.
def safe_verify(expected: bytes, received: bytes) -> bool:
return hmac.compare_digest(expected, received)
# Illustrative hand-rolled version of what compare_digest does:
def ct_equal(a: bytes, b: bytes) -> bool:
if len(a) != len(b): # length is assumed public
return False
diff = 0
for x, y in zip(a, b):
diff |= x ^ y # sticky OR over every byte
return diff == 0 # one decision, at the end
# Branchless select on fixed-width (32-bit) unsigned values.
MASK32 = 0xFFFFFFFF
def ct_select(cond: int, x: int, y: int) -> int:
m = (-cond) & MASK32 # cond in {0,1} -> {0x00000000, 0xFFFFFFFF}
return ((x & m) | (y & ~m & MASK32))
# Full-width branch-free conditional swap, for big numbers (RSA/DH operands).
# bit in {0,1}: returns (b, a) when bit == 1, else (a, b) — never branches on bit.
def cswap(bit: int, a: int, b: int):
mask = -bit # 0 -> 0, 1 -> all-ones (Python ints are arbitrary width)
t = mask & (a ^ b) # 0 when bit == 0, else (a ^ b)
return (a ^ t, b ^ t)
# Square-and-multiply that is *not* constant time (leaks the exponent):
def modexp_leaky(base, exp, mod):
r = 1
while exp > 0:
if exp & 1: # secret-dependent branch — LEAKS
r = (r * base) % mod
base = (base * base) % mod
exp >>= 1
return r
# Montgomery ladder: every bit does the same two operations, no skip.
# Maintains the invariant (r0, r1) = (base**k, base**(k+1)) for the prefix k.
def modexp_ladder(base, exp, mod, bits):
r0, r1 = 1, base % mod
for i in range(bits - 1, -1, -1):
bit = (exp >> i) & 1
r0, r1 = cswap(bit, r0, r1) # branch-free swap so the bit selects operands
r1 = (r0 * r1) % mod # same two ops every bit — multiply, then square
r0 = (r0 * r0) % mod
r0, r1 = cswap(bit, r0, r1) # swap back into canonical order, again branch-free
return r0
Note the asymmetry between the two modexp versions: the leaky one runs fewer multiplications when the exponent has fewer 1-bits, and that variation is exactly the signal a timing attacker integrates over many runs to reconstruct the private exponent. The ladder does identical work per bit, so the runtime carries no information about the exponent.
Variants and techniques worth knowing
Bit-slicing. Represent the cipher state across the bits of many machine words and compute the S-box as a Boolean circuit (logic gates) instead of a table. No memory indexing at all, so no cache leak. Käsper and Schwabe's 2009 bit-sliced AES even ran faster than table AES on some CPUs.
Hardware crypto instructions. AES-NI, ARM's cryptography extensions, and CLMUL compute a round (or a multiply) in a single fixed-latency instruction with no data-dependent table. When available, this is both the fastest and the safest option.
Blinding. Multiply the secret by a fresh random value before the operation and divide it out afterward (RSA blinding, ECC scalar blinding). Even if some timing leaks, it leaks about a randomized value, not the real secret. A defense-in-depth complement to constant-time code, not a replacement.
The Montgomery ladder. The canonical constant-time structure for both modular exponentiation and elliptic-curve scalar multiplication: maintain two running values and perform the same add-and-double per bit, selecting with a mask. Curve25519's X25519 is built around it.
Verification tools. Because compilers fight you, projects use tools that prove the property: ctgrind / Valgrind's secret-tracking, ctverif, dudect (statistical leakage testing on real hardware), and HACL*/Fiat-Crypto, which generate verified constant-time code.
Common bugs and edge cases
- The compiler re-branches your branchless code. An optimizer can pattern-match a bitmask select back into a conditional move or a real branch, or fold your careful loop into a
memcmpcall. You must inspect the emitted assembly; constant-time source does not imply constant-time binary. - Leaking the length. A length check before the loop is fine if the length is public (it usually is for MACs), but if the secret's length is sensitive, returning early on a length mismatch leaks it. Pad both inputs to a fixed size first.
- Secret-dependent division or modulo. Hardware integer division has operand-dependent latency on many CPUs. Avoid dividing by a secret; multiply by a precomputed inverse instead.
- Variable-time big-number libraries. Generic bignum routines normalize by stripping leading zero limbs, making the operation count depend on the magnitude of the secret. Use the fixed-width, constant-time paths your crypto library provides.
- Adding random delays as a "fix." Jitter is averaged away by collecting more samples; it raises the query count but not the security. Remove the variation; don't mask it.
- Trusting a managed runtime. JIT compilers, garbage collectors, and string-interning all introduce data-dependent timing you can't control. True constant-time guarantees live in C/assembly with verified toolchains, not in JavaScript, Python, or the JVM.
- Speculative execution. Even branch-free, in-order-looking code can leak through speculative paths (Spectre-class attacks) when a secret influences a load that's later squashed. Mitigations like speculation barriers may be needed on top of classic constant-time discipline.
Frequently asked questions
Why is == unsafe for comparing secrets like MACs or password hashes?
A normal comparison short-circuits at the first byte that differs, so it returns faster for a near-miss than for a wildly-wrong guess. An attacker who times responses can recover the secret one byte at a time, turning a 2^128 brute force into roughly 16 × 256 trials. Use a constant-time compare that ORs every byte difference and only branches on the final accumulator.
What's the difference between a timing attack and a cache attack?
A timing attack measures total wall-clock or cycle count and infers secrets from how long the operation took — e.g. a secret-dependent branch or early exit. A cache attack measures which memory lines got loaded by probing the cache (Flush+Reload, Prime+Probe); if a table index is secret, the attacker learns the index from which line is now hot. Constant-time code must defeat both: no secret branches and no secret memory addresses.
Can the compiler turn my constant-time code back into branchy code?
Yes — this is the biggest practical hazard. An optimizing compiler can recognize a bitmask select as an if-statement and emit a conditional branch, or replace a careful loop with a memcmp call. Mitigations include compiler barriers (an inline-asm clobber), the volatile keyword on the accumulator, hand-written assembly, or formal verification tools. You must check the actual generated assembly, not the source.
Is constant-time code slower than the naive version?
Usually a little, but far less than people expect. A constant-time byte comparison always touches all n bytes instead of stopping early, and a constant-time table lookup scans every entry instead of indexing one — that turns an O(1) read into O(table size). For AES S-boxes that's 256 reads instead of 1, but the cost is dwarfed by being immune to key extraction. Modern designs (ChaCha20, X25519, Kyber) are constant-time by construction and often as fast as the branchy alternatives.
Does "constant time" mean the same number of CPU cycles every run?
No. It means the timing and memory-access pattern are independent of secret data, not that every run is bit-identical in cycles. Runtime may still vary with public inputs (message length), OS scheduling, and frequency scaling. The security property is that none of the variation correlates with the key, nonce, or plaintext you're trying to protect.
Why can't I just add a random delay to hide the timing?
Random jitter is noise, and an attacker beats noise with averaging. If the true signal is a 50-nanosecond difference, collecting a few million timed samples lets statistics recover it through the random delay. Worse, the added delay often runs in a separate thread or syscall whose own timing leaks. The only robust fix is to remove the secret-dependent variation, not to bury it.