COBS Encoding
Consistent Overhead Byte Stuffing — zero-byte elimination for serial framing
COBS rewrites a byte stream so 0x00 never appears inside a packet, freeing 0x00 to serve as an unambiguous end-of-frame delimiter. Worst-case overhead: 1 byte per 254.
- Overhead: ≤ 1 byte per 254 bytes
- Encode / decode: O(n) time, O(1) extra memory
- Frame delimiter: A single 0x00 byte
- Worst-case 1024-byte packet: +5 bytes (0.5%)
- Resynchronization: Immediate at next 0x00
- Used by: UART protocols, OpenBCI, EtherDream, drone telemetry
How COBS works
The problem is simple. You have a byte stream — a UART, a USB CDC pipe, a debug link — and you want to send packets over it. Packets are variable length. You need a way for the receiver to know where one packet ends and the next begins. The cleanest approach is a delimiter byte: pick a value, say 0x00, and declare "0x00 marks the end of every packet."
But your packet payload might contain 0x00 already. If you transmit it raw, the receiver will mistake an interior zero for an end-of-frame marker and chop the packet in half. So you need an encoding step that guarantees no 0x00 appears inside the packet. COBS (Consistent Overhead Byte Stuffing), introduced by Stuart Cheshire and Mary Baker at SIGCOMM 1997 and published in IEEE/ACM Transactions on Networking in 1999, does this with a small, tightly bounded overhead.
The encoding rule:
- Reserve one extra byte at the start of the packet, called the overhead byte.
- Walk the input from left to right. Count non-zero bytes into a counter n.
- When you hit a 0x00 in the input, write n+1 into the previous overhead byte, then start a new overhead byte slot. Reset n to zero. Skip the 0x00; don't copy it.
- If n reaches 254 without a 0x00 and more input follows, write 0xFF into the previous overhead byte and start a new overhead byte slot. This is the only case where overhead is paid without an input zero.
- At end of input, write n+1 into the final overhead byte and append a single 0x00 as the frame delimiter.
Worked example — 12 bytes with two embedded zeros
Input payload (hex):
45 33 00 7A 12 6B 8C 00 51 99 22 04
Walk it: read 45, 33 (n=2), then a zero — write n+1=3 to the first overhead byte, skip the zero, start a fresh count. Read 7A, 12, 6B, 8C (n=4), then a zero — write 5 to the next overhead byte, skip the zero, fresh count. Read 51, 99, 22, 04 (n=4) until end-of-input — write 5 to the final overhead byte. Append the frame delimiter 0x00:
03 45 33 05 7A 12 6B 8C 05 51 99 22 04 00
^^       ^^             ^^             ^^
overhead overhead       overhead       delimiter
Original payload: 12 bytes. Encoded frame: 14 bytes (12 + 1 overhead + 1 delimiter). Three overhead bytes appear in the output, but two of them occupy the positions of the two removed zeros, so the net cost is one byte. The frame contains no 0x00 except the final delimiter, so the receiver can scan for 0x00 with a tight inner loop and know with certainty that it's at a frame boundary.
Why the overhead is bounded
Two cases contribute overhead:
- A 0x00 in the input. The encoder writes a length byte (1 byte added) but skips the input zero (1 byte removed). Net: 0 bytes added. The zero is being relocated, not duplicated.
- A run of 254 non-zero bytes. The encoder writes a length byte 0xFF (1 byte added) but doesn't remove anything from the input. Net: 1 byte added, every 254 bytes of zero-free input.
So the encoding overhead is at most ⌈payload_length / 254⌉ bytes (and never less than 1, for the leading overhead byte), plus the trailing delimiter. For a 254-byte payload of all 0x42 (no zeros anywhere), the encoder emits 0xFF followed by 254 copies of 0x42, then 0x00: 256 bytes in total, which is 1 byte of overhead plus the delimiter. A 254-byte payload that does contain zeros also costs exactly 1 overhead byte (the leading one, since every other overhead byte takes the place of a removed zero) plus the delimiter. The worst case is reached on long zero-free runs.
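The bound above can be sketched as a small helper for sizing buffers. The name `cobs_max_encoded_len` is hypothetical (it is not from any library), and the sketch assumes an encoder that does not open an extra empty block after a final full 254-byte run, matching the 256-byte example above.

```python
def cobs_max_encoded_len(payload_len: int, delimiter: bool = True) -> int:
    """Worst-case size in bytes of a COBS frame for a payload of payload_len bytes."""
    if payload_len < 0:
        raise ValueError("payload_len must be non-negative")
    # ceil(payload_len / 254) in integer arithmetic; an empty payload still
    # needs one overhead byte, hence the max(1, ...).
    overhead = max(1, (payload_len + 253) // 254)
    return payload_len + overhead + (1 if delimiter else 0)


for n in (0, 1, 64, 254, 255, 1024):
    print(n, cobs_max_encoded_len(n))
# 0 2
# 1 3
# 64 66
# 254 256
# 255 258
# 1024 1030
```

The `max(1, ...)` term only matters for the empty payload, which still encodes to one overhead byte plus the delimiter.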
COBS vs other framing schemes
| Scheme | Worst-case overhead | Resync | Notes |
|---|---|---|---|
| COBS | 1 byte per 254 (≤0.4%) | Immediate at next 0x00 | Bounded; preferred for new designs |
| SLIP (RFC 1055) | 2× expansion (worst case) | Immediate at 0xC0 | Escape-character; simpler but variable |
| PPP (RFC 1662) | 2× expansion | Immediate at 0x7E | Per-byte escape table; complex CRC |
| HDLC bit-stuffing | 20% (1 stuffed bit per 5 consecutive one-bits) | Bit-by-bit (scan for the 0x7E flag) | Hardware-friendly; bit-oriented |
| Length-prefix framing | 2-4 bytes header per packet | None — must reread length | No resync from mid-stream |
| STX/ETX with escapes | 2× expansion | Immediate at STX | Common in industrial protocols |
The two dimensions to compare are worst-case overhead and resync behavior. COBS does well on both: the overhead never exceeds 1 byte per 254 (about 0.4% on long packets, a single byte on short ones), and you can drop in mid-stream, find the next 0x00, and be frame-aligned. Length-prefix framing is cheapest in the average case but offers no resync: if you miss a length field, you stay lost until the link is reset.
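To illustrate the resync property, here is a minimal sketch of a receiver that starts listening in the middle of a frame. The helper name `resync_and_split` is hypothetical, and the frames it returns are still COBS-encoded; decoding them is a separate step.

```python
def resync_and_split(stream: bytes) -> list[bytes]:
    """Drop the partial frame before the first 0x00, then return complete frames.

    Returned frames are still COBS-encoded and exclude their 0x00 delimiter.
    """
    first = stream.find(0)
    if first < 0:
        return []                       # no delimiter seen yet: still unaligned
    aligned = stream[first + 1:]        # everything after the first delimiter
    *complete, _partial = aligned.split(b"\x00")
    return [f for f in complete if f]   # skip empty chunks from back-to-back delimiters


# Three encoded frames on the wire; the receiver misses the first two bytes.
wire = bytes.fromhex("03 45 33 01 00 02 7a 00 03 51 99 00")
late = wire[2:]
print([f.hex(" ") for f in resync_and_split(late)])
# ['02 7a', '03 51 99']
```

The damaged first frame is discarded whole. Whether a discarded frame is retransmitted is up to the protocol layered on top.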
When to use COBS
- Embedded serial links. Any UART, RS-485, USB CDC, or BLE characteristic where the channel is byte-oriented and packets need framing.
- Fixed-size buffers. Firmware that allocates packet buffers at compile time benefits from COBS's exact upper bound (payload + ⌈payload/254⌉ + 1 bytes for a non-empty payload, delimiter included).
- Recovery from glitches. When a noisy link drops bytes, COBS lets the receiver discard everything until the next 0x00 and resume cleanly on the following packet.
- Layering on a bare UART. Buses like CAN frame messages in hardware; a bare UART (or RS-485 used purely as a physical layer) does not. COBS provides framing in software with minimal CPU and RAM cost.
Skip COBS when you're on a packet-oriented transport (TCP segments, UDP datagrams, Ethernet frames — they're already framed), when overhead doesn't matter and length-prefix is simpler, or when your packets are tiny enough that escape-character schemes' worst-case 2× is irrelevant.
Pseudo-code
// COBS encode: input bytes → zero-free output + trailing 0x00.
function cobsEncode(input):
    output = []
    output.append(0)                      // overhead byte placeholder
    overhead_idx = 0
    code = 1                              // 1 + number of non-zero bytes in the current block
    for byte b in input:
        if code == 255:                   // block holds 254 non-zero bytes and more input follows
            output[overhead_idx] = 0xFF   // close it; 0xFF means "no implied zero"
            overhead_idx = output.length
            output.append(0)              // new overhead placeholder
            code = 1
        if b == 0:
            output[overhead_idx] = code
            overhead_idx = output.length
            output.append(0)              // new overhead placeholder
            code = 1
        else:
            output.append(b)
            code += 1
    output[overhead_idx] = code           // close the final block
    output.append(0)                      // frame delimiter
    return output
// COBS decode: zero-free input (terminated by 0x00) → original bytes.
function cobsDecode(input):
    output = []
    i = 0
    while i < input.length:
        code = input[i]; i += 1
        if code == 0: break               // frame delimiter, done
        for j in 1..code-1:
            if i >= input.length: error("truncated")
            output.append(input[i]); i += 1
        // A block shorter than 0xFF implies a zero, but only if another
        // block follows (the next byte is neither end-of-input nor the delimiter).
        if code < 0xFF and i < input.length and input[i] != 0:
            output.append(0)              // recovered zero
    return output
JavaScript implementation
function cobsEncode(input) {
  const output = [0];            // overhead placeholder
  let overheadIdx = 0;
  let code = 1;                  // 1 + non-zero bytes in the current block
  for (let i = 0; i < input.length; i++) {
    if (code === 0xFF) {
      // Block is full (254 non-zero bytes) and more input follows: close it.
      output[overheadIdx] = 0xFF;
      overheadIdx = output.length;
      output.push(0);
      code = 1;
    }
    if (input[i] === 0) {
      output[overheadIdx] = code;
      overheadIdx = output.length;
      output.push(0);
      code = 1;
    } else {
      output.push(input[i]);
      code++;
    }
  }
  output[overheadIdx] = code;    // close the final block
  output.push(0);                // delimiter
  return new Uint8Array(output);
}
function cobsDecode(input) {
  const output = [];
  let i = 0;
  while (i < input.length) {
    const code = input[i++];
    if (code === 0) break;       // frame delimiter
    for (let j = 1; j < code && i < input.length; j++) output.push(input[i++]);
    // A block shorter than 0xFF implies a zero only if another block follows.
    if (code < 0xFF && i < input.length && input[i] !== 0) output.push(0);
  }
  return new Uint8Array(output);
}
const packet = new Uint8Array([0x45, 0x33, 0x00, 0x7A, 0x12, 0x6B, 0x8C, 0x00, 0x51, 0x99, 0x22, 0x04]);
const encoded = cobsEncode(packet);
console.log([...encoded].map(b => b.toString(16).padStart(2, '0')).join(' '));
// 03 45 33 05 7a 12 6b 8c 05 51 99 22 04 00
console.log(cobsDecode(encoded));
// Uint8Array(12) [69, 51, 0, 122, 18, 107, 140, 0, 81, 153, 34, 4]
Python implementation
def cobs_encode(data: bytes) -> bytes:
    output = bytearray([0])  # overhead placeholder
    overhead_idx = 0
    code = 1                 # 1 + non-zero bytes in the current block
    for b in data:
        if code == 0xFF:
            # Block is full (254 non-zero bytes) and more input follows: close it.
            output[overhead_idx] = 0xFF
            overhead_idx = len(output)
            output.append(0)
            code = 1
        if b == 0:
            output[overhead_idx] = code
            overhead_idx = len(output)
            output.append(0)
            code = 1
        else:
            output.append(b)
            code += 1
    output[overhead_idx] = code  # close the final block
    output.append(0)             # frame delimiter
    return bytes(output)
def cobs_decode(frame: bytes) -> bytes:
    output = bytearray()
    i = 0
    while i < len(frame):
        code = frame[i]
        i += 1
        if code == 0:
            break  # frame delimiter
        end = i + code - 1
        output.extend(frame[i:end])
        i = end
        # A block shorter than 0xFF implies a zero only if another block follows.
        if code < 0xFF and i < len(frame) and frame[i] != 0:
            output.append(0)
    return bytes(output)
packet = bytes([0x45, 0x33, 0x00, 0x7A, 0x12, 0x6B, 0x8C, 0x00, 0x51, 0x99, 0x22, 0x04])
encoded = cobs_encode(packet)
print(encoded.hex(' ')) # 03 45 33 05 7a 12 6b 8c 05 51 99 22 04 00
print(cobs_decode(encoded) == packet) # True
Common COBS bugs and edge cases
- Forgetting the trailing 0x00 delimiter. Without it, the receiver waits forever for the end of the current packet. Many protocol bugs are "encoder forgot to flush the delimiter byte." Always assert the final byte is 0x00 in unit tests.
- Empty input. Encoding zero bytes should produce [0x01, 0x00]: one overhead byte and the delimiter. Decoding this should yield an empty payload. Off-by-one logic often returns [0x00] or fails on this case.
- Input that ends with 0x00. After the last input zero the encoder opens a final, empty block whose overhead byte is 0x01, and the delimiter follows. The trailing zero is recovered correctly only if the encoder writes that final overhead byte after the zero, not before, and the decoder emits a zero between blocks rather than after the last one.
- The 254-byte non-zero run. When code reaches 0xFF, the encoder must write 0xFF to the overhead byte and start a new block without inserting a virtual zero in the output. This is the one case in COBS where overhead is paid without a corresponding input zero.
- In-place encoding pitfalls. Encoding in place requires shifting bytes to make room for overhead. It is easier to allocate an output buffer sized input_size + ⌈input_size/254⌉ + 1 (and at least 2 bytes, to cover empty input).
- Confusing COBS with COBS/R (reduced). COBS/R is a variant that drops the last data byte and stores it in place of the final overhead byte whenever that byte's value is at least as large as the overhead byte it replaces. It saves 1 byte on such packets but changes the decode rule. Don't mix encoders and decoders from different variants.
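The edge cases above lend themselves to a round-trip test. The sketch below is self-contained: it restates a compact encoder and decoder under the assumption that a full 254-byte block is closed only when more input follows, then checks each case against the size bound. It is one way to write such a test, not a reference implementation.

```python
def cobs_encode(data: bytes) -> bytes:
    out = bytearray([0])               # placeholder for the first overhead byte
    idx, code = 0, 1
    for b in data:
        if code == 0xFF:               # full block and more input follows
            out[idx] = 0xFF
            idx = len(out)
            out.append(0)
            code = 1
        if b == 0:                     # close the block; the zero itself is not copied
            out[idx] = code
            idx = len(out)
            out.append(0)
            code = 1
        else:
            out.append(b)
            code += 1
    out[idx] = code
    out.append(0)                      # frame delimiter
    return bytes(out)


def cobs_decode(frame: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(frame) and frame[i] != 0:
        code = frame[i]
        out += frame[i + 1:i + code]
        i += code
        # A zero is implied only between blocks, never after the last one.
        if code < 0xFF and i < len(frame) and frame[i] != 0:
            out.append(0)
    return bytes(out)


cases = {
    "empty": b"",
    "single zero": b"\x00",
    "ends with zero": b"\x11\x22\x00",
    "254 non-zero": bytes([0x42]) * 254,
    "255 non-zero": bytes([0x42]) * 255,
    "254 non-zero then zero": bytes([0x42]) * 254 + b"\x00",
}
for name, payload in cases.items():
    frame = cobs_encode(payload)
    bound = len(payload) + max(1, -(-len(payload) // 254)) + 1
    assert frame[-1] == 0 and 0 not in frame[:-1], name
    assert cobs_decode(frame) == payload, name
    assert len(frame) <= bound, name

print(cobs_encode(b"").hex(" "))              # 01 00
print(cobs_encode(b"\x11\x22\x00").hex(" "))  # 03 11 22 01 00
```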
Performance in real systems
- STM32 firmware: Encoder runs at ~50 MB/s on an STM32F4 at 168 MHz — fast enough that COBS adds <1% CPU to a 1 Mbps UART link.
- Linux userspace: A naive C implementation processes ~500 MB/s on modern hardware; SIMD-accelerated variants exceed 2 GB/s.
- OpenBCI biosensor: Uses 33-byte sample frames over USB CDC at 250 kbps. COBS adds 1 byte of overhead per frame, about 3% bandwidth cost (about 6% counting the delimiter).
- EtherDream laser DAC: Streams 30,000 point/sec at 30 bytes/point — COBS framing adds 0.4% overhead, negligible against the 7 Mbps total throughput.
- Drone telemetry: MAVLink v2 alternates frames with start-byte framing (escape-style) and COBS-wrapped variants; COBS variants give predictable buffer sizing for tight RAM budgets.
The takeaway: COBS is the encoding to reach for whenever you need framing on a byte stream and want a hard upper bound on packet size. The math is exact, the implementation fits in about 30 lines, and a receiver that loses sync on a noisy link realigns at the next zero byte.
Frequently asked questions
Why use COBS instead of escape characters?
Classical escape-character framing (like PPP's 0x7D escape) has variable overhead — the worst case is 2x expansion if the payload is full of bytes that need escaping. COBS guarantees at most 1 byte of overhead per 254 bytes of payload regardless of what bytes appear in the payload. That bounded overhead makes COBS predictable for fixed-size buffers in embedded systems, where 'this packet will be ≤ N bytes after framing' is something firmware engineers need to assert at compile time.
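To make the 2x worst case concrete, the sketch below uses SLIP (RFC 1055) as a stand-in for escape-character framing and compares its output size on a hostile payload against the COBS bound. The helper names `slip_encode` and `cobs_worst_case_len` are illustrative only.

```python
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD   # RFC 1055 special bytes


def slip_encode(data: bytes) -> bytes:
    """SLIP framing: escape END and ESC bytes, then append END."""
    out = bytearray()
    for b in data:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    out.append(END)
    return bytes(out)


def cobs_worst_case_len(n: int) -> int:
    """Upper bound on COBS frame size: payload, overhead bytes, 0x00 delimiter."""
    return n + max(1, -(-n // 254)) + 1


hostile = bytes([END]) * 1024                # every byte needs escaping in SLIP
print(len(slip_encode(hostile)))             # 2049
print(cobs_worst_case_len(len(hostile)))     # 1030
```

The SLIP frame size depends on the payload contents, so a firmware buffer has to be sized for 2n + 1 bytes. The COBS bound depends only on the length.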
What exactly does COBS do?
It rewrites a packet so that it contains no 0x00 bytes, then appends a single 0x00 byte as an end-of-frame delimiter. A leading overhead byte is added, and it and every original 0x00 are replaced by a 'distance-to-next-zero' count (runs longer than 254 bytes get an extra 0xFF count). The receiver reads bytes until it sees 0x00 (end of frame), then walks the count chain to reinsert the original zeros. The non-zero payload bytes are unchanged.
What's the overhead for a 1024-byte packet?
Worst case: ⌈1024 / 254⌉ = 5 bytes of overhead, about 0.5%. Add the trailing 0x00 delimiter and the total framing cost is 6 bytes on 1024 bytes of payload. For any payload of 1 to 254 bytes, including a 64-byte packet, the overhead is exactly 1 byte regardless of content. Beyond that the actual overhead depends on where the zeros fall, but the worst-case bound depends only on the payload length, which is rare among framing schemes.
How does the COBS decoder rebuild the original?
The decoder starts by reading the first byte as a length count N. It then copies the next N-1 bytes to output as literals. If N was less than 255 and another block follows (the next byte is not the 0x00 delimiter), it appends a 0x00 byte to the output (this is a 'recovered zero'). Then it repeats: read the next length byte, copy that many literals, conditionally append a zero, until it reaches the delimiter. The terminating 0x00 of the frame is not output; it's just the signal to stop.
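The walk can be made visible by listing the blocks of an encoded frame. In this sketch, `cobs_blocks` is a hypothetical helper. It reports a recovered zero only when the block's count is below 0xFF and another block follows, so the final block before the delimiter contributes no zero.

```python
def cobs_blocks(frame: bytes):
    """Yield (code, literals, implied_zero) for each block of an encoded frame."""
    i = 0
    while i < len(frame) and frame[i] != 0:
        code = frame[i]
        literals = frame[i + 1:i + code]           # the next code-1 bytes
        i += code
        more = i < len(frame) and frame[i] != 0    # does another block follow?
        yield code, literals, (code < 0xFF and more)


frame = bytes.fromhex("03 45 33 05 7a 12 6b 8c 05 51 99 22 04 00")
for code, literals, zero in cobs_blocks(frame):
    print(f"code={code:#04x} literals={literals.hex(' ')} implied_zero={zero}")
# code=0x03 literals=45 33 implied_zero=True
# code=0x05 literals=7a 12 6b 8c implied_zero=True
# code=0x05 literals=51 99 22 04 implied_zero=False
```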
When is the 254-byte boundary problematic?
When a run of 254 non-zero bytes occurs in the payload and more data follows. The encoder must insert an overhead byte (length=0xFF, meaning 'next 254 bytes are literals, do NOT append a zero'). That's the only case where overhead is paid without a corresponding zero in the input. For payloads with frequent zeros, the only expansion is the single leading overhead byte, because each input zero already takes one byte and is replaced by one length byte at no extra cost.
What's the time and space complexity of COBS?
O(n) time for both encode and decode, with a single pass over the data. Working state is O(1): the encoder tracks the position of the open overhead byte and a running count, and backfills the count when the block closes. If the output has to be streamed and cannot be backfilled, buffering at most 254 bytes is enough; building the encoded packet in a fresh buffer costs O(n) for the buffer itself. Decoding is even simpler: a state machine with one counter.
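As a sketch of the decoder state machine, the class below consumes one byte at a time, the way a UART receive interrupt might. The class name `CobsStreamDecoder` and its `feed` method are hypothetical. Besides the output buffer, its state is one counter and two flags.

```python
class CobsStreamDecoder:
    """Byte-at-a-time COBS decoder driven by feed()."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self._buf = bytearray()
        self._remaining = 0          # literal bytes still expected in the current block
        self._zero_pending = False   # previous block implies a zero if another block follows
        self._started = False        # at least one overhead byte seen in this frame

    def feed(self, byte: int):
        """Consume one byte; return the payload when a frame completes, else None."""
        if byte == 0:                            # delimiter: frame boundary
            complete = self._started and self._remaining == 0
            payload = bytes(self._buf)
            self._reset()
            return payload if complete else None  # truncated frames are dropped
        if self._remaining == 0:                 # this byte is an overhead (count) byte
            if self._zero_pending:
                self._buf.append(0)              # recovered zero between blocks
            self._remaining = byte - 1
            self._zero_pending = byte < 0xFF
            self._started = True
        else:                                    # literal byte inside a block
            self._buf.append(byte)
            self._remaining -= 1
        return None


decoder = CobsStreamDecoder()
wire = bytes.fromhex("99 22 00 03 45 33 01 00 01 00")   # starts mid-frame
for b in wire:
    payload = decoder.feed(b)
    if payload is not None:
        print(payload.hex(" ") or "(empty)")
# 45 33 00
# (empty)
```

A damaged fragment can by chance look like a complete frame, so protocols built on COBS usually carry a checksum inside the payload as well.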
Where is COBS used in practice?
Embedded UART protocols where a serial channel needs reliable framing: OpenBCI biosensors, RoboticsCape, many drone telemetry links, the EtherDream laser DAC, and several USB CDC firmware stacks. SLIP and PPP predate COBS and use escape characters; for greenfield designs after ~2000, COBS is the conventional choice. It is cheap to implement, has bounded overhead, and lets a receiver that joins mid-stream realign at the first 0x00 delimiter it sees.