Digital VLSI

CMOS Logic Gates

A PMOS pull-up over an NMOS pull-down — one always off, the other always on — multiplied by 10⁸ per square millimetre

CMOS pairs a PMOS pull-up with an NMOS pull-down so that, in steady state, one transistor is always off and static current is essentially zero. The inverter, NAND, and NOR topologies built from this complementary pattern are the standard cells from which every digital chip is composed, at densities on the order of 10⁸ transistors per square millimetre at 7 nm.

Invented: Wanlass, Fairchild, 1963
Static current: ≈ 0 (one device off)
Inverter delay (7 nm): ~ 10 ps
Dynamic power: P = α C V² f
Density (7 nm): ~ 10⁸ trans./mm²

The complementary idea

A logic gate is, at bottom, a switch. To output a logical 1 you connect the output node to the positive supply Vdd; to output a 0 you connect it to ground. The hard part is doing that with transistors that are small, fast, and don't melt under load. The pre-CMOS approach — NMOS logic in the 1970s — used an NMOS transistor as the pull-down and a resistor (or a depletion-mode NMOS acting like one) as the pull-up. When the pull-down was on, the resistor still carried current to ground; static dissipation was unavoidable. A 100 000-transistor NMOS chip drew several watts in steady state, and scaling further was thermally impossible.

CMOS solves the problem by making the pull-up active too: a PMOS transistor whose gate is wired to the same input as the NMOS pull-down. Because the PMOS conducts when its gate is low and the NMOS conducts when its gate is high, they are complementary: in any DC steady state, exactly one of them is on. There is never a path from Vdd to Gnd in steady state, so there is no static current. Power is dissipated mainly when the output switches (charging and discharging the load capacitance, plus a brief interval in which both devices conduct at once) and through subthreshold leakage, which grows exponentially as V_t shrinks.

This single observation — that an NMOS and a PMOS are duals and can be wired in complementary networks — is the entire reason 10⁸ transistors fit in a square millimetre without the package vapourising.

The inverter, in detail

The simplest CMOS gate is the inverter, a single complementary pair:

          Vdd
           |
        |‒‒|‒‒  PMOS  (source at Vdd, drain at Vout, gate at Vin)
           |
           +‒‒‒ Vout
           |
        |‒‒|‒‒  NMOS  (drain at Vout, source at Gnd, gate at Vin)
           |
          Gnd

When V_in is low (≈ 0 V), the NMOS gate-source voltage is below threshold so the NMOS is off, while the PMOS gate-source voltage is −V_dd, well below its (negative) threshold, so the PMOS is on. The output node sees a low-resistance path to Vdd through the PMOS and a high-resistance path to ground through the off NMOS, so V_out pulls up to V_dd — a logical 1.

When V_in is high (≈ V_dd), the NMOS turns on and the PMOS turns off. The output is pulled down to Gnd, a logical 0. Input low gives output high; input high gives output low — that is a logical inversion, and the gate symbol gets a small bubble on the output to denote it.
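The two cases above can be captured in a minimal switch-level sketch. It treats each transistor as an ideal switch (an assumption that ignores thresholds, leakage, and the switching transient), and the function name is illustrative only.

```python
def cmos_inverter(vin_high: bool) -> bool:
    """Switch-level model of a CMOS inverter; returns True when Vout is high."""
    nmos_on = vin_high        # NMOS conducts when its gate is high
    pmos_on = not vin_high    # PMOS conducts when its gate is low
    # Complementary property: exactly one device conducts in steady state,
    # so there is never a DC path from Vdd to Gnd.
    assert nmos_on != pmos_on
    return pmos_on            # the output is high when the pull-up conducts


for vin in (False, True):
    print(f"Vin={int(vin)} -> Vout={int(cmos_inverter(vin))}")
```

The internal assertion is the whole point of the complementary pair: for either input value, one switch is open.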

NAND, NOR, and the duality rule

Multi-input gates extend the complementary pair into a pull-up network (PUN) of PMOS transistors and a pull-down network (PDN) of NMOS transistors. The rule for building them is structural: the two networks are graph-theoretic duals. Series in one becomes parallel in the other, and vice versa.

Function	PUN (PMOS, to Vdd)	PDN (NMOS, to Gnd)
Inverter	1 PMOS	1 NMOS
NAND-2 (A · B)'	2 PMOS in parallel	2 NMOS in series
NAND-3 (A · B · C)'	3 PMOS in parallel	3 NMOS in series
NOR-2 (A + B)'	2 PMOS in series	2 NMOS in parallel
NOR-3 (A + B + C)'	3 PMOS in series	3 NMOS in parallel
AOI21 (A·B + C)'	(2 PMOS in parallel) in series with 1 PMOS	(2 NMOS in series) in parallel with 1 NMOS
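The duality rule can be checked mechanically. The sketch below is a switch-level model with ideal switches; the nested-tuple network encoding and the helper names are assumptions made for illustration. It describes only the pull-down network, derives the pull-up network by swapping series and parallel, and confirms that exactly one network conducts for every input combination.

```python
from itertools import product

# A network is an input name (str) or a tuple ("series" | "parallel", [subnetworks]).

def dual(net):
    """Swap series and parallel to turn a pull-down network into its pull-up dual."""
    if isinstance(net, str):
        return net
    kind, parts = net
    flipped = "parallel" if kind == "series" else "series"
    return (flipped, [dual(p) for p in parts])


def conducts(net, inputs, device):
    """True if the network conducts; an NMOS turns on at 1, a PMOS at 0."""
    if isinstance(net, str):
        return inputs[net] if device == "nmos" else not inputs[net]
    kind, parts = net
    results = [conducts(p, inputs, device) for p in parts]
    return all(results) if kind == "series" else any(results)


def gate_output(pdn, inputs):
    """Evaluate a static CMOS gate built from a PDN and its dual PUN."""
    pull_down = conducts(pdn, inputs, "nmos")
    pull_up = conducts(dual(pdn), inputs, "pmos")
    assert pull_up != pull_down  # never both on (short) or both off (floating)
    return int(pull_up)


def truth_table(pdn, names):
    """Outputs for all input combinations, counting up from all-zeros."""
    return [gate_output(pdn, dict(zip(names, bits)))
            for bits in product([False, True], repeat=len(names))]


NAND2 = ("series", ["A", "B"])
NOR2 = ("parallel", ["A", "B"])
AOI21 = ("parallel", [("series", ["A", "B"]), "C"])  # PDN conducts when A·B + C

print(truth_table(NAND2, "AB"))   # [1, 1, 1, 0]
print(truth_table(NOR2, "AB"))    # [1, 0, 0, 0]
print(truth_table(AOI21, "ABC"))  # [1, 0, 1, 0, 1, 0, 0, 0]
```

Describing only the pull-down network and deriving the pull-up from it mirrors how the gates are reasoned about on paper: the PDN states when the output is low, and the dual takes care of the rest.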

NAND wins in practice because the slow PMOS devices end up in parallel (their drive currents add), while in NOR the PMOS devices are stacked in series (their channel resistances add). For equivalent rise time the NOR's PMOS transistors must be sized roughly twice as wide as NAND's, costing area. As a result, most logic in modern standard-cell libraries is synthesised toward NAND-rich forms — even when the algebraic intent looks more naturally NOR-shaped — and the cell library carries deeper NAND-N variants (NAND2, NAND3, NAND4) than NOR-N.
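The sizing penalty can be put into numbers with a first-order sketch. It assumes that series devices add their resistances linearly, that the reference inverter uses a unit-width NMOS, and that a PMOS needs `mobility_ratio` times the width for equal drive (2.0 here, an assumed round figure). The function names are hypothetical.

```python
def sized_widths(gate: str, n_inputs: int, mobility_ratio: float = 2.0) -> dict:
    """Per-transistor widths (in unit NMOS widths) that match a reference
    inverter's worst-case pull-up and pull-down drive."""
    if gate == "nand":   # NMOS in series, PMOS in parallel
        return {"nmos": float(n_inputs), "pmos": mobility_ratio}
    if gate == "nor":    # NMOS in parallel, PMOS in series
        return {"nmos": 1.0, "pmos": mobility_ratio * n_inputs}
    raise ValueError(f"unknown gate type: {gate}")


def total_width(gate: str, n_inputs: int, mobility_ratio: float = 2.0) -> float:
    """Sum of all transistor widths in the gate, a rough proxy for area."""
    w = sized_widths(gate, n_inputs, mobility_ratio)
    return n_inputs * (w["nmos"] + w["pmos"])


for gate in ("nand", "nor"):
    print(gate, sized_widths(gate, 2), "total width:", total_width(gate, 2))
```

Under these assumptions the NOR-2 needs PMOS devices of width 4 against the NAND-2's width 2, and the gap widens with every added input.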

Note also that every CMOS gate is naturally inverting: the pull-down network conducts to ground when its inputs are high, producing an output low. To get a non-inverting AND or OR, you cascade an inverter — which is why AND2 in a standard-cell library is physically a NAND2 followed by an INV, and is slower than a bare NAND2.

Propagation delay

The first-order delay of a CMOS gate driving a capacitive load is

t_pd ≈ (C_load · V_dd) / I_drive

where C_load is the sum of all capacitance the output must charge (next-stage gate input capacitance + interconnect wire capacitance + the gate's own drain diffusion capacitance) and I_drive is the saturation current of the conducting transistor or stack. The intuition is straightforward: it takes charge Q = C · V_dd to swing the output a full rail, and that charge is delivered at rate I_drive.
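As a rough numerical illustration of the formula, the sketch below plugs in assumed values: a 0.5 fF load, a 0.7 V supply, and a 35 µA drive current. The drive current in particular is a made-up round number chosen for illustration, not a figure from any process design kit.

```python
def propagation_delay(c_load_f: float, vdd_v: float, i_drive_a: float) -> float:
    """First-order CMOS gate delay, t_pd ~ C_load * V_dd / I_drive, in seconds."""
    return c_load_f * vdd_v / i_drive_a


# Assumed illustrative values; the 35 uA drive current is hypothetical.
t_pd = propagation_delay(c_load_f=0.5e-15, vdd_v=0.7, i_drive_a=35e-6)
print(f"t_pd = {t_pd * 1e12:.1f} ps")  # t_pd = 10.0 ps
```

Halving the load or doubling the drive current halves the delay, which is the lever that gate sizing pulls.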

For an inverter at 7 nm driving a fan-out of four identical inverters, a representative delay is about 10 ps. A few quick consequences:

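Wider devices are faster, up to a point. Doubling transistor width doubles I_drive but also roughly doubles the gate's own input capacitance, so the upstream gate sees a heavier load. The net delay of an inverter chain is minimised at a fan-out of about e (~2.7) per stage when the gates' own parasitic capacitance is ignored, and closer to 4 once it is included, which is why fan-out-of-four is the usual benchmark load. The "logical effort" method of Sutherland, Sproull, and Harris generalises this sizing rule to arbitrary gates.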
Series stacks slow the gate. A NAND-3 has three NMOS in series in the pull-down. The bottom transistor has its source at ground and sees the full V_dd as V_gs, but each transistor above it has its source raised by the voltage drop across the lower devices, reducing its V_gs and therefore its drive. At equal device width, the effective drive of a 3-stack is closer to ⅓ that of a single transistor, not full strength.
Wires dominate at scale. A long interconnect can add picofarads of load, far more than any gate. Modern timing closure spends most of its effort on RC-tuning of wires (buffer insertion, repeater spacing) rather than on the gates themselves.
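The fan-out optimum for an inverter chain can be explored numerically. The sketch below uses the usual normalised model in which each stage costs (fan-out + parasitic) delay units and all stages carry equal fan-out. The total load ratio of 10⁴ and the parasitic value of 1 are assumed illustrative inputs, and the search ignores whether the stage count preserves signal polarity.

```python
def chain_delay(total_effort: float, stages: int, parasitic: float) -> float:
    """Delay of an inverter chain in normalised units, equal fan-out per stage."""
    stage_fanout = total_effort ** (1.0 / stages)
    return stages * (stage_fanout + parasitic)


def best_chain(total_effort: float, parasitic: float, max_stages: int = 20):
    """Brute-force the stage count that minimises chain delay."""
    best_n = min(range(1, max_stages + 1),
                 key=lambda n: chain_delay(total_effort, n, parasitic))
    return best_n, total_effort ** (1.0 / best_n)


for p in (0.0, 1.0):
    n, fanout = best_chain(total_effort=1e4, parasitic=p)
    print(f"parasitic={p}: {n} stages, fan-out per stage = {fanout:.2f}")
```

With no parasitic delay the best fan-out per stage lands close to e; with a parasitic delay of one unit it moves up towards 4, and the chain gets shorter.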

Power: switching plus leakage

Total CMOS chip power is

P_total = P_dynamic + P_short_circuit + P_leakage
        ≈ α C V_dd² f  +  small  +  I_leak · V_dd

The dynamic term arises because each 0→1→0 cycle moves C V_dd of charge from the supply to ground. Energy per transition is ½ C V_dd², and there are two transitions per period, so energy per cycle is C V_dd². Multiply by frequency and by the activity factor α (the fraction of clock cycles on which the node actually toggles, typically 0.1–0.3 for random data) to get average dynamic power.

Short-circuit power, the brief crowbar current that flows during the input-slewing window when both PMOS and NMOS are partially on, is usually 5–15% of dynamic power and is normally folded into the dynamic term rather than tracked separately.

Leakage power is the dark side. Subthreshold leakage current scales as exp(−V_t / nV_T), so as V_t was lowered to keep gates fast at lower V_dd, leakage shot up — by 65 nm it accounted for around a third of total power in mobile SoCs, prompting the introduction of high-V_t cells and power-gating. Gate-oxide tunnelling and gate-induced drain leakage (GIDL) add further leakage components at sub-22 nm nodes. FinFETs (introduced at 22 nm by Intel, 16/14 nm industry-wide) and gate-all-around nanosheets (3 nm+) exist primarily because the planar MOSFET could no longer hold its channel off cleanly.
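To get a feel for the exponential, the sketch below evaluates how much subthreshold leakage grows when V_t is lowered by a given amount, using the exp(−V_t / nV_T) dependence. The slope factor n = 1.5 and the 300 K temperature are assumed typical values, not figures for any particular process.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """Thermal voltage V_T = kT/q in volts (about 26 mV at 300 K)."""
    return K_B * temp_k / Q_E


def leakage_ratio(delta_vt_v: float, n: float = 1.5, temp_k: float = 300.0) -> float:
    """Factor by which subthreshold leakage grows when V_t drops by delta_vt_v."""
    return math.exp(delta_vt_v / (n * thermal_voltage(temp_k)))


print(f"V_T at 300 K: {thermal_voltage() * 1e3:.2f} mV")
print(f"Leakage increase for a 100 mV lower V_t: {leakage_ratio(0.1):.1f}x")
```

Under these assumptions, a 100 mV reduction in V_t raises leakage by roughly an order of magnitude, which is why multi-V_t libraries are worth their complexity.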

Worked example: a 1 GHz inverter

Take a single 7 nm inverter with C_load = 0.5 fF (a fan-out of four), V_dd = 0.7 V, and a switching activity α = 0.5 at f = 1 GHz. Then

P_dyn = α C V² f
      = 0.5 · 0.5 × 10⁻¹⁵ · (0.7)² · 1 × 10⁹
      = 0.5 · 0.5 × 10⁻¹⁵ · 0.49 · 10⁹
      ≈ 1.2 × 10⁻⁷ W
      = 0.12 µW per inverter

That is microscopic. But a modern SoC has on the order of 10¹⁰ such switching nodes. At even 10% effective activity (about 0.025 µW per node with the same C, V_dd, and f) that scales to a few hundred watts of dynamic power, which is why power management (clock gating, DVFS, power islands) is the central activity of advanced chip design.
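The arithmetic can be reproduced in a few lines. This sketch reuses the example's values and, as a simplifying assumption, gives every one of the 10¹⁰ nodes the same 0.5 fF load; real nodes vary widely, so the chip-level figure is an order-of-magnitude estimate only.

```python
def dynamic_power(alpha: float, c_load_f: float, vdd_v: float, freq_hz: float) -> float:
    """Average dynamic power P = alpha * C * V^2 * f, in watts."""
    return alpha * c_load_f * vdd_v ** 2 * freq_hz


# Single inverter from the worked example.
p_inv = dynamic_power(alpha=0.5, c_load_f=0.5e-15, vdd_v=0.7, freq_hz=1e9)

# Chip-scale estimate: 1e10 identical nodes at 10% activity (assumed uniform).
p_chip = 1e10 * dynamic_power(alpha=0.1, c_load_f=0.5e-15, vdd_v=0.7, freq_hz=1e9)

print(f"per inverter: {p_inv * 1e6:.2f} uW")
print(f"chip estimate: {p_chip:.0f} W")
```

The estimate is sensitive to each input: because power goes as V², dropping V_dd from 0.7 V to 0.6 V alone cuts it by about a quarter.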

The standard cell library

Real digital design does not draw transistors. Synthesis maps a Verilog or VHDL register-transfer-level description onto a fixed vocabulary of pre-characterised cells called the standard cell library. Each cell:

Has the same height (a "row height"), so cells can be abutted along supply rails.
Implements one Boolean function — INV, BUF, NAND2/3/4, NOR2/3, AND2, OR2, AOI21, OAI22, XOR2, XNOR2, MUX2, half adder, full adder, latch, flip-flop, scan flip-flop, ...
Comes in multiple drive strengths (1×, 2×, 4×, 8× — sometimes finer) so synthesis can size each gate to its load.
Comes in multiple threshold flavours (low-V_t / regular-V_t / high-V_t — sometimes ultra-low and ultra-high) so place-and-route can swap fast-and-leaky cells onto critical paths while putting slow-and-tight cells everywhere else.

A typical commercial library carries 200–800 distinct cells. Each one has been hand-laid for area, characterised across process, voltage, and temperature corners (Liberty .lib files), and abstracted into a physical view for place-and-route (LEF). The whole digital backend (synthesis, place-and-route, static timing analysis, sign-off) assumes this fixed cell vocabulary. Modern foundries (TSMC, Samsung, Intel, GlobalFoundries) offer multiple libraries for the same node, tuned for high-performance, balanced, or low-power use.

Density and the modern node

The bare numbers are worth pausing on. A 7 nm process — TSMC N7 (2018), Samsung 7LPP — fits on the order of 10⁸ transistors per mm². A 100 mm² die therefore holds 10¹⁰ transistors. A standard high-density 6T SRAM cell at 7 nm is about 0.027 µm²; a NAND2 of standard drive is around 0.1 µm². The full Apple M2 SoC, fabricated at TSMC N5 (5 nm), packs about 20 billion transistors into 155 mm². Each one of those transistors is sitting in a CMOS pair, drawing nothing in steady state, switching on demand.
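The density figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the SRAM line counts bit cells alone and ignores array periphery, so it is an upper bound rather than an achievable chip-level density.

```python
TRANSISTORS_PER_MM2_7NM = 1e8   # order-of-magnitude figure quoted above
UM2_PER_MM2 = 1e6               # square micrometres in one square millimetre

# A 100 mm^2 die at 7 nm.
transistors_on_die = TRANSISTORS_PER_MM2_7NM * 100.0

# Average density implied by the Apple M2 figures (20 billion in 155 mm^2).
m2_density = 20e9 / 155.0

# Transistor density inside a 6T SRAM array built from 0.027 um^2 bit cells.
sram_cells_per_mm2 = UM2_PER_MM2 / 0.027
sram_transistors_per_mm2 = 6 * sram_cells_per_mm2

print(f"100 mm^2 die at 7 nm: {transistors_on_die:.1e} transistors")
print(f"M2 average density:   {m2_density:.2e} per mm^2")
print(f"6T SRAM array:        {sram_transistors_per_mm2:.2e} per mm^2")
```

The SRAM array comes out at roughly twice the logic density, which is consistent with caches being the densest structures on a die.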

None of this is possible without complementary logic. NMOS would melt the die. Bipolar would not scale below micron geometries. The Wanlass insight from 1963 — that putting a PMOS and an NMOS in series between rails gives you a zero-static-current switch — is what made dense digital chips a physical possibility.

Variants and extensions

Dynamic logic (domino, NORA). Skips one of the networks (typically the PUN) and uses a precharge-evaluate clocking scheme. Faster and smaller than static CMOS but charge-sharing prone; fell out of fashion at advanced nodes where leakage corrupts the precharged node.
Pass-transistor logic. Uses transistors as switches between signal nodes rather than between rails. Smaller (no full PUN+PDN) but suffers from V_t drops that erode noise margin, and is generally avoided except inside specialised cells like multiplexers.
Transmission gates. A parallel NMOS + PMOS used as an analog switch. Workhorse inside MUX2 and latch cells; passes a full rail in either direction unlike a single transistor.
Adiabatic logic. Recycles the energy normally dumped to ground by ramping V_dd slowly. Theoretically dissipates far less than CV²f but requires non-DC supply rails; remains a research curiosity.
FinFET and GAA nanosheet. Not different logic — same complementary inverter — but new transistor geometries (3D fins from 22/16 nm, gate-all-around stacked nanosheets at 3 nm and below) chosen because the planar MOSFET could no longer suppress leakage at short channel lengths. CMOS the logic style outlived planar CMOS the device.

Where CMOS gates show up

Every digital chip you have ever used. Microprocessors, GPUs, SoCs, FPGAs, microcontrollers, DRAM controllers, flash controllers, network chips, image sensors — all built from CMOS standard cells. The exceptions are vanishingly few (some RF, some power-electronics drivers, some old analog).
SRAM cells. The 6T SRAM bit cell is itself two cross-coupled CMOS inverters plus two access transistors — a tiny complementary logic structure replicated 10⁶ to 10⁹ times per chip in caches.
Sense amplifiers and I/O drivers. Higher-voltage variants (often a separate I/O CMOS device with thicker oxide) drive off-chip signals; on-die they look like very wide-channel CMOS inverters.
Image sensors. CMOS image sensors (named for the fabrication process they share with logic) put a photodiode plus 3–4 transistors in each pixel; the readout is conventional CMOS logic on the same die.
RF transceivers. Modern RF front-ends are heavily CMOS — power amps and mixers built from carefully biased transistors, but the digital baseband is plain CMOS logic.

Common pitfalls

Forgetting CMOS is inherently inverting. Every pull-down network sinks when its inputs are high, so the output goes low. There is no native non-inverting gate; AND and OR are NAND/NOR followed by an inverter. Students often draw an "AND" as two NMOS in series under two PMOS in parallel; that is a NAND, not an AND.
Confusing static vs. dynamic power. "CMOS draws no power" is shorthand for "no static current at the supply rails." It says nothing about switching. At full clock speed, a modern CPU dissipates 50–250 W almost entirely as α C V² f — there is no contradiction with static power being near zero.
Sizing the PMOS = NMOS. Holes have roughly 2× lower mobility than electrons, so PMOS devices are typically drawn 2–2.5× wider than NMOS to balance rise and fall times. Forgetting that asymmetry gives slow, skewed gates.
Ignoring stack height in series PDNs. A NAND-4 has four NMOS in series, dramatically reducing effective drive — and an internal node between them ramps slowly. Most libraries cap stack height at 3–4 and synthesise wider fan-ins from multi-level gates instead.
Treating leakage as negligible. Below 90 nm, leakage is a first-order power component, not an afterthought. Without high-V_t cells and aggressive power gating, idle chips at 7 nm would burn watts doing nothing.
Mixing up bubble conventions. A bubble on the input of a gate denotes an inverter on that input, not a different gate. By De Morgan, an AND drawn with bubbles on its inputs is equivalent to a NOR, and an OR drawn with bubbles on its inputs is equivalent to a NAND; this is a deliberate identity used for schematic readability, but it confuses beginners.
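The bubble identities are easy to get backwards, and just as easy to check exhaustively. The sketch below compares truth tables for the bubbled-input forms; all function names are illustrative.

```python
from itertools import product


def equivalent(f, g) -> bool:
    """True if two 2-input Boolean functions agree on every input combination."""
    return all(f(a, b) == g(a, b) for a, b in product([False, True], repeat=2))


def nand(a, b):
    return not (a and b)


def nor(a, b):
    return not (a or b)


def bubbled_and(a, b):     # AND gate with an inverter on each input
    return (not a) and (not b)


def bubbled_or(a, b):      # OR gate with an inverter on each input
    return (not a) or (not b)


def bubbled_nand(a, b):    # NAND gate with an inverter on each input
    return nand(not a, not b)


print(equivalent(bubbled_and, nor))    # True
print(equivalent(bubbled_or, nand))    # True
print(equivalent(bubbled_nand, nor))   # False: a bubbled-input NAND is an OR
```

With only four input combinations per gate, exhaustive comparison is cheaper than reasoning about the algebra, and it catches the common slip of adding bubbles to a gate that is already inverting.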

Frequently asked questions

Why is CMOS static current ≈ 0?

In any steady-state input, one of the two complementary networks is fully off. In an inverter with input low, the PMOS is on (pulls Vout up to Vdd) and the NMOS is off (no path to ground). With input high, the NMOS is on and the PMOS is off. Either way there is no DC path between Vdd and Gnd, so the only steady current is subthreshold leakage — picoamps per transistor at older nodes, several nA at 7 nm. Current flows only briefly when both transistors are partially on during a switching transition.

How is a CMOS NAND-2 different from a CMOS NOR-2?

They are duals. NAND-2 puts two NMOS in series between the output and ground (both inputs must be high to pull the output low) and two PMOS in parallel between the output and Vdd (either input low pulls the output high). NOR-2 is the opposite: two PMOS in series between Vdd and the output (both inputs must be low to pull the output high) and two NMOS in parallel between the output and ground. NAND-2 is the workhorse because the slow PMOS devices are in parallel; NOR-2 has them in series, making it considerably slower at the same drive strength, which is why most standard-cell libraries lean on NAND-based logic.

What sets the propagation delay of a CMOS gate?

To first order, t_pd ≈ (C_load · V_dd) / I_drive, where C_load is everything the gate has to charge or discharge (next-stage gate capacitance, wire capacitance, output diffusion capacitance) and I_drive is the saturation current of the conducting transistors. Raising V_dd makes the gate faster but burns more switching energy; widening transistors raises I_drive but also raises C_load on whatever drives this gate, so there is an optimum sizing rather than a "wider is always better" rule. At 7 nm a typical inverter has a delay around 10 ps with a fan-out of four.

Where does the P = C V² f formula come from?

Every time a CMOS gate's output goes from 0 to Vdd, the supply delivers C V² of energy: (1/2) C V² ends up stored on the load capacitance and the other (1/2) C V² is dissipated in the PMOS channel. When the output later falls back to 0, the stored (1/2) C V² is dissipated in the NMOS channel. Over a 0→1→0 cycle, total switching energy is therefore C V². If the gate switches at frequency f with activity factor α (fraction of clock cycles on which the node toggles), average dynamic power is P_dyn = α C V² f. Add leakage current times Vdd and you have the full power equation. Switching power dominates at high activity; leakage dominates at idle and at advanced nodes.
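The energy split on a rising transition can be confirmed numerically. The sketch below models the conducting PMOS as a fixed resistor charging the load (a simplifying assumption; a real transistor is nonlinear, but the split does not depend on the resistance) and integrates the supply and resistor energies. The 10 kΩ value is arbitrary.

```python
import math


def charge_energy(c_f, vdd_v, r_ohm, steps=200_000, n_tau=20.0):
    """Integrate an RC charging step from 0 V.

    Returns (supply, stored, dissipated) energy in joules.
    """
    tau = r_ohm * c_f
    dt = n_tau * tau / steps
    e_supply = 0.0
    e_resistor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                        # midpoint rule
        i = (vdd_v / r_ohm) * math.exp(-t / tau)  # charging current
        e_supply += vdd_v * i * dt                # energy drawn from Vdd
        e_resistor += i * i * r_ohm * dt          # heat in the pull-up
    v_final = vdd_v * (1.0 - math.exp(-n_tau))    # capacitor voltage at the end
    e_stored = 0.5 * c_f * v_final ** 2
    return e_supply, e_stored, e_resistor


C, VDD = 0.5e-15, 0.7
supply, stored, heat = charge_energy(C, VDD, r_ohm=10e3)
print(f"supply: {supply / (C * VDD**2):.4f} x CV^2")
print(f"stored: {stored / (C * VDD**2):.4f} x CV^2")
print(f"heat:   {heat / (C * VDD**2):.4f} x CV^2")
```

Changing `r_ohm` alters how long the transition takes but not the energy split, which is why the P = α C V² f formula contains no resistance term.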

Why do chips have low-Vt and high-Vt variants of the same cell?

Threshold voltage trades off speed against leakage. A low-V_t transistor switches faster (its overdrive V_dd − V_t, and hence I_drive, is larger at a given V_dd) but leaks exponentially more current when 'off' because subthreshold leakage scales like exp(−V_t / nV_T). A high-V_t transistor is the opposite: slow but low leakage. Modern libraries provide multiple flavours (low, regular, high, and sometimes ultra-low and ultra-high) at the same logical function and footprint. Place-and-route tools swap them in: low-V_t on the critical path for speed, high-V_t everywhere else to cap idle leakage. This is the dominant lever for trading dynamic frequency against standby power on a finished design.

What is a standard cell, and why does the library have 200+ of them?

A standard cell is a pre-characterised, fixed-height, hand-laid transistor pattern that implements one logical function — INV, NAND2, NAND3, NOR2, AOI21 (and-or-invert), OAI22, MUX2, full adder, latch, flip-flop, scan flip-flop, and many more. Each function comes in multiple drive strengths (1×, 2×, 4×, 8×) and multiple V_t flavours, so a single AND2 may exist as 30+ physical cells. A modern library carries 200–800 such variants. The synthesis tool maps the gate-level netlist onto this library; place-and-route abuts the cells along rows that share Vdd and Gnd rails. The entire digital backend assumes this fixed vocabulary.

Why did CMOS win over NMOS and bipolar?

NMOS logic (popular in the 1970s in parts such as the 8080, 6502, and Z80) used an NMOS pull-down plus a passive resistive or depletion-NMOS pull-up. When the output was low, current flowed continuously through the pull-up to ground, so a 100 000-transistor chip dissipated watts of static power and could not be scaled further. Bipolar TTL had similar problems plus higher minimum supply voltage. CMOS's static-current-near-zero property meant power scaled only with switching activity, so transistor count could grow exponentially without melting the package. Wanlass's 1963 invention remained a niche, low-power technology for well over a decade because building both device types on one die made early CMOS slower and costlier to fabricate than NMOS; once process improvements closed that gap, CMOS swept the industry by the mid-1980s.