Computer Graphics

Rasterization

How a 3D triangle becomes a wall of lit pixels — billions of times a second

Rasterization projects 3D triangles onto the screen and fills the pixels they cover, deciding inside/outside with edge functions and depth with a Z-buffer — the O(triangles × pixels) scan-conversion behind every real-time GPU.

  • CostO(triangles × covered pixels)
  • Inside test3 edge functions, same sign
  • VisibilityZ-buffer, O(1)/pixel
  • PrimitiveTriangle (planar, convex)
  • ThroughputBillions of frags/sec on a GPU

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How rasterization turns triangles into pixels

Everything you see in a real-time game or 3D app is a mesh of triangles. Rasterization is the step that answers a deceptively concrete question for every one of them: which screen pixels does this triangle cover, and what color should each one be? The genius is that it loops over triangles, not pixels — and for each triangle it touches only the pixels under its silhouette, never the whole screen.

The pipeline runs in five stages. First, each vertex is transformed from model space through world, view, and projection matrices into clip space. Second, the GPU performs the perspective divide (divide x, y, z by w) to land in normalized device coordinates, then maps that to integer pixel coordinates in the viewport transform. Third comes triangle setup: compute the bounding box of the projected triangle and the three edge-function coefficients. Fourth, coverage: for every candidate pixel in the bounding box, evaluate the three edge functions; a pixel is inside when all three share the same sign. Fifth, for each covered pixel the GPU produces a fragment — interpolating depth, color, normals, and texture coordinates from the three vertices using barycentric weights — and runs the fragment shader.

The inside test is the heart of it. For a directed edge from vertex A to B, the edge function

E_AB(P) = (P.x − A.x)·(B.y − A.y) − (P.y − A.y)·(B.x − A.x)

is the signed area of the parallelogram spanned by AB and AP — literally the z-component of a 2D cross product. Its sign tells you which side of the edge P lies on. Evaluate all three edges of the triangle; if the signs all match (all positive for a counter-clockwise triangle), P is inside. The beautiful part: each edge function is linear in P, so once you compute it at one corner you advance to the next pixel by adding a constant. Coverage becomes pure integer addition — no multiplies in the inner loop.

Barycentric coordinates do the interpolation

Once a pixel is inside, you need to blend the three vertices' attributes. The three edge functions, normalized by the total triangle area, are the barycentric coordinates (λ₀, λ₁, λ₂) that sum to 1. Any per-vertex attribute — color, depth, a texture coordinate — is interpolated as λ₀·a₀ + λ₁·a₁ + λ₂·a₂. Depth interpolates linearly in screen space (that's why the Z-buffer works), but texture coordinates do not, because perspective is non-linear. The fix is to interpolate u/w, v/w, and 1/w linearly, then divide at each pixel — perspective-correct interpolation. Skip it and textures visibly swim across surfaces, the signature wobble of PlayStation 1-era graphics.

When to rasterize (and when not to)

  • Real-time interaction — games, CAD viewports, AR/VR, anything that must hit 60–120 FPS. Rasterization is the only approach that fits a full frame in ~8–16 ms on consumer hardware.
  • Heavy triangle counts, simple shading — the cost scales with covered pixels, not scene complexity squared, so millions of triangles stay tractable.
  • Hardware acceleration — every GPU since the late 1990s has a fixed-function rasterizer wired directly into silicon; you get it almost for free.

Reach for spatial acceleration structures and ray tracing instead when you need physically accurate global illumination, soft shadows, mirror reflections, or refraction — effects that require following light bounces rasterization can't see. Modern engines do both: rasterize the primary visibility pass, then ray-trace selected effects on top. Offline film renderers favor path tracing because they have minutes per frame, not milliseconds.

Rasterization vs other visibility approaches

RasterizationRay tracingPainter's algorithmRay casting (2.5D)
Loop orderPer trianglePer pixel/rayPer polygon (sorted)Per screen column
VisibilityZ-buffer, per pixelNearest hit along rayBack-to-front overdrawFirst wall hit
CostO(tris × covered px)O(px × log tris) with BVHO(n log n) sort + overdrawO(columns × steps)
Secondary raysNo (needs hacks)NativeNoNo
InterpenetrationHandled per pixelHandledBreaks (cyclic overlap)N/A
HardwareFixed-function on every GPURT cores since 2018CPU, historicalCPU, historical
Typical useGames, real-time 3DFilm, RT reflectionsEarly 3D, 2D spritesWolfenstein/Doom era

The painter's algorithm — sort polygons far-to-near and paint over — was rasterization's predecessor, but it fails on mutually overlapping triangles (a cyclic depth order has no valid sort) and wastes work on overdraw. The Z-buffer, introduced in Ed Catmull's 1974 thesis, replaced it by deciding visibility per pixel instead of per polygon, which is why it survived.

What the numbers actually say

  • The inner loop is nearly free. After triangle setup, each pixel costs 3 additions to step the edge functions plus 1 comparison for coverage — no multiplies, no divides. This is why rasterizers fit in a few thousand transistors of fixed-function logic.
  • A 1080p frame is ~2.07 million pixels. At 60 FPS that's 124 million pixels/sec just for primary coverage, and with overdraw (each pixel touched by several triangles) and multisampling, GPUs routinely process billions of fragments per second.
  • Tiny triangles wreck efficiency. GPUs shade in 2×2 quads for derivative computation, so a triangle smaller than 2×2 pixels still spawns a full quad — up to 4× wasted shading. Sub-pixel triangles (common in over-tessellated meshes) can push useful-shading efficiency below 25%.
  • Z-buffer bandwidth dominates. A 1080p 32-bit depth buffer is ~8 MB; read-modify-write per fragment is why early-Z rejection and hierarchical-Z compression exist — they cut depth traffic by skipping occluded fragments before shading.
  • Setup cost is amortized over coverage. A triangle covering 1 pixel pays the same ~dozen setup operations as one covering 100,000 pixels, so large triangles are dramatically more efficient per pixel.

JavaScript implementation

A complete software rasterizer for one triangle with per-vertex color and a Z-buffer. This is exactly what the hardware does, just slower:

// Signed area of triangle ABP (2× the area). Sign = which side of AB point P is on.
function edge(ax, ay, bx, by, px, py) {
  return (px - ax) * (by - ay) - (py - ay) * (bx - ax);
}

// Rasterize triangle (v0, v1, v2). Each vertex: {x, y, z, r, g, b} in pixel/screen space.
function rasterize(v0, v1, v2, framebuffer, zbuffer, W, H) {
  // Bounding box, clamped to the screen.
  const minX = Math.max(0, Math.floor(Math.min(v0.x, v1.x, v2.x)));
  const maxX = Math.min(W - 1, Math.ceil(Math.max(v0.x, v1.x, v2.x)));
  const minY = Math.max(0, Math.floor(Math.min(v0.y, v1.y, v2.y)));
  const maxY = Math.min(H - 1, Math.ceil(Math.max(v0.y, v1.y, v2.y)));

  const area = edge(v0.x, v0.y, v1.x, v1.y, v2.x, v2.y);
  if (area === 0) return;            // degenerate (zero-area) triangle

  for (let y = minY; y <= maxY; y++) {
    for (let x = minX; x <= maxX; x++) {
      const px = x + 0.5, py = y + 0.5;   // sample at pixel center
      // Edge functions = unnormalized barycentric weights.
      let w0 = edge(v1.x, v1.y, v2.x, v2.y, px, py);
      let w1 = edge(v2.x, v2.y, v0.x, v0.y, px, py);
      let w2 = edge(v0.x, v0.y, v1.x, v1.y, px, py);

      // Inside iff all three share the sign of the total area.
      if ((w0 >= 0) === (area > 0) &&
          (w1 >= 0) === (area > 0) &&
          (w2 >= 0) === (area > 0)) {
        const l0 = w0 / area, l1 = w1 / area, l2 = w2 / area;
        const z = l0 * v0.z + l1 * v1.z + l2 * v2.z;   // depth interpolates linearly
        const i = y * W + x;
        if (z < zbuffer[i]) {             // Z-test: nearer wins
          zbuffer[i] = z;
          framebuffer[i * 3]     = l0 * v0.r + l1 * v1.r + l2 * v2.r;
          framebuffer[i * 3 + 1] = l0 * v0.g + l1 * v1.g + l2 * v2.g;
          framebuffer[i * 3 + 2] = l0 * v0.b + l1 * v1.b + l2 * v2.b;
        }
      }
    }
  }
}

Two details mirror real hardware. The (w >= 0) === (area > 0) comparison handles either winding (clockwise or counter-clockwise) without flipping vertices, and sampling at the pixel center (+ 0.5) matches the GPU's top-left fill convention that prevents adjacent triangles from double-drawing or leaving gaps along a shared edge.

Python implementation

import math

def edge(ax, ay, bx, by, px, py):
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax)

def rasterize(v0, v1, v2, frame, zbuf, W, H):
    """Each vertex is a dict: {'x','y','z','r','g','b'} in screen space."""
    min_x = max(0, math.floor(min(v0['x'], v1['x'], v2['x'])))
    max_x = min(W - 1, math.ceil(max(v0['x'], v1['x'], v2['x'])))
    min_y = max(0, math.floor(min(v0['y'], v1['y'], v2['y'])))
    max_y = min(H - 1, math.ceil(max(v0['y'], v1['y'], v2['y'])))

    area = edge(v0['x'], v0['y'], v1['x'], v1['y'], v2['x'], v2['y'])
    if area == 0:                      # degenerate triangle
        return

    pos = area > 0
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5
            w0 = edge(v1['x'], v1['y'], v2['x'], v2['y'], px, py)
            w1 = edge(v2['x'], v2['y'], v0['x'], v0['y'], px, py)
            w2 = edge(v0['x'], v0['y'], v1['x'], v1['y'], px, py)
            if (w0 >= 0) == pos and (w1 >= 0) == pos and (w2 >= 0) == pos:
                l0, l1, l2 = w0 / area, w1 / area, w2 / area
                z = l0 * v0['z'] + l1 * v1['z'] + l2 * v2['z']
                i = y * W + x
                if z < zbuf[i]:
                    zbuf[i] = z
                    frame[i] = (
                        l0 * v0['r'] + l1 * v1['r'] + l2 * v2['r'],
                        l0 * v0['g'] + l1 * v1['g'] + l2 * v2['g'],
                        l0 * v0['b'] + l1 * v1['b'] + l2 * v2['b'],
                    )

For perspective-correct attributes (textures), store u/w, v/w, and 1/w per vertex, interpolate all three with the barycentric weights, then divide the first two by the interpolated 1/w at each pixel. Depth (z) is the exception that interpolates correctly in screen space directly.

Variants worth knowing

Scanline rasterization. The classic CPU technique: sort edges, walk one scanline at a time, and fill the span between the left and right active edges. Cache-friendly and branch-light, but harder to parallelize than the bounding-box edge-function method GPUs use.

Tiled / binned rasterization. Split the screen into tiles (e.g. 16×16) and bin each triangle into the tiles it overlaps. Mobile GPUs (PowerVR, Apple, Mali, Adreno) rasterize tile-by-tile so the depth and color buffers fit in fast on-chip memory, slashing external bandwidth — the foundation of deferred tile-based rendering.

Conservative rasterization. A fragment is generated if the triangle touches a pixel at all, not just its center. Essential for voxelization, GPU collision detection, and some shadow techniques where missing a sliver of coverage causes leaks.

Hierarchical-Z and early-Z. Test depth before shading (early-Z) and against a coarse per-tile max depth (Hi-Z) to reject occluded fragments cheaply, avoiding the cost of running an expensive fragment shader on pixels that lose the depth test.

Visibility buffer (deferred shading's cousin). Rasterize only triangle IDs and barycentric data into a thin G-buffer, then shade in a separate full-screen pass. Decouples geometry density from shading cost — increasingly common for scenes with millions of micro-triangles.

Common bugs and edge cases

  • Affine texture interpolation. Interpolating u, v directly instead of u/w, v/w makes textures warp and swim under perspective. The fix is one extra divide per pixel.
  • Fill-rule cracks and double-draw. Without a consistent tie-break for pixels exactly on a shared edge (the top-left rule), seams between adjacent triangles either show background pixels or get drawn twice, causing blending artifacts.
  • Z-fighting. Two coplanar surfaces at nearly equal depth flicker because limited Z-buffer precision can't separate them. Mitigate with a depth bias (polygon offset), a wider near plane, or a reversed-Z floating-point depth buffer.
  • Missing near-plane clipping. Triangles crossing behind the camera produce a negative w; dividing by it flips coordinates and smears geometry across the screen. You must clip against the near plane before the perspective divide.
  • Tiny / sub-pixel triangles. Over-tessellation spawns triangles smaller than the 2×2 quad granularity, collapsing shading efficiency. Level-of-detail meshes and cluster culling exist to keep triangle size sane.
  • Forgetting back-face culling. Drawing both sides of every triangle roughly doubles rasterization work; cull triangles whose edge-function area sign indicates they face away from the camera.

Frequently asked questions

What is the difference between rasterization and ray tracing?

Rasterization loops over triangles and asks 'which pixels does this triangle cover?' Ray tracing loops over pixels and asks 'which triangle does this ray hit?' Rasterization is far cheaper per frame because it processes each triangle once and uses a Z-buffer for visibility, but it can't natively follow secondary rays, so reflections, refraction, and accurate shadows need extra passes or hacks.

Why do GPUs rasterize triangles instead of quads or polygons?

Three points are always coplanar and always convex, so a triangle's interior is unambiguous and its three edge functions are linear. Quads can be non-planar and can fold, making interpolation ambiguous. Any polygon can be split into triangles, so hardware standardizes on the simplest primitive that interpolates cleanly.

How does the edge function decide if a pixel is inside a triangle?

For each edge from vertex A to B, the edge function E(P) = (P.x − A.x)(B.y − A.y) − (P.y − A.y)(B.x − A.x) is the signed area of the parallelogram, i.e. the sign of the cross product. A pixel is inside when all three edge functions share the same sign. Each function is linear in P, so you compute it once per triangle and add a constant step per pixel.

What is the Z-buffer and why is it needed?

The Z-buffer (depth buffer) stores the closest depth seen so far at each pixel. Because rasterization draws triangles in arbitrary order, a fragment is only written if its interpolated depth is nearer than the stored value. This solves hidden-surface removal per pixel in O(1), replacing the older painter's algorithm that sorted whole polygons and broke on interpenetration.

Why does texture mapping need perspective-correct interpolation?

Linearly interpolating texture coordinates in screen space is wrong because perspective projection is non-linear: equal steps in screen space are unequal steps along the surface. You must interpolate u/w, v/w, and 1/w linearly, then divide at each pixel. Skipping this produces the warping swim seen in early PlayStation 1 textures, which interpolated affinely.

What causes jagged edges in rasterized images and how is it fixed?

A pixel is treated as either fully inside or fully outside a triangle, so slanted edges become staircases (aliasing). Multisample anti-aliasing (MSAA) takes several coverage samples per pixel but shades once; supersampling renders at higher resolution then downsamples; post-process methods like FXAA and temporal AA blur or accumulate across frames. Each trades quality for memory or temporal stability.