Computer Graphics

Ray Tracing

Follow a single beam of light backward, and the whole image falls into place

Ray tracing renders an image by shooting a ray from the camera through every pixel into the scene, finding the nearest surface hit, then spawning shadow, reflection, and refraction rays to compute physically accurate color.

Naive cost per rayO(n) objects
With a BVHO(log n)
Path-trace noisefalls as 1/√N
Rays per pixel (offline)100–10,000
Recursion depthtypically 4–16 bounces

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How ray tracing builds an image

The deep trick of ray tracing is that it runs the camera in reverse. In the real world, a light bulb sprays photons everywhere; a few bounce off surfaces, change color, and eventually land on your retina or a camera sensor. Simulating that forward is hopeless — for every photon that reaches the lens, billions miss entirely. So ray tracing inverts the problem: it starts at the eye and asks, for each pixel, which surface is visible here, and how is it lit?

The core loop is brutally simple. Place a virtual camera at a point in space and a virtual image plane in front of it, one cell per pixel. For each pixel, build a primary ray from the camera through that cell. March that ray into the scene and find the nearest object it strikes — the first t > 0 along the ray. That hit point, its surface normal, and its material are everything you need to shade the pixel.

Shading is where the recursion begins. At the hit point you ask three more questions, each answered by spawning new rays:

Is it lit? Fire a shadow ray from the hit point toward each light. If it reaches the light unobstructed, add that light's contribution; if it hits something first, the point is in shadow.
Is it shiny? If the surface is reflective, fire a reflection ray mirrored about the normal and recurse — the color it returns is blended into this pixel.
Is it transparent? If the surface refracts, bend a refraction ray through it using Snell's law and recurse again.

This recursion is the difference between flat ray casting and full ray tracing. Each bounce can spawn more bounces — a mirror reflecting glass reflecting a mirror — and you stop when a ray escapes the scene, hits a non-reflective surface, or reaches a maximum recursion depth (a glass-and-mirror room would otherwise recurse forever).

The heart of it: ray–object intersection

Every ray is a parametric line, P(t) = O + t·D, where O is the origin and D the direction. Intersecting it with a shape means solving for t. The sphere case is the classic teaching example because it reduces to a quadratic:

|O + tD − C|² = r²            (point on sphere of center C, radius r)
(D·D)t² + 2D·(O−C)t + (O−C)·(O−C) − r² = 0
            a t² + b t + c = 0

The discriminant b² − 4ac tells the story: negative means a miss, zero means a tangent graze, positive means two hits (entry and exit). Take the smaller positive root — that's the nearest surface in front of the camera. Triangles use the Möller–Trumbore algorithm; planes and boxes have their own closed forms. A renderer is, at bottom, a pile of these intersection routines plus the recursion that glues them together.

When ray tracing is the right tool

Photorealism is the product — film VFX, architectural visualization, product renders. Pixar, ILM, and every major studio render with path tracers because correctness beats speed when you have a render farm and overnight.
Effects that fake badly in rasterization — accurate reflections, refraction through glass and water, soft shadows, ambient occlusion, and global illumination all emerge naturally from following light, instead of being bolted on as screen-space hacks.
Hybrid real-time pipelines — modern games rasterize the bulk of the frame and trace rays only for reflections, shadows, or GI, then denoise. This is what "RTX on" actually means.

When not to reach for it: tight real-time budgets without RT hardware, where rasterization's O(triangles) pipeline is still an order of magnitude cheaper; and stylized or flat-shaded looks where physical accuracy buys you nothing. Rasterization answers "where does this triangle land on screen?" and is embarrassingly parallel; ray tracing answers "what does this pixel see?" and pays for that generality.

Ray tracing vs the alternatives

	Rasterization	Ray casting	Whitted ray tracing	Path tracing	Photon mapping
Direction of work	geometry → pixels	eye → first hit	eye, recursive	eye, stochastic	light + eye passes
Reflections / refraction	screen-space hacks	none	sharp, exact	full, physical	full, physical
Global illumination	baked / probes	none	no (direct only)	yes (converges)	yes (caustics shine)
Soft shadows	shadow maps	hard only	many shadow rays	free via sampling	free
Image quality vs time	instant, approximate	instant, flat	seconds, sharp CGI	noisy → clean as 1/√N	fast convergence, biased
Where it's used	games, real-time	early demos, voxels	1980s–90s CGI	film, offline default	caustics, render research

The headline split is rasterization vs ray tracing. Rasterization is faster because it never asks the hard global question — it projects triangles and resolves overlap with a depth buffer. Ray tracing is more correct because every pixel queries the whole scene. Whitted's 1980 method made reflections and refractions tractable but ignored indirect light; path tracing (Kajiya, 1986) unified everything under the rendering equation at the cost of noise.

What the numbers actually say

The naive cost is staggering. Cost is roughly pixels × samples × objects × bounces intersection tests. A 1920×1080 frame at 1000 samples per pixel, against a 1-million-triangle scene, at 8 bounces, is about 1.6 × 10¹⁶ ray–triangle tests for a single image. Brute force is a non-starter.
A BVH turns O(n) into O(log n) per ray. Replacing a linear scan of 1M triangles (≈10⁶ tests) with a balanced bounding-volume hierarchy drops the per-ray work to ≈20 box tests plus a handful of triangle tests — a roughly 10,000× reduction in the inner loop.
Noise falls as 1/√N. Halving the visible grain in a path-traced image requires 4× the samples. Going from 100 to a clean 10,000 samples is a 100× cost increase to cut noise by 10×. This square-root wall is why denoisers exist.
Hardware moved the goalposts. Dedicated RT cores in modern GPUs perform on the order of billions of ray–triangle intersections per second, making real-time hybrid ray tracing at 1–4 samples per pixel (then AI-denoised) viable at 60+ FPS — something that took minutes per frame on CPUs two decades ago.

JavaScript implementation

A minimal Whitted-style tracer: primary rays, nearest-hit, diffuse shading with a shadow ray, and one level of mirror reflection. It renders into an ImageData buffer you can blit to a canvas.

const sub = (a, b) => [a[0]-b[0], a[1]-b[1], a[2]-b[2]];
const add = (a, b) => [a[0]+b[0], a[1]+b[1], a[2]+b[2]];
const scale = (a, s) => [a[0]*s, a[1]*s, a[2]*s];
const dot = (a, b) => a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
const norm = a => scale(a, 1 / Math.sqrt(dot(a, a)));

// Sphere: { c: center, r: radius, color, mirror: 0..1 }
function hitSphere(O, D, s) {
  const oc = sub(O, s.c);
  const a = dot(D, D), b = 2*dot(oc, D), c = dot(oc, oc) - s.r*s.r;
  const disc = b*b - 4*a*c;
  if (disc < 0) return Infinity;          // miss
  const t = (-b - Math.sqrt(disc)) / (2*a);
  return t > 1e-3 ? t : Infinity;          // nearest hit in front
}

function nearest(O, D, scene) {
  let best = { t: Infinity, s: null };
  for (const s of scene.spheres) {
    const t = hitSphere(O, D, s);
    if (t < best.t) best = { t, s };
  }
  return best;
}

function trace(O, D, scene, depth) {
  const { t, s } = nearest(O, D, scene);
  if (!s) return [10, 14, 26];             // background (the void)

  const P = add(O, scale(D, t));           // hit point
  const N = norm(sub(P, s.c));             // surface normal
  const L = norm(sub(scene.light, P));     // toward the light

  // Shadow ray: is anything between P and the light?
  const shadow = nearest(add(P, scale(N, 1e-3)), L, scene);
  const lit = shadow.s ? 0 : Math.max(0, dot(N, L));
  let col = scale(s.color, 0.1 + 0.9*lit); // ambient + diffuse

  // One bounce of mirror reflection
  if (s.mirror > 0 && depth > 0) {
    const R = sub(D, scale(N, 2*dot(D, N)));
    const refl = trace(add(P, scale(N, 1e-3)), norm(R), scene, depth - 1);
    col = add(scale(col, 1 - s.mirror), scale(refl, s.mirror));
  }
  return col;
}

function render(ctx, w, h, scene) {
  const img = ctx.createImageData(w, h);
  const cam = [0, 0, -5];
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      // map pixel to a point on the image plane at z = 0
      const px = (x / w - 0.5) * 2, py = (0.5 - y / h) * 2;
      const D = norm(sub([px, py, 0], cam));
      const c = trace(cam, D, scene, 3);   // max depth 3
      const i = (y*w + x) * 4;
      img.data[i] = c[0]; img.data[i+1] = c[1]; img.data[i+2] = c[2]; img.data[i+3] = 255;
    }
  }
  ctx.putImageData(img, 0, 0);
}

Two details that trip everyone up are already in here. The 1e-3 offset along the normal when spawning shadow and reflection rays is the shadow-acne fix: without it, a ray re-intersects the very surface it started on at t ≈ 0 and the object shadows itself. And the depth counter is what stops infinite recursion between two mirrors.

Python implementation

The same tracer with NumPy, vectorized over all pixels at once — instead of a per-pixel loop, every operation is a batched array op, which is how Python becomes fast enough to actually use.

import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def intersect_sphere(O, D, center, radius):
    # O: (N,3) ray origins, D: (N,3) directions  ->  (N,) distances
    oc = O - center
    b = 2.0 * np.einsum('ij,ij->i', oc, D)
    c = np.einsum('ij,ij->i', oc, oc) - radius * radius
    disc = b * b - 4.0 * c                     # a = 1 since D is unit
    t = np.full(O.shape[0], np.inf)
    hit = disc >= 0
    sq = np.sqrt(np.maximum(disc, 0.0))
    root = (-b - sq) / 2.0
    valid = hit & (root > 1e-3)
    t[valid] = root[valid]
    return t

def render(w, h, spheres, light):
    # build one primary ray per pixel
    cam = np.array([0.0, 0.0, -5.0])
    xs = (np.arange(w) / w - 0.5) * 2.0
    ys = (0.5 - np.arange(h) / h) * 2.0
    gx, gy = np.meshgrid(xs, ys)
    pts = np.stack([gx.ravel(), gy.ravel(), np.zeros(w * h)], axis=-1)
    D = normalize(pts - cam)
    O = np.broadcast_to(cam, D.shape).copy()

    # nearest hit across all spheres, all rays at once
    best_t = np.full(O.shape[0], np.inf)
    best_i = np.full(O.shape[0], -1, dtype=int)
    for i, s in enumerate(spheres):
        t = intersect_sphere(O, D, s['c'], s['r'])
        closer = t < best_t
        best_t[closer] = t[closer]
        best_i[closer] = i

    color = np.tile(np.array([10, 14, 26]), (O.shape[0], 1)).astype(float)
    for i, s in enumerate(spheres):
        m = best_i == i
        P = O[m] + D[m] * best_t[m, None]
        N = normalize(P - s['c'])
        L = normalize(light - P)
        diffuse = np.clip(np.einsum('ij,ij->i', N, L), 0, 1)
        color[m] = s['color'] * (0.1 + 0.9 * diffuse)[:, None]

    return np.clip(color, 0, 255).astype(np.uint8).reshape(h, w, 3)

Vectorizing the whole image is the standard speed move in any Python tracer — the per-ray Python overhead would otherwise dominate. The same idea scales: production GPU tracers run thousands of rays in lockstep across SIMD lanes for exactly this reason.

Variants worth knowing

Path tracing. Instead of Whitted's fixed reflection/refraction rays, scatter each ray randomly according to the surface's BRDF and average many such paths. It solves the full rendering equation and produces unbiased global illumination — color bleeding, soft shadows, indirect light — at the cost of noise that fades as 1/√N. This is the offline industry default.

Bidirectional path tracing. Trace paths from both the camera and the lights, then connect them. It dramatically improves convergence for scenes where light reaches the camera only through hard-to-find paths — light through a keyhole, caustics under water.

Photon mapping. A two-pass method: first shoot photons from the lights and store where they land, then trace from the eye and gather nearby photons. Excellent at caustics — the bright dancing patterns at the bottom of a pool — which pure backward tracing struggles to find.

Distribution (stochastic) ray tracing. Cook's 1984 idea: jitter rays across a pixel, a lens aperture, a time interval, and an area light to get antialiasing, depth of field, motion blur, and soft shadows — all by sampling distributions rather than firing one ray.

Cone and beam tracing. Replace the infinitely thin ray with a cone or beam that has area, so a single primary "ray" can cover a pixel's footprint and antialias analytically. Mathematically heavier per intersection, rarely used in practice but conceptually clean.

Common bugs and edge cases

Shadow acne / self-intersection. Spawning a secondary ray exactly at the hit point makes it re-hit the same surface at t ≈ 0. Fix by offsetting the origin a tiny epsilon along the normal, or by ignoring hits below a small t_min.
Forgetting to normalize the ray direction. Many intersection formulas assume |D| = 1. An unnormalized direction quietly corrupts your t values and lighting; the image looks subtly wrong with no error.
Picking the wrong quadratic root. Take the smallest positive root, not the smaller of two roots — if the camera is inside a sphere, the near root is negative and you must skip it.
Infinite recursion in mirrored scenes. Two parallel mirrors recurse forever without a depth cap; always pass and decrement a maximum bounce count.
Total internal reflection in refraction. When light hits a dense-to-thin boundary past the critical angle, Snell's law has no solution — the term under the square root goes negative. You must detect this and reflect instead of refract, or you get black holes in glass.
Gamma and clamping. Light is linear; displays are not. Forgetting to gamma-correct makes everything muddy and dark, and clamping HDR colors to 0–255 too early blows out highlights.
No acceleration structure. A correct tracer with a linear object scan is correct but unusably slow past a few hundred objects. A BVH isn't an optimization you add later — it's table stakes for any real scene.

Frequently asked questions

Why do we trace rays backward from the camera instead of forward from the light?

A light source emits photons in every direction, and only a vanishing fraction ever reach the camera — forward tracing would waste almost all of its work. Backward tracing starts from the pixels we actually need to fill, so every ray contributes to the final image. The catch: backward tracing can't naturally capture effects that depend on where light came from, like caustics, which is why path tracing and bidirectional methods exist.

What's the difference between ray casting, ray tracing, and path tracing?

Ray casting shoots one ray per pixel and shades the first hit — no secondary rays, so no reflections or refractions. Classic (Whitted) ray tracing adds recursive shadow, reflection, and refraction rays for sharp mirror and glass effects. Path tracing replaces the deterministic recursion with random sampling of the full rendering equation, converging to photorealistic global illumination as you average thousands of samples per pixel.

How expensive is ray tracing, and why is it slow?

A naive tracer tests every ray against every object: O(pixels × samples × objects × bounces). A 1920×1080 image at 1000 samples against 1M triangles with 8 bounces is roughly 16 quadrillion intersection tests. Acceleration structures like a BVH cut the per-ray cost from O(n) to O(log n), and modern GPUs add dedicated ray-triangle intersection hardware (RT cores) that does billions of tests per second.

What is a BVH and why does ray tracing need one?

A bounding volume hierarchy is a tree of nested boxes wrapping the scene's geometry. A ray first tests the cheap box at each node; if it misses, the entire subtree is skipped. This turns the per-ray search from a linear scan of every triangle into a logarithmic descent, the single biggest speedup in any production renderer. KD-trees and grids solve the same problem with different trade-offs.

Why do ray-traced images look grainy before they finish?

Path tracing estimates each pixel by averaging random light paths. With few samples the estimate is noisy — visible as grain — and noise falls only as 1/√N, so halving the noise needs four times the samples. That's why real-time ray tracing leans heavily on AI denoisers that reconstruct a clean image from a sparse 1-to-4-samples-per-pixel signal.

Does ray tracing handle shadows automatically?

Yes — that's one of its elegances. At each surface hit you fire a shadow ray toward each light. If that ray hits anything before reaching the light, the point is in shadow. Soft shadows come for free in path tracing by sampling random points on an area light; in classic ray tracing they require firing many shadow rays per light and averaging the visibility.