Computer Graphics
GPU Ray Tracing
Following photons in reverse — RT cores trace billions of rays per second
GPU ray tracing simulates light by shooting rays from the camera, finding the closest scene-geometry intersection, and recursively bouncing rays toward lights and reflective surfaces. NVIDIA RT Cores (introduced 2018, Turing) accelerate two operations: (1) bounding-volume hierarchy (BVH) traversal, which prunes geometry tests; and (2) ray-triangle intersection. An RTX 4090 hits ~191 SMs × ~10 GRays/s aggregate. Real-time games use a hybrid pipeline: rasterization for primary visibility, ray tracing for shadows, reflections, ambient occlusion. Tensor cores then "denoise" the noisy 1-sample-per-pixel output back to crisp imagery.
- RT cores per SM1 (Turing), 2 (Ada)
- Aggregate rays~10 GRays/s (RTX 4090)
- BVH traversalHardware-accelerated since Turing
- Acceleration structureBVH, 2-wide or 4-wide branching
- DenoisingSVGF, ReSTIR, OptiX OIDN
- Hybrid pipeline ratioTypically 1–2 rays/pixel/frame
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
Why GPU ray tracing matters
- Photorealistic reflections. Rasterized screen-space reflections fail outside the camera frustum; ray tracing captures off-screen reflections and inter-object bounces accurately.
- Accurate shadows. Shadow maps suffer from aliasing and contact-shadow gaps; ray-traced shadows resolve hard contacts and area-light penumbras correctly.
- Global illumination. Indirect light bouncing off walls and ceilings — previously baked into static lightmaps — becomes dynamic with RTXGI and similar systems.
- Cyberpunk 2077 path tracing mode. Released April 2023; first AAA shipping path tracer, replaces virtually all rasterized lighting passes with rays.
- RTX Remix. NVIDIA's mod platform for legacy games (Portal RTX, 2022) replaces fixed-function lighting with full path tracing.
- Beyond graphics. Ray traversal hardware accelerates audio occlusion (ray-cast-based reverb), physics queries (raycasting in robotics and game AI), and even some scientific simulations.
- Lumen and Nanite (Unreal 5). Software ray tracing in screen space and SDFs falls back to hardware RT for off-screen and far-distance traces.
Common misconceptions
- "Ray tracing replaces rasterization." Almost every shipping game uses a hybrid pipeline. Primary visibility (what the camera sees first) is rasterized for speed; ray tracing handles secondary effects like reflections, shadows, and AO. Pure path-traced modes are the exception.
- "More rays = always better." Doubling rays per pixel halves variance only by 1 over square-root of 2. Beyond a point, denoising quality dominates over raw sample count; a smart 1-spp signal denoises better than a naive 4-spp signal.
- "DLSS is ray tracing." No — DLSS is neural upscaling that runs on Tensor Cores. It's frequently bundled with RT to recover performance, but the technologies are independent. You can use DLSS without ray tracing and vice versa.
- "It only matters for graphics." Ray traversal is used for audio occlusion, AI line-of-sight checks, robotics path planning, and physics raycasts. Any "what's the closest thing along this direction" query benefits from BVH-accelerated intersection.
- "RT cores do shading." They don't. RT cores only do traversal and intersection; shading happens in regular CUDA cores via the closest-hit and any-hit shaders bound to the ray pipeline.
- "BVH is built per frame." The bottom-level acceleration structure (BLAS) is built once per static mesh; the top-level (TLAS) is rebuilt per frame as objects move. Skinned meshes need BLAS refits, much cheaper than full rebuilds.
- "AMD and Intel can't do it." They can — AMD RDNA 2/3/4 has Ray Accelerators, Intel Arc has Ray Tracing Units. NVIDIA's lead is in mature denoisers (DLSS RR), not in raw RT throughput per dollar.
Frequently asked questions
What is a ray vs a path?
A ray is a single half-line with an origin and direction. A path is a sequence of connected rays formed by recursively bouncing off surfaces. Whitted-style ray tracing (1980) traced primary rays plus deterministic reflection and shadow rays. Modern path tracing samples random bounces according to the rendering equation, accumulating thousands of paths per pixel for noise-free global illumination. Real-time engines fire 1 to 4 paths per pixel per frame and rely on denoising to reconstruct.
How does the BVH speed up ray-triangle tests?
Without acceleration, every ray would test every triangle: a scene with 5 million triangles requires 5 million tests per ray. A bounding-volume hierarchy wraps groups of triangles in axis-aligned boxes, and groups of boxes in larger boxes, forming a tree. A ray descends only into boxes it intersects. This brings cost from O(N) to O(log N), so 5 million triangles take roughly 25 box tests plus a handful of triangle tests per ray. Modern GPU BVHs are 2-wide or 4-wide for SIMD efficiency.
What does the RT core actually do that a CUDA core can't?
An RT core is fixed-function hardware that performs BVH traversal and ray-triangle intersection in parallel with the shader core. CUDA cores can do the same math, but each box test takes dozens of instructions; the RT core does it in a few clocks via dedicated logic. On Turing, one RT core per SM yields roughly 10 gigarays per second of intersection throughput on top-end parts. Ada (RTX 40-series) doubles RT core capability per SM and adds Opacity Micromap and Displaced Micro-Mesh engines.
Why is denoising needed?
Real-time path tracing fires 1 to 4 random paths per pixel; the resulting image is extremely noisy because Monte Carlo integration converges as 1 over square-root of samples. Reaching offline-quality noise levels would need 1000+ paths per pixel, far beyond the millisecond budget. Denoisers reconstruct a clean image: SVGF (spatiotemporal variance-guided filtering) reuses temporal samples, ReSTIR resamples lights, and OptiX OIDN uses a trained neural network. NVIDIA Ray Reconstruction integrates denoising directly into DLSS 3.5+.
How does DLSS relate to ray tracing?
DLSS is image upscaling, not ray tracing. The renderer computes a low-resolution image (say 1080p) and DLSS uses Tensor Cores plus motion vectors and a trained neural network to reconstruct a 4K frame. Ray tracing and DLSS are complementary: trace rays at low resolution, denoise, then upscale. DLSS 3 adds frame generation, interpolating an entirely synthetic frame between two rendered ones. DLSS 3.5 Ray Reconstruction replaces traditional denoisers with a single neural network.
Why don't pure path-traced games run at 60 fps?
Path tracing at 4K with 2 paths per pixel generates 16.6 million rays per frame, and each path bounces 4 to 8 times, so the GPU traces over 100 million ray-scene queries per frame. Even at 10 GRays per second, that's 10 ms just for intersection plus shading, denoising, BVH refit for animated geometry, and rasterized passes. Cyberpunk 2077 path tracing hits 16 to 20 fps native at 4K on RTX 4090; DLSS 3 frame generation lifts it to 60-100 fps by upscaling and interpolating.