Calculus
Partial Derivatives
Differentiate one variable at a time — the building block of every multivariable result
A partial derivative ∂f/∂x of a multivariable function f(x, y, ...) is the rate of change of f with respect to x while every other variable is held fixed. Formally, ∂f/∂x = lim_{h→0} [f(x+h, y, ...) − f(x, y, ...)] / h. Geometrically, it is the slope of the curve formed by slicing the graph of f with the plane y = constant. Partial derivatives are the building blocks of the gradient, the Jacobian, the Hessian, every partial differential equation, and most of multivariable calculus.
- Notation∂f/∂x, f_x, D_x f, ∂_x f
- Definition∂f/∂x = lim_{h→0} [f(x+h, y, …) − f(x, y, …)] / h
- Holds fixedAll variables except x
- Mixed partials equalIf continuous (Clairaut's theorem)
- Symbol introducedLegendre, 1786; popularised by Jacobi, 1841
- Used inGradient, Jacobian, Hessian, PDEs, optimisation
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The limit definition
For a function f(x, y) of two variables, the partial derivative with respect to x at the point (x, y) is the limit:
∂f/∂x = lim_{h → 0} [f(x + h, y) − f(x, y)] / h
Compare this with the single-variable derivative:
df/dx = lim_{h → 0} [f(x + h) − f(x)] / h
The forms are identical. The difference is what the function is — a multivariable f has more arguments, but during the limit we vary only one of them, holding y (and any other variables) frozen. Practically, you compute partial derivatives by treating the held variables as constants and differentiating with the usual single-variable rules.
The partial with respect to y is defined symmetrically:
∂f/∂y = lim_{h → 0} [f(x, y + h) − f(x, y)] / h
For functions of three or more variables, every partial derivative ∂f/∂x_i has the same form: vary x_i while every other variable stays fixed.
Geometric interpretation
Plot z = f(x, y) as a surface in three-dimensional space. Slicing the surface with the plane y = y₀ (a vertical plane parallel to the xz-plane through y = y₀) produces a curve. That curve is the graph of g(x) = f(x, y₀) — a single-variable function of x. Its slope at x = x₀ is the partial derivative ∂f/∂x at (x₀, y₀).
Slicing with x = x₀ produces another curve, whose slope at y₀ is ∂f/∂y. Each partial derivative is a slope along one coordinate direction at the chosen point. Sliding the slicing plane around the surface gives the partial derivative as a function — the original surface's "directional slopes" along the x and y axes.
Worked example — a simple computation
Let f(x, y) = x² y + 3xy² + sin(x). Compute the partials.
Treat y as a constant. Then:
∂f/∂x = 2xy + 3y² + cos(x)
Treat x as a constant. Then:
∂f/∂y = x² + 6xy
Each derivative obeys the same single-variable rules — power rule, sum rule, the derivatives of sin, ln, exp — but the "constant" you ignore changes depending on which variable you are differentiating against. Notice that ∂(sin x)/∂y = 0 because sin(x) is treated as a constant when y is the variable.
Higher-order and mixed partials
Partial derivatives of partial derivatives:
∂²f/∂x² = ∂/∂x (∂f/∂x)
∂²f/∂y² = ∂/∂y (∂f/∂y)
∂²f/∂x∂y = ∂/∂x (∂f/∂y) ("mixed partial")
∂²f/∂y∂x = ∂/∂y (∂f/∂x) (the other mixed partial)
Clairaut's theorem (also called Schwarz's theorem on the equality of mixed partials) says ∂²f/∂x∂y = ∂²f/∂y∂x whenever both exist and are continuous near the point. For any function you can write down using elementary operations (polynomial, trig, exp, log, ...) the mixed partials are equal. Pathological counterexamples exist where the second partials are discontinuous, but they require deliberate construction.
For our running example f(x, y) = x²y + 3xy² + sin(x):
∂²f/∂x∂y = ∂/∂x (x² + 6xy) = 2x + 6y
∂²f/∂y∂x = ∂/∂y (2xy + 3y² + cos x) = 2x + 6y ✓ equal
Partial vs total derivative — key distinctions
| Partial derivative ∂f/∂x | Total derivative df/dx | |
|---|---|---|
| Domain | Multivariable function f | Composition along a path |
| Other variables | Held fixed | Allowed to depend on x |
| Notation | ∂f/∂x, f_x | df/dx, sometimes Df |
| Value at a point | Single number per direction | Single number (chain-rule sum) |
| Gradient relation | Components of ∇f | ∇f · velocity |
| Captures | Direct dependence on one variable | Direct + indirect dependence on x |
| Equation example | Heat equation: ∂u/∂t = α ∇²u | Lagrangian mechanics: dL/dt |
| Common in | PDEs, vector calculus, optimisation | Thermodynamics, economics, dynamics |
The most important takeaway: partial derivatives ignore indirect chains of dependence. If u and v both depend on time, ∂(u + v)/∂t treats v as a frozen variable; d(u + v)/dt accounts for v's own time dependence.
The multivariable chain rule
If z = f(x, y) and x = g(t), y = h(t), how does z change with t? Both x and y now depend on t, so the change in z has two contributions:
dz/dt = (∂f/∂x) (dx/dt) + (∂f/∂y) (dy/dt)
Each term is "rate of change in one direction × velocity in that direction." The formula generalises: for f(x₁, x₂, …, x_n) with each x_i = g_i(t),
df/dt = Σ_i (∂f/∂x_i) (dx_i/dt) = ∇f · (velocity vector)
This is precisely how the directional derivative arises (∇f · û), and how backpropagation in neural networks chains together gradients across layers.
The heat equation — partial derivatives in space and time
One of the most influential equations of mathematical physics is the heat (or diffusion) equation:
∂u/∂t = α ∇²u = α (∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z²)
Here u(x, y, z, t) is the temperature at position (x, y, z) at time t, and α is the thermal diffusivity. Why partials and not total derivatives? Because temperature really depends on both space and time. The left side asks: how fast does temperature change at a fixed location? The right side asks: how does the temperature at a point compare to the spatial average around it?
If u at a point is lower than the average of u in a small neighborhood, ∇²u > 0 and ∂u/∂t > 0 — the point heats up. If u is higher than the average, ∇²u < 0 and the point cools down. Partial derivatives in space encode "how spatially un-flat is the temperature?"; the partial in time encodes "how fast is the temperature evolving?" The same equation governs particle diffusion, the smoothing of images, and the smoothing of probability distributions in stochastic processes.
Partial derivatives in economics — marginals
In economics, partial derivatives are called marginals. If U(x₁, x₂, ...) is a utility function, ∂U/∂x_i is the marginal utility of good i. If C(K, L) is the cost as a function of capital K and labour L, ∂C/∂K is the marginal cost of capital. Production functions f(K, L) — the Cobb–Douglas form A K^α L^β is canonical — have partials that economists call marginal product of capital and marginal product of labour.
The principle of equal marginal utility per dollar — ∂U/∂x_i divided by price p_i is the same across all goods i at the optimum — is one of microeconomics' core results. It is just Lagrange multiplier optimisation in disguise, and partial derivatives carry the action.
Where partial derivatives appear
- Partial differential equations. The heat equation, wave equation, Laplace equation, Schrödinger equation, Navier–Stokes, Maxwell's equations — every PDE is built from partial derivatives.
- Optimisation. Critical points of f(x, y, ...) satisfy ∂f/∂x = ∂f/∂y = ... = 0. The Hessian (matrix of second partials) classifies them as max, min, or saddle.
- Vector calculus. Gradient, divergence, curl, Laplacian — all are expressions in partial derivatives.
- Machine learning. Backpropagation computes partials of the loss with respect to every parameter via the chain rule.
- Physics. Lagrangian and Hamiltonian mechanics use ∂L/∂q and ∂L/∂q̇ to derive Euler–Lagrange equations. Thermodynamic state functions (U, H, F, G) are filled with partials at constant pressure, constant volume, etc.
- Engineering. Stress, strain, flux, current density — every continuum-mechanics quantity satisfies field equations written in partial derivatives.
- Economics and finance. Marginal utility, marginal cost, the Greeks of options pricing (Δ, Γ, Θ, ν, ρ are partials of option price).
Common mistakes
- Mixing up partial and total derivatives. Especially common in thermodynamics. ∂U/∂T at constant volume is one quantity; at constant pressure another; dU/dT depends on what is changing as you change T.
- Forgetting to specify what is held fixed. A partial derivative is meaningless without saying which variables are constant. Thermodynamics writes (∂U/∂V)_T to mean "at constant T," because the same notation in a different problem could mean "at constant pressure."
- Treating ∂x as if it were a number. Unlike Leibniz's d, the symbol ∂ is not a literal differential — you cannot "cancel" the ∂y in (∂z/∂y)(∂y/∂x). The total derivative chain rule survives this game; the partial chain rule does not in general.
- Differentiating the wrong variable. When y depends on x, ∂f/∂x and ∂f/∂x|_y vary by the chain-rule term (∂f/∂y)(dy/dx). Sketch the dependency graph before computing.
- Concluding differentiability from existence of partials. All partials existing at a point does not imply f is differentiable there. You need continuity of the partials (or a more careful linearity argument).
- Forgetting metric factors in non-Cartesian coordinates. ∂/∂θ in polar/cylindrical/spherical coordinates is a derivative per radian, not per metre. Convert with arc-length scales before comparing to physical rates.
- Confusing ∂ with d for one-variable functions. If f depends on x alone, ∂f/∂x and df/dx coincide — but writing ∂ where d is conventional looks wrong and signals a misunderstanding of the multivariable distinction.
Frequently asked questions
What is the difference between a partial derivative and a total derivative?
A partial derivative ∂f/∂x treats every other variable as a fixed constant — it ignores any indirect dependencies. A total derivative df/dx accounts for indirect dependencies through the chain rule. Example: f(x, y) where y itself depends on x via y = g(x). Then ∂f/∂x = (the derivative treating y as constant) but df/dx = ∂f/∂x + (∂f/∂y)(dy/dx). Confusing the two is one of the most common errors in thermodynamics and economics, where many variables depend on many others.
Are mixed partial derivatives always equal?
Almost always. Clairaut's theorem (also called Schwarz's theorem) states that ∂²f/∂x∂y = ∂²f/∂y∂x whenever both mixed partials exist and are continuous in a neighborhood. The continuity hypothesis matters: pathological functions exist where the order of differentiation matters. The standard counterexample is f(x, y) = xy(x²−y²)/(x²+y²) with f(0,0)=0, where ∂²f/∂x∂y(0,0) = −1 and ∂²f/∂y∂x(0,0) = +1 because the second-order partials are discontinuous at the origin. In practice — for any function you can write down with elementary or smooth pieces — mixed partials are equal.
Why does the heat equation involve a partial derivative in time and a second partial in space?
The heat equation ∂u/∂t = α ∇²u models how temperature u(x, y, z, t) evolves. The left side is the time-rate of change of temperature at a fixed point. The right side is the Laplacian — the difference between u at a point and the average u in a small neighborhood. If your point is colder than its neighbors, ∇²u > 0 and u rises (heat flows in); if hotter, ∇²u < 0 and u falls. Partial derivatives separate the time and space contributions because temperature really does depend on both, and changes in time and changes in space are physically different.
What does ∂ mean and how do I pronounce it?
The symbol ∂ (a stylized lowercase 'd') is sometimes called 'partial,' sometimes 'del,' sometimes 'dee.' It denotes a partial derivative, as opposed to d for ordinary derivative. The symbol was introduced by Adrien-Marie Legendre in 1786 and popularized by Jacobi in 1841. Pronounce it 'partial' or just 'd' depending on context — 'partial f partial x' or 'd f d x' both work.
How does the chain rule work for multivariable functions?
If z = f(x, y) and both x = g(t) and y = h(t) depend on t, then dz/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt). Each ∂f/∂x · dx/dt term is the contribution of one path from t to z; the formula sums all paths. For multiple intermediate variables, the rule generalises to a sum over paths, equivalent to multiplying the Jacobian matrices. Backpropagation in neural networks is repeated application of this rule across the layers of a computation graph.
What is the geometric interpretation of ∂f/∂x?
Imagine the graph of f(x, y) as a surface in 3-D. Slicing the surface with the plane y = y₀ produces a curve in the xz-plane. The slope of this curve at the point (x₀, y₀) is ∂f/∂x at that point. Similarly slicing with x = x₀ produces a curve whose slope is ∂f/∂y. Each partial derivative is a slope along one axis; the gradient bundles them into a 2-D vector pointing in the direction of steepest ascent of the full surface.
Can a function have all partial derivatives but not be differentiable?
Yes. The existence of all partials only describes behavior along the coordinate axes. A function is differentiable at a point only if it can be linearly approximated in every direction simultaneously — that requires the partials to be continuous (sufficient condition) or a stronger linearity condition. The classic example is f(x, y) = xy/(x²+y²) extended by f(0,0)=0: both ∂f/∂x(0,0) and ∂f/∂y(0,0) exist and equal zero, but f is not even continuous at the origin (let alone differentiable) because f(t, t) = 1/2 for any t ≠ 0.