Numerical Analysis
Secant Method
Root finding by two-point linear approximation — superlinear convergence at golden-ratio order without needing a derivative
The secant method solves f(x) = 0 by drawing the secant line through the last two iterates and using its x-intercept as the next guess. It is Newton's method with f'(x_n) replaced by a finite-difference slope (f(x_n) − f(x_{n-1}))/(x_n − x_{n-1}). Convergence order is the golden ratio φ ≈ 1.618 — superlinear, faster than bisection's order 1, slower than Newton's 2, and requires no derivative.
- Iterationx_{n+1} = x_n − f(x_n)·(x_n−x_{n-1})/(f(x_n)−f(x_{n-1}))
- Convergence orderφ = (1+√5)/2 ≈ 1.618
- Per step1 new f-evaluation
- Needs derivativeNo
- Starting dataTwo initial guesses x_0, x_1
- Production formBrent's method (secant + bisection)
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The two-point secant idea
Newton's method requires f'(x). When the derivative is unknown, expensive, or undefined, you can approximate it from two recent function values:
f'(x_n) ≈ (f(x_n) − f(x_{n-1})) / (x_n − x_{n-1})
Plug this approximation into Newton's update x_{n+1} = x_n − f(x_n)/f'(x_n) and you get the secant iteration:
x_{n+1} = x_n − f(x_n) · (x_n − x_{n-1}) / (f(x_n) − f(x_{n-1}))
Geometrically: draw the secant line through (x_{n-1}, f(x_{n-1})) and (x_n, f(x_n)). It crosses the x-axis at x_{n+1}. Then slide the window — drop x_{n-1}, keep x_n, and use x_{n+1} as the new latest point. Each step needs one new function evaluation (for f(x_{n+1})); the old value f(x_n) gets reused.
Why the order is the golden ratio
Let r be a simple root and e_n = x_n − r the error at iteration n. Taylor-expand f around r and substitute into the iteration formula. After algebra, the error obeys
e_{n+1} ≈ C · e_n · e_{n-1} where C = f''(r) / (2 f'(r))
Assume convergence at some power p: |e_n| ∼ A |e_{n-1}|^p. Then |e_{n+1}| ∼ A |e_n|^p, and substituting into the error relation:
A |e_n|^p ∼ C · |e_n| · |e_{n-1}|
Using |e_{n-1}| ∼ |e_n|^{1/p} from the assumption and substituting:
|e_n|^p ∼ |e_n| · |e_n|^{1/p} = |e_n|^{1 + 1/p}
Equating exponents: p = 1 + 1/p, i.e. p² − p − 1 = 0. The positive root is
p = (1 + √5) / 2 = φ ≈ 1.6180339887...
The same characteristic equation that produces the Fibonacci sequence controls the convergence of secant iteration — both descend from the recurrence relation x_n = x_{n-1} + x_{n-2}. So secant's error at iteration n+1 is roughly the (1.618)-th power of e_n: better than linear (where p = 1), worse than quadratic (p = 2), but per evaluation often the best you can do without derivatives.
Worked example: computing √2 (again)
Solve f(x) = x² − 2 = 0. Start with two initial guesses x_0 = 1 and x_1 = 2. The true root is √2 ≈ 1.41421356237.
n x_n f(x_n) secant slope digits correct
─ ─────────── ─────────── ────────────── ──────────────
0 1.000000000 -1.000000 — 0
1 2.000000000 2.000000 3.000000000 0
2 1.333333333 -0.222222 (-1 - 2)/(1 - 2) 1 (using x_0, x_1)
3 1.400000000 -0.040000 0.666666667 2
4 1.413793103 -0.001188 0.733333333 3
5 1.414215686 0.000006 0.804597701 5
6 1.414213562 -7.6e-12 0.821428571 11
7 1.414213562 0 0.828429157 16 (machine precision)
Eight iterations to reach machine precision from a deliberately weak start. Each step gains roughly 1.618 digits — half-way between linear (one per step) and quadratic (doubling). Compare to Newton's method on the same problem, which reaches machine precision in five iterations starting from x_0 = 1 alone.
The cost-adjusted comparison favours secant when f' is expensive. Suppose computing f' costs three times as much as computing f. Then Newton at five iterations costs 5 × (1 + 3) = 20 evaluation-equivalents; secant at eight iterations costs 8 × 1 + 1 (for the initial second evaluation) = 9 evaluation-equivalents — less than half. For automatic-differentiation in a tight loop with hand-written analytical derivatives, the calculus reverses, but for problems where f' is awkward (table lookups, simulation outputs, finite-difference approximations), secant routinely wins.
Bisection vs secant vs Newton
| Bisection | Secant | Newton | Regula falsi | Brent | Halley | |
|---|---|---|---|---|---|---|
| Convergence order | 1 (linear) | φ ≈ 1.618 | 2 (quadratic) | 1 worst case, φ best | order φ generic, 1 fallback | 3 (cubic) |
| Per-step cost | 1 f | 1 f | 1 f + 1 f' | 1 f | 1 f | 1 f + 1 f' + 1 f'' |
| Needs derivative | No | No | Yes | No | No | Yes (f and f'') |
| Needs bracket | Yes | No | No | Yes | Yes | No |
| Guaranteed convergence | Yes | Local | Local | Yes (slow) | Yes | Local |
| Iterations for 10⁻¹⁵ | ~50 | ~7 | ~5 | 10-100 (variable) | ~7 | ~4 |
| Best when | Need certainty | f' is expensive | f and f' are cheap, good start | Need safety + derivative-free | Production default | High-order derivatives cheap |
The hierarchy: bisection is safe but slow; Newton is fast but fragile; secant sits between. Brent's method (used inside SciPy's brentq, MATLAB's fzero, and most production root-finders) gets the best of both — bisection's guarantee with secant's speed — by maintaining a bracket and falling back to bisection whenever a secant or inverse-quadratic step would step outside the bracket.
Regula falsi: secant with a safety net
Pure secant has no guarantee of convergence: the iterates can wander off if the secant slope sends the next point far from the root. The simplest robust variant is regula falsi ("rule of false position"), known since antiquity. Maintain two points a, b with f(a)·f(b) < 0 — that is, the root lies in [a, b]. Compute the secant intersection
c = a − f(a) · (b − a) / (f(b) − f(a))
Then replace either a or b with c so that the sign change is preserved. Unlike pure secant, regula falsi always brackets the root — but it can stall: one endpoint can stay fixed for many iterations, dropping convergence from φ to linear. The Illinois algorithm (Dowell & Jarratt 1971) halves the function value at the stagnant endpoint after every iteration to restore superlinear convergence.
Variants like Anderson-Björck and Pegasus push further, with provable order φ. These are the secant-family methods inside production hybrid root finders like Brent and TOMS748.
Brent's method: the production default
Brent's method (Richard Brent 1973) is the universal root-finder: scipy.optimize.brentq, MATLAB fzero, Boost C++ brent_find_minima, NumPy numpy.roots for special cases. Inputs: a continuous function f, a bracket [a, b] with f(a)·f(b) < 0, and tolerance ε.
At each step, Brent tries the most aggressive technique available:
- Inverse quadratic interpolation when three iterates are available — fits an inverse polynomial x(f) to f-values 0 and the three latest pairs.
- Secant interpolation when inverse-quadratic is not applicable.
- Bisection if the proposed step would fall outside the bracket or fails a safety test.
The bracket is maintained throughout, so convergence is guaranteed for any continuous function — and the order is close to φ in practice. The implementation is about 100 lines of code that has dominated 1D root finding for fifty years.
Where the secant method shows up
- Financial mathematics — yield-to-maturity. Computing the yield r that satisfies present value Σ CF_t / (1 + r)^t = price requires root finding. Excel's
YIELD(), Bloomberg's price-to-yield conversions, and trading-system bond calculators all use secant or Brent inside. - Thermodynamics — saturation properties. Solving for saturation temperature from saturation pressure (Antoine equation, NIST refprop) is a one-dimensional root problem. ASPEN, Aspen HYSYS, and gPROMS process simulators use secant-family root finders inside their equation-of-state evaluators.
- Power systems — power flow. The Newton-Raphson power flow is standard, but when the Jacobian is expensive or ill-conditioned, secant updates (Broyden) are used as a fallback. MATPOWER, PSS/E, and PowerWorld all expose Broyden as an option.
- Numerical PDE solvers. Implicit time stepping for stiff PDEs reduces to a nonlinear system at each step. When Jacobian computation is prohibitive (large 3D problems), Broyden or limited-memory secant updates dominate. CVODE, IDA, Sundials, PETSc.
- Engineering optimisation. Inside a bracketing line-search for nonlinear optimisation (BFGS, conjugate gradient, trust-region), the one-dimensional minimisation step uses a secant-based interpolation. Standard in NLOPT, IPOPT, and ANSYS optimisation modules.
- Adaptive numerical integration. Determining quadrature node positions that satisfy orthogonality conditions (e.g., Gauss-Legendre nodes are roots of Legendre polynomials) uses Newton or secant on the polynomial evaluations.
Variants and extensions
- Regula falsi (false position). Maintains a bracket containing the root and uses the secant within the bracket. Always converges, but can stall at linear rate without a "deflated endpoint" trick (Illinois, Anderson-Björck, Pegasus).
- Muller's method. Fits a quadratic through the last three iterates and uses its closest root to the latest x_n as the next guess. Order ≈ 1.84 (the real root of p³ − p² − p − 1 = 0). Handles complex roots naturally; used in polynomial root-finders.
- Inverse quadratic interpolation. Fits x as a quadratic in f using three (x, f) pairs and evaluates the polynomial at f = 0. Used inside Brent's method; order ≈ 1.84.
- Broyden's method. The multivariate analogue of secant. Approximates the Jacobian by a rank-1 update from the change in F across the last iterate. Superlinear convergence; standard in dense nonlinear-equation solvers.
- Limited-memory Broyden / Anderson acceleration. Use the last few iterates to construct a quasi-secant update without storing a full Jacobian. Used in fixed-point acceleration for self-consistent field iterations (DFT, Hartree-Fock).
Common pitfalls
- Division by f(x_n) − f(x_{n-1}). Near a multiple root or after a long convergence, the two function values can be nearly equal, blowing up the secant slope. Guard against this by checking |f(x_n) − f(x_{n-1})| against a tolerance and falling back to bisection or restarting with a fresh pair of guesses.
- Diverging from a poor start. Pure secant has no bracket and no guarantee of staying near the root. Always wrap secant with bracketing logic for production use — Brent's method is the canonical solution.
- Slow convergence at multiple roots. At a root of multiplicity k > 1, both Newton and secant drop from their respective superlinear orders to linear (rate 1 − 1/k). If you suspect multiplicity, the modified secant x_{n+1} = x_n − k·f(x_n)/secant_slope restores superlinear convergence.
- Re-evaluating f instead of caching. The point of secant is to reuse f(x_n) from the previous step. A naive implementation that re-evaluates f doubles the cost, defeating secant's advantage over Newton. Always cache the previous f.
- Stopping criteria. Stopping at |x_{n+1} − x_n| < ε can be fooled by oscillation; stopping at |f(x_n)| < ε can be fooled by ill-conditioning. The robust approach uses both criteria together, with a sanity check on the bracket width.
Why secant earns its place
Newton is the textbook fastest method, bisection the textbook safest. The secant method is the practical compromise: fast enough to be useful, simple enough to implement, and — critically — derivative-free. Whenever computing f' is more expensive than computing f (which is most of the time in real engineering and finance), the secant family runs faster per evaluation than Newton even at lower convergence order. The Brent algorithm that powers every production 1D root finder is essentially secant with bracketing safety; the Broyden family that drives quasi-Newton nonlinear-equation solvers is essentially secant in multiple dimensions. The golden-ratio convergence order is the price of being derivative-free — and the secant method's enduring popularity is the price the world is willing to pay for not having to differentiate f by hand.
Frequently asked questions
Why is the convergence order the golden ratio?
Let e_n = x_n − r be the error. A Taylor analysis near a simple root r shows e_{n+1} ≈ C · e_n · e_{n-1} for a constant C = f''(r)/(2f'(r)). Assume convergence at some power p: |e_{n+1}| ∼ A |e_n|^p. Substituting gives |e_n|^p ∼ |e_n| · |e_{n-1}|. Using |e_n| ∼ |e_{n-1}|^p again: |e_{n-1}|^{p²} ∼ |e_{n-1}|^{1+p}, so p² = 1 + p — exactly the equation defining the golden ratio. Solving: p = (1 + √5)/2 = φ ≈ 1.618. The same algebraic structure that produces Fibonacci numbers controls secant-method convergence.
How does the secant method compare to Newton's method?
Newton has order 2 (quadratic), secant has order φ ≈ 1.618 (superlinear). Per iteration, Newton needs one f-evaluation and one f'-evaluation; secant needs only one new f-evaluation (the older one is reused). When f' is expensive to compute (numerical derivative, complicated formula, automatic differentiation in a tight loop), secant can be the faster method overall despite needing more iterations. The cost-adjusted convergence order — accounting for the work per step — favours secant when computing f' costs more than computing f. Brent's method combines secant with bisection to get safety plus speed.
When does the secant method fail or stall?
Three common failure modes. First, division by zero: if f(x_n) = f(x_{n-1}) the secant is horizontal and the formula produces NaN. This happens with multiple roots (where f flattens) or with poor starting guesses. Second, no convergence: the secant iterates can wander away from the root for non-convex f or with starting points far from the root. Third, near multiple roots, convergence drops from order φ to order 1 (linear), just like Newton's method. The robust fix is bracketing — Brent's method combines secant with bisection by maintaining an interval [a, b] with f(a)·f(b) < 0 and falling back to bisection whenever the secant step would jump outside the bracket.
What is the relationship between secant and regula falsi (false position)?
Regula falsi is the bracketed version of secant. Maintain two points a, b with f(a)·f(b) < 0 (i.e., bracketing a root). Compute the secant x-intercept c = a − f(a)(b − a)/(f(b) − f(a)). Choose the new bracket: if f(c) and f(a) have the same sign, replace a with c; otherwise replace b. Unlike pure secant, regula falsi always brackets the root and never diverges — but it can stall at linear convergence when one endpoint stays fixed (the 'stagnant endpoint' problem). The Illinois variant and Anderson-Björck variants adjust function values at the stagnant endpoint to restore superlinear convergence.
How is the secant method implemented in production code?
Rarely as pure secant — it can diverge. In SciPy, scipy.optimize.newton with no fprime argument falls back to secant. In production root-finders, the secant step is one ingredient of Brent's method, the default in scipy.optimize.brentq, MATLAB's fzero, Mathematica's FindRoot, and Boost's brent_find_minima. Brent combines bisection (always-safe), secant (fast superlinear), and inverse quadratic interpolation (even faster when applicable). The result is guaranteed convergence on any continuous f with a sign change, with secant-like speed on smooth problems.
Can the secant method be generalised to multiple dimensions?
Yes — the analogue is Broyden's method, a quasi-Newton scheme. Instead of computing the Jacobian J each iteration (expensive), Broyden approximates it from the change in F across the last two iterates: B_{n+1} = B_n + (Δf − B_n·Δx)·(Δx)ᵀ / (Δx·Δx). The update preserves a secant-like condition B_{n+1}·Δx = Δf in the most recent direction. Broyden's method converges superlinearly under standard hypotheses, just like secant in 1D, but with order between 1 and 2 depending on the dimension and the problem. Used in nonlinear-equation solvers when computing the Jacobian is impractical.
Why does the secant method need only one new evaluation per step?
The iteration uses x_n, x_{n-1}, f(x_n), and f(x_{n-1}). After computing x_{n+1}, the values x_n and f(x_n) become the 'previous' pair for the next iteration — only x_{n+1} and f(x_{n+1}) are new. So one f-evaluation per step after the initialisation (which costs two: f(x_0) and f(x_1)). Newton's method needs both f(x_n) and f'(x_n) at each step. When f' is expensive — say it requires numerical differentiation by additional f-evaluations, or an automatic-differentiation pass through complex code — secant's single-evaluation cost dominates Newton's two-evaluation cost in real running time even though Newton has higher theoretical convergence order.