Convex Analysis

Convex Set and Convex Hull

The set that contains every segment between its points — and the smallest one wrapping a given cloud

A convex set has no notch — every segment between its points stays inside. Its hull wraps N points minimally and is computable in O(N log N) in 2D.

Convexityx, y ∈ S, t ∈ [0,1] ⇒ (1−t)x + ty ∈ S
Convex hullSmallest convex set containing P
2D complexityO(N log N) — Graham scan
3D complexityO(N log N) — qhull standard
Carathéodory≤ n+1 points needed per combination
Closed underIntersection, scaling, affine maps

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Everyday examples of convexity

Convex and non-convex shapes are everywhere if you know what to look for. The interior of a sphere is convex; the surface of the sphere is not. A potato chip is non-convex along its curl; the wrapper enclosing it can be drawn as a convex region around its centroid. Star-shaped buildings (the Pentagon, a snowflake) are non-convex. The footprint of a typical residential lot is convex; a U-shaped office building's footprint is not.

Beyond geometry, abstract objects can also be convex sets. The set of probability distributions on a finite outcome space (the probability simplex) is convex. The set of doubly stochastic matrices (the Birkhoff polytope) is convex. The set of positive semidefinite matrices is a convex cone. The set of valid quantum states (density matrices) is the convex hull of pure states. These abstract convex sets are the natural state spaces for probability, optimisation and quantum information.

Why care about convexity at all

Optimisation is fundamentally easier on convex sets. That single observation drives the structural distinction between convex programs (LPs, QPs, semidefinite programs) — which have global polynomial-time algorithms — and general nonlinear or combinatorial problems, which often need exponential search.

The Boyd-Vandenberghe slogan: the great divide in optimisation is between convex and non-convex, not between linear and nonlinear. Convex non-linear programs (entropy maximisation, kernel ridge regression, semidefinite programs) are tractable; non-convex linear programs do not exist (linear programs are convex by definition); and non-convex programs in general can be NP-hard. Knowing whether your feasible set is convex is the first question, and answering it usually requires checking whether each defining inequality is convex.

The definition in one picture

A subset S of ℝⁿ is convex if for every pair of points x, y in S and every scalar t ∈ [0, 1], the point (1−t)x + ty is also in S. Geometrically: every segment between two points of S stays in S. A disk, a half-plane, a triangle, a hyperplane, the whole space — all convex. A crescent moon, a doughnut, a 5-pointed star — not convex; you can find two points whose connecting segment leaves the set.

Equivalent formulations:

Convex combinations. S is convex iff Σλᵢ xᵢ ∈ S whenever every xᵢ ∈ S, every λᵢ ≥ 0, and Σλᵢ = 1. A finite convex combination is a weighted average that sums to 1; convexity is closure under those.
Intersection of half-spaces. Every closed convex set is the intersection of all closed half-spaces containing it. Polyhedra are the special case of finite intersections.
Epigraph of a convex function. The set {(x, t) ∈ ℝⁿ × ℝ : f(x) ≤ t} is convex iff f is a convex function. So set-convexity and function-convexity are two views of the same structure.

Convexity is preserved by surprisingly many operations: intersection, scaling, translation, affine image and preimage, Cartesian product, Minkowski sum (A ⊕ B = {a + b : a ∈ A, b ∈ B}), and projection. These closure properties are what let you build complex convex sets out of simple ones — for instance, polytopes from half-spaces, second-order cones from quadratic constraints, semidefinite cones from matrix inequalities.

The convex hull and why it is unique

The convex hull conv(P) of an arbitrary set P ⊂ ℝⁿ is the smallest convex set containing P. Two equivalent definitions:

Intersection definition. conv(P) = ∩{C : C convex, C ⊇ P}. The arbitrary intersection of convex sets is convex, so this intersection is convex and is by construction the smallest such set.
Combinatorial definition. conv(P) = {Σλᵢ xᵢ : xᵢ ∈ P, λᵢ ≥ 0, Σλᵢ = 1, finite sum}. All convex combinations of finite subsets of P.

The two definitions agree because convexity is exactly closure under convex combinations. For a finite point set P = {p₁, …, p_N} ⊂ ℝ², the convex hull is a convex polygon with at most N vertices. The hull vertices are the extreme points of conv(P): points that cannot be written as a non-trivial convex combination of other points in the hull. Carathéodory's theorem says every point of conv(P) in ℝⁿ is a convex combination of at most n+1 elements of P.

Computing the convex hull in 2D

Three classical algorithms for N points in the plane.

Graham scan (Ronald Graham, 1972). Sort points by polar angle around the lowest point. Walk through them, maintaining a stack of hull candidates; pop any point that makes a right turn relative to the previous two. The remaining stack is the upper hull; repeat for the lower hull. Sort dominates at O(N log N), the scan itself is O(N).
Andrew's monotone chain (1979). Sort by x-coordinate (then y as tiebreaker). Build the lower hull left-to-right with the same right-turn-pop logic; then the upper hull right-to-left. Concatenate. Same O(N log N), cleaner to implement than Graham scan because there is no polar-angle sort.
Jarvis march / gift wrapping (1973). Start at the leftmost point. Repeatedly select the next hull vertex as the point making the smallest counter-clockwise turn relative to the previous edge. Continues until you return to the start. O(N·H) where H is the number of hull vertices — fast when H is small (mostly interior points), slow when nearly all points are on the hull.

Chan's algorithm (Timothy Chan, 1996) combines Graham and Jarvis to achieve O(N log H), proven optimal in the algebraic decision-tree model. For 3D, Preparata-Hong (divide-and-conquer) and incremental construction both achieve O(N log N) and the de facto standard implementation is qhull. In dimension d ≥ 4 the worst-case output size is O(N^⌊d/2⌋), which is what limits explicit hull computation beyond moderate N.

Worked example — hull of 6 points

Take P = {(0,0), (4,0), (4,3), (1,1), (0,3), (2,2)} in ℝ². Run Andrew's monotone chain.

Step 1. Sort by (x, y) ascending: (0,0), (0,3), (1,1), (2,2), (4,0), (4,3).

Step 2 — lower hull, left to right. Push (0,0). Push (0,3); now stack is [(0,0), (0,3)]. Next (1,1): the turn (0,0) → (0,3) → (1,1) is a right turn (clockwise) using cross product, so pop (0,3). Stack [(0,0)]. Push (1,1). Next (2,2): turn (0,0) → (1,1) → (2,2) is collinear (cross = 0); we discard collinear points from the strict hull. Pop (1,1). Stack [(0,0)]. Push (2,2). Next (4,0): turn (0,0) → (2,2) → (4,0) is a right turn; pop (2,2). Stack [(0,0)]. Push (4,0). Stack [(0,0), (4,0)]. Last (4,3): turn (0,0) → (4,0) → (4,3) is a left turn (CCW). Push. Lower hull: (0,0), (4,0), (4,3).

Step 3 — upper hull, right to left. By symmetric procedure, the upper hull comes out (4,3), (0,3), (0,0). Concatenate (drop the duplicate endpoints): hull = (0,0), (4,0), (4,3), (0,3) — the four corners of the bounding rectangle. The interior points (1,1), (2,2) are not hull vertices, confirming the geometry — they sit strictly inside.

This particular six-point example is small enough to verify by inspection: the 4×3 rectangle is exactly the smallest convex set containing all six points.

Hull algorithms side by side

Algorithm	Worst case	Best/realistic	Dimension	Output-sensitive?
Jarvis march (gift wrapping)	O(N²)	O(N·H)	2D	Yes
Graham scan	O(N log N)	O(N log N)	2D	No
Andrew's monotone chain	O(N log N)	O(N log N)	2D	No
Chan's algorithm	O(N log H)	O(N log H)	2D & 3D	Yes (optimal)
Preparata-Hong divide-and-conquer	O(N log N)	O(N log N)	2D, 3D	No
Incremental (qhull standard)	O(N^⌊d/2⌋)	O(N log N) typical 3D	Any d	No
QuickHull	O(N²) worst, but rare	O(N log N) average	Any d	No

For the typical N up to a few hundred thousand in 2D, Andrew's chain at O(N log N) wins on simplicity and constants. For 3D and beyond, qhull's incremental construction is the universal default in scientific libraries (SciPy, MATLAB, MATLAB's convhull, R's convhulln all dispatch to qhull internally).

The separating hyperplane theorem

The deepest fact about convex sets, and the one that powers most of duality, optimisation and machine learning, is the separating hyperplane theorem. Given two disjoint non-empty convex sets A, B in ℝⁿ, there exists a non-zero vector a and a scalar b with a·x ≤ b for x ∈ A and a·x ≥ b for x ∈ B. Geometrically, a flat hyperplane can be slid between them so that each set lies on one side.

Variants tighten this. If both sets are closed and at least one is compact, the hyperplane can be made strictly separating (a·x < b on A, a·x > b on B). If a set is closed convex and a point lies outside it, there is a hyperplane separating the point from the set — the supporting hyperplane theorem. These are the ingredients of LP duality (the separating hyperplane between feasible cone and improvement cone), of the Hahn-Banach theorem in functional analysis (separating a subspace from a point), and of support vector machines (the maximum-margin separator between two classes is, by definition, a separating hyperplane).

Where convexity earns its keep

Optimisation. Convex programs (convex objective, convex feasible set) have no spurious local minima — every local minimum is global. Linear, quadratic, semidefinite, and conic programs are all convex and solved reliably by polynomial-time interior-point methods.
Machine learning — SVMs. Support vector machines compute a maximum-margin separating hyperplane between two classes. The optimisation is a convex quadratic program; the kernel trick is a clean way to lift non-convex data into a higher-dimensional space where it becomes linearly separable.
Computational geometry. Convex hulls are the backbone primitive: nearest-neighbour searches use Voronoi diagrams (dual to Delaunay triangulations, which are 'lift to a paraboloid then convex hull'); collision detection uses GJK on hulls of point clouds; shape matching aligns convex hulls.
Robotics — configuration spaces. Free-space planning is easier on convex regions; the inflated robot footprint plus obstacle Minkowski sum produces convex C-space cells that motion planners exploit.
Game theory. Mixed-strategy Nash equilibria are fixed points of convex-set-valued maps; Kakutani's fixed-point theorem (a generalisation of Brouwer's) gives existence by leveraging convexity.
Image processing. Convex hulls are used in shape analysis — solidity = (area of object) / (area of hull) is a standard descriptor of shape compactness. Morphological closing approximates the convex hull cheaply.
Geographic information systems (GIS). The minimum-bounding convex polygon of a point cloud is the 'home range' in ecology, the 'convex footprint' in urban planning, the 'buffer zone' in geofencing. Standard GIS toolkits (PostGIS, GeoPandas) ship convex-hull operators that wrap qhull.
Compressed sensing. Sparse signal recovery via L¹ minimisation succeeds because the L¹ ball is the convex hull of the signed coordinate vectors — minimising L¹ over a convex constraint set drives the solution to a hull vertex, which is a sparse signal.

Common mistakes

Confusing convex set and convex function. A convex set is a region; a convex function is a graph. They are linked through the epigraph: f convex ⟺ epigraph of f is a convex set. Treating them as interchangeable produces nonsense.
Computing the hull as the bounding box. The hull is the bounding polygon, not the bounding rectangle. For thin diagonal point clouds the bounding box can be enormous compared to the hull. Always run a hull algorithm.
Missing collinear point handling. When three consecutive points are collinear, deciding whether to include the middle one matters for some applications (mesh boundaries) and not for others (hull vertices only). Pick a convention and document it.
Trying to compute hulls in 5+ dimensions. Worst-case output size is O(N^⌊d/2⌋), which means N = 100 in 6D can have 10¹² facets. For high-dimensional convexity questions, work algebraically (linear programs, polytopes by face descriptions) rather than enumerating hull faces.
Floating-point degeneracy. Hull algorithms decide left/right turns by cross products that can underflow to zero on near-collinear input. Use exact arithmetic (rational, or interval + filter) for hulls of nearly-degenerate point clouds — qhull provides robust modes.
Assuming the hull is the optimisation feasible region. The hull of feasible points is in general a strict subset of the feasible region for continuous problems; the optimum of a linear objective over a convex set lies on the hull's boundary but is not always at a vertex unless the set is a polytope.
Conflating closed and open convex sets. An open convex set has no boundary points; a closed convex set includes its boundary. Many theorems (separating hyperplane, supporting hyperplane) require closedness. Using them on open sets without checking can silently produce wrong conclusions.
Believing convex hull is monotone in inclusion. conv(A ∪ B) ⊇ conv(A) ∪ conv(B), but the reverse inclusion fails: even two disjoint segments can have a hull containing points in neither original convex hull. Union of convex sets is not generally convex.

Extreme points and Krein-Milman

An extreme point of a convex set C is one that cannot be written as a non-trivial convex combination of two distinct points of C. The vertices of a polygon are extreme; interior points of an edge are not; centres of disks are not (they are obvious midpoints of diameters). For a compact convex set in ℝⁿ, the Krein-Milman theorem (1940) says C is the closed convex hull of its extreme points. Geometrically: every compact convex set is recoverable from its corners, no matter how curvy the boundary.

This theorem is the workhorse behind optimisation duality and many existence proofs in analysis. The Banach-Alaoglu theorem combined with Krein-Milman shows that the closed unit ball of a dual Banach space has extreme points — used in the existence of extreme states in quantum mechanics, extreme stochastic matrices in Markov chains (the Birkhoff polytope of doubly stochastic matrices has the permutation matrices as its extreme points), and the existence of optimal vertices in LP.

Carathéodory's theorem and its quantitative form

Carathéodory's theorem (Constantin Carathéodory, 1907) says: in ℝⁿ, every point of conv(P) is a convex combination of at most n + 1 elements of P. So conv(P) of an enormous point set still admits a bound on how many points each interior member 'sees'. Helly's theorem (1923) is the dual statement for intersections, and Radon's theorem the partitioning version — all three are foundational pieces of combinatorial convex geometry.

The quantitative version (the approximate Carathéodory of Barman 2015) replaces 'at most n+1' with 'at most O(R² / ε²)' for ε-approximations of points in a hull of diameter R. The bound is dimension-free, which makes it the right tool for high-dimensional convex problems where n + 1 is too many. Barman's theorem powers fast Frank-Wolfe-style algorithms and ε-approximate Nash equilibria in games.

Convex cones and conic duality

A convex cone is a convex set closed under non-negative scaling: x ∈ C and λ ≥ 0 imply λx ∈ C. Cones come up everywhere in optimisation. The non-negative orthant {x ∈ ℝⁿ : x ≥ 0} is the cone of LP feasibility. The Lorentz cone {(x, t) : ‖x‖ ≤ t} is the cone of second-order cone programming. The positive semidefinite cone is the cone of semidefinite programming. The exponential cone, the power cone, and the dual cones of each of these define the rest of modern conic optimisation.

Conic duality generalises LP duality: a conic primal min c·x subject to Ax = b, x ∈ K has dual max b·y subject to c − Aᵀy ∈ K°, where K° is the dual cone. Strong duality holds under conic generalisations of Slater's condition. SOCP, SDP, and exponential-cone optimisation all share the same conic-duality structure, which makes a single solver (Mosek, ECOS, SDPT3) capable of handling diverse problem classes by exposing a unified conic API.

Polytopes vs general convex sets

A polytope is the convex hull of a finite point set, equivalently the bounded intersection of a finite number of half-spaces. Two descriptions, the same object: V-representation (vertices) vs H-representation (half-spaces). Converting between them — vertex enumeration and facet enumeration — is the central problem of polytope algorithmics, NP-hard in worst case but tractable for many structured polytopes.

For general (non-polytope) convex sets, the natural descriptions are different: support functions (h(d) = sup_{x ∈ S} d · x), gauge functions, or characteristic functions. Many algorithms (Frank-Wolfe, mirror descent, projected gradient) work on convex sets specified by an oracle that, given a direction, returns the support value or a maximising point. This abstraction lets you optimise over astronomically large polytopes (matchings, spanning trees, perfect bipartite matchings) by implementing only the oracle, never enumerating vertices.

Polar duality of convex sets

Every convex set has a polar: for a convex set C ⊂ ℝⁿ containing the origin, C° = {y : x · y ≤ 1 for all x ∈ C}. The polar of the unit ℓ¹-ball is the unit ℓ^∞-ball; the polar of the unit ball in any norm is the unit ball in the dual norm. Polar duality reverses inclusion: A ⊂ B ⟹ B° ⊂ A°. For closed convex sets containing the origin, (C°)° = C — the polar of the polar is the original set, a beautiful duality.

The polar dual is the right framework for LP duality, support vector machines (the polar of the margin polytope is the slack polytope), and online learning. Many ML algorithms can be cast in polar dual form, where the original problem's primal variables become the dual's constraint normals — a generalisation of the LP primal/dual pairing to arbitrary convex sets.

High-dimensional weirdness

Intuition built in 2D and 3D fails in high dimensions. A few startling facts about convex sets in ℝⁿ as n → ∞:

The unit ball in ℝⁿ has volume tending to zero as n → ∞ (peaks around n = 5, then collapses).
Almost all the volume of a high-dimensional ball sits in a thin shell near its boundary (concentration of measure).
Two random unit vectors in ℝⁿ are nearly orthogonal with high probability — most pairs have inner product O(1/√n).
The convex hull of 2ⁿ corners of the unit cube is exponentially smaller than the cube itself in 'middle' regions.
Random points sampled from a convex body cluster near the boundary, not the centre.

These phenomena are not pathological corner cases — they describe what happens in realistic high-dimensional optimisation, statistical learning, and quantum information. Algorithm design in high dimensions exploits or works around these effects deliberately (e.g. random projections, Johnson-Lindenstrauss, locality-sensitive hashing).

Costed claims

Convex hull of N points in 2D: O(N log N) by Graham scan, Andrew's monotone chain, or Chan's algorithm; in 3D also O(N log N) via qhull. Carathéodory: every interior point is a convex combination of at most n+1 vertices in ℝⁿ. Strong duality holds under Slater's condition (strictly feasible interior). KKT: Lagrangian gradient = 0 + λ·g(x) = 0 (slackness), necessary and sufficient under convexity. Jensen's inequality (log E[X] ≥ E[log X] for the concave version) and Cauchy-Schwarz (equality iff u, v collinear) are the analytic siblings of the convex-set story.

Frequently asked questions

What's the difference between a convex set and a convex function?

A convex SET S satisfies: x, y ∈ S and t ∈ [0,1] imply (1−t)x + ty ∈ S — the segment between any two points lies inside the set. A convex FUNCTION f satisfies: f((1−t)x + ty) ≤ (1−t)f(x) + t f(y) — the graph lies below every chord. The two are linked: f is convex iff its epigraph {(x, t) : f(x) ≤ t} is a convex set. Function-convexity is set-convexity of the region above the graph.

How fast can I compute a convex hull?

In 2D, optimal worst-case is O(N log N) by Graham scan (1972), Andrew's monotone chain (1979), or divide-and-conquer. Chan's algorithm (1996) is output-sensitive at O(N log H) where H is the number of hull vertices — best when H is small. In 3D the worst-case is still O(N log N) and the de facto standard implementation is qhull. In dimension d ≥ 4 the worst-case is O(N^⌊d/2⌋), and computing the hull explicitly becomes infeasible for moderate N.

Is a half-plane convex?

Yes. A half-plane {x : a·x ≤ b} contains every segment between its members: if a·x ≤ b and a·y ≤ b, then for t ∈ [0,1], a·((1−t)x + ty) = (1−t)(a·x) + t(a·y) ≤ b. The intersection of finitely many half-planes is a convex polytope — the feasible region of every LP. Convexity is closed under intersection, so all the standard 'set operations' (intersection, scaling, Minkowski sum, image under affine maps) preserve it.

Why does convexity matter for optimisation?

On a convex feasible set with a convex objective, every local minimum is automatically a global minimum. Proof: if x* is local but not global, take a strictly better point y; the segment from x* to y is inside the set by convexity, and the objective along it is below max(f(x*), f(y)) by convexity of f — so x* has nearby points with lower value, contradicting local minimality. Convex problems have no local-minimum traps.

What is the separating hyperplane theorem?

For two disjoint non-empty convex sets A and B in ℝⁿ, there exists a hyperplane H = {x : a·x = b} such that a·x ≤ b for x ∈ A and a·x ≥ b for x ∈ B. Geometrically, a flat sheet can be slid between them. This is the engine behind LP duality, Farkas' lemma, and the support vector machine — and it works only because A and B are convex.

What's a convex combination?

A convex combination of points x₁, …, x_k is a sum λ₁ x₁ + … + λ_k x_k with λᵢ ≥ 0 and Σ λᵢ = 1. The convex hull of a set P is exactly the set of all convex combinations of finite subsets of P. Carathéodory's theorem (1907) says that in ℝⁿ you never need more than n+1 points: every point of conv(P) is a convex combination of at most n+1 elements of P.

How is a convex hull built incrementally?

Sort points by x (then by y as a tiebreaker), giving an O(N log N) sort. Walk left-to-right building the lower hull: push each point onto a stack and, while the top three points make a non-left turn, pop. Then walk right-to-left for the upper hull. The two chains concatenate into the full hull boundary in O(N) post-sort work. Total cost O(N log N). This is Andrew's monotone chain, the cleanest 2D hull algorithm to implement.