Probability

Expected Value

The long-run average of a random outcome — weighted sum of values by probabilities

The expected value of a random variable is the probability-weighted average of its possible outcomes: what you'd get on average over many trials. For a discrete variable, E[X] = Σ x · P(X = x). It is used to evaluate gambles, design insurance products, and optimize decisions under uncertainty, and it underlies the concepts built on it (variance, covariance, conditional expectation).

  • Discrete formula: E[X] = ∑ x · P(X = x)
  • Continuous formula: E[X] = ∫ x · f(x) dx
  • Linearity: E[aX + bY] = a·E[X] + b·E[Y] (always; independence not required)
  • Variance: Var(X) = E[(X − E[X])²] = E[X²] − E[X]²
  • Used in: Insurance, gambling, ML, finance, decision theory
  • Often misinterpreted as: A typical value (the median is more typical for skewed distributions)

The definition

For a discrete random variable X taking values x₁, x₂, ... with probabilities p₁, p₂, ...:

E[X] = ∑ xᵢ · pᵢ

For a continuous random variable with probability density f(x):

E[X] = ∫ x · f(x) dx

Either way, you weight each possible value by its probability and sum (or integrate). The result is the long-run average over many independent realizations of X.
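As a rough numerical illustration of the continuous formula, the sketch below approximates ∫ x · f(x) dx with a midpoint Riemann sum. The helper name expectedContinuous, the exponential density with rate 2, and the truncation of the integral at x = 50 are all assumptions chosen for this example; the exact mean for that density is 1/rate = 0.5.

```javascript
// Approximate E[X] = ∫ x · f(x) dx on [lower, upper] with a midpoint Riemann sum.
// Hypothetical helper for illustration; not a production-grade integrator.
function expectedContinuous(density, lower, upper, steps = 100000) {
  const width = (upper - lower) / steps;
  let sum = 0;
  for (let i = 0; i < steps; i++) {
    const x = lower + (i + 0.5) * width; // midpoint of the i-th slice
    sum += x * density(x) * width;
  }
  return sum;
}

// Assumed example: exponential density f(x) = rate · e^(−rate·x) for x ≥ 0.
const rate = 2;
const exponentialDensity = (x) => rate * Math.exp(-rate * x);

// The tail beyond x = 50 is negligible for rate = 2, so truncating there is harmless here.
const approxMean = expectedContinuous(exponentialDensity, 0, 50);
console.log(approxMean.toFixed(4)); // close to 1 / rate = 0.5
```

The discrete sum and this integral are the same idea: each slice contributes its value times the probability mass that falls in it.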

Worked examples

Example 1 — fair six-sided die

X is the result of one roll. Possible values 1, 2, 3, 4, 5, 6 each with probability 1/6.

E[X] = (1+2+3+4+5+6)/6 = 21/6 = 3.5

Average over many rolls is 3.5. Notice — 3.5 is not even possible to roll. "Expected" is the long-run mean, not the typical single outcome.

Example 2 — coin flip with $1 win/$1 loss

P(heads) = 1/2, win $1; P(tails) = 1/2, lose $1.

E[X] = (1)(0.5) + (-1)(0.5) = 0

Fair game — average gain is zero. Over many flips, you break even (with variance — sometimes ahead, sometimes behind).

Example 3 — biased coin

P(heads) = 0.6, win $2; P(tails) = 0.4, lose $1.

E[X] = (2)(0.6) + (-1)(0.4) = 1.2 - 0.4 = $0.80 per flip

Positive expected value: over many flips, you gain $0.80 per flip on average. After 1000 flips, the expected total winnings are $800.
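One way to sanity-check the $0.80 figure is to simulate the game. The sketch below is only an illustration: the function name simulateBiasedCoin and the choice of 200,000 flips are assumptions, and because it uses Math.random the printed number changes slightly from run to run.

```javascript
// Simulate the biased-coin game from Example 3: win $2 with probability 0.6,
// lose $1 otherwise, and return the average payoff per flip.
function simulateBiasedCoin(flips) {
  let total = 0;
  for (let i = 0; i < flips; i++) {
    total += Math.random() < 0.6 ? 2 : -1;
  }
  return total / flips;
}

const averagePayoff = simulateBiasedCoin(200000);
console.log(averagePayoff.toFixed(2)); // varies run to run; should be near 0.80
```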

Example 4 — lottery

$2 ticket. Jackpot $10 million with probability 1 in 100 million. (Smaller prizes ignored for simplicity.)

E[winnings] = 10,000,000 · 0.00000001 = $0.10
E[net] = $0.10 - $2 = -$1.90 per ticket

Each ticket loses $1.90 on average. The lottery has been called a tax on people who are bad at math (a quip of uncertain origin, sometimes attributed to Voltaire). The expected dollar value is negative, but the expected utility may still be positive: the small dream of winning is itself worth something to many players.

Linearity of expectation

For any random variables X and Y (independent or not):

E[aX + bY] = a · E[X] + b · E[Y]

This is one of the most powerful identities in probability. It works without independence: even strongly correlated X and Y have E[X + Y] = E[X] + E[Y]. Variance, by contrast, is not additive in general: Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y), which reduces to Var(X) + Var(Y) only when X and Y are uncorrelated (independence is sufficient but not necessary).
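To see the contrast concretely, the sketch below takes an extreme case of dependence: X is a fair die roll and Y is the very same roll. The helpers expectJoint and varianceJoint and the joint-distribution format [{x, y, p}, ...] are assumptions made for this example. The means still add, while the variance of the sum is twice the sum of the variances.

```javascript
// Expectation of g(x, y) under a joint distribution given as [{ x, y, p }, ...].
function expectJoint(joint, g) {
  return joint.reduce((sum, { x, y, p }) => sum + g(x, y) * p, 0);
}

// Variance of g(x, y) under the same joint distribution.
function varianceJoint(joint, g) {
  const mean = expectJoint(joint, g);
  return expectJoint(joint, (x, y) => (g(x, y) - mean) ** 2);
}

// Assumed example: X is a fair die roll and Y = X, so the two are fully dependent.
const dependentDice = [1, 2, 3, 4, 5, 6].map((v) => ({ x: v, y: v, p: 1 / 6 }));

const meanX = expectJoint(dependentDice, (x, y) => x);
const meanY = expectJoint(dependentDice, (x, y) => y);
const meanSum = expectJoint(dependentDice, (x, y) => x + y);
console.log(meanSum, meanX + meanY); // both approximately 7

const varX = varianceJoint(dependentDice, (x, y) => x);
const varY = varianceJoint(dependentDice, (x, y) => y);
const varSum = varianceJoint(dependentDice, (x, y) => x + y);
console.log(varSum.toFixed(3), (varX + varY).toFixed(3)); // "11.667" vs "5.833"
```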

Linearity is the backbone of combinatorial probability. Want the expected number of fixed points in a random permutation of n elements? Define indicator variables I_k = 1 if k is fixed, 0 otherwise. E[I_k] = 1/n. Sum over k — E[number of fixed points] = n · 1/n = 1. Linearity gives this in one line; computing the actual distribution would take pages.
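The fixed-point result can be checked by brute force for small n. The sketch below enumerates every permutation and averages the number of fixed points; the function names allPermutations and averageFixedPoints are made up for this illustration, and the approach is only practical for small n because it visits all n! permutations.

```javascript
// Generate every permutation of [0, 1, ..., n-1] by recursive swapping.
function allPermutations(n) {
  const results = [];
  const current = Array.from({ length: n }, (_, i) => i);
  function permute(start) {
    if (start === n) {
      results.push(current.slice());
      return;
    }
    for (let i = start; i < n; i++) {
      [current[start], current[i]] = [current[i], current[start]];
      permute(start + 1);
      [current[start], current[i]] = [current[i], current[start]]; // undo the swap
    }
  }
  permute(0);
  return results;
}

// Average number of positions k with perm[k] === k over all n! permutations.
function averageFixedPoints(n) {
  const perms = allPermutations(n);
  let fixed = 0;
  for (const perm of perms) {
    perm.forEach((value, index) => {
      if (value === index) fixed++;
    });
  }
  return fixed / perms.length;
}

for (let n = 1; n <= 6; n++) {
  console.log(n, averageFixedPoints(n)); // the average is 1 for every n
}
```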

Variance and standard deviation

Expected value tells you the long-run average; variance measures the spread around it:

Var(X) = E[(X − E[X])²]
       = E[X²] − E[X]²       (computational form)

Standard deviation σ = √Var(X) is in the same units as X. For the fair die, Var = E[X²] − 3.5² = (1²+2²+...+6²)/6 − 12.25 = 91/6 − 12.25 ≈ 2.917; σ ≈ 1.71.

For a normal distribution, about 68% of values fall within 1σ of the mean; for other distributions the fraction can differ. Variance is the average squared deviation from the mean, so σ, its square root, is the better measure of a typical deviation.
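As a quick check of the die numbers, the sketch below computes the variance both ways, from the definition and from the computational form. The function names are assumptions for this example.

```javascript
// Probability-weighted mean of a discrete distribution.
function mean(values, probabilities) {
  return values.reduce((sum, x, i) => sum + x * probabilities[i], 0);
}

// Var(X) = E[(X − E[X])²], straight from the definition.
function varianceByDefinition(values, probabilities) {
  const mu = mean(values, probabilities);
  return values.reduce((sum, x, i) => sum + (x - mu) ** 2 * probabilities[i], 0);
}

// Var(X) = E[X²] − E[X]², the computational form.
function varianceComputational(values, probabilities) {
  const mu = mean(values, probabilities);
  const meanOfSquares = mean(values.map((x) => x * x), probabilities);
  return meanOfSquares - mu * mu;
}

const faces = [1, 2, 3, 4, 5, 6];
const uniform = faces.map(() => 1 / 6);

console.log(varianceByDefinition(faces, uniform).toFixed(3));            // "2.917"
console.log(varianceComputational(faces, uniform).toFixed(3));           // "2.917"
console.log(Math.sqrt(varianceByDefinition(faces, uniform)).toFixed(2)); // "1.71"
```

The two forms agree mathematically; in floating point the computational form can lose precision when the mean is large relative to the spread, which is one reason to keep the definitional form around.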

Conditional expectation

The expected value of X given that some event A occurred:

E[X | A] = ∑ x · P(X = x | A)

For continuous variables, integrate against the conditional density. The Law of Total Expectation:

E[X] = E[E[X | Y]]

Compute the expected value of X conditional on Y, then take the expected value of that over Y. Useful when conditioning makes calculation easier — split a hard problem into cases based on Y, compute conditional expectation in each case, then average.
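A small case-splitting example may help. Assume, purely for illustration, that a fair coin (Y) decides whether you roll a four-sided or a six-sided die, and X is the number rolled. The sketch below computes E[X] two ways: by averaging the conditional means, and directly from the marginal distribution of X. The names fairDieMean and cases are hypothetical.

```javascript
// Mean of a fair die with the given number of sides: (1 + 2 + ... + sides) / sides.
function fairDieMean(sides) {
  let sum = 0;
  for (let face = 1; face <= sides; face++) sum += face;
  return sum / sides;
}

// Assumed setup: Y picks the die. Each case has a probability and a side count.
const cases = [
  { probability: 0.5, sides: 4 },
  { probability: 0.5, sides: 6 },
];

// Route 1: E[X] = Σ P(Y = y) · E[X | Y = y]
const viaConditioning = cases.reduce(
  (sum, c) => sum + c.probability * fairDieMean(c.sides),
  0
);

// Route 2: build the marginal distribution of X, then apply the definition.
const marginal = new Map();
for (const c of cases) {
  for (let face = 1; face <= c.sides; face++) {
    marginal.set(face, (marginal.get(face) || 0) + c.probability / c.sides);
  }
}
let direct = 0;
for (const [face, p] of marginal) direct += face * p;

console.log(viaConditioning, direct.toFixed(6)); // both equal 3, up to rounding
```

Route 1 never needs the full distribution of X, which is the practical appeal of conditioning.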

JavaScript — computing expected values

// Discrete expected value
function expectedDiscrete(values, probabilities) {
  let sum = 0;
  for (let i = 0; i < values.length; i++) {
    sum += values[i] * probabilities[i];
  }
  return sum;
}

// Fair die
const die = expectedDiscrete([1,2,3,4,5,6], Array(6).fill(1/6));
console.log(die);  // 3.5

// Lottery example
const lotteryNet = expectedDiscrete([10_000_000, 0], [0.00000001, 0.99999999]) - 2;
console.log(lotteryNet);  // -1.9

// Empirical estimate from samples
function empiricalMean(samples) {
  return samples.reduce((s, x) => s + x, 0) / samples.length;
}

// Simulate 1 million die rolls; converges to 3.5
const samples = Array.from({length: 1_000_000}, () => Math.floor(Math.random() * 6) + 1);
console.log(empiricalMean(samples));  // ≈ 3.500 (Law of Large Numbers)

Where expected value shows up

  • Insurance. Premiums are set so E[claims] is less than E[premiums], with margin for operating costs and profit. Without expected-value math, insurance companies would go broke.
  • Gambling and games. Casinos design games with negative expected value for the player. The "house edge" is exactly E[house gain per dollar wagered]. Some games (poker, sports betting) can have positive expected value for skilled players.
  • Finance. Expected return of an investment, expected loss of a portfolio. Modern portfolio theory (Markowitz) optimizes expected return for given risk (variance).
  • Reinforcement learning. The action-value function Q(s, a) is the expected discounted return from taking action a in state s and then following the policy. The whole RL framework is built on optimizing expected returns.
  • Algorithm analysis. Average-case time complexity is the expected number of operations over random inputs. Randomized quicksort's "expected O(n log n)" is a statement about expected operation count, with the expectation taken over the algorithm's own random pivot choices.
  • Decision theory. Pick the action with highest expected utility. Foundation of microeconomics, game theory, and rational choice models.
  • Statistics. Maximum likelihood, Bayesian inference, hypothesis testing — all involve computing expectations over data distributions.
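The house-edge idea from the gambling bullet above can be made concrete with one bet. The sketch assumes a single-number bet on a 38-pocket roulette wheel that pays 35 to 1; the function name expectedNetPerDollar and the outcome format are invented for this example.

```javascript
// Expected net gain for the player per $1 staked, given [{ probability, net }, ...].
function expectedNetPerDollar(outcomes) {
  return outcomes.reduce((sum, o) => sum + o.probability * o.net, 0);
}

// Assumed example: single-number bet on a 38-pocket wheel paying 35 to 1.
const singleNumberBet = [
  { probability: 1 / 38, net: 35 },  // win: collect 35 times the stake
  { probability: 37 / 38, net: -1 }, // lose: forfeit the stake
];

const playerEdge = expectedNetPerDollar(singleNumberBet);
const houseEdge = -playerEdge; // the house gains what the player loses
console.log((houseEdge * 100).toFixed(2) + "%"); // "5.26%"
```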

Common mistakes

  • Treating expected value as the most likely value. They're different. The mean of a die is 3.5, which is impossible to roll. The mean of a heavy-tailed distribution can be extremely far from the median or mode.
  • Confusing expected value with utility. Compare $1M with certainty against a 50/50 gamble between $0 and $2M: the expected dollar value is the same, but most people prefer the certain $1M (risk aversion). Expected utility theory captures this; expected value alone doesn't.
  • Assuming linearity for variance. E[X+Y] = E[X] + E[Y] always. In general Var(X+Y) = Var(X) + Var(Y) + 2·Cov(X, Y), so the variances add only when X and Y are uncorrelated (which independence guarantees). Forgetting the covariance term gives wrong variance estimates for correlated variables.
  • Not accounting for finite samples. Empirical mean from finite samples differs from true expected value. Sample size needs to be large enough; standard error decreases as 1/√n.
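  • Ignoring undefined or infinite expected values. For some distributions the integral defining E[X] does not converge. The Cauchy distribution (the same thing as a t-distribution with 1 degree of freedom) has an undefined mean, and a Pareto distribution with shape parameter α ≤ 1 has an infinite one. Sample means from such distributions do not settle down as the sample size grows.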
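  • Forgetting the assumptions behind the Law of Large Numbers. The standard statements assume i.i.d. samples; extensions cover uncorrelated or weakly dependent samples. Highly autocorrelated samples (like time series) converge much more slowly, and under strong enough dependence the sample mean may not converge to the true mean at all.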
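The finite-sample point in the list above (standard error shrinking as 1/√n) can be observed by repeating an experiment many times and measuring how much the sample mean moves around. The sketch below does this for a fair die; the function names, the 2,000 repetitions, and the sample sizes are assumptions, and the observed column varies from run to run.

```javascript
// Roll a fair die n times and return the sample mean.
function sampleMeanOfDie(n) {
  let total = 0;
  for (let i = 0; i < n; i++) total += Math.floor(Math.random() * 6) + 1;
  return total / n;
}

// Standard deviation of the sample mean, estimated from many repeated experiments.
function spreadOfSampleMean(n, repetitions = 2000) {
  const means = Array.from({ length: repetitions }, () => sampleMeanOfDie(n));
  const center = means.reduce((s, m) => s + m, 0) / repetitions;
  const meanSquaredDeviation =
    means.reduce((s, m) => s + (m - center) ** 2, 0) / repetitions;
  return Math.sqrt(meanSquaredDeviation);
}

const dieSigma = Math.sqrt(35 / 12); // exact standard deviation of one roll
for (const n of [25, 100, 400]) {
  const observed = spreadOfSampleMean(n);
  const predicted = dieSigma / Math.sqrt(n);
  // The observed and predicted columns should roughly agree.
  console.log(n, observed.toFixed(3), predicted.toFixed(3));
}
```

Each fourfold increase in n roughly halves the spread, which is why squeezing out one more decimal place of accuracy takes about 100 times as many samples.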

Frequently asked questions

What does "expected value" mean if it's not what to expect?

It's the long-run average over many trials, not the most likely single value. For a fair die, E[X] = 3.5 — but you'll never roll a 3.5. The expected value is the average over many rolls, by the Law of Large Numbers. For a skewed distribution (like income), the expected value can differ wildly from the median; "expected" is misleading shorthand for "long-run mean."

When is E[X] not a useful summary?

When the distribution is heavily skewed or has heavy tails. In income distributions, the mean is much larger than the median because a few extreme earners dominate it. A strategy of small frequent gains plus rare huge losses can have positive expected value and still bankrupt you along the way, because you only get to live one path (the ergodicity problem). The St. Petersburg paradox is a related warning: a gamble with infinite expected payoff that almost nobody would pay much to play. For decisions under such conditions, expected utility (a concave function of wealth) is often a better guide than expected dollar value.

Why is E[aX + bY] = a·E[X] + b·E[Y] always true (linearity)?

Because expectation is just an integral (or sum), and integration and summation are linear operations. This holds even when X and Y are dependent or correlated, which makes it far more widely applicable than results that need independence. Linearity of expectation is a fundamental tool in probabilistic combinatorics: count the expected number of "good" outcomes by adding up the expectations of indicator variables.

How is expected value related to variance?

Variance is the expected squared deviation from the mean — Var(X) = E[(X − E[X])²]. It measures spread around the average. The two together (mean and variance) characterize a normal distribution completely; for general distributions, higher moments (skewness, kurtosis) are also needed. The standard deviation σ = √Var has the same units as X.

What's the law of large numbers?

As you sample more, the sample mean approaches the true expected value. Specifically, the average of n i.i.d. samples X₁, ..., Xₙ with finite mean converges to E[X] as n → ∞ (in probability for the weak LLN; almost surely for the strong LLN). It is the foundation of statistics, casinos, and insurance: short-term variance washes out over many trials.

Why does the casino always win?

Each game has expected value slightly favorable to the house. Over many bets, the law of large numbers guarantees the house's bottom line approaches its mathematical edge. Individual gamblers might win occasionally (variance), but in aggregate over many players and games, the house's expected profit per dollar wagered is realized. Casino profitability is engineered through expected value math.

How is expected utility different from expected value?

Expected value averages dollar outcomes. Expected utility averages utility, a function of dollar outcomes that is concave for a risk-averse person. Concave utility means the utility of $200 is less than twice the utility of $100 (diminishing marginal returns). Expected utility theory predicts risk-averse behavior such as buying insurance and declining fair gambles. The idea goes back to Daniel Bernoulli (1738) and was axiomatized by von Neumann and Morgenstern (1944).
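A minimal sketch of the difference, assuming a square-root utility function (one common textbook choice of concave utility, used here only for illustration). It compares a certain $100 with a fair coin flip between $0 and $200; the function and variable names are hypothetical.

```javascript
// Expected value of utility(amount) over a gamble given as [{ probability, amount }, ...].
function expectedUtility(gamble, utility) {
  return gamble.reduce((sum, o) => sum + o.probability * utility(o.amount), 0);
}

// Assumed concave utility for illustration: u(w) = sqrt(w).
const sqrtUtility = (w) => Math.sqrt(w);
// Identity utility recovers the plain expected dollar value.
const identity = (w) => w;

const certain100 = [{ probability: 1, amount: 100 }];
const coinFlip0or200 = [
  { probability: 0.5, amount: 0 },
  { probability: 0.5, amount: 200 },
];

// Same expected dollars for both options...
console.log(expectedUtility(certain100, identity), expectedUtility(coinFlip0or200, identity)); // 100 100

// ...but the sure thing has the higher expected utility under the concave u.
console.log(expectedUtility(certain100, sqrtUtility));                // 10
console.log(expectedUtility(coinFlip0or200, sqrtUtility).toFixed(2)); // "7.07"
```

With this utility the gambler would decline the fair coin flip, which is the risk-averse behavior the theory predicts.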