Law of Large Numbers

Motivation: why this matters in quant finance

Every Monte Carlo price, every backtest mean, every empirical volatility estimate, every time a risk manager says "we ran $10^6$ paths and got a stable answer": all of it rests on the Law of Large Numbers (LLN). The LLN is the formal statement that averaging a large number of independent draws from a distribution recovers the true mean of that distribution. Without it, Monte Carlo pricing would produce nothing more than expensive random numbers.

The LLN is also the bedrock of frequentist probability itself. When we say "the probability of heads is $0.5$," what we operationally mean is: if we flip the coin a huge number of times, the proportion of heads will approach $0.5$. That statement is the LLN applied to indicator random variables. Every p-value, every long-run VaR exceedance rate, every bookmaker's implicit break-even calculation comes from this.

But the LLN has a sharper edge than its sloganeering suggests. It tells you the sample mean converges, but says nothing about how fast; that job falls to the Central Limit Theorem. And it fails silently for distributions without a mean (Cauchy, or Pareto with tail index $\alpha \le 1$). Knowing when the LLN applies, and the precise sense in which it applies (in probability vs. almost surely), is the difference between a Monte Carlo scheme that works and one that only looks like it does.

The informal idea

Take an i.i.d. sequence $X_1, X_2, \ldots$ with finite mean $\mu$ . Form the running sample mean:

$$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i.$$

The LLN says $\bar X_n \to \mu$ as $n \to \infty$ . The " $\to$ " hides a choice of convergence mode. The two versions are:

Weak LLN: $\bar X_n \to \mu$ in probability: $\mathbb{P}(|\bar X_n - \mu| > \epsilon) \to 0$ for every $\epsilon > 0$. At any large but fixed $n$, the sample mean is close to $\mu$ with high probability, but the statement makes no claim about the whole trajectory $\{\bar X_n\}_n$.
Strong LLN: $\bar X_n \to \mu$ almost surely: $\mathbb{P}(\lim_{n\to\infty} \bar X_n = \mu) = 1$ . The entire trajectory converges, not just its marginals at each fixed $n$ .

The strong LLN implies the weak LLN, but not conversely. For finite-mean i.i.d. sequences, both hold — but the weak version can sometimes be proved under weaker moment conditions.
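The difference between the two modes can be made visible with a small simulation. The sketch below is illustrative only: it uses fair coin flips, a finite horizon as a stand-in for "all later $n$", and hypothetical names and parameter values (`exceedance_fractions`, `eps=0.05`, 300 paths) that are not from the text. For each checkpoint $n$ it estimates the weak-law quantity (the path is off at time $n$) and a finite-horizon proxy for the strong-law tail event (the path is off at some time between $n$ and the horizon).

```python
import random

def exceedance_fractions(n_paths=300, horizon=4000, checkpoints=(50, 500, 4000),
                         eps=0.05, seed=0):
    """For fair-coin paths, return {n: (weak, tail)} where
    weak = fraction of paths with |mean_n - 1/2| > eps at time n, and
    tail = fraction of paths with |mean_m - 1/2| > eps for some n <= m <= horizon."""
    rng = random.Random(seed)
    weak = dict.fromkeys(checkpoints, 0)
    tail = dict.fromkeys(checkpoints, 0)
    for _ in range(n_paths):
        heads = 0
        last_bad = 0      # latest time at which the running mean was off by more than eps
        bad_at = set()    # checkpoints at which the running mean was off
        for m in range(1, horizon + 1):
            heads += rng.random() < 0.5
            if abs(heads / m - 0.5) > eps:
                last_bad = m
                if m in weak:
                    bad_at.add(m)
        for n in checkpoints:
            weak[n] += n in bad_at
            tail[n] += last_bad >= n
    return {n: (weak[n] / n_paths, tail[n] / n_paths) for n in checkpoints}

fractions_by_n = exceedance_fractions()
for n, (weak_frac, tail_frac) in fractions_by_n.items():
    print(f"n={n:5d}  off at n: {weak_frac:.3f}   off somewhere in [n, horizon]: {tail_frac:.3f}")
```

The second column can never be smaller than the first, since being off at time $n$ is one way of being off somewhere from $n$ onwards. The weak law only says the first column goes to zero; the strong law is what forces the second column to zero as well.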

Why independence and why finite mean

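Finite mean is essential. If $\mathbb{E}[|X|] = \infty$, the LLN fails: the sample mean of an i.i.d. sequence then has no finite almost-sure limit. The Cauchy distribution is the canonical failure: it has no mean, and $\bar X_n$ is itself standard Cauchy for every $n$, so averaging never concentrates it around any value.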
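A quick way to see this is to compare how the spread of the sample mean changes with $n$ for Cauchy and for normal draws. The following is a rough sketch using only the standard library; the helper names, the 400 replications and the choice of the interquartile range as the spread measure are illustrative assumptions. The Cauchy variates are generated by inverse-CDF sampling, $\tan(\pi(U - 1/2))$.

```python
import math
import random
import statistics

def draw_cauchy(rng):
    # Inverse-CDF sampling of a standard Cauchy variate.
    return math.tan(math.pi * (rng.random() - 0.5))

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def iqr_of_sample_means(draw, n, n_reps, rng):
    """Interquartile range of the sample mean of n draws, over n_reps replications."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(n_reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

rng = random.Random(0)
spread = {}
for n in (10, 1000):
    spread["cauchy", n] = iqr_of_sample_means(draw_cauchy, n, 400, rng)
    spread["normal", n] = iqr_of_sample_means(draw_normal, n, 400, rng)
    print(f"n={n:5d}  IQR of mean: cauchy={spread['cauchy', n]:.3f}  normal={spread['normal', n]:.3f}")
```

The normal column should shrink roughly like $1/\sqrt n$, while the Cauchy column should stay near 2 (the interquartile range of a single standard Cauchy draw) no matter how many draws are averaged.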

Independence can be weakened. The LLN holds under much weaker conditions than i.i.d.: pairwise uncorrelated variables are enough for the $L^2$ weak LLN (Chebyshev's LLN); ergodicity is enough for stationary processes; martingale difference sequences have their own LLN. But some form of asymptotic independence is always required: the LLN fails catastrophically for a sequence that is the same non-degenerate random variable over and over ($X_i = X_1$ for all $i$ gives $\bar X_n = X_1$ forever).

Formal statement

Weak Law (Chebyshev's form — finite variance)

Let $X_1, X_2, \ldots$ be pairwise uncorrelated random variables (not necessarily identically distributed) with common mean $\mu$ and uniformly bounded variance $\operatorname{Var}(X_i) \le \sigma^2 < \infty$ . Then

$$\bar X_n \xrightarrow{\mathbb{P}} \mu.$$

Proof (one line): Because the $X_i$ are uncorrelated, $\operatorname{Var}(\bar X_n) = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i) \le \sigma^2/n$. By Chebyshev's inequality,

$$\mathbb{P}(|\bar X_n - \mu| > \epsilon) \le \frac{\operatorname{Var}(\bar X_n)}{\epsilon^2} \le \frac{\sigma^2}{n\,\epsilon^2} \to 0.$$

The variance of the sample mean shrinks at least as fast as $1/n$, so Chebyshev delivers the result immediately.
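The bound can be checked numerically in a setting that is deliberately not identically distributed. In the sketch below (an illustration under assumed values, not part of the original argument) $X_i = c_i U_i$ with $U_i \sim \text{Uniform}(-1, 1)$ and $c_i$ cycling through hypothetical scales $0.5, 1, 2$, so the variables are independent with mean $0$ and $\operatorname{Var}(X_i) = c_i^2/3 \le 4/3$. The function name `chebyshev_check` and the tolerance `eps=0.2` are likewise made up for the example.

```python
import random

def chebyshev_check(n, eps=0.2, n_reps=300, scales=(0.5, 1.0, 2.0), seed=0):
    """Return (empirical P(|mean_n| > eps), Chebyshev bound sigma^2 / (n eps^2))."""
    rng = random.Random(seed)
    sigma2 = max(scales) ** 2 / 3      # uniform bound on Var(X_i) = c_i^2 / 3
    exceed = 0
    for _ in range(n_reps):
        total = sum(scales[i % len(scales)] * rng.uniform(-1.0, 1.0) for i in range(n))
        exceed += abs(total / n) > eps
    empirical = exceed / n_reps
    bound = min(1.0, sigma2 / (n * eps ** 2))   # a probability bound above 1 is vacuous
    return empirical, bound

checks = {n: chebyshev_check(n) for n in (10, 100, 1000)}
for n, (empirical, bound) in checks.items():
    print(f"n={n:5d}  empirical={empirical:.4f}  Chebyshev bound={bound:.4f}")
```

The empirical frequency should sit well below the bound at every $n$: Chebyshev is enough to prove convergence, but it is a loose estimate of the actual tail probability.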

Strong Law (Kolmogorov's form — i.i.d., finite mean)

Let $X_1, X_2, \ldots$ be i.i.d. with finite mean $\mu = \mathbb{E}[X_1]$ (no variance requirement). Then

$$\bar X_n \xrightarrow{\text{a.s.}} \mu.$$

This is a substantially deeper result. The standard proofs (Etemadi's proof, Kolmogorov's proof via a truncation argument and the Borel-Cantelli lemma) require careful handling of the infinite tail of trajectories — not just the behaviour at any fixed $n$ .

Chebyshev for indicators: the frequentist interpretation of probability

Apply the weak LLN to indicator variables $X_i = \mathbf{1}_{A_i}$, where the $A_i$ are independent events each occurring with probability $p$. Then $\mathbb{E}[X_i] = p$, and:

$$\frac{1}{n}\sum_{i=1}^n \mathbf{1}_{A_i} \xrightarrow{\mathbb{P}} p.$$

The empirical frequency of the event converges to its probability. This is the justification for every "we ran 10,000 simulations and saw the event happen 153 times, so its probability is about 0.0153" calculation.
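How much weight a count like "153 out of 10,000" can bear is a short calculation. The sketch below attaches a standard error and a normal-approximation (Wald) 95% interval to the frequency estimate; the function name `frequency_estimate` is hypothetical, and the interval relies on the CLT approximation rather than on the LLN alone.

```python
import math

def frequency_estimate(hits, n, z=1.96):
    """Point estimate, standard error and normal-approximation interval for a probability."""
    p_hat = hits / n
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)   # sd of a Bernoulli mean, with p replaced by p_hat
    return p_hat, std_err, (p_hat - z * std_err, p_hat + z * std_err)

p_hat, std_err, (lo, hi) = frequency_estimate(153, 10_000)
print(f"p_hat={p_hat:.4f}  std_err={std_err:.5f}  95% interval=({lo:.4f}, {hi:.4f})")
```

For these inputs the standard error is roughly $0.0012$, so the estimate is good to about two significant figures, not four.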

Rate of convergence — the bridge to the CLT

The LLN says " $\bar X_n \to \mu$ " but says nothing about how fast. The natural question is: at scale $n$ , how far is $\bar X_n$ from $\mu$ ?

When the $X_i$ have finite variance $\sigma^2$, the variance of the sample mean is $\operatorname{Var}(\bar X_n) = \sigma^2/n$, so the typical fluctuation is of order $\sigma/\sqrt n$. Subtract the mean, multiply by $\sqrt n$, and you expose the $O(1)$ fluctuation, which by the Central Limit Theorem is Gaussian. The CLT is the LLN's second-order correction: the LLN says $\bar X_n - \mu \to 0$; the CLT says $\sqrt n\,(\bar X_n - \mu) \xrightarrow{d} \sigma Z$ with $Z \sim \mathcal{N}(0, 1)$.

Practically: if you need your Monte Carlo estimator to be within $\epsilon$ of the true value with 95% confidence, you need approximately $n \approx (1.96\,\sigma/\epsilon)^2$ samples. Halving $\epsilon$ quadruples the sample budget. The LLN gives you convergence; the CLT gives you the compute cost.
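The sample-size rule is simple enough to wrap in a helper. This is a minimal sketch assuming the normal approximation holds and that $\sigma$ is known or has been estimated from a pilot run; the name `required_samples` and the example values are illustrative.

```python
import math

def required_samples(sigma, eps, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= eps, under the CLT normal approximation."""
    return math.ceil((z * sigma / eps) ** 2)

budget = required_samples(sigma=10.0, eps=0.02)
budget_half_eps = required_samples(sigma=10.0, eps=0.01)
print(f"eps=0.02: n ~ {budget:.3e}")
print(f"eps=0.01: n ~ {budget_half_eps:.3e}  (ratio {budget_half_eps / budget:.2f})")
```

The printed ratio should come out at about 4, which is the "halving $\epsilon$ quadruples the budget" statement in numbers.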

Worked examples

Example 1 — Monte Carlo option price

Price a European call with strike $K$ on a log-normal underlying $S_T$ under the risk-neutral measure. The LLN says that for i.i.d. simulated payoffs $Y_i = e^{-rT}\max(S_T^{(i)} - K, 0)$ with $\mathbb{E}^{\mathbb{Q}}[Y_i] = C$ (the true option price):

$$\frac{1}{N}\sum_{i=1}^N Y_i \xrightarrow{\text{a.s.}} C.$$

With $\sigma_Y^2$ the payoff variance, the standard error at $N$ samples is $\sigma_Y/\sqrt N$. At $N = 10^6$ and $\sigma_Y = 10$ the standard error is $0.01$, so the 95% confidence interval has half-width $\approx 0.02$. To get another decimal of precision you need $N = 10^8$, which is why variance-reduction techniques (antithetic variates, control variates) are so valuable: they reduce $\sigma_Y$ without increasing $N$.
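As a sanity check on this example, a plain Monte Carlo estimate can be compared with the Black-Scholes closed form, which is available because the underlying is log-normal. The sketch below uses only the standard library; the contract parameters ($S_0 = 100$, $K = 100$, $r = 5\%$, volatility $20\%$, $T = 1$), the path count and the function names are assumptions chosen for illustration, and `vol` denotes the volatility of the underlying, not the payoff standard deviation $\sigma_Y$.

```python
import math
import random
import statistics

def black_scholes_call(s0, strike, rate, vol, maturity):
    """Closed-form European call price under Black-Scholes."""
    d1 = (math.log(s0 / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * math.sqrt(maturity))
    d2 = d1 - vol * math.sqrt(maturity)
    cdf = statistics.NormalDist().cdf
    return s0 * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)

def mc_call(s0, strike, rate, vol, maturity, n_paths, seed=0):
    """Plain Monte Carlo estimate of the call price and its standard error."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * maturity
    diffusion = vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    total = total_sq = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))  # risk-neutral terminal price
        payoff = discount * max(s_t - strike, 0.0)
        total += payoff
        total_sq += payoff * payoff
    mean = total / n_paths
    variance = (total_sq - n_paths * mean ** 2) / (n_paths - 1)      # sample variance of payoffs
    return mean, math.sqrt(variance / n_paths)

bs_price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
mc_price, std_err = mc_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=100_000)
print(f"closed form={bs_price:.4f}  Monte Carlo={mc_price:.4f} +/- {1.96 * std_err:.4f}")
```

The Monte Carlo figure should land within a few standard errors of the closed-form price. Reporting the estimate together with its interval, rather than as a bare number, is the habit this example is meant to encourage.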

Example 2 — Binomial bank: seeing the LLN happen

# Python: simulate 5 independent streams of 10,000 i.i.d. Bernoulli(0.5) flips
# and plot the running average of each stream.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_streams = 5
n_flips = 10_000

for _ in range(n_streams):
    flips = rng.integers(0, 2, size=n_flips)       # 0 or 1 with prob 0.5 each
    running_avg = np.cumsum(flips) / np.arange(1, n_flips + 1)
    plt.plot(running_avg, alpha=0.7)

plt.axhline(0.5, color='k', linestyle='--')
plt.xscale('log')
plt.xlabel('n'); plt.ylabel('running average')
plt.show()
# All 5 streams visibly squeeze toward 0.5; one standard deviation of the running
# average is 0.05 at n=100, about 0.016 at n=1000, and 0.005 at n=10000: the
# 1/sqrt(n) decay predicted by the CLT.

Example 3 — Heavy tails slow the LLN down

# Python: running mean of a heavy-tailed (Pareto) distribution
# converges to the true mean — but slowly.
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.5  # Pareto shape; mean exists (alpha > 1), variance exists (alpha > 2)
true_mean = alpha / (alpha - 1)   # = 5/3 ≈ 1.667

N = 10**6
X = (1 - rng.random(N))**(-1/alpha)   # standard Pareto(alpha)
running_avg = np.cumsum(X) / np.arange(1, N + 1)

for checkpoint in [100, 1_000, 10_000, 100_000, 1_000_000]:
    print(f"n={checkpoint:8d}:  mean={running_avg[checkpoint - 1]:.4f}  (true={true_mean:.4f})")
# n=     100:  mean=1.7803  (true=1.6667)
# n=   1000:  mean=1.6342  (true=1.6667)
# n=  10000:  mean=1.6687  (true=1.6667)
# n= 100000:  mean=1.6713  (true=1.6667)
# n=1000000:  mean=1.6688  (true=1.6667)

The Pareto $(\alpha = 2.5)$ has finite mean ($\alpha > 1$) and finite variance ($\alpha > 2$), so the strong LLN applies. But the running average bounces around, because big individual draws shift it, and only by $n = 10^4$ is the estimate stably near the truth. For $\alpha$ closer to $1$ (say $\alpha = 1.1$) the variance is infinite, so neither Chebyshev's bound nor the CLT's $\sigma/\sqrt n$ error scale applies, and convergence is excruciatingly slow, even though the strong LLN still guarantees $\bar X_n \to \mu$ almost surely.

Common confusions and pitfalls

"The LLN says $\bar X_n$ becomes exactly $\mu$ eventually." No: convergence is asymptotic. At any finite $n$, $\bar X_n$ has a non-trivial distribution; the LLN says the mass of that distribution concentrates near $\mu$, not that $\bar X_n$ equals $\mu$.

"If I flip a fair coin 100 times and get 60 heads, the next flips are more likely to be tails to 'restore balance'." This is the gambler's fallacy and it is precisely what the LLN does not say. The LLN asserts that the proportion of heads converges to $1/2$ as $n$ grows. It does so by adding new flips that are each 50/50, which dilutes the early surplus, not by making tails more likely. The law of averages is not a law of physics with memory; independence is assumed throughout.

"The LLN guarantees my Monte Carlo is accurate." It guarantees your Monte Carlo converges. Accuracy at a given $n$ is controlled by the CLT: the typical error is $\sigma/\sqrt n$, not zero. Don't trust a Monte Carlo estimate without its confidence interval.

"Finite mean is enough." For the strong LLN (Kolmogorov), yes. For the weak LLN via Chebyshev, you also need finite variance. For sequences that aren't i.i.d. (e.g. pairwise uncorrelated, ergodic, stationary), additional conditions may be needed. The clean "finite mean $\Rightarrow$ LLN" rule is specifically for i.i.d. sequences.

"The LLN works for the median too." The sample median does converge to the population median, but under different conditions: the median must be unique, and no moment condition is needed (it works even for the Cauchy). The LLN as stated is for linear statistics (sums, means). Consistency of quantile estimators follows from a different result (Glivenko-Cantelli). The rate is still $1/\sqrt n$ when the density $f$ is positive at the median $m$, but the asymptotic variance is $1/(4 n f(m)^2)$ rather than $\sigma^2/n$.

"The LLN is about one number converging." It is about a sequence of random variables $(\bar X_n)_n$ converging to a constant. Convergence in probability (weak) constrains the distribution of each $\bar X_n$ separately; almost sure convergence (strong) is a statement about the whole trajectory, not about any single $\bar X_n$.
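The gambler's-fallacy point above comes down to arithmetic, which the short sketch below spells out with exact fractions. It conditions on the stated start (60 heads in the first 100 flips) and assumes every later flip is an independent fair flip; the function name `expected_proportion` is made up for the illustration.

```python
from fractions import Fraction

def expected_proportion(heads_so_far, flips_so_far, total_flips, p=Fraction(1, 2)):
    """Expected head proportion after total_flips, given the first flips_so_far outcomes."""
    future_flips = total_flips - flips_so_far
    return (heads_so_far + p * future_flips) / total_flips

surplus_by_n = {}
for total in (100, 1_000, 10_000, 1_000_000):
    proportion = expected_proportion(60, 100, total)
    surplus_by_n[total] = proportion * total - Fraction(total, 2)   # expected heads minus total/2
    print(f"n={total:9d}  expected proportion={float(proportion):.6f}  "
          f"expected surplus of heads={surplus_by_n[total]}")
```

The expected proportion is $1/2 + 10/n$: it approaches $1/2$ even though the expected surplus of 10 heads never shrinks. The early imbalance is diluted, not repaid.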

Where this goes next

Central Limit Theorem: The natural continuation — describes the fluctuation around the limit given by the LLN. Together they are the two pillars of asymptotic statistics.
Monte Carlo Pricing (basic): The LLN is the correctness statement; Monte Carlo is the algorithm. Every variance-reduction technique (antithetic variates, control variates, importance sampling) reduces the CLT-scale fluctuation without changing what the LLN delivers.
Moment Generating Functions: Used to establish large deviations. When the MGF is finite in a neighbourhood of zero, the probability that $\bar X_n$ deviates from $\mu$ by a fixed amount decays exponentially in $n$ (Cramér's theorem), far faster than Chebyshev's $1/n$ bound suggests.
Martingales (Discrete Time): Martingale convergence theorems generalise the strong LLN to dependent sequences. A martingale $M_n$ that is bounded in $L^p$ for some $p \ge 1$ converges a.s. to a limit $M_\infty$.
Ergodic Theorem: The LLN for stationary ergodic sequences: time averages equal space averages. Fundamental for time-series estimation and for the calibration of any stationary model.
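To give a feel for the large-deviations remark in the Moment Generating Functions item above, the sketch below compares three quantities for fair coin flips: the exact upper-tail probability $\mathbb{P}(\bar X_n \ge a)$, the Chernoff bound $e^{-n I(a)}$ built from the Cramér rate function $I(a) = a\ln\frac{a}{p} + (1-a)\ln\frac{1-a}{1-p}$, and the Chebyshev bound. The threshold $a = 0.6$, the values of $n$ and the function name `upper_tail_bounds` are illustrative assumptions, and the code is specific to $p = 1/2$.

```python
import math
from fractions import Fraction

def upper_tail_bounds(n, a=Fraction(3, 5)):
    """For n fair coin flips return (exact P(mean >= a), Chernoff bound, Chebyshev bound).
    Requires 1/2 < a < 1."""
    p = 0.5
    k = math.ceil(a * n)                      # smallest head count whose mean is >= a
    exact = sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n
    x = float(a)
    rate = x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))   # Cramer rate I(a)
    chernoff = math.exp(-n * rate)
    chebyshev = min(1.0, p * (1 - p) / (n * (x - p) ** 2))
    return exact, chernoff, chebyshev

tails = {n: upper_tail_bounds(n) for n in (100, 400, 1600)}
for n, (exact, chernoff, chebyshev) in tails.items():
    print(f"n={n:5d}  exact={exact:.3e}  Chernoff={chernoff:.3e}  Chebyshev={chebyshev:.3e}")
```

The exact probability and the Chernoff bound both collapse exponentially as $n$ grows, while the Chebyshev bound only falls like $1/n$.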