Law of Large Numbers

Motivation: why this matters in quant finance

Every Monte Carlo price, every backtest mean, every empirical volatility estimate, every time a risk manager says "we ran $10^6$ paths and got a stable answer": all of it rests on the Law of Large Numbers (LLN). The LLN is the formal statement that averaging a large number of independent draws from a distribution recovers the true mean of that distribution, provided that mean exists. Without it, Monte Carlo pricing would produce nothing more than expensive random numbers.
The LLN is also the bedrock of frequentist probability itself. When we say "the probability of heads is $0.5$," what we operationally mean is: if we flip the coin a huge number of times, the proportion of heads will approach $0.5$. That statement is the LLN applied to indicator random variables. Every p-value, every long-run VaR exceedance rate, every bookmaker's implicit break-even calculation comes from this.
But the LLN has a sharper edge than its slogan suggests. It tells you that the sample mean converges, but says nothing about how fast; that job falls to the Central Limit Theorem. And it fails silently for distributions without a mean (Cauchy, or Pareto with tail index $\alpha \le 1$). Knowing when the LLN applies, and the precise sense in which it applies (in probability vs. almost surely), is the difference between a Monte Carlo scheme that works and one that only looks like it does.

The informal idea

Take an i.i.d. sequence $X_1, X_2, \ldots$ with finite mean $\mu$. Form the running sample mean:

$$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i.$$

The LLN says $\bar X_n \to \mu$ as $n \to \infty$. The "$\to$" hides a choice of convergence mode. The two versions are:

  • Weak LLN: $\bar X_n \to \mu$ in probability: $\mathbb{P}(|\bar X_n - \mu| > \epsilon) \to 0$ for every $\epsilon > 0$. At any large but fixed $n$, the sample mean is close to $\mu$ with high probability, but we cannot make claims about the whole trajectory $\{\bar X_n\}_n$.
  • Strong LLN: $\bar X_n \to \mu$ almost surely: $\mathbb{P}(\lim_{n\to\infty} \bar X_n = \mu) = 1$. The entire trajectory converges, not just its marginals at each fixed $n$.

The strong LLN implies the weak LLN, but not conversely. For finite-mean i.i.d. sequences, both hold — but the weak version can sometimes be proved under weaker moment conditions.
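
The gap between the two modes can be made visible in a small simulation. The sketch below is illustrative only: the Exp(1) distribution (mean 1), the tolerance `eps`, the number of paths, and the horizon are all assumptions chosen for the demo, and the supremum over a finite horizon is only a proxy for the infinite tail that the strong law actually controls. It estimates the fixed-$n$ deviation probability (the weak statement) next to the probability that the running mean leaves the band anywhere from step $n$ to the horizon (a stand-in for the strong statement).

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, horizon, eps, mu = 200, 10_000, 0.05, 1.0  # illustrative choices

# Each row is one trajectory of running sample means of Exp(1) draws.
draws = rng.exponential(scale=1.0, size=(n_paths, horizon))
running_mean = np.cumsum(draws, axis=1) / np.arange(1, horizon + 1)
deviation = np.abs(running_mean - mu)

# tail_sup[:, k] = max of deviation[:, k:], i.e. the worst deviation from
# step k+1 up to the horizon (a finite stand-in for the infinite tail).
tail_sup = np.maximum.accumulate(deviation[:, ::-1], axis=1)[:, ::-1]

weak, strong = {}, {}
for n in (100, 1_000, 10_000):
    weak[n] = float(np.mean(deviation[:, n - 1] > eps))   # fixed-n statement
    strong[n] = float(np.mean(tail_sup[:, n - 1] > eps))  # whole-tail statement
    print(f"n={n:6d}  P(|mean_n - mu| > eps) ~ {weak[n]:.3f}  "
          f"P(sup over m >= n of |mean_m - mu| > eps) ~ {strong[n]:.3f}")
```

By construction the tail probability is never smaller than the fixed-$n$ probability, which mirrors the fact that the strong LLN implies the weak one.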

Why independence and why finite mean

Finite mean is essential. If $\mathbb{E}[|X|] = \infty$ the LLN fails. The Cauchy distribution is the canonical failure: it has no mean, and $\bar X_n$ itself is Cauchy for every $n$, so it converges to nothing.
Independence can be weakened. The LLN holds under much weaker conditions than i.i.d.: pairwise uncorrelated is enough for the $L^2$ weak LLN (Chebyshev's LLN); asymptotic independence is enough for ergodic processes; martingale difference sequences have their own LLN. But some form of asymptotic independence is always required: the LLN fails catastrophically for a sequence that is the same random variable over and over ($X_i = X_1$ for all $i$ gives $\bar X_n = X_1$ forever).
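
The Cauchy failure is easy to check numerically. The following is a minimal sketch, not a definitive experiment: the replication count, the sample sizes, and the helper name `iqr_of_sample_means` are assumptions made for illustration. It uses the interquartile range rather than the standard deviation to measure spread, because the Cauchy distribution has no variance. For normal draws the spread of the sample mean should shrink like $1/\sqrt n$; for Cauchy draws it should stay near 2 (the IQR of a single standard Cauchy draw) no matter how large $n$ gets.

```python
import numpy as np

rng = np.random.default_rng(1)
n_reps = 2_000  # independent replications of the sample mean (illustrative)

def iqr_of_sample_means(draw, n):
    """Interquartile range of n_reps sample means, each taken over n draws."""
    means = draw((n_reps, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return float(q75 - q25)

normal_iqr, cauchy_iqr = {}, {}
for n in (10, 100, 1_000):
    normal_iqr[n] = iqr_of_sample_means(rng.standard_normal, n)
    cauchy_iqr[n] = iqr_of_sample_means(rng.standard_cauchy, n)
    print(f"n={n:5d}  normal IQR={normal_iqr[n]:.4f}  Cauchy IQR={cauchy_iqr[n]:.4f}")
```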

Formal statement

Weak Law (Chebyshev's form — finite variance)

Let $X_1, X_2, \ldots$ be pairwise uncorrelated random variables (not necessarily identically distributed) with common mean $\mu$ and uniformly bounded variance $\operatorname{Var}(X_i) \le \sigma^2 < \infty$. Then

$$\bar X_n \xrightarrow{\mathbb{P}} \mu.$$

Proof (one line): By Chebyshev's inequality,

$$\mathbb{P}(|\bar X_n - \mu| > \epsilon) \le \frac{\operatorname{Var}(\bar X_n)}{\epsilon^2} \le \frac{\sigma^2}{n\,\epsilon^2} \to 0.$$

Because the $X_i$ are pairwise uncorrelated, the cross terms vanish and $\operatorname{Var}(\bar X_n) = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i) \le \sigma^2/n$. The variance shrinks like $1/n$, so Chebyshev delivers the result immediately.
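
To see how loose or tight the Chebyshev bound is in practice, one can compare it with a simulated deviation probability. This is a sketch under stated assumptions: Uniform(0, 1) draws (mean $1/2$, variance $1/12$), a tolerance of 0.05, and 5,000 replications are arbitrary illustrative choices, not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(2)
eps, n_reps = 0.05, 5_000          # illustrative tolerance and replication count
mu, var = 0.5, 1.0 / 12.0          # mean and variance of Uniform(0, 1)

results = {}
for n in (50, 200, 800):
    sample_means = rng.random((n_reps, n)).mean(axis=1)
    empirical = float(np.mean(np.abs(sample_means - mu) > eps))
    bound = var / (n * eps**2)     # Chebyshev: Var(mean) / eps^2
    results[n] = (empirical, bound)
    print(f"n={n:4d}  empirical={empirical:.4f}  Chebyshev bound={bound:.4f}")
```

The bound holds for every distribution with the given variance, so it is typically far from sharp for any particular one; its value is that it requires nothing beyond a finite variance.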

Strong Law (Kolmogorov's form — i.i.d., finite mean)

Let $X_1, X_2, \ldots$ be i.i.d. with finite mean $\mu = \mathbb{E}[X_1]$ (no variance requirement). Then

$$\bar X_n \xrightarrow{\text{a.s.}} \mu.$$

This is a substantially deeper result. The standard proofs (Etemadi's proof, Kolmogorov's proof via a truncation argument and the Borel-Cantelli lemma) require careful handling of the infinite tail of trajectories, not just the behaviour at any fixed $n$.

Chebyshev for indicators: the frequentist interpretation of probability

Apply the weak LLN to indicator variables $X_i = \mathbf{1}_{A_i}$, where the $A_i$ are independent events, each occurring with probability $p$. Then $\mathbb{E}[X_i] = p$, and:

$$\frac{1}{n}\sum_{i=1}^n \mathbf{1}_{A_i} \xrightarrow{\mathbb{P}} p.$$

The empirical frequency of the event converges to its probability. This is the justification for every "we ran 10,000 simulations and saw the event happen 153 times, so its probability is about 0.0153" calculation.
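
A frequency estimate like 153 out of 10,000 is more useful with an error bar attached. The sketch below is one common way to do that, assuming the normal-approximation (Wald) interval for a binomial proportion; the function name `frequency_estimate` and the default `z = 1.96` are illustrative choices, and the approximation can be poor when the hit count is very small.

```python
import math

def frequency_estimate(hits, n, z=1.96):
    """Point estimate, binomial standard error, and Wald interval for a proportion."""
    p_hat = hits / n
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, std_err, (p_hat - z * std_err, p_hat + z * std_err)

p_hat, std_err, (lower, upper) = frequency_estimate(hits=153, n=10_000)
print(f"p_hat = {p_hat:.4f}, std err = {std_err:.4f}, "
      f"approx. 95% interval = ({lower:.4f}, {upper:.4f})")
```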

Rate of convergence — the bridge to the CLT

The LLN says "$\bar X_n \to \mu$" but says nothing about how fast. The natural question is: at scale $n$, how far is $\bar X_n$ from $\mu$?

When the variance $\sigma^2$ is finite, $\operatorname{Var}(\bar X_n) = \sigma^2/n$, so the typical fluctuation is of order $\sigma/\sqrt n$. Subtract the mean, multiply by $\sqrt n$, and you expose the $O(1)$ fluctuation, which by the Central Limit Theorem is Gaussian. The CLT is the LLN's second-order correction: the LLN says $\bar X_n - \mu \to 0$; the CLT says $\sqrt n\,(\bar X_n - \mu) \to \sigma Z$ in distribution, with $Z \sim \mathcal{N}(0, 1)$.

Practically: if you need your Monte Carlo estimator to be within $\epsilon$ of the true value with 95% confidence, you need approximately $n \approx (1.96\,\sigma/\epsilon)^2$ samples. Halving $\epsilon$ quadruples the sample budget. The LLN gives you convergence; the CLT gives you the compute cost.
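
The sample-size rule is a one-liner in code. A minimal sketch, assuming $\sigma$ is known or has been estimated from a pilot run; the helper name `required_samples` and the example values are hypothetical.

```python
import math

def required_samples(sigma, eps, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= eps (CLT-based approximation)."""
    return math.ceil((z * sigma / eps) ** 2)

n_coarse = required_samples(sigma=10.0, eps=0.02)
n_fine = required_samples(sigma=10.0, eps=0.01)  # half the tolerance
print(n_coarse, n_fine, n_fine / n_coarse)       # ratio is close to 4
```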

Worked examples

Example 1 — Monte Carlo option price

Price a European call with strike $K$ on a log-normal underlying $S_T$ under the risk-neutral measure. The LLN says that for i.i.d. simulated payoffs $Y_i = e^{-rT}\max(S_T^{(i)} - K, 0)$ with $\mathbb{E}^{\mathbb{Q}}[Y_i] = C$ (the true option price):

$$\frac{1}{N}\sum_{i=1}^N Y_i \xrightarrow{\text{a.s.}} C.$$

With $\sigma_Y^2$ the payoff variance, the standard error at $N$ samples is $\sigma_Y/\sqrt N$. At $N = 10^6$ and $\sigma_Y = 10$, the standard error is $0.01$ and the 95% confidence interval is about $\pm 0.02$. To get another decimal of precision you need $N = 10^8$, which is why variance-reduction techniques (antithetic variates, control variates) are so valuable: they reduce $\sigma_Y$ without increasing $N$.
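
A minimal sketch of this estimator follows. Everything numeric is an assumption made for illustration: the spot, strike, rate, volatility, and maturity are hypothetical, the underlying is taken to follow geometric Brownian motion, and the Black-Scholes closed form is included only as a sanity check on the simulated price.

```python
import math
import numpy as np

# Hypothetical contract and market parameters, chosen only for illustration.
s0, strike, rate, vol, maturity = 100.0, 100.0, 0.05, 0.2, 1.0
n_paths = 1_000_000

rng = np.random.default_rng(3)
z = rng.standard_normal(n_paths)
# Terminal price under the risk-neutral measure (geometric Brownian motion).
s_T = s0 * np.exp((rate - 0.5 * vol**2) * maturity + vol * math.sqrt(maturity) * z)
payoffs = math.exp(-rate * maturity) * np.maximum(s_T - strike, 0.0)

mc_price = float(payoffs.mean())                               # LLN estimate of C
std_err = float(payoffs.std(ddof=1) / math.sqrt(n_paths))      # sigma_Y / sqrt(N)

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Black-Scholes closed form, used here only as a reference value.
d1 = (math.log(s0 / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * math.sqrt(maturity))
d2 = d1 - vol * math.sqrt(maturity)
bs_price = s0 * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

print(f"MC price = {mc_price:.4f} +/- {1.96 * std_err:.4f} (95%), closed form = {bs_price:.4f}")
```

Reporting the estimate together with its standard error is the point of the exercise: the LLN justifies the mean, and the error bar says how much of it to trust at this $N$.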

Example 2 — Binomial bank: seeing the LLN happen

```python
# Python: simulate 5 independent streams of 10,000 i.i.d. Bernoulli(0.5) flips
# and plot the running average of each stream.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_streams = 5
n_flips = 10_000
steps = np.arange(1, n_flips + 1)  # x-axis starts at 1 so the log scale is valid

for _ in range(n_streams):
    flips = rng.integers(0, 2, size=n_flips)  # 0 or 1 with prob 0.5 each
    running_avg = np.cumsum(flips) / steps
    plt.plot(steps, running_avg, alpha=0.7)

plt.axhline(0.5, color='k', linestyle='--')
plt.xscale('log')
plt.xlabel('n')
plt.ylabel('running average')
plt.show()
# All 5 streams visibly squeeze toward 0.5; the spread at n=100 is about ±0.05,
# at n=1000 about ±0.015, at n=10000 about ±0.005: the 1/sqrt(n) decay from the CLT.
```

Example 3 — Heavy tails slow the LLN down

```python
# Python: running mean of a heavy-tailed (Pareto) distribution
# converges to the true mean, but slowly.
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.5  # Pareto shape; mean exists (alpha > 1), variance exists (alpha > 2)
true_mean = alpha / (alpha - 1)  # = 5/3 ≈ 1.667
N = 10**6
X = (1 - rng.random(N))**(-1/alpha)  # standard Pareto(alpha)
running_avg = np.cumsum(X) / np.arange(1, N + 1)

for checkpoint in [100, 1_000, 10_000, 100_000, 1_000_000]:
    print(f"n={checkpoint:8d}: mean={running_avg[checkpoint - 1]:.4f} (true={true_mean:.4f})")
# n=     100: mean=1.7803 (true=1.6667)
# n=    1000: mean=1.6342 (true=1.6667)
# n=   10000: mean=1.6687 (true=1.6667)
# n=  100000: mean=1.6713 (true=1.6667)
# n= 1000000: mean=1.6688 (true=1.6667)
```

The Pareto with $\alpha = 2.5$ has finite mean ($\alpha > 1$) and finite variance ($\alpha > 2$), so the strong LLN applies. But the running average bounces around, since big individual draws shift it, and only by $n = 10^4$ is the estimate stably near the truth. For $\alpha$ closer to $1$ (say $\alpha = 1.1$) the variance is infinite, Chebyshev doesn't apply, and convergence is excruciatingly slow (though still $\mathbb{P}(\bar X_n \to \mu) = 1$, i.e. convergence holds almost surely).
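
The contrast between $\alpha = 2.5$ and $\alpha = 1.1$ can be quantified by looking at how much the sample mean varies across independent runs at a fixed $n$. This is a rough sketch under illustrative assumptions: $n = 10^4$, 200 replications, the same inverse-CDF sampler as in the example above, and the interquartile range as the spread measure (the variance is infinite for $\alpha = 1.1$). The helper name `sample_mean_iqr` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_reps = 10_000, 200  # illustrative sizes

def sample_mean_iqr(alpha):
    """IQR of n_reps sample means of n standard Pareto(alpha) draws."""
    u = rng.random((n_reps, n))
    draws = (1.0 - u) ** (-1.0 / alpha)  # inverse-CDF sampling, as in Example 3
    means = draws.mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return float(q75 - q25)

iqr = {alpha: sample_mean_iqr(alpha) for alpha in (2.5, 1.1)}
for alpha, spread in iqr.items():
    print(f"alpha={alpha}: true mean={alpha / (alpha - 1):.3f}, "
          f"IQR of sample means at n={n}: {spread:.4f}")
```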

Common confusions and pitfalls

"The LLN says $\bar X_n$ becomes exactly $\mu$ eventually." No. Convergence is asymptotic. At any finite $n$, $\bar X_n$ has a non-trivial distribution; the LLN says the mass of that distribution concentrates near $\mu$, not that $\bar X_n$ equals $\mu$.
"If I flip a fair coin 100 times and get 60 heads, the next flips are more likely to be tails to 'restore balance'." This is the gambler's fallacy and it is precisely what the LLN does not say. The LLN asserts that the proportion of heads converges to $1/2$ as $n$ increases. It does so by adding new flips that are each 50/50, which dilute the early excess, not by pushing the past proportion down. The law of averages is not a law of physics with memory; independence is assumed throughout.
"The LLN guarantees my Monte Carlo is accurate." It guarantees your Monte Carlo converges. Accuracy at a given $n$ is controlled by the CLT: the typical error is $\sigma/\sqrt n$, not zero. Don't trust a Monte Carlo estimate without its confidence interval.
"Finite mean is enough." For the strong LLN (Kolmogorov), yes. For the weak LLN via Chebyshev, you also need finite variance. For sequences that aren't i.i.d. (e.g. pairwise uncorrelated, ergodic, stationary), additional conditions may be needed. The clean "finite mean $\Rightarrow$ LLN" rule is specifically for i.i.d. sequences.
"The LLN works for the median too." The sample median does converge to the population median, but under different conditions: it needs the population median to be unique, not a finite mean. The LLN as stated is for linear statistics (sums, means). The convergence of quantile estimators follows from a different result (Glivenko-Cantelli), and their fluctuations depend on the density at the quantile rather than on $\sigma$.
"The LLN is about one number converging." It is about a sequence of random variables $(\bar X_n)_n$ converging to a constant. The distinction between "convergence in probability" (weak) and "almost sure convergence" (strong) is a statement about how this whole sequence behaves, not about any single $\bar X_n$.

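The gambler's fallacy item above can be checked directly by simulation. The sketch below is illustrative only: the number of sequences and the random seed are arbitrary assumptions. It generates many sequences of 101 fair flips, keeps those with at least 60 heads in the first 100, and looks at the frequency of heads on flip 101. If the coin had a memory that "restored balance", this frequency would sit below 0.5.

```python
import numpy as np

rng = np.random.default_rng(5)
n_seq, n_first = 200_000, 100  # illustrative sizes

# One row per sequence: 100 flips followed by one "next" flip (1 = heads).
flips = rng.integers(0, 2, size=(n_seq, n_first + 1), dtype=np.int8)
heads_first = flips[:, :n_first].sum(axis=1)
next_flip = flips[:, n_first]

# Condition on a head-heavy start: at least 60 heads in the first 100 flips.
streaky = heads_first >= 60
n_streaky = int(streaky.sum())
p_next_heads = float(next_flip[streaky].mean())

print(f"{n_streaky} sequences had >= 60 heads in the first {n_first} flips")
print(f"frequency of heads on the next flip among them: {p_next_heads:.3f}")
```
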
Where this goes next

  • Central Limit Theorem: The natural continuation — describes the fluctuation around the limit given by the LLN. Together they are the two pillars of asymptotic statistics.
  • Monte Carlo Pricing (basic): The LLN is the correctness statement; Monte Carlo is the algorithm. Every variance-reduction technique (antithetic variates, control variates, importance sampling) reduces the CLT-scale fluctuation without changing what the LLN delivers.
  • Moment Generating Functions: Used to establish large deviations. When the moment generating function is finite near zero, the probability that $\bar X_n$ deviates from $\mu$ by a fixed amount decays exponentially in $n$ (Cramér's theorem), far faster than Chebyshev's $1/n$ rate suggests.
  • Martingales (Discrete Time): Martingale convergence theorems generalise the strong LLN to dependent sequences. A martingale $M_n$ that is bounded in $L^p$ for some $p \ge 1$ converges a.s. to a limit $M_\infty$.
  • Ergodic Theorem: The LLN for stationary sequences — time averages equal space averages. Fundamental for time-series estimation and for the calibration of any stationary model.

Exercises

Test your understanding with 3 exercises for this lesson.