Every Monte Carlo price, every backtest mean, every empirical volatility estimate, every time a risk manager says "we ran $10^6$ paths and got a stable answer": all of it rests on the Law of Large Numbers (LLN). The LLN is the formal statement that averaging large numbers of independent draws from a distribution gives you back the true mean of that distribution. Without it, Monte Carlo pricing would produce nothing more than expensive random numbers.
The LLN is also the bedrock of frequentist probability itself. When we say "the probability of heads is 0.5," what we operationally mean is: if we flip the coin a huge number of times, the proportion of heads will approach 0.5. That statement is the LLN applied to indicator random variables. Every p-value, every long-run VaR exceedance rate, every bookmaker's implicit break-even calculation comes from this.
But the LLN has a sharper edge than its sloganeering suggests. It tells you the sample mean converges, but says nothing about how fast; that job falls to the Central Limit Theorem. And it fails silently for distributions without a mean (Cauchy, or Pareto with shape $\alpha \le 1$). Knowing when the LLN applies, and the precise sense in which it applies (in probability vs. almost surely), is the difference between a Monte Carlo scheme that works and one that only looks like it does.
The informal idea
Take an i.i.d. sequence $X_1, X_2, \dots$ with finite mean $\mu$. Form the running sample mean:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
The LLN says $\bar{X}_n \to \mu$ as $n \to \infty$. The "$\to$" hides a choice of convergence mode. The two versions are:
Weak LLN: $\bar{X}_n \to \mu$ in probability: $P(|\bar{X}_n - \mu| > \epsilon) \to 0$ for every $\epsilon > 0$. At any large but fixed $n$, the sample mean is close to $\mu$ with high probability, but we cannot make claims about the whole trajectory $\{\bar{X}_n\}_n$.
Strong LLN: $\bar{X}_n \to \mu$ almost surely: $P(\lim_{n \to \infty} \bar{X}_n = \mu) = 1$. The entire trajectory converges, not just its marginals at each fixed $n$.
The strong LLN implies the weak LLN, but not conversely. For finite-mean i.i.d. sequences, both hold — but the weak version can sometimes be proved under weaker moment conditions.
Why independence and why finite mean
Finite mean is essential. If $E[|X|] = \infty$ the LLN fails. The Cauchy distribution is the canonical failure: it has no mean, and $\bar{X}_n$ has the same Cauchy distribution as $X_1$ for every $n$, so it converges to nothing.
Independence can be weakened. The LLN holds under much weaker conditions than i.i.d.: pairwise uncorrelated is enough for the $L^2$ weak LLN (Chebyshev's LLN); asymptotic independence is enough for ergodic processes; martingale difference sequences have their own LLN. But some form of asymptotic independence is always required: the LLN fails catastrophically for a sequence that is the same random variable over and over ($X_i = X_1$ for all $i$ gives $\bar{X}_n = X_1$ forever).
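The Cauchy failure described above can be checked numerically. The following is a minimal sketch, not a definitive experiment: the seed, the replication count `n_reps`, and the helper `iqr` are illustrative choices. It compares the interquartile range of many independent sample means for standard Cauchy draws against standard normal draws. If the sample mean of Cauchy draws is itself standard Cauchy, its interquartile range should stay near 2 at every sample size, while the normal case should shrink.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

n_reps = 2_000                   # independent sample means per sample size
sizes = [1, 10, 100, 1_000]


def iqr(x):
    """Interquartile range: a spread measure that needs no moments."""
    q25, q75 = np.percentile(x, [25, 75])
    return q75 - q25


cauchy_iqr = {}
normal_iqr = {}
for n in sizes:
    # Each row is one experiment of n draws; average across the row.
    cauchy_means = rng.standard_cauchy(size=(n_reps, n)).mean(axis=1)
    normal_means = rng.standard_normal(size=(n_reps, n)).mean(axis=1)
    cauchy_iqr[n] = iqr(cauchy_means)
    normal_iqr[n] = iqr(normal_means)
    print(f"n={n:5d}: IQR of Cauchy means={cauchy_iqr[n]:.3f}, "
          f"IQR of normal means={normal_iqr[n]:.3f}")
```

The interquartile range is used instead of the standard deviation because the Cauchy distribution has no finite variance, so a sample standard deviation would itself be unstable.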
Formal statement
Weak Law (Chebyshev's form — finite variance)
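Let $X_1, X_2, \dots$ be pairwise uncorrelated random variables (not necessarily identically distributed) with common mean $\mu$ and uniformly bounded variance $\mathrm{Var}(X_i) \le \sigma^2 < \infty$. Then
$$\bar{X}_n \xrightarrow{P} \mu.$$
Because $\mathrm{Var}(\bar{X}_n) \le \sigma^2/n$, so that the variance shrinks at least as fast as $1/n$, Chebyshev's inequality delivers the result immediately: $P(|\bar{X}_n - \mu| > \epsilon) \le \sigma^2/(n\epsilon^2) \to 0$.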
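To see the Chebyshev argument at work, the sketch below estimates $P(|\bar{X}_n - \mu| > \epsilon)$ by simulation and compares it with the bound $\sigma^2/(n\epsilon^2)$. The choice of Uniform(0, 1) draws (mean $1/2$, variance $1/12$), the tolerance `eps`, and the replication count are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed

mu, var = 0.5, 1.0 / 12.0        # mean and variance of Uniform(0, 1)
eps = 0.05                       # illustrative tolerance
n_reps = 2_000                   # independent sample means per n

results = []                     # (n, empirical probability, Chebyshev bound)
for n in [10, 100, 1_000]:
    means = rng.random(size=(n_reps, n)).mean(axis=1)
    empirical = float(np.mean(np.abs(means - mu) > eps))
    bound = min(1.0, var / (n * eps**2))   # a probability bound never exceeds 1
    results.append((n, empirical, bound))
    print(f"n={n:5d}: empirical P(|mean - mu| > eps)={empirical:.4f}, "
          f"Chebyshev bound={bound:.4f}")
```

The bound is typically loose: it uses only the variance, so the empirical frequency should sit well below it, which is the gap that large-deviation results later tighten.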
Strong Law (Kolmogorov's form — i.i.d., finite mean)
Let $X_1, X_2, \dots$ be i.i.d. with finite mean $\mu = E[X_1]$ (no variance requirement). Then
$$\bar{X}_n \xrightarrow{\text{a.s.}} \mu.$$
This is a substantially deeper result. The standard proofs (Etemadi's proof, Kolmogorov's proof via a truncation argument and the Borel-Cantelli lemma) require careful handling of the infinite tail of trajectories — not just the behaviour at any fixed n.
Chebyshev for indicators: the frequentist interpretation of probability
Apply the weak LLN to indicator variables $X_i = \mathbf{1}_{A_i}$, where the $A_i$ are independent events, each occurring with probability $p$. Then $E[X_i] = p$, and:
$$\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{A_i} \xrightarrow{P} p.$$
The empirical frequency of the event converges to its probability. This is the justification for every "we ran 10,000 simulations and saw the event happen 153 times, so its probability is 0.0153" calculation.
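As a small illustration of frequency converging to probability, the sketch below uses a hypothetical event chosen only because its probability is known exactly: two fair dice summing to 7, with $p = 1/6$. The seed and the number of trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)   # arbitrary seed

n = 100_000
# Hypothetical event A_i: two fair dice sum to 7 (true probability 6/36 = 1/6).
dice = rng.integers(1, 7, size=(n, 2))        # each face in {1, ..., 6}
indicator = dice.sum(axis=1) == 7             # X_i = 1 if A_i occurred, else 0
p_true = 1.0 / 6.0

# Running empirical frequency after 1, 2, ..., n trials.
running_freq = np.cumsum(indicator) / np.arange(1, n + 1)
for checkpoint in [100, 1_000, 10_000, 100_000]:
    print(f"n={checkpoint:6d}: frequency={running_freq[checkpoint - 1]:.4f} "
          f"(true p={p_true:.4f})")

p_hat = running_freq[-1]
```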
Rate of convergence — the bridge to the CLT
The LLN says "$\bar{X}_n \to \mu$" but says nothing about how fast. The natural question is: at scale $n$, how far is $\bar{X}_n$ from $\mu$?
The variance is $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$, so the typical fluctuation is of order $\sigma/\sqrt{n}$. Subtract the mean, multiply by $\sqrt{n}$, and you expose the $O(1)$ fluctuation, which by the Central Limit Theorem is Gaussian. The CLT is the LLN's second-order correction: LLN says $\bar{X}_n - \mu \to 0$; CLT says $\sqrt{n}(\bar{X}_n - \mu) \to \sigma Z$ in distribution, with $Z \sim N(0,1)$.
Practically: if you need your Monte Carlo estimator to be within $\epsilon$ of the true value with 95% confidence, you need approximately $n \approx (1.96\sigma/\epsilon)^2$ samples. Halving $\epsilon$ quadruples the sample budget. The LLN gives you convergence; the CLT gives you the compute cost.
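The sample-budget rule can be written as a one-line helper. This is a sketch under the CLT approximation; the function name `required_samples` and the example values of `sigma` and `eps` are hypothetical, chosen only to show the quadrupling.

```python
import math


def required_samples(sigma, eps, z=1.96):
    """Approximate n such that z * sigma / sqrt(n) <= eps (CLT-based).

    sigma: standard deviation of a single draw
    eps:   target half-width of the confidence interval
    z:     normal quantile (1.96 corresponds to 95% two-sided confidence)
    """
    if sigma < 0 or eps <= 0:
        raise ValueError("sigma must be non-negative and eps must be positive")
    return math.ceil((z * sigma / eps) ** 2)


n_coarse = required_samples(sigma=10.0, eps=0.02)
n_fine = required_samples(sigma=10.0, eps=0.01)   # half the tolerance
print(f"eps=0.02 -> n={n_coarse}")
print(f"eps=0.01 -> n={n_fine} (ratio {n_fine / n_coarse:.2f})")
```

In practice $\sigma$ is unknown and has to be estimated from a pilot run, so the returned $n$ is a planning figure, not a guarantee.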
Worked examples
Example 1 — Monte Carlo option price
Price a European call with strike $K$ on a log-normal underlying $S_T$ under the risk-neutral measure. The LLN says that for i.i.d. simulated payoffs $Y_i = e^{-rT}\max(S_T^{(i)} - K, 0)$ with $E^{\mathbb{Q}}[Y_i] = C$ (the true option price):
$$\frac{1}{N}\sum_{i=1}^{N} Y_i \xrightarrow{\text{a.s.}} C.$$
With $\sigma_Y^2$ the payoff variance, the standard error at $N$ samples is $\sigma_Y/\sqrt{N}$. At $N = 10^6$, $\sigma_Y = 10$, the standard error is $0.01$ and the 95% confidence half-width is $\approx 0.02$. To get another decimal of precision you need $N = 10^8$, which is why variance-reduction techniques (antithetic variates, control variates) are so valuable: they reduce $\sigma_Y$ without increasing $N$.
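A minimal sketch of this example follows, assuming Black-Scholes dynamics so that the Monte Carlo estimate can be compared with the closed-form price. The parameter values (`s0`, `k`, `r`, `sigma`, `t`), the seed, and the function names `bs_call` and `mc_call` are illustrative assumptions, not taken from the text. Each estimate is reported with its 95% half-width, $1.96\,\sigma_Y/\sqrt{N}$.

```python
import math
import numpy as np


def bs_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)


def mc_call(s0, k, r, sigma, t, n, rng):
    """Plain Monte Carlo price and its standard error from n i.i.d. payoffs."""
    z = rng.standard_normal(n)
    # Terminal price of a log-normal underlying under the risk-neutral measure.
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
    payoffs = math.exp(-r * t) * np.maximum(s_t - k, 0.0)   # discounted Y_i
    return payoffs.mean(), payoffs.std(ddof=1) / math.sqrt(n)


rng = np.random.default_rng(123)                  # arbitrary seed
s0, k, r, sigma, t = 100.0, 100.0, 0.05, 0.2, 1.0  # hypothetical contract
exact = bs_call(s0, k, r, sigma, t)

for n in [10**3, 10**4, 10**5, 10**6]:
    price, std_err = mc_call(s0, k, r, sigma, t, n, rng)
    print(f"N={n:>8d}: price={price:.4f} +/- {1.96 * std_err:.4f} "
          f"(closed form {exact:.4f})")
```

Each hundredfold increase in $N$ should shrink the reported half-width by roughly a factor of ten, which is the $1/\sqrt{N}$ scaling in action.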
Example 2 — Binomial bank: seeing the LLN happen
# Python: run 5 independent streams of 10,000 i.i.d. Bernoulli(0.5) flips
# and plot the running average of each stream.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_streams = 5
n_flips = 10_000
n = np.arange(1, n_flips + 1)  # flip counts 1..n_flips (log axis cannot show 0)
for _ in range(n_streams):
    flips = rng.integers(0, 2, size=n_flips)  # 0 or 1 with prob 0.5 each
    running_avg = np.cumsum(flips) / n
    plt.plot(n, running_avg, alpha=0.7)
plt.axhline(0.5, color='k', linestyle='--')
plt.xscale('log')
plt.xlabel('n'); plt.ylabel('running average')
plt.show()
# All 5 streams visibly squeeze toward 0.5; the typical spread (one standard
# deviation, 0.5/sqrt(n)) is about ±0.05 at n=100, ±0.016 at n=1000 and
# ±0.005 at n=10000: the 1/sqrt(n) decay predicted by the CLT.
Example 3 — Heavy tails slow the LLN down
# Python: running mean of a heavy-tailed (Pareto) distribution
# converges to the true mean, but slowly.
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.5  # Pareto shape; mean exists (alpha > 1), variance exists (alpha > 2)
true_mean = alpha / (alpha - 1)  # = 5/3 ≈ 1.667
N = 10**6
X = (1 - rng.random(N))**(-1/alpha)  # standard Pareto(alpha) via inverse-CDF sampling
running_avg = np.cumsum(X) / np.arange(1, N + 1)
for checkpoint in [100, 1_000, 10_000, 100_000, 1_000_000]:
    print(f"n={checkpoint:8d}: mean={running_avg[checkpoint - 1]:.4f} (true={true_mean:.4f})")
# n=     100: mean=1.7803 (true=1.6667)
# n=    1000: mean=1.6342 (true=1.6667)
# n=   10000: mean=1.6687 (true=1.6667)
# n=  100000: mean=1.6713 (true=1.6667)
# n= 1000000: mean=1.6688 (true=1.6667)
The Pareto($\alpha = 2.5$) has finite mean ($\alpha > 1$) and finite variance ($\alpha > 2$), so the strong LLN applies. But the running average bounces around, because big individual draws shift it, and only by $n = 10^4$ is the estimate stably near the truth. For $\alpha$ closer to 1 (say $\alpha = 1.1$) the variance is infinite, Chebyshev doesn't apply, and convergence is excruciatingly slow (though still $P(\bar{X}_n \to \mu) = 1$ in the a.s. sense).
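The contrast between the two shape parameters can be explored with the same inverse-CDF sampler. The sketch below is an illustration only: the helper name `pareto_running_mean`, the seed, and the checkpoints are assumptions. It reports the relative error of the running mean for $\alpha = 2.5$ and $\alpha = 1.1$ (true means $5/3$ and $11$). Results for $\alpha = 1.1$ vary strongly from seed to seed, so no particular output is claimed here.

```python
import numpy as np


def pareto_running_mean(alpha, n, rng):
    """Running mean of n standard Pareto(alpha) draws (inverse-CDF sampling)."""
    x = (1.0 - rng.random(n)) ** (-1.0 / alpha)   # 1 - U lies in (0, 1]
    return np.cumsum(x) / np.arange(1, n + 1)


rng = np.random.default_rng(0)   # arbitrary seed
n = 10**6
checkpoints = [10**3, 10**4, 10**5, 10**6]

rel_err = {}                     # alpha -> relative errors at each checkpoint
for alpha in (2.5, 1.1):
    true_mean = alpha / (alpha - 1.0)
    running = pareto_running_mean(alpha, n, rng)
    rel_err[alpha] = [abs(running[c - 1] - true_mean) / true_mean
                      for c in checkpoints]
    for c, err in zip(checkpoints, rel_err[alpha]):
        print(f"alpha={alpha}: n={c:>8d}, relative error={err:.4f}")
```

Rerunning with several seeds is a useful exercise: the $\alpha = 2.5$ errors should be consistently small at $n = 10^6$, while the $\alpha = 1.1$ errors can remain large and jump whenever a single extreme draw arrives.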
Common confusions and pitfalls
"The LLN says $\bar{X}_n$ becomes exactly $\mu$ eventually." No: convergence is asymptotic. At any finite $n$, $\bar{X}_n$ has a non-trivial distribution; the LLN says the mass of that distribution concentrates near $\mu$, not that $\bar{X}_n$ equals $\mu$.
"If I flip a fair coin 100 times and get 60 heads, the next flips are more likely to be tails to 'restore balance'." This is the gambler's fallacy and it is precisely what the LLN does not say. The LLN asserts that the proportion of heads converges to $1/2$ as $n$ increases; it does so by adding new flips that are each 50/50, which dilute the early imbalance, not by pushing the past proportion down. The law of averages is not a law of physics with memory; independence is assumed throughout.
"The LLN guarantees my Monte Carlo is accurate." It guarantees your Monte Carlo converges. Accuracy at a given $n$ is controlled by the CLT: the typical error is $\sigma/\sqrt{n}$, not zero. Don't trust a Monte Carlo estimate without its confidence interval.
"Finite mean is enough." For the strong LLN (Kolmogorov), yes. For the weak LLN via Chebyshev, you also need finite variance. For sequences that aren't i.i.d. (e.g. pairwise uncorrelated, ergodic, stationary), additional conditions may be needed. The clean "finite mean ⇒ LLN" rule is specifically for i.i.d. sequences.
"The LLN works for the median too." The conclusion holds, but not because of the LLN: the sample median converges to the population median under mild conditions (a unique population median), with no moment requirement at all. The LLN as stated is for linear statistics (sums, means). The convergence of quantile estimators follows from a different result (Glivenko-Cantelli), and their asymptotic variance is governed by the density at the quantile rather than by $\sigma^2$.
"The LLN is about one number converging." It is about a sequence of random variables $(\bar{X}_n)_n$ converging to a constant. The distinction between "convergence in probability" (weak) and "almost sure convergence" (strong) concerns how that sequence behaves: the weak law controls each $\bar{X}_n$ separately at large $n$, while the strong law controls the whole trajectory at once.
Where this goes next
Central Limit Theorem: The natural continuation — describes the fluctuation around the limit given by the LLN. Together they are the two pillars of asymptotic statistics.
Monte Carlo Pricing (basic): The LLN is the correctness statement; Monte Carlo is the algorithm. Every variance-reduction technique (antithetic variates, control variates, importance sampling) reduces the CLT-scale fluctuation without changing what the LLN delivers.
Moment Generating Functions: Used to establish large-deviation bounds; the probability that $\bar{X}_n$ is very far from $\mu$ decays exponentially in $n$ (Cramér's theorem), far faster than Chebyshev's $1/n$ rate suggests.
Martingales (Discrete Time): Martingale convergence theorems generalise the strong LLN to dependent sequences. A martingale $M_n$ with bounded $L^p$ norm converges a.s. to a limit $M_\infty$.
Ergodic Theorem: The LLN for stationary sequences — time averages equal space averages. Fundamental for time-series estimation and for the calibration of any stationary model.