Central Limit Theorem

Motivation: why this matters in quant finance

Every time a quant computes a standard error on a backtest Sharpe, a confidence interval on a VaR estimate, a bootstrapped p-value for a strategy's alpha, or a normal approximation for a portfolio P&L, they are leaning on the Central Limit Theorem (CLT). It is the result that lets a practitioner say "my estimator is approximately normal" while assuming little about the underlying distribution beyond independence and finite variance.
The CLT is also the usual heuristic behind the log-normal price assumption in the Black-Scholes model: a daily log-return is a sum of many small intraday increments, and if those increments are roughly independent with finite variance, the CLT makes the sum approximately normal. Log-prices are then approximately normal and prices approximately log-normal. The same mechanism explains why factor-model aggregates, portfolio P&L, and Monte Carlo estimator distributions often look close to Gaussian even when the atomic inputs are not. Without the CLT, much of quantitative finance would lack its statistical foundation.
But the theorem has teeth beyond its statement. Its failure modes tell you when to be suspicious: heavy-tailed distributions with infinite variance (Cauchy, Pareto tails with tail index at most 2) break the CLT entirely; finite variance combined with heavy tails or strong skew makes the approximation slow; dependence across observations requires stronger conditions (central limit theorems for martingales and mixing sequences). Knowing when the CLT doesn't apply is the mark of a careful quant.

The informal idea

The CLT is a three-word statement: averages are normal. But to understand why, you need to understand two competing effects in the sum $S_n = X_1 + \cdots + X_n$ of i.i.d. random variables:
  1. The mean grows linearly. $\mathbb{E}[S_n] = n\mu$, so $S_n$ drifts off to $\pm\infty$ (unless $\mu = 0$).
  2. The standard deviation grows as $\sqrt n$. $\operatorname{Var}(S_n) = n\sigma^2$, so the fluctuation around the mean has scale $\sigma\sqrt n$.

To see the shape of the fluctuation we have to cancel the drift and rescale the fluctuation:

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt n}$$

$Z_n$ has mean $0$ and variance $1$ for every $n$. The CLT says that as $n \to \infty$, the distribution of $Z_n$ converges to $\mathcal{N}(0, 1)$, regardless of what distribution the $X_i$ came from, as long as they have finite variance.

This is remarkable. The input can be binary, uniform, exponential, log-normal, or some arbitrary mess — all of them funnel into the same Gaussian limit. The Gaussian is a universal attractor for the sum of many independent finite-variance pieces.

Why $\sqrt n$ and not something else?

If we rescaled by $n$ (as in the Law of Large Numbers), the variance of $S_n/n$ would be $\sigma^2/n \to 0$ and we would see only the deterministic mean $\mu$. If we did not rescale at all (dividing by $1$), the drift $n\mu$ and a spread of order $\sqrt n$ would both run off to infinity. The right rescaling, the one that reveals the random fluctuation, leaves a standard deviation of order $1$, which means centring and dividing by $\sqrt n$. The LLN says the sample mean converges; the CLT says the fluctuation around the limit is Gaussian, at scale $\sqrt n$ for the sum $S_n$ and $1/\sqrt n$ for the mean $\bar X_n = S_n/n$.
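
The three candidate scalings can be compared in a small simulation. This is a minimal sketch using only the standard library; the Exponential(1) input (so $\mu = \sigma = 1$), the helper name `spread_under_scalings`, the trial count, and the seed are illustrative assumptions, not part of the lesson.

```python
import math
import random
import statistics

def spread_under_scalings(n, trials=2000, seed=0):
    """Sample std dev of the centred sum S_n - n*mu under three scalings.

    S_n is a sum of n i.i.d. Exponential(1) draws, so mu = sigma = 1.
    Returns (sd with no rescaling, sd after /sqrt(n), sd after /n).
    """
    rng = random.Random(seed)
    mu = 1.0
    centred = [sum(rng.expovariate(1.0) for _ in range(n)) - n * mu
               for _ in range(trials)]
    sd = statistics.pstdev(centred)
    return sd, sd / math.sqrt(n), sd / n

results = {n: spread_under_scalings(n) for n in (10, 100, 1000)}
for n, (raw, root_n, full_n) in results.items():
    print(f"n={n:5d}  no rescaling: {raw:7.3f}   /sqrt(n): {root_n:.3f}   /n: {full_n:.4f}")
```

With no rescaling the spread grows, dividing by $n$ collapses it towards zero, and only the $\sqrt n$ column should stay near $\sigma = 1$ as $n$ increases.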

Formal statement

Classical (Lindeberg-Lévy) CLT

Let $X_1, X_2, \ldots$ be independent and identically distributed random variables on a common probability space, with finite mean $\mu = \mathbb{E}[X_1]$ and finite variance $\sigma^2 = \operatorname{Var}(X_1) > 0$. Define the standardised partial sum

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt n} = \frac{1}{\sigma\sqrt n}\sum_{i=1}^n (X_i - \mu).$$

Then $Z_n$ converges in distribution to $Z \sim \mathcal{N}(0, 1)$:

$$Z_n \xrightarrow{d} \mathcal{N}(0, 1), \qquad \text{equivalently,} \qquad \mathbb{P}(Z_n \le z) \to \Phi(z) \text{ for every } z \in \mathbb{R},$$

where $\Phi$ is the standard normal CDF.

Equivalent formulations

Three statements are mutually equivalent and all called the CLT:

  1. $\dfrac{S_n - n\mu}{\sigma\sqrt n} \xrightarrow{d} \mathcal{N}(0, 1)$.
  2. $\sqrt n\,(\bar X_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$, where $\bar X_n = S_n/n$.
  3. $S_n \approx n\mu + \sigma\sqrt n\,Z$ for large $n$, with $Z \sim \mathcal{N}(0, 1)$.

Form 2 is the one used in statistics (e.g. constructing confidence intervals for a sample mean); form 3 is the one practitioners carry in their head.
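
As an illustration of form 2 in use, the sketch below builds the usual normal-approximation interval $\bar X_n \pm 1.96\, s/\sqrt n$ and estimates how often it covers the true mean. Several things here are assumptions for the example rather than part of the lesson: the Exponential(1) data, the plug-in of the sample standard deviation $s$ for $\sigma$, and the helper names `mean_ci` and `coverage`.

```python
import math
import random

Z_975 = 1.959964  # 97.5% quantile of N(0, 1)

def mean_ci(sample, z=Z_975):
    """Normal-approximation CI for the mean: x_bar +/- z * s / sqrt(n)."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def coverage(n, trials=4000, seed=1):
    """Fraction of simulated samples whose CI contains the true mean."""
    rng = random.Random(seed)
    true_mean = 1.0  # mean of Exponential(1)
    hits = 0
    for _ in range(trials):
        lo, hi = mean_ci([rng.expovariate(1.0) for _ in range(n)])
        hits += lo <= true_mean <= hi
    return hits / trials

cov_small = coverage(10)
cov_large = coverage(200)
print(f"nominal 95% CI, empirical coverage: n=10 -> {cov_small:.3f}, n=200 -> {cov_large:.3f}")
```

For skewed data like this, coverage at small $n$ should fall noticeably short of the nominal 95% and move towards it as $n$ grows, which is the finite-sample side of the asymptotic statement.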

The multidimensional CLT

For vector-valued i.i.d. random variables $X_i \in \mathbb{R}^d$ with mean $\mu$ and covariance matrix $\Sigma$:

$$\sqrt n\,(\bar X_n - \mu) \xrightarrow{d} \mathcal{N}_d(0, \Sigma).$$

This is what underwrites factor-model risk estimation, where multiple asset returns are summed across time and the joint limit is a multivariate Gaussian with the asset covariance structure.
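
A small two-dimensional check of this statement is sketched below. The construction is an assumption made for the example: $X = (E_1, E_1 + E_2)$ with $E_1, E_2$ i.i.d. Exponential(1), so $\mu = (1, 2)$ and $\Sigma = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$. The helper names are hypothetical. Note that the covariance of $\sqrt n\,(\bar X_n - \mu)$ equals $\Sigma$ for every $n$; what the CLT adds is Gaussian shape, which the sketch probes through the skewness of one linear projection.

```python
import math
import random

def scaled_mean_errors(n, trials, seed=2):
    """Draw `trials` copies of W_n = sqrt(n) * (X_bar_n - mu) in R^2."""
    rng = random.Random(seed)
    mu = (1.0, 2.0)
    out = []
    for _trial in range(trials):
        s1 = s2 = 0.0
        for _obs in range(n):
            e1, e2 = rng.expovariate(1.0), rng.expovariate(1.0)
            s1 += e1        # first coordinate:  E1
            s2 += e1 + e2   # second coordinate: E1 + E2
        out.append((math.sqrt(n) * (s1 / n - mu[0]),
                    math.sqrt(n) * (s2 / n - mu[1])))
    return out

def sample_cov(pairs):
    """Return (var of coord 1, covariance, var of coord 2)."""
    m = len(pairs)
    m1 = sum(p[0] for p in pairs) / m
    m2 = sum(p[1] for p in pairs) / m
    c11 = sum((p[0] - m1) ** 2 for p in pairs) / m
    c12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / m
    c22 = sum((p[1] - m2) ** 2 for p in pairs) / m
    return c11, c12, c22

def skewness(xs):
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    return sum((x - mean) ** 3 for x in xs) / m / var ** 1.5

W = scaled_mean_errors(n=50, trials=3000)
c11, c12, c22 = sample_cov(W)
proj_skew = skewness([w1 + w2 for w1, w2 in W])  # projection on (1, 1)
print(f"sample covariance: [[{c11:.2f}, {c12:.2f}], [{c12:.2f}, {c22:.2f}]]")
print(f"skewness of projection on (1, 1): {proj_skew:+.3f}")
```

A single draw projected on $(1, 1)$ has skewness of about $1.6$, so a value much closer to zero at $n = 50$ reflects the averaging, not the input.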

Sketch of the proof via characteristic functions

The cleanest proof uses characteristic functions $\varphi_X(t) := \mathbb{E}[e^{itX}]$, which uniquely identify distributions. Let $Y_i = (X_i - \mu)/\sigma$ so $\mathbb{E}[Y_i] = 0$ and $\operatorname{Var}(Y_i) = 1$. Because the $Y_i$ are independent and identically distributed,

$$\varphi_{Z_n}(t) = \mathbb{E}\!\left[\exp\!\left(it \cdot \frac{1}{\sqrt n}\sum_{i=1}^n Y_i\right)\right] = \varphi_{Y}\!\left(\frac{t}{\sqrt n}\right)^{\!n}.$$

Taylor-expand $\varphi_Y$ near $0$: $\varphi_Y(s) = 1 + i\mathbb{E}[Y]s - \tfrac{1}{2}\mathbb{E}[Y^2]s^2 + o(s^2) = 1 - \tfrac{1}{2}s^2 + o(s^2)$. Substituting $s = t/\sqrt n$:

$$\varphi_{Z_n}(t) = \left(1 - \frac{t^2}{2n} + o(1/n)\right)^{\!n} \longrightarrow \exp\!\left(-\tfrac{1}{2}t^2\right) = \varphi_{\mathcal{N}(0,1)}(t).$$

By Lévy's continuity theorem, pointwise convergence of characteristic functions to a limit that is continuous at $0$ implies convergence in distribution. This proof is remarkable for what it shows: the Gaussian limit is forced by the first two moments alone, and every piece of information beyond mean and variance is washed out by the $\sqrt n$ rescaling.
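
The limit $\varphi_Y(t/\sqrt n)^n \to e^{-t^2/2}$ can be checked numerically whenever $\varphi_Y$ is known in closed form. The sketch below assumes, purely for illustration, that $Y = X - 1$ with $X \sim$ Exponential(1), whose characteristic function is $\varphi_Y(s) = e^{-is}/(1 - is)$; the evaluation point $t = 1.5$ is arbitrary.

```python
import cmath
import math

def phi_Y(s):
    """Characteristic function of Y = X - 1, X ~ Exponential(1) (mean 0, variance 1)."""
    return cmath.exp(-1j * s) / (1 - 1j * s)

def phi_Zn(t, n):
    """Characteristic function of the standardised sum of n i.i.d. copies of Y."""
    return phi_Y(t / math.sqrt(n)) ** n

t = 1.5
target = math.exp(-0.5 * t * t)  # characteristic function of N(0, 1) at t
errors = {n: abs(phi_Zn(t, n) - target) for n in (1, 10, 100, 10_000)}
for n, err in errors.items():
    print(f"n={n:6d}  |phi_Zn(t) - exp(-t^2/2)| = {err:.5f}")
```

The error should shrink roughly like $1/\sqrt n$, driven by the third-moment term that the second-order Taylor expansion discards.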

Rate of convergence: the Berry-Esseen theorem

The CLT says $\mathbb{P}(Z_n \le z) \to \Phi(z)$, but how fast? If the $X_i$ have a finite third absolute moment, the Berry-Esseen theorem gives a uniform bound:

$$\sup_{z \in \mathbb{R}}\left|\mathbb{P}(Z_n \le z) - \Phi(z)\right| \le \frac{C\,\rho}{\sigma^3\sqrt n}, \qquad \rho = \mathbb{E}[|X_1 - \mu|^3],$$

with $C$ an absolute constant (modern bounds give $C \approx 0.47$). The $1/\sqrt n$ rate is sharp in general: it cannot be improved without further assumptions.

Two practical consequences:

  • For small $n$ (say $n < 30$), the Gaussian approximation can be poor unless the underlying distribution is itself near-Gaussian. Never trust a 95% CI built from an $n = 10$ sample of heavy-tailed returns.
  • For heavy-tailed $X_i$ (e.g. intraday returns with extreme kurtosis), the ratio $\rho/\sigma^3$ can be very large, so even at $n = 1000$ the tail of $Z_n$ can differ materially from the Gaussian tail. This is one reason empirical VaR estimates for leveraged portfolios often use bootstrap rather than normal quantiles.
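
For Bernoulli inputs the left-hand side of the Berry-Esseen inequality can be computed exactly and compared with the bound. This is a sketch under stated assumptions: the constant `C = 0.4748` is one value consistent with the "about 0.47" quoted above and is treated here as an assumption, and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kolmogorov_distance_binomial(n, p):
    """Exact sup_z |P(Z_n <= z) - Phi(z)| when S_n ~ Binomial(n, p)."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    cdf_before, worst = 0.0, 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        phi = normal_cdf((k - mu) / sigma)
        cdf_after = cdf_before + pmf
        # The CDF of Z_n is a step function, so the supremum is attained
        # at a jump point, approached from the left or from the right.
        worst = max(worst, abs(cdf_before - phi), abs(cdf_after - phi))
        cdf_before = cdf_after
    return worst

def berry_esseen_bound(n, p, C=0.4748):
    """C * rho / (sigma^3 * sqrt(n)) for Bernoulli(p); C is an assumed constant."""
    rho = p * (1 - p) * (p**2 + (1 - p) ** 2)  # E|X - p|^3
    sigma3 = (p * (1 - p)) ** 1.5
    return C * rho / (sigma3 * math.sqrt(n))

table = {(n, p): (kolmogorov_distance_binomial(n, p), berry_esseen_bound(n, p))
         for p in (0.5, 0.1) for n in (10, 100, 1000)}
for (n, p), (dist, bound) in table.items():
    print(f"p={p:.1f} n={n:5d}  exact distance={dist:.4f}  bound={bound:.4f}")
```

For the fair coin the exact distance sits only a little below the bound, which illustrates why the $1/\sqrt n$ rate cannot be improved in general; for $p = 0.1$ the larger ratio $\rho/\sigma^3$ loosens the bound.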

Worked examples

Example 1 — From coin flips to the normal

Let $X_i \in \{0, 1\}$ be i.i.d. Bernoulli($p$) with $p = 0.5$, so $\mu = 0.5$ and $\sigma = 0.5$. Then $S_n$ is the number of heads in $n$ flips, and by the CLT

$$\frac{S_n - n/2}{\sqrt{n}/2} \xrightarrow{d} \mathcal{N}(0, 1).$$

For $n = 100$, $S_n$ is approximately $\mathcal{N}(50, 25)$. The probability of observing 60 or more heads:

$$\mathbb{P}(S_{100} \ge 60) \approx 1 - \Phi\!\left(\frac{60 - 50}{5}\right) = 1 - \Phi(2) \approx 0.023.$$

The exact binomial probability is $0.0284$, so the plain Gaussian approximation is off by about 20% in relative terms. Most of that gap comes from approximating a discrete distribution by a continuous one: with a continuity correction, $1 - \Phi((59.5 - 50)/5) = 1 - \Phi(1.9) \approx 0.029$, which is much closer. Either way the finite-$n$ error is non-trivial, in line with the $1/\sqrt n$ Berry-Esseen rate.
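
The numbers in this example can be reproduced with the standard library alone. The sketch below also evaluates a continuity-corrected approximation (replacing 60 by 59.5), which is a standard refinement for lattice distributions; the variable names are illustrative.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, k = 100, 0.5, 60
mu, sigma = n * p, math.sqrt(n * p * (1 - p))  # 50 and 5

# Exact tail P(S_100 >= 60) for a fair coin: count outcomes, divide by 2^100.
exact = sum(math.comb(n, j) for j in range(k, n + 1)) / 2**n

plain = 1 - normal_cdf((k - mu) / sigma)            # 1 - Phi(2)
corrected = 1 - normal_cdf((k - 0.5 - mu) / sigma)  # 1 - Phi(1.9)

print(f"exact={exact:.4f}  normal={plain:.4f}  with continuity correction={corrected:.4f}")
print(f"relative error of plain normal approximation: {(exact - plain) / exact:.1%}")
```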

Example 2 — Sharpe-ratio significance

A strategy generates $n = 252$ daily returns with sample mean $\bar r = 0.06\%$ per day and sample standard deviation $\hat\sigma = 0.8\%$ per day. Under the null hypothesis that the true mean is zero, the CLT (together with consistency of $\hat\sigma$) says $\sqrt n\,\bar r / \hat\sigma \xrightarrow{d} \mathcal{N}(0, 1)$. The test statistic:

$$t = \frac{\sqrt{252}\cdot 0.0006}{0.008} \approx 1.19.$$

The two-sided p-value is $2(1 - \Phi(1.19)) \approx 0.23$, which is not statistically significant. A trader with a one-year backtest and an annualised Sharpe of $\bar r \sqrt{252}/\hat\sigma \approx 1.19$ cannot yet distinguish the strategy from noise. (With exactly one year of daily data, the $t$-statistic and the annualised Sharpe coincide numerically.) Anchoring this in the CLT clarifies that the uncertainty decays as $1/\sqrt n$: doubling the backtest length only narrows the confidence interval by a factor of $\sqrt 2$.
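
The arithmetic of this example is easy to script. The sketch below recomputes the statistic and p-value, and then asks a follow-up question that is not in the lesson: if the same daily mean and volatility persisted, how many days would be needed for $t$ to reach $1.96$? The helper name `mean_zero_test` is hypothetical, and the calculation assumes i.i.d. returns.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_zero_test(mean, sd, n):
    """z-test of H0: true mean = 0, using the CLT normal approximation."""
    t = math.sqrt(n) * mean / sd
    p_two_sided = 2.0 * (1.0 - normal_cdf(abs(t)))
    return t, p_two_sided

daily_mean, daily_sd, n_days = 0.0006, 0.008, 252
t_stat, p_value = mean_zero_test(daily_mean, daily_sd, n_days)

# Solve sqrt(n) * mean / sd = 1.96 for n (assumes the same daily Sharpe persists).
n_needed = math.ceil((1.96 * daily_sd / daily_mean) ** 2)

print(f"t = {t_stat:.3f}, two-sided p = {p_value:.3f}")
print(f"days needed for t = 1.96 at this daily Sharpe: {n_needed}")
```

Under these assumptions the required history is roughly 2.7 years of daily data, a direct consequence of the $1/\sqrt n$ decay.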

Example 3 — Numerical check of the CLT

```python
# Python: show that Z_n = (S_n - n mu) / (sigma sqrt(n)) converges to N(0,1)
import numpy as np

rng = np.random.default_rng(0)
n_values = [1, 5, 30, 200]
N = 200_000  # number of independent "experiments"

# Start from a skewed distribution to stress the CLT: chi-squared with 2 dof
mu = 2.0     # mean of chi-squared(k) is k
sigma = 2.0  # variance is 2k = 4, so sigma = 2

for n in n_values:
    X = rng.chisquare(2, size=(N, n))  # each row is n i.i.d. chi-sq draws
    S_n = X.sum(axis=1)
    Z_n = (S_n - n*mu) / (sigma*np.sqrt(n))
    # Z_n is standardised, so its third moment estimates the skewness.
    print(f"n={n:3d}: mean(Z_n)={Z_n.mean():+.3f}, sd(Z_n)={Z_n.std():.3f}, "
          f"skew={((Z_n**3).mean()):+.3f} (N(0,1) has skew 0)")

# n=  1: mean(Z_n)=-0.005, sd(Z_n)=0.997, skew=+1.995 (= chi-sq skew, CLT hasn't kicked in)
# n=  5: mean(Z_n)=+0.000, sd(Z_n)=0.999, skew=+0.899
# n= 30: mean(Z_n)=+0.002, sd(Z_n)=1.001, skew=+0.368
# n=200: mean(Z_n)=-0.001, sd(Z_n)=1.000, skew=+0.140
```

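The skewness, which starts at $\sqrt{8/2} = 2$ for the chi-squared with 2 degrees of freedom, decays as $1/\sqrt n$, the same rate as in the Berry-Esseen bound. At $n = 200$ the residual skewness is about $0.14$, so the distribution is close to Gaussian for most practical purposes, although the far tails converge more slowly than the centre.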
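
As a contrast to the chi-squared experiment, the same kind of check can be run on an infinite-variance input, where the CLT does not apply. The sketch below is an illustrative assumption-laden example: it uses standard Cauchy draws generated by inverse-CDF sampling, measures spread by the interquartile range (moments do not exist for the Cauchy), and uses hypothetical helper names.

```python
import math
import random
import statistics

def iqr_of_sample_means(draw, n, trials=1000, seed=3):
    """Interquartile range of the sample mean of n draws, across `trials` repeats."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]
    q1, _median, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

def cauchy(rng):
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy.
    return math.tan(math.pi * (rng.random() - 0.5))

def gaussian(rng):
    return rng.gauss(0.0, 1.0)

cauchy_iqr = {n: iqr_of_sample_means(cauchy, n) for n in (10, 1000)}
gauss_iqr = {n: iqr_of_sample_means(gaussian, n) for n in (10, 1000)}
for n in (10, 1000):
    print(f"n={n:5d}  IQR of mean: Cauchy={cauchy_iqr[n]:.3f}  Gaussian={gauss_iqr[n]:.3f}")
```

The Gaussian column should shrink like $1/\sqrt n$, while the Cauchy column should stay near $2$ (the IQR of a single standard Cauchy draw), because the mean of $n$ Cauchy variables is again standard Cauchy: averaging buys nothing.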

Common confusions and pitfalls

"The CLT says sums of random variables are normal." Only standardised sums of i.i.d. finite-variance random variables are approximately normal, and only in the limit. The unstandardised sum $S_n$ has variance $n\sigma^2 \to \infty$; it is not approaching any fixed distribution. Dropping "standardised" or dropping "finite variance" breaks the theorem.
"The CLT requires a large $n$." The theorem is an asymptotic statement: it only says $Z_n \xrightarrow{d} \mathcal{N}(0,1)$ as $n \to \infty$. For a specific finite $n$, the quality of the approximation depends on (a) the sample size $n$ and (b) the distance of the source distribution from Gaussian, measured by its third (and higher) moments. The textbook rule "$n = 30$ is enough" is folklore, not a theorem.
"The CLT applies to any random process." No. I.i.d. is a strong assumption. For non-i.i.d. data there are other central limit theorems: for triangular arrays of independent, non-identically distributed variables (Lindeberg, Lyapunov), for martingale difference sequences, and for stationary mixing processes, each with its own side conditions. The CLT for time series (e.g. GARCH residuals, autocorrelated returns) typically requires asymptotic independence, not exact independence.
"The CLT fails for heavy tails." Only when the tails are so heavy that $\sigma^2 = \infty$. Cauchy distributions and Pareto distributions with tail index at most 2 have no finite second moment, and their suitably normalised sums converge to stable (non-Gaussian) distributions. Log-returns on real stocks, while heavy-tailed, are generally found empirically to have finite variance, so the CLT applies, just slowly.
"If the CLT gives a normal limit, the original distribution is normal." Absolutely not. The CLT is about sums: the standardised sum becomes normal, but the individual $X_i$ can be anything with finite variance. This is why sample means are approximately normal even when the samples are skewed.
"The CLT tells you the mean." No. The Law of Large Numbers gives $\bar X_n \to \mu$ (one number). The CLT describes the fluctuation around that limit: $\bar X_n - \mu \approx \sigma Z/\sqrt n$. The LLN is the first-order statement; the CLT is the second-order correction.

Where this goes next

  • Law of Large Numbers: The first-order partner of the CLT, $\bar X_n \to \mu$ almost surely. LLN gives convergence of the mean; CLT gives the Gaussian envelope around that convergence.
  • Characteristic Functions: The tool the CLT proof is built on. The CLT is essentially a second-order Taylor expansion of the characteristic function near the origin.
  • Brownian Motion: Donsker's invariance principle is the functional version of the CLT — the whole random-walk path (not just its endpoint) converges in distribution to Brownian motion. This is why continuous-time finance uses Brownian drivers.
  • Monte Carlo Pricing (Basic): The CLT is the engine behind Monte Carlo error bars: the standard error of the estimator decays as $\sigma/\sqrt N$, and confidence intervals use Gaussian quantiles because of the CLT.
  • Moment Generating Functions: An alternative route to the CLT proof, plus a tool for large-deviation bounds (Cramér's theorem) — the CLT describes typical fluctuations; large deviations describe rare ones.
  • Maximum Likelihood Estimation: Maximum likelihood estimators are asymptotically normal, under regularity conditions, by a CLT-style argument applied to the score function. Most large-sample confidence intervals in classical statistics are the CLT in disguise.
