Central Limit Theorem

Motivation: why this matters in quant finance

Every time a quant computes a standard error on a backtest Sharpe, a confidence interval on a VaR estimate, a bootstrapped p-value for a strategy's alpha, or a normal approximation for a portfolio P&L, they are leaning on the Central Limit Theorem (CLT). It is the single result that lets a practitioner say "my estimator is approximately normal" while knowing little about the underlying distribution beyond the fact that its variance is finite.

The CLT is the usual heuristic justification for the log-normal price assumption in the Black-Scholes model: a daily log-return is a sum of many small intraday increments, and if those increments are roughly independent with finite variance, the CLT makes the sum approximately normal. Hence log-prices are approximately normal and prices approximately log-normal. It also explains why factor-model aggregates, diversified portfolio returns, and Monte Carlo estimators tend to look Gaussian even when the atomic input distributions are not. Without the CLT, much of quantitative finance would lose its statistical foundation.

But the theorem has teeth beyond its statement. Its failure modes tell you when to be suspicious: heavy-tailed distributions with infinite variance (Cauchy, Pareto tails with index $\alpha < 2$) break the classical CLT entirely; finite variance combined with a large third absolute moment relative to $\sigma^3$ makes the convergence slow; dependence across observations requires different conditions (central limit theorems for martingales and mixing sequences). Knowing when the CLT does not apply is the mark of a careful quant.

The informal idea

The CLT is a three-word statement: averages are normal. But to understand why, you need to understand two competing effects in the sum

S_n = X_1 + \cdots + X_n

of i.i.d. random variables:

The mean grows linearly. $\mathbb{E}[S_n] = n\mu$ , so $S_n$ drifts off to $\pm\infty$ (unless $\mu = 0$ ).
The standard deviation grows as $\sqrt n$ . $\operatorname{Var}(S_n) = n\sigma^2$ , so the fluctuation around the mean has scale $\sigma\sqrt n$ .

To see the shape of the fluctuation we have to cancel the drift and rescale the fluctuation:

Z_n = \frac{S_n - n\mu}{\sigma\sqrt n}

$Z_n$ has mean $0$ and variance $1$ for every $n$ . The CLT says that as $n \to \infty$ , the distribution of $Z_n$ converges to $\mathcal{N}(0, 1)$ — regardless of what distribution the $X_i$ came from, as long as they have finite variance.

This is remarkable. The input can be binary, uniform, exponential, log-normal, or some arbitrary mess — all of them funnel into the same Gaussian limit. The Gaussian is a universal attractor for the sum of many independent finite-variance pieces.

Why $\sqrt n$ and not something else?

If we divided by $n$ (as in the Law of Large Numbers), the variance of $S_n/n$ would be $\sigma^2/n \to 0$ and we would see only the deterministic mean $\mu$. If we divided by $1$ (no rescaling at all), the sum would be dominated by the drift $n\mu$, and even after centring its spread $\sigma\sqrt n$ would grow without bound. The right rescaling, the one that reveals the random fluctuation, keeps the standard deviation of order $1$, which means dividing by $\sqrt n$. The LLN says the sample mean converges; the CLT says the fluctuation around that converging mean is Gaussian, at scale $\sigma/\sqrt n$.
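The three candidate rescalings can be compared in a quick simulation. This is a minimal sketch, assuming NumPy and Exponential(1) inputs (mean 1, standard deviation 1); the variable names and the sample sizes are illustrative choices, not part of the argument above.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 1.0, 1.0          # Exponential(1) has mean 1 and standard deviation 1
n_experiments = 5_000         # independent replications per sample size

spread = {}                   # n -> (sd after /n, sd with no rescaling, sd after /sqrt(n))
for n in (10, 100, 1000):
    X = rng.exponential(scale=1.0, size=(n_experiments, n))
    centred = X.sum(axis=1) - n * mu            # S_n - n*mu
    spread[n] = (
        (centred / n).std(),            # divide by n: collapses like sigma/sqrt(n)
        centred.std(),                  # no rescaling: grows like sigma*sqrt(n)
        (centred / np.sqrt(n)).std(),   # divide by sqrt(n): stays near sigma
    )
    print(n, [round(float(v), 3) for v in spread[n]])
```

Only the middle scaling should keep a stable spread as $n$ grows: the first column shrinks towards zero and the second keeps increasing.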

Formal statement

Classical (Lindeberg-Lévy) CLT

Let $X_1, X_2, \ldots$ be independent and identically distributed random variables on a common probability space, with finite mean $\mu = \mathbb{E}[X_1]$ and finite variance $\sigma^2 = \operatorname{Var}(X_1) > 0$. Define the standardised partial sum

Z_n = \frac{S_n - n\mu}{\sigma\sqrt n} = \frac{1}{\sigma\sqrt n}\sum_{i=1}^n (X_i - \mu).

Then $Z_n$ converges in distribution to $Z \sim \mathcal{N}(0, 1)$ :

Z_n \xrightarrow{d} \mathcal{N}(0, 1), \qquad \text{equivalently,} \qquad \mathbb{P}(Z_n \le z) \to \Phi(z) \text{ for every } z \in \mathbb{R},

where $\Phi$ is the standard normal CDF.

Equivalent formulations

Three statements are mutually equivalent and all called the CLT:

1. $\dfrac{S_n - n\mu}{\sigma\sqrt n} \xrightarrow{d} \mathcal{N}(0, 1)$.
2. $\sqrt n\,(\bar X_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$, where $\bar X_n = S_n/n$.
3. $S_n \approx n\mu + \sigma\sqrt n\,Z$ for large $n$, with $Z \sim \mathcal{N}(0, 1)$.

Form 2 is the one used in statistics (e.g. constructing confidence intervals for a sample mean); form 3 is the one practitioners carry in their head.
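As an illustration of the statistical form, the sketch below builds the usual interval $\bar X_n \pm 1.96\,\hat\sigma/\sqrt n$ and checks its coverage by simulation. It assumes NumPy, plugs the sample standard deviation in for the unknown $\sigma$, and uses skewed Exponential(1) data; the function name `mean_confidence_interval` and all parameter values are hypothetical choices for this example.

```python
import numpy as np

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% CI for the mean, with the sample sd plugged in for sigma."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    half_width = z * sample.std(ddof=1) / np.sqrt(n)
    centre = sample.mean()
    return centre - half_width, centre + half_width

rng = np.random.default_rng(7)
true_mean = 1.0                      # Exponential(1) draws: skewed, mean 1
n, n_experiments = 200, 4_000

hits = 0
for _ in range(n_experiments):
    lo, hi = mean_confidence_interval(rng.exponential(scale=true_mean, size=n))
    hits += int(lo <= true_mean <= hi)   # count intervals that contain the true mean
coverage = hits / n_experiments
print(f"empirical coverage of the nominal 95% interval: {coverage:.3f}")
```

If the normal approximation is adequate at this sample size, the empirical coverage should sit close to the nominal 95%.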

The multidimensional CLT

For vector-valued i.i.d. random variables $X_i \in \mathbb{R}^d$ with mean $\mu$ and covariance matrix $\Sigma$ :

\sqrt n\,(\bar X_n - \mu) \xrightarrow{d} \mathcal{N}_d(0, \Sigma).

This is what underwrites factor-model risk estimation, where multiple asset returns are summed across time and the joint limit is a multivariate Gaussian with the asset covariance structure.
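A small simulation can make the multivariate statement concrete. The sketch below assumes NumPy and a hypothetical mixing matrix `A`, so that $X_i = A U_i$ has covariance $\Sigma = A A^\top$ when the components of $U_i$ are independent with unit variance. It checks that $\sqrt n\,\bar X_n$ has covariance close to $\Sigma$ and that one linear combination has excess kurtosis near zero, as a Gaussian would.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[1.0, 0.0],
              [0.5, 1.0]])            # hypothetical mixing matrix
Sigma = A @ A.T                       # covariance of X_i = A U_i when Cov(U_i) = I
n, n_experiments = 50, 20_000

# U has i.i.d. Uniform(-sqrt(3), sqrt(3)) entries: mean 0, variance 1, non-Gaussian
U = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n_experiments, n, 2))
X = U @ A.T                           # shape (n_experiments, n, 2); each X_i has mean 0
W = np.sqrt(n) * X.mean(axis=1)       # sqrt(n) * (X_bar - mu) with mu = 0

cov_W = np.cov(W, rowvar=False)
print("target Sigma:\n", Sigma)
print("sample covariance of sqrt(n) * X_bar:\n", cov_W.round(3))

# Gaussianity check on one linear combination: excess kurtosis should be near 0
proj = W @ np.array([1.0, -1.0])
proj = (proj - proj.mean()) / proj.std()
excess_kurtosis = (proj**4).mean() - 3.0
print(f"excess kurtosis of the projection: {excess_kurtosis:+.3f}")
```

The covariance match holds for every $n$ by linearity; it is the vanishing excess kurtosis of the projection that reflects the Gaussian limit.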

Sketch of the proof via characteristic functions

The cleanest proof uses characteristic functions $\varphi_X(t) := \mathbb{E}[e^{itX}]$, which uniquely identify distributions. Let $Y_i = (X_i - \mu)/\sigma$, so that $\mathbb{E}[Y_i] = 0$ and $\operatorname{Var}(Y_i) = 1$. By independence and identical distribution,

\varphi_{Z_n}(t) = \mathbb{E}\!\left[\exp\!\left(it \cdot \frac{1}{\sqrt n}\sum_{i=1}^n Y_i\right)\right] = \varphi_{Y}\!\left(\frac{t}{\sqrt n}\right)^{\!n}.

Taylor-expand $\varphi_Y$ near $0$ : $\varphi_Y(s) = 1 + i\mathbb{E}[Y]s - \tfrac{1}{2}\mathbb{E}[Y^2]s^2 + o(s^2) = 1 - \tfrac{1}{2}s^2 + o(s^2)$ . Substituting $s = t/\sqrt n$ :

\varphi_{Z_n}(t) = \left(1 - \frac{t^2}{2n} + o(1/n)\right)^{\!n} \longrightarrow \exp\!\left(-\tfrac{1}{2}t^2\right) = \varphi_{\mathcal{N}(0,1)}(t).

By Lévy's continuity theorem, pointwise convergence of characteristic functions to a continuous limit implies convergence in distribution. This proof is remarkable for what it shows: the Gaussian limit is forced by the first two moments alone. Every piece of information beyond mean and variance is washed out by the $\sqrt n$-rescaling.
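The convergence of $\varphi_Y(t/\sqrt n)^n$ to $e^{-t^2/2}$ can be checked numerically whenever $\varphi_Y$ is known in closed form. The sketch below is one such check under a specific assumption: $X \sim \text{Exponential}(1)$, whose characteristic function is $1/(1 - is)$, so that $Y = X - 1$ has $\varphi_Y(s) = e^{-is}/(1 - is)$. It uses only the Python standard library, and the function name is illustrative.

```python
import cmath
import math

def cf_standardised_exponential(s):
    """Characteristic function of Y = X - 1 with X ~ Exponential(1)."""
    return cmath.exp(-1j * s) / (1 - 1j * s)

t = 1.0
target = math.exp(-0.5 * t**2)        # characteristic function of N(0, 1) at t

errors = {}
for n in (10, 100, 1_000, 10_000):
    cf_Zn = cf_standardised_exponential(t / math.sqrt(n)) ** n
    errors[n] = abs(cf_Zn - target)
    print(f"n={n:6d}  phi_Zn(t)={cf_Zn:.5f}  |error|={errors[n]:.5f}")
```

The residual error is dominated by the third-order term of the expansion, which is why it should shrink roughly like $1/\sqrt n$ rather than $1/n$ for this skewed input.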

Rate of convergence: the Berry-Esseen theorem

The CLT says $\mathbb{P}(Z_n \le z) \to \Phi(z)$, but how fast? If the $X_i$ have a finite third absolute moment, the Berry-Esseen theorem gives a uniform bound:

\sup_{z \in \mathbb{R}}\left|\mathbb{P}(Z_n \le z) - \Phi(z)\right| \le \frac{C\,\rho}{\sigma^3\sqrt n}, \qquad \rho = \mathbb{E}[|X_1 - \mu|^3],

with $C$ an absolute constant (the best known upper bounds put it just below $0.48$). The $1/\sqrt n$ rate is typical and cannot be improved without further assumptions.

Two practical consequences:

For small $n$ (say $n < 30$), the Gaussian approximation can be poor unless the underlying distribution is itself near-Gaussian. Do not trust a 95% CI built from an $n = 10$ sample of heavy-tailed returns.
For heavy-tailed $X_i$ (e.g. intraday returns with extreme kurtosis), the ratio $\rho/\sigma^3$ can be very large, so even at $n = 1000$ the tail of $Z_n$ can differ materially from the Gaussian tail. This is one reason empirical VaR estimates for leveraged portfolios often use bootstrap rather than normal quantiles.
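To get a feel for how tight the bound is, it can be compared with the exact sup-distance in a case where everything is computable. The sketch below does this for symmetric Bernoulli inputs, where $\rho/\sigma^3 = 1$ and $S_n$ is binomial. It uses only the standard library; the value `C = 0.4748` is an assumed upper bound for the constant, and the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kolmogorov_distance_symmetric_binomial(n):
    """Exact sup_z |P(Z_n <= z) - Phi(z)| for S_n ~ Binomial(n, 1/2)."""
    mu, sigma = 0.5, 0.5
    cdf_before = 0.0                      # P(S_n < k), the left limit at the jump
    worst = 0.0
    for k in range(n + 1):
        z = (k - n * mu) / (sigma * math.sqrt(n))
        cdf_after = cdf_before + math.comb(n, k) / 2**n
        phi = normal_cdf(z)
        # A step CDF is furthest from Phi either just before or at each jump
        worst = max(worst, abs(cdf_before - phi), abs(cdf_after - phi))
        cdf_before = cdf_after
    return worst

C = 0.4748                        # assumed upper bound for the Berry-Esseen constant
rho_over_sigma3 = 0.125 / 0.125   # E|X - 1/2|^3 / sigma^3 = 1 for Bernoulli(1/2)

results = {}
for n in (10, 100, 1_000):
    exact = kolmogorov_distance_symmetric_binomial(n)
    bound = C * rho_over_sigma3 / math.sqrt(n)
    results[n] = (exact, bound)
    print(f"n={n:5d}  exact sup-distance={exact:.4f}  Berry-Esseen bound={bound:.4f}")
```

For this lattice distribution the exact distance is driven by the jump of the binomial CDF at the centre, so it decays at the same $1/\sqrt n$ rate as the bound.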

Worked examples

Example 1 — From coin flips to the normal

Let $X_i \in \{0, 1\}$ be i.i.d. Bernoulli( $p$ ) with $p = 0.5$ , so $\mu = 0.5$ and $\sigma = 0.5$ . Then $S_n$ = number of heads in $n$ flips, and by the CLT

\frac{S_n - n/2}{\sqrt{n}/2} \xrightarrow{d} \mathcal{N}(0, 1).

For $n = 100$ , $S_n$ is approximately $\mathcal{N}(50, 25)$ . The probability of observing 60 or more heads:

\mathbb{P}(S_{100} \ge 60) \approx 1 - \Phi\!\left(\frac{60 - 50}{5}\right) = 1 - \Phi(2) \approx 0.023.

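The exact binomial probability is $0.0284$, so the Gaussian approximation is off by about 20% in relative terms. Most of that gap comes from approximating an integer-valued $S_n$ by a continuous distribution: with a continuity correction, $1 - \Phi\!\left(\frac{59.5 - 50}{5}\right) = 1 - \Phi(1.9) \approx 0.0287$, which is within about 1% of the exact value. This illustrates the Berry-Esseen rate: for lattice distributions the finite-$n$ error is genuinely of order $1/\sqrt n$.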
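The numbers in this example are easy to reproduce. The sketch below uses only the Python standard library (the helper `normal_cdf` is an illustrative name) and computes the exact binomial tail, the plain CLT approximation, and a continuity-corrected version that treats the event $S_n \ge 60$ as $S_n > 59.5$.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, k = 100, 0.5, 60
mean = n * p
sd = math.sqrt(n * p * (1 - p))

# Exact tail: sum the binomial pmf from k to n
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))
# Plain CLT approximation: 1 - Phi((60 - 50) / 5)
clt = 1.0 - normal_cdf((k - mean) / sd)
# Continuity-corrected approximation: 1 - Phi((59.5 - 50) / 5)
clt_cc = 1.0 - normal_cdf((k - 0.5 - mean) / sd)

print(f"exact binomial      : {exact:.4f}")
print(f"CLT approximation   : {clt:.4f}")
print(f"with continuity corr: {clt_cc:.4f}")
```

The corrected value lands much closer to the exact tail than the plain approximation does.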

Example 2 — Sharpe-ratio significance

A strategy generates $n = 252$ daily returns with sample mean $\bar r = 0.06\%$ per day and sample standard deviation $\hat\sigma = 0.8\%$ per day. Under the null hypothesis that the true mean is zero, and treating the daily returns as i.i.d., the CLT (together with Slutsky's theorem, which lets $\hat\sigma$ stand in for $\sigma$) says $\sqrt n\,\bar r / \hat\sigma \xrightarrow{d} \mathcal{N}(0, 1)$. The test statistic:

t = \frac{\sqrt{252}\cdot 0.0006}{0.008} \approx 1.19.

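The two-sided p-value is $2(1 - \Phi(1.19)) \approx 0.23$, which is not statistically significant at conventional levels. A trader with a one-year backtest and an annualised Sharpe ratio of $\bar r \sqrt{252}/\hat\sigma \approx 1.19$ cannot yet distinguish the strategy from noise. (With exactly one year of daily data the test statistic and the annualised Sharpe ratio coincide; in general $t = \text{Sharpe} \times \sqrt{\text{years}}$.) Anchoring this in the CLT clarifies that the uncertainty decays as $1/\sqrt n$: doubling the backtest length narrows the confidence interval only by a factor of $\sqrt 2$.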
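The arithmetic of this example can be sketched in a few lines of standard-library Python. The variable names are illustrative, and the last step (how many days would be needed for the statistic to reach 1.96) is an added back-of-envelope calculation that assumes the daily mean and standard deviation stay exactly as observed.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n_days = 252
mean_daily = 0.0006      # 0.06% per day
sd_daily = 0.008         # 0.8% per day

t_stat = math.sqrt(n_days) * mean_daily / sd_daily
p_value = 2.0 * (1.0 - normal_cdf(abs(t_stat)))
sharpe_annualised = mean_daily / sd_daily * math.sqrt(252)

# Days of data needed for t to reach 1.96 if the daily mean and sd stayed the same
n_required = math.ceil((1.96 * sd_daily / mean_daily) ** 2)

print(f"t = {t_stat:.2f}, two-sided p = {p_value:.3f}, "
      f"annualised Sharpe = {sharpe_annualised:.2f}")
print(f"days needed for t >= 1.96 at the same daily mean and sd: {n_required}")
```

Because the statistic grows only like $\sqrt n$, the required sample is several years of daily data rather than a few extra months.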

Example 3 — Numerical check of the CLT

# Python: show that Z_n = (S_n - n mu) / (sigma sqrt(n)) converges to N(0,1)
import numpy as np

rng = np.random.default_rng(0)
n_values = [1, 5, 30, 200]
N = 200_000    # number of independent "experiments"

# Start from a strongly skewed distribution to stress the CLT:
# chi-squared with 2 dof has mean 2, standard deviation 2 and skewness 2
mu = 2.0
sigma = 2.0

for n in n_values:
    X = rng.chisquare(2, size=(N, n))          # each row is n i.i.d. chi-sq draws
    S_n = X.sum(axis=1)
    Z_n = (S_n - n*mu) / (sigma*np.sqrt(n))
    print(f"n={n:3d}:  mean(Z_n)={Z_n.mean():+.3f},  sd(Z_n)={Z_n.std():.3f},  "
          f"skew={((Z_n**3).mean()):+.3f}  (N(0,1) has skew 0)")
# n=  1:  mean(Z_n)=-0.005, sd(Z_n)=0.997, skew=+1.995   (= chi-sq skew, CLT hasn't kicked in)
# n=  5:  mean(Z_n)=+0.000, sd(Z_n)=0.999, skew=+0.899
# n= 30:  mean(Z_n)=+0.002, sd(Z_n)=1.001, skew=+0.368
# n=200:  mean(Z_n)=-0.001, sd(Z_n)=1.000, skew=+0.140

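The skewness of $Z_n$, which starts at $\sqrt{8/2} = 2$ for the chi-squared with 2 degrees of freedom, decays as $2/\sqrt n$, the same $1/\sqrt n$ order as the Berry-Esseen rate. At $n = 200$ the bulk of the distribution is close to Gaussian, although a residual skewness of about $0.14$ can still matter for far-tail quantiles.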

Common confusions and pitfalls

"The CLT says sums of random variables are normal." Only standardised sums of i.i.d. finite-variance random variables are approximately normal, and only in the limit. The unstandardised sum $S_n$ has variance $n\sigma^2 \to \infty$; it is not approaching any fixed distribution. Dropping "standardised" or dropping "finite variance" breaks the theorem.

"The CLT requires a large $n$." The theorem is an asymptotic statement: it only says $Z_n \to \mathcal{N}(0,1)$ in distribution as $n \to \infty$. For a specific finite $n$, the quality of the approximation depends on (a) the sample size $n$ and (b) the distance of the source distribution from Gaussian, measured by its third (and higher) moments. The textbook rule "$n = 30$ is enough" is folklore, not a theorem.

"The CLT applies to any random process." No. I.i.d. is a strong assumption. For non-i.i.d. data there are other central limit theorems (for triangular arrays under Lyapunov's condition, for martingale difference sequences, for stationary mixing processes), each with its own side conditions. The CLT for time series (e.g. GARCH residuals, autocorrelated returns) typically requires asymptotic independence, not exact independence.

"The CLT fails for heavy tails." Only when the tails are so heavy that $\sigma^2 = \infty$. Cauchy distributions and Pareto distributions with tail index $\alpha < 2$ have no finite second moment, and their suitably normalised sums converge to stable (non-Gaussian) distributions. Log-returns on real stocks, while heavy-tailed, are usually taken to have finite variance, in which case the CLT applies, just slowly.

"If the CLT gives a normal limit, the original distribution is normal." Absolutely not. The CLT is about sums; the standardised sum becomes normal but the individual $X_i$ can be anything with finite variance. This is why sample means are approximately normal even when samples are skewed.

"The CLT tells you the mean." No. The Law of Large Numbers gives $\bar X_n \to \mu$ (one number). The CLT describes the fluctuation around that convergence: $\bar X_n - \mu \approx \sigma Z/\sqrt n$. LLN is the first-order statement, CLT is the second-order correction.
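The heavy-tail pitfall can be checked numerically. The sketch below (assuming NumPy; `iqr` and `iqr_of_mean` are illustrative names) compares sample means of standard Cauchy draws, which have no finite variance, with sample means of standard normal draws. Spread is measured by the interquartile range, because the variance of a Cauchy sample mean does not exist.

```python
import numpy as np

rng = np.random.default_rng(5)
n_experiments = 5_000

def iqr(values):
    """Interquartile range, a spread measure that exists even without a variance."""
    q25, q75 = np.percentile(values, [25, 75])
    return q75 - q25

iqr_of_mean = {"cauchy": {}, "normal": {}}
for n in (1, 10, 100, 500):
    cauchy_means = rng.standard_cauchy(size=(n_experiments, n)).mean(axis=1)
    normal_means = rng.standard_normal(size=(n_experiments, n)).mean(axis=1)
    iqr_of_mean["cauchy"][n] = iqr(cauchy_means)
    iqr_of_mean["normal"][n] = iqr(normal_means)
    print(f"n={n:4d}  IQR of Cauchy mean={iqr_of_mean['cauchy'][n]:.2f}  "
          f"IQR of normal mean={iqr_of_mean['normal'][n]:.3f}")
```

The mean of $n$ standard Cauchy variables is again standard Cauchy, so its interquartile range should stay near $2$ for every $n$, while the normal column shrinks like $1/\sqrt n$.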

Where this goes next

Law of Large Numbers: The first-order partner of the CLT — $\bar X_n \to \mu$ almost surely. LLN gives convergence of the mean; CLT gives the Gaussian envelope around that convergence.
Characteristic Functions: The tool the CLT proof is built on. The CLT is essentially a second-order Taylor expansion of the characteristic function near the origin.
Brownian Motion: Donsker's invariance principle is the functional version of the CLT — the whole random-walk path (not just its endpoint) converges in distribution to Brownian motion. This is why continuous-time finance uses Brownian drivers.
Monte Carlo Pricing (Basic): The CLT is the engine behind Monte Carlo error bars — the standard error of the estimator decays as $\sigma/\sqrt N$ , and confidence intervals use Gaussian quantiles because of the CLT.
Moment Generating Functions: An alternative route to the CLT proof, plus a tool for large-deviation bounds (Cramér's theorem) — the CLT describes typical fluctuations; large deviations describe rare ones.
Maximum Likelihood Estimation: Maximum likelihood estimators are asymptotically normal by a CLT-style argument applied to the score function. Most large-sample confidence intervals in classical statistics are the CLT in disguise.