Moments and Summary Statistics

Motivation: why this matters in quant finance

Every return distribution, every risk model, every performance report begins with the same question: what are the basic numbers that summarise this data? The four moments — mean, variance, skewness, kurtosis — are the answer. They compress an entire distribution into four numbers that tell you: the average return, how volatile it is, whether crashes are more likely than rallies, and how fat the tails are.

These four numbers drive concrete financial decisions:

Mean return determines whether a strategy or asset is worth holding.
Variance / standard deviation is the baseline measure of risk and the input to portfolio optimisation (Markowitz mean-variance).
Skewness tells you whether your risk is asymmetric — are you picking up pennies in front of a steamroller (negative skew), or do you have convex payoffs (positive skew)?
Kurtosis tells you how much tail risk you carry — a leptokurtic return distribution means extreme events happen more often than a Gaussian model predicts, which directly affects VaR, Expected Shortfall, and margin requirements.

Beyond moments, summary statistics like median, quantiles, and range provide robust alternatives that resist the distortions caused by outliers — a practical concern when a single day's crash can dominate a sample mean.

The four moments

First moment: mean (expected value)

Population mean:

\mu = \mathbb{E}[X] = \int x\,f(x)\,dx

Sample mean:

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

The sample mean is an unbiased estimator of $\mu$: $\mathbb{E}[\bar{x}] = \mu$. For i.i.d. observations its standard error is $\sigma/\sqrt{n}$, and by the Central Limit Theorem its sampling distribution is approximately normal for large $n$.

Finance interpretation: The mean return of a strategy determines its expected P&L. But estimating $\mu$ precisely is notoriously difficult. With daily $\sigma \approx 1\%$ and $\mu \approx 0.04\%$ (about 10% annualised), the per-day signal-to-noise ratio is $0.04/1 = 0.04$. The t-statistic of the sample mean is roughly $(\mu/\sigma)\sqrt{n} = 0.04\sqrt{n}$. You therefore need $(1/0.04)^2 = 625$ observations (about 2.5 years of daily data) just for the mean to reach one standard error. Detecting that it differs from zero at the 95% level ($t \geq 1.96$) takes $(1.96/0.04)^2 \approx 2{,}400$ observations, roughly 9.5 years. This is why mean estimation is the hardest problem in portfolio optimisation.
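The sample-size arithmetic can be sketched in Python as follows. The sketch assumes i.i.d. returns with known $\sigma$, and the function name `observations_needed` is our own. It solves $z = (\mu/\sigma)\sqrt{n}$ for $n$ at two thresholds: one standard error ($z = 1$) and the two-sided 95% level ($z = 1.96$).

```python
import math

TRADING_DAYS = 252  # trading days per year, as assumed in the text


def observations_needed(mu: float, sigma: float, z: float) -> float:
    """Number of i.i.d. observations n at which mu / (sigma / sqrt(n)) equals z."""
    snr = mu / sigma  # per-observation signal-to-noise ratio
    return (z / snr) ** 2


# Daily figures from the text, in percent: mu = 0.04%, sigma = 1%.
mu_daily, sigma_daily = 0.04, 1.0

for z in (1.0, 1.96):
    n = observations_needed(mu_daily, sigma_daily, z)
    print(f"t = {z:.2f}: n = {n:,.0f} days (~{n / TRADING_DAYS:.1f} years)")
```

With these inputs it gives 625 days (about 2.5 years) for $z = 1$ and 2,401 days (about 9.5 years) for $z = 1.96$.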

Second moment: variance and standard deviation

Population variance:

\sigma^2 = \text{Var}(X) = \mathbb{E}[(X - \mu)^2] = \mathbb{E}[X^2] - \mu^2

Sample variance (unbiased):

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2

The divisor is $n - 1$ (Bessel's correction), not $n$, because $\bar{x}$ consumes one degree of freedom. The biased version with $n$ in the denominator systematically underestimates $\sigma^2$: its expected value is $\frac{n-1}{n}\sigma^2$.
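The bias is easy to see by simulation. This sketch averages both estimators over many samples of size 5 drawn from a standard normal, so the true variance is 1. The helper names are our own. The $n$-divisor average should land near $(n-1)/n = 0.8$, and the Bessel-corrected average should land near 1.

```python
import random


def variance(xs: list[float], ddof: int) -> float:
    """Sample variance with divisor n - ddof (ddof=0: biased, ddof=1: Bessel)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - ddof)


def average_estimates(n: int, trials: int, seed: int = 0) -> tuple[float, float]:
    """Average biased and unbiased variance estimates over many standard normal
    samples of size n (true variance = 1)."""
    rng = random.Random(seed)
    biased_total = unbiased_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        biased_total += variance(xs, ddof=0)
        unbiased_total += variance(xs, ddof=1)
    return biased_total / trials, unbiased_total / trials


biased, unbiased = average_estimates(n=5, trials=20_000)
print(f"divisor n:     {biased:.3f}  (theory (n-1)/n = 0.800)")
print(f"divisor n - 1: {unbiased:.3f}  (theory 1.000)")
```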

Standard deviation: $\sigma = \sqrt{\sigma^2}$ (population), $s = \sqrt{s^2}$ (sample). Standard deviation is in the same units as the data, making it interpretable: "the stock has 20% annual volatility" means $\sigma_{\text{annual}} = 0.20$.

Annualisation: If $\sigma_{\text{daily}}$ is the daily standard deviation of log-returns, the annualised volatility is:

\sigma_{\text{annual}} = \sigma_{\text{daily}} \times \sqrt{252}

assuming 252 trading days and i.i.d. returns. The $\sqrt{252}$ comes from $\text{Var}(\sum_{i=1}^{252} r_i) = 252\,\text{Var}(r_i)$, so $\text{SD} = \sqrt{252}\,\sigma_{\text{daily}}$. This is the "square-root-of-time rule" (see Normal Distribution).
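The rule can be checked quickly by simulation under the same i.i.d. assumption. The sketch draws normal daily log-returns with $\sigma_{\text{daily}} = 1\%$ (an illustrative choice) and sums 252 of them into one annual return. It repeats this many times and compares the spread of the annual returns with $\sqrt{252}\,\sigma_{\text{daily}}$.

```python
import math
import random
import statistics

TRADING_DAYS = 252  # trading days per year, as assumed in the text


def simulated_annual_sd(sigma_daily: float, years: int, seed: int = 1) -> float:
    """Sample sd of annual log-returns, each the sum of 252 i.i.d. normal daily
    log-returns (an idealisation: real returns show volatility clustering)."""
    rng = random.Random(seed)
    annual = [
        sum(rng.gauss(0.0, sigma_daily) for _ in range(TRADING_DAYS))
        for _ in range(years)
    ]
    return statistics.stdev(annual)


sigma_daily = 0.01                            # illustrative 1% daily volatility
rule = sigma_daily * math.sqrt(TRADING_DAYS)  # square-root-of-time rule
sim = simulated_annual_sd(sigma_daily, years=2_000)
print(f"rule: {rule:.4f}  simulated: {sim:.4f}")
```

Because the simulated returns are truly i.i.d., the agreement here is a best case. Real returns with volatility clustering fit the rule less well.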

Third moment: skewness

Population skewness:

\gamma_1 = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3}

where $\mu_3 = \mathbb{E}[(X - \mu)^3]$ is the third central moment.

Sample skewness:

g_1 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3

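(This version divides by $s$. Many packages instead use $m_3/m_2^{3/2}$, where $m_k = \frac{1}{n}\sum_i (x_i - \bar{x})^k$ is the $k$-th sample central moment. The small-sample adjusted estimator multiplies $m_3/m_2^{3/2}$ by $\sqrt{n(n-1)}/(n-2)$.)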
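Both skewness conventions can be sketched in plain Python; the function names are our own. `skew_g1` follows the formula above and divides by $s$. `skew_adjusted` starts from $m_3/m_2^{3/2}$, where $m_k$ is the $k$-th central moment with divisor $n$, and applies the $\sqrt{n(n-1)}/(n-2)$ small-sample factor. On only five observations (the returns used in Example 1), the two differ noticeably.

```python
import math


def central_moment(xs: list[float], k: int) -> float:
    """k-th sample central moment with divisor n."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** k for x in xs) / n


def skew_g1(xs: list[float]) -> float:
    """Skewness as in the text: (1/n) * sum(((x - mean) / s)^3), s with divisor n - 1."""
    n = len(xs)
    s = math.sqrt(central_moment(xs, 2) * n / (n - 1))  # Bessel-corrected s
    return central_moment(xs, 3) / s**3


def skew_adjusted(xs: list[float]) -> float:
    """Adjusted skewness: m3 / m2^(3/2) scaled by sqrt(n(n-1)) / (n-2)."""
    n = len(xs)
    g = central_moment(xs, 3) / central_moment(xs, 2) ** 1.5
    return g * math.sqrt(n * (n - 1)) / (n - 2)


returns = [0.8, -1.2, 0.3, -2.5, 0.6]  # daily % returns from Example 1
print(f"g1 = {skew_g1(returns):.3f}, adjusted = {skew_adjusted(returns):.3f}")
```

Both conventions agree on the sign, which is negative because of the single large drop. On the magnitude, they differ by a factor of about two at $n = 5$.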

Interpretation:

$\gamma_1 = 0$: no skew, as for any symmetric distribution (e.g. the normal).
$\gamma_1 < 0$: negative skew (left tail is heavier). Equity returns are typically negatively skewed: large drops are more frequent than large rallies of the same magnitude.
$\gamma_1 > 0$: positive skew (right tail is heavier). Some option strategies (buying OTM puts) produce positively skewed P&L: many small losses and occasional large gains.

Finance significance: Skewness tells you whether your risk is asymmetric. A strategy with $\mu > 0$ and $\gamma_1 \ll 0$ may be profitable on average but vulnerable to catastrophic drawdowns. Short volatility strategies, carry trades, and credit-selling strategies typically exhibit this profile: steady income punctuated by rare large losses.

Fourth moment: kurtosis

Population kurtosis:

\kappa = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^4\right] = \frac{\mu_4}{\sigma^4}

Excess kurtosis:

\kappa_{\text{excess}} = \kappa - 3

The subtraction of 3 normalises against the normal distribution, which has $\kappa = 3$.

Sample excess kurtosis:

g_2 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^4 - 3

Interpretation:

$\kappa_{\text{excess}} = 0$: mesokurtic (normal-like tails).
$\kappa_{\text{excess}} > 0$: leptokurtic (fatter tails than normal). Empirical daily equity returns typically have excess kurtosis of 3–10, depending on the market and period.
$\kappa_{\text{excess}} < 0$: platykurtic (thinner tails than normal). The uniform distribution has excess kurtosis $-6/5$.
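Here is a minimal sketch of $g_2$ exactly as defined above, checked against two reference values in this list. Simulated normal data should give roughly 0, and uniform data roughly $-6/5$. The sample size and seed are arbitrary choices.

```python
import random


def excess_kurtosis(xs: list[float]) -> float:
    """Sample excess kurtosis g2 as in the text: (1/n) * sum(((x - mean) / s)^4) - 3."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # Bessel-corrected variance
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / s2**2 - 3.0


rng = random.Random(2)
n = 100_000
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
uniform = [rng.random() for _ in range(n)]
print(f"normal:  {excess_kurtosis(normal):+.3f}  (theory 0)")
print(f"uniform: {excess_kurtosis(uniform):+.3f}  (theory -6/5 = -1.2)")
```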

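Finance significance: Kurtosis determines how much "tail risk" your model misses. A Gaussian VaR model assumes $\kappa_{\text{excess}} = 0$. If the true excess kurtosis is 5, the probability of a 4-sigma event can be tens of times higher than the model predicts. For a Student's $t$ rescaled to unit variance with that excess kurtosis ($\nu = 5.2$), the factor is roughly 50. The exact multiple depends on the full shape of the tail, not on kurtosis alone. This is why Student's $t$-distribution (which has excess kurtosis $6/(\nu - 4)$ for $\nu > 4$) is the standard fat-tailed alternative.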
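To see how large the gap can be, this sketch compares $\mathbb{P}(|X| > 4)$ for two unit-variance distributions. One is the normal. The other is a Student's $t$ with $\nu = 5.2$, chosen so that $6/(\nu - 4) = 5$. The Python standard library has no $t$ distribution, so the tail is integrated numerically with Simpson's rule after the substitution $t = x/u$. The helper names are our own, and the choice of the $t$ family is an assumption.

```python
import math
from statistics import NormalDist


def t_pdf(t: float, nu: float) -> float:
    """Density of Student's t with nu degrees of freedom (unit scale)."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_upper_tail(x: float, nu: float, steps: int = 2_000) -> float:
    """P(T > x) for x > 0 and nu > 1, by Simpson's rule after the substitution
    t = x / u, which maps (x, infinity) onto (0, 1]."""
    def integrand(u: float) -> float:
        if u == 0.0:
            return 0.0  # behaves like u**(nu - 1) near 0, so it vanishes there
        return t_pdf(x / u, nu) * x / (u * u)

    h = 1.0 / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def two_sided_tail(k: float, nu: float) -> float:
    """P(|X| > k) for Student's t rescaled to unit variance (requires nu > 2)."""
    scale = math.sqrt(nu / (nu - 2))  # standard deviation of the raw t
    return 2 * t_upper_tail(k * scale, nu)


nu = 4 + 6 / 5  # excess kurtosis 6 / (nu - 4) = 5
p_normal = 2 * (1 - NormalDist().cdf(4.0))
p_t = two_sided_tail(4.0, nu)
print(f"P(|X| > 4 sd): normal {p_normal:.2e}, Student t (nu = {nu:.1f}) {p_t:.2e}")
print(f"ratio: {p_t / p_normal:.0f}x")
```

The printed ratio is specific to the $t$ family. Another distribution with the same excess kurtosis can put a different amount of probability beyond $4\sigma$.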

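Warning: Sample kurtosis is extremely noisy. A single outlier can dominate the fourth power, making $g_2$ unstable. With 250 daily observations, the standard error of $g_2$ is approximately $\sqrt{24/n} \approx 0.31$, but that approximation assumes normally distributed data. For fat-tailed returns, the sampling variance of $g_2$ depends on moments up to the eighth and is far larger. In practice, you therefore cannot reliably distinguish $\kappa_{\text{excess}} = 3$ from $\kappa_{\text{excess}} = 5$.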
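The $\sqrt{24/n}$ figure is derived for normal data. This simulation sketch compares the spread of $g_2$ under normal and fat-tailed returns. The setup is hypothetical: 500 samples of 250 observations each, with fat-tailed draws from a unit-variance Student's $t$ with $\nu = 6$, whose excess kurtosis is 3. The helper names are our own.

```python
import math
import random
import statistics


def excess_kurtosis(xs: list[float]) -> float:
    """Sample excess kurtosis g2 as defined in the text (s with divisor n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / s2**2 - 3.0


def unit_variance_t(rng: random.Random, nu: float) -> float:
    """Student's t draw Z / sqrt(chi2_nu / nu), rescaled to unit variance (nu > 2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2, 2.0)  # chi-squared(nu) = Gamma(shape nu/2, scale 2)
    return z / math.sqrt(chi2 / nu) / math.sqrt(nu / (nu - 2))


def g2_spread(draw, n: int = 250, reps: int = 500, seed: int = 3) -> tuple[float, float]:
    """Mean and standard deviation of g2 across `reps` samples of size n."""
    rng = random.Random(seed)
    est = [excess_kurtosis([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return statistics.fmean(est), statistics.stdev(est)


normal_mean, normal_sd = g2_spread(lambda rng: rng.gauss(0.0, 1.0))
t_mean, t_sd = g2_spread(lambda rng: unit_variance_t(rng, 6.0))  # true excess kurtosis 3
print(f"normal data: mean g2 {normal_mean:+.2f}, sd {normal_sd:.2f}  (sqrt(24/250) = 0.31)")
print(f"t(6) data:   mean g2 {t_mean:+.2f}, sd {t_sd:.2f}  (true excess kurtosis 3)")
```

The $t_6$ distribution has no finite eighth moment. As a result, the spread printed for it will also vary noticeably with the seed.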

Beyond moments: robust summary statistics

Moments are sensitive to outliers (especially higher moments). Robust alternatives:

Median

The median is the 50th percentile: $\mathbb{P}(X \leq m) = 0.5$. For a symmetric distribution, the median equals the mean. For skewed distributions, they differ. A log-normal variable $X = e^{Y}$ with $Y \sim \mathcal{N}(\mu, \sigma^2)$ has mean $e^{\mu + \sigma^2/2}$ but median $e^{\mu}$, so the mean is pulled up by the right tail.

The median is robust to outliers: replacing the largest observation with $+\infty$ does not change the median (but destroys the mean).
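Both claims can be checked in a few lines. The sketch first replaces the largest of the Example 1 returns with $+\infty$. It then simulates a log-normal with $\mu = 0$, $\sigma = 1$ (an arbitrary choice). For that log-normal, the mean should be near $e^{1/2} \approx 1.649$ and the median near 1.

```python
import math
import random
import statistics

returns = [0.8, -1.2, 0.3, -2.5, 0.6]          # daily % returns (Example 1)
corrupted = sorted(returns)[:-1] + [math.inf]  # largest value replaced by +inf

med_before, med_after = statistics.median(returns), statistics.median(corrupted)
mean_before = sum(returns) / len(returns)
mean_after = sum(corrupted) / len(corrupted)   # inf / 5 is still inf
print(f"median: {med_before:.2f} -> {med_after:.2f}")
print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")

# Log-normal X = exp(Y), Y ~ N(0, 1): mean exp(1/2) ~ 1.649, median exp(0) = 1.
rng = random.Random(4)
draws = [rng.lognormvariate(0.0, 1.0) for _ in range(200_000)]
lognormal_mean = statistics.fmean(draws)
lognormal_median = statistics.median(draws)
print(f"log-normal: mean {lognormal_mean:.3f}, median {lognormal_median:.3f}")
```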

Quantiles and percentiles

The $p$-quantile $q_p$ satisfies $\mathbb{P}(X \leq q_p) = p$. Key quantiles:

Quantile	Name	Finance use
$q_{0.01}$	1st percentile	99% VaR
$q_{0.05}$	5th percentile	95% VaR
$q_{0.25}$	First quartile	Box plots, range analysis
$q_{0.50}$	Median	Robust central tendency

VaR is a quantile: $\text{VaR}_\alpha = -q_\alpha$, where $\alpha$ is the tail probability (0.05 for 95% VaR, as in the table). It depends only on the tail of the distribution, not on the centre, making it a targeted risk measure.
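Here is a minimal sketch of historical VaR as a negated empirical quantile. The interpolation rule used (position $(n-1)p$ between order statistics) is one common convention among several. The return series is hypothetical normal data with 1% daily volatility, so the historical figures should sit close to the normal-model values $-\sigma\Phi^{-1}(\alpha)$.

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(xs: list[float], p: float) -> float:
    """Quantile by linear interpolation between order statistics at position (n-1)p."""
    ys = sorted(xs)
    pos = (len(ys) - 1) * p
    lo = math.floor(pos)
    hi = min(lo + 1, len(ys) - 1)
    return ys[lo] + (pos - lo) * (ys[hi] - ys[lo])


def historical_var(returns: list[float], alpha: float) -> float:
    """VaR_alpha = -q_alpha, with alpha the tail probability (0.05 -> 95% VaR)."""
    return -empirical_quantile(returns, alpha)


rng = random.Random(5)
daily = [rng.gauss(0.0, 0.01) for _ in range(100_000)]  # hypothetical N(0, 1%^2) returns
results = {}
for alpha in (0.05, 0.01):
    hist = historical_var(daily, alpha)
    model = -NormalDist(0.0, 0.01).inv_cdf(alpha)
    results[alpha] = (hist, model)
    print(f"{1 - alpha:.0%} VaR: historical {hist:.3%}, normal model {model:.3%}")
```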

Interquartile range (IQR)

\text{IQR} = q_{0.75} - q_{0.25}

A robust measure of spread. For the normal, $\text{IQR} = 2 \times 0.6745\sigma \approx 1.35\sigma$. If the observed IQR is much smaller than $1.35s$, the sample standard deviation is being inflated by extreme observations that the IQR ignores. That is evidence of fat tails.
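The IQR diagnostic can be sketched on simulated data. The assumptions are 50,000 draws each, with a unit-variance Laplace distribution as the fat-tailed example. For the normal, the ratio $\text{IQR}/s$ should be near 1.349. The Laplace quartiles sit at $\pm b\ln 2$ with $b = 1/\sqrt{2}$, so its ratio is $\sqrt{2}\ln 2 \approx 0.98$.

```python
import math
import random
import statistics


def iqr_to_sd(xs: list[float]) -> float:
    """Ratio of interquartile range to sample standard deviation."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return (q3 - q1) / statistics.stdev(xs)


rng = random.Random(6)
n = 50_000
b = 1 / math.sqrt(2)  # Laplace scale giving unit variance (variance = 2 b^2)
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
# A Laplace(0, b) draw is the difference of two i.i.d. exponentials with mean b.
laplace = [rng.expovariate(1 / b) - rng.expovariate(1 / b) for _ in range(n)]

normal_ratio = iqr_to_sd(normal)
laplace_ratio = iqr_to_sd(laplace)
print(f"normal IQR/s:  {normal_ratio:.3f}  (theory 1.349)")
print(f"Laplace IQR/s: {laplace_ratio:.3f}  (theory sqrt(2) * ln 2 = 0.980)")
```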

Median absolute deviation (MAD)

\text{MAD} = \text{median}(|x_i - \text{median}(x)|)

For the normal distribution, $\text{MAD} = 0.6745\sigma$, so $\hat{\sigma}_{\text{MAD}} = \text{MAD}/0.6745$ is a robust volatility estimator. It is much less affected by a single extreme return than the sample standard deviation.
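This sketch compares the two volatility estimators on a hypothetical year of normal daily returns with 1% volatility. It measures both before and after one day is replaced by a $-20\%$ crash. The constant 0.6745 is $\Phi^{-1}(0.75)$, and the function name `mad_sigma` is our own.

```python
import random
import statistics

MAD_TO_SD = 0.6745  # Phi^{-1}(0.75): for normal data, MAD = 0.6745 * sigma


def mad_sigma(xs: list[float]) -> float:
    """Robust volatility estimate MAD / 0.6745."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return mad / MAD_TO_SD


rng = random.Random(7)
calm = [rng.gauss(0.0, 0.01) for _ in range(250)]  # hypothetical year, 1% daily vol
shocked = calm[:-1] + [-0.20]                      # last day replaced by a -20% crash

for label, xs in (("calm", calm), ("with crash", shocked)):
    print(f"{label:>10}: stdev {statistics.stdev(xs):.4f}  MAD-based {mad_sigma(xs):.4f}")
```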

Sharpe ratio and other performance measures

Sharpe ratio

\text{SR} = \frac{\bar{r} - r_f}{s}

where $\bar{r}$ is the mean return, $r_f$ is the risk-free rate, and $s$ is the standard deviation of returns. The Sharpe ratio is a normalised performance measure: return per unit of risk.

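Annualisation: If computed from daily returns (with $r_f$ also expressed as a daily rate),

\text{SR}_{\text{annual}} = \text{SR}_{\text{daily}} \times \sqrt{252}

This follows from the mean scaling as $252\mu_{\text{daily}}$ and the standard deviation scaling as $\sqrt{252}\,\sigma_{\text{daily}}$, so their ratio scales as $252/\sqrt{252} = \sqrt{252}$. Like the volatility rule, it assumes i.i.d. returns.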
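Here is a minimal sketch of the Sharpe annualisation; the function names are our own. The test series is hypothetical and chosen so the answer is known in closed form. Returns alternating $\pm 1\%$ around a 0.04% daily mean give an annualised Sharpe of $0.04\sqrt{251} \approx 0.634$. The $\sqrt{251}$ instead of $\sqrt{252}$ comes from the $n-1$ divisor in $s$.

```python
import math
import statistics

TRADING_DAYS = 252  # trading days per year, as assumed in the text


def sharpe(returns: list[float], rf: float = 0.0) -> float:
    """Per-period Sharpe ratio: mean excess return over sample sd (divisor n - 1).
    rf must be quoted at the same frequency as the returns."""
    return (statistics.fmean(returns) - rf) / statistics.stdev(returns)


def annualised_sharpe(daily_returns: list[float], rf_daily: float = 0.0) -> float:
    """Daily Sharpe times sqrt(252), valid under the i.i.d. assumption."""
    return sharpe(daily_returns, rf_daily) * math.sqrt(TRADING_DAYS)


# Hypothetical year: returns alternate +/-1% around a 0.04% daily mean.
daily = [0.0104, -0.0096] * 126
print(f"daily SR {sharpe(daily):.4f}, annual SR {annualised_sharpe(daily):.3f}")
```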

Limitation: The Sharpe ratio is a complete performance measure only if returns are normally distributed (mean and variance fully describe the distribution). With skewness or fat tails, two strategies with the same Sharpe can have very different risk profiles — one may have benign, symmetric risk while the other is a crash-prone carry trade.

Sortino ratio

\text{Sortino} = \frac{\bar{r} - r_f}{\text{Downside deviation}}

where downside deviation uses only the returns that fall below the target $r_f$:

\text{DD} = \sqrt{\frac{1}{n}\sum_{i: r_i < r_f}(r_i - r_f)^2}

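This addresses the Sharpe ratio's weakness: it penalises only downside volatility, not upside. For a symmetric distribution with mean close to $r_f$, $\text{DD} \approx s/\sqrt{2}$, so Sortino $\approx \sqrt{2} \times$ Sharpe. Negatively skewed strategies concentrate more of their variance on the downside, so their Sortino-to-Sharpe ratio tends to fall below this benchmark. Positively skewed strategies push it above.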
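This sketch computes both ratios with $r_f = 0$ on two hypothetical return series. The first is symmetric normal returns. The second is a short-volatility-style mixture with small gains on 98% of days and losses near $-3\%$ on the rest. Because the numerators are equal, Sortino/Sharpe is simply $s/\text{DD}$. The symmetric series should land near (slightly above) $\sqrt{2}$, and the skewed one clearly below it.

```python
import random
import statistics


def downside_deviation(returns: list[float], target: float = 0.0) -> float:
    """sqrt((1/n) * sum over r_i < target of (r_i - target)^2), as in the text."""
    n = len(returns)
    return (sum((r - target) ** 2 for r in returns if r < target) / n) ** 0.5


def sharpe_and_sortino(returns: list[float], rf: float = 0.0) -> tuple[float, float]:
    """Per-period Sharpe and Sortino ratios sharing the same excess-return numerator."""
    excess = statistics.fmean(returns) - rf
    return excess / statistics.stdev(returns), excess / downside_deviation(returns, rf)


rng = random.Random(8)
n = 100_000
symmetric = [rng.gauss(0.0004, 0.01) for _ in range(n)]
# Hypothetical short-volatility profile: on about 2% of days a loss near -3%,
# otherwise a small positive drift with low noise.
skewed = [
    rng.gauss(-0.03, 0.01) if rng.random() < 0.02 else rng.gauss(0.001, 0.005)
    for _ in range(n)
]

ratios = {}
for label, xs in (("symmetric", symmetric), ("neg. skewed", skewed)):
    sh, so = sharpe_and_sortino(xs)
    ratios[label] = so / sh  # equals stdev / downside deviation
    print(f"{label:>11}: Sharpe {sh:.4f}  Sortino {so:.4f}  Sortino/Sharpe {so / sh:.2f}")
```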

Maximum drawdown

\text{MDD} = \max_{0 \leq s \leq t \leq T}\left(\frac{P_s - P_t}{P_s}\right)

The largest peak-to-trough decline. This is not a moment-based statistic — it is a path-dependent measure that captures the worst loss experience. Practitioners often care more about MDD than variance because it reflects the pain of actually holding the strategy.
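Maximum drawdown can be computed in a single pass, as in this sketch. Tracking the running peak gives the maximum over all pairs $s \leq t$ in $O(n)$ time, rather than checking every pair. The price path is hypothetical.

```python
def max_drawdown(prices: list[float]) -> float:
    """Largest peak-to-trough decline (P_s - P_t) / P_s over s <= t, in one pass."""
    peak = prices[0]
    mdd = 0.0
    for p in prices:
        peak = max(peak, p)                # running maximum P_s up to time t
        mdd = max(mdd, (peak - p) / peak)  # drawdown from that peak
    return mdd


prices = [100, 120, 90, 110, 80, 130]      # hypothetical price path
print(f"MDD = {max_drawdown(prices):.1%}")  # MDD = 33.3% (peak 120 -> trough 80)
```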

Empirical moments of financial returns

Typical values for daily log-returns of major equity indices:

Statistic	Typical range	Implication
Mean (daily)	0.02%–0.05%	Hard to estimate; dominated by noise
Std dev (daily)	0.8%–1.5%	Annualised: 13%–24%
Skewness	$-0.3$ to $-0.8$	Negative: crashes > rallies
Excess kurtosis	3–10	Fat tails: extreme events too frequent for Gaussian

These empirical facts drive the modelling choices throughout the vault: the normal distribution assumption is a useful approximation for the centre of the distribution but fails in the tails, motivating Student's $t$, GARCH, and stochastic volatility models.

Examples and applications

Example 1: computing sample moments

Daily log-returns of a stock over 5 days: $r = (0.8\%, -1.2\%, 0.3\%, -2.5\%, 0.6\%)$.

\bar{r} = \frac{0.8 - 1.2 + 0.3 - 2.5 + 0.6}{5} = -0.4\%

Working in percentage points, the deviations from the mean $-0.4$ are $(1.2, -0.8, 0.7, -2.1, 1.0)$, so

$s^2 = \frac{1.44 + 0.64 + 0.49 + 4.41 + 1.00}{4} = \frac{7.98}{4} = 1.995$ (in $\%^2$), or $1.995 \times 10^{-4}$ in decimal units.

$s = 1.41\%$ daily, or $1.41\% \times \sqrt{252} = 22.4\%$ annualised.

Note how the single $-2.5\%$ day contributes $4.41/7.98 = 55\%$ of the total variance — a single outlier dominates.
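The arithmetic above can be reproduced with Python's standard `statistics` module, working in percentage points as the example does. Note that `statistics.variance` uses the $n-1$ divisor.

```python
import math
import statistics

returns = [0.8, -1.2, 0.3, -2.5, 0.6]  # daily log-returns in percent (Example 1)

mean = statistics.fmean(returns)
var = statistics.variance(returns)     # divisor n - 1 (Bessel's correction)
sd = math.sqrt(var)
annual_sd = sd * math.sqrt(252)        # square-root-of-time rule
sq_devs = [(r - mean) ** 2 for r in returns]
worst_share = (min(returns) - mean) ** 2 / sum(sq_devs)  # share from the -2.5% day

print(f"mean {mean:.2f}%, s^2 {var:.3f} %^2, s {sd:.2f}%, annualised {annual_sd:.1f}%")
print(f"share of variance from the -2.5% day: {worst_share:.0%}")
```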

Example 2: why skewness matters for strategy selection

Two strategies over 252 days (Sharpe ratios computed with a zero risk-free rate):

	Strategy A	Strategy B
Mean (annual)	10%	10%
Std dev (annual)	15%	15%
Sharpe	0.67	0.67
Skewness	+0.3	−1.2
Max drawdown	12%	35%

Identical Sharpe ratios, but Strategy B has severe negative skew and a max drawdown nearly 3× larger. Strategy B has the "picking up pennies in front of a steamroller" profile. Skewness (and the related maximum drawdown) reveals risks that the Sharpe ratio hides.

Common confusions and pitfalls

"Variance is risk." Variance treats upside and downside equally. In finance, only downside matters. This is why downside deviation, VaR, and Expected Shortfall are used alongside (or instead of) variance. Still, variance/standard deviation remains the dominant measure due to its mathematical tractability (mean-variance optimisation, factor models, GARCH).

"Sample moments are reliable with small samples." The sample mean of daily returns converges slowly ($O(1/\sqrt{n})$). Sample skewness and kurtosis converge even more slowly and are dominated by outliers. With 1 year of daily data ($n = 252$), you can estimate volatility reasonably well, but mean, skewness, and kurtosis estimates carry large uncertainty.

"High kurtosis means high volatility." No. Kurtosis is a standardised fourth moment ($\mu_4/\sigma^4$): it measures the shape of the tails relative to the distribution's own spread, not the level of spread. A distribution can have low volatility and high kurtosis (rare but large relative moves) or high volatility and normal kurtosis.

"Annualise everything by multiplying by 252." Mean scales with time ($252\mu_{\text{daily}}$). Standard deviation scales with $\sqrt{252}$. Variance scales with 252. Sharpe ratio scales with $\sqrt{252}$. For i.i.d. returns, cumulants add under summation, so skewness shrinks as $\gamma_1/\sqrt{252}$ and excess kurtosis as $\kappa_{\text{excess}}/252$: annual returns look much closer to normal than daily ones. Volatility clustering and autocorrelation break even these rules, so always check the scaling rule for the specific statistic.
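Here is a sketch of the scaling rules for i.i.d. returns. The function name and the daily inputs are illustrative; the inputs are picked from the ranges in the empirical table. The skewness and kurtosis lines follow from cumulants adding under summation. Over $T$ days the third cumulant $k_3$ becomes $T k_3$ and the variance $k_2$ becomes $T k_2$. Skewness is therefore $T k_3/(T k_2)^{3/2} = \gamma_1/\sqrt{T}$, and by the same argument excess kurtosis becomes $\kappa_{\text{excess}}/T$.

```python
import math

TRADING_DAYS = 252  # trading days per year, as assumed in the text


def annualise(mean_d: float, sd_d: float, skew_d: float, exkurt_d: float,
              periods: int = TRADING_DAYS) -> dict[str, float]:
    """Scale daily statistics to a `periods`-day horizon, assuming i.i.d. returns."""
    return {
        "mean": mean_d * periods,                      # first cumulant adds
        "sd": sd_d * math.sqrt(periods),               # variance adds
        "sharpe": mean_d / sd_d * math.sqrt(periods),  # ratio of the two rules
        "skew": skew_d / math.sqrt(periods),           # T k3 / (T k2)^(3/2)
        "excess_kurtosis": exkurt_d / periods,         # T k4 / (T k2)^2
    }


# Illustrative daily inputs: 0.04% mean, 1% sd, skew -0.5, excess kurtosis 5.
annual = annualise(mean_d=0.0004, sd_d=0.01, skew_d=-0.5, exkurt_d=5.0)
for name, value in annual.items():
    print(f"{name:>16}: {value:.4f}")
```

Under GARCH-type volatility clustering, excess kurtosis decays more slowly with the horizon than this sketch assumes.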

Where this goes next

Moments and summary statistics connect to:

Normal Distribution: The normal is fully characterised by mean and variance (skewness = 0, excess kurtosis = 0). Departures from normality are measured by the third and fourth moments.
Student's $t$-Distribution: The standard fat-tailed alternative when excess kurtosis > 0.
Correlation and Dependence: Covariance is a "cross-moment" $\mathbb{E}[(X-\mu_X)(Y-\mu_Y)]$ — the second moment generalised to two variables.
Improper Integrals: Moment existence is determined by convergence of $\int |x|^n f(x)\,dx$.
Squeeze Theorem and Bounds: Jensen's inequality relates the mean of a convex function to the function of the mean — the source of the $\sigma^2/2$ convexity correction.