Moments and Summary Statistics

Motivation: why this matters in quant finance

Every return distribution, every risk model, every performance report begins with the same question: what are the basic numbers that summarise this data? The four moments — mean, variance, skewness, kurtosis — are the answer. They compress an entire distribution into four numbers that tell you: the average return, how volatile it is, whether crashes are more likely than rallies, and how fat the tails are.

These four numbers drive concrete financial decisions:

  • Mean return determines whether a strategy or asset is worth holding.
  • Variance / standard deviation is the baseline measure of risk and the input to portfolio optimisation (Markowitz mean-variance).
  • Skewness tells you whether your risk is asymmetric — are you picking up pennies in front of a steamroller (negative skew), or do you have convex payoffs (positive skew)?
  • Kurtosis tells you how much tail risk you carry — a leptokurtic return distribution means extreme events happen more often than a Gaussian model predicts, which directly affects VaR, Expected Shortfall, and margin requirements.

Beyond moments, summary statistics like median, quantiles, and range provide robust alternatives that resist the distortions caused by outliers — a practical concern when a single day's crash can dominate a sample mean.

The four moments

First moment: mean (expected value)

Population mean:
$$\mu = \mathbb{E}[X] = \int x\,f(x)\,dx$$
Sample mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The sample mean is an unbiased estimator of $\mu$: $\mathbb{E}[\bar{x}] = \mu$. For i.i.d. observations its standard error is $\sigma/\sqrt{n}$, which follows from $\text{Var}(\bar{x}) = \sigma^2/n$; the Central Limit Theorem adds that $\bar{x}$ is approximately normal for large $n$.
Finance interpretation: The mean return of a strategy determines its expected P&L. But estimating $\mu$ precisely is notoriously difficult: with daily $\sigma \approx 1\%$ and $\mu \approx 0.04\%$ (about 10% annualised), the signal-to-noise ratio is $0.04/1 = 0.04$. The t-statistic of the sample mean is about $0.04\sqrt{n}$, so $(1/0.04)^2 = 625$ observations (2.5 years of daily data) only bring it to 1. Rejecting a zero mean at the 95% level needs a t-statistic near 2, that is roughly $(1.96/0.04)^2 \approx 2{,}400$ observations, or about 9.5 years of daily data. This is why estimating the mean is usually regarded as the hardest input problem in portfolio optimisation.
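
The sample-size arithmetic can be sketched in a few lines of Python. The function name `observations_needed` and its parameters are illustrative, and the calculation assumes i.i.d. returns with a known $\sigma$, so that the t-statistic of the sample mean is $\mu\sqrt{n}/\sigma$.

```python
def observations_needed(mu: float, sigma: float, t_target: float) -> float:
    """Sample size n at which mu * sqrt(n) / sigma reaches t_target.

    Assumes i.i.d. returns and a known sigma (a simplification).
    """
    return (t_target * sigma / mu) ** 2


# Daily figures from the text, in percent: mu = 0.04, sigma = 1.
n_t1 = observations_needed(mu=0.04, sigma=1.0, t_target=1.0)   # t-statistic of 1
n_t2 = observations_needed(mu=0.04, sigma=1.0, t_target=1.96)  # two-sided 95% level

print(f"t = 1.00: {n_t1:,.0f} observations ({n_t1 / 252:.1f} years)")
print(f"t = 1.96: {n_t2:,.0f} observations ({n_t2 / 252:.1f} years)")
```

The required sample grows with the square of the target t-statistic, which is why a modest-looking signal-to-noise ratio translates into years of data.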

Second moment: variance and standard deviation

Population variance:
$$\sigma^2 = \text{Var}(X) = \mathbb{E}[(X - \mu)^2] = \mathbb{E}[X^2] - \mu^2$$
Sample variance (unbiased):
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

The divisor is $n - 1$ (Bessel's correction), not $n$, because $\bar{x}$ consumes one degree of freedom. The biased version with $n$ in the denominator systematically underestimates $\sigma^2$.

Standard deviation: $\sigma = \sqrt{\sigma^2}$ (population), $s = \sqrt{s^2}$ (sample). Standard deviation is in the same units as the data, making it interpretable: "the stock has 20% annual volatility" means $\sigma_{\text{annual}} = 0.20$.
Annualisation: If $\sigma_{\text{daily}}$ is the daily standard deviation of log-returns, the annualised volatility is:
$$\sigma_{\text{annual}} = \sigma_{\text{daily}} \times \sqrt{252}$$
assuming 252 trading days and i.i.d. returns. The $\sqrt{252}$ comes from $\text{Var}\left(\sum_{i=1}^{252} r_i\right) = 252\,\text{Var}(r_i)$, so $\text{SD} = \sqrt{252}\,\sigma_{\text{daily}}$. This is the "square-root-of-time rule" (see Normal Distribution).
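
As a minimal sketch of Bessel's correction and the square-root-of-time rule, the snippet below uses Python's standard `statistics` module. The return series is made up for illustration, and `annualised_vol` is a hypothetical helper that assumes i.i.d. daily log-returns and 252 trading days.

```python
import math
import statistics

# Hypothetical daily log-returns, for illustration only.
daily = [0.010, -0.004, 0.002, -0.012, 0.007]
n = len(daily)

unbiased_var = statistics.variance(daily)  # divisor n - 1 (Bessel's correction)
biased_var = statistics.pvariance(daily)   # divisor n


def annualised_vol(daily_log_returns, trading_days=252):
    """Scale the sample standard deviation by sqrt(trading_days)."""
    return statistics.stdev(daily_log_returns) * math.sqrt(trading_days)


# The two variance estimates differ by the factor (n - 1) / n.
print(f"biased / unbiased variance = {biased_var / unbiased_var:.2f}")
print(f"annualised volatility = {annualised_vol(daily):.1%}")
```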

Third moment: skewness

Population skewness:
$$\gamma_1 = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3}$$

where $\mu_3 = \mathbb{E}[(X - \mu)^3]$ is the third central moment.

Sample skewness:
$$g_1 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3$$

(Adjusted versions exist for small samples; a common one multiplies the moment-based estimate, computed with divisor $n$ throughout, by $\sqrt{n(n-1)}/(n-2)$.)

Interpretation:
  • $\gamma_1 = 0$: no skew, as for any symmetric distribution (for example the normal).
  • $\gamma_1 < 0$: negative skew (left tail is heavier). Equity returns are typically negatively skewed: large drops are more frequent than large rallies of the same magnitude.
  • $\gamma_1 > 0$: positive skew (right tail is heavier). Some option strategies (buying OTM puts) produce positively skewed P&L: many small losses and occasional large gains.
Finance significance: Skewness tells you whether your risk is asymmetric. A strategy with $\mu > 0$ and $\gamma_1 \ll 0$ may be profitable on average but vulnerable to catastrophic drawdowns. Short volatility strategies, carry trades, and credit-selling strategies typically exhibit this profile: steady income punctuated by rare large losses.
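
The sample skewness formula translates directly into Python. This is a sketch using the standard library only; `sample_skewness` is an illustrative name, the return series are invented, and $s$ is taken to be the sample standard deviation with divisor $n - 1$ (an assumption, since some texts use divisor $n$ here).

```python
import statistics


def sample_skewness(xs):
    """g1 = (1/n) * sum(((x - mean) / s) ** 3), with s the n-1 sample std dev."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sum(((x - mean) / s) ** 3 for x in xs) / n


# Invented daily returns in percent: steady small gains plus one crash.
steady_with_crash = [0.3, 0.2, 0.4, 0.3, 0.2, 0.3, 0.4, -4.0]
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]

print(f"crash-prone series: {sample_skewness(steady_with_crash):+.2f}")
print(f"symmetric series:   {sample_skewness(symmetric):+.2f}")
```

The first series mimics the "steady income, rare large loss" profile and comes out strongly negative; the symmetric series comes out at zero up to rounding.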

Fourth moment: kurtosis

Population kurtosis:
$$\kappa = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^4\right] = \frac{\mu_4}{\sigma^4}$$
Excess kurtosis: $\kappa_{\text{excess}} = \kappa - 3$. The subtraction of 3 normalises against the normal distribution, which has $\kappa = 3$.
Sample excess kurtosis:
$$g_2 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^4 - 3$$
Interpretation:
  • $\kappa_{\text{excess}} = 0$: mesokurtic (normal-like tails).
  • $\kappa_{\text{excess}} > 0$: leptokurtic (fatter tails than normal). Empirical daily equity returns typically have excess kurtosis of 3–10, depending on the market and period.
  • $\kappa_{\text{excess}} < 0$: platykurtic (thinner tails than normal). The uniform distribution has excess kurtosis $-6/5$.
Finance significance: Kurtosis determines how much "tail risk" your model misses. A Gaussian VaR model assumes $\kappa_{\text{excess}} = 0$. Kurtosis alone does not pin down tail probabilities, but as a benchmark, a variance-matched Student's $t$ with excess kurtosis 5 ($\nu = 5.2$) makes a move beyond 4 sigma on the order of 50 times more likely than the Gaussian predicts. This is why Student's $t$-distribution (which has excess kurtosis $6/(\nu - 4)$ for $\nu > 4$) is the standard fat-tailed alternative.
Warning: Sample kurtosis is extremely noisy. A single outlier can dominate the fourth power, making $g_2$ unstable. For Gaussian data with 250 daily observations, the standard error of the sample excess kurtosis is approximately $\sqrt{24/n} \approx 0.31$, but that formula understates the noise for fat-tailed returns, where the sampling variance depends on moments up to the eighth and can be many times larger. In practice you cannot reliably distinguish $\kappa_{\text{excess}} = 3$ from $\kappa_{\text{excess}} = 5$ with one year of data.
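
To see how a single observation can dominate the fourth power, the sketch below computes $g_2$ on a simulated Gaussian year and again after appending one crash day. Everything here is illustrative: the data are simulated with a fixed seed, not real returns, and `sample_excess_kurtosis` is a hypothetical helper that uses the $n - 1$ sample standard deviation for $s$.

```python
import random
import statistics


def sample_excess_kurtosis(xs):
    """g2 = (1/n) * sum(((x - mean) / s) ** 4) - 3, with s the n-1 sample std dev."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sum(((x - mean) / s) ** 4 for x in xs) / n - 3


random.seed(0)  # fixed seed so the simulated sample is reproducible
gaussian_year = [random.gauss(0.0, 0.01) for _ in range(250)]  # simulated 1% daily vol
with_crash = gaussian_year + [-0.08]  # append a single -8% day

g2_base = sample_excess_kurtosis(gaussian_year)
g2_crash = sample_excess_kurtosis(with_crash)

print(f"excess kurtosis without the crash day: {g2_base:+.2f}")
print(f"excess kurtosis with the crash day:    {g2_crash:+.2f}")
```

One extra data point out of 251 is enough to move the estimate from roughly Gaussian to strongly leptokurtic, which is the instability the warning describes.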

Beyond moments: robust summary statistics

Moments are sensitive to outliers (especially higher moments). Robust alternatives:

Median

The median is the 50th percentile: $\mathbb{P}(X \leq m) = 0.5$. For a symmetric distribution, the median equals the mean. For skewed distributions, they differ: the log-normal has mean $e^{\mu + \sigma^2/2}$ but median $e^{\mu}$ (with $\mu$ and $\sigma$ the parameters of the underlying normal), so the mean is pulled up by the right tail.

The median is robust to outliers: replacing the largest observation with $+\infty$ does not change the median (but destroys the mean).
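
The robustness claim is easy to check numerically. A minimal sketch with an invented five-day return series, using only the standard library:

```python
import math
import statistics

returns = [-0.012, 0.003, 0.006, 0.008, -0.025]  # invented daily returns
largest = max(returns)

# Replace the largest observation with +infinity.
contaminated = [math.inf if r == largest else r for r in returns]

print("median:", statistics.median(returns), "->", statistics.median(contaminated))
print("mean:  ", statistics.fmean(returns), "->", statistics.fmean(contaminated))
```

The median is unchanged because it depends only on the ordering around the middle of the sample, while the mean becomes infinite.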

Quantiles and percentiles

The $p$-quantile $q_p$ satisfies $\mathbb{P}(X \leq q_p) = p$. Key quantiles:

| Quantile | Name | Finance use |
|---|---|---|
| $q_{0.01}$ | 1st percentile | 99% VaR |
| $q_{0.05}$ | 5th percentile | 95% VaR |
| $q_{0.25}$ | First quartile | Box plots, range analysis |
| $q_{0.50}$ | Median | Robust central tendency |

VaR is a quantile: $\text{VaR}_\alpha = -q_\alpha$, where $\alpha$ is the tail probability, so the 99% VaR uses $\alpha = 0.01$. It depends only on the tail of the distribution, not on the centre, making it a targeted risk measure.
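
A historical VaR estimate is just an empirical quantile with the sign flipped. The sketch below is one possible implementation, not a standard one: `empirical_quantile` uses linear interpolation at position $p(n-1)$, which is one of several common quantile conventions, and the returns are simulated rather than real.

```python
import random


def empirical_quantile(xs, p):
    """p-quantile by linear interpolation between order statistics (0 <= p <= 1)."""
    ordered = sorted(xs)
    pos = p * (len(ordered) - 1)
    lo = int(pos)                        # index of the order statistic at or below pos
    hi = min(lo + 1, len(ordered) - 1)   # next order statistic, clamped at the end
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def historical_var(returns, tail_prob):
    """VaR as a positive loss number: minus the tail_prob-quantile of returns."""
    return -empirical_quantile(returns, tail_prob)


random.seed(1)  # simulated daily returns, for illustration only
simulated = [random.gauss(0.0004, 0.01) for _ in range(1000)]

var_95 = historical_var(simulated, 0.05)
var_99 = historical_var(simulated, 0.01)
print(f"95% VaR: {var_95:.2%}   99% VaR: {var_99:.2%}")
```

With 1,000 observations the 1% quantile rests on about ten data points, so the 99% estimate is considerably noisier than the 95% one.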

Interquartile range (IQR)

$$\text{IQR} = q_{0.75} - q_{0.25}$$
A robust measure of spread. For the normal, $\text{IQR} = 2 \times 0.6745\sigma \approx 1.35\sigma$. Outliers inflate $s$ far more than they move the IQR, so if the observed IQR is much smaller than $1.35s$, the standard deviation is being driven by the tails rather than the centre, which is evidence of fat tails.

Median absolute deviation (MAD)

$$\text{MAD} = \text{median}\left(|x_i - \text{median}(x)|\right)$$

For the normal distribution, $\text{MAD} = 0.6745\sigma$, so $\hat{\sigma}_{\text{MAD}} = \text{MAD}/0.6745$ is a robust volatility estimator. It is much less affected by a single extreme return than the sample standard deviation.
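
The following sketch compares the MAD-based estimator with the sample standard deviation when one extreme return is added. The data are simulated with a fixed seed, `mad_sigma` is an illustrative name, and the constant 0.6745 is obtained from `statistics.NormalDist` as the 75th percentile of the standard normal.

```python
import random
from statistics import NormalDist, median, stdev

Z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal, about 0.6745


def mad_sigma(xs):
    """Robust volatility estimate: MAD rescaled to match sigma under normality."""
    m = median(xs)
    mad = median([abs(x - m) for x in xs])
    return mad / Z75


random.seed(0)  # simulated returns with 1% daily vol, for illustration only
clean = [random.gauss(0.0, 0.01) for _ in range(250)]
shocked = clean + [-0.20]  # append one extreme -20% day

std_ratio = stdev(shocked) / stdev(clean)
mad_ratio = mad_sigma(shocked) / mad_sigma(clean)

print(f"sample std dev changes by a factor of {std_ratio:.2f}")
print(f"MAD-based sigma changes by a factor of {mad_ratio:.2f}")
```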

Sharpe ratio and other performance measures

Sharpe ratio

$$\text{SR} = \frac{\bar{r} - r_f}{s}$$

where $\bar{r}$ is the mean return, $r_f$ is the risk-free rate, and $s$ is the standard deviation of returns. The Sharpe ratio is a normalised performance measure: return per unit of risk.

Annualisation: If computed from daily returns, $\text{SR}_{\text{annual}} = \text{SR}_{\text{daily}} \times \sqrt{252}$. This follows from the mean scaling as $252\mu_{\text{daily}}$ and the standard deviation scaling as $\sqrt{252}\,\sigma_{\text{daily}}$.
Limitation: The Sharpe ratio is a complete performance measure only if returns are normally distributed (mean and variance fully describe the distribution). With skewness or fat tails, two strategies with the same Sharpe can have very different risk profiles — one may have benign, symmetric risk while the other is a crash-prone carry trade.
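
A minimal sketch of the Sharpe calculation and its annualisation, assuming i.i.d. daily returns and a risk-free rate quoted per day. The series, the rate, and the helper names `sharpe_ratio` and `annualise_sharpe` are all invented for illustration.

```python
import math
import statistics


def sharpe_ratio(returns, rf_per_period=0.0):
    """Mean excess return divided by the sample standard deviation (n-1 divisor)."""
    excess_mean = statistics.fmean(returns) - rf_per_period
    return excess_mean / statistics.stdev(returns)


def annualise_sharpe(sr_per_period, periods_per_year=252):
    """Square-root-of-time scaling; valid only under the i.i.d. assumption."""
    return sr_per_period * math.sqrt(periods_per_year)


daily = [0.004, -0.002, 0.003, 0.001, -0.003, 0.005]  # invented daily returns
rf_daily = 0.0001                                     # hypothetical daily risk-free rate

sr_daily = sharpe_ratio(daily, rf_daily)
sr_annual = annualise_sharpe(sr_daily)

# Same number obtained by annualising the mean and the volatility separately.
direct = (252 * (statistics.fmean(daily) - rf_daily)) / (
    math.sqrt(252) * statistics.stdev(daily)
)
print(f"daily SR = {sr_daily:.3f}, annualised SR = {sr_annual:.2f}, direct = {direct:.2f}")
```

Six observations are far too few for a meaningful Sharpe ratio; the point is only that both annualisation routes agree.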

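Sortino ratio

$$\text{Sortino} = \frac{\bar{r} - r_f}{\text{Downside deviation}}$$

where the downside deviation uses only returns below the target (here $r_f$):

$$\text{DD} = \sqrt{\frac{1}{n}\sum_{i:\, r_i < r_f}(r_i - r_f)^2}$$

This addresses the Sharpe ratio's weakness: it penalises only downside volatility, not upside. Because $\text{DD}$ counts only part of the dispersion, Sortino is normally larger than Sharpe: for a symmetric distribution with a small mean, $\text{DD} \approx s/\sqrt{2}$ and Sortino $\approx \sqrt{2} \times$ Sharpe. For negatively skewed strategies the downside typically accounts for more than half of the variance, so the Sortino-to-Sharpe multiple falls below $\sqrt{2}$.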
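
The downside deviation formula can be sketched as follows. Note that the sum runs only over returns below the target while the divisor is the full sample size $n$, as in the definition above. The helper names and the four-return series are illustrative.

```python
import math
import statistics


def downside_deviation(returns, target=0.0):
    """sqrt of (1/n) * sum of squared shortfalls below target, n = all observations."""
    n = len(returns)
    shortfall_sq = sum((r - target) ** 2 for r in returns if r < target)
    return math.sqrt(shortfall_sq / n)


def sortino_ratio(returns, target=0.0):
    """Mean excess return over downside deviation.

    Raises ZeroDivisionError if no return falls below the target.
    """
    return (statistics.fmean(returns) - target) / downside_deviation(returns, target)


returns = [0.02, -0.01, 0.03, -0.02]  # invented per-period returns, target of zero

dd = downside_deviation(returns)
print(f"downside deviation = {dd:.4f}, Sortino = {sortino_ratio(returns):.2f}")
```

Here only the two negative returns enter the sum, giving $\text{DD} = \sqrt{(0.01^2 + 0.02^2)/4}$.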

Maximum drawdown

$$\text{MDD} = \max_{0 \leq s \leq t \leq T}\left(\frac{P_s - P_t}{P_s}\right)$$

The largest peak-to-trough decline. This is not a moment-based statistic — it is a path-dependent measure that captures the worst loss experience. Practitioners often care more about MDD than variance because it reflects the pain of actually holding the strategy.
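
The double maximum over $s \leq t$ does not require a double loop: tracking the running peak gives the same answer in a single pass. A sketch, assuming `prices` is a non-empty sequence of positive price or equity values (the series below is invented):

```python
def max_drawdown(prices):
    """Largest peak-to-trough decline as a fraction of the peak."""
    peak = prices[0]
    mdd = 0.0
    for p in prices:
        peak = max(peak, p)                 # highest value seen so far (best P_s)
        mdd = max(mdd, (peak - p) / peak)   # drawdown from that peak at this point
    return mdd


equity = [100, 120, 90, 110, 80, 130]  # invented equity curve
print(f"max drawdown = {max_drawdown(equity):.1%}")
```

For this curve the worst decline runs from the peak of 120 to the later trough of 80, one third of the peak value, even though the series ends at a new high.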

Empirical moments of financial returns

Typical values for daily log-returns of major equity indices:

| Statistic | Typical range | Implication |
|---|---|---|
| Mean (daily) | 0.02%–0.05% | Hard to estimate; dominated by noise |
| Std dev (daily) | 0.8%–1.5% | Annualised: 13%–24% |
| Skewness | $-0.3$ to $-0.8$ | Negative: crashes > rallies |
| Excess kurtosis | 3–10 | Fat tails: extreme events too frequent for Gaussian |

These empirical facts drive the modelling choices throughout the vault: the normal distribution assumption is a useful approximation for the centre of the distribution but fails in the tails, motivating Student's $t$, GARCH, and stochastic volatility models.

Examples and applications

Example 1: computing sample moments

Daily log-returns of a stock over 5 days: $r = (0.8\%, -1.2\%, 0.3\%, -2.5\%, 0.6\%)$.

$$\bar{r} = \frac{0.8 - 1.2 + 0.3 - 2.5 + 0.6}{5} = -0.4\%$$

Working in percentage terms for clarity, the deviations from the mean $-0.4$ are $(1.2, -0.8, 0.7, -2.1, 1.0)$, so

$$s^2 = \frac{1.44 + 0.64 + 0.49 + 4.41 + 1.00}{4} = \frac{7.98}{4} = 1.995 \quad (\text{in } \%^2)$$

$s = 1.41\%$ daily, or $1.41\% \times \sqrt{252} = 22.4\%$ annualised.

Note how the single $-2.5\%$ day contributes $4.41/7.98 = 55\%$ of the sum of squared deviations: a single outlier dominates the variance estimate.
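
The hand calculation can be cross-checked with the standard library. The variable names are illustrative; returns are kept in percent so the numbers match the worked example.

```python
import math
import statistics

returns_pct = [0.8, -1.2, 0.3, -2.5, 0.6]  # the five daily log-returns, in percent

mean_pct = statistics.fmean(returns_pct)
var_pct2 = statistics.variance(returns_pct)  # n - 1 divisor, units of %^2
std_pct = math.sqrt(var_pct2)
annual_pct = std_pct * math.sqrt(252)

# Share of the sum of squared deviations due to the single worst day.
sq_dev = [(r - mean_pct) ** 2 for r in returns_pct]
worst_share = max(sq_dev) / sum(sq_dev)

print(f"mean = {mean_pct:.2f}%, variance = {var_pct2:.3f} %^2")
print(f"daily vol = {std_pct:.2f}%, annualised vol = {annual_pct:.1f}%")
print(f"worst day's share of squared deviations = {worst_share:.0%}")
```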

Example 2: why skewness matters for strategy selection

Two strategies over 252 days:

| Statistic | Strategy A | Strategy B |
|---|---|---|
| Mean (annual) | 10% | 10% |
| Std dev (annual) | 15% | 15% |
| Sharpe | 0.67 | 0.67 |
| Skewness | +0.3 | −1.2 |
| Max drawdown | 12% | 35% |

Identical Sharpe ratios, but Strategy B has severe negative skew and a max drawdown 3× larger. Strategy B is a "picking up nickels in front of a steamroller" profile. Skewness (and the related maximum drawdown) reveals risks that the Sharpe ratio hides.

Common confusions and pitfalls

"Variance is risk." Variance treats upside and downside equally, whereas investors mainly fear the downside. This is why downside deviation, VaR, and Expected Shortfall are used alongside (or instead of) variance. Still, variance/standard deviation remains the dominant measure due to its mathematical tractability (mean-variance optimisation, factor models, GARCH).
"Sample moments are reliable with small samples." The sample mean of daily returns converges slowly: its standard error shrinks only as $O(1/\sqrt{n})$ and is large relative to the mean itself. Sample skewness and kurtosis are noisier still and are dominated by outliers. With 1 year of daily data ($n = 252$), you can estimate volatility reasonably well, but mean, skewness, and kurtosis estimates carry large uncertainty.
"High kurtosis means high volatility." No. Kurtosis is a standardised fourth moment ($\mu_4/\sigma^4$): it measures the shape of the tails relative to the distribution's own spread, not the level of spread. A distribution can have low volatility and high kurtosis (rare but large relative moves) or high volatility and normal kurtosis.
"Annualise everything by multiplying by 252." Mean scales with time ($252\mu_{\text{daily}}$). Standard deviation scales with $\sqrt{252}$. Variance scales with 252. Sharpe ratio scales with $\sqrt{252}$. Skewness and excess kurtosis do not grow under aggregation at all: for i.i.d. returns, the skewness of the annual sum shrinks by a factor of $1/\sqrt{252}$ and the excess kurtosis by $1/252$, and real returns with volatility clustering tend to decay more slowly than that. Always check the scaling rule for the specific statistic.

Where this goes next

Moments and summary statistics connect to:

  • Normal Distribution: The normal is fully characterised by mean and variance (skewness = 0, excess kurtosis = 0). Departures from normality are measured by the third and fourth moments.
  • Student's $t$-Distribution: The standard fat-tailed alternative when excess kurtosis > 0.
  • Correlation and Dependence: Covariance is a "cross-moment" $\mathbb{E}[(X-\mu_X)(Y-\mu_Y)]$, the second moment generalised to two variables.
  • Improper Integrals: Moment existence is determined by convergence of $\int |x|^n f(x)\,dx$.
  • Squeeze Theorem and Bounds: Jensen's inequality relates the mean of a convex function to the function of the mean, which is the source of the $\sigma^2/2$ convexity correction.
Moments and Summary Statistics | q4quant.studio