Every return distribution, every risk model, every performance report begins with the same question: what are the basic numbers that summarise this data? The four moments — mean, variance, skewness, kurtosis — are the answer. They compress an entire distribution into four numbers that tell you: the average return, how volatile it is, whether crashes are more likely than rallies, and how fat the tails are.
These four numbers drive concrete financial decisions:
Mean return determines whether a strategy or asset is worth holding.
Variance / standard deviation is the baseline measure of risk and the input to portfolio optimisation (Markowitz mean-variance).
Skewness tells you whether your risk is asymmetric — are you picking up pennies in front of a steamroller (negative skew), or do you have convex payoffs (positive skew)?
Kurtosis tells you how much tail risk you carry — a leptokurtic return distribution means extreme events happen more often than a Gaussian model predicts, which directly affects VaR, Expected Shortfall, and margin requirements.
Beyond moments, summary statistics like median, quantiles, and range provide robust alternatives that resist the distortions caused by outliers — a practical concern when a single day's crash can dominate a sample mean.
The four moments
First moment: mean (expected value)
Population mean:
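$$\mu = E[X] = \int x f(x)\,dx$$
Sample mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The sample mean is an unbiased estimator of μ: $E[\bar{x}] = \mu$. For i.i.d. observations its standard error is $\sigma/\sqrt{n}$, and by the Central Limit Theorem its sampling distribution is approximately normal for large n.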
Finance interpretation: The mean return of a strategy determines its expected P&L. But estimating μ precisely is notoriously difficult: with daily σ≈1% and μ≈0.04% (10% annualised), the signal-to-noise ratio is 0.04/1=0.04. The t-statistic of the sample mean is $0.04\sqrt{n}$, so $(1/0.04)^2 = 625$ observations (2.5 years of daily data) only bring it to 1; rejecting a zero mean at the 95% level needs roughly $(1.96/0.04)^2 \approx 2{,}400$ observations, close to 10 years of daily data. This is why mean estimation is the hardest problem in portfolio optimisation.
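The sample-size arithmetic can be checked with a short sketch. The function names are illustrative, the inputs are the hypothetical daily figures quoted above (in percent), and returns are assumed i.i.d.

```python
import math

def t_stat_of_mean(mu: float, sigma: float, n: int) -> float:
    """t-statistic of a sample mean against zero: mu / (sigma / sqrt(n))."""
    return mu / (sigma / math.sqrt(n))

def observations_needed(mu: float, sigma: float, z: float = 1.96) -> int:
    """Smallest n for which the t-statistic reaches z (assumes i.i.d. returns)."""
    return math.ceil((z * sigma / mu) ** 2)

mu_daily, sigma_daily = 0.04, 1.0  # percent per day, illustrative values

print(t_stat_of_mean(mu_daily, sigma_daily, 625))   # t-statistic after 625 days
print(observations_needed(mu_daily, sigma_daily))   # days needed for z = 1.96
```

With these inputs, 625 observations give a t-statistic of about 1, while the z = 1.96 threshold needs roughly 2,400 observations.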
Second moment: variance and standard deviation
Population variance:
$$\sigma^2 = \operatorname{Var}(X) = E[(X-\mu)^2] = E[X^2] - \mu^2$$
Sample variance (unbiased):
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
The divisor is n−1 (Bessel's correction), not n, because $\bar{x}$ consumes one degree of freedom. The biased version with n in the denominator systematically underestimates $\sigma^2$.
Standard deviation: $\sigma = \sqrt{\sigma^2}$ (population), $s = \sqrt{s^2}$ (sample). Standard deviation is in the same units as the data, making it interpretable: "the stock has 20% annual volatility" means $\sigma_{\text{annual}} = 0.20$.
Annualisation: If $\sigma_{\text{daily}}$ is the daily standard deviation of log-returns, the annualised volatility is:
$$\sigma_{\text{annual}} = \sigma_{\text{daily}} \times \sqrt{252}$$
assuming 252 trading days and i.i.d. returns. The $\sqrt{252}$ comes from $\operatorname{Var}\left(\sum_{i=1}^{252} r_i\right) = 252\operatorname{Var}(r_i)$, so $\text{SD} = \sqrt{252}\,\sigma_{\text{daily}}$. This is the "square-root-of-time rule"; see Normal Distribution.
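A minimal sketch of the two estimators and the annualisation rule, using only the standard library. The helper names and the five sample returns are hypothetical; the i.i.d. assumption behind the square-root rule is carried over from the text.

```python
import math

def sample_variance(xs, ddof=1):
    """Variance with divisor n - ddof; ddof=1 applies Bessel's correction."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

def annualise_vol(sigma_daily, periods=252):
    """Square-root-of-time rule; only valid for i.i.d. returns."""
    return sigma_daily * math.sqrt(periods)

daily = [0.004, -0.011, 0.007, 0.002, -0.006]  # hypothetical daily log-returns

s2_unbiased = sample_variance(daily)          # divisor n - 1
s2_biased = sample_variance(daily, ddof=0)    # divisor n
print(s2_biased < s2_unbiased)                # the biased estimate is smaller
print(annualise_vol(math.sqrt(s2_unbiased)))  # annualised volatility
```

The biased estimate is always the unbiased one multiplied by (n−1)/n, so the gap matters most for short samples.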
Third moment: skewness
Population skewness:
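$$\gamma_1 = E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3}$$
where $\mu_3 = E[(X-\mu)^3]$ is the third central moment.
Sample skewness:
$$g_1 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^3$$
(Adjusted versions with a $\sqrt{n(n-1)}/(n-2)$ correction exist for small samples.)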
γ1<0: negative skew (left tail is heavier). Equity returns are typically negatively skewed — large drops are more frequent than large rallies of the same magnitude.
γ1>0: positive skew (right tail is heavier). Some option strategies (buying OTM puts) produce positively skewed P&L: many small losses and occasional large gains.
Finance significance: Skewness tells you whether your risk is asymmetric. A strategy with μ>0 and γ1≪0 may be profitable on average but vulnerable to catastrophic drawdowns. Short volatility strategies, carry trades, and credit-selling strategies typically exhibit this profile — steady income punctuated by rare large losses.
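The "steady income, rare large loss" profile can be illustrated with a small sketch. The function below computes the sample skewness from central moments with a 1/n divisor throughout, which is one common convention and an assumption here; the two P&L streams are invented for illustration.

```python
def sample_skewness(xs):
    """g1 = m3 / m2**1.5, with m_k = (1/n) * sum((x - mean)**k)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical P&L streams with the same positive mean (0.1 per period)
short_vol = [1, 1, 1, 1, 1, 1, 1, 1, 1, -8]          # small gains, one large loss
long_vol = [-1, -1, -1, -1, -1, -1, -1, -1, -1, 10]  # small losses, one large gain

print(sum(short_vol) / len(short_vol), sample_skewness(short_vol))
print(sum(long_vol) / len(long_vol), sample_skewness(long_vol))
```

Both streams have the same mean, but the first has strongly negative skewness and the second strongly positive, which is the distinction the mean alone cannot show.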
Fourth moment: kurtosis
Population kurtosis:
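$$\kappa = E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right] = \frac{\mu_4}{\sigma^4}$$
Excess kurtosis: $\kappa_{\text{excess}} = \kappa - 3$. The subtraction of 3 normalises against the normal distribution, which has $\kappa = 3$.
Sample excess kurtosis:
$$g_2 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^4 - 3$$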
Interpretation:
κexcess=0: mesokurtic (normal-like tails).
κexcess>0: leptokurtic (fatter tails than normal). Empirical daily equity returns typically have excess kurtosis of 3–10, depending on the market and period.
κexcess<0: platykurtic (thinner tails than normal). The uniform distribution has excess kurtosis −6/5.
Finance significance: Kurtosis determines how much "tail risk" your model misses. A Gaussian VaR model assumes $\kappa_{\text{excess}} = 0$, but if the true excess kurtosis is 5, a 4-sigma event is far more likely than the model predicts: under a variance-matched Student's t, by more than an order of magnitude. This is why Student's t-distribution (which has excess kurtosis $6/(\nu-4)$ for $\nu > 4$) is the standard fat-tailed alternative.
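Warning: Sample kurtosis is extremely noisy. A single outlier can dominate the fourth power, making $g_2$ unstable. With 250 daily observations, the standard error of the sample kurtosis estimator is approximately $\sqrt{24/n} \approx 0.31$ even under normality, and it is considerably larger for fat-tailed data, so in practice you cannot reliably distinguish $\kappa_{\text{excess}} = 3$ from $\kappa_{\text{excess}} = 5$.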
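The outlier sensitivity is easy to reproduce. The sketch below assumes the 1/n moment convention for the sample excess kurtosis and uses an artificial sample of ±1 values, so the numbers are illustrative only.

```python
import math

def sample_excess_kurtosis(xs):
    """g2 = m4 / m2**2 - 3, with m_k = (1/n) * sum((x - mean)**k)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

def kurtosis_se_normal(n):
    """Approximate standard error of g2 when the data are Gaussian."""
    return math.sqrt(24.0 / n)

base = [1.0, -1.0] * 125          # 250 artificial observations
shocked = base[:-1] + [-10.0]     # same sample with one observation replaced

print(sample_excess_kurtosis(base))     # thin-tailed two-point sample
print(sample_excess_kurtosis(shocked))  # one outlier moves the estimate sharply
print(kurtosis_se_normal(250))
```

Changing one observation out of 250 shifts the estimate by far more than the Gaussian standard error would suggest, which is why a single crash day can dominate a kurtosis estimate.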
Beyond moments: robust summary statistics
Moments are sensitive to outliers (especially higher moments). Robust alternatives:
Median
The median is the 50th percentile: $P(X \le m) = 0.5$. For a symmetric distribution, the median equals the mean. For skewed distributions, they differ: the log-normal has mean $e^{\mu + \sigma^2/2}$ but median $e^{\mu}$, because the mean is pulled up by the right tail.
The median is robust to outliers: replacing the largest observation with +∞ does not change the median (but destroys the mean).
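A quick illustration of that robustness, using the standard library's `statistics` module. The five returns are hypothetical, and a very large finite number stands in for +∞.

```python
import statistics

returns = [0.8, -1.2, 0.3, -2.5, 0.6]  # hypothetical daily returns, in percent

# Replace the largest observation with an absurdly large value
largest = max(returns)
corrupted = [1e6 if r == largest else r for r in returns]

print(statistics.median(returns), statistics.median(corrupted))  # unchanged
print(statistics.mean(returns), statistics.mean(corrupted))      # mean is wrecked
```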
Quantiles and percentiles
The p-quantile $q_p$ satisfies $P(X \le q_p) = p$. Key quantiles:

| Quantile | Name | Finance use |
| --- | --- | --- |
| $q_{0.01}$ | 1st percentile | 99% VaR |
| $q_{0.05}$ | 5th percentile | 95% VaR |
| $q_{0.25}$ | First quartile | Box plots, range analysis |
| $q_{0.50}$ | Median | Robust central tendency |
VaR is a quantile: $\text{VaR}_\alpha = -q_\alpha$, where α is the tail probability (α = 0.01 for 99% VaR). It depends only on the tail of the distribution, not on the centre, making it a targeted risk measure.
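As a sketch, historical VaR is just an empirical quantile with the sign flipped. The interpolation rule below (linear between order statistics) is one of several conventions and is an assumption here, as are the function names and the evenly spaced sample.

```python
import math

def quantile(xs, p):
    """Empirical p-quantile with linear interpolation between order statistics."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    s = sorted(xs)
    pos = p * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def historical_var(returns, tail_prob):
    """VaR reported as a positive loss: minus the tail_prob-quantile of returns."""
    return -quantile(returns, tail_prob)

returns = [i / 100 for i in range(-50, 51)]  # hypothetical returns, -0.50 to +0.50

print(historical_var(returns, 0.05))  # 95% VaR
print(historical_var(returns, 0.01))  # 99% VaR
```

Different quantile conventions give slightly different answers on small samples, so the convention should be fixed before comparing VaR figures.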
Interquartile range (IQR)
IQR=q0.75−q0.25
A robust measure of spread. For the normal, $\text{IQR} = 2 \times 0.6745\sigma \approx 1.35\sigma$. If the observed IQR is much smaller than 1.35s, the sample standard deviation is being inflated by extreme observations that the IQR ignores, which is evidence of fat tails.
Median absolute deviation (MAD)
MAD=median(∣xi−median(x)∣)
For the normal distribution, $\text{MAD} = 0.6745\sigma$, so $\hat{\sigma}_{\text{MAD}} = \text{MAD}/0.6745$ is a robust volatility estimator. It is much less affected by a single extreme return than the sample standard deviation.
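The following sketch compares the robust spread measures with the sample standard deviation on a tiny sample containing one extreme value. The helper names, the interpolation rule in `quantile`, and the data are assumptions made for illustration.

```python
import math
import statistics

def quantile(xs, p):
    """Empirical p-quantile with linear interpolation (one common convention)."""
    s = sorted(xs)
    pos = p * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def iqr(xs):
    """Interquartile range q_0.75 - q_0.25."""
    return quantile(xs, 0.75) - quantile(xs, 0.25)

def mad(xs):
    """Median absolute deviation from the median."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])

sample = [1, 2, 3, 4, 100]  # hypothetical data with one extreme value

print(statistics.stdev(sample))  # dominated by the outlier
print(iqr(sample) / 1.349)       # Gaussian-equivalent sigma from the IQR
print(mad(sample) / 0.6745)      # Gaussian-equivalent sigma from the MAD
```

Both robust estimates stay close to the spread of the four ordinary points, while the sample standard deviation is driven almost entirely by the single extreme value.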
Sharpe ratio and other performance measures
Sharpe ratio
$$\text{SR} = \frac{\bar{r} - r_f}{s}$$
where $\bar{r}$ is the mean return, $r_f$ is the risk-free rate, and $s$ is the standard deviation of returns. The Sharpe ratio is a normalised performance measure: return per unit of risk.
Annualisation: If computed from daily returns, $\text{SR}_{\text{annual}} = \text{SR}_{\text{daily}} \times \sqrt{252}$. This follows from the mean scaling as $252\mu_{\text{daily}}$ and the standard deviation scaling as $\sqrt{252}\,\sigma_{\text{daily}}$.
Limitation: The Sharpe ratio is a complete performance measure only if returns are normally distributed (mean and variance fully describe the distribution). With skewness or fat tails, two strategies with the same Sharpe can have very different risk profiles — one may have benign, symmetric risk while the other is a crash-prone carry trade.
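A minimal sketch of the Sharpe ratio and its annualisation, assuming the risk-free rate is expressed per period (daily here) and returns are i.i.d. The function names and sample returns are hypothetical.

```python
import math
import statistics

def sharpe_ratio(returns, rf=0.0):
    """Per-period Sharpe ratio: (mean - rf) / sample standard deviation."""
    return (statistics.mean(returns) - rf) / statistics.stdev(returns)

def annualise_sharpe(sr_per_period, periods=252):
    """Scale a per-period Sharpe ratio by sqrt(periods); assumes i.i.d. returns."""
    return sr_per_period * math.sqrt(periods)

daily = [0.012, -0.008, 0.005, 0.001, -0.004, 0.009]  # hypothetical daily returns

sr_daily = sharpe_ratio(daily)
print(sr_daily, annualise_sharpe(sr_daily))
```

Six observations are far too few for a meaningful estimate; the sample is only there to make the example runnable.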
Sortino ratio
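$$\text{Sortino} = \frac{\bar{r} - r_f}{\text{DD}}$$
where the downside deviation DD uses only returns below the target $r_f$:
$$\text{DD} = \sqrt{\frac{1}{n}\sum_{i:\, r_i < r_f} (r_i - r_f)^2}$$
This addresses the Sharpe ratio's weakness: it penalises only downside volatility, not upside. Because DD counts only part of the dispersion, Sortino is usually numerically larger than Sharpe (for returns symmetric about the target, $\text{DD} \approx s/\sqrt{2}$). For negatively skewed strategies the downside carries more of the total variance, so the Sortino ratio is lower relative to the Sharpe ratio than it would be for a symmetric strategy.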
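The downside deviation and Sortino ratio can be sketched as follows. The divisor is the full sample size n, matching the formula in this section; the per-period target `rf`, the function names, and the sample are assumptions.

```python
import math

def downside_deviation(returns, rf=0.0):
    """sqrt((1/n) * sum of squared shortfalls below rf); n counts all returns."""
    n = len(returns)
    return math.sqrt(sum((r - rf) ** 2 for r in returns if r < rf) / n)

def sortino_ratio(returns, rf=0.0):
    """(mean - rf) / downside deviation; undefined if no return falls below rf."""
    dd = downside_deviation(returns, rf)
    if dd == 0.0:
        raise ValueError("no returns below the target; Sortino is undefined")
    return (sum(returns) / len(returns) - rf) / dd

returns = [0.02, -0.01, 0.03, -0.02]  # hypothetical per-period returns

print(downside_deviation(returns))
print(sortino_ratio(returns))
```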
Maximum drawdown
$$\text{MDD} = \max_{0 \le s \le t \le T} \left(\frac{P_s - P_t}{P_s}\right)$$
The largest peak-to-trough decline. This is not a moment-based statistic — it is a path-dependent measure that captures the worst loss experience. Practitioners often care more about MDD than variance because it reflects the pain of actually holding the strategy.
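Maximum drawdown can be computed in a single pass by tracking the running peak. This is a sketch under the assumption that the input is a series of strictly positive prices (or equity values); the function name and the price path are made up.

```python
def max_drawdown(prices):
    """Largest fractional decline from a running peak to a later trough."""
    peak = prices[0]
    mdd = 0.0
    for p in prices:
        peak = max(peak, p)                # highest price seen so far
        mdd = max(mdd, (peak - p) / peak)  # drawdown from that peak
    return mdd

path = [100, 120, 90, 110, 80, 130]  # hypothetical price path

print(max_drawdown(path))  # peak 120 to trough 80
```

Because the statistic depends on the order of the prices, shuffling the same returns generally produces a different drawdown, which is what "path-dependent" means here.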
Empirical moments of financial returns
Typical values for daily log-returns of major equity indices:
| Statistic | Typical range | Implication |
| --- | --- | --- |
| Mean (daily) | 0.02%–0.05% | Hard to estimate; dominated by noise |
| Std dev (daily) | 0.8%–1.5% | Annualised: 13%–24% |
| Skewness | −0.3 to −0.8 | Negative: crashes > rallies |
| Excess kurtosis | 3–10 | Fat tails: extreme events too frequent for Gaussian |
These empirical facts drive the modelling choices throughout the vault: the normal distribution assumption is a useful approximation for the centre of the distribution but fails in the tails, motivating Student's t, GARCH, and stochastic volatility models.
Examples and applications
Example 1: computing sample moments
Daily log-returns of a stock over 5 days: r=(0.8%,−1.2%,0.3%,−2.5%,0.6%).
The sample mean is $\bar{r} = (0.8 - 1.2 + 0.3 - 2.5 + 0.6)/5 = -0.4\%$. The sample variance uses squared deviations from this mean; working in percentage terms for clarity:
Deviations from mean −0.4: (1.2,−0.8,0.7,−2.1,1.0).
$$s^2 = \frac{1.44 + 0.64 + 0.49 + 4.41 + 1.00}{4} = \frac{7.98}{4} = 1.995 \quad (\text{in } \%^2)$$
$s = 1.41\%$ daily, or $1.41\% \times \sqrt{252} = 22.4\%$ annualised.
Note how the single −2.5% day contributes 4.41/7.98=55% of the total variance — a single outlier dominates.
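The hand calculation can be reproduced with the standard library's `statistics` module, which uses the n−1 divisor for `variance`. Variable names are illustrative; returns are kept in percent.

```python
import math
import statistics

r = [0.8, -1.2, 0.3, -2.5, 0.6]  # daily log-returns in percent, from the example

mean = statistics.mean(r)
s2 = statistics.variance(r)       # sample variance, divisor n - 1
s = math.sqrt(s2)
annual = s * math.sqrt(252)       # square-root-of-time rule

squared_devs = [(x - mean) ** 2 for x in r]
worst_share = max(squared_devs) / sum(squared_devs)

print(mean, s2, s, annual, worst_share)
```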
Example 2: why skewness matters for strategy selection
Two strategies over 252 days:
| | Strategy A | Strategy B |
| --- | --- | --- |
| Mean (annual) | 10% | 10% |
| Std dev (annual) | 15% | 15% |
| Sharpe | 0.67 | 0.67 |
| Skewness | +0.3 | −1.2 |
| Max drawdown | 12% | 35% |
Identical Sharpe ratios, but Strategy B has severe negative skew and a max drawdown 3× larger. Strategy B is a "picking up nickels in front of a steamroller" profile. Skewness (and the related maximum drawdown) reveals risks that the Sharpe ratio hides.
Common confusions and pitfalls
"Variance is risk." Variance treats upside and downside equally. In finance, only downside matters. This is why downside deviation, VaR, and Expected Shortfall are used alongside (or instead of) variance. Still, variance/standard deviation remains the dominant measure due to its mathematical tractability (mean-variance optimisation, factor models, GARCH).
"Sample moments are reliable with small samples." The standard error of the sample mean of daily returns shrinks only slowly, at rate $O(1/\sqrt{n})$. Sample skewness and kurtosis converge even more slowly and are dominated by outliers. With 1 year of daily data (n=252), you can estimate volatility reasonably well, but mean, skewness, and kurtosis estimates carry large uncertainty.
"High kurtosis means high volatility." No. Kurtosis is a standardised fourth moment ($\mu_4/\sigma^4$): it measures the shape of the tail relative to the distribution's own spread, not the level of spread. A distribution can have low volatility and high kurtosis (rare but large relative moves) or high volatility and normal kurtosis.
"Annualise everything by multiplying by 252." Mean scales with time ($252\mu_{\text{daily}}$). Variance scales with 252. Standard deviation and Sharpe ratio scale with $\sqrt{252}$. Skewness and excess kurtosis do not scale up at all: for i.i.d. sums they shrink towards zero, by factors of $1/\sqrt{252}$ and $1/252$ respectively, and serially dependent returns depart from even that rule. Always check the scaling rule for the specific statistic.
Where this goes next
Moments and summary statistics connect to:
Normal Distribution: The normal is fully characterised by mean and variance (skewness = 0, excess kurtosis = 0). Departures from normality are measured by the third and fourth moments.
Correlation and Dependence: Covariance is a "cross-moment" E[(X−μX)(Y−μY)] — the second moment generalised to two variables.
Improper Integrals: Moment existence is determined by convergence of $\int |x|^n f(x)\,dx$.
Squeeze Theorem and Bounds: Jensen's inequality relates the mean of a convex function to the function of the mean, which is the source of the $\sigma^2/2$ convexity correction.