
Solution: Sharpe-Ratio Confidence Interval from the CLT

Part 1

Under i.i.d. Gaussian returns, Lo (2002, FAJ) shows via the delta method applied to the CLT that

$$\widehat{\text{SR}} \overset{d}{\approx} \mathcal{N}\!\left(\text{SR},\ \frac{1 + \tfrac{1}{2}\text{SR}^2}{n_{\text{years}}}\right)\quad\text{as } n_{\text{years}} \to \infty.$$

The variance has two pieces: the $1/n_{\text{years}}$ baseline (the CLT telling you the sample mean has variance $\sigma^2/n$), plus the $\tfrac{1}{2}\text{SR}^2/n_{\text{years}}$ correction from the jointly-estimated standard deviation. Under non-Gaussian returns a skew/kurtosis correction appears; we drop it here.
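As a sketch of where the two pieces come from, here is the standard delta-method algebra (not quoted from Lo's paper). It assumes i.i.d. Gaussian returns and writes $n$ for the number of return observations, the role played by $n_{\text{years}}$ in the formula; $g$ is just a label introduced here for the map from $(\mu, \sigma^2)$ to the Sharpe ratio.

```latex
\begin{align*}
\sqrt{n}\begin{pmatrix}\hat\mu - \mu \\ \hat\sigma^2 - \sigma^2\end{pmatrix}
  &\xrightarrow{d} \mathcal{N}\!\left(0,\ \begin{pmatrix}\sigma^2 & 0 \\ 0 & 2\sigma^4\end{pmatrix}\right),
  \qquad g(\mu, \sigma^2) = \frac{\mu}{\sigma},
  \qquad \nabla g = \begin{pmatrix} 1/\sigma \\ -\mu/(2\sigma^3) \end{pmatrix}, \\[4pt]
n\,\mathrm{Var}\big(\widehat{\text{SR}}\big)
  &\approx \frac{1}{\sigma^2}\cdot\sigma^2 + \frac{\mu^2}{4\sigma^6}\cdot 2\sigma^4
  = 1 + \frac{\mu^2}{2\sigma^2}
  = 1 + \tfrac{1}{2}\text{SR}^2 .
\end{align*}
```

The first term is the sample-mean noise and the second is the sample-variance noise, each weighted by the squared partial derivative of $g$.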

Part 2

For $\widehat{\text{SR}} = 1.0$, $n_{\text{years}} = 1$:

$$\text{SE} = \sqrt{(1 + 0.5)/1} = \sqrt{1.5} \approx 1.225.$$
The 95% CI is $\widehat{\text{SR}} \pm 1.96\cdot \text{SE} = 1.0 \pm 2.40$, i.e. $[-1.40,\ 3.40]$. It includes zero, so a one-year backtest with Sharpe $1.0$ is not statistically significant at the 5% level.
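The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch under the same i.i.d. Gaussian assumption; the function name `sharpe_ci` and its parameters are illustrative and not part of the original solution.

```python
import math


def sharpe_ci(sr_hat, n_years, z=1.96):
    """Return (SE, lower, upper) for the Sharpe ratio under i.i.d. Gaussian returns.

    Uses the asymptotic variance (1 + SR^2 / 2) / n_years with the point
    estimate plugged in for the unknown true Sharpe ratio.
    """
    se = math.sqrt((1.0 + 0.5 * sr_hat ** 2) / n_years)
    half_width = z * se
    return se, sr_hat - half_width, sr_hat + half_width


se, lower, upper = sharpe_ci(sr_hat=1.0, n_years=1.0)
print(f"SE = {se:.3f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
# prints: SE = 1.225, 95% CI = [-1.40, 3.40]
```

Changing `z` gives other confidence levels; 1.96 is the two-sided 95% normal critical value used in the text.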

Part 3

Required: $\widehat{\text{SR}} - 1.96\cdot \text{SE} > 0$, i.e. $n_{\text{years}} > (1.96/\widehat{\text{SR}})^2 \cdot (1 + \tfrac{1}{2}\widehat{\text{SR}}^2)$.

  • $\widehat{\text{SR}} = 1.0$: $n_{\text{years}} > (1.96)^2\cdot 1.5 \approx 5.76$, so about 6 years.
  • $\widehat{\text{SR}} = 0.5$: $n_{\text{years}} > (1.96/0.5)^2\cdot 1.125 \approx 17.3$, so about 18 years.

At a Sharpe of $0.5$ you need roughly three times the backtest of a Sharpe-$1.0$ strategy for the same level of significance. This is why low-Sharpe systematic strategies are so hard to validate: you almost never have enough data.
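The threshold can be tabulated for any point estimate. The sketch below assumes the same Gaussian formula; `min_years` is a hypothetical helper name introduced for illustration.

```python
import math


def min_years(sr_hat, z=1.96):
    """Backtest length (in years) at which the lower CI bound reaches zero.

    Solves sr_hat - z * sqrt((1 + sr_hat^2 / 2) / n_years) = 0 for n_years.
    """
    return (z / sr_hat) ** 2 * (1.0 + 0.5 * sr_hat ** 2)


for sr in (1.0, 0.5):
    n = min_years(sr)
    print(f"SR = {sr}: n_years > {n:.2f}, i.e. {math.ceil(n)} whole years")

# Ratio of required lengths, Sharpe 0.5 versus Sharpe 1.0
print(f"ratio = {min_years(0.5) / min_years(1.0):.2f}")
```

The ratio comes out at 3, matching the "roughly three times" statement above.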

Part 4

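The standard error grows with Sharpe because $\widehat{\text{SR}} = \hat\mu/\hat\sigma$ divides by a noisy sample standard deviation. A fractional error in $\hat\sigma$ passes through the division as an equal fractional error in the ratio, so the absolute error it induces is proportional to $\text{SR}$ itself; squared, that is the $\tfrac{1}{2}\text{SR}^2/n_{\text{years}}$ term. The noise from the sample mean, by contrast, contributes a fixed $1/n_{\text{years}}$ regardless of Sharpe. Intuitively, the higher the Sharpe, the larger the share of the uncertainty that comes from the volatility estimate; it becomes the dominant source once $\text{SR} > \sqrt{2}$.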
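To make the split concrete, the following sketch computes what fraction of the variance $(1 + \tfrac{1}{2}\text{SR}^2)/n_{\text{years}}$ comes from each term. The helper name `variance_shares` is an illustrative assumption; the shares do not depend on $n_{\text{years}}$ because it cancels.

```python
def variance_shares(sr):
    """Split 1 + SR^2 / 2 into (mean share, volatility share)."""
    mean_term = 1.0           # noise in the sample mean
    vol_term = 0.5 * sr ** 2  # noise in the sample standard deviation
    total = mean_term + vol_term
    return mean_term / total, vol_term / total


for sr in (0.5, 1.0, 2.0, 3.0):
    mean_share, vol_share = variance_shares(sr)
    print(f"SR = {sr}: mean {mean_share:.0%}, volatility {vol_share:.0%}")
```

At a Sharpe of $1.0$ the volatility estimate accounts for one third of the variance; the two terms are equal at $\text{SR} = \sqrt{2}$.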

Takeaways

  • Sharpe ratios from short backtests are almost meaningless. A one-year Sharpe of $1.0$ is consistent at the 95% level with a true Sharpe of $-1.4$. This is why prop desks demand years of out-of-sample track record before sizing up.
  • The $\sqrt{n}$-rate of the CLT sets the scale of validation. To halve a confidence interval you need to quadruple the backtest length.
  • Higher point estimates do not trivially mean higher significance. Standard error grows with $\text{SR}$, so the $t$-statistic $\widehat{\text{SR}}/\text{SE} = \widehat{\text{SR}}\sqrt{n_{\text{years}}/(1 + \widehat{\text{SR}}^2/2)}$ is concave in $\widehat{\text{SR}}$: a strategy with Sharpe $2.0$ is only about $1.41$ times as significant as one with Sharpe $1.0$ at the same $n_{\text{years}}$, not twice.
  • Lo (2002) is the canonical reference. Every backtester should know that formula.
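The concavity of the $t$-statistic in the third takeaway can be checked numerically. This is a sketch under the same Gaussian formula; `t_stat` and the five-year backtest length are illustrative assumptions, not values from the original.

```python
import math


def t_stat(sr_hat, n_years):
    """t-statistic SR_hat / SE with SE = sqrt((1 + SR_hat^2 / 2) / n_years)."""
    return sr_hat * math.sqrt(n_years / (1.0 + 0.5 * sr_hat ** 2))


n_years = 5.0  # assumed backtest length, chosen only for illustration
t1 = t_stat(1.0, n_years)
t2 = t_stat(2.0, n_years)
ceiling = math.sqrt(2.0 * n_years)  # limit of t_stat as sr_hat grows without bound

print(f"t(SR=1) = {t1:.2f}, t(SR=2) = {t2:.2f}, ratio = {t2 / t1:.2f}")
print(f"upper limit for any Sharpe at n_years = {n_years}: {ceiling:.2f}")
```

Doubling the Sharpe multiplies the $t$-statistic by $\sqrt{2}$ rather than 2, and under this formula no Sharpe, however large, pushes it past $\sqrt{2\,n_{\text{years}}}$.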