Solution: Sharpe-Ratio Confidence Interval from the CLT
Part 1
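Under i.i.d. Gaussian returns, Lo (2002, FAJ) shows via the delta method applied to the CLT that
$$\sqrt{T}\,\bigl(\widehat{SR} - SR\bigr) \xrightarrow{d} \mathcal{N}\!\left(0,\ 1 + \tfrac{1}{2}SR^2\right), \qquad \mathrm{SE}(\widehat{SR}) \approx \sqrt{\frac{1 + \tfrac{1}{2}SR^2}{T}},$$
where $SR$ is the per-period Sharpe ratio and $T$ is the number of periods.
The variance $(1 + SR^2/2)/T$ has two pieces: the baseline $1/T$ (the CLT telling you the sample mean has variance $\sigma^2/T$), plus the $SR^2/(2T)$ correction from the jointly-estimated standard deviation. Under non-Gaussian returns a skew/kurtosis correction appears; we drop it here.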
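The standard error and the resulting confidence interval can be sketched in a few lines of Python. This is a minimal illustration under the i.i.d. Gaussian assumption; the function names `sharpe_se` and `sharpe_ci` are hypothetical, and `sr` and `t` must be expressed in the same periodicity (for example, annualized Sharpe with `t` in years).

```python
import math


def sharpe_se(sr: float, t: float) -> float:
    """Asymptotic standard error of a Sharpe estimate, i.i.d. Gaussian case.

    sr: Sharpe ratio per period (hypothetical input name).
    t:  number of periods, in the same units as sr.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    # Variance = baseline 1/t plus the sr^2 / (2t) term from estimating sigma.
    return math.sqrt((1.0 + 0.5 * sr * sr) / t)


def sharpe_ci(sr: float, t: float, z: float = 1.96) -> tuple[float, float]:
    """Two-sided normal-approximation interval: sr +/- z * SE."""
    se = sharpe_se(sr, t)
    return sr - z * se, sr + z * se
```

The default `z = 1.96` corresponds to a 95% two-sided interval; pass a different quantile for other confidence levels.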
Part 2
For , :
Part 3
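Required: $\widehat{SR}/\mathrm{SE}(\widehat{SR}) \ge 2$, i.e. $T \ge 4\,(1 + SR^2/2)/SR^2$, with $SR$ annualized and $T$ in years.
- $SR = 1$: $T \ge 4 \times 1.5 = 6$, about 6 years.
- $SR = 0.5$: $T \ge 4 \times 1.125 / 0.25 = 18$, about 18 years.
At a Sharpe of $0.5$ you need roughly three times the backtest of a Sharpe-$1$ strategy for the same level of significance. This is why low-Sharpe systematic strategies are so hard to validate: you almost never have enough data.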
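The required backtest length follows from inverting the standard-error formula. The sketch below assumes the significance requirement is a $t$-statistic of at least `z` (default 2, an assumption made here) and an annualized Sharpe with the length measured in years; `required_length` is a hypothetical helper name.

```python
def required_length(sr: float, z: float = 2.0) -> float:
    """Smallest t such that sr / SE(sr) >= z under the i.i.d. Gaussian formula.

    Solves sr / sqrt((1 + sr^2 / 2) / t) >= z for t.
    """
    if sr == 0:
        raise ValueError("a zero Sharpe ratio is never significant")
    return z * z * (1.0 + 0.5 * sr * sr) / (sr * sr)


if __name__ == "__main__":
    # Illustrative Sharpe values (assumed for this example).
    for sr in (1.0, 0.5):
        print(f"SR = {sr}: need about {required_length(sr):.0f} years")
```

With these assumed inputs the helper returns 6 years for a Sharpe of 1 and 18 years for a Sharpe of 0.5, a ratio of three.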
Part 4
The standard error grows with Sharpe because a larger Sharpe is achieved either through larger $\mu$ or smaller $\sigma$, and the sample standard deviation $\hat\sigma$ is itself noisy; the fractional error in $\hat\sigma$ propagates through the division and becomes an absolute error in the ratio that is proportional to $SR$. Intuitively, for a high-Sharpe strategy (once $SR^2/2 > 1$) the volatility estimate, not the mean, contributes most of the uncertainty.
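The error-propagation argument can be written out as a short delta-method sketch. It assumes i.i.d. Gaussian returns, so that $\hat\mu$ and $\hat\sigma^2$ are asymptotically independent with variances $\sigma^2/T$ and $2\sigma^4/T$; the function $g$ is notation introduced here for illustration.

$$
\begin{aligned}
g(\mu, \sigma^2) &= \frac{\mu}{\sigma}, \qquad
\frac{\partial g}{\partial \mu} = \frac{1}{\sigma}, \qquad
\frac{\partial g}{\partial \sigma^2} = -\frac{\mu}{2\sigma^3}, \\
T \cdot \mathrm{Var}\bigl(\widehat{SR}\bigr)
&\approx \left(\frac{1}{\sigma}\right)^{2} \sigma^2
 + \left(\frac{\mu}{2\sigma^3}\right)^{2} 2\sigma^4
 = 1 + \frac{\mu^2}{2\sigma^2}
 = 1 + \tfrac{1}{2}SR^2 .
\end{aligned}
$$

The first term is the noise in the mean and does not depend on the Sharpe ratio; the second is the noise in the volatility estimate and scales with $SR^2$.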
Takeaways
- Sharpe ratios from short backtests are almost meaningless. With one year of data the standard error of an annualized Sharpe is at least 1, so the 95% interval extends roughly 2 Sharpe units either side of the point estimate. This is why prop desks demand years of out-of-sample track record before sizing up.
- The $\sqrt{T}$-rate of the CLT sets the scale of validation. To halve a confidence interval you need to quadruple the backtest length.
- Higher point estimates do not trivially mean higher significance. Standard error grows with $SR$, so the $t$-statistic is weakly concave in $SR$: a strategy with twice the Sharpe is not twice as significant at the same $T$.
- Lo (2002) is the canonical reference. Every backtester should know that formula.
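
The two scaling claims in the takeaways can be checked numerically. The sketch below uses the i.i.d. Gaussian formula; the helper names and the example Sharpe values (1 and 2) are assumptions chosen for illustration.

```python
import math


def sharpe_t_stat(sr: float, t: float) -> float:
    """t-statistic sr / SE(sr) with SE = sqrt((1 + sr^2 / 2) / t)."""
    return sr * math.sqrt(t / (1.0 + 0.5 * sr * sr))


def ci_half_width(sr: float, t: float, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval."""
    return z * math.sqrt((1.0 + 0.5 * sr * sr) / t)


if __name__ == "__main__":
    t = 5.0  # backtest length in years (illustrative)
    ratio = sharpe_t_stat(2.0, t) / sharpe_t_stat(1.0, t)
    print(f"t-stat ratio when Sharpe doubles: {ratio:.3f}")
    shrink = ci_half_width(1.0, t) / ci_half_width(1.0, 4 * t)
    print(f"CI shrink factor when T quadruples: {shrink:.3f}")
```

Doubling the Sharpe from 1 to 2 multiplies the $t$-statistic by $\sqrt{2}$ rather than 2, while quadrupling $T$ halves the interval width exactly.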