Solution: Sharpe-Ratio Confidence Interval from the CLT

Exercise: Sharpe-Ratio Confidence Interval from the CLT

Part 1

Under i.i.d. Gaussian returns, Lo (2002, FAJ) shows via the delta method applied to the CLT that

$$\widehat{\text{SR}} \overset{d}{\approx} \mathcal{N}\!\left(\text{SR},\ \frac{1 + \tfrac{1}{2}\text{SR}^2}{n_{\text{years}}}\right)\quad\text{as } n_{\text{years}} \to \infty.$$

The variance has two pieces: the $1/n_{\text{years}}$ baseline (the CLT telling you the sample mean has variance $\sigma^2/n$ ), plus the $\tfrac{1}{2}\text{SR}^2/n_{\text{years}}$ correction from the jointly-estimated standard deviation. Under non-Gaussian returns a skew/kurtosis correction appears; we drop it here.
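The two pieces can be traced through the delta method in a few lines. The block below is a sketch under the stated i.i.d. Gaussian assumption, writing $n$ for the number of observations and measuring $\mu$ and $\sigma$ per observation period (so $n = n_{\text{years}}$ when returns are annual); the function name $g$ is introduced here only for illustration.

```latex
\begin{align*}
g(\mu,\sigma) &= \frac{\mu}{\sigma}, \qquad
\frac{\partial g}{\partial \mu} = \frac{1}{\sigma}, \qquad
\frac{\partial g}{\partial \sigma} = -\frac{\mu}{\sigma^2} \\
\operatorname{Var}(\hat\mu) &\approx \frac{\sigma^2}{n}, \qquad
\operatorname{Var}(\hat\sigma) \approx \frac{\sigma^2}{2n}, \qquad
\operatorname{Cov}(\hat\mu,\hat\sigma) = 0 \\
\operatorname{Var}(\widehat{\text{SR}}) &\approx
\frac{1}{\sigma^2}\cdot\frac{\sigma^2}{n}
+ \frac{\mu^2}{\sigma^4}\cdot\frac{\sigma^2}{2n}
= \frac{1 + \tfrac{1}{2}\text{SR}^2}{n}
\end{align*}
```

The first term is the sample-mean noise and the second is the noise in $\hat\sigma$, scaled by $\text{SR}^2$.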

Part 2

For $\widehat{\text{SR}} = 1.0$ , $n_{\text{years}} = 1$ :

$$\text{SE} = \sqrt{(1 + 0.5)/1} = \sqrt{1.5} \approx 1.225.$$

The 95% CI is $\widehat{\text{SR}} \pm 1.96\cdot \text{SE} = 1.0 \pm 2.40$, i.e. $[-1.40,\ 3.40]$. It includes zero. A one-year backtest with Sharpe $1.0$ is not statistically significant.
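The arithmetic in this part can be reproduced with a few lines of Python. This is a minimal sketch that assumes the i.i.d. Gaussian variance formula from Part 1; the helper names `sharpe_se` and `sharpe_ci` are illustrative and do not come from any library.

```python
import math

def sharpe_se(sr_hat: float, n_years: float) -> float:
    """Asymptotic standard error of the Sharpe ratio (i.i.d. Gaussian case)."""
    return math.sqrt((1.0 + 0.5 * sr_hat ** 2) / n_years)

def sharpe_ci(sr_hat: float, n_years: float, z: float = 1.96):
    """Two-sided normal confidence interval; z = 1.96 gives the 95% level."""
    half_width = z * sharpe_se(sr_hat, n_years)
    return sr_hat - half_width, sr_hat + half_width

lo, hi = sharpe_ci(1.0, 1.0)
print(f"SE = {sharpe_se(1.0, 1.0):.3f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
# SE = 1.225, 95% CI = [-1.40, 3.40]
```

Passing a different `z` gives other confidence levels, for example `z=1.645` for a two-sided 90% interval.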

Part 3

Required: $\widehat{\text{SR}} - 1.96\cdot \text{SE} > 0$ , i.e. $n_{\text{years}} > (1.96/\widehat{\text{SR}})^2 \cdot (1 + \tfrac{1}{2}\widehat{\text{SR}}^2)$ .

$\widehat{\text{SR}} = 1.0$ : $n_{\text{years}} > (1.96)^2\cdot 1.5 \approx 5.76$ — about 6 years.
$\widehat{\text{SR}} = 0.5$ : $n_{\text{years}} > (1.96/0.5)^2\cdot 1.125 \approx 17.3$ — about 18 years.

At a Sharpe of $0.5$ you need roughly three times the backtest of a Sharpe- $1.0$ strategy for the same level of significance. This is why low-Sharpe systematic strategies are so hard to validate — you almost never have enough data.
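The threshold from this part is easy to tabulate for other Sharpe levels. The sketch below simply evaluates the inequality above, again assuming the i.i.d. Gaussian formula; `min_years` is a hypothetical helper name.

```python
import math

def min_years(sr_hat: float, z: float = 1.96) -> float:
    """Backtest length (in years) above which the CI lower bound exceeds zero."""
    return (z / sr_hat) ** 2 * (1.0 + 0.5 * sr_hat ** 2)

for sr in (1.0, 0.5):
    n = min_years(sr)
    print(f"SR = {sr}: n_years > {n:.2f} (round up to {math.ceil(n)})")
# SR = 1.0: n_years > 5.76 (round up to 6)
# SR = 0.5: n_years > 17.29 (round up to 18)
```

The ratio `min_years(0.5) / min_years(1.0)` comes out at 3, matching the "roughly three times" statement.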

Part 4

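The standard error grows with Sharpe because the sample standard deviation $\hat\sigma$ is itself noisy and sits in the denominator. A given fractional error in $\hat\sigma$ produces the same fractional error in the ratio $\hat\mu/\hat\sigma$, so the absolute error it induces in $\widehat{\text{SR}}$ scales with $\text{SR}$; this is the source of the $\tfrac{1}{2}\text{SR}^2/n_{\text{years}}$ term. Intuitively, at low Sharpe almost all of the uncertainty comes from the noisy mean, while at high Sharpe (above $\sqrt{2} \approx 1.41$, where $\tfrac{1}{2}\text{SR}^2 > 1$) the noise in the volatility estimate contributes the larger share.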
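A small Monte Carlo experiment can illustrate that the spread of $\widehat{\text{SR}}$ really does widen with the true Sharpe. The sketch below uses only the standard library and makes simplifying assumptions that are not part of the exercise: returns are i.i.d. Gaussian with $\sigma = 1$, the Sharpe is measured per observation period, and the sample size, trial count, and seed are arbitrary choices. All function names are illustrative.

```python
import math
import random
import statistics

def sample_sharpe(returns):
    """Sample mean divided by sample standard deviation (n - 1 denominator)."""
    n = len(returns)
    mean = math.fsum(returns) / n
    var = math.fsum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var)

def simulated_se(true_sr, n_obs, n_trials=2000, seed=0):
    """Empirical standard deviation of the Sharpe estimate across trials."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # sigma is fixed at 1, so the mean equals the true per-period Sharpe
        sample = [rng.gauss(true_sr, 1.0) for _ in range(n_obs)]
        estimates.append(sample_sharpe(sample))
    return statistics.stdev(estimates)

def asymptotic_se(true_sr, n_obs):
    """Standard error from the delta-method formula."""
    return math.sqrt((1.0 + 0.5 * true_sr ** 2) / n_obs)

for sr in (0.5, 1.0, 2.0):
    print(sr, round(simulated_se(sr, 100), 4), round(asymptotic_se(sr, 100), 4))
```

The simulated and asymptotic columns should agree to within a few percent, with both increasing in the true Sharpe; small gaps are expected from finite-sample effects and Monte Carlo noise.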

Takeaways

Sharpe ratios from short backtests carry very little information. A one-year Sharpe of $1.0$ is consistent at the 95% level with a true Sharpe as low as $-1.4$. This is why prop desks demand years of out-of-sample track record before sizing up.
The $\sqrt n$ -rate of the CLT sets the scale of validation. To halve a confidence interval you need to quadruple the backtest length.
Higher point estimates do not trivially mean higher significance. Standard error grows with $\text{SR}$, so the $t$-statistic $\widehat{\text{SR}}/\text{SE} = \widehat{\text{SR}}\sqrt{n_{\text{years}}/(1 + \widehat{\text{SR}}^2/2)}$ is concave in $\widehat{\text{SR}}$: a strategy with Sharpe $2.0$ is about $\sqrt{2} \approx 1.41$ times as significant as one with Sharpe $1.0$ at the same $n_{\text{years}}$, not twice.
Lo (2002) is the canonical reference. Every backtester should know that formula.
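The $\sqrt n$ scaling and the concavity of the $t$-statistic can both be checked numerically. This is a sketch under the same i.i.d. Gaussian formula; `t_stat` and `ci_half_width` are illustrative names, and the five-year backtest length is an arbitrary choice.

```python
import math

def t_stat(sr_hat: float, n_years: float) -> float:
    """Sharpe estimate divided by its asymptotic standard error."""
    return sr_hat * math.sqrt(n_years / (1.0 + 0.5 * sr_hat ** 2))

def ci_half_width(sr_hat: float, n_years: float, z: float = 1.96) -> float:
    """Half-width of the normal confidence interval."""
    return z * math.sqrt((1.0 + 0.5 * sr_hat ** 2) / n_years)

n = 5.0  # backtest length in years, chosen only for illustration

# Doubling the Sharpe estimate does not double the t-statistic.
ratio = t_stat(2.0, n) / t_stat(1.0, n)
print(f"t(SR=2) / t(SR=1) = {ratio:.3f}")  # 1.414

# Quadrupling the backtest length halves the interval.
shrink = ci_half_width(1.0, 4 * n) / ci_half_width(1.0, n)
print(f"half-width ratio = {shrink:.2f}")  # 0.50
```

Neither ratio depends on the value chosen for `n`, since the backtest length cancels in the first and enters only through the factor of four in the second.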