Solution: Fisher Information and the Cramér-Rao Bound for Exponential Rate

Exercise: Fisher Information and the Cramér-Rao Bound for Exponential Rate

Part 1

$\ell(\lambda) = n\log\lambda - \lambda\sum x_i$. Setting $\ell'(\lambda) = n/\lambda - \sum x_i = 0$ gives $\hat\lambda = n/\sum x_i = 1/\bar x$, and $\ell''(\lambda) = -n/\lambda^2 < 0$ confirms this is a maximum. ✓
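As a numerical sanity check on the closed form, the score $\ell'(\lambda) = n/\lambda - \sum x_i$ is strictly decreasing in $\lambda$, so its root can be found by plain bisection. The sketch below does that and compares the result with $1/\bar x$. The helper names `score` and `mle_by_bisection`, the bracket $[10^{-8}, 10^{8}]$, the seed and the true rate $2.0$ are all illustrative assumptions, not part of the exercise.

```python
import numpy as np

def score(lam, x):
    """l'(lam) = n/lam - sum(x) for an i.i.d. Exp(lam) sample x (hypothetical helper)."""
    return len(x) / lam - np.sum(x)

def mle_by_bisection(x, lo=1e-8, hi=1e8, iters=200):
    """Root of the strictly decreasing score by bisection (illustrative, not a library routine)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid, x) > 0:  # score still positive: the maximum lies to the right
            lo = mid
        else:                  # score non-positive: the maximum lies at or to the left
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.0, size=50)  # NumPy uses scale = 1/rate; true rate 2.0 assumed
print(mle_by_bisection(x), 1 / x.mean())     # the two numbers should agree to many digits
```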

Part 2

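$\log p(x; \lambda) = \log\lambda - \lambda x$. Derivatives:

$$\frac{\partial}{\partial\lambda}\log p(x;\lambda) = \frac{1}{\lambda} - x, \qquad \frac{\partial^2}{\partial\lambda^2}\log p(x;\lambda) = -\frac{1}{\lambda^2}.$$

$$I(\lambda) = -\mathbb{E}\left[-\frac{1}{\lambda^2}\right] = \frac{1}{\lambda^2}.$$

Cramér-Rao bound: any unbiased estimator of $\lambda$ based on $n$ i.i.d. observations has variance at least $1/(n\,I(\lambda)) = \lambda^2/n$.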
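Fisher information can also be written as the variance of the score, $\text{Var}[\partial_\lambda \log p(X;\lambda)] = \text{Var}[1/\lambda - X] = \text{Var}[X] = 1/\lambda^2$, which should agree with the curvature form above. A minimal Monte Carlo sketch of that agreement follows. The seed, the rate $\lambda = 2$ and the sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0
x = rng.exponential(scale=1 / lam, size=1_000_000)  # NumPy's scale = 1/rate

scores = 1 / lam - x                  # per-observation score d/dlam log p(x; lam)
print("E[score]   ~", scores.mean())  # should be near 0
print("Var[score] ~", scores.var())   # should be near 1/lam^2
print("-E[d2 log p] =", 1 / lam**2)   # curvature form; constant in x here

n = 100
print("CRB lam^2/n =", lam**2 / n)    # bound for unbiased estimators at this n
```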

Part 3

Now compute $\text{Var}(\hat\lambda)$ exactly. Let $Y = X_1 + \cdots + X_n$, so $\bar X = Y/n$. A sum of $n$ i.i.d. $\text{Exp}(\lambda)$ variables is $\text{Gamma}(n, \lambda)$ (shape $n$, rate $\lambda$), so $\hat\lambda = 1/\bar X = n/Y$ with $Y \sim \text{Gamma}(n, \lambda)$.

For $Y \sim \text{Gamma}(n, \lambda)$: $\mathbb{E}[1/Y] = \lambda/(n-1)$ (for $n > 1$) and $\mathbb{E}[1/Y^2] = \lambda^2/((n-1)(n-2))$ (for $n > 2$).
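Both facts can be checked by simulation: draw $Y$ once as a sum of exponentials and once directly from a Gamma distribution, then compare the empirical inverse moments with the formulas. This is a sketch only. The values $\lambda = 2$, $n = 10$ (small, so the moments differ visibly from their large-$n$ limits) and $m = 200{,}000$ replicates are assumptions for illustration. Note that NumPy's `Generator.gamma` takes `shape` and `scale`, so rate $\lambda$ corresponds to `scale=1/lam`.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, m = 2.0, 10, 200_000

# Y as a sum of n i.i.d. Exp(lam) draws (scale = 1/rate).
Y_sum = rng.exponential(scale=1 / lam, size=(m, n)).sum(axis=1)
# Y drawn directly from Gamma(shape=n, rate=lam), i.e. scale = 1/lam.
Y_gam = rng.gamma(shape=n, scale=1 / lam, size=m)

for name, Y in [("sum of exponentials", Y_sum), ("gamma draws", Y_gam)]:
    print(name, "E[1/Y] ~", (1 / Y).mean(), " E[1/Y^2] ~", (1 / Y**2).mean())

print("exact", lam / (n - 1), lam**2 / ((n - 1) * (n - 2)))
```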

$$\mathbb{E}[\hat\lambda] = n\,\mathbb{E}[1/Y] = \frac{n}{n-1}\,\lambda,$$

so $\hat\lambda$ is biased upward, with bias $\lambda/(n-1)$.

$\text{Var}(\hat\lambda) = n^2\left(\mathbb{E}[1/Y^2] - (\mathbb{E}[1/Y])^2\right) = n^2\lambda^2\left(\frac{1}{(n-1)(n-2)} - \frac{1}{(n-1)^2}\right) = \frac{n^2\lambda^2}{(n-1)^2(n-2)}$.
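To see how quickly the exact variance approaches the bound, one can tabulate it against $\lambda^2/n$ for a few sample sizes. The ratio simplifies to $n^3/((n-1)^2(n-2))$, so it does not depend on $\lambda$. The function names below are hypothetical helpers, and $\lambda = 2$ is an arbitrary choice.

```python
def var_mle_exact(lam, n):
    """Exact Var(hat lambda) = n^2 lam^2 / ((n-1)^2 (n-2)), valid for n > 2."""
    return n**2 * lam**2 / ((n - 1) ** 2 * (n - 2))

def crb(lam, n):
    """Cramer-Rao bound lam^2 / n for unbiased estimators of lam."""
    return lam**2 / n

lam = 2.0
for n in [5, 10, 50, 100, 1000]:
    ratio = var_mle_exact(lam, n) / crb(lam, n)  # equals n^3 / ((n-1)^2 (n-2))
    print(f"n={n:5d}  exact={var_mle_exact(lam, n):.5f}  CRB={crb(lam, n):.5f}  ratio={ratio:.4f}")
```

The ratio is about 2.60 at $n = 5$ and about 1.04 at $n = 100$, shrinking toward 1 as $n$ grows.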

As $n \to \infty$, $\text{Var}(\hat\lambda) \sim \lambda^2/n$, so asymptotically the MLE saturates the Cramér-Rao bound. The CRB as stated applies only to unbiased estimators, however; since $\hat\lambda$ is biased, comparing its variance with $\lambda^2/n$ is meaningful only asymptotically, where the bias $\lambda/(n-1)$ vanishes.
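For a case where the bound applies at finite $n$, consider one illustrative unbiased estimator (an addition here, not part of the original exercise): the bias-corrected $\tilde\lambda = (n-1)/Y$. Using the Gamma moments above, $\mathbb{E}[\tilde\lambda] = \lambda$ and $\text{Var}(\tilde\lambda) = (n-1)^2\,\mathbb{E}[1/Y^2] - \lambda^2 = \lambda^2/(n-2)$, which stays strictly above $\lambda^2/n$, as the CRB requires. A minimal simulation sketch, with $\lambda = 2$, $n = 20$ and the seed chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(4)
lam, n, m = 2.0, 20, 200_000

X = rng.exponential(scale=1 / lam, size=(m, n))  # m replicates of size n
S = X.sum(axis=1)                                # Y = sum of each sample
lam_mle = n / S                                  # MLE, mean n*lam/(n-1)
lam_unb = (n - 1) / S                            # bias-corrected, mean lam

print("MLE mean      ", lam_mle.mean(), " exact", n * lam / (n - 1))
print("corrected mean", lam_unb.mean(), " exact", lam)
print("corrected var ", lam_unb.var(), " exact", lam**2 / (n - 2))
print("CRB           ", lam**2 / n)
```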

Part 4 — Simulation

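```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0     # true rate lambda
n = 100       # sample size per replicate
m = 10_000    # number of Monte Carlo replicates

# Each row is one sample of size n; NumPy parameterizes Exp by scale = 1/rate.
X = rng.exponential(1 / lam, size=(m, n))
lam_hat = 1 / X.mean(axis=1)  # MLE 1/x-bar for each replicate

# Centered, scaled estimator; asymptotically N(0, lambda^2).
Z = np.sqrt(n) * (lam_hat - lam)

# Exact finite-n moments of Z from Part 3 (asymptotic values: 0 and lambda^2).
mean_exact = np.sqrt(n) * lam / (n - 1)
var_exact = n**3 * lam**2 / ((n - 1) ** 2 * (n - 2))
print(f"mean of sqrt(n)(hat lambda - lambda): {Z.mean():.3f} "
      f"(exact {mean_exact:.3f}, asymptotic 0)")
print(f"var: {Z.var():.3f} (exact {var_exact:.3f}, asymptotic lambda^2 = {lam**2})")
```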

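At $n = 100$ the exact mean of $\sqrt n(\hat\lambda - \lambda)$ is $\sqrt n\,\lambda/(n-1) \approx 0.20$, small relative to its standard deviation of about $2$, and its exact variance is $n^3\lambda^2/((n-1)^2(n-2)) \approx 4.16$, close to $\lambda^2 = 4$. The distribution of $\sqrt n(\hat\lambda - \lambda)$ is approximately $\mathcal{N}(0, \lambda^2)$, so the MLE saturates the Cramér-Rao lower bound asymptotically.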

Takeaways

Fisher information sets the asymptotic variance: $\sqrt n(\hat\theta - \theta)$ has asymptotic variance $1/I(\theta)$. Large $I(\theta)$ means the data are very informative about $\theta$; small $I$ means they carry little information.
MLE is asymptotically efficient: under standard regularity conditions its variance approaches the CRB, and no unbiased estimator can have variance below the CRB.
Finite-sample bias vanishes asymptotically but matters for small $n$. For the exponential rate the relative bias is $1/(n-1)$, about $2\%$ at $n = 50$.
The formula $\text{SE}(\hat\theta) = 1/\sqrt{n I(\hat\theta)}$ is the workhorse approximation. It gives quick approximate 95% confidence intervals: $\hat\theta \pm 1.96\cdot \text{SE}(\hat\theta)$.
Connection to OLS. OLS is the MLE under Gaussian errors; the Fisher-information matrix for $\beta$ is $X^\top X/\sigma^2$, giving $\text{Var}(\hat\beta) = \sigma^2(X^\top X)^{-1}$, exactly the OLS covariance formula derived earlier.
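For the exponential rate, the workhorse SE becomes $\text{SE}(\hat\lambda) = 1/\sqrt{n/\hat\lambda^2} = \hat\lambda/\sqrt n$. The sketch below builds the resulting Wald interval and estimates its coverage by simulation. The function name `wald_ci_exp_rate`, $\lambda = 2$, $n = 100$, the seed and the replicate count are illustrative assumptions; at this $n$ the empirical coverage should land near the nominal 95%.

```python
import numpy as np

def wald_ci_exp_rate(x, z=1.96):
    """Approximate CI for an exponential rate (hypothetical helper).

    Uses SE = 1/sqrt(n * I(hat_lam)) with I(lam) = 1/lam^2, i.e. SE = hat_lam / sqrt(n).
    """
    n = len(x)
    lam_hat = 1 / np.mean(x)
    se = lam_hat / np.sqrt(n)
    return lam_hat - z * se, lam_hat + z * se

rng = np.random.default_rng(5)
lam, n, m = 2.0, 100, 20_000
covered = 0
for _ in range(m):
    lo, hi = wald_ci_exp_rate(rng.exponential(scale=1 / lam, size=n))
    covered += int(lo <= lam <= hi)  # count intervals containing the true rate
print("empirical coverage:", covered / m)
```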