Solution: MLE for a Normal with Unknown Mean and Variance

Exercise: MLE for a Normal with Unknown Mean and Variance

Parts 1–2

$\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(x_i - \mu)^2$ .

$\partial\ell/\partial\mu = \tfrac{1}{\sigma^2}\sum(x_i - \mu) = 0 \Rightarrow \hat\mu = \bar x$ .

$\partial\ell/\partial\sigma^2 = -\tfrac{n}{2\sigma^2} + \tfrac{1}{2\sigma^4}\sum(x_i - \mu)^2 = 0 \Rightarrow \sigma^2 = \tfrac{1}{n}\sum(x_i - \mu)^2$ . Substituting $\hat\mu$ :

$$\hat\sigma^2 = \frac{1}{n}\sum(x_i - \bar x)^2.$$
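As a sanity check on the closed forms above, the following sketch evaluates the log-likelihood numerically and confirms that both partial derivatives are approximately zero at $(\bar x, \hat\sigma^2)$. The helper names `log_lik` and `mle_normal`, the simulated sample, and the finite-difference step `h` are illustrative choices, not part of the exercise.

```python
import numpy as np

def log_lik(mu, sigma2, x):
    """Normal log-likelihood l(mu, sigma^2) for the sample x."""
    n = x.size
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

def mle_normal(x):
    """Closed-form MLEs: the sample mean and the 1/n variance."""
    mu_hat = x.mean()
    sigma2_hat = np.mean((x - mu_hat) ** 2)
    return mu_hat, sigma2_hat

rng = np.random.default_rng(1)               # arbitrary seed
x = rng.normal(loc=2.0, scale=3.0, size=50)  # hypothetical sample
mu_hat, sigma2_hat = mle_normal(x)

# Central finite differences approximate the two partial derivatives (the score).
h = 1e-5
d_mu = (log_lik(mu_hat + h, sigma2_hat, x) - log_lik(mu_hat - h, sigma2_hat, x)) / (2 * h)
d_s2 = (log_lik(mu_hat, sigma2_hat + h, x) - log_lik(mu_hat, sigma2_hat - h, x)) / (2 * h)
print(f"score at the MLE: d/dmu = {d_mu:.2e}, d/dsigma2 = {d_s2:.2e}")
```

Both printed values should be zero up to finite-difference error, and moving either parameter away from its estimate should lower `log_lik`.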

Part 3

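$\mathbb{E}[\sum(X_i - \bar X)^2] = \mathbb{E}[\sum(X_i - \mu)^2 - n(\bar X - \mu)^2] = n\sigma^2 - n\cdot\sigma^2/n = (n-1)\sigma^2$ , using $\mathbb{E}[(\bar X - \mu)^2] = \operatorname{Var}(\bar X) = \sigma^2/n$ . Dividing by $n$ :

$$\mathbb{E}[\hat\sigma^2] = \frac{n-1}{n}\sigma^2 < \sigma^2.$$

The MLE of the variance is biased downward.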
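The first equality in Part 3 rests on a sum-of-squares decomposition that is worth spelling out. A short derivation, obtained by writing $X_i - \mu = (X_i - \bar X) + (\bar X - \mu)$ and expanding the square:

```latex
\begin{aligned}
\sum_{i=1}^n (X_i - \mu)^2
  &= \sum_{i=1}^n \bigl[(X_i - \bar X) + (\bar X - \mu)\bigr]^2 \\
  &= \sum_{i=1}^n (X_i - \bar X)^2
     + 2(\bar X - \mu)\sum_{i=1}^n (X_i - \bar X)
     + n(\bar X - \mu)^2 \\
  &= \sum_{i=1}^n (X_i - \bar X)^2 + n(\bar X - \mu)^2 .
\end{aligned}
```

The cross term vanishes because $\sum(X_i - \bar X) = 0$ . Rearranging gives $\sum(X_i - \bar X)^2 = \sum(X_i - \mu)^2 - n(\bar X - \mu)^2$ , which holds for every sample, not just in expectation.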

Part 4

The Bessel-corrected estimator is

$$\hat\sigma^2_{\text{Bessel}} = \frac{1}{n-1}\sum(x_i - \bar x)^2,$$

with

$$\mathbb{E}[\hat\sigma^2_{\text{Bessel}}] = \sigma^2.$$

This is what numpy.var(x, ddof=1) computes.
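To make the role of `ddof` concrete, here is a small sketch on made-up data. The array `x` is purely illustrative; the only library call used is `numpy.var`, which divides by `n - ddof`.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up data with mean 5
n = x.size
ss = np.sum((x - x.mean()) ** 2)  # sum of squared deviations: 32

mle = np.var(x, ddof=0)     # divides by n:     32 / 8
bessel = np.var(x, ddof=1)  # divides by n - 1: 32 / 7

print(f"MLE: {mle:.4f}, Bessel: {bessel:.4f}")
# The two estimators differ only by the factor n / (n - 1).
print(np.isclose(bessel, mle * n / (n - 1)))
```

The two estimates are built from the same sum of squares, so one can always be converted to the other by rescaling.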

Part 5 — Simulation

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10_000, 10                  # m independent samples, each of size n
X = rng.standard_normal((m, n))    # true mean 0, true variance 1

# Average each estimator over the m samples to approximate its expectation.
mle_var = np.mean(X.var(axis=1, ddof=0))
bessel_var = np.mean(X.var(axis=1, ddof=1))

print(f"average MLE variance: {mle_var:.4f} (expected (n-1)/n = {(n-1)/n:.4f})")
print(f"average Bessel variance: {bessel_var:.4f} (expected 1.0)")
# average MLE variance: 0.9036 (expected (n-1)/n = 0.9000)
# average Bessel variance: 1.0040 (expected 1.0)
```

Takeaways

MLE for $\mu$ is $\bar x$ and is unbiased ( $\mathbb{E}[\bar X] = \mu$ ).
MLE for $\sigma^2$ is biased downward by a factor $(n-1)/n$ . The bias arises because $\bar x$ minimizes the sum of squared deviations, so $\sum(x_i - \bar x)^2 \le \sum(x_i - \mu)^2$ : measuring spread around $\bar x$ instead of $\mu$ uses up one degree of freedom.
Bessel correction divides by $n - 1$ instead of $n$ to recover an unbiased estimator. This is the default in R's var and in pandas' var, but not in NumPy: numpy.var defaults to ddof=0, so ddof=1 must be passed explicitly.
MLE's asymptotic unbiasedness kicks in as $n \to \infty$ : $(n-1)/n \to 1$ . For $n$ in the thousands, the MLE and Bessel estimators are indistinguishable; in small samples (monthly data, rare events), the distinction matters.
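Efficiency vs. unbiasedness trade-off. Under normality, the MLE has smaller variance than the Bessel-corrected estimator, $2(n-1)\sigma^4/n^2$ versus $2\sigma^4/(n-1)$ , but is biased. Its MSE, $(2n-1)\sigma^4/n^2$ , is smaller than the Bessel estimator's for every $n \ge 2$ ; the gap is noticeable only for small $n$ .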
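The MSE comparison in the last takeaway can be checked by simulation. The sketch below compares simulated MSEs with closed forms that assume normal data (they follow from $(n-1)\hat\sigma^2_{\text{Bessel}}/\sigma^2 \sim \chi^2_{n-1}$ ). The function name `mse_closed_form`, the sample sizes, and the number of replications `m` are illustrative assumptions.

```python
import numpy as np

def mse_closed_form(n, sigma2=1.0):
    """MSEs of the 1/n and 1/(n-1) variance estimators, assuming normal data."""
    mse_mle = (2 * n - 1) / n**2 * sigma2**2  # variance 2(n-1)/n^2 plus squared bias 1/n^2
    mse_bessel = 2 / (n - 1) * sigma2**2      # unbiased, so the MSE equals the variance
    return mse_mle, mse_bessel

rng = np.random.default_rng(0)  # arbitrary seed
m = 100_000                     # replications per sample size (illustrative)
results = {}
for n in (3, 5, 10, 50):
    X = rng.standard_normal((m, n))  # true sigma^2 = 1
    sim_mle = np.mean((X.var(axis=1, ddof=0) - 1.0) ** 2)
    sim_bessel = np.mean((X.var(axis=1, ddof=1) - 1.0) ** 2)
    results[n] = (sim_mle, sim_bessel)
    th_mle, th_bessel = mse_closed_form(n)
    print(f"n={n:3d}  MLE: sim {sim_mle:.4f} / theory {th_mle:.4f}   "
          f"Bessel: sim {sim_bessel:.4f} / theory {th_bessel:.4f}")
```

The ratio of the two closed-form MSEs tends to 1 as $n$ grows, which is why the choice of divisor matters mainly in small samples.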