Solution: Deriving the Normal Equations by Calculus

Part 1

$$L(\alpha, \beta) = \sum (y_i - \alpha - \beta x_i)^2.$$

$\partial L/\partial\alpha = -2\sum(y_i - \alpha - \beta x_i) = 0$, giving

$$\sum y_i = n\alpha + \beta\sum x_i. \tag{A}$$

$\partial L/\partial\beta = -2\sum x_i(y_i - \alpha - \beta x_i) = 0$, giving

$$\sum x_i y_i = \alpha\sum x_i + \beta\sum x_i^2. \tag{B}$$
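Equations (A) and (B) are linear in $(\alpha, \beta)$, so they can be written as a $2 \times 2$ system and solved directly. The following is a minimal sketch on synthetic data; the true intercept and slope, the seed, and the variable names are assumptions chosen for illustration. It also checks that both partial derivatives vanish at the solution.

```python
import numpy as np

# Synthetic data: intercept 2, slope 3, and the seed are illustrative assumptions.
rng = np.random.default_rng(0)
n = 100
x = rng.standard_normal(n)
y = 2 + 3 * x + rng.standard_normal(n)

# Normal equations (A) and (B) in matrix form:
#   [ n       sum x   ] [alpha]   [ sum y  ]
#   [ sum x   sum x^2 ] [beta ] = [ sum xy ]
A = np.array([[n, x.sum()],
              [x.sum(), np.sum(x**2)]])
b = np.array([y.sum(), np.sum(x * y)])
alpha_hat, beta_hat = np.linalg.solve(A, b)

# First-order conditions: both partial derivatives should be ~0 at the solution.
resid = y - alpha_hat - beta_hat * x
grad_alpha = -2 * resid.sum()
grad_beta = -2 * np.sum(x * resid)
print(alpha_hat, beta_hat, grad_alpha, grad_beta)
```

The two gradients should print as values on the order of floating-point rounding error rather than exact zeros.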

Part 2

From (A), dividing by $n$ and solving for $\alpha$: $\hat\alpha = \bar y - \hat\beta \bar x$.

Substitute into (B):

$$\sum x_i y_i = (\bar y - \hat\beta\bar x)\sum x_i + \hat\beta\sum x_i^2 = \bar y\sum x_i - \hat\beta\bar x\sum x_i + \hat\beta\sum x_i^2.$$

Using $\sum x_i = n\bar x$:

$$\sum x_i y_i = n\bar x\bar y - n\hat\beta\bar x^2 + \hat\beta\sum x_i^2.$$

Rearrange:

$$\hat\beta\left(\sum x_i^2 - n\bar x^2\right) = \sum x_i y_i - n\bar x\bar y.$$

Dividing by $\sum x_i^2 - n\bar x^2$, which is nonzero provided the $x_i$ are not all equal:

$$\hat\beta = \frac{\sum x_i y_i - n\bar x\bar y}{\sum x_i^2 - n\bar x^2} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}. \quad \checkmark$$

(The last equality uses the algebraic identities $\sum x_i y_i - n\bar x\bar y = \sum(x_i - \bar x)(y_i - \bar y)$ and $\sum x_i^2 - n\bar x^2 = \sum(x_i - \bar x)^2$.)
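The two centering identities can be spot-checked numerically. This is only a sketch on arbitrary random data (the seed, sample size, and variable names are assumptions); it confirms the identities for one sample and is not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, chosen for illustration
n = 50
x = rng.standard_normal(n)
y = rng.standard_normal(n)
x_bar, y_bar = x.mean(), y.mean()

# Cross-product identity: sum(x*y) - n*x_bar*y_bar == sum((x - x_bar)*(y - y_bar))
lhs_xy = np.sum(x * y) - n * x_bar * y_bar
rhs_xy = np.sum((x - x_bar) * (y - y_bar))

# Sum-of-squares identity: sum(x^2) - n*x_bar^2 == sum((x - x_bar)^2)
lhs_xx = np.sum(x**2) - n * x_bar**2
rhs_xx = np.sum((x - x_bar) ** 2)

print(np.isclose(lhs_xy, rhs_xy), np.isclose(lhs_xx, rhs_xx))
```

In floating point the centered forms on the right are generally the safer choice when the mean is large relative to the spread, because the uncentered forms subtract two nearly equal numbers.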

Part 3

Sample covariance $\widehat{\text{Cov}}(x, y) = \tfrac{1}{n-1}\sum(x_i - \bar x)(y_i - \bar y)$ and sample variance $\widehat{\text{Var}}(x) = \tfrac{1}{n-1}\sum(x_i - \bar x)^2$. Their ratio:

$$\frac{\widehat{\text{Cov}}(x, y)}{\widehat{\text{Var}}(x)} = \frac{\sum(x_i - \bar x)(y_i - \bar y)}{\sum(x_i - \bar x)^2} = \hat\beta. \quad \checkmark$$

The Bessel-correction factor $1/(n - 1)$ is common to the numerator and the denominator, so it cancels.
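The cancellation can be checked with NumPy's own covariance and variance routines. The sketch below assumes the same kind of synthetic data as elsewhere in this solution (the seed and coefficients are illustrative) and computes the slope three ways: from raw centered sums, with the $n - 1$ normalizer (`ddof=1`), and with the $n$ normalizer (`ddof=0`).

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
n = 100
x = rng.standard_normal(n)
y = 2 + 3 * x + rng.standard_normal(n)

# Slope from raw centered sums (no normalization at all).
beta_sums = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Slope from sample covariance / sample variance, both dividing by n - 1.
beta_ddof1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Same ratio dividing by n instead; the normalizer still cancels.
beta_ddof0 = np.cov(x, y, ddof=0)[0, 1] / np.var(x, ddof=0)

print(beta_sums, beta_ddof1, beta_ddof0)
```

The one thing to keep consistent is the `ddof` value: mixing `np.cov` (which defaults to the $n - 1$ normalizer) with `np.var` (which defaults to $n$) would leave a stray factor of $n/(n-1)$ in the ratio.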

Part 4 — Numerical sanity check

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    x = rng.standard_normal(n)
    eps = rng.standard_normal(n)
    y = 2 + 3 * x + eps

    # closed-form
    x_bar, y_bar = x.mean(), y.mean()
    beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    alpha_hat = y_bar - beta_hat * x_bar
    print(f"alpha={alpha_hat:.3f}, beta={beta_hat:.3f}")
    # alpha=2.020, beta=3.094

Both estimates are close to the true $(2, 3)$. With $\sigma = 1$ and $\sum(x_i - \bar x)^2 \approx n = 100$ for standard-normal $x$, the deviation in the slope is consistent with $\text{SE}(\hat\beta) = \sigma/\sqrt{\sum(x_i - \bar x)^2} \approx 1/\sqrt{100} = 0.1$.
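As a further check, the closed-form estimates can be compared against a library least-squares fit, and the standard error can be estimated from the data rather than from the known $\sigma$. This is a sketch under assumptions: `np.polyfit` is used here as an independent reference, and $\sigma$ is replaced by the residual standard deviation with $n - 2$ degrees of freedom, which is the usual plug-in estimate but is not part of the derivation above.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
n = 100
x = rng.standard_normal(n)
y = 2 + 3 * x + rng.standard_normal(n)

# Closed-form estimates from the derivation.
x_bar, y_bar = x.mean(), y.mean()
s_xx = np.sum((x - x_bar) ** 2)
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Independent reference: degree-1 polynomial fit (coefficients come highest degree first).
slope, intercept = np.polyfit(x, y, 1)

# Plug-in standard error of the slope: residual SD with n - 2 degrees of freedom.
resid = y - alpha_hat - beta_hat * x
sigma_hat = np.sqrt(np.sum(resid**2) / (n - 2))
se_beta = sigma_hat / np.sqrt(s_xx)

print(f"polyfit: alpha={intercept:.3f}, beta={slope:.3f}, SE(beta)~{se_beta:.3f}")
```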

Takeaways

  • Normal equations emerge from setting the partial derivatives of the squared loss to zero. No calculus tricks; direct application of first-order conditions.
  • Closed form in simple (1-d) regression: $\hat\beta$ is a covariance-variance ratio. This is the "rise over run" intuition for slope, made rigorous.
  • Sample covariance and variance forms are the numerator and denominator. Bessel's correction $(n - 1)$ cancels in the ratio, so using "sum" or "sum divided by $n - 1$" gives the same $\hat\beta$.
  • Standard error decreases as $1/\sqrt{\sum(x_i - \bar x)^2}$. More data and more variation in the predictor both improve precision.
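
The standard-error takeaway can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the derivation: it holds a design $x$ fixed, redraws the noise many times, and compares the empirical standard deviation of $\hat\beta$ with $\sigma/\sqrt{\sum(x_i - \bar x)^2}$. The helper `slope`, the sample sizes, the replication count, and the seed are all hypothetical choices.

```python
import numpy as np

def slope(x, y):
    """Closed-form OLS slope from centered sums."""
    xc = x - x.mean()
    return np.sum(xc * (y - y.mean())) / np.sum(xc**2)

rng = np.random.default_rng(0)  # illustrative seed
sigma = 1.0   # noise standard deviation, treated as known in this simulation
reps = 2000   # simulated datasets per design

results = []  # (n, empirical SD of beta_hat, theoretical SE)
for n in (25, 100, 400):
    x = rng.standard_normal(n)            # design held fixed across replications
    s_xx = np.sum((x - x.mean()) ** 2)
    betas = np.array([
        slope(x, 2 + 3 * x + sigma * rng.standard_normal(n))
        for _ in range(reps)
    ])
    results.append((n, betas.std(ddof=1), sigma / np.sqrt(s_xx)))

for n, empirical, theoretical in results:
    print(f"n={n:4d}  empirical SD={empirical:.4f}  sigma/sqrt(Sxx)={theoretical:.4f}")
```

The two columns should agree to within simulation noise, and both should shrink as $n$ grows; scaling $x$ by a constant greater than one would shrink them further by increasing $\sum(x_i - \bar x)^2$.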