
Solution: Regression as Projection — Computing the Hat Matrix

Part 1 — Symmetry

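Assume $X$ is $n \times p$ with full column rank $p$, so $X^\top X$ is invertible and $H = X(X^\top X)^{-1}X^\top$. The matrix $(X^\top X)^{-1}$ is symmetric because it is the inverse of the symmetric matrix $X^\top X$, and $(A^{-1})^\top = (A^\top)^{-1}$. So: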

$$H^\top = \big(X(X^\top X)^{-1}X^\top\big)^\top = X\big((X^\top X)^{-1}\big)^\top X^\top = X(X^\top X)^{-1}X^\top = H. \quad \checkmark$$
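A quick numerical sanity check of the symmetry argument, offered as a sketch: it draws a random design matrix (the sizes `n = 20`, `p = 3` and the seed are arbitrary assumptions, not part of the original problem) and confirms that both $(X^\top X)^{-1}$ and $H$ equal their transposes up to floating-point error.

```python
import numpy as np

# Arbitrary illustrative sizes and seed (assumptions, not from the original problem).
rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.standard_normal((n, p))  # a random Gaussian matrix has full column rank with probability 1

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T

# Both matrices should equal their transposes up to rounding error.
print(np.allclose(XtX_inv, XtX_inv.T))  # True
print(np.allclose(H, H.T))              # True
```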

Part 2 — Idempotence

$$H^2 = X(X^\top X)^{-1}X^\top \cdot X(X^\top X)^{-1}X^\top = X(X^\top X)^{-1}(X^\top X)(X^\top X)^{-1}X^\top = X(X^\top X)^{-1}X^\top = H. \quad \checkmark$$
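Idempotence has a useful consequence: if $Hv = \mu v$ with $v \neq 0$, then $\mu v = Hv = H^2 v = \mu^2 v$, so every eigenvalue $\mu$ of $H$ is 0 or 1. The sketch below, using a random $X$ of arbitrarily chosen size (an assumption for illustration), checks $H^2 = H$ and counts the eigenvalues: $p$ of them should be 1 and the remaining $n - p$ should be 0.

```python
import numpy as np

# Random full-rank design; sizes and seed are illustrative assumptions.
rng = np.random.default_rng(1)
n, p = 10, 3
X = rng.standard_normal((n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H @ H, H))  # True: H is idempotent

# H is symmetric, so eigvalsh applies; it returns eigenvalues in ascending order.
eigvals = np.linalg.eigvalsh(H)
print(eigvals.round(6))  # n - p values near 0, then p values near 1

n_ones = int(np.sum(np.isclose(eigvals, 1.0)))
n_zeros = int(np.sum(np.isclose(eigvals, 0.0)))
print(n_ones, n_zeros)  # 3 7
```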

Part 3 — Trace

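Using the cyclic property $\text{tr}(AB) = \text{tr}(BA)$ with $A = X$ ($n \times p$) and $B = (X^\top X)^{-1}X^\top$ ($p \times n$):

$$\text{tr}(H) = \text{tr}\big(X(X^\top X)^{-1}X^\top\big) = \text{tr}\big((X^\top X)^{-1}X^\top X\big) = \text{tr}(I_p) = p.$$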
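The trace argument swaps an $n \times n$ product for a $p \times p$ one. This sketch (random data; `n = 50`, `p = 4` are arbitrary assumptions) forms both products explicitly and checks that they have the same trace, that the small one is $I_p$, and that $\text{tr}(H) = p$.

```python
import numpy as np

# Illustrative random design; sizes and seed are assumptions.
rng = np.random.default_rng(2)
n, p = 50, 4
X = rng.standard_normal((n, p))
XtX_inv = np.linalg.inv(X.T @ X)

A = X                # n x p
B = XtX_inv @ X.T    # p x n
H = A @ B            # n x n hat matrix
small = B @ A        # p x p, should be the identity I_p

print(np.isclose(np.trace(H), np.trace(small)))  # True: tr(AB) = tr(BA)
print(np.allclose(small, np.eye(p)))             # True
print(round(np.trace(H), 8))                     # 4.0
```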

Part 4 — Residuals orthogonal to columns

With $\hat\beta = (X^\top X)^{-1}X^\top y$ and residuals $e = y - X\hat\beta$:

$$X^\top e = X^\top(y - X\hat\beta) = X^\top y - X^\top X(X^\top X)^{-1}X^\top y = X^\top y - X^\top y = 0. \quad \checkmark$$
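Equivalently, $e = y - Hy = (I - H)y$, the component of $y$ orthogonal to the column space of $X$. A sketch under assumed random data: it fits with `np.linalg.lstsq` (which does not form $(X^\top X)^{-1}$ explicitly), then checks $X^\top e \approx 0$, $e = (I - H)y$, and that the residuals are also orthogonal to the fitted values $X\hat\beta$.

```python
import numpy as np

# Random design and response; sizes and seed are illustrative assumptions.
rng = np.random.default_rng(3)
n, p = 30, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Least-squares fit without forming the explicit inverse.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(X.T @ e, 0.0))            # True: residuals orthogonal to every column
print(np.allclose(e, (np.eye(n) - H) @ y))  # True: e = (I - H) y
print(np.isclose(e @ (X @ beta_hat), 0.0))  # True: residuals orthogonal to fitted values
```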

Part 5 — Numerical example

```python
import numpy as np

# Intercept column plus one feature x = 0, 1, 2, 3
X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 2, 2, 3], dtype=float)

# OLS coefficients from the normal equations: beta_hat = (X^T X)^{-1} X^T y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
print("beta_hat:", beta_hat)
# beta_hat: [1.1 0.6]

# Hat matrix H = X (X^T X)^{-1} X^T
H = X @ XtX_inv @ X.T
print("H =\n", H.round(3))
# H =
#  [[ 0.7  0.4  0.1 -0.2]
#  [ 0.4  0.3  0.2  0.1]
#  [ 0.1  0.2  0.3  0.4]
#  [-0.2  0.1  0.4  0.7]]

# Round away floating-point noise before printing the trace
print("trace(H):", round(np.trace(H), 10))
# trace(H): 2.0   (= p)

# Fitted values and residuals
y_hat = H @ y
e = y - y_hat
print("y_hat:", y_hat)
print("e:", e)
# y_hat: [1.1 1.7 2.3 2.9]
# e: [-0.1  0.3 -0.3  0.1]

print("X^T e:", X.T @ e)
# X^T e: entries on the order of 1e-16 (numerically zero)

print("H^2 == H:", np.allclose(H @ H, H))
# H^2 == H: True
```
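Forming $(X^\top X)^{-1}$ explicitly, as above, is fine for a toy example, but it squares the condition number of $X$, so numerical code usually avoids it. One common alternative, sketched here as an option rather than the original solution's method, uses the thin QR decomposition $X = QR$: substituting gives $H = QR(R^\top R)^{-1}R^\top Q^\top = QQ^\top$ and $\hat\beta = R^{-1}Q^\top y$. On the same $4 \times 2$ data it should reproduce the $H$ and $\hat\beta$ above.

```python
import numpy as np

# Same data as the Part 5 example.
X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 2, 2, 3], dtype=float)

# Thin ("reduced") QR: Q is 4x2 with orthonormal columns, R is 2x2 upper triangular.
Q, R = np.linalg.qr(X, mode="reduced")

H_qr = Q @ Q.T                            # H = Q Q^T, no explicit inverse
beta_qr = np.linalg.solve(R, Q.T @ y)     # solves R beta = Q^T y

H_inv = X @ np.linalg.inv(X.T @ X) @ X.T  # explicit-inverse formula, for comparison
print(np.allclose(H_qr, H_inv))  # True
print(beta_qr.round(6))          # [1.1 0.6]
```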

Takeaways

  • The hat matrix is a projection: symmetric and idempotent. Geometrically, $H$ orthogonally projects $\mathbb{R}^n$ onto the column space of $X$.
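  • $\text{tr}(H) = p$: the "effective degrees of freedom" of the regression. Under regularisation, e.g. ridge regression with penalty $\lambda > 0$, the smoother matrix $H_\lambda = X(X^\top X + \lambda I)^{-1}X^\top$ has $\text{tr}(H_\lambda) < p$, giving a meaningful measure of model complexity.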
  • Residuals are orthogonal to the features. $X^\top e = 0$ is the first-order condition (the normal equations) for least squares; it says no linear combination of the features can further reduce the residual sum of squares.
  • Leverage points have large $H_{ii}$. The diagonal entries satisfy $H_{ii} \in [0, 1]$ (since $H = H^\top H$ gives $H_{ii} = \sum_j H_{ij}^2 \ge H_{ii}^2$) and $\sum_i H_{ii} = p$; large values flag observations with unusual feature vectors that can have outsized influence on $\hat\beta$. Leverage is a standard regression diagnostic and a common motivation for robust regression.
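The last two takeaways can be checked numerically. This sketch is illustrative only: the random design, the planted outlying row, the ridge penalty `lam = 5.0`, and the $2p/n$ cutoff (a common rule of thumb, not from the original) are all assumptions. It computes the leverages $H_{ii}$, confirms they lie in $[0, 1]$ and sum to $p$, flags high-leverage rows, and shows that the ridge smoother $H_\lambda$ has trace below $p$.

```python
import numpy as np

# Illustrative random design (assumption), with one deliberately unusual row.
rng = np.random.default_rng(4)
n, p = 40, 3
X = rng.standard_normal((n, p))
X[0] = [8.0, -8.0, 8.0]  # plant an observation with an unusual feature vector

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

print(leverage.min() >= 0, leverage.max() <= 1)  # True True
print(np.isclose(leverage.sum(), p))             # True: sum of H_ii = tr(H) = p

# Rule-of-thumb cutoff (an assumption): flag H_ii > 2p/n.
flagged = np.flatnonzero(leverage > 2 * p / n)
print(flagged)  # includes row 0

# Ridge smoother: effective degrees of freedom fall below p when lam > 0.
lam = 5.0
H_lam = X @ np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T
print(np.trace(H_lam) < p)  # True
```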
Solution — Regression as Projection: Computing the Hat Matrix | q4quant.studio