Solution: Regression as Projection — Computing the Hat Matrix
Part 1 — Symmetry
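With $H = X (X^\top X)^{-1} X^\top$, note that $(X^\top X)^{-1}$ is symmetric (it's the inverse of a symmetric matrix). So:
$$H^\top = \big(X (X^\top X)^{-1} X^\top\big)^\top = X \big((X^\top X)^{-1}\big)^\top X^\top = X (X^\top X)^{-1} X^\top = H. \; ✓$$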
Part 2 — Idempotence
The middle factors cancel:
$$H^2 = X (X^\top X)^{-1} (X^\top X) (X^\top X)^{-1} X^\top = X (X^\top X)^{-1} X^\top = H. \; ✓$$
Part 3 — Trace
By the cyclic property of the trace, with $X$ of size $n \times p$ and full column rank:
$$\operatorname{tr}(H) = \operatorname{tr}\big(X (X^\top X)^{-1} X^\top\big) = \operatorname{tr}\big((X^\top X)^{-1} X^\top X\big) = \operatorname{tr}(I_p) = p. \; ✓$$
Part 4 — Residuals orthogonal to columns
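With residuals $e = y - \hat{y} = (I - H) y$:
$$X^\top e = X^\top y - X^\top X (X^\top X)^{-1} X^\top y = X^\top y - X^\top y = 0. \; ✓$$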
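The properties in Parts 1 to 4 hold for any full-column-rank design, not only a small hand-picked one. The following is a minimal sketch that checks them on a random matrix, assuming NumPy. It swaps in a thin QR factorisation ($H = QQ^\top$) instead of the explicit inverse, and the names `rng`, `Q`, and `checks` are illustrative choices, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Random design with an intercept column; full column rank with probability 1.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# Thin QR: Q is n x p with orthonormal columns spanning col(X), so H = Q Q^T.
Q, _ = np.linalg.qr(X)
H = Q @ Q.T

# Residuals e = (I - H) y.
e = y - H @ y

checks = {
    "matches_inverse_formula": np.allclose(H, X @ np.linalg.inv(X.T @ X) @ X.T),
    "symmetric": np.allclose(H, H.T),
    "idempotent": np.allclose(H @ H, H),
    "trace_equals_p": np.isclose(np.trace(H), p),
    "residuals_orthogonal": np.allclose(X.T @ e, 0.0, atol=1e-10),
}
print(checks)
```

The QR route avoids forming $(X^\top X)^{-1}$, which tends to be better conditioned when columns of $X$ are nearly collinear.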
Part 5 — Numerical example
import numpy as np
X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]])
y = np.array([1, 2, 2, 3])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
print("beta_hat:", beta_hat)
# beta_hat: [1.1 0.6]
H = X @ XtX_inv @ X.T
print("H =\n", H.round(3))
# H =
# [[0.7 0.4 0.1 -0.2]
# [0.4 0.3 0.2 0.1]
# [0.1 0.2 0.3 0.4]
# [-0.2 0.1 0.4 0.7]]
print("trace(H):", np.trace(H))
# trace(H): 2.0 (= p ✓)
y_hat = H @ y
e = y - y_hat
print("y_hat:", y_hat)
print("e:", e)
# y_hat: [1.1 1.7 2.3 2.9]
# e: [-0.1 0.3 -0.3 0.1]
print("X^T e:", X.T @ e)
# X^T e: [ 1.11e-16 -2.22e-16] (numerically zero ✓)
print("H^2 == H:", np.allclose(H @ H, H))
# H^2 == H: True
Takeaways
- The hat matrix is an orthogonal projection: symmetric and idempotent. Geometrically, it projects $y$ onto the column space of $X$.
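- $\operatorname{tr}(H) = p$: the "effective degrees of freedom" of the regression. In regularisation, this becomes $\operatorname{tr}(H_\lambda)$ for the regularised fit (for ridge, $H_\lambda = X (X^\top X + \lambda I)^{-1} X^\top$, with $\operatorname{tr}(H_\lambda) < p$ when $\lambda > 0$), giving a meaningful measure of model complexity.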
- Residuals orthogonal to features. $X^\top e = 0$ is the first-order condition for least squares; it says no linear combination of features can further reduce the squared norm of the residuals.
- Leverage points have large $h_{ii}$. Diagonal entries satisfy $0 \le h_{ii} \le 1$ with $\sum_i h_{ii} = p$; large values flag observations with unusual feature vectors that have outsized influence on $\hat{y}$. Standard diagnostic for robust regression.
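
The leverage and effective-degrees-of-freedom takeaways can be made concrete on the Part 5 data. The following is a minimal sketch, assuming NumPy. The helper `ridge_df` and the penalty values in `lams` are hypothetical names and choices for illustration, and the ridge penalty here is applied to every coefficient, intercept included, purely to keep the example short.

```python
import numpy as np

X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
n, p = X.shape

# Leverages are the diagonal entries of the hat matrix.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)
print("leverage:", leverage.round(3))
# leverage: [0.7 0.3 0.3 0.7]
print("sum equals p:", np.isclose(leverage.sum(), p))


def ridge_df(X, lam):
    """Trace of the ridge smoother X (X^T X + lam I)^{-1} X^T.

    Illustrative helper: penalises all coefficients, including the intercept.
    """
    k = X.shape[1]
    H_ridge = X @ np.linalg.inv(X.T @ X + lam * np.eye(k)) @ X.T
    return np.trace(H_ridge)


# Effective degrees of freedom shrink from p as the penalty grows.
lams = [0.0, 1.0, 10.0]
dfs = [ridge_df(X, lam) for lam in lams]
for lam, df in zip(lams, dfs):
    print(f"lambda={lam}: effective df = {df:.3f}")
```

The two end points ($x = 0$ and $x = 3$) carry the highest leverage because they sit farthest from the mean of the feature, and the trace of the ridge smoother equals $p$ at $\lambda = 0$ and decreases as $\lambda$ increases.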