Ridge Regression
Motivation: why this matters in quant finance
Ridge regression is the linear model you reach for when ordinary least squares (OLS) is too jumpy. Quant features are often correlated: valuation ratios overlap, yield-curve points move together, and technical indicators reuse the same price history. OLS can fit such data well in-sample while assigning large, offsetting positive and negative coefficients that change sharply from one sample to the next.
Ridge keeps all features in the model but shrinks their coefficients toward zero. This makes it a natural baseline for dense signals where many predictors may each contain a little information.
The informal idea
OLS only asks for small residuals. Ridge asks for small residuals and moderate coefficients. If two spread features say nearly the same thing, ridge prefers sharing weight across them instead of using large offsetting coefficients.
Formal statement
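Ridge regression solves

$$\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2,$$

where $\alpha \ge 0$ is the penalty strength (`alpha` in the implementation).
With standardised features (and a centred target, so the intercept drops out of the penalty), the closed form is

$$\hat{\beta}_{\text{ridge}} = (X^\top X + \alpha I)^{-1} X^\top y.$$

The $\alpha I$ term improves conditioning and shrinks weak directions.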
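The conditioning and shrinkage claims can be checked numerically. The sketch below is illustrative only: the synthetic data, the seed, and the penalty value `alpha = 5.0` are assumptions chosen to mimic two nearly collinear features. It compares the condition number of the Gram matrix with and without the added diagonal term, and computes the per-direction shrinkage factor $d_j^2 / (d_j^2 + \alpha)$, where $d_j$ are the singular values of the centred design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for a reproducible sketch
n = 200
base = rng.normal(size=n)
# Two nearly collinear columns, centred so no intercept is needed.
X = np.c_[base, base + 0.02 * rng.normal(size=n)]
X = X - X.mean(axis=0)
alpha = 5.0  # assumed penalty strength, for illustration only

gram = X.T @ X
cond_ols = np.linalg.cond(gram)
cond_ridge = np.linalg.cond(gram + alpha * np.eye(2))

# Relative to OLS, ridge multiplies the coefficient along each singular
# direction by d_j^2 / (d_j^2 + alpha).
d = np.linalg.svd(X, compute_uv=False)  # singular values, largest first
shrink = d**2 / (d**2 + alpha)

print(f"cond(X'X) = {cond_ols:.1f}, cond(X'X + alpha I) = {cond_ridge:.1f}")
print("shrinkage factors:", np.round(shrink, 3))
```

In this setup the strong shared direction keeps a factor close to 1, while the weak direction (the difference between the two features) is shrunk almost to zero. That is the sense in which ridge "shrinks weak directions".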
Implementation
```python
import numpy as np


class RidgeRegression:
    """Ridge regression with centred data and unpenalised intercept."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X: np.ndarray, y: np.ndarray):
        # Centre X and y so the intercept is left out of the penalty.
        x_mean, y_mean = X.mean(axis=0), y.mean()
        Xc, yc = X - x_mean, y - y_mean
        # Solve (Xc'Xc + alpha I) beta = Xc'yc rather than forming an inverse.
        self.coef_ = np.linalg.solve(Xc.T @ Xc + self.alpha * np.eye(X.shape[1]), Xc.T @ yc)
        # Recover the intercept on the original (uncentred) scale.
        self.intercept_ = y_mean - x_mean @ self.coef_
        return self


# Two nearly identical features: the second is the first plus small noise.
rng = np.random.default_rng(13)
base = rng.normal(size=200)
X = np.c_[base, base + 0.02 * rng.normal(size=200)]
y = 0.8 * base + rng.normal(scale=0.25, size=200)

# OLS with an explicit intercept column; keep only the slope coefficients.
ols = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0][1:]
ridge = RidgeRegression(alpha=5).fit(X, y).coef_
print(np.round(ols, 3), np.round(ridge, 3))
# [ 0.87 -0.061] [0.398 0.398]
```
Key properties and trade-offs
| Property | Ridge behaviour | Quant use |
|---|---|---|
| Shrinks, rarely zeros | Coefficients move toward zero but remain active. | Good for dense factor models. |
| Stabilises collinearity | Adds $\alpha I$ to the normal equations. | Useful for yield-curve and factor-library regressions. |
| Scale-sensitive | Units affect penalty strength. | Standardise predictors inside a pipeline. |
| Bias-variance trade-off | Larger $\alpha$ lowers variance but adds bias. | Select $\alpha$ by validation. |
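The last two rows of the table can be combined into one small workflow: standardise using training statistics only, then pick the penalty on held-out data. The following is a minimal sketch under stated assumptions. The helper `ridge_fit`, the synthetic predictors on mixed scales, the chronological 200/100 split, and the penalty grid are all hypothetical choices for illustration, not a recommended production setup.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge on centred data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

rng = np.random.default_rng(7)  # arbitrary seed
n, p = 300, 10
# Synthetic predictors on very different scales (hypothetical units).
scales = np.logspace(-2, 2, p)
X = rng.normal(size=(n, p)) * scales
beta_true = 0.3 / scales  # equal contribution per feature once standardised
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Chronological split: first 200 rows train, last 100 validate.
X_tr, X_va, y_tr, y_va = X[:200], X[200:], y[:200], y[200:]

# Standardise with training statistics only, then reuse them on validation.
mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
Z_tr, Z_va = (X_tr - mu) / sd, (X_va - mu) / sd

alphas = np.logspace(-3, 3, 13)  # assumed search grid
val_mse = []
for a in alphas:
    coef, intercept = ridge_fit(Z_tr, y_tr, a)
    val_mse.append(np.mean((y_va - (Z_va @ coef + intercept)) ** 2))
val_mse = np.array(val_mse)

best_alpha = alphas[np.argmin(val_mse)]
print(f"best alpha: {best_alpha:.3g}, validation MSE: {val_mse.min():.3f}")
```

Without the standardisation step, the same `alpha` would penalise the small-scale features far more heavily than the large-scale ones, because their coefficients must be numerically larger to have the same effect.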
Worked example: correlated factors
If value and earnings yield have correlation near 0.95, OLS may rotate weight between them as the sample changes. Ridge treats them as a shared direction and spreads weight across them, which is often the better first-pass research assumption.
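One way to see this rotation is to refit on bootstrap resamples and compare how much the coefficients move. The sketch below is hypothetical throughout: the two columns stand in for standardised value and earnings-yield exposures with population correlation 0.95, the data-generating process gives both factors equal signal, and `alpha=20.0` is an arbitrary penalty. The helper `ridge_coef` is defined here for illustration; setting `alpha=0.0` reduces it to OLS.

```python
import numpy as np

def ridge_coef(X, y, alpha):
    """Closed-form ridge coefficients on centred data (alpha=0 gives OLS)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)

rng = np.random.default_rng(42)  # arbitrary seed
n, rho = 120, 0.95
# Hypothetical standardised "value" and "earnings yield" exposures.
cov = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal(np.zeros(2), cov, size=n)
# Assumed data-generating process: both factors carry equal signal.
y = 0.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=n)

ols_draws, ridge_draws = [], []
for _ in range(500):
    idx = rng.integers(0, n, size=n)  # bootstrap resample of rows
    ols_draws.append(ridge_coef(X[idx], y[idx], alpha=0.0))
    ridge_draws.append(ridge_coef(X[idx], y[idx], alpha=20.0))

# Standard deviation of each coefficient across the 500 resamples.
ols_sd = np.std(ols_draws, axis=0)
ridge_sd = np.std(ridge_draws, axis=0)
print("OLS coefficient std across resamples:  ", np.round(ols_sd, 3))
print("Ridge coefficient std across resamples:", np.round(ridge_sd, 3))
```

Under these assumptions the ridge coefficients vary much less across resamples than the OLS ones, at the cost of some bias toward zero.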
Common confusions and pitfalls
Where this goes next
- Lasso Regression: uses L1 geometry to create exact zeros.
- Regularisation: L1 vs L2: compares dense shrinkage and sparse selection.
- Cross-Validation: selects $\alpha$ without contaminating the test set.
- Matrix Factorisations: explains the numerical stability behind least-squares solvers.