Ridge Regression

Motivation: why this matters in quant finance

Ridge regression is the linear model you reach for when ordinary least squares is too jumpy. Quant features are often correlated: valuation ratios overlap, yield-curve points move together, and technical indicators reuse the same price history. With such data, OLS can fit the sample well while assigning large, offsetting positive and negative coefficients that change materially from one sample to the next.

Ridge keeps all features in the model but shrinks their coefficients toward zero. This makes it a natural baseline for dense signals where many predictors may each contain a little information.

The informal idea

OLS only asks for small residuals. Ridge asks for small residuals and moderate coefficients. If two spread features say nearly the same thing, ridge prefers sharing weight across them instead of using large offsetting coefficients.
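To make the "sharing versus offsetting" point concrete, here is a minimal sketch under an extreme assumption: two spread features that are exactly identical. The data and both weight vectors are made up for illustration. Both weightings give the same fitted values, so OLS cannot tell them apart, but their squared L2 norms differ by two orders of magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
spread = rng.normal(size=100)
# Illustrative extreme case: the two spread features are exact copies.
X = np.column_stack([spread, spread])

shared = np.array([0.4, 0.4])        # weight shared across both features
offsetting = np.array([5.4, -4.6])   # large offsetting weights, same net exposure of 0.8

# Identical fitted values, so the residual term cannot distinguish them.
same_fit = np.allclose(X @ shared, X @ offsetting)

# The ridge penalty is proportional to the squared L2 norm of the weights.
penalty_shared = float(shared @ shared)
penalty_offsetting = float(offsetting @ offsetting)

print(same_fit, round(penalty_shared, 2), round(penalty_offsetting, 2))
# True 0.32 50.32
```

Since the residuals are equal, any positive penalty weight makes ridge prefer the shared solution.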

Formal statement

Ridge regression solves

$$\hat{\boldsymbol{\beta}}=\arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y}-\mathbf{X}\boldsymbol{\beta}\rVert_2^2 + \alpha\lVert\boldsymbol{\beta}\rVert_2^2.$$

With centred (and ideally standardised) features and a centred target, the intercept drops out and the closed form is

$$\hat{\boldsymbol{\beta}}=(\mathbf{X}^\top\mathbf{X}+\alpha\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}.$$
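For readers who want the intermediate step, the closed form follows from setting the gradient of the objective to zero. The short derivation below assumes the residual term is the plain sum of squares on centred data; if the residual term is scaled by 1/n instead, the same steps go through with α replaced by nα.

```latex
\begin{aligned}
J(\boldsymbol{\beta}) &= \lVert \mathbf{y}-\mathbf{X}\boldsymbol{\beta}\rVert_2^2 + \alpha\lVert\boldsymbol{\beta}\rVert_2^2 \\
\nabla_{\boldsymbol{\beta}} J &= -2\,\mathbf{X}^\top(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) + 2\alpha\boldsymbol{\beta} = \mathbf{0} \\
(\mathbf{X}^\top\mathbf{X}+\alpha\mathbf{I})\,\hat{\boldsymbol{\beta}} &= \mathbf{X}^\top\mathbf{y}
\end{aligned}
```

For α > 0 the matrix on the left is positive definite, so the system always has a unique solution, even when the columns of X are perfectly collinear.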

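The $\alpha\mathbf{I}$ term adds $\alpha$ to every eigenvalue of $\mathbf{X}^\top\mathbf{X}$, which improves conditioning and shrinks the fit most along weak (low-variance) directions.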
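The conditioning and "weak directions" claims can be checked numerically. The sketch below uses synthetic, nearly collinear data and an arbitrary α = 5 (both are assumptions for illustration). It compares condition numbers with and without the penalty, then uses the SVD of X to show the per-direction shrinkage factor d² / (d² + α).

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=200)
# Synthetic, nearly collinear pair of features (illustrative only).
X = np.column_stack([base, base + 0.02 * rng.normal(size=200)])
X = X - X.mean(axis=0)
y = 0.8 * base + rng.normal(scale=0.25, size=200)
y = y - y.mean()
alpha = 5.0  # arbitrary penalty strength for the demonstration

gram = X.T @ X
cond_ols = np.linalg.cond(gram)
cond_ridge = np.linalg.cond(gram + alpha * np.eye(2))

# SVD view: along each right singular vector the OLS fit is scaled by
# d^2 / (d^2 + alpha), so directions with small d are shrunk the most.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
shrink = d**2 / (d**2 + alpha)
beta_svd = Vt.T @ ((d / (d**2 + alpha)) * (U.T @ y))

# The same coefficients from the normal equations with the alpha * I term.
beta_solve = np.linalg.solve(gram + alpha * np.eye(2), X.T @ y)

print(cond_ridge < cond_ols, np.allclose(beta_svd, beta_solve))
print(np.round(shrink, 3))
```

The strong shared direction keeps a factor close to one, while the weak difference direction, the one that drives offsetting OLS coefficients, is shrunk almost to zero.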

Implementation

    import numpy as np


    class RidgeRegression:
        """Ridge regression with centred data and unpenalised intercept."""

        def __init__(self, alpha: float = 1.0):
            self.alpha = alpha

        def fit(self, X: np.ndarray, y: np.ndarray):
            # Centring removes the intercept from the penalised problem.
            x_mean, y_mean = X.mean(axis=0), y.mean()
            Xc, yc = X - x_mean, y - y_mean
            # Solve (Xc'Xc + alpha * I) beta = Xc'yc instead of forming an inverse.
            self.coef_ = np.linalg.solve(
                Xc.T @ Xc + self.alpha * np.eye(X.shape[1]), Xc.T @ yc
            )
            # Recover the intercept on the original (uncentred) scale.
            self.intercept_ = y_mean - x_mean @ self.coef_
            return self


    # Two nearly identical features: the second is the first plus small noise.
    rng = np.random.default_rng(13)
    base = rng.normal(size=200)
    X = np.c_[base, base + 0.02 * rng.normal(size=200)]
    y = 0.8 * base + rng.normal(scale=0.25, size=200)

    # OLS slopes (intercept dropped) versus ridge slopes.
    ols = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0][1:]
    ridge = RidgeRegression(alpha=5).fit(X, y).coef_
    print(np.round(ols, 3), np.round(ridge, 3))
    # [ 0.87 -0.061] [0.398 0.398]

Key properties and trade-offs

| Property | Ridge behaviour | Quant use |
| --- | --- | --- |
| Shrinks, rarely zeros | Coefficients move toward zero but remain active. | Good for dense factor models. |
| Stabilises collinearity | Adds $\alpha\mathbf{I}$ to the normal equations. | Useful for yield-curve and factor-library regressions. |
| Scale-sensitive | Units affect penalty strength. | Standardise predictors inside a pipeline. |
| Bias-variance trade-off | Larger $\alpha$ lowers variance but adds bias. | Select $\alpha$ by validation. |
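The last two rows of the table (standardise inside a pipeline, select α by validation) can be sketched together. The example below is one possible setup, not a prescribed workflow: the data are synthetic, the helper `ridge_fit`, the time-ordered 200/100 split, and the α grid are all assumptions made for illustration. The key detail is that the scaling statistics come from the training window only.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge on centred data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(
        Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean)
    )
    return coef, y_mean - x_mean @ coef


rng = np.random.default_rng(7)
n, p = 300, 10
common = rng.normal(size=(n, 1))
X = common + 0.3 * rng.normal(size=(n, p))   # ten correlated synthetic signals
X = X * rng.uniform(0.01, 100.0, size=p)     # deliberately mixed units
y = 0.5 * common[:, 0] + rng.normal(scale=1.0, size=n)

# Time-ordered split: fit on the first 200 rows, validate on the last 100.
train, valid = slice(0, 200), slice(200, None)

# Standardise with training statistics only, so nothing leaks from validation.
mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
Z_train, Z_valid = (X[train] - mu) / sigma, (X[valid] - mu) / sigma

alphas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]  # hypothetical grid
val_mse = []
for alpha in alphas:
    coef, intercept = ridge_fit(Z_train, y[train], alpha)
    pred = Z_valid @ coef + intercept
    val_mse.append(float(np.mean((y[valid] - pred) ** 2)))

best_alpha = alphas[int(np.argmin(val_mse))]
print(best_alpha, np.round(val_mse, 4))
```

A single hold-out window is the simplest choice; walk-forward or blocked cross-validation follows the same pattern with the scaling step repeated inside each training window.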

Worked example: correlated factors

If value and earnings yield have correlation near 0.95, OLS may rotate weight between them as the sample changes. Ridge treats them as a shared direction and spreads weight across them, which is often the better first-pass research assumption.
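The "rotating weight as the sample changes" claim can be illustrated with a bootstrap. This is a sketch on simulated data, not real factor returns: the two columns stand in for value and earnings yield with population correlation 0.95, and the true weights, noise level, α = 25, and 500 resamples are arbitrary assumptions.

```python
import numpy as np


def ridge_coef(X, y, alpha):
    """Slope coefficients from closed-form ridge on centred data (alpha=0 gives OLS)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)


rng = np.random.default_rng(42)
n, rho = 250, 0.95
cov = np.array([[1.0, rho], [rho, 1.0]])
# Simulated stand-ins for standardised value and earnings-yield exposures.
factors = rng.multivariate_normal(np.zeros(2), cov, size=n)
returns = 0.3 * factors[:, 0] + 0.3 * factors[:, 1] + rng.normal(scale=1.0, size=n)

ols_draws, ridge_draws = [], []
for _ in range(500):
    idx = rng.integers(0, n, size=n)  # bootstrap resample of the rows
    ols_draws.append(ridge_coef(factors[idx], returns[idx], alpha=0.0))
    ridge_draws.append(ridge_coef(factors[idx], returns[idx], alpha=25.0))

# Spread of each coefficient across resamples.
ols_std = np.std(ols_draws, axis=0)
ridge_std = np.std(ridge_draws, axis=0)
print(np.round(ols_std, 3), np.round(ridge_std, 3))
```

In this setup the ridge coefficients vary noticeably less across resamples than the OLS coefficients, at the cost of some shrinkage bias in each one.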

Common confusions and pitfalls

"Ridge selects features." Ridge shrinks features; it usually does not set them to zero.
"Bigger alpha is safer." Excessive shrinkage collapses predictions toward the intercept.
"Scaling is cosmetic." Without scaling, units decide which coefficients are cheap to penalise.
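The second and third pitfalls are easy to reproduce. The sketch below uses synthetic data and arbitrary α values (all assumptions for illustration). First it fits with an excessive α and checks that predictions sit almost on the target mean. Then it re-expresses one feature in units 100 times larger and shows that the ridge fit changes, even though an OLS fit would give the same predictions in either unit.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge on centred data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(
        Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean)
    )
    return coef, y_mean - x_mean @ coef


rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = X @ np.array([0.5, 0.5]) + rng.normal(scale=0.5, size=200)

# Pitfall: "bigger alpha is safer". With a huge alpha the slopes vanish and
# every prediction collapses toward the intercept, which is about y.mean().
coef_huge, b_huge = ridge_fit(X, y, alpha=1e6)
pred_huge = X @ coef_huge + b_huge
max_gap_from_mean = float(np.abs(pred_huge - y.mean()).max())

# Pitfall: "scaling is cosmetic". Express feature 2 in units 100x larger
# (for example percent instead of decimal) and refit with the same alpha.
scale = np.array([1.0, 100.0])
coef_orig, b_orig = ridge_fit(X, y, alpha=50.0)
coef_resc, b_resc = ridge_fit(X * scale, y, alpha=50.0)
pred_orig = X @ coef_orig + b_orig
pred_resc = (X * scale) @ coef_resc + b_resc

# Convert the rescaled fit back to original units for a like-for-like view.
effective_resc = coef_resc * scale
print(round(max_gap_from_mean, 4))
print(np.round(coef_orig, 3), np.round(effective_resc, 3))
```

After the rescaling, feature 2 needs only a tiny coefficient, so the penalty barely touches it while feature 1 is still shrunk. The units, not the data, have decided which coefficient is cheap.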

Where this goes next
