Regularisation: L1 vs L2

Motivation: why this matters in quant finance

Regularisation is how a model pays rent for complexity. In finance, where signals are weak and features are correlated, an unpenalised model can fit historical noise with impressive in-sample statistics. L1 and L2 penalties are the two basic controls.

This is a comparison note, not a third derivation of Ridge Regression or Lasso Regression. The Ridge note covers dense L2 shrinkage; the Lasso note covers sparse L1 selection. The goal here is to decide which penalty geometry matches the modelling problem.

The informal idea

L2 regularisation penalises the sum of squared coefficients. It prefers many small weights and rarely sets any of them to exactly zero. L1 regularisation penalises the sum of absolute coefficients. It can make some weights exactly zero.

Dense factor forecast? Start with L2. Sparse scorecard or feature screening? Consider L1. Unsure? Compare both inside Cross-Validation, not on the test set.
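The advice to compare both penalties inside cross-validation can be sketched as follows. This is a minimal illustration, not a recommended research setup: the synthetic data, the alpha grid, and the 5-fold split are all assumptions chosen for brevity. The test set is split off first and touched only once, after the choice between ridge and lasso has been made.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 2 informative features out of 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = 1.2 * X[:, 0] - 0.9 * X[:, 1] + 0.4 * rng.normal(size=300)

# Hold the test set out before any model comparison takes place.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

alphas = [0.01, 0.1, 1.0, 10.0]  # illustrative grid, not a recommendation
candidates = {
    "ridge": make_pipeline(StandardScaler(), Ridge()),
    "lasso": make_pipeline(StandardScaler(), Lasso(max_iter=10_000)),
}

results = {}
for name, pipe in candidates.items():
    step = pipe.steps[-1][0]  # make_pipeline names the step "ridge" or "lasso"
    search = GridSearchCV(pipe, {f"{step}__alpha": alphas}, cv=5, scoring="r2")
    search.fit(X_train, y_train)  # scaler is refit inside each fold
    results[name] = search

# Choose the penalty on cross-validated score, then score the test set once.
best_name = max(results, key=lambda k: results[k].best_score_)
test_r2 = results[best_name].score(X_test, y_test)
print(best_name, results[best_name].best_params_, round(test_r2, 3))
```

For time-ordered data such as returns, shuffled K-fold splits can themselves leak future information; scikit-learn's `TimeSeriesSplit` passed as `cv` is one common substitute.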

Formal statement

For a loss function $L(\boldsymbol{\beta})$, the two common penalties are

$$L_2:\quad L(\boldsymbol{\beta})+\alpha\sum_{j=1}^p\beta_j^2,$$

and

$$L_1:\quad L(\boldsymbol{\beta})+\alpha\sum_{j=1}^p |\beta_j|.$$

The L2 ball is round, so shrinkage is smooth. The L1 ball has corners, so optima often land on axes and produce zeros.
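The difference between smooth shrinkage and exact zeros is easiest to see in a one-coefficient special case. Assume (for illustration only) that the loss for a single coefficient is 0.5*(z - b)^2, where z is the unpenalised estimate. Then both penalised problems have closed forms: the L2 penalty rescales z by 1/(1 + 2*alpha), and the L1 penalty soft-thresholds z at alpha. The function names and the values of z below are hypothetical.

```python
import numpy as np

def l2_shrink(z, alpha):
    """Minimiser of 0.5*(z - b)**2 + alpha*b**2: proportional shrinkage."""
    return z / (1.0 + 2.0 * alpha)

def l1_shrink(z, alpha):
    """Minimiser of 0.5*(z - b)**2 + alpha*|b|: soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

z = np.array([2.0, 0.5, -0.2, -1.5])  # hypothetical unpenalised estimates
alpha = 0.5

ridge_like = l2_shrink(z, alpha)  # every entry halved, none exactly zero
lasso_like = l1_shrink(z, alpha)  # entries with |z| <= alpha become zero

print(ridge_like)
print(lasso_like)
```

Every L2 result is a scaled copy of its input, so small estimates stay small but non-zero. Under L1, any estimate whose magnitude does not exceed alpha is set to zero, and the rest move toward zero by a fixed amount alpha.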

Implementation

import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: only the first two of eight features carry signal.
rng = np.random.default_rng(41)
X = rng.normal(size=(250, 8))
y = 1.2 * X[:, 0] - 0.9 * X[:, 1] + 0.4 * rng.normal(size=250)

# Standardise inside the pipeline so the penalty treats all features alike.
ridge = make_pipeline(StandardScaler(), Ridge(alpha=3.0)).fit(X, y)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.08, max_iter=10_000)).fit(X, y)

# Ridge keeps all eight coefficients small; lasso zeroes the six noise features.
print(np.round(ridge[-1].coef_, 2))
print(np.round(lasso[-1].coef_, 2))
# [ 1.16 -0.82 0.02 -0.01 0.04 -0.03 0.01 -0.03]
# [ 1.08 -0.75 0. 0. 0. 0. 0. 0. ]

Key comparison

| Question | Prefer L2 / ridge | Prefer L1 / lasso |
|---|---|---|
| Many features weakly useful? | Yes | No |
| Correlated features represent one idea? | Often | Use carefully |
| Exact feature selection required? | No | Yes |
| Coefficient stability priority? | Usually | Not always |
| Compact scorecard needed? | Maybe | Often |
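The row on correlated features can be illustrated with the extreme case of two identical columns. This is a sketch on assumed synthetic data, with arbitrary alpha values. Ridge has a unique solution that shares the weight equally between the copies. For lasso, the objective only pins down the combined weight on the two copies, so how it is divided depends on the solver; scikit-learn's default cyclic coordinate descent tends to load it onto one copy.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(7)
n = 400
factor = rng.normal(size=n)  # one underlying idea
other = rng.normal(size=n)   # an unrelated feature

# Columns 0 and 1 are exact copies: the limiting case of correlation.
X = np.column_stack([factor, factor, other])
y = 1.0 * factor + 0.3 * rng.normal(size=n)

ridge = Ridge(alpha=5.0).fit(X, y)
lasso = Lasso(alpha=0.05, max_iter=10_000).fit(X, y)

ridge_pair = ridge.coef_[:2]  # weight shared equally across the copies
lasso_pair = lasso.coef_[:2]  # only the sum is determined by the objective

print("ridge:", np.round(ridge_pair, 3))
print("lasso:", np.round(lasso_pair, 3))
```

With real, imperfectly correlated features the lasso solution is usually unique, but which feature survives can still change from one sample to the next.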

Common confusions and pitfalls

  • "Regularisation fixes leakage." It controls coefficient complexity. Leakage can still produce excellent validation numbers and useless live performance.
  • "L1 is the interpretable option." It is sparse, but sparsity can be unstable when predictors are correlated.
  • "L2 is inferior because it keeps all features." Dense shrinkage is often the right assumption for factor models.

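The leakage pitfall can be made concrete with a small sketch. Everything here is assumed for illustration: the data are synthetic, and the `leak` column stands in for any feature built from information unavailable at prediction time. The regularised model is the same in both runs; only the presence of the leaked column changes.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 5))
y = 0.1 * X[:, 0] + rng.normal(size=n)  # weak genuine signal

# Hypothetical leak: a feature derived from the target itself, e.g. a value
# that is only known after the prediction date.
leak = y + 0.1 * rng.normal(size=n)
X_leaky = np.column_stack([X, leak])

model = Ridge(alpha=10.0)
clean_r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
leaky_r2 = cross_val_score(model, X_leaky, y, cv=5, scoring="r2").mean()

print("clean CV R^2:", round(clean_r2, 3))
print("leaky CV R^2:", round(leaky_r2, 3))
```

The penalty shrinks the leaked coefficient slightly but does nothing to remove it, so the cross-validated score looks excellent while saying nothing about live performance.
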