
Lasso Regression

Motivation: why this matters in quant finance

Lasso regression is the linear model for sparse signal discovery. A quant researcher may begin with hundreds of candidate predictors, many of them weak, redundant, or noisy. Lasso tries to keep only the predictors that earn their place.

This is useful when interpretability, trading-cost discipline, or feature governance matters. A dense Ridge Regression model can produce a strong forecast, but a sparse lasso model is easier to state and test as a research hypothesis.

The informal idea

Lasso adds a cost for the absolute value of every coefficient. The L1 penalty has sharp corners, and those corners make exact zeros likely. Lasso can therefore remove a feature entirely, not merely shrink it.

Formal statement

Lasso solves

$$\hat{\boldsymbol{\beta}}=\arg\min_{\boldsymbol{\beta}} \frac{1}{n}\lVert \mathbf{y}-\mathbf{X}\boldsymbol{\beta}\rVert_2^2 + \alpha\lVert\boldsymbol{\beta}\rVert_1,$$

where $\lVert\boldsymbol{\beta}\rVert_1=\sum_j |\beta_j|$. The objective is convex but non-smooth at zero, so coordinate descent and soft-thresholding are natural.
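To make the soft-thresholding step concrete, here is a minimal sketch of cyclic coordinate descent for the objective above, using only NumPy. The function names `soft_threshold` and `lasso_coordinate_descent` and the fixed sweep count are illustrative assumptions, not a library API. With the $\frac{1}{n}$ scaling used here, minimising over one coefficient at a time gives a threshold of $\alpha/2$. scikit-learn's `Lasso` scales the squared error by $\frac{1}{2n}$ instead, so `alpha=0.16` in this sketch should correspond to `alpha=0.08` there.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; any value with |z| <= t becomes exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimise (1/n) * ||y - X b||_2^2 + alpha * ||b||_1 one coordinate at a time."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # x_j' x_j / n for each column
    residual = y - X @ beta
    for _ in range(n_sweeps):                  # fixed sweep count keeps the sketch simple
        for j in range(p):
            # Covariance of feature j with the residual that excludes feature j itself.
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, alpha / 2.0) / col_sq[j]
            residual -= X[:, j] * (new_bj - beta[j])   # keep residual = y - X @ beta
            beta[j] = new_bj
    return beta


# Illustrative data: two active features, four pure-noise features.
rng = np.random.default_rng(17)
X = rng.normal(size=(300, 6))
y = 1.4 * X[:, 0] - 0.9 * X[:, 3] + rng.normal(scale=0.4, size=300)

# Standardise the features and centre the target so no intercept is needed.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

beta = lasso_coordinate_descent(Xs, yc, alpha=0.16)
print(np.round(beta, 2))
```

Each update either shrinks a coefficient by a fixed amount or sets it to exactly zero, which is where the sparsity comes from. A production solver would add a convergence check rather than a fixed number of sweeps.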

Implementation

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(17)
    X = rng.normal(size=(300, 6))
    y = 1.4 * X[:, 0] - 0.9 * X[:, 3] + rng.normal(scale=0.4, size=300)

    pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.08, max_iter=10_000))
    pipe.fit(X, y)

    print(np.round(pipe[-1].coef_, 2))
    # [ 1.31 0. 0. -0.83 0. 0. ]

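The example is deliberately sparse: only features 0 and 3 drive y. Lasso recovers both active features and sets the four noise coefficients to exactly zero. The surviving coefficients (1.31 and -0.83) are shrunk toward zero relative to the true values (1.4 and -0.9), which is the bias the penalty introduces.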

Key properties and trade-offs

| Property | Lasso behaviour | Quant use |
| --- | --- | --- |
| Sparse coefficients | Some coefficients become exactly zero. | Useful for compact factor models and scorecards. |
| Selection inside fitting | Estimation and feature selection happen together. | Convenient, but sample-dependent. |
| Correlation instability | Similar predictors compete with each other. | Dangerous when several features express the same economic idea. |
| Scale-sensitive | Units affect penalty strength. | Standardise before fitting. |
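The scale-sensitivity property can be seen in a small experiment. The sketch below reuses the synthetic data from the implementation section and makes one assumed change: feature 0 is stored in units 100 times larger, so its values are 100 times smaller (the kind of thing that happens when a return is recorded as a fraction rather than a percentage). The variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(17)
X = rng.normal(size=(300, 6))
y = 1.4 * X[:, 0] - 0.9 * X[:, 3] + rng.normal(scale=0.4, size=300)

# Hypothetical unit change: the same information, but feature 0 is 100x smaller.
X_rescaled = X.copy()
X_rescaled[:, 0] *= 0.01

# Without standardisation, feature 0 would need a coefficient near 140,
# which the L1 penalty makes far too expensive.
raw = Lasso(alpha=0.08, max_iter=10_000).fit(X_rescaled, y)

# With standardisation, the penalty treats every feature on the same footing.
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.08, max_iter=10_000))
scaled.fit(X_rescaled, y)

print("feature 0, raw units:   ", raw.coef_[0])
print("feature 0, standardised:", scaled[-1].coef_[0])
```

The target depends on feature 0 just as strongly in both fits. Only the units changed, yet the unstandardised model drops the feature because the penalty is applied to the coefficient, not to the feature's contribution to the forecast.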

Worked example: sparse factor signal

If lasso selects momentum and short interest while dropping several alternatives, the result is easy to discuss. But if value and earnings yield are close substitutes, choosing one does not prove the other is irrelevant. It proves this penalty and sample used one representative.
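One way to check whether a selection reflects the data or just one sample is to refit on bootstrap resamples and count how often each predictor survives. The sketch below is a synthetic illustration, not a result from real factor data: `value` and `earnings_yield` are constructed as two noisy copies of one hypothetical common driver, and all sizes, noise levels, and the penalty are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins: two near-duplicate proxies plus one independent predictor.
common = rng.normal(size=n)                        # hypothetical shared driver
value = common + 0.05 * rng.normal(size=n)
earnings_yield = common + 0.05 * rng.normal(size=n)
momentum = rng.normal(size=n)

X = np.column_stack([value, earnings_yield, momentum])
y = 1.0 * common + 0.5 * momentum + rng.normal(scale=0.5, size=n)

n_boot = 200
coefs = np.empty((n_boot, 3))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)               # resample rows with replacement
    model = Lasso(alpha=0.05, max_iter=10_000).fit(X[idx], y[idx])
    coefs[b] = model.coef_

selected = coefs != 0
print("selection frequency [value, earnings_yield, momentum]:", selected.mean(axis=0))
print("both proxies kept:", (selected[:, 0] & selected[:, 1]).mean())
print("mean combined proxy weight:", coefs[:, :2].sum(axis=1).mean())
```

If the two proxies swap in and out across resamples while their combined weight stays roughly constant, the data supports the shared economic idea rather than either proxy specifically.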

Common confusions and pitfalls

"Zero means no relationship." Zero means not selected under this sample, penalty, preprocessing, and feature set.
"Sparse means stable." Sparse selections can jump when correlated predictors trade places.
"Lasso replaces research judgment." It is a screening tool, not an economic proof.

Where this goes next
