Lasso Regression

Motivation: why this matters in quant finance

Lasso regression is a linear model suited to sparse signal discovery. A quant researcher may begin with hundreds of candidate predictors, many of them weak, redundant, or noisy. Lasso tries to keep only the predictors that earn their place.

This is useful when interpretability, trading-cost discipline, or feature governance matters. A dense Ridge Regression model may forecast well, but a sparse lasso model is easier to state and test as a research hypothesis.

The informal idea

Lasso adds a cost for the absolute value of every coefficient. The L1 penalty has sharp corners on the coordinate axes, and the solution often lands on one of them, which sets some coefficients exactly to zero. Lasso can therefore remove a feature, not merely shrink it.
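
A one-coefficient case makes the corner effect concrete. The sketch below assumes a single predictor whose unpenalised least-squares estimate is $z$ and a penalty weight $\lambda$; both symbols are introduced here only for illustration.

```latex
\hat{\beta}
  = \arg\min_{\beta}\ \tfrac{1}{2}(\beta - z)^2 + \lambda\lvert\beta\rvert
  = \operatorname{sign}(z)\,\max\bigl(\lvert z\rvert - \lambda,\ 0\bigr)
```

Whenever $|z| \le \lambda$ the minimiser is exactly zero; otherwise the estimate is moved toward zero by $\lambda$. Replacing $\lambda|\beta|$ with the squared penalty $\tfrac{\lambda}{2}\beta^2$ would give $z/(1+\lambda)$ instead, which shrinks the estimate but is zero only when $z$ is.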

Formal statement

Lasso solves

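$$
\hat{\boldsymbol{\beta}}=\arg\min_{\boldsymbol{\beta}} \frac{1}{n}\lVert \mathbf{y}-\mathbf{X}\boldsymbol{\beta}\rVert_2^2 + \alpha\lVert\boldsymbol{\beta}\rVert_1,
$$

where $\lVert\boldsymbol{\beta}\rVert_1=\sum_j |\beta_j|$. The objective is convex but non-smooth wherever a coefficient equals zero, so coordinate descent with soft-thresholding updates is a natural solver. Note that scikit-learn's Lasso scales the squared-error term by $\frac{1}{2n}$ rather than $\frac{1}{n}$, so its alpha argument corresponds to $\alpha/2$ in the formula above.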
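
To show how coordinate descent and soft-thresholding fit together, here is a minimal sketch rather than a production solver. It assumes the $\frac{1}{2n}$ scaling of the squared-error term (the scikit-learn convention, which only rescales the penalty weight relative to the formula above) and assumes that X and y are already centred so no intercept is needed. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n))*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the target y are centred (no intercept term).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # (1/n) * x_j'x_j for each column
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes j's own fit
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            residual += X[:, j] * (beta[j] - new_bj)   # keep residual in sync
            beta[j] = new_bj
    return beta

# Synthetic check: only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.3, size=200)
y -= y.mean()

beta = lasso_coordinate_descent(X, y, alpha=0.1)
print(np.round(beta, 2))
```

Each inner step solves the one-dimensional problem for a single coefficient exactly, which is why the update can return a hard zero instead of a small number.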

Implementation

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

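rng = np.random.default_rng(17)
X = rng.normal(size=(300, 6))
# Only features 0 and 3 drive y; the other four are pure noise.
y = 1.4 * X[:, 0] - 0.9 * X[:, 3] + rng.normal(scale=0.4, size=300)

# Standardise first so the penalty treats every feature on the same scale.
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.08, max_iter=10_000))
pipe.fit(X, y)
# Coefficients are reported on the standardised feature scale.
print(np.round(pipe[-1].coef_, 2))
# [ 1.31  0.    0.   -0.83  0.    0.  ]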

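The example is deliberately sparse: only two of the six features carry signal. Lasso sets the four noise coefficients to exactly zero and keeps the two active features, although their estimates (1.31 and -0.83) are pulled toward zero relative to the generating values 1.4 and -0.9. That shrinkage bias is the price of the penalty.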
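
How many features survive depends on the penalty strength. The sketch below refits the same synthetic data over a small grid of alpha values; the grid is an illustrative assumption, not a recommendation.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same synthetic setup as the example above: features 0 and 3 are the active ones.
rng = np.random.default_rng(17)
X = rng.normal(size=(300, 6))
y = 1.4 * X[:, 0] - 0.9 * X[:, 3] + rng.normal(scale=0.4, size=300)

coefs_by_alpha = {}
for alpha in [0.01, 0.08, 0.5, 2.0]:      # illustrative grid
    pipe = make_pipeline(StandardScaler(), Lasso(alpha=alpha, max_iter=10_000))
    pipe.fit(X, y)
    coefs_by_alpha[alpha] = pipe[-1].coef_.copy()
    n_active = int(np.count_nonzero(coefs_by_alpha[alpha]))
    print(f"alpha={alpha:<5} active={n_active} coef={np.round(coefs_by_alpha[alpha], 2)}")
```

A small alpha may let noise features through, while a large enough alpha zeroes every coefficient, including the genuine ones. Sparsity is therefore a property of the chosen penalty as much as of the data.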

Key properties and trade-offs

Property	Lasso behaviour	Quant use
Sparse coefficients	Some coefficients become exactly zero.	Useful for compact factor models and scorecards.
Selection inside fitting	Estimation and feature selection happen together.	Convenient, but sample-dependent.
Correlation instability	Similar predictors compete with each other.	Dangerous when several features express the same economic idea.
Scale-sensitive	Units affect penalty strength.	Standardise before fitting.
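
The scale-sensitivity row can be checked directly. This is a minimal sketch on synthetic data, assuming two equally important features and a hypothetical unit change that makes the second feature's values 100 times smaller (for example, decimals instead of percent).

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 2))
y = X[:, 0] + X[:, 1] + rng.normal(scale=0.4, size=n)   # both features matter equally

# Hypothetical unit change: the second feature is now expressed 100x smaller,
# so its coefficient would have to be 100x larger to do the same job.
X_units = X.copy()
X_units[:, 1] /= 100.0

raw = Lasso(alpha=0.08).fit(X_units, y)
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.08)).fit(X_units, y)

print("raw units:   ", np.round(raw.coef_, 2))
print("standardised:", np.round(scaled[-1].coef_, 2))
```

Without standardisation the penalty makes the rescaled feature too expensive to keep, even though it is as informative as the first. After standardisation both features are treated alike.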

Worked example: sparse factor signal

If lasso selects momentum and short interest while dropping several alternatives, the result is easy to discuss. But if value and earnings yield are close substitutes, choosing one does not prove the other is irrelevant. It shows only that, for this penalty and sample, one representative was enough.
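
One way to probe the substitute question is to drop the favoured feature and refit. The sketch below uses synthetic data in which two proxies share a latent driver; the feature names, coefficients, and noise levels are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(11)
n = 400
cheapness = rng.normal(size=n)                     # latent driver behind both proxies
names = ["momentum", "value", "earnings_yield"]    # hypothetical labels
X = np.column_stack([
    rng.normal(size=n),                            # momentum: independent signal
    cheapness + 0.05 * rng.normal(size=n),         # value: noisy proxy
    cheapness + 0.05 * rng.normal(size=n),         # earnings yield: near-duplicate proxy
])
y = 0.8 * X[:, 0] + 0.6 * cheapness + rng.normal(scale=0.5, size=n)

def fit_lasso(features):
    """Standardise, fit lasso, and return the coefficient vector."""
    model = make_pipeline(StandardScaler(), Lasso(alpha=0.05, max_iter=10_000))
    return model.fit(features, y)[-1].coef_

full = fit_lasso(X)
print(dict(zip(names, np.round(full, 2))))

# Drop whichever proxy carried more weight, then refit on what remains.
proxy_idx = [1, 2]
dropped = proxy_idx[int(np.argmax(np.abs(full[proxy_idx])))]
keep = [j for j in range(X.shape[1]) if j != dropped]
refit = fit_lasso(X[:, keep])
print("dropped:", names[dropped])
print(dict(zip([names[j] for j in keep], np.round(refit, 2))))
```

If the remaining proxy picks up most of the weight after the refit, the original fit was choosing a representative of a shared idea rather than ruling the other feature out.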

Common confusions and pitfalls

"Zero means no relationship." Zero means not selected under this sample, penalty, preprocessing, and feature set.

"Sparse means stable." Sparse selections can jump when correlated predictors trade places.

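Selection stability can be estimated by refitting on resampled data and counting how often each feature survives. The sketch below assumes rows are independent, so a plain bootstrap is acceptable; for time-series data a block-based resampling scheme would be the safer assumption. The data and parameters are synthetic.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n = 300
common = rng.normal(size=n)
X = np.column_stack([
    common + 0.05 * rng.normal(size=n),   # proxy A
    common + 0.05 * rng.normal(size=n),   # proxy B, nearly collinear with A
    rng.normal(size=n),                   # independent feature
])
y = common + X[:, 2] + rng.normal(scale=0.5, size=n)

n_boot = 200
selected = np.zeros((n_boot, X.shape[1]), dtype=bool)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)      # resample rows with replacement
    model = make_pipeline(StandardScaler(), Lasso(alpha=0.05, max_iter=10_000))
    model.fit(X[idx], y[idx])
    selected[b] = model[-1].coef_ != 0    # True where the feature survived

print("selection frequency:", selected.mean(axis=0))
```

A feature selected in nearly every resample is a more credible finding than one whose frequency sits well below one because a correlated neighbour sometimes takes its place.
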
"Lasso replaces research judgment." It is a screening tool, not an economic proof.

Where this goes next

Ridge Regression: keeps correlated predictors and shrinks them together.
Regularisation: L1 vs L2: compares L1 and L2 geometry directly.
Cross-Validation: selects penalty strength without using the test set.
Decision Tree: performs a different kind of feature selection through splits.

References

Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly. Ch. 4 (Lasso Regression and Elastic Net).
Andrew Ng and Tengyu Ma (2023). CS229 Lecture Notes. Ch. 9 (Regularization and Model Selection).
Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong (2020). Mathematics for Machine Learning. Cambridge University Press. Ch. 8 (Model Selection) and Ch. 9 (Linear Regression).