Lasso Regression
Motivation: why this matters in quant finance
Lasso regression is a linear model suited to sparse signal discovery. A quant researcher may begin with hundreds of candidate predictors, many of them weak, redundant, or noisy. Lasso tries to keep only the predictors that earn their place.
The informal idea
Lasso adds a cost proportional to the absolute value of every coefficient. The L1 penalty has sharp corners on the coordinate axes, and those corners make exact zeros likely. Lasso can therefore remove a feature entirely, not merely shrink it.
Formal statement
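Lasso solves

$$
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
$$

where $\lambda \ge 0$ sets the penalty strength and $\lVert \beta \rVert_1 = \sum_j |\beta_j|$. The $1/(2n)$ scaling follows the convention used by scikit-learn, where $\lambda$ is called `alpha`; other texts scale the loss differently. The objective is convex but non-smooth wherever a coefficient equals zero, so coordinate descent and soft-thresholding are natural.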
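To make the role of soft-thresholding concrete, here is a minimal sketch of cyclic coordinate descent. It assumes the objective (1/(2n))·||y − Xβ||² + λ||β||₁, data that are already centred (no intercept is fitted), and no all-zero columns. The names `soft_threshold` and `lasso_coordinate_descent` are illustrative and do not come from any library.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n   # (1/n) * x_j' x_j for each column
    residual = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of feature j with the partial residual
            # (the residual with feature j's own contribution added back).
            rho = X[:, j] @ residual / n + col_scale[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_scale[j]
            # Keep the residual in sync with the updated coefficient.
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

# Synthetic check: only column 1 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
beta_hat = lasso_coordinate_descent(X, y, lam=0.1)
print(np.round(beta_hat, 3))
```

Each one-dimensional subproblem has a closed-form solution, which is why the update is a single soft-threshold. A fixed number of sweeps is used here for brevity; a production solver would stop on a convergence criterion instead.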
Implementation
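```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: six candidate features, only columns 0 and 3 drive y.
rng = np.random.default_rng(17)
X = rng.normal(size=(300, 6))
y = 1.4 * X[:, 0] - 0.9 * X[:, 3] + rng.normal(scale=0.4, size=300)

# Standardise first so the L1 penalty treats every feature on the same scale.
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.08, max_iter=10_000))
pipe.fit(X, y)

# Coefficients of the final (Lasso) step, in standardised units.
print(np.round(pipe[-1].coef_, 2))
# [ 1.31 0. 0. -0.83 0. 0. ]
```

The example is deliberately sparse: only features 0 and 3 enter the data-generating process. Lasso recovers both, with coefficients shrunk slightly toward zero, and thresholds the small noise coefficients to exactly zero.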
Key properties and trade-offs
| Property | Lasso behaviour | Quant use |
|---|---|---|
| Sparse coefficients | Some coefficients become exactly zero. | Useful for compact factor models and scorecards. |
| Selection inside fitting | Estimation and feature selection happen together. | Convenient, but sample-dependent. |
| Correlation instability | Similar predictors compete with each other. | Dangerous when several features express the same economic idea. |
| Scale-sensitive | Units affect penalty strength. | Standardise before fitting. |
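
The scale-sensitivity row can be illustrated with a small experiment. The sketch below uses synthetic data and an arbitrarily chosen `alpha=0.1`; it expresses the one informative feature in smaller units (dividing by 100, as when switching from percent to decimal returns) and compares a raw fit with a standardised one.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 1.0 * X[:, 0] + rng.normal(scale=0.3, size=500)

# Same information, different units: column 0 divided by 100.
X_rescaled = X.copy()
X_rescaled[:, 0] /= 100.0

# Without standardising, the informative feature would need a coefficient
# near 100, which the L1 penalty makes too expensive.
raw = Lasso(alpha=0.1).fit(X_rescaled, y)

# With standardising, the change of units is undone before the penalty applies.
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X_rescaled, y)

print("raw coefficient on feature 0:   ", raw.coef_[0])
print("scaled coefficient on feature 0:", scaled[-1].coef_[0])
```

In this setup the raw fit drops the informative feature while the standardised fit keeps it, even though the underlying information is identical. The coefficient from the pipeline is in standardised units, so it should be read per standard deviation of the feature.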
Worked example: sparse factor signal
If lasso selects momentum and short interest while dropping several alternatives, the result is easy to discuss. But if value and earnings yield are close substitutes, choosing one does not prove the other is irrelevant. It shows only that, for this penalty and this sample, one representative was enough.
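One way to probe this is to refit on bootstrap resamples and watch how the weight is shared between the substitutes. The sketch below is purely synthetic: `value` and `earnings_yield` are hypothetical near-duplicate columns built from one common driver, and the noise levels, `alpha=0.1`, and the 50 resamples are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300
common = rng.normal(size=n)                          # shared driver (hypothetical)
value = common + 0.05 * rng.normal(size=n)           # synthetic stand-in for a value score
earnings_yield = common + 0.05 * rng.normal(size=n)  # near-duplicate of the same idea
X = np.column_stack([value, earnings_yield])
y = common + rng.normal(scale=0.5, size=n)

coefs = []
for _ in range(50):
    idx = rng.integers(0, n, size=n)                 # bootstrap resample of rows
    model = make_pipeline(StandardScaler(), Lasso(alpha=0.1, max_iter=10_000))
    model.fit(X[idx], y[idx])
    coefs.append(model[-1].coef_.copy())
coefs = np.array(coefs)

combined = coefs.sum(axis=1)          # total weight on the shared idea
share_value = coefs[:, 0] / combined  # fraction of that weight given to `value`
print("combined weight, min/max:  ", combined.min(), combined.max())
print("share on `value`, min/max: ", share_value.min(), share_value.max())
```

If the combined weight stays in a narrow band while the share on `value` moves around, the data support the shared economic idea but not a preference for one proxy over the other.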
Common confusions and pitfalls
Where this goes next
- Ridge Regression: keeps correlated predictors and shrinks them together.
- Regularisation: L1 vs L2: compares L1 and L2 geometry directly.
- Cross-Validation: selects penalty strength without using the test set.
- Decision Tree: performs a different kind of feature selection through splits.
References
- Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly. Ch. 4 (Lasso Regression and Elastic Net).
- Andrew Ng and Tengyu Ma (2023). CS229 Lecture Notes. Ch. 9 (Regularization and Model Selection).
- Deisenroth, Faisal, and Ong (2020). Mathematics for Machine Learning. Ch. 8 (Model Selection) and Ch. 9 (Linear Regression).