Linear Regression
Motivation: why this matters in quant finance
Linear regression is the first supervised model a quant should understand because it maps noisy numerical features to an interpretable estimate of the conditional mean of a target. Models of factor returns, beta estimates, carry predictors, volatility forecasts, and transaction costs often start as regressions before they become more elaborate systems.
The informal idea
A linear model assigns one weight to each feature, plus an intercept. Training chooses the weights that minimise the sum of squared residuals. For a one-factor equity model, the fitted slope is an estimate of market beta. For a cross-sectional signal model, the fit assigns weights to value, momentum, volatility, liquidity, and other features.
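Geometrically, least squares projects the target vector onto the span of the feature columns. That explains why redundant factors are dangerous: when two columns are nearly collinear, many different weight combinations produce almost the same projection, so predictions can remain stable while individual coefficients swing widely from sample to sample.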
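The instability from redundant factors can be illustrated with a small simulation. This is a sketch under stated assumptions: the factor `value`, its near-duplicate `value_copy`, the noise scale, and the 50 resamples are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
value = rng.normal(size=n)
# Hypothetical redundant factor: the same signal plus a tiny perturbation.
value_copy = value + 1e-3 * rng.normal(size=n)
X = np.column_stack([np.ones(n), value, value_copy])

coefs, preds = [], []
for seed in range(1, 51):
    # Same features every time; only the noise in the target is redrawn.
    noise = np.random.default_rng(seed).normal(scale=0.5, size=n)
    y = 1.0 * value + noise
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs.append(w)
    preds.append(X @ w)
coefs = np.array(coefs)
preds = np.array(preds)

# Spread of one individual coefficient across resamples.
coef_spread = coefs[:, 1].std()
# Spread of the sum of the two collinear coefficients.
sum_spread = (coefs[:, 1] + coefs[:, 2]).std()
# Largest spread of any single fitted value across resamples.
pred_spread = preds.std(axis=0).max()

print(f"individual coefficient spread: {coef_spread:.2f}")
print(f"spread of coefficient sum:     {sum_spread:.3f}")
print(f"max fitted-value spread:       {pred_spread:.3f}")
```

The individual weights on the two collinear columns vary a great deal, while their sum and the fitted values barely move. Only the combined exposure is well determined by the data.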
Formal statement
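With augmented design matrix $X \in \mathbb{R}^{n \times (d+1)}$ containing an intercept column and target vector $y \in \mathbb{R}^n$, ordinary least squares solves
$$\hat{w} = \arg\min_{w} \lVert y - Xw \rVert_2^2.$$
If $X^\top X$ is invertible,
$$\hat{w} = (X^\top X)^{-1} X^\top y.$$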
The formula teaches the algebra; in code use a least-squares or SVD-based solver rather than explicitly inverting the matrix.
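A minimal sketch of that advice, on simulated data with hypothetical coefficients: the normal equations and an SVD-based solver agree on a well-conditioned problem, but forming the Gram matrix squares the condition number, which is one reason the solver is preferred when features are nearly collinear.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 252
# Design matrix with an intercept column and two simulated features.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.1, 1.5, -0.7]) + rng.normal(scale=0.1, size=n)

# Textbook normal equations, solved as a linear system rather than via an explicit inverse.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based least-squares solve that works on X directly.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_normal, w_lstsq))

# The Gram matrix X^T X has the squared (2-norm) condition number of X.
cond_X = np.linalg.cond(X)
cond_gram = np.linalg.cond(X.T @ X)
print(cond_X, cond_gram)
```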
Implementation

    import numpy as np


    class LinearRegressionOLS:
        """Ordinary least squares using a stable least-squares solve."""

        def fit(self, X: np.ndarray, y: np.ndarray):
            # Prepend a column of ones so the first coefficient is the intercept.
            X_design = np.c_[np.ones(len(X)), X]
            self.coef_, *_ = np.linalg.lstsq(X_design, y, rcond=None)
            return self

        def predict(self, X: np.ndarray) -> np.ndarray:
            return np.c_[np.ones(len(X)), X] @ self.coef_


    # Simulate one year of daily returns with a true beta of 1.35.
    rng = np.random.default_rng(7)
    market = rng.normal(0.0004, 0.012, size=252)
    stock = 0.0001 + 1.35 * market + rng.normal(0, 0.008, size=252)
    model = LinearRegressionOLS().fit(market.reshape(-1, 1), stock)
    print(np.round(model.coef_, 4))
    # [0.0002 1.3536]

The slope is a beta estimate for this sample. It is not a permanent property of the stock.
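Because the slope is a sample estimate, one way to watch it drift is a rolling-window fit. The sketch below assumes simulated returns whose true beta shifts from 0.8 to 1.6 halfway through. The function name `rolling_beta`, the 126-day window, and every parameter value are hypothetical choices, not part of the model above.

```python
import numpy as np


def rolling_beta(market: np.ndarray, stock: np.ndarray, window: int) -> np.ndarray:
    """Slope of stock on market over each trailing window; NaN until a full window exists."""
    betas = np.full(len(market), np.nan)
    for end in range(window, len(market) + 1):
        m = market[end - window:end]
        s = stock[end - window:end]
        X = np.column_stack([np.ones(window), m])
        coef, *_ = np.linalg.lstsq(X, s, rcond=None)
        betas[end - 1] = coef[1]
    return betas


rng = np.random.default_rng(11)
n_days = 504
market = rng.normal(0.0004, 0.012, size=n_days)
# Assumed regime shift: beta is 0.8 in the first year and 1.6 in the second.
true_beta = np.where(np.arange(n_days) < 252, 0.8, 1.6)
stock = true_beta * market + rng.normal(0, 0.008, size=n_days)

betas = rolling_beta(market, stock, window=126)
# Estimates at the end of each regime, where the window covers a single beta.
print(np.round(betas[[251, 503]], 2))
```

A single full-sample regression on this data would report one slope somewhere between the two regimes, which describes neither period well.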
Key properties and trade-offs
| Property | Meaning | Finance consequence |
|---|---|---|
| Convex loss | Squared-error linear regression has one global optimum. | If the fit is bad, suspect features and data before optimiser failure. |
| Projection geometry | Fitted values live in the feature span. | Redundant factors can destabilise coefficients. |
| Outlier sensitivity | Squared errors punish large residuals quadratically. | Crisis days can dominate the fit. |
| Fast baseline | Training and prediction are cheap. | Use it before reaching for complex nonlinear models. |
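The outlier-sensitivity row can be made concrete with a small experiment. This is a sketch on simulated data: the single crisis day (market down 10%, stock up 5%) is a hypothetical observation invented for illustration, and `fit_beta` is a helper name introduced here.

```python
import numpy as np


def fit_beta(market: np.ndarray, stock: np.ndarray) -> float:
    """Slope from an OLS fit of stock on market with an intercept."""
    X = np.column_stack([np.ones(len(market)), market])
    coef, *_ = np.linalg.lstsq(X, stock, rcond=None)
    return float(coef[1])


rng = np.random.default_rng(3)
market = rng.normal(0.0004, 0.012, size=252)
stock = 0.0001 + 1.35 * market + rng.normal(0, 0.008, size=252)
beta_clean = fit_beta(market, stock)

# Append one hypothetical crisis day that sits far from the fitted line.
market_crisis = np.append(market, -0.10)
stock_crisis = np.append(stock, 0.05)
beta_crisis = fit_beta(market_crisis, stock_crisis)

print(f"beta without crisis day: {beta_clean:.3f}")
print(f"beta with crisis day:    {beta_crisis:.3f}")
```

One observation out of 253 moves the slope noticeably, because it is extreme in the market return and its squared residual is very large.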
Worked example: market beta
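If the fitted model is $\hat{r}_{\text{stock}} = 0.0002 + 1.3536\, r_{\text{mkt}}$ (the coefficients printed in the Implementation section), then an illustrative $-2\%$ market day gives $\hat{r}_{\text{stock}} = 0.0002 + 1.3536 \times (-0.02) \approx -0.0269$, or about $-2.69\%$. The calculation is useful for attribution, but it does not model regime-dependent beta or tail behaviour.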
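The attribution use can be sketched numerically. The coefficients are those printed in the Implementation section. The market move and the realised stock return are hypothetical values chosen only to show the arithmetic.

```python
# Coefficients from the fitted one-factor model in the Implementation section.
alpha, beta = 0.0002, 1.3536

# Hypothetical inputs for one day (assumptions, not data from the text).
market_return = -0.02
realised_stock_return = -0.031

# Split the realised return into intercept, market-driven, and residual parts.
alpha_part = alpha
market_part = beta * market_return
predicted = alpha_part + market_part
residual = realised_stock_return - predicted

print(f"predicted return: {predicted:.6f}")
print(f"market part:      {market_part:.6f}")
print(f"residual:         {residual:.6f}")
```

The three parts sum to the realised return by construction. The residual is whatever the one-factor model leaves unexplained on that day.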
Common confusions and pitfalls
Where this goes next
- Ridge Regression: stabilises least squares with L2 shrinkage.
- Lasso Regression: adds sparse feature selection.
- Logistic Regression: changes the target from a continuous value to a probability.
- Gradient Descent: trains linear models iteratively and leads into neural networks.
References
- Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly. Ch. 4 (Training Models: Linear Regression, normal equation, gradient descent).
- Andrew Ng and Tengyu Ma (2023). CS229 Lecture Notes. Ch. 1 (Linear Regression: LMS and normal equations).
- Deisenroth, Faisal, and Ong (2020). Mathematics for Machine Learning. Ch. 9 (Linear Regression).