Linear Regression

Motivation: why this matters in quant finance

Linear regression is the first supervised model a quant should understand because it turns noisy numerical features into an interpretable estimate of a conditional mean. Factor returns, beta estimates, carry predictors, volatility forecasts, and transaction-cost models often start as regressions before they become more elaborate systems.

The model is also the template for later algorithms. Ridge Regression, Lasso Regression, Logistic Regression, and neural-network output layers all reuse the idea of a parameterised prediction function trained by a loss.

The informal idea

A linear model assigns one weight to each feature, plus an intercept. Training chooses the weights that minimise the sum of squared residuals. For a one-factor equity model, this estimates a market beta. For a cross-sectional signal model, it assigns weights to value, momentum, volatility, liquidity, and other features.

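Geometrically, least squares orthogonally projects the target vector onto the span of the feature columns. That explains why redundant factors are dangerous: when two columns are nearly collinear, the projection (and hence the fitted values) is still well defined, but many different weight combinations reproduce it almost equally well, so predictions can remain stable while coefficients swing around.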
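
The effect can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not a factor model: `x2` is a hypothetical near-duplicate of `x1`, the "true" relationship is invented for the example, and the same design matrix is refitted against 20 independent noise draws.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
# x2 is almost a copy of x1, so the two columns span nearly the same direction.
x2 = x1 + 1e-4 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
signal = 1.0 + 2.0 * x1  # hypothetical true relationship

coefs, preds = [], []
for _ in range(20):
    # Same features each time; only the noise in the target is redrawn.
    y = signal + 0.5 * rng.normal(size=n)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs.append(theta)
    preds.append(X @ theta)

coef_spread = np.std(coefs, axis=0)        # one value per coefficient
pred_spread = np.std(preds, axis=0).max()  # worst case over the 500 rows

print("coefficient std across refits:", np.round(coef_spread, 3))
print("largest fitted-value std:", round(float(pred_spread), 3))
```

Under these assumptions the weights on `x1` and `x2` vary wildly from refit to refit while the fitted values barely move, because only the sum of the two weights is well determined by the data.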

Formal statement

With $n$ observations and an augmented design matrix $\tilde{\mathbf{X}}$ that includes a column of ones for the intercept, ordinary least squares solves

$$
\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \frac{1}{n}\lVert \mathbf{y}-\tilde{\mathbf{X}}\boldsymbol{\theta} \rVert_2^2.
$$
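
One way to see where a closed-form solution comes from is to set the gradient of the objective to zero. The sketch below uses only the symbols already defined; because the objective is convex in $\boldsymbol{\theta}$, any stationary point is a global minimiser.

$$
\nabla_{\boldsymbol{\theta}}\,\frac{1}{n}\lVert \mathbf{y}-\tilde{\mathbf{X}}\boldsymbol{\theta} \rVert_2^2
= -\frac{2}{n}\,\tilde{\mathbf{X}}^\top\bigl(\mathbf{y}-\tilde{\mathbf{X}}\boldsymbol{\theta}\bigr) = \mathbf{0}
\quad\Longleftrightarrow\quad
\tilde{\mathbf{X}}^\top\tilde{\mathbf{X}}\,\boldsymbol{\theta} = \tilde{\mathbf{X}}^\top\mathbf{y}.
$$

These are the normal equations: the residual $\mathbf{y}-\tilde{\mathbf{X}}\boldsymbol{\theta}$ is orthogonal to every feature column, which is the projection picture in algebraic form.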

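If $\tilde{\mathbf{X}}^\top\tilde{\mathbf{X}}$ is invertible, which holds exactly when the columns of $\tilde{\mathbf{X}}$ are linearly independent, the minimiser is unique:

$$
\hat{\boldsymbol{\theta}}=(\tilde{\mathbf{X}}^\top\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}^\top\mathbf{y}.
$$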

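The formula teaches the algebra; in code, use a least-squares or SVD-based solver rather than explicitly inverting the matrix, because forming $\tilde{\mathbf{X}}^\top\tilde{\mathbf{X}}$ squares the condition number of the problem and amplifies rounding error.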
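
To make the comparison concrete, here is a minimal sketch on synthetic, well-conditioned data (the coefficients and noise level are invented for the example). It solves the same problem through the normal equations and through `np.linalg.lstsq`, then compares the 2-norm condition numbers of the design matrix and of its Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=n)

# Textbook route: solve the normal equations (X^T X) theta = X^T y.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Preferred route: SVD-based least squares that works on X directly.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Forming X^T X squares the 2-norm condition number of the problem.
cond_X = np.linalg.cond(X)
cond_gram = np.linalg.cond(X.T @ X)

print(np.allclose(theta_normal, theta_lstsq))
print(cond_X, cond_gram)
```

On a well-conditioned matrix like this one the two routes agree to floating-point precision. The squared condition number becomes a practical problem when features are nearly collinear, which is when the direct solver earns its keep.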

Implementation

import numpy as np

class LinearRegressionOLS:
    """Ordinary least squares using a stable least-squares solve."""
    def fit(self, X: np.ndarray, y: np.ndarray):
        # Prepend a column of ones so that coef_[0] is the intercept.
        X_design = np.c_[np.ones(len(X)), X]
        # lstsq solves the least-squares problem without forming X^T X.
        self.coef_, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Rebuild the same augmented design matrix before applying the weights.
        return np.c_[np.ones(len(X)), X] @ self.coef_

# Simulate one year of daily returns with a true beta of 1.35.
rng = np.random.default_rng(7)
market = rng.normal(0.0004, 0.012, size=252)
stock = 0.0001 + 1.35 * market + rng.normal(0, 0.008, size=252)
model = LinearRegressionOLS().fit(market.reshape(-1, 1), stock)
# coef_ holds [intercept, beta].
print(np.round(model.coef_, 4))
# [0.0002 1.3536]

The slope is the beta estimate for this sample, close to the 1.35 used to simulate the data. It is not a permanent property of the stock.
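
In the one-factor case the least-squares slope equals the sample covariance of stock and market returns divided by the sample variance of market returns. The sketch below checks that identity on the same simulated data and then re-estimates beta on each half of the sample. The half-year split and the helper `beta_hat` are illustrative choices, not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(7)
market = rng.normal(0.0004, 0.012, size=252)
stock = 0.0001 + 1.35 * market + rng.normal(0, 0.008, size=252)

def beta_hat(m: np.ndarray, s: np.ndarray) -> float:
    """One-factor OLS slope: sample covariance over sample variance."""
    return np.cov(s, m)[0, 1] / np.var(m, ddof=1)

beta_cov = beta_hat(market, stock)
alpha_cov = stock.mean() - beta_cov * market.mean()

# Reference fit with the same solver used in the class above.
X_design = np.c_[np.ones(len(market)), market]
theta, *_ = np.linalg.lstsq(X_design, stock, rcond=None)

# Re-estimate on each half of the sample (hypothetical split).
half = len(market) // 2
beta_first = beta_hat(market[:half], stock[:half])
beta_second = beta_hat(market[half:], stock[half:])

print(np.allclose(theta, [alpha_cov, beta_cov]))
print(beta_first, beta_second)
```

The two half-sample estimates differ even though the simulated beta is constant, which is pure sampling noise. With real returns, genuine changes in the relationship add to that variation.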

Key properties and trade-offs

Property	Meaning	Finance consequence
Convex loss	Squared-error linear regression has no spurious local minima; every minimum is global.	If the fit is bad, suspect features and data before optimiser failure.
Projection geometry	Fitted values live in the feature span.	Redundant factors can destabilise coefficients.
Outlier sensitivity	Squared errors punish large residuals quadratically.	Crisis days can dominate the fit.
Fast baseline	Training and prediction are cheap.	Use it before reaching for complex nonlinear models.
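
The outlier-sensitivity row can be illustrated with a single extreme observation. This is a sketch on the simulated data from the implementation section; the crisis day (market down 8%, stock down 25%) is a hypothetical point chosen for illustration, not an observed event.

```python
import numpy as np

rng = np.random.default_rng(7)
market = rng.normal(0.0004, 0.012, size=252)
stock = 0.0001 + 1.35 * market + rng.normal(0, 0.008, size=252)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
beta_clean, alpha_clean = np.polyfit(market, stock, 1)

# Append one hypothetical crisis day: market -8%, stock -25%.
market_crisis = np.append(market, -0.08)
stock_crisis = np.append(stock, -0.25)
beta_crisis, alpha_crisis = np.polyfit(market_crisis, stock_crisis, 1)

# Share of the total squared residual contributed by the crisis day alone.
resid = stock_crisis - (alpha_crisis + beta_crisis * market_crisis)
crisis_share = resid[-1] ** 2 / np.sum(resid ** 2)

print(beta_clean, beta_crisis, crisis_share)
```

One observation out of 253 moves the beta estimate noticeably and accounts for a large share of the loss, because its residual enters squared.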

Worked example: market beta

If the fitted model is $\widehat{R}_{i,t}=0.0002+1.35R_{m,t}$, then a $-2\%$ market day gives a predicted stock return of $0.0002+1.35(-0.02)=-0.0268$, that is $-2.68\%$. The calculation is useful for attribution, but it does not model regime-dependent beta or tail behaviour.
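
The same arithmetic can be written as a small function and applied to several market moves. This is a minimal sketch: the scenario list is hypothetical, and the coefficients are the rounded values from the worked example.

```python
# Coefficients from the worked example (intercept, beta).
alpha, beta = 0.0002, 1.35

def predicted_return(market_return: float) -> float:
    """Conditional-mean stock return implied by the fitted line."""
    return alpha + beta * market_return

# Hypothetical market scenarios, expressed as decimal returns.
for move in (-0.05, -0.02, 0.0, 0.02):
    print(f"market {move:+.2%} -> stock {predicted_return(move):+.2%}")
```

Each output is a point estimate of the conditional mean under a fixed beta. It says nothing about the dispersion around that mean on any given day.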

Common confusions and pitfalls

"High $R^2$ means tradable predictability." It means in-sample variance explained. Trading needs out-of-sample error and economic P&L.

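The gap between in-sample and out-of-sample $R^2$ is easy to reproduce. The sketch below assumes a deliberately extreme, hypothetical setup: 50 random features, 100 training days, and returns that are pure noise, so there is nothing to predict. The helpers `r_squared` and `add_intercept` are defined here for the example only.

```python
import numpy as np

def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def add_intercept(X: np.ndarray) -> np.ndarray:
    return np.c_[np.ones(len(X)), X]

rng = np.random.default_rng(3)
n_train, n_test, n_features = 100, 100, 50

# Hypothetical setup: returns are pure noise, unrelated to any feature.
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = rng.normal(0, 0.01, size=n_train)
y_test = rng.normal(0, 0.01, size=n_test)

theta, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

r2_in = r_squared(y_train, add_intercept(X_train) @ theta)
r2_out = r_squared(y_test, add_intercept(X_test) @ theta)

print(r2_in, r2_out)
```

With this many features relative to observations, the in-sample $R^2$ is substantial even though the features carry no information, and the out-of-sample $R^2$ is negative, meaning the model does worse than predicting the test-set mean.
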
"The normal equation is production code." It is a derivation. Use numerically stable linear algebra.

"A coefficient is causal." A coefficient is conditional on this feature set and sample window.

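The dependence of a coefficient on the feature set can be shown with two correlated signals. In this sketch the names `momentum` and `quality`, the correlation between them, and the return-generating equation are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 2000

# Two hypothetical signals with correlation of about 0.8.
momentum = rng.normal(size=n)
quality = 0.8 * momentum + 0.6 * rng.normal(size=n)

# Hypothetical returns: each signal has a true weight of 1.0.
ret = 1.0 * momentum + 1.0 * quality + 0.5 * rng.normal(size=n)

def ols(features: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit with an intercept; returns [intercept, weights...]."""
    X = np.c_[np.ones(len(y)), features]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

theta_alone = ols(momentum, ret)                               # quality omitted
theta_joint = ols(np.column_stack([momentum, quality]), ret)   # both included

print("momentum weight, quality omitted:", round(float(theta_alone[1]), 2))
print("momentum weight, quality included:", round(float(theta_joint[1]), 2))
```

When `quality` is left out, the momentum weight absorbs the part of `quality` that moves with it and lands near 1.8 rather than 1.0. Neither number is wrong; they answer different conditional questions.
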
Where this goes next

Ridge Regression: stabilises least squares with L2 shrinkage.
Lasso Regression: adds sparse feature selection.
Logistic Regression: changes the target from a continuous value to a class label, predicted through a probability.
Gradient Descent: trains linear models iteratively and leads into neural networks.

References

Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly. Ch. 4 (Training Models: Linear Regression, normal equation, gradient descent).
Andrew Ng and Tengyu Ma (2023). CS229 Lecture Notes. Stanford University. Ch. 1 (Linear Regression: LMS and normal equations).
Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong (2020). Mathematics for Machine Learning. Cambridge University Press. Ch. 9 (Linear Regression).