Logistic Regression

Motivation: why this matters in quant finance

Logistic regression is the standard linear model for binary events: default or no default, stress regime or calm regime, fill or no fill, positive return or not. It is often the first serious classifier to try because it estimates probabilities rather than only labels.

It belongs next to Linear Regression, but it solves a different problem. Linear regression predicts the conditional mean of a numerical target. Logistic regression models the log-odds of a binary event as a linear function of the features and is fitted by maximum likelihood.

The informal idea

Start with a linear score $z=\beta_0+\mathbf{x}^\top\boldsymbol{\beta}$. Because $z$ can be any real number but a probability must lie between 0 and 1, pass it through the sigmoid

$$\sigma(z)=\frac{1}{1+e^{-z}}$$

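to obtain $\mathbb{P}(Y=1\mid\mathbf{x})=\sigma(z)$. Inverting the sigmoid gives $\log\frac{\mathbb{P}(Y=1\mid\mathbf{x})}{1-\mathbb{P}(Y=1\mid\mathbf{x})}=z$, so the model is linear in the log-odds: each coefficient $\beta_j$ is the change in log-odds for a one-unit increase in $x_j$ with the other features held fixed, and $e^{\beta_j}$ is the corresponding odds ratio. Coefficients are log-odds effects, not direct probability changes.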
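
A minimal sketch of the log-odds interpretation, assuming a single hypothetical coefficient `beta = 0.8` and three baseline scores chosen only for illustration. The same one-unit step in the feature adds exactly `beta` to the log-odds at every baseline, but the resulting change in probability is not constant.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(p):
    """Inverse of the sigmoid: log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

beta = 0.8  # hypothetical coefficient on one feature
results = []
for z0 in (-3.0, 0.0, 3.0):  # illustrative baseline scores
    # Probability before and after a +1 step in the feature.
    p0, p1 = sigmoid(z0), sigmoid(z0 + beta)
    d_prob = p1 - p0
    d_logodds = log_odds(p1) - log_odds(p0)
    results.append((z0, d_prob, d_logodds))
    print(f"z0={z0:+.1f}  p: {p0:.3f} -> {p1:.3f}  "
          f"change in p={d_prob:.3f}  change in log-odds={d_logodds:.3f}")

print(f"odds ratio exp(beta) = {np.exp(beta):.3f}")
```

The log-odds change prints as 0.800 in every row, while the probability change is largest at the baseline $z_0=0$, where the probability is 0.5.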

Formal statement

For labels $y_i\in\{0,1\}$, logistic regression minimises the binary cross-entropy, which is the average negative log-likelihood of a Bernoulli model:

$$L(\boldsymbol{\theta})=-\frac{1}{n}\sum_{i=1}^n\left[y_i\log p_i+(1-y_i)\log(1-p_i)\right],$$

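where $\tilde{\mathbf{x}}_i=(1,\mathbf{x}_i^\top)^\top$ prepends a 1 for the intercept, $\boldsymbol{\theta}=(\beta_0,\boldsymbol{\beta}^\top)^\top$, and $p_i=\sigma(\tilde{\mathbf{x}}_i^\top\boldsymbol{\theta})$. Stacking the $\tilde{\mathbf{x}}_i^\top$ as the rows of $\tilde{\mathbf{X}}$ and collecting the $p_i$ and $y_i$ into vectors $\mathbf{p}$ and $\mathbf{y}$, the gradient is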

$$\nabla L(\boldsymbol{\theta})=\frac{1}{n}\tilde{\mathbf{X}}^\top(\mathbf{p}-\mathbf{y}).$$
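
The gradient formula can be checked numerically. The sketch below, on random hypothetical data, computes the loss in a numerically stable form and compares the analytic gradient $\frac{1}{n}\tilde{\mathbf{X}}^\top(\mathbf{p}-\mathbf{y})$ with central finite differences. The helper names `bce_loss` and `bce_grad` are illustrative, not from any library.

```python
import numpy as np

def bce_loss(theta, X_design, y):
    """Average binary cross-entropy; X_design has a leading column of ones."""
    z = X_design @ theta
    # -log(sigma(z)) = log(1 + e^{-z}) and -log(1 - sigma(z)) = log(1 + e^{z});
    # np.logaddexp(0, t) computes log(1 + e^t) without overflow.
    return np.mean(y * np.logaddexp(0.0, -z) + (1.0 - y) * np.logaddexp(0.0, z))

def bce_grad(theta, X_design, y):
    """Analytic gradient (1/n) X^T (p - y)."""
    p = 1.0 / (1.0 + np.exp(-(X_design @ theta)))
    return X_design.T @ (p - y) / len(y)

# Random hypothetical data and parameters, used only for the check.
rng = np.random.default_rng(0)
n, d = 200, 3
X_design = np.c_[np.ones(n), rng.normal(size=(n, d))]
y = rng.integers(0, 2, size=n).astype(float)
theta = rng.normal(size=d + 1)

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
numeric = np.array([
    (bce_loss(theta + eps * e, X_design, y) - bce_loss(theta - eps * e, X_design, y)) / (2 * eps)
    for e in np.eye(d + 1)
])
analytic = bce_grad(theta, X_design, y)
max_abs_diff = np.max(np.abs(numeric - analytic))
print(f"max |numeric - analytic| = {max_abs_diff:.2e}")  # should be close to zero
```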

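Setting this gradient to zero gives equations that are nonlinear in $\boldsymbol{\theta}$, so unlike linear regression there is no closed-form normal equation, and the model is trained iteratively. The loss is convex, so any local minimum reached by an iterative method such as gradient descent or Newton's method is a global one.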
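
As one illustration of an iterative fit other than gradient descent, here is a minimal sketch of Newton's method (for this model also known as iteratively reweighted least squares). It assumes synthetic data with two standard-normal features and known coefficients; `fit_logistic_newton` is a hypothetical helper. Each step solves a linear system with the Hessian $\frac{1}{n}\tilde{\mathbf{X}}^\top\mathbf{W}\tilde{\mathbf{X}}$, where $\mathbf{W}$ is diagonal with entries $p_i(1-p_i)$.

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25, tol=1e-10):
    """Hypothetical helper: fit logistic regression by Newton's method (IRLS)."""
    X_design = np.c_[np.ones(len(X)), X]  # leading column of ones = intercept
    theta = np.zeros(X_design.shape[1])
    n = len(y)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X_design @ theta)))
        grad = X_design.T @ (p - y) / n
        # Hessian of the average cross-entropy: (1/n) X^T W X, W = diag(p * (1 - p)).
        hess = (X_design * (p * (1.0 - p))[:, None]).T @ X_design / n
        step = np.linalg.solve(hess, grad)  # Newton direction
        theta -= step
        if np.max(np.abs(step)) < tol:  # stop once updates are negligible
            break
    return theta

# Hypothetical synthetic data: two standard-normal features, known log-odds.
rng = np.random.default_rng(3)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
true_logit = -0.4 + 1.2 * x1 - 0.7 * x2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

theta = fit_logistic_newton(np.c_[x1, x2], y)
print(np.round(theta, 3))
```

Newton's method typically needs only a handful of iterations on a problem like this, against thousands for plain gradient descent, at the cost of forming and solving a small linear system at every step.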

Implementation

```python
import numpy as np

class LogisticRegressionGD:
    """Binary logistic regression trained by batch gradient descent."""
    def __init__(self, learning_rate: float = 0.5, n_iter: int = 2_000):
        self.learning_rate = learning_rate
        self.n_iter = n_iter

    @staticmethod
    def _sigmoid(z: np.ndarray) -> np.ndarray:
        # For very negative z, np.exp(-z) overflows to inf (with a RuntimeWarning)
        # and the result correctly becomes 0; scipy.special.expit avoids the warning.
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X: np.ndarray, y: np.ndarray):
        # Prepend a column of ones so theta[0] is the intercept beta_0.
        X_design = np.c_[np.ones(len(X)), X]
        theta = np.zeros(X_design.shape[1])
        for _ in range(self.n_iter):
            p = self._sigmoid(X_design @ theta)  # predicted probabilities
            # Gradient step on the average cross-entropy: (1/n) X^T (p - y).
            theta -= self.learning_rate * (X_design.T @ (p - y) / len(y))
        self.coef_ = theta  # [intercept, coefficients...]
        return self

# Synthetic data: two standard-normal features and a known log-odds model.
rng = np.random.default_rng(11)
vol = rng.normal(0, 1, size=300)
momentum = rng.normal(0, 1, size=300)
logit = -0.4 + 1.2 * vol - 0.7 * momentum
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
model = LogisticRegressionGD().fit(np.c_[vol, momentum], y)
print(np.round(model.coef_, 3))
# [-0.258  1.224 -0.755]
```
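
Since the class above runs a fixed number of iterations, it is worth checking that it has actually converged. A minimal sketch, using a hypothetical helper `fit_gd_with_history` that repeats the same update on the same synthetic data while recording the loss: the loss should start at $\log 2$ (every probability is 0.5 when $\boldsymbol{\theta}=\mathbf{0}$), fall monotonically at this step size, and finish with a gradient norm near zero.

```python
import numpy as np

def fit_gd_with_history(X, y, learning_rate=0.5, n_iter=2_000):
    """Hypothetical helper: batch gradient descent that also records the loss."""
    X_design = np.c_[np.ones(len(X)), X]
    theta = np.zeros(X_design.shape[1])
    losses = []
    for _ in range(n_iter):
        z = X_design @ theta
        p = 1.0 / (1.0 + np.exp(-z))
        # Average cross-entropy, written with logaddexp to avoid log(0).
        losses.append(np.mean(y * np.logaddexp(0.0, -z) + (1 - y) * np.logaddexp(0.0, z)))
        theta -= learning_rate * (X_design.T @ (p - y) / len(y))
    # Gradient at the final parameters: should be close to zero at convergence.
    p_final = 1.0 / (1.0 + np.exp(-(X_design @ theta)))
    grad = X_design.T @ (p_final - y) / len(y)
    return theta, np.array(losses), np.linalg.norm(grad)

# Same synthetic data as above.
rng = np.random.default_rng(11)
vol = rng.normal(0, 1, size=300)
momentum = rng.normal(0, 1, size=300)
logit = -0.4 + 1.2 * vol - 0.7 * momentum
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

theta, losses, grad_norm = fit_gd_with_history(np.c_[vol, momentum], y)
print(f"loss: start {losses[0]:.4f}, end {losses[-1]:.4f}; final gradient norm {grad_norm:.2e}")
```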

Key properties and trade-offs

| Property | Meaning | Finance consequence |
|---|---|---|
| Probability output | Estimates the probability of the event. | Supports threshold choice and expected-loss ranking. |
| Linear boundary | The log-odds are linear in the features, so the decision boundary is linear unless features are transformed. | Feature engineering matters. |
| Cross-entropy loss | Rewards calibrated probabilities, not just correct labels. | Accuracy alone is not enough. |
| Threshold separate from training | Estimating a probability and acting on it are different decisions. | Cutoffs should reflect cost, capital, or risk appetite rather than a fixed 0.5. |
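
The last row can be made concrete with a standard expected-cost argument. If acting on a positive prediction costs $c_{FP}$ when the event does not occur, and failing to act costs $c_{FN}$ when it does, acting is cheaper in expectation whenever $(1-p)\,c_{FP} < p\,c_{FN}$, that is $p > c_{FP}/(c_{FP}+c_{FN})$. The sketch below assumes hypothetical hedging costs for a stress-regime signal; `cost_threshold` is an illustrative helper.

```python
import numpy as np

def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which acting (predicting 1) has lower expected cost.

    Acting costs (1 - p) * cost_fp in expectation; not acting costs p * cost_fn.
    Acting is cheaper when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

# Hypothetical costs: hedging in a calm regime costs 1 unit (false positive);
# staying unhedged into a stress regime costs 9 units (false negative).
t = cost_threshold(cost_fp=1.0, cost_fn=9.0)

p = np.array([0.05, 0.12, 0.40, 0.70])  # model probabilities of a stress regime
act = p > t
print(t, act)  # 0.1 [False  True  True  True]
```

With a miss nine times as costly as a false alarm, the cutoff falls to 0.1, so a 12% stress probability already justifies hedging.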

Worked example: default threshold

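A 7% default probability with a 60% loss given default implies an expected loss of $0.07\times0.60=0.042$, i.e. 4.2% of exposure. Whether to accept the loan depends on how that expected loss compares with its margin and its funding and capital costs, not on whether the default probability exceeds 50%.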
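
The arithmetic can be turned into a simple accept/reject rule. This sketch uses the 7% default probability and 60% loss given default from the example; the 3% margin per unit of exposure is a hypothetical assumption, and funding and capital costs are ignored.

```python
# Worked example: PD = 7%, LGD = 60%.
prob_default = 0.07
lgd = 0.60
expected_loss = prob_default * lgd  # per unit of exposure
print(f"expected loss: {expected_loss:.1%}")  # expected loss: 4.2%

# Hypothetical economics (assumption): a 3% margin per unit of exposure.
margin = 0.03
breakeven_pd = margin / lgd  # PD at which the margin just covers expected loss
accept = expected_loss < margin
print(f"break-even PD: {breakeven_pd:.1%}, accept: {accept}")  # break-even PD: 5.0%, accept: False
```

Under these assumptions the loan is rejected even though its default probability is far below 50%, because any PD above the 5% break-even makes expected loss exceed the margin.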

Common confusions and pitfalls

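"Use MSE because it says regression." For binary labels, log loss is the negative log-likelihood of the Bernoulli model. Squared error on sigmoid outputs is non-convex in the coefficients, and its gradient nearly vanishes when a prediction is confidently wrong.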
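
A small numerical sketch of why log loss is preferred, for a single positive label ($y=1$) and a few illustrative scores. It compares each loss and its gradient with respect to the score $z$: the cross-entropy gradient is $p-y$, while squared error picks up an extra factor $p(1-p)$ from the sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = 1.0                                # true label
z = np.array([-6.0, -2.0, 0.0, 2.0])   # illustrative scores, confidently wrong to right
p = sigmoid(z)

log_loss = -np.log(p)                  # cross-entropy when y = 1
mse = (p - y) ** 2                     # squared error on the probability

# Gradients with respect to the score z.
grad_log = p - y                       # derivative of cross-entropy
grad_mse = 2 * (p - y) * p * (1 - p)   # chain rule through the sigmoid

for row in zip(z, p, log_loss, mse, grad_log, grad_mse):
    print("z={:+.0f} p={:.4f} logloss={:.3f} mse={:.3f} "
          "dlog/dz={:+.4f} dmse/dz={:+.4f}".format(*row))
```

At $z=-6$ the model is confidently wrong: log loss is about 6 and its gradient is close to $-1$, while squared error stays below 1 and its gradient is roughly 200 times smaller, so gradient-based training barely corrects the mistake.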

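"Accuracy is enough." Rare-event finance problems can look accurate while missing the costly class: with a 2% default rate, a model that never predicts default is 98% accurate and catches no defaults.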
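
The accuracy trap is easy to reproduce. This sketch simulates a hypothetical portfolio with a 2% default rate and scores a trivial rule that never predicts default.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Hypothetical portfolio with a 2% default rate (1 = default).
y = rng.binomial(1, 0.02, size=n)

# A trivial "classifier" that never predicts default.
y_hat = np.zeros(n, dtype=int)

accuracy = np.mean(y_hat == y)
true_pos = np.sum((y_hat == 1) & (y == 1))
recall = true_pos / np.sum(y == 1)     # share of actual defaults that were flagged
missed = np.sum((y_hat == 0) & (y == 1))

print(f"accuracy={accuracy:.3f}  recall={recall:.3f}  missed defaults={missed}")
```

Accuracy comes out near 98% while recall, the share of defaults caught, is zero.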

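"The coefficient is a probability change." It is a log-odds change. For a small change in $x_j$, the probability moves by about $\beta_j\,p(1-p)$ per unit, which depends on the baseline probability $p$ and is at most $|\beta_j|/4$ in magnitude.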

Where this goes next

Regularisation: L1 vs L2: explains the penalties commonly used with logistic models.
Support Vector Machine (SVM): learns a margin rather than calibrated probabilities.
Decision Tree: captures nonlinear threshold rules.
Cross-Validation: selects penalties and thresholds without using the test set.
