Support Vector Machine (SVM)

Motivation: why this matters in quant finance

Support vector machines are margin-based classifiers. They are useful when the dataset is medium-sized, the boundary matters more than probability calibration, and the right transformed feature space can separate regimes. Market-regime classification and anomaly screening can fit this pattern.

SVMs sit between Logistic Regression and more flexible nonlinear models. Like logistic regression, a linear SVM learns a separating hyperplane. Unlike logistic regression, it focuses on the points near the boundary and maximises a margin.

The informal idea

If two classes are linearly separable, many lines can split them. The SVM chooses the line with the widest street between the classes. The observations touching the edges of the street are the support vectors; they alone determine the boundary.
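
The street picture can be checked numerically. The sketch below is illustrative only: it assumes two synthetic, well-separated clusters and uses a very large `C` as a stand-in for a hard margin. The street width follows from the standard result that the two margin lines lie $2/\lVert\mathbf{w}\rVert_2$ apart.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters (illustrative data, not market data).
X, y = make_blobs(n_samples=60, centers=[[-3, -3], [3, 3]],
                  cluster_std=0.8, random_state=0)

# A very large C approximates the hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
street_width = 2.0 / np.linalg.norm(w)  # distance between the two margin lines

print("support vectors:", len(clf.support_vectors_), "of", len(X))
print("street width:", round(street_width, 3))

# Support vectors sit on the edges of the street, so |w.x + b| is close to 1.
print(np.round(np.abs(clf.decision_function(clf.support_vectors_)), 3))
```

Only a handful of the 60 points should appear as support vectors; moving or deleting any of the others would leave the fitted line unchanged.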

Soft-margin SVMs allow violations. This is essential in finance because labels are noisy and regimes are rarely perfectly separable.

Formal statement

For labels $y_i\in\{-1,1\}$, the soft-margin SVM solves

$$
\min_{\mathbf{w},b,\boldsymbol{\xi}} \frac{1}{2}\lVert\mathbf{w}\rVert_2^2 + C\sum_{i=1}^n \xi_i
$$

subject to $y_i(\mathbf{w}^\top\mathbf{x}_i+b)\geq 1-\xi_i$ and $\xi_i\geq0$. The hinge-loss form is

$$
\frac{1}{2}\lVert\mathbf{w}\rVert_2^2 + C\sum_i \max(0,1-y_i(\mathbf{w}^\top\mathbf{x}_i+b)).
$$
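
The two forms agree because, at the optimum, each slack takes its smallest feasible value, $\xi_i=\max(0,1-y_i(\mathbf{w}^\top\mathbf{x}_i+b))$. A minimal sketch of evaluating the objective for a fitted linear SVM follows; the synthetic data and the helper name `svm_objective` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def svm_objective(w, b, X, y, C):
    """Hinge-loss form: 0.5 * ||w||^2 + C * sum of hinge losses."""
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)  # xi_i at its smallest feasible value
    return 0.5 * float(w @ w) + C * float(slack.sum())

# Synthetic two-feature data; labels mapped from {0, 1} to {-1, +1}.
X, y01 = make_classification(n_samples=200, n_features=2, n_informative=2,
                             n_redundant=0, random_state=0)
y = 2 * y01 - 1

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
print("points with positive slack:", int((slack > 0).sum()))
print("objective at the fitted (w, b):", round(svm_objective(w, b, X, y, C), 3))
```

Points with zero slack lie on or outside the margin; points with positive slack are the violations that $C$ prices.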

A kernel $K(\mathbf{x},\mathbf{z})=\phi(\mathbf{x})^\top\phi(\mathbf{z})$ gives nonlinear boundaries without explicitly constructing $\phi$.
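
As a small check of the kernel identity, the sketch below uses the degree-2 homogeneous polynomial kernel $K(\mathbf{x},\mathbf{z})=(\mathbf{x}^\top\mathbf{z})^2$ in two dimensions, where the feature map can be written out by hand. The kernel choice and the function names `phi` and `poly2_kernel` are assumptions made for this example.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without building phi."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(phi(x) @ phi(z))  # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)      # same number from the 2-D inputs only

# The two values agree up to floating-point rounding.
print(explicit, implicit)
```

For the RBF kernel the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.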

Implementation

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.18, random_state=31)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=31, stratify=y)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=5.0, gamma=1.0))
svm.fit(X_train, y_train)
print(round(svm.score(X_test, y_test), 3))
# 0.967

The scaler is part of the model pipeline because margin geometry depends on feature units.
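
To illustrate the point about units, the sketch below reuses the moons data but multiplies one feature by 1000, a hypothetical change of units, and fits the same RBF SVM with and without the scaler.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.18, random_state=31)

# Hypothetical unit change: the first feature is now quoted 1000 times larger.
X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000.0

X_train, X_test, y_train, y_test = train_test_split(
    X_rescaled, y, test_size=0.3, random_state=31, stratify=y
)

# Same kernel and hyperparameters, with and without standardisation.
unscaled = SVC(kernel="rbf", C=5.0, gamma=1.0).fit(X_train, y_train)
scaled = make_pipeline(
    StandardScaler(), SVC(kernel="rbf", C=5.0, gamma=1.0)
).fit(X_train, y_train)

unscaled_acc = unscaled.score(X_test, y_test)
scaled_acc = scaled.score(X_test, y_test)
print("without scaler:", round(unscaled_acc, 3))
print("with scaler:   ", round(scaled_acc, 3))
```

Without the scaler, distances are dominated by the inflated feature, so a `gamma` tuned for unit-scale data no longer means the same thing. Fitting the scaler inside the pipeline also keeps its statistics confined to the training split.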

Key properties and trade-offs

Property	Meaning	Finance consequence
Margin maximisation	Chooses a wide boundary.	Useful for robust small-data regime boundaries.
Support vectors	Boundary-near points drive the solution.	Outliers near the boundary matter.
Kernel trick	Nonlinear boundaries via inner products.	Flexible, but hyperparameters become critical.
No native probabilities	Scores are margins, not calibrated probabilities.	Calibrate if probabilities drive decisions.
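
The last row can be made concrete. The sketch below contrasts raw margin scores from `decision_function` with probabilities obtained by wrapping the same pipeline in scikit-learn's `CalibratedClassifierCV` with sigmoid (Platt-style) calibration. The data and hyperparameters are the illustrative ones from the implementation section, not a recommendation.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.18, random_state=31)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=31, stratify=y
)

base = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=5.0, gamma=1.0))

# Raw output: signed margin scores, unbounded and not probabilities.
scores = base.fit(X_train, y_train).decision_function(X_test)

# Sigmoid calibration fitted with 5-fold internal cross-validation.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
probs = calibrated.predict_proba(X_test)[:, 1]

print("score range:", round(scores.min(), 2), "to", round(scores.max(), 2))
print("probability range:", round(probs.min(), 2), "to", round(probs.max(), 2))
```

Whether the calibrated values are reliable enough to drive decisions still has to be checked on held-out data, for example with a reliability curve.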

Worked example: regime boundary

A classifier using realised volatility and trend strength may need a curved stress boundary. An RBF SVM can carve that boundary. If the desk needs calibrated default probabilities, logistic regression or calibrated tree models may be more appropriate.
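
A toy version of this example is sketched below. Everything in it is an assumption made for illustration: the simulated features, the made-up curved labelling rule, the 5% label noise, and the hyperparameters. It shows only how a linear and an RBF SVM could be compared on such a problem.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(7)
n = 600

# Hypothetical features: annualised realised volatility and a standardised trend score.
realised_vol = rng.lognormal(mean=np.log(0.20), sigma=0.4, size=n)
trend_strength = rng.normal(0.0, 1.0, size=n)

# Made-up labelling rule with a curved boundary: high volatility counts as
# stress unless the trend (in either direction) is strong.
stress = (realised_vol > 0.20 + 0.08 * trend_strength**2).astype(int)

# Flip 5% of labels to mimic noisy regime labels.
flip = rng.random(n) < 0.05
stress = np.where(flip, 1 - stress, stress)

X = np.column_stack([realised_vol, trend_strength])
X_train, X_test, y_train, y_test = train_test_split(
    X, stress, test_size=0.3, random_state=7, stratify=stress
)

linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=5.0, gamma=1.0))

linear_acc = linear_svm.fit(X_train, y_train).score(X_test, y_test)
rbf_acc = rbf_svm.fit(X_train, y_train).score(X_test, y_test)
print("linear:", round(linear_acc, 3), "rbf:", round(rbf_acc, 3))
```

On real data the labels would come from a regime definition rather than a formula, and the split would need to respect time ordering.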

Common confusions and pitfalls

"The largest margin always generalises best." The margin must be balanced against violations through

C

"The kernel removes feature engineering." It adds flexibility, not economic meaning.

"SVM outputs are probabilities." The raw score is a signed margin.

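The first pitfall can be explored with a small sweep over $C$. The sketch below assumes noisy moons data and an arbitrary three-value grid; it records how many support vectors each fit keeps alongside train and test accuracy.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Noisier moons than in the implementation section, so violations are unavoidable.
X, y = make_moons(n_samples=400, noise=0.3, random_state=31)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=31, stratify=y
)

results = {}
for C in [0.01, 1.0, 100.0]:
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=1.0))
    model.fit(X_train, y_train)
    results[C] = {
        "n_support": int(model.named_steps["svc"].n_support_.sum()),
        "train_acc": model.score(X_train, y_train),
        "test_acc": model.score(X_test, y_test),
    }
    print(C, results[C])
```

A small $C$ tolerates many violations and keeps most training points as support vectors; a large $C$ penalises violations heavily and fits the training set more tightly. Neither extreme is guaranteed to generalise best, which is why $C$ is tuned on held-out data.
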
Where this goes next

Logistic Regression: probability-oriented linear classification.
K-Nearest Neighbors (KNN): distance-based local voting.
Cross-Validation: tunes $C$, $\gamma$, and kernels.
Lagrange Multipliers and KKT: supplies the constrained-optimisation machinery behind SVM duality.

References

Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly. Ch. 5 (Support Vector Machines).
Andrew Ng and Tengyu Ma (2023). CS229 Lecture Notes. Ch. 5 (Kernel Methods) and Ch. 6 (Support Vector Machines).
Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong (2020). Mathematics for Machine Learning. Cambridge University Press. Ch. 12 (Classification with Support Vector Machines).